VQEG Column: VQEG Meeting Jun. 2021 (virtual/online)

Introduction

Welcome to the fifth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 7 to 11 June 2021. As with the previous meeting, held in December 2020, it was organized online (this time by Kingston University), with multiple sessions spread over five days, allowing remote participation of people from 22 countries across the Americas, Asia, and Europe. More than 100 participants registered for the meeting and could attend the 40 presentations and several discussions that took place in all working groups.
This column provides an overview of the recently completed VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online on the VQEG meeting website.

Group picture of the VQEG Meeting 7-11 June 2021.

In addition to the contributions from various VQEG groups to several ITU work items, several presentations of state-of-the-art work may be of interest to the SIGMM community. Also noteworthy are the progress on the new activities launched at the last VQEG plenary meeting (related to Live QoE assessment, SI/TI clarification, an implementer's guide to video quality metrics for coding applications, and the inclusion of video quality metrics as metadata in compressed streams), and the proposal for a new joint work within the Immersive Media Group on the evaluation of immersive communication systems from a task-based or interactive perspective.

We encourage readers interested in any of the activities of the working groups to check their websites and subscribe to the corresponding reflectors in order to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on improved subjective and objective methods for video-only and audiovisual quality of commonly available systems. Following the completion in 2020 of the project AVHD/P.NATS2 (a joint collaboration between VQEG and ITU SG12) [1], two projects are currently ongoing within the AVHD group: QoE Metrics for Live Video Streaming Applications (Live QoE), which was launched at the last plenary meeting, and Advanced Subjective Methods (AVHD-SUB).
The main discussion during the AVHD sessions concerned the Live QoE project, led by Shahid Satti (Opticom) and Rohit Puri (Twitch). In addition to the presentation of the project proposal, the main decisions reached so far were presented (e.g., use of 20-30 second videos at 1080p resolution with frame rates up to 60 fps, use of Absolute Category Rating (ACR) as the subjective test methodology, generation of test conditions, etc.), and open questions were raised for discussion, especially regarding how to acquire premium content and network traces.
In addition to this discussion, Steve Göring (TU Ilmenau) presented an open-source platform (AVrate Voyager) for crowdsourcing/online subjective tests [2], and Shahid Satti (Opticom) presented the performance results of the Opticom models in the project AVHD/P.NATS Phase 2. Finally, Ioannis Katsavounidis (Facebook) presented the subjective testing validation of AV1 performance from the Alliance for Open Media (AOM), in order to gather feedback on the test plan and to identify interested testing labs within VQEG. It is also worth noting that this session was recorded to be used as raw multimedia data for the Live QoE project.

Quality Assessment for Health applications (QAH)

The QAH group session included three presentations in addition to the project summary provided by Lucie Lévêque (Polytech Nantes). In particular, Meriem Outtas (INSA Rennes) provided a review of objective quality assessment of medical images and videos. This is one of the topics jointly addressed by the group, which is working on an overview paper in line with the recent review on subjective medical image quality assessment [3]. Moreover, Zohaib Amjad Khan (Université Sorbonne Paris Nord) presented a work on video quality assessment of laparoscopic videos, while Aditja Raj and Maria Martini (Kingston University) presented their work on a multivariate regression-based convolutional neural network model for fundus image quality assessment.

Statistical Analysis Methods (SAM)

The SAM session consisted of three presentations followed by discussions on the topics. One of them concerned describing subjective experiment consistency with a p-value P-P plot [4], presented by Jakub Nawała (AGH University of Science and Technology). In addition, Zhi Li (Netflix) and Rafał Figlus (AGH University of Science and Technology) presented the progress of SAM's contribution to ITU-T to modify Recommendation P.913 to include the maximum likelihood estimation (MLE) model for subject behavior in subjective experiments [5], as well as the recently available implementation of this model in Excel. Finally, Pablo Pérez (Nokia Bell Labs) and Lucjan Janowski (AGH University of Science and Technology) presented their work on the possibility of performing subjective experiments with four subjects [6].
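The subject model behind this contribution [5] is commonly written as O_ij = ψ_j + Δ_i + υ_i·X, where O_ij is the score given by subject i to stimulus j, ψ_j is the underlying quality of the stimulus, Δ_i the bias of the subject, υ_i the inconsistency of the subject, and X ~ N(0, 1). As a minimal sketch, assuming a complete subjects × stimuli score matrix (no missing ratings), the parameters can be estimated by coordinate ascent on the likelihood, alternating the closed-form update for each group of parameters. This is not necessarily the solver used in [5], in P.913 or in the Excel implementation, it omits confidence intervals, and the function name `estimate_subject_model` is hypothetical.

```python
import numpy as np


def estimate_subject_model(scores, n_iter=500, tol=1e-9):
    """Hypothetical helper: estimate psi (per-stimulus quality), delta (per-subject
    bias) and upsilon (per-subject inconsistency) for the assumed model
    O[i, j] = psi[j] + delta[i] + upsilon[i] * X, with X ~ N(0, 1).

    `scores` is a complete (subjects x stimuli) array; missing ratings are not handled.
    """
    O = np.asarray(scores, dtype=float)
    psi = O.mean(axis=0)  # start from the plain MOS of each stimulus
    delta = np.zeros(O.shape[0])
    upsilon = np.ones(O.shape[0])
    for _ in range(n_iter):
        prev_psi = psi
        # Bias: mean residual of each subject given the current quality estimate.
        delta = (O - psi).mean(axis=1)
        # Inconsistency: RMS of what the quality and bias terms leave unexplained.
        resid = O - psi - delta[:, None]
        upsilon = np.maximum(np.sqrt((resid ** 2).mean(axis=1)), 1e-6)  # avoid 1/0
        # Quality: bias-corrected scores, weighting each subject by 1 / upsilon^2.
        w = 1.0 / upsilon ** 2
        psi = (w[:, None] * (O - delta[:, None])).sum(axis=0) / w.sum()
        # psi + c and delta - c fit equally well, so pin the biases to zero mean.
        shift = delta.mean()
        psi, delta = psi + shift, delta - shift
        if np.max(np.abs(psi - prev_psi)) < tol:
            break
    return psi, delta, upsilon
```

The practical difference from a plain MOS is visible in the ψ update: subjects with a large estimated υ_i contribute less to the quality estimate, and each subject's bias is removed before averaging.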

Computer Generated Imagery (CGI)

Nabajeet Barman (Kingston University) presented a report on the current activities of the CGI group. The group's main current topics are gaming quality assessment methodologies and quality prediction, and codec comparison for CG content. The group collaborates closely with ITU-T SG12, as reflected by its support in completing three work items: ITU-T Rec. G.1032 on influence factors on gaming quality of experience, ITU-T Rec. P.809 on subjective evaluation methods for gaming quality, and ITU-T Rec. G.1072 on opinion model for gaming applications. Furthermore, CGI is contributing to three new work items: ITU-T work item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services, ITU-T work item G.OMMOG on opinion models for mobile online gaming applications, and ITU-T work item P.CROWDG on subjective evaluation of gaming quality with a crowdsourcing approach.
In addition, four presentations were scheduled during the CGI slots. The first one was delivered by Joel Jung (Tencent Media Lab) and David Lindero (Ericsson), who presented the details of the ITU-T work item P.BBQCG. Another one, on the evaluation of MPEG-5 Part 2 (LCEVC) for gaming video streaming applications, was presented by Nabajeet Barman (Kingston University) and Saman Zadtootaghaj (Dolby Laboratories). Nabajeet Barman, together with Maria Martini (Kingston University), also presented a dataset, codec comparison and challenges related to user-generated HDR gaming video streaming [7]. Finally, JP Tauscher (Technische Universität Braunschweig) presented his work on EEG-based detection of deepfake images.

No Reference Metrics (NORM)

The NORM group session included a presentation on the impact of Spatial and Temporal Information (SI and TI) on video quality and compressibility [8], delivered by Werner Robitza (AVEQ GmbH), followed by a fruitful discussion on compression complexity and on the SI/TI clarification activity launched at the last VQEG plenary meeting. In addition, Mikołaj Leszczuk (AGH University of Science and Technology) gave a presentation on content type indicators for technologies supporting video sequence summarization. Finally, Ioannis Katsavounidis (Facebook) led a discussion on the inclusion of video quality metrics as metadata in compressed streams, reporting on the progress of this activity, which was started at the last meeting.

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working on the development of a generally applicable no-reference hybrid perceptual/bitstream model. To this end, Enrico Masala and Lohic Fotio Tiotsop (Politecnico di Torino) presented the progress on designing a neural-network approach to model single observers using existing subjectively annotated image and video datasets [9] (the design of subjective tests tailored to training this approach is envisioned as future work). In addition, the group is working in collaboration with the Sky Group on the “Hodor Project”, which aims to develop a measure that can automatically identify video sequences for which quality metrics are likely to deliver inaccurate Mean Opinion Score (MOS) estimates.
Apart from these joint activities, Dr. Yendo Hu (Carnation Communications Inc. and Jimei University) delivered a presentation proposing to work on a benchmarking standard to bring quality, bandwidth, and latency into a common measurement domain.

Quality Assessment for Computer Vision Applications (QACoViA)

In addition to a progress report, the QACoViA group scheduled two presentations: one on enhancing artificial intelligence resilience to image coding artifacts through expert training (by Alban Marie from INSA Rennes), and one on providing datasets to train no-reference metrics for computer vision applications (by Carolina Whitaker from NTIA/ITS).

5G Key Performance Indicators (5GKPI)

The 5GKPI session consisted of a presentation by Pablo Pérez (Nokia Bell Labs) on the progress achieved by the group since the last plenary meeting in the following efforts: 1) the contribution to ITU-T Study Group 12 Question 13 through the Technical Report on QoE in 5G video services (GSTR-5GQoE), which addresses QoE requirements and factors for use cases such as Tele-operated Driving (ToD), wireless content production, mixed reality offloading and first responder networks; 2) a high-level contribution to the 5G Automotive Association (5GAA) on general QoE requirements for remote driving, with subjective tests on ToD video quality planned for the near future; and 3) the long-term plan to work on a methodology for creating simple opinion models that estimate average QoE for a given network and use case.

Immersive Media Group (IMG)

Several presentations were delivered during the IMG session, divided into two blocks: one covering technologies and studies related to the evaluation of immersive communication systems from a task-based or interactive perspective, and another covering other topics related to the QoE assessment of immersive media.
The first set of presentations relates to a new proposal for a joint work within IMG connected to the ITU-T work item P.QXM on QoE assessment of eXtended Reality meetings. Irene Viola (CWI) presented an overview of this work item. In addition, Carlos Cortés (Universidad Politécnica de Madrid) presented his work on evaluating the impact of delay on QoE in immersive interactive environments, Irene Viola (CWI) presented a dataset of dynamic point cloud humans for immersive telecommunications, Pablo César (CWI) presented their pipeline for social virtual reality [10], and Narciso García (Universidad Politécnica de Madrid) presented their real-time free-viewpoint video system (FVVLive) [11]. After these presentations, Jesús Gutiérrez (Universidad Politécnica de Madrid) led the discussion on joint next steps within IMG. In addition to identifying parties interested in joining the effort to study the evaluation of immersive communication systems, this discussion covered further analyses of the subjective tests carried out with short 360-degree videos [12] and the studies carried out to assess quality and other factors (e.g., presence) with long omnidirectional sequences. In this regard, Marta Orduna (Universidad Politécnica de Madrid) presented her subjective study to validate a methodology to assess quality, presence, empathy, attitude, and attention in Social VR [13]. Future progress on these joint activities will be discussed in the group audio calls.
Within the other block of presentations on immersive media topics, Maria Martini (Kingston University), Chulhee Lee (Yonsei University), and Patrick Le Callet (Université de Nantes) presented the status of IEEE standardization on QoE for immersive experiences (IEEE P3333.1.4 on light field, and IEEE P3333.1.3 on deep learning-based quality assessment), Kjell Brunnström (RISE) presented their work on legibility and readability in augmented reality [14], Abdallah El Ali (CWI) presented his work on investigating the relationship between momentary emotion self-reports and head and eye movements in HMD-based 360° videos [15], Elijs Dima (Mid Sweden University) presented his study on quality of experience in augmented telepresence considering the effects of viewing positions and depth-aiding augmentation [16], Silvia Rossi (UCL) presented her work towards behavioural analysis of 6-DoF users when consuming immersive media [17], and Yana Nehme (INSA Lyon) presented a study exploring crowdsourcing for subjective quality assessment of 3D graphics.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

During the IRG-AVQA session, an overview of the progress and recent work within ITU-R SG6 and ITU-T SG12 was provided. In particular, Chulhee Lee (Yonsei University), in collaboration with other ITU rapporteurs, presented the progress of ITU-R WP6C on recommendations for HDR content, as well as the work items within ITU-T SG12: Question 9 on audio-related work items, Question 13 on gaming and immersive technologies (e.g., augmented/extended reality) among others, Question 14 on recommendations and work items related to the development of video quality models, and Question 19 on work items related to television and multimedia. In addition, the progress of the group “Implementers Guide for Video Quality Metrics (IGVQM)”, launched at the last plenary meeting by Ioannis Katsavounidis (Facebook), was discussed, addressing specific points to advance the collection of video quality models and datasets to be used to develop an implementer’s guide for objective video quality metrics for coding applications.

Other updates

The next VQEG plenary meeting will take place online in December 2021.

In addition, VQEG is investigating the possibility of disseminating the videos of all the talks from these plenary meetings via platforms such as YouTube and Facebook.

Finally, given that some modifications are being made to the public FTP of VQEG, if the links to the presentations included in this column do not open in the browser, the reader can download all the presentations in one compressed file.

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, and R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[2] R.R.R. Rao, S. Göring, and A. Raake, “Towards High Resolution Video Quality Assessment in the Crowd”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[3] L. Lévêque, M. Outtas, H. Liu, and L. Zhang, “Comparative study of the methodologies used for subjective medical image quality assessment”, Physics in Medicine & Biology, Jul. 2021 (Accepted).
[4] J. Nawala, L. Janowski, B. Cmiel, and K. Rusek, “Describing Subjective Experiment Consistency by p-Value P–P Plot”, ACM International Conference on Multimedia (ACM MM), Oct. 2020.
[5] Z. Li, C. G. Bampis, L. Krasula, L. Janowski, and I. Katsavounidis, “A Simple Model for Subject Behavior in Subjective Experiments”, arXiv:2004.02067v3, May 2021.
[6] P. Perez, L. Janowski, N. Garcia, and M. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)”, arXiv:2104.02618, Apr. 2021.
[7] N. Barman and M. G. Martini, “User Generated HDR Gaming Video Streaming: Dataset, Codec Comparison and Challenges”, IEEE Transactions on Circuits and Systems for Video Technology, May 2021.
[8] W. Robitza, R.R.R. Rao, S. Göring, and A. Raake, “Impact of Spatial and Temporal Information on Video Quality and Compressibility”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[9] L. Fotio Tiotsop, T. Mizdos, M. Uhrina, M. Barkowsky, P. Pocta, and E. Masala, “Modeling and estimating the subjects’ diversity of opinions in video quality assessment: a neural network based approach”, Multimedia Tools and Applications, vol. 80, pp. 3469–3487, Sep. 2020.
[10] J. Jansen, S. Subramanyam, R. Bouqueau, G. Cernigliaro, M. Martos Cabré, F. Pérez, and P. Cesar, “A Pipeline for Multiparty Volumetric Video Conferencing: Transmission of Point Clouds over Low Latency DASH”, ACM Multimedia Systems Conference (MMSys), May 2020.
[11] P. Carballeira, C. Carmona, C. Díaz, D. Berjón, D. Corregidor, J. Cabrera, F. Morán, C. Doblado, S. Arnaldo, M.M. Martín, and N. García, “FVV Live: A real-time free-viewpoint video system with consumer electronics hardware”, IEEE Transactions on Multimedia, May 2021.
[12] J. Gutiérrez, P. Pérez, M. Orduna, A. Singla, C. Cortés, P. Mazumdar, I. Viola, K. Brunnström, F. Battisti, N. Cieplińska, D. Juszka, L. Janowski, M. Leszczuk, A. Adeyemi-Ejeye, Y. Hu, Z. Chen, G. Van Wallendael, P. Lambert, C. Díaz, J. Hedlund, O. Hamsis, S. Fremerey, F. Hofmeyer, A. Raake, P. César, M. Carli, N. García, “Subjective evaluation of visual quality and simulator sickness of short 360° videos: ITU-T Rec. P.919”, IEEE Transactions on Multimedia, Jul. 2021 (Early Access).
[13] M. Orduna, P. Pérez, J. Gutiérrez, and N. García, “Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in Social VR: International Experiences Use Case”, arXiv:2103.02550, 2021.
[14] J. Falk, S. Eksvärd, B. Schenkman, B. Andrén, and K. Brunnström, “Legibility and readability in Augmented Reality”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[15] T. Xue, A. El Ali, G. Ding, and P. Cesar, “Investigating the Relationship between Momentary Emotion Self-reports and Head and Eye Movements in HMD-based 360° VR Video Watching”, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, May 2021.
[16] E. Dima, K. Brunnström, M. Sjöström, M. Andersson, J. Edlund, M. Johanson, and T. Qureshi, “Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence”, Quality and User Experience, vol. 5, Feb. 2020.
[17] S. Rossi, I. Viola, J. Jansen, S. Subramanyam, L. Toni, and P. Cesar, “Influence of Narrative Elements on User Behaviour in Photorealistic Social VR”, International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE), Sep. 28, 2021.

VQEG Column: New topics

Introduction

Welcome to the fourth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
During the last VQEG plenary meeting (14-18 Dec. 2020), various interesting discussions arose regarding new topics not previously addressed by VQEG groups, which led to the launch of three new sub-projects and a new project related to: 1) clarifying the computation of spatial and temporal information (SI and TI), 2) including video quality metrics as metadata in compressed bitstreams, 3) Quality of Experience (QoE) metrics for live video streaming applications, and 4) providing guidelines on implementing objective video quality metrics to the video compression community.
The following sections provide more details about these new activities and encourage interested readers to follow and get involved in any of them by subscribing to the corresponding reflectors.

SI and TI Clarification

The VQEG No-Reference Metrics (NORM) group has recently focused on the topic of spatio-temporal complexity, revisiting the Spatial Information and Temporal Information (SI/TI) indicators, which are described in ITU-T Rec. P.910 [1]. They were originally developed for the T1A1 dataset in 1994 [2]. The metrics have been widely used over the last 25 years, mostly to check the complexity of video sources in datasets. However, the SI/TI definitions contain ambiguities, so the goal of this sub-project is to provide revised definitions that eliminate implementation inconsistencies.

Three main topics are being discussed by VQEG in a series of online meetings:

  • Comparison of existing publicly available implementations of SI/TI: several public open-source implementations of SI/TI were compared, based on initial feedback from members of Facebook. Bugs and inconsistencies were identified in the handling of video frame borders, the treatment of limited vs. full range content, and the reporting of TI values for the first frame. The lack of standardized test vectors was also raised as an issue. As a consequence, members of TU Ilmenau developed a new reference library in Python, incorporating all previously identified bug fixes and introducing a new test suite, to which the public is invited to contribute material. VQEG is now actively looking for specific test sequences that will be useful both for validating existing SI/TI implementations and for extending the scope of the metrics, which relates to the next issue described below.
  • Study on how to apply SI/TI to different content formats: the description of SI/TI was found unsuitable for extended applications such as video with a higher bit depth (> 8 bit), HDR content, or spherical/3D video. The question was also raised of how to deal with scene changes in content. The community concluded that for content with higher bit depth, SI/TI functions should be calculated as specified, but that the output values could be mapped back to the original 8-bit range to simplify comparisons. For HDR, no conclusion was reached, given the inherent complexity of the subject. It was also preliminarily concluded that the treatment of scene changes should not be part of an SI/TI recommendation, which should instead focus on calculating SI/TI for short sequences without scene changes, since the way scene changes are handled may depend on the final application of the metrics.
  • Discussion on other relevant uses of SI/TI: SI/TI have been widely used for checking video datasets in terms of diversity and for classifying content. SI/TI have also been used as content features in some no-reference metrics. The question was raised whether SI/TI could be used to predict how well content can be encoded. The group noted that different encoders deal with sources differently, e.g., with respect to noise in the video. Participants noted that it would be desirable to find a metric purely related to content and unaffected by encoding or representation.
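For reference, ITU-T Rec. P.910 defines SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI as the maximum over time of the spatial standard deviation of the difference between consecutive luma frames. The sketch below makes explicit the implementation choices discussed above, under illustrative assumptions rather than the revised definitions being drafted: the Sobel magnitude is computed on interior pixels only (so no border padding is involved), TI is reported only from the second frame onwards, inputs are 2-D luma arrays whose range (limited vs. full) is left untouched, and results for higher bit depths are divided by 2^(bit depth - 8) to map them back to the 8-bit range. It is not the TU Ilmenau reference library, and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

import numpy as np


def sobel_magnitude(y: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude on interior pixels only (output is 2 px smaller per axis)."""
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)


def si_ti(frames: Iterable[np.ndarray], bit_depth: int = 8) -> Tuple[float, Optional[float]]:
    """Return (SI, TI) for a sequence of 2-D luma frames without scene changes.

    TI is None when fewer than two frames are given (no frame difference exists).
    """
    scale = 2.0 ** (bit_depth - 8)  # assumed mapping of >8-bit results to the 8-bit range
    si_values, ti_values = [], []
    prev = None
    for frame in frames:
        y = np.asarray(frame, dtype=np.float64)
        si_values.append(sobel_magnitude(y).std())  # population std (ddof=0), an assumption
        if prev is not None:
            ti_values.append((y - prev).std())
        prev = y
    if not si_values:
        raise ValueError("at least one frame is required")
    si = max(si_values) / scale
    ti = max(ti_values) / scale if ti_values else None
    return si, ti
```

Because Sobel filtering and frame differencing are linear, dividing the final values by 2^(bit depth - 8) gives the same result as rescaling the frames first, which is why the values can be computed as specified and mapped back afterwards.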

As a first step, this revision of SI/TI has resulted in a harmonized implementation and in the identification of future application areas. Discussions on these topics will continue over the coming months through audio calls that are open to interested readers.

Video Quality Metadata Standard

Also within the NORM group, another topic was launched related to the inclusion of video quality metadata in compressed streams [3].

Almost all modern transcoding pipelines use full-reference video quality metrics to decide on the most appropriate encoding settings. The computation of these quality metrics is demanding in terms of time and computational resources. In addition, estimation errors propagate and accumulate when quality metrics are recomputed several times along the transcoding pipeline. Thus, retaining the results of these metrics with the video can alleviate these constraints, requiring very little space and providing a “greener” way of estimating video quality. With this goal, the new sub-project has started working towards the definition of a standard format to include video quality metrics metadata both at video bitstream level and system layer [4].

In this sense, the experts involved in the new sub-project are working on the following items:

  • Identification of existing proposals and working groups within other standardisation bodies and organisations that address similar topics, and proposal of amendments including new requirements. For example, MPEG has already worked on adding video quality metrics metadata (e.g., PSNR, SSIM, MS-SSIM, VQM, PEVQ, MOS, FSIG) at system level (e.g., in MPEG-2 streams [5], HTTP [6], etc. [7]).
  • Identification of quality metrics to be considered in the standard. In principle, validated and standardized metrics are of interest, although other metrics can also be considered after a validation process on a standard set of subjective data (e.g., using existing datasets). Metrics beyond those used in previous approaches are of special interest (e.g., VMAF [8], FB-MOS [9]).
  • Consideration of computing multiple generations of full-reference metrics at different steps of the transcoding chain, of using metrics at different resolutions, of different spatio-temporal aggregation methods, etc.
  • Definition of a standard video quality metadata payload, including relevant fields such as metric name (e.g., “SSIM”), version (e.g., “v0.6.1”), raw score (e.g., “0.9256”), mapped-to-MOS score (e.g., “3.89”), scaling method (e.g., “Lanczos-5”), temporal reference (e.g., frames “0-3”), aggregation method (e.g., “arithmetic mean”), etc. [4]
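To make the list of payload fields concrete, the sketch below models one metadata record as a Python dataclass serialized to JSON. The class name, field names, types and the JSON encoding are all assumptions for illustration only; the actual syntax (whether at bitstream level or at system layer) is precisely what the sub-project is working to define.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class QualityMetadata:
    """Hypothetical record mirroring the payload fields listed above."""

    metric_name: str            # e.g. "SSIM"
    metric_version: str         # e.g. "v0.6.1"
    raw_score: float            # e.g. 0.9256
    mos_score: Optional[float]  # raw score mapped to MOS, e.g. 3.89 (None if unmapped)
    scaling_method: str         # e.g. "Lanczos-5"
    first_frame: int            # temporal reference: first frame covered
    last_frame: int             # temporal reference: last frame covered (inclusive)
    aggregation: str            # e.g. "arithmetic mean"

    def __post_init__(self) -> None:
        if self.first_frame > self.last_frame:
            raise ValueError("first_frame must not exceed last_frame")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "QualityMetadata":
        return cls(**json.loads(text))


record = QualityMetadata("SSIM", "v0.6.1", 0.9256, 3.89, "Lanczos-5", 0, 3, "arithmetic mean")
print(record.to_json())
```

Carrying the metric version and scaling method alongside the score is what lets a later stage of the pipeline decide whether a stored value is comparable to one it would compute itself, instead of recomputing the metric.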

More details and information on how to join this activity can be found in the NORM webpage.

QoE metrics for live video streaming applications

The VQEG Audiovisual HD Quality (AVHD) group launched a new sub-project on QoE metrics for live media streaming applications (Live QoE) at the last VQEG meeting [10].

The success of a live multimedia streaming session is defined by the experience of the participating audience. Both the content communicated by the media and the quality at which it is delivered matter; for the same content, the quality delivered to the viewer is a differentiating factor. Live media streaming systems require substantial investment and operate under very tight service availability and latency constraints to support multimedia sessions for their audience. Both to measure the return on investment and to make sound investment decisions, it is paramount to be able to measure the media quality offered by these systems. Given the large scale and complexity of media streaming systems, objective metrics are needed to measure QoE.

Therefore, the following topics have been identified and are being studied [11]:

  • Creation of a high-quality dataset, including media clips and subjective scores, which will be used to tune, train and develop objective QoE metrics. This dataset should represent the conditions that occur in typical live media streaming situations; therefore, conditions and impairments affecting audio and video tracks (independently and jointly) will be considered. In addition, the dataset should cover a diverse set of content categories, including premium content (e.g., sports, movies, concerts, etc.) and user-generated content (e.g., music, gaming, real-life content, etc.).
  • Development of objective QoE metrics, especially no-reference or near-no-reference metrics, given the lack of access to the original video at various points in the live media streaming chain. Different types of models will be considered, including signal-based (operating on the decoded signal), metadata-based (operating on available metadata, e.g., codec, resolution, frame rate, bitrate, etc.), bitstream-based (operating on the parsed bitstream), and hybrid models (combining signal and metadata) [12]. Machine-learning-based models will also be explored.

Certain challenges are expected when dealing with these two topics, such as separating “content” from “quality” (taking into account that content plays a big role in engagement and acceptability), spectrum expectations, the role of network impairments, and the collection of enough data to develop robust models [11]. Readers interested in joining this effort are encouraged to visit the AVHD webpage for more details.

Implementer’s Guide to Video Quality Metrics

At the last meeting, a new dedicated group on the Implementer’s Guide to Video Quality Metrics (IGVQM) was set up to introduce objective video quality metrics to the video compression community and to provide guidelines on implementing them.

During the development of new video coding standards, peak signal-to-noise ratio (PSNR) has traditionally been used as the main objective metric to determine which new coding tools should be adopted. It has furthermore been used to establish the bitrate savings that a new coding standard offers over its predecessor through the so-called “BD-rate” metric [13], which still relies on PSNR for measuring quality.
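As an illustration of how BD-rate ties the comparison to PSNR, the sketch below follows the cubic-fit procedure usually attributed to [13]: for each codec, the logarithm of the bitrate is fitted as a third-order polynomial of PSNR, both fits are integrated over the overlapping PSNR range, and the average log-rate difference is converted into a percentage. It assumes at least four rate/PSNR points per codec, does not reproduce later refinements (such as piecewise cubic interpolation), and the function name `bd_rate` is hypothetical. Nothing in the procedure itself is specific to PSNR, which is why other quality metrics can be substituted for it.

```python
import numpy as np


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of `test` relative to `ref` at equal PSNR.

    Negative values mean the test codec needs fewer bits for the same PSNR.
    """
    if min(len(rate_ref), len(rate_test)) < 4:
        raise ValueError("a cubic fit needs at least four RD points per codec")
    # Fit log10(rate) as a cubic polynomial of PSNR for each codec.
    fit_ref = np.polyfit(psnr_ref, np.log10(rate_ref), 3)
    fit_test = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate only where both curves are defined (overlapping PSNR interval).
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    int_ref, int_test = np.polyint(fit_ref), np.polyint(fit_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0
```

For example, if a test codec reaches every PSNR value of the reference curve with 10% fewer bits, the function returns approximately -10.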

Although this choice was fully justified for the first image/video coding standards – JPEG (1992), MPEG1 (1994), MPEG2 (1996), JPEG2000 and even H.264/AVC (2004) – since there was simply no other alternative at that time, its continuing use for the development of H.265/HEVC (2013), VP9 (2013), AV1 (2018) and most recently EVC and VVC (2020) is questionable, given the rapid and continuous evolution of more perceptually relevant objective image/video quality metrics, such as SSIM (2004) [14], MS-SSIM (2004) [15], and VMAF (2015) [8].

This project attempts to offer guidance to the video coding community, including standards-setting organisations, on how to make better use of existing objective video quality metrics so as to better capture the improvements offered by video coding tools. To this end, the following goals have been envisioned:

  • Address video compression and scaling impairments only.
  • Explore and use “state-of-the-art” full-reference (pixel) objective metrics, examine applicability of no-reference objective metrics, and obtain reference implementations of them.
  • Offer methods for temporally aggregating image quality metrics into video quality metrics.
  • Present statistical analysis of existing subjective datasets, constraining them to compression and scaling artifacts.
  • Highlight differences among objective metrics and use-cases. For example, in case of very small differences, which metric is more sensitive? Which quality range is better served by what metric?
  • Offer standard logistic mappings of objective metrics to a normalised linear scale.
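Two of these goals, temporal aggregation and logistic mapping, can be sketched in a few lines. The pooling options below (arithmetic mean, harmonic mean, 5th percentile) are common choices assumed here for illustration, not the methods the guide will recommend. The four-parameter logistic function is a form often used to map objective scores onto a subjective scale; its parameters b1 to b4 would normally be fitted to subjective data rather than chosen by hand as in the example, and the function names are hypothetical.

```python
import numpy as np


def pool_frames(frame_scores, method="mean"):
    """Aggregate per-frame quality scores into one per-video score."""
    s = np.asarray(frame_scores, dtype=float)
    if method == "mean":
        return float(s.mean())
    if method == "harmonic":
        # Penalises low-quality frames more than the arithmetic mean; needs s > 0.
        return float(len(s) / np.sum(1.0 / s))
    if method == "p5":
        # 5th percentile (linear interpolation): focuses on the worst frames.
        return float(np.percentile(s, 5))
    raise ValueError(f"unknown pooling method: {method}")


def logistic_map(x, b1, b2, b3, b4):
    """Four-parameter logistic: b2 is the lower asymptote, b1 the upper asymptote,
    b3 the midpoint and b4 (> 0) the slope scale."""
    return b2 + (b1 - b2) / (1.0 + np.exp(-(np.asarray(x, dtype=float) - b3) / b4))


per_frame_scores = [92.0, 90.5, 60.0, 91.0]  # illustrative values only
video_score = pool_frames(per_frame_scores, "harmonic")
mos_estimate = logistic_map(video_score, 5.0, 1.0, 70.0, 8.0)  # hypothetical parameters
```

The choice of pooling matters most when quality varies over time: a single badly degraded frame pulls the harmonic mean or a low percentile down much more than the arithmetic mean.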

More details can be found in the working document that has been set up to launch the project [16] and on the VQEG website.

References

[1] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[2] M. H. Pinson and A. Webster, “T1A1 Validation Test Database,” VQEG eLetter, vol. 1, no. 2, 2015.
[3] I. Katsavounidis, “Video quality metadata in compressed bitstreams”, Presentation in VQEG Meeting, Dec. 2020.
[4] I. Katsavounidis et al., “A case for embedding video quality metrics as metadata in compressed bitstreams”, working document, 2019.
[5] ISO/IEC 13818-1:2015/AMD 6:2016 Carriage of Quality Metadata in MPEG2 Streams.
[6] ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH).
[7] ISO/IEC 23001-10, MPEG Systems Technologies – Part 10: Carriage of timed metadata metrics of media in ISO base media file format.
[8] Toward a practical perceptual video quality metric, Tech blog with VMAF’s open sourcing on Github, Jun. 6, 2016.
[9] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[10] R. Puri, “On a QoE metric for live media streaming applications”, Presentation in VQEG Meeting, Dec. 2020.
[11] R. Puri and S. Satti, “On a QoE metric for live media streaming applications”, working document, Jan. 2021.
[12] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, and R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, Oct. 2020.
[13] G. Bjøntegaard, “Calculation of Average PSNR Differences Between RD-Curves”, Document VCEG-M33, ITU-T SG 16/Q6, 13th VCEG Meeting, Austin, TX, USA, Apr. 2001.
[14] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” in IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
[15] Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, Pacific Grove, CA, USA, 2003.
[16] I. Katsavounidis, “VQEG’s Implementer’s Guide to Video Quality Metrics (IGVQM) project , working document, 2021.

VQEG Column: VQEG Meeting Dec. 2020 (virtual/online)

Introduction

Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 14 to 18 December 2020. Given the circumstances, it was organized fully online for the second time, with sessions spread over five to six hours each day to allow remote participation from different time zones. About 130 participants from 24 countries registered for the meeting and could attend the presentations and discussions that took place in all working groups.
This column provides an overview of the meeting; all the information, minutes, files (including the presented slides), and video recordings are available online on the VQEG meeting website. Highlights of interest for the SIGMM community include, apart from several presentations of state-of-the-art work, relevant contributions to ITU recommendations on multimedia quality assessment reported by various groups (e.g., on adaptive bitrate streaming services, subjective quality assessment of 360-degree videos, statistical analysis of quality assessments, and gaming applications), the launch of the new group on quality assessment for health applications, a session on 5G use cases, and a workshop dedicated to user testing during Covid-19. In addition, new efforts were launched on research into quality metrics for live media streaming applications and on providing the video compression community with guidelines for implementing objective video quality metrics beyond PSNR.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD/P.NATS2 project was a joint collaboration between VQEG and ITU SG12 whose goal was to develop a range of objective models, varying in complexity, type of input, and use case, for the assessment of video quality in adaptive bitrate streaming services over reliable transport up to 4K. The report of this project, which finished in January 2020, was approved at this meeting. In summary, the project produced 10 model categories, with models trained and validated on 26 subjective datasets. This activity resulted in four ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during the effort, and a journal publication reporting details on the validation tests [5]. In this context, Alexander Raake (TU Ilmenau) presented details on the P.NATS Phase 2 project and the resulting ITU recommendations, while Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson) presented the processing chain used in the project.
In addition to this activity, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on the research on QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project within AVHD group on this topic.

Psycho-Physiological Quality Assessment (PsyPhyQA)

The chairs of the PsyPhyQA group provided an update on the group's activities. A test plan for psychophysiological video quality assessment has been established, and the group is currently developing ideas for running quality assessment tests with psychophysiological measures during a pandemic, as well as collecting and discussing ideas for possible joint work. The group is also studying physiological correlates of simulator sickness; in this regard, J.P. Tauscher (Technische Universität Braunschweig) delivered a presentation exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on measuring the visual fidelity of tone mapping operators from gaze data using hidden Markov models (HMMs) [7].

Quality Assessment for Health applications (QAH)

This was the first meeting of the new QAH group. The chairs reported on the first audio call, held in November to launch the project, gauge how many people are interested in it, and learn what each member has already done on medical images and what each member wants to do in this joint project.
The plenary meeting served to collect ideas about possible joint works and to share experiences on related studies. In this sense, Lucie Lévêque (Université Gustave Eiffel) presented a review on subjective assessment of the perceived quality of medical images and videos, Maria Martini (Kingston University London) talked about the suitability of VMAF for quality assessment of medical videos (ultrasound & wireless capsule endoscopy), and Jorge Caviedes (ASU) delivered a presentation on cognition inspired diagnostic image quality models.

Statistical Analysis Methods (SAM)

The update report from the SAM group presented the ongoing progress on new methods for data analysis, including the discussion with ITU-T (P.913 [9]) and ITU-R (BT.500 [8]) about including a new method in the recommendations.
Several interesting presentations related to the ongoing work within SAM were delivered. For instance, Jakub Nawala (AGH University) presented “su-JSON”, a uniform JSON-based format for subjective data, as well as his work on describing subjective experiment consistency with p-value p–p plots. Lucjan Janowski (AGH University) raised an interesting discussion on how to define the quality of a single sequence, analyzing different perspectives (e.g., crowd, experts, psychology). Babak Naderi (TU Berlin) presented an analysis of the relation between Mean Opinion Score (MOS) and rank-based statistics. Recent advances in Netflix's quality metric VMAF were presented by Zhi Li (Netflix), especially regarding the properties of VMAF in the presence of image enhancement. Finally, two more presentations addressed progress on the statistical analysis of quality assessment data: one by Margaret Pinson (NTIA/ITS) on the computation of confidence intervals, and one by Suiyi Ling (Université de Nantes) on a probabilistic model to recover the ground truth and the annotators’ behavior.
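To make the confidence-interval topic concrete, the sketch below computes a MOS and a two-sided 95% confidence interval for one stimulus using the Student t distribution, a common textbook approach for ACR-style ratings. It does not reproduce the method presented at the meeting; the helper name `mos_with_ci` and the ratings are hypothetical, and the t critical values are hard-coded from standard tables to keep the example dependency-free.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 1..30 degrees of freedom
# (standard table values). Beyond 30 we fall back to the normal value 1.96,
# which is a slight approximation.
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042]


def mos_with_ci(ratings):
    """Return (MOS, lower, upper) for the opinion scores given to one stimulus."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    mos = statistics.mean(ratings)
    s = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    df = n - 1
    t = T_975[df - 1] if df <= len(T_975) else 1.96
    half_width = t * s / math.sqrt(n)
    return mos, mos - half_width, mos + half_width


# Hypothetical ACR ratings (1 = Bad ... 5 = Excellent) from 24 subjects.
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4, 3, 4, 5, 4, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4]
mos, lo, hi = mos_with_ci(ratings)
print(f"MOS = {mos:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```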

Computer Generated Imagery (CGI)

The report from the chairs of the CGI group covered the progress on the research on assessment methodologies for quality assessment of gaming services (e.g., ITU-T P.809 [10]), on crowdsourcing quality assessment for gaming application (P.808 [11]), on quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and on models (signal-, bitstream-, and parametric-based models) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.).
In terms of planned activities, the group is targeting the generation of new gaming datasets and of tools for metrics to assess gaming QoE, and it also aims to identify other topics of interest in CGI beyond gaming content.
In addition, Saman Zadtootaghaj (TU Berlin) presented updates on gaming standardization activities and on deep learning models for gaming quality prediction, Suiyi Ling (Université de Nantes) presented a subjective study on multi-dimensional aesthetic assessment of mobile game images, and Maria Martini (Kingston University London) addressed the quality assessment of gaming videos compressed with AV1, all leading to interesting discussions on those topics.

No Reference Metrics (NORM)

The session of the NORM group included a presentation by Cosmin Stejerean (Facebook) on the differences among existing implementations of the spatial and temporal perceptual information indices (SI and TI, as defined in ITU-T P.910 [13]). This led to an open discussion and to the agreement to launch an effort to clarify the ambiguous details that have led to different implementations (and different results), to generate test vectors for the reference and validation of implementations, and to address the computation of these indicators for HDR content. In addition, Margaret Pinson (NTIA/ITS) presented the paradigm of no-reference metric research, analyzing design problems and presenting a framework for the collaborative development of no-reference metrics for image and video quality. Finally, Ioannis Katsavounidis (Facebook) delivered a talk on adding video quality metadata to compressed bitstreams. Further discussions on these topics are planned within the group next month.
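Part of the ambiguity discussed in this session shows up even in a minimal implementation. The sketch below follows the general recipe of ITU-T P.910 [13]: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive luminance frames. Its choices for border handling (only interior pixels are filtered) and for the standard deviation (population rather than sample) are assumptions, exactly the kind of detail that differs between implementations. Frames are plain lists of rows of luma values, and the function names are hypothetical.

```python
import math
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitude of a luma frame (list of rows), interior pixels only.

    Skipping the one-pixel border is an assumption; other implementations pad
    or crop differently, which changes the resulting SI value.
    """
    h, w = len(frame), len(frame[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    mags = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = (frame[i - 1][j + 1] + 2 * frame[i][j + 1] + frame[i + 1][j + 1]
                  - frame[i - 1][j - 1] - 2 * frame[i][j - 1] - frame[i + 1][j - 1])
            gy = (frame[i + 1][j - 1] + 2 * frame[i + 1][j] + frame[i + 1][j + 1]
                  - frame[i - 1][j - 1] - 2 * frame[i - 1][j] - frame[i - 1][j + 1])
            mags.append(math.hypot(gx, gy))
    return mags


def spatial_information(frames):
    """SI: maximum over frames of the spatial std of the Sobel magnitude."""
    # Population std (pstdev) is an assumption; a sample std gives slightly larger values.
    return max(statistics.pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over consecutive frame pairs of the spatial std of the pixel difference."""
    if len(frames) < 2:
        raise ValueError("TI needs at least two frames")
    stds = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)]
        stds.append(statistics.pstdev(diff))
    return max(stds)


# Toy 4x5 luma frames: a flat grey frame and a frame with one vertical edge.
flat = [[128] * 5 for _ in range(4)]
edge = [[0, 0, 0, 255, 255] for _ in range(4)]
print("SI:", spatial_information([flat, edge]))
print("TI:", temporal_information([flat, edge]))
```

Changing either assumption (for instance, padding the border or using `statistics.stdev`) changes the numbers, which is why shared test vectors are useful for validating implementations.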

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working in collaboration with Sky Group on determining when video quality metrics are likely to predict the MOS inaccurately, and on modelling single observers’ quality perception based on artificial intelligence techniques. In this context, Lohic Fotio (Politecnico di Torino) presented his work on artificial intelligence-based observers for media quality assessment. Together with Florence Agboma (Sky UK), he also presented work comparing commercial and open-source video quality metrics for HD constant bitrate videos. Finally, Dariusz Grabowski (AGH University) presented his work on comparing full-reference video quality metrics using cluster analysis.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group announced Lu Zhang (INSA Rennes) as its new third co-chair, who will also work in the near future on a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently finished project on an objective video quality assessment method for recognition tasks, carried out in collaboration with Huawei through its Innovation Research Programme.

5G Key Performance Indicators (5GKPI)

The 5GKPI session was aimed at identifying possible interested partners and joint work (e.g., contributions to ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). To this end, it included four presentations of use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote controlled use cases by Kjell Brunnström (RISE).

Immersive Media Group (IMG)

A report on the updates within the IMG group was presented first, mainly covering the current joint work on the subjective quality assessment of 360-degree video. In particular, a cross-lab test involving 10 different labs was carried out at the beginning of 2020, producing relevant outcomes, including various contributions to ITU SG12/Q13 and to the MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T Recommendation P.919 [15] on subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] and ITU-T P.910 [13]), which was supported by the results of these cross-lab tests, was approved in mid-October.
Furthermore, since these tests have finished, Pablo Pérez (Nokia Bell-Labs) gave a presentation on possible future joint activities within IMG, which led to an open discussion that will continue in future audio calls.
In addition, a total of four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of the TU Ilmenau on immersive media topics, and a presentation of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image by Ali Ak and Patrick Le Callet (Université de Nantes) [16]. Also, there were two presentations related to 3D graphical contents, one addressing the perceptual characterization of 3D graphical contents based on visual attention patterns by Mona Abid (Université de Nantes), and another one comparing subjective methods for quality assessment of 3D graphics in virtual reality by Yana Nehmé (INSA Lyon). 

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview of the progress and recent work within ITU-R WP6C on HDR-related topics and within ITU-T SG12 Questions 9, 13, 14, and 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item was announced within ITU-T SG9: End-to-end network characteristics requirements for video services (J.pcnp-char [17]).
From the discussions raised during this session, a new dedicated group was set up to introduce objective video quality metrics beyond PSNR to the video compression community and to provide guidelines on implementing them. The group was named “Implementers Guide for Video Quality Metrics (IGVQM)” and will be chaired by Ioannis Katsavounidis (Facebook), with the involvement of several VQEG members.
After the IRG-AVQA session, the Q19 interim meeting took place, with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) on improvements to the subjective experiment data analysis process.

Other updates

Apart from the aforementioned groups, the Human Factors for Visual Experience (HFVE) group is still active, coordinating VQEG activities in liaison with the IEEE Standards Association working groups on HFVE, especially on perceptual quality assessment of 3D, UHD and HD contents, quality of experience assessment for VR and MR, quality assessment of light-field imaging contents, and deep-learning-based assessment of visual experience based on human factors. In this regard, VQEG members are making ongoing contributions to IEEE standards.
In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation on precautions for lab experiments by Kjell Brunnström (RISE), another by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussion of the topic.

Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.

References

[1] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K, 2020.
[2] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, 2020.
[3] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information, 2020.
[4] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information, 2020.
[5] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[6] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[7] W. Ellahi, T. Vigier and P. Le Callet, “HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures, 2019.
[9] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2016.
[10] ITU-T Rec. P.809. Subjective evaluation methods for gaming quality, 2018.
[11] ITU-T Rec. P.808. Subjective evaluation of speech quality with a crowdsourcing approach, 2018.
[12] ITU-T Rec. G.1072. Opinion model predicting gaming quality of experience for cloud gaming services, 2020.
[13] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[14] ITU-T Rec. G.QoE-5G. QoE factors for new services in 5G networks, 2020 (under study).
[15] ITU-T Rec. P.919. Subjective test methodologies for 360º video on head-mounted displays, 2020.
[16] A. Ak, S. Ling and P. Le Callet, “No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[17] ITU-T Rec. J.pcnp-char. E2E Network Characteristics Requirement for Video Services, 2020 (under study).

VQEG Column: Recent contributions to ITU recommendations

Welcome to the second column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG plays a major role in research and the development of standards on video quality and this column presents examples of recent contributions to International Telecommunication Union (ITU) recommendations, as well as ongoing contributions to recommendations to come in the near future. In addition, the formation of a new group within VQEG addressing Quality Assessment for Health Applications (QAH) has been announced.  

VQEG website: www.vqeg.org
Authors: 
Jesús Gutiérrez (jesus.gutierrez@upm.es), Universidad Politécnica de Madrid (Spain)
Kjell Brunnström (kjell.brunnstrom@ri.se), RISE (Sweden) 
Thanks to Lucjan Janowski (AGH University of Science and Technology), Alexander Raake (TU Ilmenau) and Shahid Satti (Opticom) for their help and contributions.

Introduction

VQEG is an international and independent organisation that provides a forum for technical experts in perceptual video quality assessment from industry, academia, and standardization organisations. Although VQEG does not develop or publish standards, several activities (e.g., validation tests, multi-lab test campaigns, development of objective quality models, etc.) carried out by VQEG groups have been instrumental in the development of international recommendations and standards. VQEG contributions have mainly been submitted to the relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), but also to other standardization bodies, such as MPEG, ITU-R SG6, ATIS, IEEE P3333 and P1858, DVB, and ETSI.

In our first column on the ACM SIGMM Records, we provided a table summarizing the VQEG studies that have resulted in ITU Recommendations. In this new column, we describe in more detail the latest contributions to recent ITU standards and provide insight into the ongoing contributions that may result in ITU Recommendations in the near future.

ITU Recommendations with recent inputs from VQEG

ITU-T Rec. P.1204 standard series

A campaign within ITU-T Study Group (SG) 12 (Question 14), in collaboration with the VQEG AVHD group, resulted in the development of three new video quality model standards for the assessment of sequences of up to UHD/4K resolution. This campaign was carried out over more than two years under the project “AVHD-AS / P.NATS Phase 2”. While “P.NATS Phase 1” (finalized in 2016 and resulting in the standards series ITU-T Rec. P.1203, P.1203.1, P.1203.2 and P.1203.3) addressed the development of improved bitstream-based models for predicting the overall quality of long (1-5 minutes) video streaming sessions, the second phase addressed the development of short-term video quality models covering a wider scope with bitstream-based, pixel-based and hybrid models. The P.NATS Phase 2 project was executed as a competition between nine participating institutions in different tracks, resulting in the aforementioned three types of video quality models.

For the competition, a total of 26 databases were created, 13 used for training and 13 for validation and selection of the winning models. In order to establish the ground truth, subjective video quality tests were performed on four different display devices (PC-monitors, 55-75” TVs, mobile, and tablet) with at least 24 subjects each and using the 5-point Absolute Category Rating (ACR) scale. In total, about 5000 test sequences with a duration of around 8 seconds were evaluated, containing a variety of resolutions, encoding configurations, bitrates, and framerates using the codecs H.264/AVC, H.265/HEVC and VP9.   

More details about the whole workflow and results of the competition can be found in [1]. As a result of this competition, the new standard series ITU-T Rec. P.1204 [2] has recently been published, including a bitstream-based model (ITU-T Rec. P.1204.3 [3]), a pixel-based model (ITU-T Rec. P.1204.4 [4]) and a hybrid model (ITU-T Rec. P.1204.5 [5]).

ITU-T Rec. P.1401

ITU-T Rec. P.1401 [6] provides guidelines on the statistical analysis, evaluation, and reporting of quality measurements, and was revised in January 2020. Based on the article by Brunnström and Barkowsky [7], VQEG pointed out that this otherwise very useful Recommendation lacked a section on multiple comparisons and their potential impact on the performance evaluation of objective quality methods. In the latest revision, Section 7.6.5 covers this topic.
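The content of Section 7.6.5 is not reproduced here, but the underlying problem can be illustrated. When many pairs of objective models are compared, each at α = 0.05, the chance of at least one false "significant difference" grows quickly. The sketch below applies the Holm-Bonferroni step-down correction, one standard remedy (plain Bonferroni being a simpler, more conservative variant); whether and how P.1401 prescribes it should be checked against the Recommendation itself. The function name and p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Holm's step-down procedure: sort p-values in ascending order and compare the
    k-th smallest (k = 0, 1, ...) with alpha / (m - k); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from pairwise comparisons of four models (6 pairs).
pairs = ["A-B", "A-C", "A-D", "B-C", "B-D", "C-D"]
p = [0.001, 0.030, 0.004, 0.200, 0.012, 0.045]
for name, p_i, r in zip(pairs, p, holm_bonferroni(p)):
    print(name, p_i, "significant" if r else "not significant")
```

Without correction, five of these six hypothetical comparisons would pass α = 0.05; with Holm's procedure only three (A-B, A-D, B-D) remain significant.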

Ongoing VQEG Inputs to ITU Recommendations

ITU-T Rec. P.919

ITU has been working on a recommendation for subjective test methodologies for 360º video on Head-Mounted Displays (HMDs) under SG12 Question 13 (Q13). The Immersive Media Group (IMG) of VQEG has contributed to this effort by carrying out Phase 1 of the Test Plan for Quality Assessment of 360-degree Video. Phase 1 of this test plan addresses the assessment of short sequences (less than 30 seconds), in the spirit of ITU-R BT.500 [8] and ITU-T P.910 [9], considering the evaluation of both audiovisual quality and simulator sickness. Phase 2 of the test plan (envisioned for the near future) will cover the assessment of other factors that can be more influential with longer sequences (several minutes), such as immersiveness and presence.

Therefore, within Phase 1, the IMG designed and executed a cross-lab test with the participation of ten international laboratories: AGH University of Science and Technology (Poland), Centrum Wiskunde & Informatica (The Netherlands), Ghent University (Belgium), Nokia Bell-Labs (Spain), Roma TRE University (Italy), RISE Acreo (Sweden), TU Ilmenau (Germany), Universidad Politécnica de Madrid (Spain), University of Surrey (England), and Wuhan University (China).

This test aimed to assess and validate subjective evaluation methodologies for 360º video. Thus, the single-stimulus Absolute Category Rating (ACR) methodology and the double-stimulus Degradation Category Rating (DCR) methodology were considered to evaluate the audiovisual quality of 360º videos distorted with uniform and non-uniform degradations. In particular, different configurations of uniform and tile-based coding were applied to eight video sources with different spatial, temporal and exploration properties. Other influence factors were also studied, such as the sequence duration (from 10 to 30 s) and the test setup (considering different HMDs and methods to collect the observers’ ratings, using audio or not, etc.). Finally, in addition to the evaluation of audiovisual quality, the assessment of simulator sickness symptoms was addressed by studying the use of different questionnaires. As a result of this work, the IMG of VQEG presented two contributions to ITU-T Rec. P.919 (ex P.360-VR), which was consented at the last SG12 meeting (7-11 September 2020) and is expected to be published soon. In addition, the results and the annotated dataset from the cross-lab test will be published soon.

ITU-T Rec. P.913

Another upcoming contribution is being prepared by the Statistical Analysis Methods (SAM) group. The main goal of the proposal is to increase the precision of subjective experiment analysis by describing a subjective answer as a random variable. The random variable is described by three key influencing factors: the sequence quality, a subject bias, and a subject precision. It is a further development of the ITU-T P.913 [10] recommendation, where subject bias was introduced. Adding subject precision brings two benefits: better handling of unreliable subjects and an easier estimation procedure.
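For reference, the model can be written compactly as below, following the notation of [11] as we understand it: for subject i and stimulus j, ψ_j is the quality of the stimulus, Δ_i the bias of the subject, and υ_i the inconsistency of the subject (a larger υ_i means a less precise subject). Forcing υ_i to one shared value for all subjects reduces this to a bias-only model.

```latex
U_{ij} = \psi_j + \Delta_i + \upsilon_i X, \qquad X \sim \mathcal{N}(0, 1)
```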

Current standards describe ways to remove unreliable subjects. The problem is that the methods proposed in BT.500 [8] and P.913 [10] differ and flag different subjects. Also, both methods rely on arbitrary parameters (e.g., thresholds) to decide when a subject should be removed. As a result, two subjects can be similarly imprecise, yet one falls on the accepted side of the threshold, so all of their answers are treated as correct, while the other falls on the rejected side, so all of their answers are removed. The proposed method instead weights the impact of each subject’s answers according to that subject’s precision. As a consequence, every subject is partly kept and partly discounted, and the balance between how much information is kept and how much is removed depends on the subject’s precision.

The estimation procedure for the proposed model described in the literature is Maximum Likelihood Estimation (MLE). Such estimation is computationally costly and needs a careful setup to obtain a reliable solution. Therefore, we proposed an Alternating Projection (AP) solver, which is less general than MLE but works as well as MLE for estimating the subject model. The solver is called “alternating projection” because, in a loop, it alternates between projecting (i.e., averaging) the opinion scores along the subject dimension and along the stimulus dimension. Step by step, it increases the precision of the estimated model parameters by giving more weight to the information coming from the more precise subjects. More details can be found in the white paper [11].
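The alternating loop described above can be sketched as follows. This is a simplified, hypothetical reading of the text, not the reference implementation accompanying [11]: it assumes a complete rating matrix (every subject rates every stimulus), uses population standard deviations, weights each subject by 1/υ_i² as its precision, and floors υ_i at a small value to keep the division finite. The function name, parameters and toy ratings are illustrative only.

```python
import statistics


def alternating_projection(u, tol=1e-8, max_iter=1000, min_upsilon=1e-6):
    """Estimate (psi, delta, upsilon) from a complete rating matrix u[i][j].

    i indexes subjects and j indexes stimuli. Assumed model:
    u_ij = psi_j + delta_i + upsilon_i * X, with X ~ N(0, 1).
    """
    n_subj, n_stim = len(u), len(u[0])
    # Initial quality estimate: plain MOS, i.e. the average along the subject dimension.
    psi = [statistics.fmean(u[i][j] for i in range(n_subj)) for j in range(n_stim)]
    for _ in range(max_iter):
        # Average along the stimulus dimension to get each subject's bias ...
        delta = [statistics.fmean(u[i][j] - psi[j] for j in range(n_stim))
                 for i in range(n_subj)]
        # ... and take the spread of the remaining residual as the subject's inconsistency.
        # The floor (min_upsilon) is an assumption that keeps 1 / upsilon**2 finite.
        upsilon = [max(statistics.pstdev([u[i][j] - psi[j] - delta[i]
                                          for j in range(n_stim)]), min_upsilon)
                   for i in range(n_subj)]
        # Average along the subject dimension again, bias-corrected and weighted
        # by precision 1 / upsilon**2, so consistent subjects count for more.
        w = [1.0 / v ** 2 for v in upsilon]
        new_psi = [sum(w[i] * (u[i][j] - delta[i]) for i in range(n_subj)) / sum(w)
                   for j in range(n_stim)]
        change = max(abs(a - b) for a, b in zip(new_psi, psi))
        psi = new_psi
        if change < tol:
            break
    return psi, delta, upsilon


# Hypothetical ratings: 3 subjects x 4 stimuli. Subject 1 rates exactly one point
# higher than subject 0; subject 2 is noisier.
u = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 2, 5],
]
psi, delta, upsilon = alternating_projection(u)
print("psi     :", [round(x, 2) for x in psi])
print("delta   :", [round(x, 2) for x in delta])
print("upsilon :", [round(x, 3) for x in upsilon])
```

On a matrix this small, the weights concentrate heavily on the most self-consistent subjects (here subjects 0 and 1, whose ratings differ only by a constant offset), and their υ_i is pushed towards the floor; experiments with many subjects and stimuli are far less prone to this.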

Other updates 

A new VQEG group has been recently established related to Quality Assessment for Health Applications (QAH), with the motivation to study visual quality requirements for medical imaging and telemedicine. The main goals of this new group are:

  • Assemble all the existing publicly accessible databases on medical quality.
  • Develop databases with new diagnostic tasks and new objective quality assessment models.
  • Provide methodologies, recommendations and guidelines for subjective tests of medical image quality assessment.
  • Study the quality requirements and Quality of Experience in the context of telemedicine and other telehealth services.

For any further questions or expressions of interest to join this group, please contact QAH Chair Lu Zhang (lu.ge@insa-rennes.fr), Vice Chair Meriem Outtas (Meriem.Outtas@insa-rennes.fr), and Vice Chair Hantao Liu (hantao.liu@cs.cardiff.ac.uk).

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, 2020 (Available online soon).
[2] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. Geneva, Switzerland: ITU, 2020.
[3] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. Geneva, Switzerland: ITU, 2020.
[4] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. Geneva, Switzerland: ITU, 2020.
[5] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. Geneva, Switzerland: ITU, 2020.
[6] ITU-T Rec. P.1401. Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. Geneva, Switzerland: ITU, 2020.
[7] K. Brunnström and M. Barkowsky, “Statistical quality of experience analysis: on planning the sample size and statistical significance testing”, Journal of Electronic Imaging, vol. 27, no. 5, p. 11, Sep. 2018 (DOI: 10.1117/1.JEI.27.5.053013).
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland: ITU, 2019.
[9] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. Geneva, Switzerland: ITU, 2008.
[10] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment. Geneva, Switzerland: ITU, 2016.
[11] Z. Li, C. G. Bampis, L. Janowski, I. Katsavounidis, “A simple model for subject behavior in subjective experiments”, arXiv:2004.02067, Apr. 2020.

Standards Column: VQEG

Welcome to the first column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG is an international and independent organisation of technical experts in perceptual video quality assessment from industry, academia, and government organisations.
This column briefly introduces the mission and main activities of VQEG, establishing the starting point for a series of columns that will provide regular updates on the advances within the ongoing projects, as well as reports of the VQEG meetings.
The editors of these columns are Jesús Gutiérrez (jesus.gutierrez@upm.es), co-chair of the Immersive Media Group of VQEG, and Kjell Brunnström (kjell.brunnstrom@ri.se), general co-chair of VQEG. Feel free to contact them with any further questions, comments or requests for information, and also check the VQEG website: www.vqeg.org.

Introduction

The Video Quality Experts Group (VQEG) was born from a need to bring together experts in subjective video quality assessment and objective quality measurement. The first VQEG meeting, held in Turin in 1997, was attended by a small group of experts drawn from ITU-T and ITU-R Study Groups. VQEG was initially grounded in basic subjective methodology and objective tool development/verification for video quality assessment, so that the industry could move forward with standardization and implementation. At the beginning, it focused on measuring perceived video quality, since the distribution paths for video and audio were limited and known.

In the 20 years since VQEG was formed, the ecosystem has changed dramatically, and so must the work. Multimedia is now pervasive across all devices and distribution methods, from broadcast to cellular data networks. This shift has moved the expertise within VQEG from the visual (no-audio) quality of video to Quality of Experience (QoE).

The rapid advance of technology means that VQEG needs to react and stay at the leading edge of developing, defining and deploying methods and tools that address these new technologies and move the industry forward. This also means that we need to embrace both qualitative and quantitative ways of defining these new spaces and terms. Taking a holistic approach to QoE will enable VQEG to move further and faster, with unprecedented collaboration and execution.

VQEG is open to all interested parties from industry, academia, government organizations and Standards Developing Organizations (SDOs). There are no fees, no membership applications and no invitations needed to participate in VQEG activities. Subscription to the main VQEG email list (ituvidq@its.bldrdoc.gov) constitutes membership in VQEG.

VQEG conducts work via discussions over email reflectors, regularly scheduled conference calls and, in general, two face-to-face meetings per year. There are currently more than 500 people registered across 11 email reflectors, including a main reflector for general announcements relevant to the entire group, and different project reflectors dedicated to technical discussions of specific projects. A LinkedIn group exists as well.

Objectives

The main objectives of VQEG are: 

  • To provide a forum, via email lists and face-to-face meetings, for video quality assessment experts to exchange information and work together on common goals.
  • To formulate test plans that clearly and specifically define the procedures for performing subjective assessment tests and objective model validations.
  • To produce open source databases of multimedia material and test results, as well as software tools. 
  • To conduct subjective studies of multimedia and immersive technologies and provide a place for collaborative model development to take place.

Projects

Currently, several working groups are active within VQEG, classified under four main topics:

  1. Subjective Methods: Based on collaborative efforts to improve subjective video quality test methods.
    • Audiovisual HD (AVHD), project “Advanced Subjective Methods” (AVHD-SUB): This group investigates improved audiovisual subjective quality testing methods. This effort may lead to a revision of ITU-T Rec. P.911. For example, the group has investigated alternative experiment designs for subjective tests, in order to validate subjective testing of long video sequences that are viewed only once by each subject. In addition, it conducted a joint investigation into the impact of the environment on mean opinion scores (MOS).
    • Psycho-Physiological Quality Assessment (PsyPhyQA): The aim of this project is to establish novel psychophysiology-based techniques and methodologies for video quality assessment and real-time interaction of humans with advanced video communication environments. Specifically, the aspects that the project is looking at include: video quality assessment based on human psychophysiology (including eye gaze, EEG, EKG, EMG, GSR, etc.), computational video quality models based on psychophysiological measurements, signal processing and machine learning techniques for psychophysiology-based video quality assessment, experimental design and methodologies for psychophysiological assessment, and correlates of psychophysics and psychophysiology. PsyPhyQA has published a dataset and a test plan for a common framework for the evaluation of psychophysiological visual quality assessment.
    • Statistical Analysis Methods (SAM): This group addresses problems related to how to better analyze and improve the quality of data from subjective experiments, and how to consider uncertainty in the development of objective media quality predictors/models. Its main goals are: to improve the methods used to draw conclusions from subjective experiments, to understand the process of expressing opinions in a subjective experiment, to improve subjective experiment design to facilitate analysis and applications, to improve the analysis of objective model performance, and to revisit standardised methods for assessing the performance of objective models.
  2. Objective Metrics: Working towards developing and validating objective video quality metrics.
    • Audiovisual HD (AVHD), project “AVHD-AS / P.NATS phase 2”: This is a joint project of VQEG and ITU-T Study Group 12 Question 14. The main goal is to develop a multitude of objective models, varying in complexity, type of input and use case, for the assessment of video quality in HTTP/TCP/IP-based adaptive bitrate streaming services (e.g., YouTube, Vimeo, Amazon Video, Netflix). For these services, the quality experienced by the end user is affected by video coding degradations and by delivery degradations due to initial buffering, re-buffering and media adaptations caused by changes in bitrate, resolution and frame rate.
    • Computer Generated Imagery (CGI): This group focuses on computer-generated content, both images and video. Its main goals are: creating a large database of computer-generated content, analyzing the content (feature extraction before and after rendering), analyzing the performance of objective quality metrics, evaluating existing and developing new quality metrics/models for CGI material, and studying rendering adaptation techniques (depending on the network constraints). This activity is in line with the ITU-T work item P.BBQCG (Parametric Bitstream-based Quality Assessment of Cloud Gaming Services).
    • No Reference Metrics (NORM): This group is an open collaboration for developing no-reference metrics and methods for monitoring use-case-specific visual service quality. The NORM group offers a complementary, industry-driven approach to QoE that measures visual quality automatically by using perceived indicators. Its main activities are to maintain a list of real-world use cases for visual quality monitoring, a list of potential algorithms and methods for no-reference MOS and/or key indicators (visual artifact detection) for each use case, a list of methods (including datasets) to train and validate the algorithms for each use case, and a list of methods to provide root cause indication for each use case. In addition, the group encourages open discussions and knowledge sharing on all aspects of no-reference metric research and development.
    • Joint Effort Group (JEG) – Hybrid: This group is an open collaboration working together to develop a robust Hybrid Perceptual/Bit-Stream model. It has developed and made available routines to create and capture bit-stream data and to parse bit-streams into HMIX files. Efforts are underway to develop subjectively rated video quality datasets with bit-stream data that can be used by all JEG researchers. The goal is to produce one model that combines metrics developed separately by a variety of researchers.
    • Quality Assessment for Computer Vision Applications (QACoViA): The goal of this group is to study the visual quality requirements for computer vision methods, focusing especially on: testing methodologies and frameworks to identify the limits of computer vision methods with respect to the visual quality of the ingest; the minimum quality requirements and an objective visual quality measure to estimate whether visual content is within the operating region of computer vision; and delivering implementable algorithms that demonstrate the proposed concept of objective video quality assessment methods for recognition tasks.
  3. Industry and Applications: Focused on seeking improved understanding of new video technologies and applications.
    • 5G Key Performance Indicators (5GKPI): This group studies the relationship between the Key Performance Indicators (KPIs) of new communication networks (namely 5G, but extensible to others) and the QoE of the video services running on top of them. With this aim, the group addresses: the definition of relevant use cases (e.g., video for industrial applications, or mobility scenarios), the study of global QoE aspects for video in mobility and industrial scenarios, the identification of the relevant network KPIs (e.g., bitrate, latency, etc.) and application-level video KPIs (e.g., picture quality, A/V sync, etc.), and the generation of open datasets for algorithm testing and training.
    • IMG (Immersive Media Group): This group researches the quality assessment of immersive media, with the main goals of generating datasets of immersive media content, validating subjective test methods, and carrying out baseline quality assessment of immersive systems to provide guidelines for QoE evaluation. The technologies covered by this group include 360-degree content, virtual/augmented/mixed reality, stereoscopic 3D content, Free Viewpoint Video, multiview technologies, light field content, etc.
  4. Support and Outreach: Responsible for the support for VQEG’s activities.
    • eLetter: The goal of the VQEG eLetter is to provide up-to-date technical advances on video-quality-related topics. Each issue of the VQEG eLetter features a collection of papers authored by well-known researchers. These papers are contributed by invited authors or by authors responding to a call for papers, and they can be technical papers, summaries/reviews of other publications, best practice anthologies, reprints of hard-to-obtain articles, or responses to other articles. VQEG wants the eLetter to be interactive in nature.
    • Human Factors for Visual Experiences (HFVE): The objective of this group is to uphold the liaison relation between VQEG and the IEEE standardization group P3333.1. Examples of the activities within this group include the standard for the (deep learning-based) assessment of visual experiences with virtual/augmented/mixed reality based on human factors, the standard on human factors for the quality assessment of light field imaging (IEEE P3333.1.4), and the standard on quality assessment of high dynamic range technologies.
    • Independent Lab Group (ILG): The ILG members act as independent arbitrators, whose generous contributions make the VQEG validation tests possible. Their goal is to ensure that all VQEG validation testing is unbiased and done to high quality standards.
    • Joint Effort Group (JEG): This is an activity within VQEG that promotes collaborative efforts aimed at: validating metrics through both subjective dataset completion and metric design, extending subjective datasets in order to better identify the limitations of quality metrics, improving subjective methodologies to address new scenarios and use cases that involve QoE issues, and increasing the knowledge about both subjective and objective video quality assessment.
    • Joint Qualinet-VQEG team on Immersive Media: The objectives of this joint team from Qualinet and VQEG are: to uphold the liaison relation between both bodies, to inform both Qualinet and VQEG on the activities in the respective organizations (especially on the topic of immersive media), to promote collaborations on other topics (i.e., to form new joint teams), and to uphold the liaison relation with ITU-T SG12, in particular on topics around interactive, augmented and virtual reality QoE.
    • Tools and Subjective Labs Setup: The objective of this project is to provide the video quality research community with a wide variety of software tools and guidance in order to facilitate research. Tools are available in the following categories: quality analysis (software to run quality analyses), encoding (video encoding tools), streaming (streaming and extracting information from video streams), subjective test software (tools for running and analyzing subjective tests), and helper tools (miscellaneous helper tools).
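Several of the groups above (e.g., AVHD-SUB and SAM) work with mean opinion scores (MOS) collected in subjective tests. As a minimal illustrative sketch, assuming ratings gathered on a 5-point absolute category rating scale, the snippet below computes a MOS and an approximate 95% confidence interval using the normal approximation. The function name `mos_with_ci` and the sample ratings are hypothetical, and this is not a procedure specified by VQEG.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Illustrative sketch: uses the normal approximation z * s / sqrt(n).
    With few subjects, a Student-t quantile would be more appropriate.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    mos = statistics.mean(ratings)
    s = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    return mos, z * s / math.sqrt(n)


# Hypothetical ratings on a 5-point scale (1 = bad ... 5 = excellent)
scores = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mos_with_ci(scores)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # prints "MOS = 4.00 +/- 0.52"
```

The normal approximation keeps the sketch short; analyses of real subjective data often use the t-distribution and screen out unreliable subjects before averaging.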

In addition, the Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) studies topics related to video and audiovisual quality assessment (both subjective and objective) that are shared between ITU-R Study Group 6 and ITU-T Study Group 12. VQEG colocates its meetings with the IRG-AVQA to encourage a wider range of experts to contribute to Recommendations.

For more details and previous closed projects please check: https://www.its.bldrdoc.gov/vqeg/projects-home.aspx

Major achievements

VQEG activities are documented in reports and submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), and other SDOs as appropriate. Several VQEG studies have resulted in ITU Recommendations.

| VQEG Project | Description | ITU Recommendations |
| --- | --- | --- |
| Full Reference Television (FRTV) Phase I | Examined the performance of FR and NR models on standard definition video. The test materials used in this test plan and the subjective test data are freely available to researchers. | ITU-T J.143 (2000), ITU-T J.144 (2001), ITU-T J.149 (2004) |
| Full Reference Television (FRTV) Phase II | Examined the performance of FR and NR models on standard definition video, using the DSCQS methodology. | ITU-T J.144 (2004), ITU-R BT.1683 (2004) |
| Multimedia (MM) Phase I | Examined the performance of FR, RR and NR models for VGA, CIF and QCIF video (no audio). | ITU-T J.148 (2003), ITU-T P.910 (2008), ITU-T J.246 (2008), ITU-T J.247 (2008), ITU-T J.340 (2010), ITU-R BT.1683 (2004) |
| Reduced Reference / No Reference Television (RRNR-TV) | Examined the performance of RR and NR models on standard definition video. | ITU-T J.244 (2008), ITU-T J.249 (2010), ITU-R BT.1885 (2011) |
| High Definition Television (HDTV) | Examined the performance of FR, RR and NR models for HDTV. Some of the video sequences used in this test are publicly available in the Consumer Digital Video Library. | ITU-T J.341 (2011), ITU-T J.342 (2011) |
| QART | Studied the subjective quality evaluation of video used for recognition tasks and task-based multimedia applications. | ITU-T P.912 (2008) |
| Hybrid Perceptual Bitstream | Examined the performance of hybrid models for VGA/WVGA and HDTV. | ITU-T J.343 (2014), ITU-T J.343.1-6 (2014) |
| 3DTV | Investigated how to assess 3DTV subjective video quality, covering methodologies, display requirements and evaluation of visual discomfort and fatigue. | ITU-T P.914 (2016), ITU-T P.915 (2016), ITU-T P.916 (2016) |
| Audiovisual HD (AVHD) | On the one hand, addressed the subjective evaluation of audio-video quality metrics. On the other hand, developed model standards for video quality assessment of streaming services over reliable transport for resolutions up to 4K/UHD, in collaboration with ITU-T SG12. | ITU-T P.913 (2014), ITU-T P.1204 (2020), ITU-T P.1204.3 (2020), ITU-T P.1204.4 (2020), ITU-T P.1204.5 (2020) |

VQEG continues to contribute to ongoing ITU standardization efforts. For example, VQEG has contributed updated texts on statistical analysis to ITU-T Rec. P.1401, and on subjective quality assessment of 360-degree video to ITU-T P.360-VR.
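Objective models are usually judged by how well their predictions agree with subjective MOS. The sketch below computes two commonly reported figures, the Pearson linear correlation coefficient (PCC) and the root-mean-square error (RMSE), on hypothetical data; the helper names `pearson` and `rmse` are illustrative assumptions. A complete evaluation, such as the one described in ITU-T Rec. P.1401, involves further steps (for example, fitting a mapping function between predictions and MOS, and testing statistical significance) that this sketch deliberately omits.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between subjective scores and model predictions."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical data: MOS from a subjective test vs. an objective model's predictions
mos_scores = [1.5, 2.3, 3.1, 3.9, 4.6]
pred = [1.8, 2.1, 3.4, 3.7, 4.5]
print(f"PCC = {pearson(mos_scores, pred):.3f}, RMSE = {rmse(mos_scores, pred):.3f}")
```

PCC captures how linearly the predictions track MOS, while RMSE expresses the prediction error on the rating scale itself, which is why the two are typically reported together.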

Apart from this, VQEG supports QoE research by providing tools and datasets to the research community. For instance, VQEG Tools and Subjective Labs Setup provides, via GitHub, a wide variety of software tools and guidance to facilitate research. Another example is the VQEG Image Quality Evaluation Tool (VIQET), an objective no-reference photo quality evaluation tool. Finally, several datasets have been published, which can be found on the websites of the corresponding projects, in the Consumer Digital Video Library or in other repositories.

For the interested reader, general articles about the work of VQEG, especially covering its earlier projects, are [1, 2].

References

[1] Q. Huynh-Thu, A. Webster, K. Brunnström, and M. Pinson, “VQEG: Shaping Standards on Video Quality”, in 1st International Conference on Advanced Imaging, Tokyo, Japan, 2015.
[2] K. Brunnström, D. Hands, F. Speranza, and A. Webster, “VQEG Validation and ITU Standardisation of Objective Perceptual Video Quality Metrics”, IEEE Signal Processing Magazine, vol. 26, no. 3, pp. 96-101, May 2009.