|Welcome to the second column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).|
VQEG plays a major role in research and the development of standards on video quality and this column presents examples of recent contributions to International Telecommunication Union (ITU) recommendations, as well as ongoing contributions to recommendations to come in the near future. In addition, the formation of a new group within VQEG addressing Quality Assessment for Health Applications (QAH) has been announced.
VQEG website: www.vqeg.org
Jesús Gutiérrez (firstname.lastname@example.org), Universidad Politécnica de Madrid (Spain)
Kjell Brunnström (email@example.com), RISE (Sweden)
Thanks to Lucjan Janowski (AGH University of Science and Technology), Alexander Raake (TU Ilmenau) and Shahid Satti (Opticom) for their help and contributions.
VQEG is an international and independent organisation that provides a forum for technical experts in perceptual video quality assessment from industry, academia, and standardization organisations. Although VQEG does not develop or publish standards, several activities (e.g., validation tests, multi-lab test campaigns, objective quality models developments, etc.) carried out by VQEG groups have been instrumental in the development of international recommendations and standards. VQEG contributions have been mainly submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), but also to other standardization bodies, such as MPEG, ITU-R SG6, ATIS, IEEE P.3333 and P.1858, DVB, and ETSI.
In our first column on the ACM SIGMM Records we provided a table summarizing the several VQEG studies that have resulted in ITU Recommendations. In this new column, we describe with more detail the last contributions to recent ITU standards, and we provide an insight on the ongoing contributions that may result in ITU recommendations in the near future.
ITU Recommendations with recent inputs from VQEG
ITU-T Rec. P.1204 standard series
A campaign within the ITU-T Study Group (SG) 12 (Question 14) in collaboration with the VQEG AVHD group resulted in the development of three new video quality model standards for the assessment of sequences of up to UHD/4K resolution. This campaign was carried out during more than two years under the project “AVHD-AS / P.NATS Phase 2”. While “P.NATS Phase 1” (finalized in 2016 and resulting in the standards series ITU-T Rec. P.1203, P.1203.1, P.1203.2 and P.1203.3) addressed the development of improved bitstream-based models for the prediction of the overall quality of long (1-5 minutes) video streaming sessions, the second phase addressed the development of short-term video quality models covering a wider scope with bitstream-based, pixel-based and hybrid models. The P.NATS Phase 2 project was executed as a competition between nine participating institutions in different tracks resulting in the aforementioned three types of video quality models.
For the competition, a total of 26 databases were created, 13 used for training and 13 for validation and selection of the winning models. In order to establish the ground truth, subjective video quality tests were performed on four different display devices (PC-monitors, 55-75” TVs, mobile, and tablet) with at least 24 subjects each and using the 5-point Absolute Category Rating (ACR) scale. In total, about 5000 test sequences with a duration of around 8 seconds were evaluated, containing a variety of resolutions, encoding configurations, bitrates, and framerates using the codecs H.264/AVC, H.265/HEVC and VP9.
More details about the whole workflow and results of the competition can be found in . As a result of this competition, the new standard series ITU-T Rec. P.1204  has been recently published, including a bitstream-based model (ITU-T Rec. P.1204.3 ), a pixel-based model (ITU-T Rec. P.1204.4 ) and a hybrid model (ITU-T Rec. P.1204.5 ).
ITU-T Rec. P.1401
ITU-T Rec. P.1401  is about statistical analysis, evaluation and reporting guidelines of quality measurements and was recently revised in January 2020. Based on the article by Brunnström and Barkowsky , it was recognized and pointed out by VQEG that this Recommendation, which is very useful, lacked a section on the topic of multiple comparisons and its potential impact on the performance evaluations of objective quality methods. In the latest revision, Section 7.6.5 covers this topic.
Ongoing VQEG Inputs to ITU Recommendations
ITU-T Rec. P.919
ITU has been working on a recommendation for subjective test methodologies for 360º video on Head-Mounted Displays (HMDs), under the SG12 Question 13 (Q13). The Immersive Media Group (IMG) of the VQEG has collaborated in this effort through the fulfilment of the Phase 1 of the Test Plan for Quality Assessment of 360-degree Video. In particular, the Phase 1 of this test plan addresses the assessment of short sequences (less than 30 seconds), in the spirit of ITU-R BT.500  and ITU-T P.910 . In this sense, the evaluation of audiovisual quality and simulator sickness was considered. On the other hand, the Phase 2 of the test plan (envisioned for the near future) covers the assessment of other factors that can be more influential with longer sequences (several minutes), such as immersiveness and presence.
Therefore, within Phase 1 the IMG designed and executed a cross-lab test with the participation of ten international laboratories, from AGH University of Science and Technology (Poland), Centrum Wiskunde & Informatica (The Netherlands), Ghent University (Belgium), Nokia Bell-Labs (Spain), Roma TRE University (Italy), RISE Acreo (Sweden), TU Ilmenau (Germany), Universidad Politécnica de Madrid (Spain), University of Surrey (England), Wuhan University (China).
This test was aimed at assessing and validating subjective evaluation methodologies for 360º video. Thus, the single-stimulus methodology Absolute Category Rating (ACR) and the double-stimulus Degradation Category Rating (DCR) were considered to evaluate audiovisual quality of 360º videos distorted with uniform and non-uniform degradations. In particular, different configurations of uniform and tile-based coding were applied to eight video sources with different spatial, temporal and exploration properties. Other influence factors were also studied, such as the influence of the sequence duration (from 10 to 30s) and the test setup (considering different HMDs and methods to collect the observers’ ratings, using audio or not, etc.). Finally, in addition to the evaluation of audiovisual quality, the assessment of simulator sickness symptoms was addressed studying the use of different questionnaires. As a result of this work, the IMG of VQEG presented two contributions to the recommendation ITU-T Rec. P.919 (ex P.360-VR), which has been consented in the last SG12 meeting (7-11 September 2020) and is envisioned to be published soon. In addition, the results and the annotated dataset coming from the cross-lab test will be published soon.
ITU-T Rec. P.913
Another upcoming contribution is prepared by the Statistical Analysis Group (SAM). The main goal of the proposal is to increase the precision of the subjective experiment analysis by describing a subjective answer as a random variable. The random variable is described by three key influencing factors, the sequence quality, a subject bias, and a subject precision. It is further development of the ITU-T P.913  recommendation where subject bias was introduced. Adding subject precision allows for two achievements: Better handling unreliable subjects and easier estimation procedure.
Current standards describe a way to remove an unreliable subject. The problem is that the methods proposed in BT.500  and P.913  are different and point to different subjects. Also, both methods have some arbitrary parameters (e.g., thresholds) deciding when a subject should be removed. It means that two subjects can be similarly imprecise but one is over the threshold, and we accept all his answers as correct and the other is under the threshold, and we remove her all answers. The proposed method weights the impact of each subject answer depending on the subject precision. As the consequence, each subject is to some extent removed and kept. The balance between how much information we keep and how much we remove depends on the subject precision.
The estimation procedure of the proposed model, described in the literature, is MLE (Maximum Likelihood Estimation). Such estimation is computationally costly and needs a careful setup to obtain a reliable solution. Therefore, we proposed Alternating Projection (AP) solver which is less general than MLE but works as well as MLE for the subject model estimation. This solver is called “alternating projection” because, in a loop, we alternate between projecting (or averaging) the opinion scores along the subject dimension and the stimulus dimension. It increases the precision of the obtained model parameters’ step by step weighting more information coming from the more precise subjects. More details can be found in the white paper in .
A new VQEG group has been recently established related to Quality Assessment for Health Applications (QAH), with the motivation to study visual quality requirements for medical imaging and telemedicine. The main goals of this new group are:
- Assemble all the existing publicly accessible databases on medical quality.
- Develop databases with new diagnostic tasks and new objective quality assessment models.
- Provide methodologies, recommendations and guidelines for subjective test of medical image quality assessment.
- Study the quality requirements and Quality of Experience in the context of telemedicine and other telehealth services.
For any further questions or expressions of interest to join this group, please contact QAH Chair Lu Zhang (firstname.lastname@example.org), Vice Chair Meriem Outtas (Meriem.Outtas@insa-rennes.fr), and Vice Chair Hantao Liu (email@example.com).
 A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204” , IEEE Access, 2020 (Available online soon).
 ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. Geneva, Switzerland: ITU, 2020.
 ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. Geneva, Switzerland: ITU, 2020.
 ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. Geneva, Switzerland: ITU, 2020.
 ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. Geneva, Switzerland: ITU, 2020.
 ITU-T Rec. P.1401. Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. Geneva, Switzerland: ITU, 2020.
 K. Brunnström and M. Barkowsky, “Statistical quality of experience analysis: on planning the sample size and statistical significance testing”, Journal of Electronic Imaging, vol. 27, no. 5, p. 11, Sep. 2018 (DOI: 10.1117/1.JEI.27.5.053013).
 ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland: ITU, 2019.
 ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. Geneva, Switzerland: ITU, 2008.
 ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment. Geneva, Switzerland: ITU, 2016.
 Z. Li, C. G. Bampis, L. Janowski, I. Katsavounidis, “A simple model for subject behavior in subjective experiments”, arXiv:2004.02067, Apr. 2020.