VQEG Column: Finalization of Recommendation Series P.1204, a Multi-Model Video Quality Evaluation Standard – The New Standards P.1204.1 and P.1204.2

Abstract

This column introduces the now completed ITU-T P.1204 video quality model standards for assessing sequences up to UHD/4K resolution. Initially developed over two years by ITU-T Study Group 12 (Question Q14/12) and VQEG, the work used a large dataset of 26 subjective tests (13 for training, 13 for validation), each involving at least 24 participants rating sequences on the 5-point ACR scale. The tests covered diverse encoding settings, bitrates, resolutions, and framerates for H.264/AVC, H.265/HEVC, and VP9 codecs. The resulting 5,000-sequence dataset forms the largest lab-based source for model development to date. Initially standardized were P.1204.3, a no-reference bitstream-based model with full bitstream access, P.1204.4, a pixel-based, reduced-/full-reference model, and P.1204.5, a no-reference hybrid model. The current record focuses on the latest additions to the series, namely P.1204.1, a parametric, metadata-based model using only information about which codec was used, plus bitrate, framerate and resolution, and P.1204.2, which in addition uses frame-size and frame-type information to include video-content aspects into the predictions.

Introduction

Video quality under specific encoding settings is central to applications such as VoD, live streaming, and audiovisual communication. In HTTP-based adaptive streaming (HAS) services, bitrate ladders define video representations across resolutions and bitrates, balancing screen resolution and network capacity. Video quality, a key contributor to users’ Quality of Experience (QoE), can vary with bandwidth fluctuations, buffer delays, or playback stalls. 

While such quality fluctuations and broader QoE aspects are discussed elsewhere, this record focuses on short-term video quality as modeled by ITU-T P.1204 for HAS-type content. These models assess segments of around 10s under reliable transport (e.g., TCP, QUIC), covering resolution, framerate, and encoding effects, but excluding pixel-level impairments from packet loss under unreliable transport.

Because video quality is perceptual, subjective tests (laboratory or crowdsourced) remain essential, especially at high resolutions such as 4K UHD under controlled viewing conditions (1.5H or 1.6H viewing distance). Yet, studies show limited perceptual gain between HD and 4K, depending on source content, underlining the need for representative test materials. Given the high cost of such tests, objective (instrumental) models are required for scalable, automated assessment supporting applications like bitrate ladder design and service monitoring.

Four main model classes exist: metadata-based, bitstream-based, pixel-based, and hybrid. Metadata-based models use codec parameters (e.g., resolution, bitrate) and are lightweight; bitstream-based models analyze encoded streams without decoding, as in ITU-T P.1203 and P.1204.3 [1][2][3][7]. Pixel-based models compare decoded frames and include Full Reference and Reduced Reference models (e.g., P.1204.4, and also PSNR [9], SSIM [10], VMAF [11][12]), as well as No Reference variants. Finally, hybrid models combine pixel and bitstream or metadata inputs, exemplified by the ITU-T P.1204.5 standard. These three standards, P.1204.3, P.1204.4, and P.1204.5, formed the initial P.1204 Recommendation series finalized in 2020.
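As a minimal illustration of the pixel-based, full-reference class mentioned above, PSNR compares decoded frames against the reference frame by frame. The following is a sketch on toy frames, not a production implementation:

```python
import numpy as np

def psnr(ref: np.ndarray, deg: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between a reference and a degraded frame."""
    mse = np.mean((ref.astype(np.float64) - deg.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy frames: an 8-bit "reference" and a slightly noisy "decoded" version.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
deg = np.clip(ref.astype(int) + rng.integers(-2, 3, size=ref.shape), 0, 255)
print(round(psnr(ref, deg), 1))  # roughly 45 dB for this small perturbation
```

Full-reference models like P.1204.4 and VMAF go far beyond such per-pixel error measures, but the input situation (reference plus degraded pixels) is the same.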

ITU-T P.1204 series completed with P.1204.1 and P.1204.2

The respective standardization project under the Work Item name P.NATS Phase 2 (read: Peanuts) was a unique video quality model development competition conducted in collaboration between ITU-T Study Group 12 (SG12) and the Video Quality Experts Group (VQEG). The target use cases covered resolutions up to UHD/4K, with presentation on a UHD/4K-resolution PC/TV or Mobile/Tablet (MO/TA) device. For the first time, bitstream-, pixel-based, and hybrid models were jointly developed, trained, and validated, using a large common subjective dataset comprising 26 tests, each with at least 24 participants (see, e.g., [1] for details). The P.NATS Phase 2 work built on the earlier "P.NATS Phase 1" project, which resulted in the ITU-T Rec. P.1203 standards series (P.1203, P.1203.1, P.1203.2, P.1203.3). In the P.NATS Phase 2 project, video quality models in five different categories were evaluated, and different candidates were found to be eligible to be recommended as standards. The three initially standardized models out of the five categories were the aforementioned P.1204.3, P.1204.4 and P.1204.5. However, due to a lack of consensus between the winning proponents, no models were initially recommended as standards for the category "bitstream Mode 0", with access to high-level metadata only, such as the video codec, resolution, framerate and bitrate used, and "bitstream Mode 1", with further access to frame-size information that can be used for content-complexity estimation.
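To illustrate why Mode 1 input helps over Mode 0, frame-size statistics per frame type can serve as a rough content-complexity proxy: large P-frames relative to I-frames hint at high temporal complexity. The sketch below is purely illustrative, assuming a hypothetical `FrameInfo` record and `complexity_proxies` helper; it does not reproduce the actual P.1204.2 feature set:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FrameInfo:
    frame_type: str   # "I", "P", or "B"
    size_bytes: int   # encoded frame size

def complexity_proxies(frames: list[FrameInfo]) -> dict:
    """Illustrative Mode-1-style features: average frame sizes per frame type.

    Hypothetical helper for illustration only; the real P.1204.2 uses a
    standardized feature set derived from frame types and sizes.
    """
    by_type: dict[str, list[int]] = {}
    for f in frames:
        by_type.setdefault(f.frame_type, []).append(f.size_bytes)
    avg = {t: mean(sizes) for t, sizes in by_type.items()}
    return {
        "avg_i_size": avg.get("I", 0.0),
        "avg_p_size": avg.get("P", 0.0),
        # Higher ratio -> P-frames carry more data -> more temporal change.
        "p_to_i_ratio": avg.get("P", 0.0) / avg["I"] if "I" in avg else 0.0,
    }

# One GOP: a 90 kB I-frame followed by 29 P-frames of 9 kB each.
frames = [FrameInfo("I", 90_000)] + [FrameInfo("P", 9_000)] * 29
print(complexity_proxies(frames))
```

Such per-frame information is available from the container or bitstream headers without decoding pixels, which is what keeps Mode 1 models lightweight.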

For the latest model additions of P.1204.1 and P.1204.2, subsets of the databases initially used in the P.NATS Phase 2 project were employed for model training. Two different datasets belonging to the two contexts PC/TV and MO/TA were used for training the models: AVT-PNATS-UHD-1 for the PC/TV use case and ERCS-PNATS-UHD-1 for the MO/TA use case.

AVT-PNATS-UHD-1 [7] consists of four different subjective tests conducted by TU Ilmenau as part of the P.NATS Phase 2 competition. The target resolution of these tests was 3840×2160 pixels. ERCS-PNATS-UHD-1 [1] is a dataset targeting the MO/TA use case. It consists of one subjective test conducted by Ericsson as part of the P.NATS Phase 2 competition. The target resolution of this test was 2560×1440 pixels.

For model performance evaluation, beyond AVT-PNATS-UHD-1, further externally available video-quality test databases were used, as outlined in the following.

AVT-VQDB-UHD-1: This publicly available dataset consists of four different subjective tests, all with a full-factorial design. In total, 17 different SRCs with durations of 7-10 s were used across the four tests. All sources had a resolution of 3840×2160 pixels and a framerate of 60 fps. For the HRC design, the bitrate was selected in fixed (i.e., non-adaptive) values per PVS between 200 kbps and 40000 kbps, the resolution between 360p and 2160p, and the framerate between 15 fps and 60 fps. In all tests, a 2-pass encoding approach was used, with the medium preset for H.264 and H.265, and the VP9 speed parameter set to its default value "0". A total of 104 participants took part in the four tests.

GVS: This dataset consists of 24 SRCs extracted from 12 different games. The SRCs have a resolution of 1920×1080 pixels, a framerate of 30 fps, and a duration of 30 s. The HRC design included three different resolutions, namely 480p, 720p and 1080p. 90 PVSs resulting from 15 bitrate-resolution pairs were used for subjective evaluation. A total of 25 participants rated all 90 PVSs.

KUGVD: Six of the 24 SRCs from the GVS dataset were used to develop KUGVD. The same bitrate-resolution pairs as in GVS were used to define the HRCs. In total, 90 PVSs were used in the subjective evaluation, and 17 participants took part in the test.

CGVDS:  This dataset consists of SRCs captured at 60fps from 15 different games. For designing the HRCs, three resolutions, namely, 480p, 720p and 1080p at three different framerates of 20, 30, and 60fps were considered. To ensure that the SRCs from all the games could be assessed by test subjects, the overall test was split into 5 different subjective tests, with a minimum of 72 PVSs being rated in each of the tests. A total of over 100 participants took part over the five different tests, with a minimum of 20 participants per test.

Twitch: The Twitch Dataset consists of 36 different games, with 6 games each representing one out of 6 pre-defined genres. The dataset consists of streams directly downloaded from Twitch. A total of 351 video sequences of approximately 50s duration across all representations were downloaded. 90 video sequences out of these 351 video sequences were selected for subjective evaluation. Only the first 30s of the chosen 90 PVSs were considered for subjective testing. Six different resolutions between 160p and 1080p at framerates of 30 and 60fps were used. 29 participants rated all the 90 PVSs.

BBQCG: This is the training dataset developed as part of the P.BBQCG work item. It consists of nine subjective test databases. Three of these nine test databases consisted of processed video sequences (PVSs) up to 1080p/120fps, and the remaining six had PVSs up to 4K/60fps. Three codecs, namely H.264, H.265, and AV1, were used to encode the videos. Overall, 900 different PVSs were created from 12 sources (SRCs) by encoding the SRCs with different encoding settings.

AVT-VQDB-UHD-1-VD: This dataset consists of 16 source contents encoded using a CRF-based encoding approach. Overall, 192 PVSs were generated by encoding all 16 sources in four resolutions, namely 360p, 720p, 1080p and 2160p, with three CRF values (22, 30, 38) each. A total of 40 subjects participated in the study.

ITU-T P.1204.1 and P.1204.2 model prediction performance

The performance figures of the two new models P.1204.1 and P.1204.2 on the different datasets are given in Table 1 (P.1204.1) and Table 2 (P.1204.2) below.

Table 1: Performance of P.1204.1 (Mode 0) on the evaluation datasets, in terms of Root Mean Square Error (RMSE, the measure used as winning criterion in the ITU-T/VQEG modelling competition), Pearson Correlation Coefficient (PCC), Spearman Rank Correlation Coefficient (SRCC) and Kendall's tau.
Dataset             RMSE                    PCC     SRCC    Kendall
AVT-VQDB-UHD-1      0.499                   0.890   0.877   0.684
KUGVD               0.840                   0.590   0.570   0.410
GVS                 0.690                   0.670   0.650   0.490
CGVDS               0.470                   0.780   0.750   0.560
Twitch              0.430                   0.920   0.890   0.710
BBQCG               0.598 (7-point scale)   0.841   0.843   0.647
AVT-VQDB-UHD-1-VD   0.650                   0.814   0.813   0.617
Table 2: Performance of P.1204.2 (Mode 1) on the evaluation datasets, in terms of Root Mean Square Error (RMSE, the measure used as winning criterion in the ITU-T/VQEG modelling competition), Pearson Correlation Coefficient (PCC), Spearman Rank Correlation Coefficient (SRCC) and Kendall's tau.
Dataset             RMSE                    PCC     SRCC    Kendall
AVT-VQDB-UHD-1      0.476                   0.901   0.900   0.730
KUGVD               0.500                   0.870   0.860   0.690
GVS                 0.420                   0.890   0.870   0.710
CGVDS               0.360                   0.900   0.880   0.690
Twitch              0.370                   0.940   0.930   0.770
BBQCG               0.737 (7-point scale)   0.745   0.746   0.547
AVT-VQDB-UHD-1-VD   0.598                   0.845   0.845   0.654

For all databases except BBQCG and KUGVD, the Mode 0 model P.1204.1 performs solidly, as shown in Table 1. With the information about frame types and sizes available to the Mode 1 model P.1204.2, performance improves considerably, as shown in Table 2. For performance results of all three previously standardized models, P.1204.3, P.1204.4 and P.1204.5, the reader is referred to [1] and the individual standards [4][5][6]. For the P.1204.3 model, complementary performance information is presented in, e.g., [2][7]. For P.1204.4, additional performance information is available in [8], including results for AV1, AVS2, and VVC.

The following plots illustrate how the new P.1204.1 Mode 0 model may be used. Bitrate-ladder-type graphs are presented, with the predicted Mean Opinion Score on a 5-point scale plotted over the logarithm of the bitrate.


[Figure: Bitrate-ladder plots of P.1204.1-predicted MOS over log bitrate for the codecs H.264, H.265 and VP9]
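The general shape of such MOS-over-log-bitrate curves can be sketched with a logistic function of log bitrate. The mapping below (`illustrative_mos` is hypothetical, with made-up coefficients; it is not the P.1204.1 model) only serves to show how a bitrate ladder can be scored and plotted:

```python
import math

def illustrative_mos(bitrate_kbps: float, resolution_p: int) -> float:
    """Hypothetical MOS-over-log-bitrate curve for a bitrate-ladder plot.

    Logistic in log10(bitrate), shifted by resolution so that higher
    resolutions need more bitrate for the same score. Coefficients are
    made up for illustration and are NOT those of P.1204.1.
    """
    x = math.log10(bitrate_kbps) - 0.3 * math.log10(resolution_p / 1080)
    return 1.0 + 4.0 / (1.0 + math.exp(-2.5 * (x - 3.2)))  # range (1, 5)

# A typical HAS-style ladder: (resolution, bitrate in kbps).
ladder = [(360, 700), (720, 2500), (1080, 5000), (2160, 16000)]
for res, br in ladder:
    print(f"{res}p @ {br} kbps -> predicted MOS {illustrative_mos(br, res):.2f}")
```

In practice, one such curve per codec and resolution (as in the plots above) lets a service pick, for each bitrate, the representation with the highest predicted MOS.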

Conclusions and Outlook

The P.1204 standard series now comprises the complete initially planned set of models, namely:

  • ITU-T P.1204.1: Bitstream Mode 0, i.e., metadata-based model with access to information about video codec, resolution, framerate and bitrate used.
  • ITU-T P.1204.2: Bitstream Mode 1, i.e., metadata-based model with access to information about video codec, resolution, framerate and bitrate used, plus information about video frame types and sizes.
  • ITU-T P.1204.3: Bitstream Mode 3 [1][2][3][7].
  • ITU-T P.1204.4: Pixel-based reduced- and full-reference [1][5][8].
  • ITU-T P.1204.5: Hybrid no-reference Mode 0 [1][6].

Extensions of some of these models beyond the initial scope of codecs (H.264/AVC, H.265/HEVC, VP9) have been included over the last few years: P.1204.5 has been extended and P.1204.4 evaluated to also cover the AV1 video codec. Work in ITU-T SG12 (Q14/12) is ongoing to extend P.1204.1, P.1204.2 and P.1204.3 to newer codecs such as AV1, and all five models are planned to be extended to also cover VVC. It is noted that for P.1204.3, P.1204.4 and P.1204.5, long-term quality-integration modules that generate per-session scores for streaming sessions of up to 5 minutes are described in appendices of the respective Recommendations. For P.1204.1 and P.1204.2, this extension still has to be completed. Initial evaluations of similar Mode 0 and Mode 1 models that use the P.1204.3-type long-term integration can be found in [7].

References

[1] Raake, A., Borer, S., Satti, S.M., Gustafsson, J., Rao, R.R.R., Medagli, S., List, P., Göring, S., Lindero, D., Robitza, W. and Heikkilä, G., 2020. Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204. IEEE Access, 8, pp.193020-193049.
[2] Rao, R.R.R., Göring, S., List, P., Robitza, W., Feiten, B., Wüstenhagen, U. and Raake, A., 2020, May. Bitstream-based model standard for 4K/UHD: ITU-T P.1204.3—Model details, evaluation, analysis and open source implementation. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX) (pp. 1-6).
[3] ITU-T Rec. P.1204, 2025. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. International Telecommunication Union (ITU-T), Geneva, Switzerland.
[4] ITU-T Rec. P.1204.3, 2020. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. International Telecommunication Union (ITU-T), Geneva, Switzerland.
[5] ITU-T Rec. P.1204.4, 2022. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. International Telecommunication Union (ITU-T), Geneva, Switzerland.
[6] ITU-T Rec. P.1204.5, 2023. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. International Telecommunication Union (ITU-T), Geneva, Switzerland.
[7] Rao, R.R.R., Göring, S. and Raake, A., 2022. AVQBits – Adaptive video quality model based on bitstream information for various video applications. IEEE Access, 10, pp.80321-80351.
[8] Borer, S., 2022, September. Performance of ITU-T P.1204.4 on Video Encoded with AV1, AVS2, VVC. In 2022 14th International Conference on Quality of Multimedia Experience (QoMEX) (pp. 1-4).
[9] Winkler, S. and Mohandas, P., 2008. The evolution of video quality measurement: From PSNR to hybrid metrics. IEEE transactions on Broadcasting, 54(3), pp.660-668.
[10] Wang, Z., Lu, L. and Bovik, A.C., 2004. Video quality assessment based on structural distortion measurement. Signal Processing: Image Communication, 19(2), pp.121-132.
[11] Li, Z., Aaron, A., Katsavounidis, I., Moorthy, A., and Manohara, M., 2016. Toward A Practical Perceptual Video Quality Metric, Netflix TechBlog.
[12] Li, Z., Swanson, K., Bampis, C., Krasula, L., and Aaron, A., 2020. Toward a Better Quality Metric for the Video Community, Netflix TechBlog.

VQEG Column: VQEG Meeting May 2025

Introduction

From May 5th to 9th, 2025, Meta hosted the plenary meeting of the Video Quality Experts Group (VQEG) at its headquarters in Menlo Park (CA, United States). Around 150 participants registered for the meeting, coming from industry and academic institutions in 26 different countries worldwide.

The meeting was dedicated to presenting updates and discussing topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

All the topics mentioned below can be of interest to the SIGMM community working on quality assessment, but special attention can be devoted to the first activities of the group on Subjective and objective assessment of GenAI content (SOGAI) and to the advances on the contribution of the Immersive Media Group (IMG) to the International Telecommunication Union (ITU) towards the Rec. ITU-T P.IXC for the evaluation of Quality of Experience (QoE) of immersive interactive communication systems.

Readers of these columns who are interested in VQEG’s ongoing projects are encouraged to subscribe to the corresponding mailing lists to stay informed and get involved.

Group picture of the meeting

Overview of VQEG Projects

Immersive Media Group (IMG)

The IMG group researches the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented the status of this recommendation and the next steps to be addressed towards a new contribution to ITU-T at its next meeting in September 2025. Also, in this meeting, it was decided that Marta Orduna will replace Pablo Pérez as vice-chair of IMG. In addition, the following presentations related to IMG topics were delivered:

Statistical Analysis Methods (SAM)

The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. In relation to these topics, the following presentations were delivered during the meeting:

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training video quality models using full-reference metrics instead of subjective ratings. The chair of this group, Enrico Masala (Politecnico di Torino, Italy), presented updates on the latest activities of the group, including the current results of the Implementer's Guide for Video Quality Metrics (IGVQM) project. In addition to this, the following presentations were delivered:

Emerging Technologies Group (ETG)

The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc. In this sense, the following topics were presented and discussed in the meeting:

  • Avinab Saha (UT Austin, United States) presented the dataset of perceived expression differences, FaceExpressions-70k, which contains 70,500 subjective expression comparisons rated by over 1,000 study participants obtained via crowdsourcing.
  • Mathias Wien (RWTH Aachen University, Germany) reported on recent developments in MPEG AG 5 and JVET for preparations towards a Call for Evidence (CfE) on video compression with capability beyond VVC.
  • Effrosyni Doutsi (Foundation for Research and Technology – Hellas, Greece) presented her research on novel evaluation frameworks for spike-based compression mechanisms.
  • David Ronca (Meta Platforms Inc., United States) presented the Video Codec Acid Test (VCAT), which is a benchmarking tool for hardware and software decoders on Android devices.

Subjective and objective assessment of GenAI content (SOGAI)

The SOGAI group seeks to standardize both subjective testing methodologies and objective metrics for assessing the quality of GenAI-generated content. In this first meeting of the group since its foundation, the following topics were presented and discussed:

  • Ryan Lei and Qi Cai (Meta Platforms Inc., United States) presented their work on learning from subjective evaluation of Super Resolution (SR) in production use cases at scale, which included extensive benchmarking tests and subjective evaluation with external crowdsource vendors.
  • Ioannis Katsavounidis, Qi Cai, Elias Kokkinis, Shankar Regunathan (Meta Platforms Inc., United States) presented their work on learning from synergistic subjective/objective evaluation of auto dubbing in production use cases.
  • Kamil Koniuch (AGH University of Krakow, Poland) presented his research on a cognitive perspective on Absolute Category Rating (ACR) scale tests.
  • Patrick Le Callet (Nantes Universite, France) presented his work, in collaboration with researchers from SJTU (China) on perceptual quality assessment of AI-generated omnidirectional images, including the annotated dataset called AIGCOIQA2024.

Multimedia Experience and Human Factors (MEHF)

The MEHF group focuses on the human factors influencing audiovisual and multimedia experiences, facilitating a comprehensive understanding of how human factors impact the perceived quality of multimedia content. In this meeting, the following presentations were given:

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and the rest of the team presented a first draft of the VQEG Whitepaper on QoE management in telecommunication networks, which shares insights and recommendations on actionable controls and performance metrics that the Content Application Providers (CAPs) and Network Service Providers (NSPs) can use to infer, measure and manage QoE.

In addition, Pablo Perez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Kamil Koniuch (AGH University of Krakow, Poland) presented design guidelines and a proposal of a simple but practical QoE model for communication networks, with a focus on 5G/6G compatibility.

Quality Assessment for Health Applications (QAH)

The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. In this meeting, Lumi Xia (INSA Rennes, France) presented her research on task-based medical image quality assessment by numerical observer.

Other updates

Apart from this, Ajit Ninan (Meta Platforms Inc., United States) delivered a keynote on rethinking visual quality for perceptual display; a panel was organized with Christos Bampis (Netflix, United States), Denise Noyes (Meta Platforms Inc., United States), and Yilin Wang (Google, United States) addressing what more is left to do on optimizing video quality for adaptive streaming applications, which was moderated by Narciso García (Universidad Politécnica de Madrid, Spain); and there was a co-located ITU-T Q19 interim meeting. In addition, although no progress was presented in this meeting, the groups on No Reference Metrics (NORM) and on Quality Assessment for Computer Vision Applications (QACoViA) are still active.

Finally, as already announced in the VQEG website, the next VQEG plenary meeting will be online or hybrid online/in-person, probably in November or December 2025.

VQEG Column: VQEG Meeting November 2024

Introduction

The last plenary meeting of the Video Quality Experts Group (VQEG) was held online by the Institute for Telecommunication Sciences (ITS) of the National Telecommunications and Information Administration (NTIA) from November 18th to 22nd, 2024. The meeting was attended by 70 participants from industry and academic institutions in 17 different countries worldwide.

The meeting was dedicated to presenting updates and discussing topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

All the topics mentioned below can be of interest to the SIGMM community working on quality assessment, but special attention can be devoted to the creation of a new group focused on Subjective and objective assessment of GenAI content (SOGAI) and to the recent contribution of the Immersive Media Group (IMG) to the International Telecommunication Union (ITU) towards the Rec. ITU-T P.IXC for the evaluation of Quality of Experience (QoE) of immersive interactive communication systems. Finally, it is worth noting that Ioannis Katsavounidis (Meta, US) joins Kjell Brunnström (RISE, Sweden) as co-chair of VQEG, replacing Margaret Pinson (NTIA/ITS).

Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to the corresponding reflectors to follow the ongoing activities and get involved in them.

Group picture of the online meeting

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. In this meeting, Lucjan Janowski (AGH University of Krakow, Poland) and Margaret Pinson (NTIA/ITS) presented their proposal to fix the wording related to experiment realism and validity, based on experience in the psychology domain, addressing the important question of how far results from lab experiments can be generalized outside the laboratory.

In addition, given that there are no current joint activities of the group, the AVHD project will become dormant, with the possibility to be activated when new activities are planned.

Statistical Analysis Methods (SAM)

The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. In addition to a discussion on the future activities of the group led by its chairs Ioannis Katsavounidis (Meta, US), Zhi Li (Netflix, US), and Lucjan Janowski (AGH University of Krakow, Poland), the following presentations were delivered during the meeting:

No Reference Metrics (NORM)

The NORM group addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. In this context, Ioannis Katsavounidis (Meta, US) and Margaret Pinson (NTIA/ITS) summarized recent discussions within the group on developing best practices for subjective test methods when analyzing Artificial Intelligence (AI) generated images and videos. This discussion resulted in the creation of a new VQEG project called Subjective and objective assessment of GenAI content (SOGAI) to investigate subjective and objective methods to evaluate the content produced by generative AI approaches.

Emerging Technologies Group (ETG)

The ETG group focuses on various aspects of multimedia that, although not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc. During this meeting, Abhijay Ghildyal (Portland State University, US), Saman Zadtootaghaj (Sony Interactive Entertainment, Germany), and Nabajeet Barman (Sony Interactive Entertainment, UK) presented their work on quality assessment of AI-generated content and AI-enhanced content. In addition, Mathias Wien (RWTH Aachen University, Germany) presented the approach, design and methodology for the evaluation of AI-based Point Cloud Compression in the corresponding Call for Proposals in MPEG. Finally, Abhijay Ghildyal (Portland State University, US) presented his work on how foundation models boost low-level perceptual similarity metrics, investigating the potential of using intermediate features or activations from these models for low-level image quality assessment, and showing that such metrics can outperform existing ones without requiring additional training.

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training video quality models using full-reference metrics instead of subjective ratings. In addition, the group includes the VQEG project Implementer's Guide for Video Quality Metrics (IGVQM). The chair of this group, Enrico Masala (Politecnico di Torino, Italy), presented updates on the latest activities, including the plans for experiments within the IGVQM project to get feedback from other VQEG members.

In addition to this, Lohic Fotio Tiotsop (Politecnico di Torino, Italy) delivered two presentations. The first one focused on the prediction of the opinion score distribution via AI-based observers in media quality assessment, while the second one analyzed unexpected scoring behaviors in image quality assessment comparing controlled and crowdsourced subjective tests.

Immersive Media Group (IMG)

The IMG group researches the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented the status of the Rec. ITU-T P.IXC that the group was writing based on the joint test plan developed in the previous months, which was submitted to ITU and discussed at its meeting in January 2025.

Also, in relation with this test plan, Lucjan Janowski (AGH University of Krakow, Poland) and Margaret Pinson (NTIA/ITS) presented an overview of ITU recommendations for interactive experiments that can be used in the IMG context.

In relation with other topics addressed by IMG, Emin Zerman (Mid Sweden University, Sweden) delivered two presentations. The first one presented the BASICS dataset, which contains a representative range of nearly 1500 point clouds assessed by thousands of participants to enable robust quality assessments for 3D scenes. The approach involved a careful selection of diverse source scenes and the application of specific “distortions” to simulate real-world compression impacts, including traditional and learning-based methods. The second presentation described a spherical light field database (SLFDB) for immersive telecommunication and telepresence applications, which comprises 60-view omnidirectional captures across 20 scenes, providing a comprehensive basis for telepresence research.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group addresses the study of the visual quality requirements for computer vision methods, where the final user is an algorithm. In this meeting, Mehr un Nisa (AGH University of Krakow, Poland) presented a comparative performance analysis of deep learning architectures in underwater image classification. In particular, the study assessed the performance of the VGG-16, EfficientNetB0, and SimCLR models in classifying 5,000 underwater images. The results reveal each model's strengths and weaknesses, providing insights for future improvements in underwater image analysis.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. In this meeting, Pablo Perez (Nokia XR Lab, Spain), Francois Blouin (Meta, US), and others presented the progress on the 5G-KPI White Paper, sharing some of the ideas on QoS-to-QoE modeling that the group has been working on, to get feedback from other VQEG members.

Multimedia Experience and Human Factors (MEHF)

The MEHF group focuses on the human factors influencing audiovisual and multimedia experiences, facilitating a comprehensive understanding of how human factors impact the perceived quality of multimedia content. In this meeting, Dominika Wanat (AGH University of Krakow, Poland) presented MANIANA (Mobile Appliance for Network Interrupting, Analysis & Notorious Annoyance), an IoT device for testing QoS and QoE of applications under home network conditions. It is built on a Raspberry Pi 4 minicomputer and open-source solutions, and allows safe, robust, and universal testing of applications.

Other updates

Apart from this, it is worth noting that, although no progress was presented in this meeting, the Quality Assessment for Health Applications (QAH) group is still active and focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches.

In addition, the Computer Generated Imagery (CGI) project became dormant, since its recent activities can be covered by other existing groups such as ETG and SOGAI.

Also, at this meeting, Margaret Pinson (NTIA/ITS) stepped down as co-chair of VQEG, and Ioannis Katsavounidis (Meta, US) became the new co-chair together with Kjell Brunnström (RISE, Sweden).

Finally, as already announced on the VQEG website, the next VQEG plenary meeting will be hosted by Meta at Meta’s Menlo Park campus, California, in the United States from May 5th to 9th, 2025. For more information see: https://vqeg.org/meetings-home/vqeg-meeting-information/

VQEG Column: VQEG Meeting July 2024

Introduction

The University of Klagenfurt (Austria) hosted a plenary meeting of the Video Quality Experts Group (VQEG) from July 01-05, 2024. More than 110 participants from 20 different countries attended the meeting, in person or remotely.

The first three days of the meeting were dedicated to presentations and discussions about topics related to the ongoing projects within VQEG, while the last two days were devoted to an ITU-T Study Group 12 Question 19 (SG12/Q19) interim meeting. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

All the topics mentioned below can be of interest for the SIGMM community working on quality assessment, but special attention can be devoted to the workshop on quality assessment towards 6G held within the 5GKPI group, and to the dedicated meeting of the IMG group hosted by the Distributed and Interactive Systems Group (DIS) of the CWI in September 2024 to work on the ITU-T P.IXC recommendation.

Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to their corresponding reflectors to follow the activities going on and to get involved in them.

Another plenary meeting of VQEG took place from the 18th to the 22nd of November 2024 and will be reported in a following issue of the ACM SIGMM Records.

VQEG plenary meeting at University of Klagenfurt (Austria), from July 01-05, 2024

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. During the meeting, there were 8 presentations covering very diverse topics within this project, such as open-source efforts, quality models, and subjective assessment methodologies:

Quality Assessment for Health applications (QAH)

The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. Joshua Maraval and Meriem Outtas (INSA Rennes, France) presented a dual-rig approach for capturing multi-view video and spatialized audio for medical training applications, including a dataset for quality assessment purposes.

Statistical Analysis Methods (SAM)

The group SAM investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. The following presentations were delivered during the meeting:

No Reference Metrics (NORM)

The group NORM addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. In this sense, the following topics were covered:

  • Yixu Chen (Amazon, US) presented their development of a metric tailored for video compression and scaling, which can extrapolate to different dynamic ranges, is suitable for real-time delivery of video quality metrics in the bitstream, and can achieve better correlation than VMAF and P.1204.3.
  • Filip Korus (AGH University of Krakow, Poland) talked about the detection of hard-to-compress video sequences (e.g., video content generated during e-sports events) based on objective quality metrics, and proposed a machine-learning model to assess compression difficulty.
  • Hadi Amirpour (University of Klagenfurt, Austria) provided a summary of activities in video complexity analysis, covering from VCA to DeepVCA and describing a Grand Challenge on Video Complexity.
  • Pierre Lebreton (Capacités & Nantes Université, France) presented a new dataset that allows studying the differences among existing UGC video datasets, in terms of characteristics, covered range of quality, and the implication of these quality ranges on training and validation performance of quality prediction models.
  • Zhengzhong Tu (Texas A&M University, US) introduced a comprehensive video quality evaluator (COVER) designed to evaluate video quality holistically, from a technical, aesthetic, and semantic perspective. It is based on leveraging three parallel branches: a Swin Transformer backbone to predict technical quality, a ConvNet employed to derive aesthetic quality, and a CLIP image encoder to obtain semantic quality.
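Claims such as “better correlation than VMAF” are conventionally quantified with the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between a metric’s scores and subjective MOS. Below is a minimal, self-contained sketch of both computations; the `mos` and `metric_a` values are purely illustrative, not data from any of the presentations above.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks.
    Ties are not handled, which is sufficient for this illustration."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

# Hypothetical data: subjective MOS vs. a candidate objective metric
mos      = [1.2, 2.1, 3.0, 3.8, 4.6]
metric_a = [10.0, 25.0, 48.0, 60.0, 91.0]
print(round(pearson(mos, metric_a), 3))
print(round(spearman(mos, metric_a), 3))
```

PLCC measures linear agreement (often after fitting a monotonic mapping), while SROCC measures rank agreement only, which is why both are usually reported together.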

Emerging Technologies Group (ETG)

The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc. During this meeting, the following presentations were delivered:

Joint Effort Group (JEG) – Hybrid

The group JEG-Hybrid addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training VQA models using full-reference metrics instead of subjective scores. In addition, the group includes the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM). The chair of this group, Enrico Masala (Politecnico di Torino, Italy), presented updates on the latest activities, including the status of the IGVQM project and a new image dataset, which will be partially subjectively annotated, to train DNN models to predict a single user’s subjective quality perception. In addition to this:

Immersive Media Group (IMG)

The IMG group researches the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) provided an update on the progress of the test plan, reviewing the status of the subjective tests that were being performed at the 13 involved labs. Also in relation to this test plan:

In relation to other topics addressed by IMG:

In addition, a specific meeting of the group was held at the Distributed and Interactive Systems Group (DIS) of CWI in Amsterdam (Netherlands) from the 2nd to the 4th of September to progress on the joint test plan for evaluating immersive communication systems. A total of 26 international experts from eight countries (Netherlands, Spain, Italy, UK, Sweden, Germany, US, and Poland) participated, with 7 attending online. In particular, the meeting featured presentations on the status of tests run by 13 participating labs, leading to insightful discussions and progress towards the ITU-T P.IXC recommendation.

IMG meeting at CWI (2-4 September, 2024, Netherlands)

Quality Assessment for Computer Vision Applications (QACoViA)

The group QACoViA addresses the study of the visual quality requirements for computer vision methods, where the final user is an algorithm. In this meeting, Mikołaj Leszczuk (AGH University of Krakow, Poland) presented a study introducing a novel evaluation framework designed to accurately predict the impact of different quality factors on recognition algorithms, by focusing on machine vision rather than human perceptual quality metrics.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. In this meeting, a workshop on “Future directions of 5GKPI: Towards 6G” was organized by Pablo Pérez (Nokia XR Lab, Spain) and Kjell Brunnström (RISE, Sweden).

The workshop covered a diverse set of topics: QoS and QoE management in 5G/6G networks by Michele Zorzi (University of Padova, Italy); parametric QoE models and QoE management by Tobias Hoßfeld (University of Würzburg, Germany) and Pablo Pérez (Nokia XR Lab, Spain); the current status of standardization and industry by Kjell Brunnström (RISE, Sweden) and Gunilla Berndtsson (Ericsson); content and application provider perspectives on QoE management by François Blouin (Meta, US); and communications service provider perspectives by Theo Karagioules and Emir Halepovic (AT&T, US). In addition, there was a panel moderated by Narciso García (Universidad Politécnica de Madrid, Spain) with Christian Timmerer (University of Klagenfurt, Austria), Enrico Masala (Politecnico di Torino, Italy), and François Blouin (Meta, US) as speakers.

Human Factors for Visual Experiences (HFVE)

The HFVE group covers human factors related to audiovisual experiences and upholds the liaison relation between VQEG and the IEEE standardization group P3333.1. In this meeting, there were two presentations related to these topics:

  • Mikołaj Leszczuk and Kamil Koniuch (AGH University of Krakow, Poland) presented a two-part insight into the realm of image quality assessment: 1) it provided an overview of the TUFIQoE project (Towards Better Understanding of Factors Influencing the QoE by More Ecologically-Valid Evaluation Standards) with a focus on challenges related to ecological validity; and 2) it delved into the ‘Psychological Image Quality’ experiment, highlighting the influence of emotional content on multimedia quality perception.

The 2nd Edition of International Summer School on Extended Reality Technology and Experience (XRTX)

The ACM Special Interest Group on Multimedia (ACM SIGMM) co-sponsored the second edition of the International Summer School on Extended Reality Technology and Experience (XRTX), which took place from July 8-11, 2024 in Madrid (Spain), hosted by Universidad Carlos III de Madrid. As in the first edition in 2023, Universidad Politécnica de Madrid, Nokia, Universidad de Zaragoza, and Universidad Rey Juan Carlos also participated in the organization. The school attracted 29 participants from different disciplines (e.g., engineering, computer science, psychology, etc.) and 7 different countries.

Students and organizers of the Summer School XRTX (July 8-11, 2024, Madrid)

The support from ACM SIGMM made it possible to bring top researchers in the field of XR to deliver keynotes on different topics related to technological (e.g., volumetric video, XR for healthcare) and user-centric factors (e.g., user experience evaluation, multisensory experiences, etc.): Pablo César (CWI, Netherlands), Anthony Steed (UCL, UK), Manuela Chessa (University of Genoa, Italy), Qi Sun (New York University), Diego Gutiérrez (Universidad de Zaragoza, Spain), and Marianna Obrist (UCL, UK). An industry session, led by Gloria Touchard (Nokia, Spain), was also included.

Keynote sessions

In addition to these 7 keynotes, the program included a creative design session led by Elena Márquez-Segura (Universidad Carlos III, Spain), a tutorial on Ubiq given by Sebastian Friston (UCL, UK), and 2 practical sessions led by Telmo Zarraonandia (Universidad Carlos III, Spain) and Sandra Malpica (Universidad de Zaragoza, Spain) to get hands-on experience working with Unity for VR development.

Design and practical sessions

Moreover, there were poster and demo sessions throughout the school, in which the participants could showcase their work.

Poster and demo sessions

The summer school was also sponsored by Nokia and the Computer Science and Engineering Department of Universidad Carlos III, which made it possible to offer grants supporting a number of students with registration and travel costs.

Finally, in addition to science, there was time for fun with social activities, like practicing real and VR archery and visiting an immersive exhibition about Pompeii.

Practicing real and VR archery
Immersive exhibition about Pompeii

The list of talks was:

  • “Towards volumetric video conferencing” by Pablo Cesar.
  • “Strategies for Designing and Evaluating eXtended Reality Experiences” by Anthony Steed.
  • “XR for healthcare: Immersive and interactive technologies for serious games and exergames”, by Manuela Chessa.
  • “Toward Human-Centered XR: Bridging Cognition and Computation”, by Qi Sun.
  • “Improving the user’s experience in VR” by Diego Gutiérrez.
  • “Why XR is important for Nokia”, by Gloria Touchard.
  • “The Role of Multisensory Experiences in Extended Reality: Unlocking the Power of Smell” by Marianna Obrist.

VQEG Column: VQEG Meeting December 2023

Introduction

The last plenary meeting of the Video Quality Experts Group (VQEG) was held online, organized by the University of Konstanz (Germany), from December 18th to 21st, 2023. It offered the possibility to more than 100 registered participants from 19 different countries worldwide to attend the numerous presentations and discussions about topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting will soon be available on YouTube.

All the topics mentioned below can be of interest for the SIGMM community working on quality assessment, but special attention can be devoted to the current activities on improvements of the statistical analysis of subjective experiments and objective metrics and on the development of a test plan to evaluate the QoE of immersive interactive communication systems in collaboration with ITU.

Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to the VQEG’s email reflectors to follow the activities going on and to get involved with them.

As already announced on the VQEG website, the next VQEG plenary meeting will be hosted by Universität Klagenfurt in Austria from July 1st to 5th, 2024.

Group picture of the online meeting

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. During the meeting, there were various sessions in which presentations related to these topics were discussed.

Firstly, Ali Ak (Nantes Université, France) provided an analysis of the relation between acceptance/annoyance and visual quality in a recently collected dataset of several User Generated Content (UGC) videos. Then, Syed Uddin (AGH University of Krakow, Poland) presented a video quality assessment method based on the quantization parameter of MPEG encoders (MPEG-4, MPEG-AVC, and MPEG-HEVC) leveraging VMAF. In addition, Sang Heon Le (LG Electronics, Korea) presented a pre-enhancement technique for video compression and applicable subjective quality metrics. Another talk was given by Alexander Raake (TU Ilmenau, Germany), who presented AVQBits, a versatile no-reference bitstream-based video quality model (based on the standardized ITU-T P.1204.3 model) that can be applied in several contexts, such as video service monitoring and the evaluation of video encoding quality, gaming video QoE, and even omnidirectional video quality. Also, Jingwen Zhu (Nantes Université, France) and Hadi Amirpour (University of Klagenfurt, Austria) described a study on the effectiveness of different video quality metrics in predicting the Satisfied User Ratio (SUR), in order to enhance the VMAF proxy to better capture content-specific characteristics. Andreas Pastor (Nantes Université, France) presented a method to predict the distortion perceived locally by human eyes in AV1-encoded videos using deep features, which can be easily integrated into video codecs as a pre-processing step before encoding.

In relation with standardization efforts, Mathias Wien (RWTH Aachen University, Germany) gave an overview on recent expert viewing tests that have been conducted within MPEG AG5 at the 143rd and 144th MPEG meetings. Also, Kamil Koniuch (AGH University of Krakow, Poland) presented a proposal to update the Survival Game task defined in the ITU-T Recommendation P.1301 on subjective quality evaluation of audio and audiovisual multiparty telemeetings, in order to improve its implementation and application to recent efforts such as the evaluation of immersive communication systems within the ITU-T P.IXC (see the paragraph related to the Immersive Media Group).

Quality Assessment for Health applications (QAH)

The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. Recently, the group has been working towards an ITU-T recommendation for the assessment of medical contents. On this topic, Meriem Outtas (INSA Rennes, France) led a discussion dealing with the edition of a draft of this recommendation. In addition, Lumi Xia (INSA Rennes, France) presented a study of task-based medical image quality assessment focusing on a use case of adrenal lesions.

Statistical Analysis Methods (SAM)

The group SAM investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. This was one of the most active groups at this meeting, with several presentations on related topics.

On this topic, Krzysztof Rusek (AGH University of Krakow, Poland) presented a Python package to estimate Generalized Score Distribution (GSD) parameters and showed how to use it to test the results obtained in subjective experiments. Andreas Pastor (Nantes Université, France) presented a comparison between two subjective studies using Absolute Category Rating with Hidden Reference (ACR-HR) and Degradation Category Rating (DCR), conducted in a controlled laboratory environment on SDR HD, UHD, and HDR UHD contents using naive observers. The goal of these tests is to estimate rate-distortion savings between two modern video codecs and compare the precision and accuracy of both subjective methods. He also presented another study on the comparison of conditions for omnidirectional video with spatial audio, in terms of subjective quality and impacts on the resolving power of objective metrics.

In addition, Lukas Krasula (Netflix, USA) introduced e2nest, a web-based platform to conduct media-centric (video, audio, and images) subjective tests. Also, Dietmar Saupe (University of Konstanz, Germany) and Simon Del Pin (NTNU, Norway) showed the results of a study analyzing national differences in image quality assessment, showing significant differences in various areas. Alexander Raake (TU Ilmenau, Germany) presented a study on the remote testing of high resolution images and videos, using AVrate Voyager, which is a publicly accessible framework for online tests. Finally, Dominik Keller (TU Ilmenau, Germany) presented a recent study exploring the impact of 8K (UHD-2) resolution on HDR video quality, considering different viewing distances. The results showed that the enhanced video quality of 8K HDR over 4K HDR diminishes with increasing viewing distance.
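For context, the MOS values produced in subjective tests like these are normally reported with confidence intervals. The following is a minimal sketch of the standard computation for a single stimulus; the ratings are hypothetical, and the default t-value is the two-sided 95% quantile assumed for a 24-subject panel (23 degrees of freedom).

```python
import math

def mos_ci(ratings, t=2.064):
    """Mean Opinion Score and 95% confidence-interval half-width.

    `ratings` are ACR scores (1..5) from individual observers; `t`
    defaults to the two-sided 95% Student-t quantile for 23 degrees
    of freedom, matching a typical 24-subject panel (an assumption
    for this illustration, not a prescription).
    """
    n = len(ratings)
    mos = sum(ratings) / n
    var = sum((r - mos) ** 2 for r in ratings) / (n - 1)  # sample variance
    half_width = t * math.sqrt(var / n)
    return mos, half_width

# Hypothetical ratings for one processed video sequence (24 observers)
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4, 5, 4,
           4, 4, 3, 5, 4, 4, 4, 3, 4, 5, 4, 4]
mos, ci = mos_ci(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Non-overlapping confidence intervals between two test conditions are a common (if conservative) indicator that their quality difference is statistically meaningful.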

No Reference Metrics (NORM)

The group NORM addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. At this meeting, Ioannis Katsavounidis (Meta, USA) led a discussion on the current efforts to improve image and video complexity metrics. In addition, Krishna Srikar Durbha (University of Texas at Austin, USA) presented a technique to tackle the problem of bitrate ladder construction based on multiple Visual Information Fidelity (VIF) feature sets extracted from different scales and subbands of a video.
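For readers unfamiliar with bitrate ladder construction, the underlying idea (independent of the VIF-based technique presented above) is to pick, for each target bitrate, the resolution whose trial encode scores best, effectively tracing the convex hull of the rate-quality points. A toy sketch with invented per-title measurements; in practice the scores would come from trial encodes evaluated with a metric such as VMAF.

```python
# Hypothetical per-title measurements: (resolution, bitrate_kbps) -> quality score.
# These numbers are invented for illustration only.
measurements = {
    (2160, 16000): 96, (2160, 8000): 88, (2160, 4000): 72,
    (1080, 8000): 90, (1080, 4000): 84, (1080, 2000): 70,
    (720, 4000): 80, (720, 2000): 74, (720, 1000): 60,
}

def build_ladder(measurements, target_bitrates):
    """For each ladder rung, choose the resolution whose encode at (or
    below) the target bitrate scores highest: the convex-hull idea."""
    ladder = []
    for target in target_bitrates:
        candidates = [(q, res, br) for (res, br), q in measurements.items()
                      if br <= target]
        if candidates:
            q, res, br = max(candidates)  # max by quality score
            ladder.append((target, res, q))
    return ladder

print(build_ladder(measurements, [16000, 4000, 1000]))
```

Note how at 4000 kbps the toy data favors 1080p over 2160p: at low rates, heavy quantization at a high resolution loses to a cleaner encode at a lower one, which is exactly what per-title ladder construction exploits.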

Emerging Technologies Group (ETG)

The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc.

In this meeting, Nabajeet Barman and Saman Zadtootaghaj (Sony Interactive Entertainment, Germany), suggested a topic to start to be discussed within VQEG: Quality Assessment of AI Generated/Modified Content. The goal is to have subsequent discussions on this topic within the group and write a position or whitepaper.

Joint Effort Group (JEG) – Hybrid

The group JEG-Hybrid addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training VQA models using full-reference metrics instead of subjective scores. In addition, the group includes the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM). At the meeting, Enrico Masala (Politecnico di Torino, Italy) provided updates on the activities of the group and on IGVQM.

Apart from this, there were three presentations addressing related topics in this meeting, delivered by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The first presentation focused on quality estimation in subjective experiments and the identification of peculiar subject behaviors, introducing a robust approach for estimating subjective quality from noisy ratings and a novel subject scoring model that enables highlighting several peculiar behaviors. Also, he introduced a non-parametric perspective to address the media quality recovery problem, without making any a priori assumption on the subjects’ scoring behavior. Finally, he presented an approach called “human-in-the-loop training process” that uses multiple cycles of human voting, DNN training, and inference.

Immersive Media Group (IMG)

The IMG group is performing research on the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain), Kamil Koniuch (AGH University of Krakow, Poland), Ashutosh Singla (CWI, The Netherlands) and other researchers involved in the test plan provided an update on the status of the test plan, focusing on the description of four interactive tasks to be performed in the test, the considered measures, and the 13 different experiments that will be carried out in the labs involved in the test plan. Also, in relation with this test plan, Felix Immohr (TU Ilmenau, Germany), presented a study on the impact of spatial audio on social presence and user behavior in multi-modal VR communications.

Diagram of the methodology of the joint IMG test plan

Quality Assessment for Computer Vision Applications (QACoViA)

The group QACoViA addresses the study of the visual quality requirements for computer vision methods, where the final user is an algorithm. In this meeting, Mikołaj Leszczuk (AGH University of Krakow, Poland) and Jingwen Zhu (Nantes Université, France) presented a specialized dataset developed for enhancing Automatic License Plate Recognition (ALPR) systems. In addition, Hanene Brachemi (IETR-INSA Rennes, France) presented a study on evaluating the vulnerability of deep learning-based image quality assessment methods to adversarial attacks. Finally, Alban Marie (IETR-INSA Rennes, France) delivered a talk on the exploration of the lossy image coding trade-off between rate, machine perception, and quality.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. At the meeting, Pablo Pérez (Nokia XR Lab, Spain) led an open discussion on the future activities of the group towards 6G, including a brief presentation of QoS/QoE management in 3GPP and presenting potential opportunities to influence QoE in 6G.

VQEG Column: VQEG Meeting June 2023

Introduction

This column provides a report on the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 26 to 30 June 2023 in San Mateo (USA), hosted by Sony Interactive Entertainment. More than 90 participants worldwide registered for the hybrid meeting, with more than 40 people attending in person. This meeting was co-located with the ITU-T SG12 meeting, which took place in the first two days of the week. In addition, more than 50 presentations related to the ongoing projects within VQEG were delivered, leading to interesting discussions among the researchers attending the meeting. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

In this meeting, there were several aspects that can be relevant for the SIGMM community working on quality assessment. For instance, there are interesting new work items and efforts on updating existing recommendations discussed in the ITU-T SG12 co-located meeting (see the section about the Intersector Rapporteur Group on Audiovisual Quality Assessment). In addition, there was an interesting panel related to deep learning for video coding and video quality with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) (see the Emerging Technologies Group section). Also, a special session on Quality of Experience (QoE) for gaming was organized, involving researchers from several international institutions. Apart from this, readers may be interested in the presentation about MPEG activities on quality assessment and the different developments from industry and academia on tools, algorithms and methods for video quality assessment.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 26-30 June 2023 hosted by Sony Interactive Entertainment (San Mateo, USA).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this meeting, there were several presentations related to topics covered by this group, which were distributed in different sessions during the meeting.

Nabajeet Barman (Kingston University, UK) presented a datasheet for subjective and objective quality assessment datasets. Ali Ak (Nantes Université, France) delivered a presentation on the acceptability and annoyance of video quality in context. Mikołaj Leszczuk (AGH University, Poland) presented a crowdsourcing pixel quality study using non-neutral photos. Kamil Koniuch (AGH University, Poland) discussed the role of theoretical models in ecologically valid studies, covering the example of a video quality of experience model. Jingwen Zhu (Nantes Université, France) presented her work on evaluating the streaming experience of viewers with Just Noticeable Difference (JND)-based encoding. Also, Lucjan Janowski (AGH University, Poland) talked about proposing a more ecologically-valid experiment protocol using the YouTube platform.

In addition, there were four presentations by researchers from the industry sector. Hojat Yeganeh (SSIMWAVE/IMAX, USA) talked about how more accurate video quality assessment metrics would lead to more savings. Lukas Krasula (Netflix, USA) delivered a presentation on subjective video quality for 4K HDR-WCG content using a browser-based approach for at-home testing. Also, Christos Bampis (Netflix, USA) presented the work done by Netflix on improving video quality with neural networks. Finally, Pranav Sodhani (Apple, USA) talked about how to evaluate videos with the Advanced Video Quality Tool (AVQT).

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. The group is currently working towards an ITU-T recommendation for the assessment of medical contents. In this sense, Meriem Outtas (INSA Rennes, France) led an editing session of a draft of this recommendation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910.

Apart from this, several researchers presented their works on related topics. For instance, Pablo Pérez (Nokia XR Lab, Spain) presented (not so) new findings about the transmission rating scale and subjective scores. Also, Jingwen Zhu (Nantes Université, France) presented ZREC, an approach for mean and percentile opinion score recovery. In addition, Andreas Pastor (Nantes Université, France) presented three works: 1) on the accuracy of open video quality metrics for local decisions in the AV1 video codec, 2) on recovering quality scores in noisy pairwise subjective experiments using negative log-likelihood, and 3) on guidelines for subjective haptic quality assessment, considering a case study on quality assessment of compressed haptic signals. Lucjan Janowski (AGH University, Poland) discussed experiment precision, proposing experiment precision measures and methods for comparing experiments. Finally, there were three presentations from members of the University of Konstanz (Germany). Dietmar Saupe presented the JPEG AIC-3 activity on fine-grained assessment of the subjective quality of compressed images, Mohsen Jenadeleh talked about how relaxed forced choice improves the performance of visual quality assessment methods, and Mirko Dulfer presented his work on quantization for Mean Opinion Score (MOS) recovery in Absolute Category Rating (ACR) experiments.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a focus on gaming in particular. In this meeting, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) and Nabajeet Barman (Kingston University, UK) organized a special gaming session, in which researchers from several international institutions presented their work on this topic. Among them, Yu-Chih Chen (UT Austin LIVE Lab, USA) presented GAMIVAL, a video quality prediction model for mobile cloud gaming content. Also, Urvashi Pal (Akamai, USA) delivered a presentation on web streaming quality assessment via computer vision applications over the cloud. Mathias Wien (RWTH Aachen University, Germany) provided updates on the ITU-T P.BBQCG work item, dataset, and model development. Avinab Saha (UT Austin LIVE Lab, USA) presented a study of subjective and objective quality assessment of mobile cloud gaming videos. Finally, Irina Cotanis (Infovista, Sweden) and Karan Mitra (Luleå University of Technology, Sweden) presented their work towards QoE models for mobile cloud and virtual reality games.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. In this meeting, Margaret Pinson (NTIA, USA) and Ioannis Katsavounidis (Meta, USA), two of the chairs of the group, provided a summary of NORM successes and a discussion of current efforts towards an improved complexity metric. In addition, there were six presentations dealing with related topics. C.-C. Jay Kuo (University of Southern California, USA) talked about blind visual quality assessment for mobile/edge computing. Vignesh V. Menon (University of Klagenfurt, Austria) presented updates on the Video Quality Analyzer (VQA). Yilin Wang (Google/YouTube, USA) gave a talk on recent updates to the Universal Video Quality (UVQ) model. Farhad Pakdaman (Tampere University, Finland) and Li Yu (Nanjing University, China) presented a low-complexity no-reference image quality assessment method based on a multi-scale attention mechanism with natural scene statistics. Finally, Mikołaj Leszczuk (AGH University, Poland) presented his work on visual quality indicators adapted to resolution changes, and on considering in-the-wild video content as a special case of user-generated content, along with a system for its recognition.

Emerging Technologies Group (ETG)

The main objective of the ETG group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work carried out within VQEG. This group aims to provide a common platform for people to come together, discuss new emerging topics, and explore possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

One of the topics addressed by this group is the application of artificial-intelligence technologies to different domains, such as compression, super-resolution, and video quality assessment. In this context, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) organized a panel session with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) on deep learning in the video coding and video quality domains. In addition, Marcos Conde (Sony Interactive Entertainment, Germany) and David Minnen (Google, USA) gave a talk on generative compression and the challenges it poses for quality assessment.

Another topic covered by this group is the greening of streaming and related trends. In this context, Vignesh V. Menon and Samira Afzal (University of Klagenfurt, Austria) presented their work on green variable-framerate encoding for adaptive live streaming. Also, Prajit T. Rajendran (Université Paris Saclay, France) and Vignesh V. Menon (University of Klagenfurt, Austria) delivered a presentation on energy-efficient live per-title encoding for adaptive streaming. Finally, Berivan Isik (Stanford University, USA) talked about sandwiched video compression, which efficiently extends the reach of standard codecs with neural wrappers.

Joint Effort Group (JEG) – Hybrid

The JEG group was originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. In addition, the group will include among its activities the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM).

Apart from this, there were three presentations addressing related topics in this meeting. Nabajeet Barman (Kingston University, UK) presented a subjective dataset for multi-screen video streaming applications. Also, Lohic Fotio (Politecnico di Torino, Italy) presented two works: a “human-in-the-loop” training procedure for an artificial-intelligence-based observer (AIO) of a real subject, and advances on the “template” for reporting DNN-based video quality metrics.

The website of the group includes a list of activities of interest, freely available publications, and other resources.

Immersive Media Group (IMG)

The IMG group is focused on research on the quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) reported on the status of the test plan, including the test proposals from the 13 different groups that have joined the activity; the test plan will be launched in September.

In addition to this, Shirin Rafiei (RISE, Sweden) delivered a presentation on her laboratory investigation of human interaction in industrial tele-operated driving.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. In this meeting, Avrajyoti Dutta (AGH University, Poland) delivered a presentation dealing with the subjective quality assessment of video summarization algorithms through a crowdsourcing approach.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

This VQEG meeting was co-located with the rapporteur group meeting of ITU-T Study Group 12 – Question 19, coordinated by Chulhee Lee (Yonsei University, Korea). During the first two days of the week, experts from ITU-T and VQEG worked together on various topics. For instance, there was an editing session on the VQEG proposal to merge the ITU-T Recommendations P.910, P.911, and P.913, including updates with new methods. Another topic addressed during this meeting was the work item “P.obj-recog”, related to the development of an object-recognition-rate-estimation model in surveillance video of autonomous driving. In this sense, a liaison statement was also discussed with the VQEG AVHD group. Also in relation to this group, another liaison statement was discussed on the new work item “P.SMAR” on subjective tests for evaluating the user experience of mobile Augmented Reality (AR) applications.

Other updates

One interesting presentation was given by Mathias Wien (RWTH Aachen University, Germany) on the quality evaluation activities carried out within the MPEG Visual Quality Assessment group, including the expert viewing tests. This presentation and the follow-up discussions will help to strengthen the collaboration between VQEG and MPEG on video quality evaluation activities.

The next VQEG plenary meeting will take place in autumn 2023 and will be announced soon on the VQEG website.

VQEG Column: Emerging Technologies Group (ETG)

Introduction

This column provides an overview of the new Video Quality Experts Group (VQEG) group called the Emerging Technologies Group (ETG), which was created during the last VQEG plenary meeting in December 2022. For an introduction to VQEG, please check the VQEG homepage or this presentation.

The work addressed by this new group can be of interest to the SIGMM community, since it relates to AI-based technologies for image and video processing, the greening of streaming, blockchain in media and entertainment, and related ongoing standardization activities.

About ETG

The main objective of this group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. Through its activities, the group aims to provide a common platform for people to come together, discuss new emerging topics and ideas, and explore possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc. The topics addressed are not necessarily directly related to “video quality” but rather cover any ongoing work in the field of multimedia that can indirectly impact the work carried out within VQEG.

Scope

During the creation of the group, the following topics were tentatively identified to be of possible interest to the members of this group and VQEG in general: 

  • AI-based technologies:
    • Super-resolution
    • Learning-based video compression
    • Video coding for machines
    • Enhancement, denoising, and other pre- and post-filter techniques
  • Greening of streaming and related trends
    • For example, the trade-off between HDR and SDR to save energy and its impact on visual quality
  • Ongoing standards activities (which might impact the QoE of end users and hence are relevant for VQEG)
    • 3GPP, SVTA, CTA WAVE, UHDF, etc.
    • MPEG/JVET
  • Blockchain in media and entertainment

Since the creation of the group, four talks on various topics have been organized, an overview of which is summarized next.

Overview of the Presentations

We briefly provide a summary of various talks that have been organized by the group since its inception.

On the work by MPEG Systems Smart Contracts for Media Subgroup

The first presentation, on the recent work by MPEG Systems on Smart Contracts for Media [1], was delivered by Dr Panos Kudumakis, Head of the UK Delegation to ISO/IEC JTC1/SC29 and Chair of British Standards Institute (BSI) IST/37. In this talk, he highlighted MPEG’s efforts over the last few years towards developing several standardized ontologies catering to the needs of the media industry with respect to the codification of Intellectual Property Rights (IPR) information for the fair trade of media. However, since the inference and reasoning capabilities normally associated with ontologies cannot naturally be performed in Distributed Ledger Technology (DLT) environments, there is huge potential to unlock the Semantic Web and, in turn, the creative economy by bridging this interoperability gap [2]. In that direction, the ISO/IEC 21000-23 Smart Contracts for Media standard specifies the means (e.g., APIs) for converting MPEG IPR ontologies to smart contracts that can be executed in existing DLT environments [3]. The talk covered the recent work done as part of this effort, as well as the ongoing efforts towards the design of a full-fledged ISO/IEC 23000-23 Decentralized Media Rights Application Format standard based on MPEG technologies (e.g., audio-visual codecs, file formats, streaming protocols, and smart contracts) and non-MPEG technologies (e.g., DLTs, content IDs, and creator IDs).
The recording of the presentation is available here, and the slides can be accessed here.

Introduction to NTIRE Workshop on Quality Assessment for Video Enhancement

The second presentation was given by Xiaohong Liu and Yuxuan Gao from Shanghai Jiao Tong University, China, about one of the CVPR challenge workshops, the NTIRE 2023 Quality Assessment of Video Enhancement Challenge. The presentation described the motivation for starting this challenge and its relevance to the video community in general. The presenters then described the dataset, including its creation process, the subjective tests conducted to obtain ratings, and the rationale behind the split of the dataset into training, validation, and test sets. The results of this challenge are scheduled to be presented at the upcoming spring meeting at the end of June 2023. The presentation recording is available here.

Perception: The Next Milestone in Learned Image Compression

Johannes Ballé from Google was the third presenter, on the topic of “Perception: The Next Milestone in Learned Image Compression”. In the first part, he discussed learned compression, describing nonlinear transforms [4] and how they can achieve higher image compression rates than linear transforms. He then emphasized the importance of perceptual metrics over distortion metrics by introducing the difference between perceptual quality and reconstruction quality [5]. Next, an example of generative image compression, HiFiC [6], was presented, in which a distortion metric and a perceptual metric (called the realism criterion) are combined. Finally, the talk concluded with an introduction to perceptual spaces and an example of a perceptual metric, PIM [7]. The presentation slides can be found here.

Compression with Neural Fields

Emilien Dupont (DeepMind) was the fourth presenter. He started the talk with a short introduction to the emergence of neural compression, which fits a signal, e.g., an image or video, to a neural network. He then discussed two recent works on neural compression that he was involved in, COIN [8] and COIN++ [9]. He concluded with a short overview of other implicit neural representations in the video domain, such as NeRV [10] and NIRVANA [11]. The slides for the presentation can be found here.
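The coordinate-to-value idea behind these implicit neural representations can be sketched in a few lines. The snippet below is a toy illustration of our own, not COIN itself: instead of training the small MLP that COIN optimizes, it fits a 1-D signal as a function of pixel coordinates using random Fourier features and a closed-form least-squares fit, so the fitted weight vector stands in for the compressed representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image": pixel values as a function of coordinate x in [0, 1].
x = np.linspace(0.0, 1.0, 64)
signal = np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 7 * x)

# Random Fourier features stand in for the network's hidden layer; COIN
# would instead optimize MLP weights by gradient descent.
B = rng.normal(scale=20.0, size=16)  # random frequencies (toy choice)
features = np.concatenate([np.sin(np.outer(x, B)),
                           np.cos(np.outer(x, B))], axis=1)

# "Encoding" = fitting the weights to reproduce the signal.
w, *_ = np.linalg.lstsq(features, signal, rcond=None)

# "Decoding" = evaluating the fitted function at the pixel coordinates.
recon = features @ w
mse = float(np.mean((recon - signal) ** 2))
```

Storing the 32 weights (plus the seed used to regenerate B) instead of the 64 samples is the compression step; COIN achieves the analogous saving by quantizing the weights of its fitted MLP.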

Upcoming Presentations

As part of the ongoing efforts of the group, the following talks/presentations are scheduled in the next two months. For an updated schedule and list of presentations, please check the ETG homepage here.

Sustainable/Green Video Streaming

Given the increasing carbon footprint of streaming services and the climate crisis, many new collaborative efforts have started recently, such as the Greening of Streaming alliance, the Ultra HD sustainability forum, etc. In addition, research has recently started focussing on how to make video streaming greener and more sustainable. A talk providing an overview of recent works and progress in this direction is tentatively scheduled for mid-May 2023.

Panel discussion at VQEG Spring Meeting (June 26-30, 2023), Sony Interactive Entertainment HQ, San Mateo, US

During the next face-to-face VQEG meeting in San Mateo, there will be an interesting panel discussion on the topic of “Deep Learning in Video Quality and Compression”. The goal is to invite machine learning experts to VQEG and bring the two communities closer. ETG will organize the panel discussion, and the following four panellists are currently invited to join this event: Zhi Li (Netflix), Ioannis Katsavounidis (Meta), Richard Zhang (Adobe), and Mathias Wien (RWTH Aachen). Before the panel discussion, two talks are tentatively scheduled: the first on video super-resolution and the second on learned image compression.
The meeting will take place in hybrid mode, allowing for participation both in person and online. For further information about the meeting, please check the details here and, if interested, register for the meeting.

Joining and Other Logistics

While participation in the talks is open to everyone, to get notified about upcoming talks and participate in the discussions, please consider subscribing to the etg@vqeg.org email reflector and joining the Slack channel using this link. The meeting minutes are available here. We are always looking for new ideas to improve. If you have suggestions on topics we should focus on or recommendations for presenters, please reach out to the chairs (Nabajeet and Saman).

References

[1] White paper on MPEG Smart Contracts for Media.
[2] DLT-based Standards for IPR Management in the Media Industry.
[3] DLT-agnostic Media Smart Contracts (ISO/IEC 21000-23).
[4] [2007.03034] Nonlinear Transform Coding.
[5] [1711.06077] The Perception-Distortion Tradeoff.
[6] [2006.09965] High-Fidelity Generative Image Compression.
[7] [2006.06752] An Unsupervised Information-Theoretic Perceptual Quality Metric.
[8] Coin: Compression with implicit neural representations.
[9] COIN++: Neural compression across modalities.
[10] Nerv: Neural representations for videos.
[11] NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling.

VQEG Column: VQEG Meeting December 2022

Introduction

This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries registered for the meeting, which was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented at this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting are the proposals to update and merge ITU-T recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and the greening of streaming and related trends.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 12-16 December 2022 (online).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK) presented her work on converting between video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting a content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and dealt with video codec profiling across video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for modelling and predicting the Satisfied User Ratio for compressed videos [2].

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently, there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research, proposed by Elizabeth Krupinski, was announced, which will take place at the European Congress of Radiology (Vienna, Austria, March 2023).

In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with evaluating the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France) presented his work on deep learning-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition-inspired diagnostic image quality models, emphasising the need to distinguish among interpretability (e.g., the medical professional is confident in making a diagnosis), adequacy (e.g., the capture technique shows the right area for assessment), and visual quality (e.g., MOS) in the quality assessment of medical content.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete, leaving P.913 as the only ITU-T recommendation on subjective video quality assessment. The group worked on the liaison statement and document to be sent to ITU-T SG12, which will be available in the meeting files.

In addition, Mohsen Jenadeleh (University of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with the Flicker Test and QUEST+.

Computer Generated Imagery (CGI)

The CGI group is devoted to analysing and evaluating computer-generated content, with a particular focus on gaming. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, the group is working on two new work items: G.OMMOG on an Opinion Model for Mobile Online Gaming applications and P.CROWDG on the Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying topics and interests in CGI beyond gaming content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata. 

In relation to the first topic, Margaret Pinson (NTIA/ITS, US) talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3], and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedemann (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.

Regarding the computation of complexity indices, Maria Martini (Kingston University, UK) presented a study comparing 12 metrics (and possible combinations) for assessing video content complexity. Vignesh V. Menon (University of Klagenfurt, Austria) presented a summary of live per-title encoding approaches using video complexity features. Ioannis Katsavounidis and Cosmin Stejerean (Meta, US) presented their work on using motion search to order videos by coding complexity, also making the software available as open source. In addition, they led a discussion on supplementing the classic SI and TI with improved complexity metrics (VCA, motion search, etc.).
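For reference, the classic indices in that discussion are defined in ITU-T Rec. P.910 as the maximum over time of a spatial standard deviation: SI over the Sobel-filtered luma frame, and TI over the pixel-wise difference of consecutive frames. A minimal sketch (our own illustration, not the open-sourced tools mentioned above) could look like:

```python
import numpy as np

def sobel_magnitude(frame):
    # 3x3 Sobel gradients computed via padded shifts (no SciPy needed).
    p = np.pad(frame.astype(np.float64), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)

def si_ti(frames):
    """P.910-style SI/TI for a sequence of grayscale (luma) frames."""
    # SI: max over time of the spatial std dev of the Sobel magnitude.
    si = max(float(sobel_magnitude(f).std()) for f in frames)
    # TI: max over time of the spatial std dev of the frame difference.
    ti = max((float((b.astype(np.float64) - a.astype(np.float64)).std())
              for a, b in zip(frames, frames[1:])), default=0.0)
    return si, ti
```

P.910 computes these on the luma plane only; production tools typically also crop frame borders and handle interlaced content, which this sketch omits.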

Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of video metric metadata into encoded video streams. In addition, a liaison with AOMedia will be established, which may go beyond this particular topic and include best practices on subjective testing, IMG topics, etc.

Joint Effort Group (JEG) – Hybrid

The JEG group was originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. Currently, the group is working on research problems rather than algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources.

Two examples of research problems addressed by the group were shown by the two presentations given by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The topic of the first presentation was related to the training of artificial intelligence observers for a wide range of applications, while the second presentation provided guidelines to train, validate, and publish DNN-based objective measures.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.

Immersive Media Group (IMG)

The IMG group is focused on research on the quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed at the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.

In addition to this, a variety of topics related to immersive media technologies were covered in the works presented during the meeting. For example, Yaosi Hu (Wuhan University, China) presented her work on video quality assessment based on quality aggregation networks. In relation to light field imaging, Maria Martini (Kingston University, UK) exposed the main problems with currently available light field quality assessment datasets and presented a new dataset. Also, there were three talks by researchers from CWI (Netherlands) dealing with point cloud QoE assessment: Silvia Rossi presented a behavioral analysis in a 6-DoF VR system, taking into account the influence of content, quality, and user disposition [4]; Shishir Subramanyam presented his work on the subjective QoE evaluation of user-centered adaptive streaming of dynamic point clouds [5]; and Irene Viola presented a point cloud objective quality assessment using PCA-based descriptors (PointPCA). Another presentation related to point cloud quality assessment was delivered by Marouane Tliba (Université d’Orleans, France), who presented an efficient deep-learning-based graph objective metric.

In addition, Shirin Rafiei (RISE, Sweden) gave a talk on UX and QoE aspects of remote control operations using a laboratory platform. Marta Orduna (Universidad Politécnica de Madrid, Spain) presented her work comparing ACR, SSDQE, and SSCQE in long-duration 360-degree videos, whose results will be used to submit a proposal to extend ITU-T Rec. P.919 to long sequences. Finally, Ali Ak (Nantes Université, France) presented his work on just noticeable differences for HDR/SDR image and video quality.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work on the assessment of rail 8K-UHD CCTV-facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study on video coding for machines through a large-scale evaluation of DNN robustness to compression artefacts for semantic segmentation [8].

Other updates

In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video on simulated low temperatures in LCD vehicle displays.

In addition, a new group called the Emerging Technologies Group (ETG) was created in this meeting, whose main objective is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work carried out within VQEG. In particular, two major topics of interest have been identified so far: AI-based technologies and the greening of streaming and related trends. Nevertheless, the group aims to provide a common platform for people to come together, discuss new emerging topics, and explore possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest resumes in this effort. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into the JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.

References

[1] Maria G. Martini, “On the relationship between SSIM and PSNR for DCT-based compressed images and video: SSIM as content-aware PSNR”, TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21725390.v1, 2022.
[2] J. Zhu, P. Le Callet, A. Perrin, S. Sethuraman, K. Rahul, “On The Benefit of Parameter-Driven Approaches for the Modeling and the Prediction of Satisfied User Ratio for Compressed Video”, IEEE International Conference on Image Processing (ICIP), Oct. 2022.
[3] Margaret H. Pinson, “Why No Reference Metrics for Image and Video Quality Lack Accuracy and Reproducibility”, Frontiers in Signal Processing, Jul. 2022.
[4] S. Rossi, I. Viola, P. Cesar, “Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition”, Proceedings of the 1st Workshop on Interactive eXtended Reality, Oct. 2022.
[5] S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, P. Cesar, “Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds”, International Conference on Quality of Multimedia Experience (QoMEX), Sep. 2022.
[6] M. Leszczuk, L. Janowski, J. Nawała, and A. Boev, “Method for Assessing Objective Video Quality for Automatic License Plate Recognition Tasks”, Communications in Computer and Information Science, Oct. 2022.
[7] L. Lévêque, F. Villoteau, E. V. B. Sampaio, M. Perreira Da Silva, and P. Le Callet, “Comparing the Robustness of Humans and Deep Neural Networks on Facial Expression Recognition”, Electronics, 11(23), Dec. 2022.
[8] A. Marie, K. Desnos, L. Morin, and Lu Zhang, “Video Coding for Machines: Large-Scale Evaluation of Deep Neural Networks Robustness to Compression Artifacts for Semantic Segmentation”, IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2022.

VQEG Column: VQEG Meeting May 2022

Introduction

Welcome to this new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG), which provides an overview of the last VQEG plenary meeting, held from 9 to 13 May 2022. It was organized by INSA Rennes (France), and it was the first face-to-face meeting after the series of online meetings held due to the Covid-19 pandemic. Remote attendance was also offered, which made it possible for around 100 participants from 17 different countries to attend the meeting (more than 30 of them in person). During the meeting, more than 40 presentations were given, and interesting discussions took place. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented at this meeting are relevant to the SIGMM community working on quality assessment. Particularly interesting are the proposals to update ITU-T Recommendations P.910 and P.913, as well as the publicly available datasets that were presented. We encourage readers interested in any of the activities of the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 9-13 May 2022 in Rennes (France).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this vein, the group continues working on extensions of the ITU-T Recommendation P.1204 to cover encoders other than H.264, HEVC, and VP9 (e.g., AV1). In addition, the projects Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB) are still ongoing.

In this meeting, several AVHD-related topics were discussed, supported by six presentations. In the first one, Mikolaj Leszczuk (AGH University, Poland) presented an analysis of how experiment conditions, such as video sequence order, variation, and repeatability, influence the subjective assessment of video transmission quality, since they can entail a “learning” process in the test participants during the test. In the second presentation, Lucjan Janowski (AGH University, Poland) presented two proposals towards more ecologically valid experiment designs: the first uses the Absolute Category Rating [1] without a scale but in a “think aloud” manner, and the second, called “Your YouTube, our lab”, lets users select the content they prefer, with a quality question appearing during the viewing experience through a specifically designed interface. Also dealing with the study of testing methodologies, Babak Naderi (TU Berlin, Germany) presented work on subjective evaluation of video quality with a crowdsourcing approach, while Pierre David (Capacités, France) presented a three-lab experiment, involving Capacités (France), RISE (Sweden), and AGH University (Poland), on quality evaluation of social media videos. Kjell Brunnström (RISE, Sweden) continued by giving an overview of video quality assessment of Video Assistant Refereeing (VAR) systems, and lastly, Olof Lindman (SVT, Sweden) presented another effort to reduce the lack of open datasets with the Swedish Television (SVT) Open Content.
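The ACR methodology mentioned above yields per-sequence Mean Opinion Scores (MOS). As a minimal illustrative sketch (not the normative P.913 procedure, which discusses more refined statistics such as t-based intervals and subject screening), the usual aggregation is the mean rating with a normal-approximation confidence interval:

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score and confidence interval for ACR ratings (1-5).

    z=1.96 is the normal-approximation 95% interval; P.913-style analysis
    would use a t-distribution for small panels (illustrative sketch only).
    """
    n = len(ratings)
    mos = statistics.mean(ratings)
    ci = z * statistics.stdev(ratings) / math.sqrt(n) if n > 1 else float("nan")
    return mos, ci

# Example: five participants rate one sequence on the 5-point ACR scale.
mos, ci = mos_with_ci([5, 4, 4, 3, 4])
```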

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. In this meeting, Lucie Lévêque (Nantes Université, France) provided an overview of the group’s recent activities, including a submitted review paper on objective quality assessment for medical images, a special session accepted for the IEEE International Conference on Image Processing (ICIP), which will take place in October in Bordeaux (France), and a paper submitted to IEEE ICIP on quality assessment through a COVID-19 pneumonia detection task. The work described in this paper was also presented by Meriem Outtas (INSA Rennes, France).

In addition, there were two more presentations related to the quality assessment of medical images. Firstly, Yuhao Sun (University of Edinburgh, UK) presented research on a no-reference image quality metric for visual distortions in Computed Tomography (CT) scans [2]. Secondly, Marouane Tliba (Université d’Orléans, France) presented his studies on quality assessment of medical images through deep-learning techniques using domain adaptation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods, both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on a proposal to update ITU-T Recommendation P.913, including new testing methods for subjective quality assessment and statistical analysis of the results. Margaret Pinson presented this work during the meeting.

In addition, five presentations were delivered on topics related to the group’s activities. Jakub Nawała (AGH University, Poland) presented the Generalised Score Distribution to accurately describe responses from subjective quality experiments. Three presentations were given by members of Nantes Université (France): Ali Ak presented his work on spammer detection in pairwise comparison experiments, Andreas Pastor talked about how to improve the maximum likelihood difference scaling method in order to measure the inter-content scale, and Chama El Majeny presented the functionalities of a subjective test analysis tool, whose code will be publicly available. Finally, Dietmar Saupe (University of Konstanz, Germany) delivered a presentation on subjective image quality assessment with boosted triplet comparisons.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a particular focus on gaming. Currently, the group is working on the ITU-T Work Item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. Apart from this, Jerry (Xiangxu) Yu (University of Texas at Austin, US) presented work on subjective and objective quality assessment of user-generated gaming videos, and Nasim Jamshidi (TUB, Germany) presented a deep-learning bitstream-based video quality model for CG content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in ITU-T Recommendation P.910), and the development of a standard for video quality metadata.
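As context for the SI/TI clarification work, the commonly used P.910 definitions can be sketched in a few lines of numpy: SI is the maximum over frames of the standard deviation of the Sobel gradient magnitude, and TI the maximum over frames of the standard deviation of the inter-frame difference. This is a minimal illustrative sketch (luma-plane input assumed); border handling, bit-depth scaling, and similar details are exactly what the clarification activity pins down, so treat it as non-normative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # Sobel, x
KY = KX.T                                                          # Sobel, y

def _conv2(img, kernel):
    # 'valid' 2D correlation via sliding windows; the sign flip between
    # correlation and convolution does not affect the gradient magnitude.
    win = sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", win, kernel)

def si_ti(frames):
    """SI/TI in the spirit of ITU-T P.910 (illustrative, non-normative).

    SI: max over frames of std of the Sobel gradient magnitude.
    TI: max over frames of std of the difference to the previous frame.
    """
    si, ti, prev = [], [], None
    for f in frames:
        f = np.asarray(f, dtype=float)
        grad = np.hypot(_conv2(f, KX), _conv2(f, KY))
        si.append(grad.std())
        if prev is not None:
            ti.append((f - prev).std())
        prev = f
    return max(si), (max(ti) if ti else 0.0)
```

A flat, static clip yields SI = TI = 0, while any spatial edge or temporal change raises the respective index, which is what makes the pair useful for characterizing source content complexity.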

At this meeting, this was one of the most active groups, and the corresponding sessions included several presentations and discussions. Firstly, Yiannis Andreopoulos (iSIZE, UK) presented their work on domain-specific fusion of multiple objective quality metrics. Then, Werner Robitza (AVEQ GmbH/TU Ilmenau, Germany) presented updates on the SI/TI clarification activities, which are leading to an update of ITU-T Recommendation P.910. In addition, Lukas Krasula (Netflix, US) presented their investigations on the relation between banding annoyance and the overall quality perceived by viewers. Hadi Amirpour (University of Klagenfurt, Austria) delivered two presentations related to their Video Complexity Analyzer and their Video Complexity Dataset, which are both publicly available. Finally, Mikołaj Leszczuk (AGH University, Poland) gave two talks on their research related to User-Generated Content (UGC) (a.k.a. in-the-wild video content) recognition and on advanced video quality indicators to characterise video content.

Joint Effort Group (JEG) – Hybrid

The JEG group was originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. Enrico Masala (Politecnico di Torino, Italy) presented a report on the group’s ongoing activities, including the release of a new website that reflects the group’s evolution over the last few years. Although the group is not currently pursuing the development of new metrics or tools readily available for VQA, it is still working on related topics, such as the studies by Lohic Fotio Tiotsop (Politecnico di Torino, Italy) on the sensitivity of artificial-intelligence-based observers to input signal modifications.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. In this meeting, Pablo Pérez (Nokia, Spain) presented an extended report on the group activities, from which it is worth noting the joint work on a contribution to the ITU-T Work Item G.QoE-5G.

Immersive Media Group (IMG)

The IMG group focuses on research on the quality assessment of immersive media. Currently, the main joint activity of the group is the development of a test plan for evaluating the QoE of immersive interactive communication systems. In this sense, Pablo Pérez (Nokia, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented a follow-up on this test plan, including an overview of the state of the art on related works and a taxonomy classifying the existing systems [3]. This test plan is closely related to the work carried out by ITU-T on QoE Assessment of eXtended Reality Meetings, so Gunilla Berndtsson (Ericsson, Sweden) presented the latest advances in the development of P.QXM.

Apart from this, there were four presentations related to the quality assessment of immersive media. Shirin Rafiei (RISE, Sweden) presented a study on QoE assessment of an augmented remote operating system for scaling in smart mining applications. Zhengyu Zhang (INSA Rennes, France) gave a talk on a no-reference quality metric for light field images based on deep-learning and exploiting angular and spatial information. Ali Ak (Nantes Université, France) presented a study on the effect of temporal sub-sampling on the accuracy of the quality assessment of volumetric video. Finally, Waqas Ellahi (Nantes Université, France) showed their research on a machine-learning framework to predict Tone-Mapping Operator (TMO) preference based on image and visual attention features [4].

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods. In this meeting, there were three presentations related to this topic. Mikołaj Leszczuk (AGH University, Poland) presented an objective video quality assessment method for face recognition tasks. Also, Alban Marie (INSA Rennes, France) showed an analysis of the correlation of quality metrics with artificial intelligence accuracy. Finally, Lucie Lévêque (Nantes Université, France) gave an overview of a study on the reliability of existing algorithms for facial expression recognition [5].

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

The IRG-AVQA group studies topics related to video and audiovisual quality assessment (both subjective and objective) across ITU-R Study Group 6 and ITU-T Study Group 12. In this sense, Chulhee Lee (Yonsei University, South Korea) and Alexander Raake (TU Ilmenau, Germany) provided an overview of ongoing activities related to quality assessment within ITU-R and ITU-T.

Other updates

In addition, the Human Factors for Visual Experiences (HFVE) group, whose objective is to uphold the liaison between VQEG and the IEEE standardization group P3333.1, presented its advances in relation to two standards: IEEE P3333.1.3 on deep-learning-based assessment of visual experience based on human factors, which has been approved and published, and IEEE P3333.1.4 on light field imaging, which has been submitted and is in the process of being approved. Also, although there was not much activity in this meeting within the Implementer’s Guide for Video Quality Metrics (IGVQM) and the Psycho-Physiological Quality Assessment (PsyPhyQA) projects, they are still active. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept up to date.

The next VQEG plenary meeting will take place online in December 2022. Please see the VQEG Meeting information page for more details.

References

[1] ITU, “Subjective video quality assessment methods for multimedia applications”, ITU-T Recommendation P.910, Jul. 2022.
[2] Y. Sun, G. Mogos, “Impact of Visual Distortion on Medical Images”, IAENG International Journal of Computer Science, 1:49, Mar. 2022.
[3] P. Pérez, E. González-Sosa, J. Gutiérrez, N. García, “Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment”, Frontiers in Signal Processing, Jul. 2022.
[4] W. Ellahi, T. Vigier, P. Le Callet, “A machine-learning framework to predict TMO preference based on image and visual attention features”, International Workshop on Multimedia Signal Processing, Oct. 2021.
[5] E. M. Barbosa Sampaio, L. Lévêque, P. Le Callet, M. Perreira Da Silva, “Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance”, ACM International Conference on Interactive Media Experiences, Jun. 2022.