VQEG Column: VQEG Meeting Jun. 2021 (virtual/online)

Introduction

Welcome to the fifth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 7 to 11 June 2021. Like the previous meeting, held in December 2020, it was organized online (this time by Kingston University) with multiple sessions spread over five days, allowing remote participation of people from 22 different countries in America, Asia, and Europe. More than 100 participants registered for the meeting and could attend the 40 presentations and several discussions that took place in all working groups.
This column provides an overview of the recently completed VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online on the VQEG meeting website.

Group picture of the VQEG Meeting 7-11 June 2021.

Several presentations of state-of-the-art work may be of interest to the SIGMM community, in addition to the contributions from various VQEG groups to several ITU work items. Also worth highlighting are the progress on the new activities launched at the previous VQEG plenary meeting (in relation to Live QoE assessment, SI/TI clarification, an implementers guide for video quality metrics for coding applications, and the inclusion of video quality metrics as metadata in compressed streams) and the proposal for new joint work on the evaluation of immersive communication systems from a task-based or interactive perspective within the Immersive Media Group.

We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on improved subjective and objective methods for video-only and audiovisual quality of commonly available systems. Currently, after the project AVHD/P.NATS2 (a joint collaboration between VQEG and ITU SG12) finished in 2020 [1], two projects are ongoing within the AVHD group: QoE Metrics for Live Video Streaming Applications (Live QoE), which was launched at the last plenary meeting, and Advanced Subjective Methods (AVHD-SUB).
The main discussion during the AVHD sessions concerned the Live QoE project, which was led by Shahid Satti (Opticom) and Rohit Puri (Twitch). In addition to the presentation of the project proposal, the main decisions reached so far were presented (e.g., use of videos of 20-30 seconds with resolution 1080p and framerates up to 60fps, use of ACR as the subjective test methodology, generation of test conditions, etc.), and open questions were brought up for discussion, especially in relation to how to acquire premium content and network traces.
In addition to this discussion, Steve Göring (TU Ilmenau) presented an open-source platform (AVrate Voyager) for crowdsourcing/online subjective tests [2], and Shahid Satti (Opticom) presented the performance results of the Opticom models on the project AVHD/P.NATS Phase 2. Finally, Ioannis Katsavounidis (Facebook) presented the subjective testing validation of the AV1 performance from the Alliance for Open Media (AOM) to gather feedback on the test plan and possible interested testing labs from VQEG. It is also worth noting that this session was recorded to be used as raw multimedia data for the Live QoE project.
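
As a point of reference for the ACR methodology mentioned above, the following minimal sketch shows how per-stimulus ratings on a five-point ACR scale are typically summarized as a Mean Opinion Score (MOS) with a confidence interval. It is an illustration only and not part of the Live QoE test plan: the function name, the example votes, and the use of a normal approximation instead of Student's t distribution are assumptions made here.

```python
from statistics import NormalDist, mean, stdev


def mos_with_ci(ratings, confidence=0.95):
    """Return (MOS, half-width of the confidence interval) for one stimulus."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    mos = mean(ratings)
    # Normal approximation (an assumption); small panels usually call for Student's t.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / len(ratings) ** 0.5
    return mos, half_width


# Hypothetical ACR votes (1 = bad ... 5 = excellent) from eight subjects.
votes = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mos_with_ci(votes)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The interval narrows with the square root of the number of ratings, which is one reason why the number of subjects per test condition matters when planning such experiments.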

Quality Assessment for Health applications (QAH)

The session related to the QAH group included three presentations apart from the project summary provided by Lucie Lévêque (Polytech Nantes). In particular, Meriem Outtas (INSA Rennes) provided a review on objective quality assessment of medical images and videos. This is one of the topics jointly addressed by the group, which is working on an overview paper in line with the recent review on subjective medical image quality assessment [3]. Moreover, Zohaib Amjad Khan (Université Sorbonne Paris Nord) presented a work on video quality assessment of laparoscopic videos, while Aditja Raj and Maria Martini (Kingston University) presented their work on a multivariate regression-based convolutional neural network model for fundus image quality assessment.

Statistical Analysis Methods (SAM)

The SAM session consisted of three presentations followed by discussions on the topics. One of these concerned the description of subjective experiment consistency by p-value P–P plot [4], which was presented by Jakub Nawała (AGH University of Science and Technology). In addition, Zhi Li (Netflix) and Rafał Figlus (AGH University of Science and Technology) presented the progress on the contribution from SAM to the ITU-T to modify the recommendation P.913 to include the MLE model for subject behavior in subjective experiments [5] and the recently available implementation of this model in Excel. Finally, Pablo Pérez (Nokia Bell Labs) and Lucjan Janowski (AGH University of Science and Technology) presented their work on the possibility of performing subjective experiments with four subjects [6].

Computer Generated Imagery (CGI)

Nabajeet Barman (Kingston University) presented a report on the current activities of the CGI group. The main current working topics are related to gaming quality assessment methodologies and quality prediction, and codec comparison for CG content. This group is closely collaborating with the ITU-T SG12, as reflected by its support for the completion of three work items: ITU-T Rec. G.1032 on influence factors on gaming quality of experience, ITU-T Rec. P.809 on subjective evaluation methods for gaming quality, and ITU-T Rec. G.1072 on opinion model for gaming applications. Furthermore, CGI is contributing to three new work items: ITU-T work item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services, ITU-T work item G.OMMOG on opinion models for mobile online gaming applications, and ITU-T work item P.CROWDG on subjective evaluation of gaming quality with a crowdsourcing approach.
In addition, four presentations were scheduled during the CGI slots. The first one was delivered by Joel Jung (Tencent Media Lab) and David Lindero (Ericsson), who presented the details of the ITU-T work item P.BBQCG. Another one was related to the evaluation of MPEG-5 Part 2 (LCEVC) for gaming video streaming applications, which was presented by Nabajeet Barman (Kingston University) and Saman Zadtootaghaj (Dolby Laboratories). Nabajeet Barman, together with Maria Martini (Kingston University), also presented a dataset, codec comparison and challenges related to user generated HDR gaming video streaming [7]. Finally, JP Tauscher (Technische Universität Braunschweig) presented his work on EEG-based detection of deep fake images.

No Reference Metrics (NORM)

The session for NORM group included a presentation on the impact of Spatial and Temporal Information (SI and TI) on video quality and compressibility [8], delivered by Werner Robitza (AVEQ GmbH), which was followed by a fruitful discussion on the compression complexity and on the activity related to SI/TI clarification launched in the last VQEG plenary meeting. In addition, there was another presentation from Mikołaj Leszczuk (AGH University of Science and Technology) on content type indicators for technologies supporting video sequence summarization. Finally, Ioannis Katsavounidis (Facebook) led a discussion on the inclusion of video quality metrics as metadata in compressed streams, with a report on the progress on this activity that was started in the last meeting. 
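
For readers unfamiliar with the SI and TI indicators discussed in this session, the sketch below follows the commonly cited definitions from ITU-T Rec. P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. It is a minimal illustration using only the Python standard library, not the implementation used in [8]; the function names, the use of the population standard deviation, and the skipping of border pixels are assumptions made here.

```python
from statistics import pstdev


def sobel_magnitudes(frame):
    """Sobel gradient magnitude for every interior pixel of a 2-D luma frame."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append((gx * gx + gy * gy) ** 0.5)
    return out


def si_ti(frames):
    """frames: list of 2-D luma frames (lists of rows, at least 3x3). Returns (SI, TI)."""
    # SI: spatial std of the Sobel-filtered frame, maximised over time.
    si = max(pstdev(sobel_magnitudes(f)) for f in frames)
    # TI: spatial std of the frame difference, maximised over time.
    stds = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)]
        stds.append(pstdev(diff))
    ti = max(stds) if stds else 0.0
    return si, ti


# Hypothetical 8x8 frames: a vertical edge that moves one pixel to the right.
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
moved = [[0] * 5 + [255] * 3 for _ in range(8)]
print(si_ti([edge, moved]))
```

A real implementation would normally operate on decoded luma planes with an array library; the nested loops are kept here only to make the two definitions explicit.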

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working on the development of a generally applicable no-reference hybrid perceptual/bitstream model. In this sense, Enrico Masala and Lohic Fotio Tiotsop (Politecnico di Torino) presented the progress on designing a neural-network approach to model single observers using existing subjectively-annotated image and video datasets [9] (the design of subjective tests tailored for the training of this approach is envisioned for future work). In addition to this activity, the group is working in collaboration with the Sky Group on the “Hodor Project”, which aims to develop a measure that automatically identifies video sequences for which quality metrics are likely to deliver inaccurate Mean Opinion Score (MOS) estimation.
Apart from these joint activities, Dr. Yendo Hu (Carnation Communications Inc. and Jimei University) delivered a presentation proposing to work on a benchmarking standard to bring quality, bandwidth, and latency into a common measurement domain.

Quality Assessment for Computer Vision Applications (QACoViA)

In addition to a progress report, the QACoViA group scheduled two presentations: one on enhancing artificial intelligence resilience to image coding artifacts through expert training (by Alban Marie from INSA Rennes) and one on providing datasets to train no-reference metrics for computer vision applications (by Carolina Whitaker from NTIA/ITS).

5G Key Performance Indicators (5GKPI)

The 5GKPI session consisted of a presentation by Pablo Pérez (Nokia Bell Labs) of the progress achieved by the group since the last plenary meeting in the following efforts: 1) the contribution to ITU-T Study Group 12 Question 13 through the Technical Report about QoE in 5G video services (GSTR-5GQoE), which addresses QoE requirements and factors for some use cases like Tele-operated Driving (ToD), wireless content production, mixed reality offloading and first responder networks; 2) the contribution to the 5G Automotive Association (5GAA) through a high-level contribution on general QoE requirements for remote driving, considering for the near future the execution of subjective tests for ToD video quality; and 3) the long-term plan to work on a methodology to create simple opinion models to estimate average QoE for a network and use case.

Immersive Media Group (IMG)

Several presentations were delivered during the IMG session that were divided into two blocks: one covering technologies and studies related to the evaluation of immersive communication systems from a task-based or interactive perspective, and another one covering other topics related to the assessment of QoE of immersive media. 
The first set of presentations is related to a new proposal for joint work within IMG related to the ITU-T work item P.QXM on QoE assessment of eXtended Reality meetings. Thus, Irene Viola (CWI) presented an overview of this work item. In addition, Carlos Cortés (Universidad Politécnica de Madrid) presented his work on evaluating the impact of delay on QoE in immersive interactive environments, Irene Viola (CWI) presented a dataset of point cloud dynamic humans for immersive telecommunications, Pablo César (CWI) presented their pipeline for social virtual reality [10], and Narciso García (Universidad Politécnica de Madrid) presented their real-time free-viewpoint video system (FVVLive) [11]. After these presentations, Jesús Gutiérrez (Universidad Politécnica de Madrid) led the discussion on joint next steps with IMG, which, in addition to identifying parties interested in joining the effort to study the evaluation of immersive communication systems, also covered the further analyses to be done from the subjective tests carried out with short 360-degree videos [12] and the studies carried out to assess quality and other factors (e.g., presence) with long omnidirectional sequences. In this sense, Marta Orduna (Universidad Politécnica de Madrid) presented her subjective study to validate a methodology to assess quality, presence, empathy, attitude, and attention in Social VR [13]. Future progress on these joint activities will be discussed in the group audio-calls.
Within the other block of presentations related to immersive media topics, Maria Martini (Kingston University), Chulhee Lee (Yonsei University), and Patrick Le Callet (Université de Nantes) presented the status of IEEE standardization on QoE for immersive experiences (IEEE P3333.1.4 – Light Field, and IEEE P3333.1.3, deep learning-based quality assessment), Kjell Brunnström (RISE) presented their work on legibility and readability in augmented reality [14], Abdallah El Ali (CWI) presented his work on investigating the relationship between momentary emotion self-reports and head and eye movements in HMD-based 360° videos [15], Elijs Dima (Mid Sweden University) presented his study on quality of experience in augmented telepresence considering the effects of viewing positions and depth-aiding augmentation [16], Silvia Rossi (UCL) presented her work towards behavioural analysis of 6-DoF users when consuming immersive media [17], and Yana Nehme (INSA Lyon) presented a study on exploring crowdsourcing for subjective quality assessment of 3D Graphics.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

During the IRG-AVQA session, an overview on the progress and recent works within ITU-R SG6 and ITU-T SG12 was provided. In particular, Chulhee Lee (Yonsei University), in collaboration with other ITU rapporteurs, presented the progress of ITU-R WP6C on recommendations for HDR content, as well as the work items within ITU-T SG12: Question 9 on audio-related work items, Question 13 on gaming and immersive technologies (e.g., augmented/extended reality) among others, Question 14 on recommendations and work items related to the development of video quality models, and Question 19 on work items related to television and multimedia. In addition, the progress of the group “Implementers Guide for Video Quality Metrics (IGVQM)”, launched at the last plenary meeting by Ioannis Katsavounidis (Facebook), was discussed, addressing specific points to push the collection of video quality models and datasets to be used to develop an implementer’s guide for objective video quality metrics for coding applications.

Other updates

The next VQEG plenary meeting will take place online in December 2021.

In addition, VQEG is investigating the possibility of disseminating the videos of all the talks from these plenary meetings via platforms such as YouTube and Facebook.

Finally, given that some modifications are being made to the public FTP of VQEG, if the links to the presentations included in this column are not opened by the browser, the reader can download all the presentations in one compressed file.

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, and R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[2] R.R.R. Rao, S. Göring, and A. Raake, “Towards High Resolution Video Quality Assessment in the Crowd”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[3] L. Lévêque, M. Outtas, H. Liu, and L. Zhang, “Comparative study of the methodologies used for subjective medical image quality assessment”, Physics in Medicine & Biology, Jul. 2021 (Accepted).
[4] J. Nawala, L. Janowski, B. Cmiel, and K. Rusek, “Describing Subjective Experiment Consistency by p-Value P–P Plot”, ACM International Conference on Multimedia (ACM MM), Oct. 2020.
[5] Z. Li, C. G. Bampis, L. Krasula, L. Janowski, and I. Katsavounidis, “A Simple Model for Subject Behavior in Subjective Experiments”, arXiv:2004.02067v3, May 2021.
[6] P. Perez, L. Janowski, N. Garcia, and M. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)”, arXiv:2104.02618, Apr. 2021.
[7] N. Barman, and M. G. Martini, “User Generated HDR Gaming Video Streaming: Dataset, Codec Comparison and Challenges”, IEEE Transactions on Circuits and Systems for Video Technology, May 2021.
[8] W. Robitza, R.R.R. Rao, S. Göring, and A. Raake, “Impact of Spatial and Temporal Information on Video Quality and Compressibility”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[9] L. Fotio Tiotsop, T. Mizdos, M. Uhrina, M. Barkowsky, P. Pocta, and E. Masala, “Modeling and estimating the subjects’ diversity of opinions in video quality assessment: a neural network based approach”, Multimedia Tools and Applications, vol. 80, pp. 3469–3487, Sep. 2020.
[10] J. Jansen, S. Subramanyam, R. Bouqueau, G. Cernigliaro, M. Martos Cabré, F. Pérez, and P. Cesar, “A Pipeline for Multiparty Volumetric Video Conferencing: Transmission of Point Clouds over Low Latency DASH”, ACM Multimedia Systems Conference (MMSys), May 2020.
[11] P. Carballeira, C. Carmona, C. Díaz, D. Berjón, D. Corregidor, J. Cabrera, F. Morán, C. Doblado, S. Arnaldo, M.M. Martín, and N. García, “FVV Live: A real-time free-viewpoint video system with consumer electronics hardware”, IEEE Transactions on Multimedia, May 2021.
[12] J. Gutiérrez, P. Pérez, M. Orduna, A. Singla, C. Cortés, P. Mazumdar, I. Viola, K. Brunnström, F. Battisti, N. Cieplińska, D. Juszka, L. Janowski, M. Leszczuk, A. Adeyemi-Ejeye, Y. Hu, Z. Chen, G. Van Wallendael, P. Lambert, C. Díaz, J. Hedlund, O. Hamsis, S. Fremerey, F. Hofmeyer, A. Raake, P. César, M. Carli, N. García, “Subjective evaluation of visual quality and simulator sickness of short 360° videos: ITU-T Rec. P.919”, IEEE Transactions on Multimedia, Jul. 2021 (Early Access).
[13] M. Orduna, P. Pérez, J. Gutiérrez, and N. García, “Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in Social VR: International Experiences Use Case”, arXiv:2103.02550, 2021.
[14] J. Falk, S. Eksvärd, B. Schenkman, B. Andrén, and K. Brunnström, “Legibility and readability in Augmented Reality”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[15] T. Xue, A. El Ali, G. Ding, and P. Cesar, “Investigating the Relationship between Momentary Emotion Self-reports and Head and Eye Movements in HMD-based 360° VR Video Watching”, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, May 2021.
[16] E. Dima, K. Brunnström, M. Sjöström, M. Andersson, J. Edlund, M. Johanson, and T. Qureshi, “Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence”, Quality and User Experience, vol. 5, Feb. 2020.
[17] S. Rossi, I. Viola, J. Jansen, S. Subramanyam, L. Toni, and P. Cesar, “Influence of Narrative Elements on User Behaviour in Photorealistic Social VR”, International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE), Sep. 28, 2021.

JPEG Column: 91st JPEG Meeting

JPEG Committee issues a Call for Proposals on Holography coding

The 91st JPEG meeting was held online from 19 to 23 April 2021. This meeting saw several activities relating to holographic coding, notably the release of the JPEG Pleno Holography Call for Proposals, consolidated with the definition of the use cases and requirements for holographic coding and common test conditions that will assure the evaluation of the future proposals.

Reconstructed hologram from B-com database (http://plenodb.jpeg.org/).

The 91st meeting was also marked by the start of a new exploration initiative on Non-Fungible Tokens (NFTs), due to the recent interest in this technology in a large number of applications and in particular in digital art. Since NFTs rely on decentralized networks and JPEG has been analysing the implications of Blockchains and distributed ledger technologies in imaging, it is a natural next step to explore how JPEG standardization can facilitate interoperability between applications that make use of NFTs.

The following presents an overview of the major achievements carried out during the 91st JPEG meeting.

The 91st JPEG meeting had the following highlights:

  • JPEG launches call for proposals for the first standard in holographic coding,
  • JPEG NFT,
  • JPEG Fake Media,
  • JPEG AI,
  • JPEG Systems,
  • JPEG XS,
  • JPEG XL,
  • JPEG DNA,
  • JPEG Reference Software.

JPEG launches call for proposals for the first standard in holographic coding

JPEG Pleno aims to provide a standard framework for representing new imaging modalities, such as light field, point cloud, and holographic content. JPEG Pleno Holography is the first standardization effort for a versatile solution to efficiently compress holograms for a wide range of applications ranging from holographic microscopy to tomography, interferometry, and printing and display, as well as their associated hologram types. Key functionalities include support for both lossy and lossless coding, scalability, random access, and integration within the JPEG Pleno system architecture, with the goal of supporting a royalty free baseline.

The final Call for Proposals (CfP) on JPEG Pleno Holography – a milestone in the roll-out of the JPEG Pleno framework – has been issued as the main result of the 91st JPEG meeting, Online, 19-23 April 2021. The deadline for expressions of interest and registration is 1 August 2021. Submissions to the Call for Proposals are due on 1 September 2021.

A second milestone reached at this meeting was the promotion to International Standard of JPEG Pleno Part 2: Light Field Coding (ISO/IEC 21794-2). This standard provides light field coding tools originating from either microlens cameras or camera arrays. Part 1 of this standard, which was promoted to International Standard earlier, provides the overall file format syntax supporting light field, holography and point cloud modalities.

During the 91st JPEG meeting, the JPEG Committee officially began an exciting phase of JPEG Pleno Point Cloud coding standardisation with a focus on learning-based point cloud coding.

The scope of the JPEG Pleno Point Cloud activity is the creation of a learning-based coding standard for point clouds and associated attributes, offering a single-stream, compact compressed domain representation, supporting advanced flexible data access functionalities. The JPEG Pleno Point Cloud standard targets both interactive human visualization, with significant compression efficiency over state of the art point cloud coding solutions commonly used at equivalent subjective quality, and also enables effective performance for 3D processing and computer vision tasks. The JPEG Committee expects the standard to support a royalty-free baseline.

The standard is envisioned to provide a number of unique benefits, including an efficient single point cloud representation for both humans and machines. The intent is to provide humans with the ability to visualise and interact with the point cloud geometry and attributes while providing machines the ability to perform 3D processing and computer vision tasks in the compressed domain, enabling lower complexity and higher accuracy through the use of compressed domain features extracted from the original instead of the lossily decoded point cloud.

JPEG NFT

Non-Fungible Tokens have been the focus of much attention in recent months. Several of the digital assets that NFTs point to are either in existing JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. Furthermore, several trust and security issues have been raised regarding NFTs and the digital assets they rely on. Here again, the JPEG Committee has a significant track record in security and trust in imaging applications. Building on this background, the JPEG Committee has launched a new exploration initiative around NFTs to better understand the needs in terms of imaging requirements and how existing as well as potential JPEG standards can help bring security and trust to NFTs in a wide range of applications, notably those that rely on content represented in JPEG formats, including still and animated pictures and 3D content. The first steps in this initiative involve outreach to stakeholders in NFTs and their applications, and the organization of a workshop to discuss challenges and current solutions in NFTs, notably in the context of applications relevant to the scope of the JPEG Standardization Committee. The JPEG Committee invites interested parties to subscribe to the mailing list of the JPEG NFT exploration via http://listregistration.jpeg.org.

JPEG Fake Media

The JPEG Fake Media exploration activity continues its work to assess standardization needs to facilitate secure and reliable annotation of media asset creation and modifications in good faith usage scenarios as well as in those with malicious intent. At the 91st meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. This new version includes several refinements including an improved and coherent set of definitions covering key terminology. The requirements have been extended and reorganized into three main identified categories: media creation and modification descriptions, metadata embedding framework and authenticity verification framework. The presentations and video recordings of the 2nd Workshop on JPEG Fake Media are now available on the JPEG website. JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and subscribe to the mailing list via http://listregistration.jpeg.org.

JPEG AI

At the 91st meeting, the results of the JPEG AI exploration experiments for the image processing and computer vision tasks defined at the previous 90th meeting were presented and discussed. Based on the analysis of the results, the exploration experiments description was improved. This activity will allow the definition of a performance assessment framework for using the latent representation of learning-based image codecs in several visual analysis tasks, such as compressed domain image classification and compressed domain material and texture recognition. Moreover, the impact of such experiments on the current version of the Common Test Conditions (CTC) was discussed.

Moreover, the draft of the Call for Proposals was analysed, notably regarding the training dataset and training procedures as well as the submission requirements. The timeline of the JPEG AI work item was discussed and it was agreed that the final Call for Proposals (CfP) will be issued as an outcome of the 93rd JPEG Meeting. The deadline for expression of interest and registration is 5 November 2021. Further, the submission of bitstreams and decoded images for the test dataset are due on 30 January 2022.

JPEG Systems

During the 91st meeting, the Draft International Standard (DIS) text of JLINK (ISO/IEC 19566-7) and Committee Draft (CD) text of JPEG Snack (ISO/IEC 19566-8) were completed and will be submitted for ballot. Amendments for JUMBF (ISO/IEC 19566-5 AMD1) and JPEG 360 (ISO/IEC 19566-6 AMD1) received a final review and are being released for publication. In addition, new extensions to JUMBF (ISO/IEC 19566-5) are under consideration to support rapidly emerging use cases related to content authenticity and integrity; updated use cases and requirements are being drafted. Finally, discussions have started to create awareness on how to interact with JUMBF boxes and the information they contain, without breaking integrity or interoperability. Interested parties are invited to subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities via http://listregistration.jpeg.org.

JPEG XS

The second editions of JPEG XS Part 1 (Core coding system) and Part 3 (Transport and container formats) were prepared for Final Draft International Standard (FDIS) balloting, with the intention of having both standards published by October 2021. The second editions integrate new coding and signalling capabilities to support RAW Bayer colour filter array (CFA) images, 4:2:0 sampled images and mathematically lossless coding of up to 12 bits per component. The associated profiles and buffer models are handled in Part 2, which is currently under DIS ballot. The focus has now shifted to work on the second editions of Part 4 (Conformance testing) and Part 5 (Reference software). Finally, the JPEG Committee defined a study to investigate future improvements to high dynamic range (HDR) and mathematically lossless compression capabilities, while still honouring the low-complexity and low-latency requirements. In particular, for RAW Bayer CFA content, the JPEG Committee will work on extensions of JPEG XS supporting lossless compression of CFA patterns at sample bit depths above 12 bits.

JPEG XL

The JPEG Committee has finalized JPEG XL Part 2 (File format), which is now at the FDIS stage. A Main profile has been specified in draft Amendment 1 to Part 1, which entered the draft amendment (DAM) stage of the approval process at the current meeting. The draft Main profile has two levels: Level 5 for end-user image delivery and Level 10 for generic use cases, including image authoring workflows. Now that the criteria for conformance have been determined, the JPEG Committee has defined new core experiments to define a set of test codestreams that provides full coverage of the coding tools. Part 4 (Reference software) is now at the DIS stage. With the first edition FDIS texts of both Part 1 and Part 2 now complete, JPEG XL is ready for wide adoption.

JPEG DNA

The JPEG Committee has continued its exploration of coding of images in quaternary representation, particularly suitable for DNA storage. After a successful third workshop presentation by stakeholders, two new use cases were identified along with a large number of new requirements, and a new version of the JPEG DNA overview document was issued and is now made publicly available. It was decided to continue this exploration by organizing the fourth workshop and conducting further outreach to stakeholders, as well as continuing with improving the JPEG DNA overview document.

Interested parties are invited to refer to the following URL and to consider joining the effort by registering to the mailing list of JPEG DNA here: https://jpeg.org/jpegdna/index.html.

JPEG Reference Software

The JPEG Committee is pleased to announce that its standard on the JPEG reference software, 2nd edition, reached the state of International Standard and will be publicly available from both ITU and ISO/IEC.

This standard, to appear as ITU-T T.873 | ISO/IEC 10918-7 (2nd Edition), provides reference implementations of the first JPEG standard, used daily throughout the world. The software included in this document guides vendors on how JPEG (ISO/IEC 10918-1) can be implemented and may serve as a baseline and starting point for JPEG encoders or decoders.

This second edition updates the two reference implementations to their latest versions, fixing minor defects in the software.

Final Quote

“JPEG standards continue to be a motor of innovation and an enabler of new applications in imaging as witnessed by the release of the first standard for coding of holographic content.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 92 will be held online from 7 to 13 July 2021.
  • No. 93 is planned to be held in Berlin, Germany, during 16-22 October 2021.

MPEG Column: 134th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 134th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • First International Standard on Neural Network Compression for Multimedia Applications
  • Completion of the carriage of VVC and EVC
  • Completion of the carriage of V3C in ISOBMFF
  • Call for Proposals: (a) new Advanced Genomics Features and Technologies, (b) MPEG-I Immersive Audio, and (c) coded Representation of Haptics
  • MPEG evaluated Responses on Incremental Compression of Neural Networks
  • Progression of MPEG 3D Audio Standards
  • The first milestone of development of Open Font Format (2nd amendment)
  • Verification tests: (a) low Complexity Enhancement Video Coding (LCEVC) Verification Test and (b) more application cases of Versatile Video Coding (VVC)
  • Standardization work on Version 2 of VVC and VSEI started

In this column, the focus is on streaming-related aspects including a brief update about MPEG-DASH.

First International Standard on Neural Network Compression for Multimedia Applications

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, or image and video coding. The trained neural networks for these applications contain many parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to several clients (e.g., mobile phones, smart cameras) benefits from a compressed representation of neural networks.

At the 134th MPEG meeting, MPEG Video ratified the first international standard on Neural Network Compression for Multimedia Applications (ISO/IEC 15938-17), designed as a toolbox of compression technologies. The specification contains different methods for

  • parameter reduction (e.g., pruning, sparsification, matrix decomposition),
  • parameter transformation (e.g., quantization), and
  • entropy coding,

which can be assembled into encoding pipelines combining one or more (in the case of reduction) methods from each group.
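To make the three-stage toolbox idea concrete, the following is a minimal sketch of such a pipeline. It is not the coding tools of ISO/IEC 15938-17: magnitude pruning, uniform 8-bit quantization, and zlib (as a stand-in for a real entropy coder) are assumptions chosen for illustration, and all function names are hypothetical.

```python
import random
import struct
import zlib


def prune(weights, keep_ratio):
    """Parameter reduction: zero out all but the largest-magnitude weights."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, bits=8):
    """Parameter transformation: uniform quantization to signed integers."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels or 1.0  # avoid a zero step
    return [round(w / scale) for w in weights], scale


def entropy_code(symbols):
    """Entropy coding stand-in: pack as int8 and compress with zlib."""
    return zlib.compress(struct.pack(f"{len(symbols)}b", *symbols), 9)


def decode(blob, scale):
    """Invert entropy coding and quantization (pruning is not invertible)."""
    raw = zlib.decompress(blob)
    return [q * scale for q in struct.unpack(f"{len(raw)}b", raw)]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # toy "layer"
    pruned = prune(weights, keep_ratio=0.2)
    symbols, scale = quantize(pruned)
    blob = entropy_code(symbols)
    print("float32 size:", 4 * len(weights), "bytes; coded size:", len(blob), "bytes")
```

The size reduction on random toy weights says nothing about the accuracy of a real network after compression; the factors quoted in this column come from the MPEG evaluation, not from a sketch like this.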

The results show that trained neural networks for many common multimedia problems such as image or audio classification or image compression can be compressed by a factor of 10-20 with no performance loss and even by more than 30 with performance trade-off. The specification is not limited to a particular neural network architecture and is independent of the neural network exchange format choice. The interoperability with common neural network exchange formats is described in the annexes of the standard.

As neural networks are becoming increasingly important, communicating them over heterogeneous networks to a plethora of devices raises various challenges, including the need for efficient compression, which this standard addresses. ISO/IEC 15938 is commonly referred to as MPEG-7 (or the “multimedia content description interface”) and this standard now becomes part 17 of MPEG-7.

Research aspects: Like for all compression-related standards, research aspects are related to compression efficiency (lossy/lossless), computational complexity (runtime, memory), and quality-related aspects. Furthermore, the compression of neural networks for multimedia applications probably enables new types of applications and services to be deployed in the (near) future. Finally, simultaneous delivery and consumption (i.e., streaming) of neural networks including incremental updates thereof will become a requirement for networked media applications and services.

Carriage of Media Assets

At the 134th MPEG meeting, MPEG Systems completed the carriage of various media assets in MPEG-2 Systems (Transport Stream) and the ISO Base Media File Format (ISOBMFF), respectively.

In particular, the standards for the carriage of Versatile Video Coding (VVC) and Essential Video Coding (EVC) over both MPEG-2 Transport Stream (M2TS) and ISO Base Media File Format (ISOBMFF) reached their final stages of standardization:

  • For M2TS, the standard defines constraints to elementary streams of VVC and EVC to carry them in the packetized elementary stream (PES) packets. Additionally, buffer management mechanisms and transport system target decoder (T-STD) model extension are also defined.
  • For ISOBMFF, the carriage of codec initialization information for VVC and EVC is defined in the standard. Additionally, it also defines samples and sub-samples reflecting the high-level bitstream structure and independently decodable units of both video codecs. For VVC, signaling and extraction of a certain operating point are also supported.
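As background for the ISOBMFF item, an ISOBMFF file is a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. The sketch below walks the top-level boxes of a byte buffer. It only illustrates this generic box structure; it does not parse the VVC-, EVC-, or V3C-specific boxes defined by the new standards, and the function name is hypothetical.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


if __name__ == "__main__":
    # Synthetic example: a 16-byte 'ftyp' box followed by a 12-byte 'mdat' box.
    sample = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0)
    sample += struct.pack(">I4s", 12, b"mdat") + b"abcd"
    for name, start, stop in iter_boxes(sample):
        print(name, start, stop)
```

Container boxes can be traversed by calling the same function again on the payload range of a box.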

Finally, MPEG Systems completed the standard for the carriage of Visual Volumetric Video-based Coding (V3C) data using ISOBMFF. It supports media comprising multiple independent component bitstreams and takes into account that only some portions of immersive media assets need to be rendered, depending on the user’s position and viewport. To this end, the standard defines metadata indicating the relationship between the region in the 3D spatial data to be rendered and its location in the bitstream. In addition, the delivery of ISOBMFF files containing V3C content over DASH and MMT is also specified in this standard.

Research aspects: Carriage of VVC, EVC, and V3C using M2TS or ISOBMFF provides an essential building block within the so-called multimedia systems layer, which typically offers an interoperable interface to the actual media assets and thus opens up a plethora of research challenges. These standards enable efficient and flexible provisioning and/or use of these media assets; how this is done is deliberately not defined in the standards and is subject to competition.

Call for Proposals and Verification Tests

At the 134th MPEG meeting, MPEG issued three Calls for Proposals (CfPs) that are briefly highlighted in the following:

  • Coded Representation of Haptics: Haptics provide an additional layer of entertainment and sensory immersion beyond audio and visual media. This CfP aims to specify a coded representation of haptics data, e.g., to be carried using ISO Base Media File Format (ISOBMFF) files in the context of MPEG-DASH or other MPEG-I standards.
  • MPEG-I Immersive Audio: Immersive Audio will complement other parts of MPEG-I (i.e., Part 3, “Immersive Video” and Part 2, “Systems Support”) in order to provide a suite of standards that will support a Virtual Reality (VR) or an Augmented Reality (AR) presentation in which the user can navigate and interact with the environment using 6 degrees of freedom (6 DoF), that being spatial navigation (x, y, z) and user head orientation (yaw, pitch, roll).
  • New Advanced Genomics Features and Technologies: This CfP aims to collect submissions of new technologies that can (i) provide improvements to the current compression, transport, and indexing capabilities of the ISO/IEC 23092 standards suite, particularly applied to data consisting of very long reads generated by 3rd generation sequencing devices, (ii) provide the support for representation and usage of graph genome references, (iii) include coding modes relying on machine learning processes, satisfying data access modalities required by machine learning and providing higher compression, and (iv) support of interfaces with existing standards for the interchange of clinical data.

Detailed information, including instructions on how to respond to the call for proposals, the requirements that must be considered, the test data to be used, and the submission and evaluation procedures for proponents are available at www.mpeg.org.

Calls for proposals typically mark the beginning of the formal standardization work, whereas verification tests are conducted once a standard has been completed. At the 134th MPEG meeting, and despite the difficulties caused by the pandemic situation, MPEG completed verification tests for Versatile Video Coding (VVC) and Low Complexity Enhancement Video Coding (LCEVC).

For LCEVC, verification tests measured the benefits of enhancing four existing codecs of different generations (i.e., AVC, HEVC, EVC, VVC) using tools as defined in LCEVC within two sets of tests:

  • The first set of tests compared LCEVC-enhanced encoding with full-resolution single-layer anchors. The average bit rate savings produced by LCEVC when enhancing AVC were determined to be approximately 46% for UHD and 28% for HD. When enhancing HEVC, the savings were approximately 31% for UHD and 24% for HD. Test results tend to indicate an overall benefit also when using LCEVC to enhance EVC and VVC.
  • The second set of tests confirmed that LCEVC provided a more efficient means of resolution enhancement of half-resolution anchors than unguided up-sampling. Comparing LCEVC full-resolution encoding with the up-sampled half-resolution anchors, the average bit-rate savings when using LCEVC with AVC, HEVC, EVC and VVC were calculated to be approximately 28%, 34%, 38%, and 32% for UHD and 27%, 26%, 21%, and 21% for HD, respectively.

For VVC, it was already the second round of verification testing including the following aspects:

  • 360-degree video for equirectangular and cubemap formats, where VVC shows on average more than 50% bit rate reduction compared to the previous major generation of MPEG video coding standard known as High Efficiency Video Coding (HEVC), developed in 2013,
  • low-delay applications such as compression of conversational (teleconferencing) and gaming content, where the compression benefit is about 40% on average, and
  • HD video streaming, with an average bit rate reduction of close to 50%.

A previous set of tests for 4K UHD content completed in October 2020 had shown similar gains. These verification tests used formal subjective visual quality assessment testing with “naïve” human viewers. The tests were performed under a strict hygienic regime in two test laboratories to ensure safe conditions for the viewers and test managers.

Research aspects: CfPs offer a unique possibility for researchers to propose research results for adoption into future standards. Verification tests provide objective and/or subjective evaluations of standardized tools and typically conclude the life cycle of a standard. The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards, including their evaluation.

DASH Update!

Finally, I’d like to provide a brief update on MPEG-DASH! At the 134th MPEG meeting, MPEG Systems recommended the approval of ISO/IEC FDIS 23009-1 5th edition. That is, the MPEG-DASH core specification will be available as 5th edition sometime this year. Additionally, MPEG requests that this specification becomes freely available which also marks an important milestone in the development of the MPEG-DASH standard. Most importantly, the 5th edition of this standard incorporates CMAF support as well as other enhancements defined in the amendment of the previous edition. Additionally, the MPEG-DASH subgroup of MPEG Systems is already working on the first amendment to its 5th edition entitled preroll, nonlinear playback, and other extensions. It is expected that the 5th edition will also impact related specifications within MPEG but also in other Standards Developing Organizations (SDOs) such as DASH-IF, i.e., defining interoperability points (IOPs) for various codecs and others, or CTA WAVE (Web Application Video Ecosystem), i.e., defining device playback capabilities such as the Common Media Client Data (CMCD). Both DASH-IF and CTA WAVE provide means for (conformance) test infrastructure for DASH and CMAF.

An updated overview of DASH standards/features can be found in the Figure below.

MPEG-DASH status as of April 2021.

Research aspects: MPEG-DASH was ratified almost ten years ago, which has resulted in a plethora of research articles, mostly related to adaptive bitrate (ABR) algorithms and their impact on streaming performance, including the Quality of Experience (QoE). An overview of bitrate adaptation schemes is provided here including a list of open challenges and issues.
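As a minimal illustration of what an ABR algorithm does, the sketch below implements a simple throughput-based rule: estimate the available bandwidth as the harmonic mean of recent segment throughputs and pick the highest representation that fits within a safety margin. This is one basic scheme among many, not something specified by MPEG-DASH; the function names, the bitrate ladder, and the safety factor of 0.8 are assumptions for illustration.

```python
def harmonic_mean(samples):
    """Harmonic mean, which dampens the effect of short throughput spikes."""
    return len(samples) / sum(1.0 / s for s in samples)


def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bitrate not exceeding safety * estimated throughput."""
    budget = safety * harmonic_mean(throughput_kbps)
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    # Fall back to the lowest representation when nothing fits the budget.
    return candidates[-1] if candidates else min(ladder_kbps)


if __name__ == "__main__":
    ladder = [500, 1000, 2500, 5000]      # hypothetical representations (kbps)
    recent = [4200, 3900, 4100]           # hypothetical measured throughputs (kbps)
    print(select_bitrate(ladder, recent))
```

Buffer-based and hybrid schemes additionally take the playout buffer level into account, which a pure throughput rule like this one ignores.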

The 135th MPEG meeting will again be an online meeting in July 2021. Click here for more information about MPEG meetings and their developments.

Encouraging more Diverse Scientific Collaborations with the ConfFlow application

Introduction

ConfFlow is an application to encourage people with similar or complementary research interests to find each other at conferences. How scientific collaborations are initiated, how people meet and how an intention is developed to work together is an open question. The aim of this follow-up initiative to ConfLab: Meet the Chairs! held at ACM MM 2019 (conflab.ewi.tudelft.nl) is to help people in the multimedia community to connect with potential collaborators.

As a community, Multimedia is so diverse that it is easy for community members to miss out on very useful expertise and potentially fruitful collaborations. There is a lot of latent knowledge and potential synergies that could exist if we were to offer conference attendees an alternative perspective on their similarities to other attendees. As researchers, we typically find connections through talking to people at the conference either through scientific presentations, personal introductions, or by chance.

The aim of ConfFlow is to allow attendees to browse their similarity to other attendees by harvesting publicly available information about them related to their research interests. Depending on the richness of experience that users are looking for, ConfFlow aims to offer an alternative way for researchers to make new research connections through a similarity space. At the basic level, we define the similarity of attendees with an approach similar to paper-reviewer assignment tools, such as the Toronto Paper Matching System (TPMS). Usually, TPMS is used to match reviewers to papers. In an analogous way, ConfFlow creates a visualised similarity space using the publications of the conference attendees. This will allow attendees to interactively explore and find new connections with researchers with complementary research interests (or similar ones). More details about ConfFlow can be found in the associated demo paper [1]. An example snapshot of the application is shown in Figure 1 below.
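To give a feel for how a publication-based similarity between attendees can be computed, here is a minimal sketch. It uses TF-IDF weighted bag-of-words vectors and cosine similarity as a simple stand-in; this is not the method used by TPMS or by ConfFlow (see the demo paper [1] for the actual approach), and the attendee names and texts are made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each attendee to a sparse TF-IDF vector built from their text."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n = len(tokenized)
    # Document frequency: in how many attendees' texts each term appears.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    return {
        name: {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical attendees with concatenated publication titles.
    docs = {
        "alice": "video quality assessment streaming",
        "bob": "video streaming adaptive bitrate",
        "carol": "genome sequencing compression",
    }
    vectors = tfidf_vectors(docs)
    print(cosine(vectors["alice"], vectors["bob"]))
    print(cosine(vectors["alice"], vectors["carol"]))
```

A pairwise similarity matrix built this way could then be projected to two dimensions to obtain a browsable space; the projection step is outside the scope of this sketch.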

ConfFlow was funded by the SIGMM special initiatives fund which supports initiatives related to boosting excellence and strength of SIGMM, addressing opportunities for growth in the community and SIGMM related activities, as well as nurturing new talent. The aim of ConfFlow is to target building on excellence, strengths, and community. 

Figure 1: Visualisation of ConfFlow

This report records our experience and practical issues related to running ConfFlow at ACM Multimedia last year.

Method

Privacy and Ethical Practices

The aim of ConfFlow was to adhere to the highest levels of ethical practice. One of the debates online relates to what is considered private data. One could consider that deriving novel information from publicly available data can still be considered an invasion of privacy [2]. So ConfFlow was proposed and designed to be opt-in only. This means that, unlike the visualisation seen in Figure 1, the identities of everyone shown in the ConfFlow application appeared as just an icon unless the person had activated their account and given permission for others to see it. While this might seem quite strict, there can be unforeseen privacy-related questions when social information is extracted from publicly available information, as those who do not choose to opt in can still become exposed.
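The opt-in rule described above can be expressed as a small display policy. The sketch below is a hypothetical illustration, not ConfFlow's actual code: the class, field names, and placeholder label are assumptions.

```python
from dataclasses import dataclass

ANONYMOUS_LABEL = "icon"  # hypothetical placeholder shown instead of a name


@dataclass
class Attendee:
    name: str
    activated: bool = False    # the person has activated their account
    visible: bool = False      # the person has given permission to be shown


def display_label(attendee):
    """Reveal the identity only when both opt-in conditions hold."""
    if attendee.activated and attendee.visible:
        return attendee.name
    return ANONYMOUS_LABEL


if __name__ == "__main__":
    people = [
        Attendee("A. Researcher", activated=True, visible=True),
        Attendee("B. Author", activated=True),   # activated, no permission
        Attendee("C. Attendee"),                 # never activated
    ]
    print([display_label(p) for p in people])
```

Defaulting both flags to False means that anyone who takes no action stays anonymous, which matches the opt-in-only design.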

Due to this strict opt-in procedure, we needed to find an active way to engage conference attendees by advertising the application through the conference and also getting access to the conference attendee list so we could target and encourage those people to activate their accounts. This required close coordination with the General Chairs of ACM Multimedia 2020.

Application Realization

ConfFlow was rolled out at ACM Multimedia 2020 for conference attendees. Shortly after the building of this application was approved, the coronavirus pandemic hit and ACM Multimedia became a virtual conference. Since the embedding space of ConfFlow needs to be built a priori, we needed to have access to the conference attendee list. The workload for the conference organisers increased significantly as a result of the pandemic, so we did not manage to get the logistical support to optimise the impact of the application. Since we could not get this, we defaulted to visualising the much larger accepted author list. Each identity in ConfFlow needs to be manually verified, which also takes considerable effort.

However, there remained the issue that the application was opt-in. Those who tested the application were disappointed because many people were not visible. Many of the authors in any case did not attend the conference, which exacerbated the sparsity issue. Advertising ConfFlow and encouraging participants to activate their account was extremely hard due to the virtual format of the conference and because it was hard to reach the actual conference attendees.

The demo paper for the application was presented at ACM Multimedia 2020 and was received positively.

Discussion and Recommendations

The instantiation of the app was well-received by community members and the SIGMM board. There were some teething problems that we aim to resolve in a follow-up for the 2021 edition, where we will revise the opt-in policy to something that allows for a better user experience whilst being careful with individual privacy. We also want to make it possible for users to connect directly in the app with people they see in the embedding space, so that the use of ConfFlow as a social connector tool becomes more explicit. We also plan to focus on different ways to advertise and communicate the application to a wider user base. Finally, due to the considerable effort required to verify the identities of all individuals in the visualisations, we would like to build a more efficient procedure to make visualisations in future years less manually intensive. To this end, the SIGMM board has funded a second edition of ConfFlow in order for these improvements to be made, so we can realise the full potential of the idea while also minimising the additional logistical support required from conference general chairs. We look forward to seeing its impact on future research collaborations.

Acknowledgements

ConfFlow was supported in part by the SIGMM New Initiatives Fund and the Dutch NWO funded MINGLE project number 639.022.606. We thank users who gave feedback on the application during prototyping and implementation and the General Chairs of ACM Multimedia 2020 for their support.

References

[1] Ekin Gedik and Hayley Hung. 2020. ConfFlow: A Tool to Encourage New Diverse Collaborations. In Proceedings of the 28th ACM International Conference on Multimedia (MM ’20). Association for Computing Machinery, New York, NY, USA, 4562–4564. DOI:https://doi.org/10.1145/3394171.3414459.
[2] Leanne Townsend and Claire Wallace. 2016. Social Media Research: A Guide to Ethics.

An interview with Irene Viola

Irene at the beginning of her research career.

Describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

My passion for multimedia stems from graphic design, actually. As a teenager, I taught myself Photoshop and I was playing around with coding websites. I chose Cinema and Media Engineering as my bachelor to combine the programming aspects with a media-based sensibility, and there I discovered that all the filters I had used in Photoshop had clear mathematical bases. I was hooked! I think the fact that I was coming from a more graphics background led me to always keep in mind the users who would see the end product. Applying filters and changing the appearance of an image or video needs to consider how the final user will engage with the content, how they will experience it. I think it has been very helpful in my research in the quality of experience for multimedia content.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

I am currently working on immersive multimedia systems, and in particular on real-time communication systems. The vision is to make remote communication more lifelike, and interaction more natural. I think we’re all aware of how different a video call feels from a face-to-face meeting. Immersive multimedia can help users feel more present and connected, even when displaced in different corners of the globe. What I aim to accomplish is to bring this technology to everyday users, overcoming the current limitations.

Can you profile your current research, its challenges, opportunities, and implications?

My research is currently focused on the quality of experience for immersive media systems. There are several aspects to it: one aspect is to improve media delivery systems, be it by creating new compression solutions, or by improving the transmission efficiency through user-adaptive solutions, for example. The core idea is that we need to optimize transmission by keeping in mind how the users will interact with the content. Then there’s the aspect of quantifying the reaction of the users to the contents they’re visualizing, identifying the influencing factors and building models that can predict them. It’s quite challenging because we don’t fully understand yet how, and why, humans react the way they do to certain stimuli. But that’s also what makes it fascinating.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

In terms of impact, I would say my top achievements would be the contributions to standardization bodies. My subjective methodologies were adopted to conduct the evaluation of the JPEG Pleno Call for Proposals for Light Field Compression, and along with my colleagues in VQEG, I have contributed to ITU recommendations. It’s quite gratifying to know that your research can serve the scientific community this way.

Over your distinguished career, what are the top lessons you want to share with the audience?

I think my message would be: don’t be afraid to switch up. Throughout my studies, I changed focus many times: in my bachelor, the focus was on sociological aspects of media, as well as technological ones; in my master, I dived deeper into the engineering side of it; in my PhD, I tried to understand the user reaction to media. Switching up allows you to see the same problem from different sides, which can be extremely useful in order to do successful research.

What is the best joke you know?

Since an image is worth a thousand words, I will leave you with my favourite comic strip, by artist Lee Gatlin:

A comic strip by Lee Gatlin (Original post)

If you were conducting this interview, what questions would you ask, and then what would be your answers?

My question would be: how do you best balance work and life? Which is a question I don’t have an answer for, and I’d like to read what other people do about it. I think research is pretty tough in this sense because you always have the feeling that there’s more that you could do, and if you just spend half an hour more, you can reach greater results. So, it’s hard to step back, and your work becomes your life. I try to be mindful of it and remind myself to disconnect, which also helps to get a fresh perspective.

A recent photo of Irene.

Bio: Irene Viola is a tenure-track researcher at the Centrum Wiskunde & Informatica. Her research interests include multimedia compression, transmission, and quality evaluation (https://www.ireneviola.com).

VQEG Column: New topics

Introduction

Welcome to the fourth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
During the last VQEG plenary meeting (14-18 Dec. 2020) various interesting discussions arose regarding new topics not addressed up to then by VQEG groups, which led to launching three new sub-projects and a new project related to: 1) clarifying the computation of spatial and temporal information (SI and TI), 2) including video quality metrics as metadata in compressed bitstreams, 3) Quality of Experience (QoE) metrics for live video streaming applications, and 4) providing guidelines on implementing objective video quality metrics to the video compression community.
The following sections provide more details about these new activities and try to encourage interested readers to follow and get involved in any of them by subscribing to the corresponding reflectors.

SI and TI Clarification

The VQEG No-Reference Metrics (NORM) group has recently focused on the topic of spatio-temporal complexity, revisiting the Spatial Information and Temporal Information (SI/TI) indicators, which are described in ITU-T Rec. P.910 [1]. They were originally developed for the T1A1 dataset in 1994 [2]. The metrics have found good use over the last 25 years – mostly employed for checking the complexity of video sources in datasets. However, SI/TI definitions contain ambiguities, so the goal of this sub-project is to provide revised definitions eliminating implementation inconsistencies.

Three main topics are discussed by VQEG in a series of online meetings:

  • Comparison of existing publicly available implementations for SI/TI: a comparison was made between several public open-source implementations for SI/TI, based on initial feedback from members of Facebook. Bugs and inconsistencies were identified with the handling of video frame borders, treatment of limited vs. full range content, as well as the reporting of TI values for the first frame. Also, the lack of standardized test vectors was brought up as an issue. As a consequence, a new reference library was developed in Python by members of TU Ilmenau, incorporating all bug fixes that were previously identified, and introducing a new test suite, to which the public is invited to contribute material. VQEG is now actively looking for specific test sequences that will be useful both for validating existing SI/TI implementations and for extending the scope of the metrics, which is related to the next issue described below.
  • Study on how to apply SI/TI on different content formats: the description of SI/TI was found to be not suitable for extended applications such as video with a higher bit depth (> 8 Bit), HDR content, or spherical/3D video. Also, the question was raised on how to deal with the presence of scene changes in content. The community concluded that for content with higher bit depth, SI/TI functions should be calculated as specified, but that the output values could be mapped back to the original 8-Bit range to simplify comparisons. As for HDR, no conclusion was reached, given the inherent complexity of the subject. It was also preliminarily concluded that the treatment of scene changes should not be part of an SI/TI recommendation, instead focusing on calculating SI/TI for short sequences without scene changes, since the way scene changes would be dealt with may depend on the final application of the metrics.
  • Discussion on other relevant uses of SI/TI: it has been widely used for checking video datasets in terms of diversity and classifying content. Also, SI/TI have been used in some no-reference metrics as content features. The question was raised whether SI/TI could be used for predicting how well content could be encoded. The group noted that different encoders would deal with sources differently, e.g. related to noise in the video. It was stated that it would be nice to be able to find a metric that was purely related to content and not affected by encoding or representation.
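For readers unfamiliar with the indicators, the following is a minimal sketch of SI and TI as commonly described for ITU-T Rec. P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. It is not the VQEG reference library. The choices it makes on exactly the ambiguous points listed above are assumptions: border pixels are excluded from the Sobel output, the population standard deviation is used, no TI value is produced for the first frame, and higher bit depths are mapped back to the 8-bit range by a linear scale factor.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Gradient magnitudes for interior pixels only (borders are skipped)."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def si_ti(frames, bit_depth=8):
    """Return (SI, TI) for a list of luma frames given as 2D lists.

    TI is None when fewer than two frames are given.
    """
    scale = 255 / (2 ** bit_depth - 1)  # assumed linear mapping to 8-bit range
    si = max(pstdev(sobel_magnitude(f)) for f in frames) * scale
    diffs = [
        [b - a for row_a, row_b in zip(prev, cur) for a, b in zip(row_a, row_b)]
        for prev, cur in zip(frames, frames[1:])
    ]
    ti = max(pstdev(d) for d in diffs) * scale if diffs else None
    return si, ti


if __name__ == "__main__":
    black = [[0] * 4 for _ in range(4)]
    edge = [[0, 0, 100, 100] for _ in range(4)]  # vertical edge appears
    print(si_ti([black, edge]))
```

Pure-Python loops keep the sketch dependency-free but are far too slow for real video; an array library would be the practical choice.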

As a first step, this revision of the topic of SI/TI has resulted in a harmonized implementation and in the identification of future application areas. Discussions on these topics will continue in the next months through audio-calls that are open to interested readers.

Video Quality Metadata Standard

Also within NORM group, another topic was launched related to the inclusion of video quality metadata in compressed streams [3].

Almost all modern transcoding pipelines use full-reference video quality metrics to decide on the most appropriate encoding settings. The computation of these quality metrics is demanding in terms of time and computational resources. In addition, estimation errors propagate and accumulate when quality metrics are recomputed several times along the transcoding pipeline. Thus, retaining the results of these metrics with the video can alleviate these constraints, requiring very little space and providing a “greener” way of estimating video quality. With this goal, the new sub-project has started working towards the definition of a standard format to include video quality metrics metadata both at video bitstream level and system layer [4].

In this sense, the experts involved in the new sub-project are working on the following items:

  • Identification of existing proposals and working groups within other standardisation bodies and organisations that address similar topics and propose amendments including new requirements. For example, MPEG has already worked on adding video quality metrics (e.g., PSNR, SSIM, MS-SSIM, VQM, PEVQ, MOS, FISG) metadata at system level (e.g., in MPEG2 streams [5], HTTP [6], etc. [7]).
  • Identification of quality metrics to be considered in the standard. In principle, validated and standardized metrics are of interest, although other metrics can also be considered after a validation process on a standard set of subjective data (e.g., using existing datasets). Metrics newer than those used in previous approaches (e.g., VMAF [8], FB-MOS [9]) are of special interest.
  • Consideration of the computation of multiple generations of full-reference metrics at different steps of the transcoding chain, of the use of metrics at different resolutions, different spatio-temporal aggregation methods, etc.
  • Definition of a standard video quality metadata payload, including relevant fields such as metric name (e.g., “SSIM”), version (e.g., “v0.6.1”), raw score (e.g., “0.9256”), mapped-to-MOS score (e.g., “3.89”), scaling method (e.g., “Lanczos-5”), temporal reference (e.g., “0-3” frames), aggregation method (e.g., “arithmetic mean”), etc. [4].
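The fields listed in the last item can be pictured as a small record. The sketch below is purely illustrative: the payload format is still being defined by the sub-project, so the class, the field names, and the JSON serialization are assumptions, and only the example values are taken from the list above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QualityMetadata:
    metric_name: str            # e.g. "SSIM"
    version: str                # e.g. "v0.6.1"
    raw_score: float            # metric output on its native scale
    mos_score: float            # score mapped to the MOS scale
    scaling_method: str         # e.g. "Lanczos-5"
    temporal_reference: tuple   # (first_frame, last_frame) covered by the score
    aggregation_method: str     # e.g. "arithmetic mean"


def to_json(payload):
    """Serialize the record; JSON is an assumed container, not the standard's."""
    return json.dumps(asdict(payload), sort_keys=True)


if __name__ == "__main__":
    payload = QualityMetadata(
        metric_name="SSIM",
        version="v0.6.1",
        raw_score=0.9256,
        mos_score=3.89,
        scaling_method="Lanczos-5",
        temporal_reference=(0, 3),
        aggregation_method="arithmetic mean",
    )
    print(to_json(payload))
```

Carrying the metric version and scaling method alongside the score is what would let a downstream stage decide whether a stored value is comparable with its own measurements.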

More details and information on how to join this activity can be found on the NORM webpage.

QoE metrics for live video streaming applications

The VQEG Audiovisual HD Quality (AVHD) group launched a new sub-project on QoE metrics for live media streaming applications (Live QoE) in the last VQEG meeting [10].

The success of a live multimedia streaming session is defined by the experience of a participating audience. Both the content communicated by the media and the quality at which it is delivered matter – for the same content, the quality delivered to the viewer is a differentiating factor. Live media streaming systems undertake a lot of investment and operate under very tight service availability and latency constraints to support multimedia sessions for their audience. Both to measure the return on investment and to make sound investment decisions, it is paramount that we be able to measure the media quality offered by these systems. In this sense, given the large scale and complexity of media streaming systems, objective metrics are needed to measure QoE.

Therefore, the following topics have been identified and are studied [11]:

  • Creation of a high-quality dataset, including media clips and subjective scores, which will be used to tune, train and develop objective QoE metrics. This dataset should represent the conditions that take place in typical live media streaming situations; therefore, conditions and impairments comprising audio and video tracks (independently and jointly) will be considered. In addition, this dataset should cover a diverse set of content categories, including premium content (e.g., sports, movies, concerts, etc.) and user generated content (e.g., music, gaming, real life content, etc.).
  • Development of QoE objective metrics, especially focusing on no-reference or near-no-reference metrics, given the lack of access to the original video at various points in the live media streaming chain. Different types of models will be considered including signal-based (operate on the decoded signal), metadata-based (operate on available metadata, e.g. codecs, resolution, framerate, bitrate, etc.), bitstream-based (operate on the parsed bitstream), and hybrid models (combining signal and metadata) [12]. Also, machine-learning based models will be explored.

Several challenges are expected when dealing with these two topics, such as separating “content” from “quality” (taking into account that content plays a big role in engagement and acceptability), spectrum expectations, the role of network impairments, and the collection of enough data to develop robust models [11]. Readers interested in joining this effort are encouraged to visit the AVHD webpage for more details.

Implementer’s Guide to Video Quality Metrics

In the last meeting, a new dedicated group on Implementer’s Guide to Video Quality Metrics (IGVQM) was set up to introduce objective video quality metrics to the video compression community and to provide guidelines on implementing them.

During the development of new video coding standards, peak-signal-to-noise-ratio (PSNR) has traditionally been used as the main objective metric to determine which new coding tools are to be adopted. It has furthermore been used to establish the bitrate savings that a new coding standard offers over its predecessor through the so-called “BD-rate” metric [13], which still relies on PSNR for measuring quality.
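The two quantities mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `psnr` works on flat sequences of 8-bit sample values, and `bd_rate` follows the commonly described Bjøntegaard procedure (a cubic through four rate-distortion points per codec, with log-rate integrated over the shared PSNR range). The function names and the sample rate-distortion points are hypothetical.

```python
import math

def psnr(ref, dist, peak=255.0):
    """PSNR in dB between two equally sized sequences of sample values."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def _cubic_fit(xs, ys):
    """Coefficients c0..c3 of the cubic through four points (Gaussian elimination)."""
    n = 4
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = a[r][n] - sum(a[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = s / a[r][r]
    return coef

def _integral(coef, lo, hi):
    """Definite integral of the polynomial with coefficients coef over [lo, hi]."""
    def anti(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return anti(hi) - anti(lo)

def bd_rate(anchor, test):
    """Average bitrate difference in percent at equal PSNR.

    anchor, test: four (bitrate, psnr_db) points each; negative means savings.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    mid = (lo + hi) / 2.0  # centre the PSNR axis to keep the fit well conditioned

    def avg_log_rate(points):
        coef = _cubic_fit([q - mid for _, q in points],
                          [math.log10(r) for r, _ in points])
        return _integral(coef, lo - mid, hi - mid) / (hi - lo)

    return (10.0 ** (avg_log_rate(test) - avg_log_rate(anchor)) - 1.0) * 100.0

# Hypothetical RD points: the test codec needs 20% less bitrate at every PSNR.
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_points = [(800, 32.0), (1600, 35.0), (3200, 38.0), (6400, 41.0)]
print(round(bd_rate(anchor_points, test_points), 3))
```

Because the quality axis is simply whatever score accompanies each bitrate, the same routine could in principle be driven by another metric in place of PSNR.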

Although this choice was fully justified for the first image/video coding standards – JPEG (1992), MPEG-1 (1994), MPEG-2 (1996), JPEG2000 and even H.264/AVC (2004) – since there was simply no alternative at that time, its continuing use for the development of H.265/HEVC (2013), VP9 (2013), AV1 (2018) and most recently EVC and VVC (2020) is questionable, given the rapid and continuous evolution of more perceptual image/video objective quality metrics, such as SSIM (2004) [14], MS-SSIM (2003) [15], and VMAF (2015) [8].

This project attempts to offer some guidance to the video coding community, including standards setting organisations, on how to better utilise existing objective video quality metrics to better capture the improvements offered by video coding tools. For this, the following goals have been envisioned:

  • Address video compression and scaling impairments only.
  • Explore and use “state-of-the-art” full-reference (pixel) objective metrics, examine applicability of no-reference objective metrics, and obtain reference implementations of them.
  • Offer temporal aggregation methods of image quality metrics into video quality metrics.
  • Present statistical analysis of existing subjective datasets, constraining them to compression and scaling artifacts.
  • Highlight differences among objective metrics and use-cases. For example, in case of very small differences, which metric is more sensitive? Which quality range is better served by what metric?
  • Offer standard logistic mappings of objective metrics to a normalised linear scale.
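Two of the goals above, temporal aggregation and logistic mapping, lend themselves to a short sketch. The Python below is only an illustration of the general techniques, not the guide's recommended method: the pooling options (arithmetic mean, harmonic mean with a +1 offset, low percentile) are common choices in the literature, and the `midpoint` and `slope` parameters of the logistic are hypothetical values that would in practice be fitted against subjective scores.

```python
import math

def pool_scores(frame_scores, method="mean", percentile=5.0):
    """Aggregate per-frame quality scores into one video-level score."""
    scores = list(frame_scores)
    if not scores:
        raise ValueError("at least one frame score is required")
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "harmonic":
        # The +1 offset keeps a zero score finite; low-quality frames weigh more.
        return len(scores) / sum(1.0 / (s + 1.0) for s in scores) - 1.0
    if method == "percentile":
        # Linear interpolation between the two nearest ranked scores.
        ordered = sorted(scores)
        rank = (percentile / 100.0) * (len(ordered) - 1)
        lo, hi = math.floor(rank), math.ceil(rank)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
    raise ValueError(f"unknown pooling method: {method}")

def logistic_map(score, midpoint, slope, lo=0.0, hi=100.0):
    """Map a raw metric value onto a normalised [lo, hi] scale with a logistic curve."""
    return lo + (hi - lo) / (1.0 + math.exp(-slope * (score - midpoint)))

# Hypothetical per-frame PSNR values with one badly impaired frame.
frames = [38.0, 39.0, 37.5, 24.0, 38.5]
print(pool_scores(frames, "mean"), pool_scores(frames, "harmonic"))
print(logistic_map(pool_scores(frames, "mean"), midpoint=35.0, slope=0.3))
```

Harmonic and percentile pooling penalise short quality drops more strongly than the arithmetic mean, which is one of the sensitivity differences such a guide would need to document.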

More details can be found in the working document that has been set up to launch the project [16] and on the VQEG website.

References

[1] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[2] M. H. Pinson and A. Webster, “T1A1 Validation Test Database,” VQEG eLetter, vol. 1, no. 2, 2015.
[3] I. Katsavounidis, “Video quality metadata in compressed bitstreams”, Presentation in VQEG Meeting, Dec. 2020.
[4] I. Katsavounidis et al., “A case for embedding video quality metrics as metadata in compressed bitstreams”, working document, 2019.
[5] ISO/IEC 13818-1:2015/AMD 6:2016 Carriage of Quality Metadata in MPEG2 Streams.
[6] ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH).
[7] ISO/IEC 23001-10, MPEG Systems Technologies – Part 10: Carriage of timed metadata metrics of media in ISO base media file format.
[8] Toward a practical perceptual video quality metric, Tech blog with VMAF’s open sourcing on Github, Jun. 6, 2016.
[9] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[10] R. Puri, “On a QoE metric for live media streaming applications”, Presentation in VQEG Meeting, Dec. 2020.
[11] R. Puri and S. Satti, “On a QoE metric for live media streaming applications”, working document, Jan. 2021.
[12] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, Oct. 2020.
[13] G. Bjøntegaard, “Calculation of Average PSNR Differences Between RD-Curves”, Document VCEG-M33, ITU-T SG 16/Q6, 13th VCEG Meeting, Austin, TX, USA, Apr. 2001.
[14] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” in IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
[15] Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, Pacific Grove, CA, USA, 2003.
[16] I. Katsavounidis, “VQEG’s Implementer’s Guide to Video Quality Metrics (IGVQM) project”, working document, 2021.

MPEG Column: 133rd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 133rd MPEG meeting was once again held as an online meeting, and this time, kicked off with great news, that MPEG is one of the organizations honored as a 72nd Annual Technology & Engineering Emmy® Awards Recipient, specifically the MPEG Systems File Format Subgroup and its ISO Base Media File Format (ISOBMFF) et al.

The official press release can be found here and comprises the following items:

  • 6th Emmy® Award for MPEG Technology: MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
  • Essential Video Coding (EVC) verification test finalized
  • MPEG issues a Call for Evidence on Video Coding for Machines
  • Neural Network Compression for Multimedia Applications – MPEG calls for technologies for incremental coding of neural networks
  • MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
  • MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
  • MPEG Systems reached the first milestone to carry event messages in tracks of the ISO Base Media File Format

In this report, I’d like to focus on ISOBMFF, EVC, CMAF, and DASH.

MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award

MPEG is pleased to report that the File Format subgroup of MPEG Systems is being recognized this year by the National Academy for Television Arts and Sciences (NATAS) with a Technology & Engineering Emmy® for their 20 years of work on the ISO Base Media File Format (ISOBMFF). This format was first standardized in 1999 as part of the MPEG-4 Systems specification and is now in its 6th edition as ISO/IEC 14496-12. It has been used and adopted by many other specifications, e.g.:

  • MP4 and 3GP file formats;
  • Carriage of NAL unit structured video in the ISO Base Media File Format which provides support for AVC, HEVC, VVC, EVC, and probably soon LCEVC;
  • MPEG-21 file format;
  • Dynamic Adaptive Streaming over HTTP (DASH) and Common Media Application Format (CMAF);
  • High-Efficiency Image Format (HEIF);
  • Timed text and other visual overlays in ISOBMFF;
  • Common encryption format;
  • Carriage of timed metadata metrics of media;
  • Derived visual tracks;
  • Event message track format;
  • Carriage of uncompressed video;
  • Omnidirectional Media Format (OMAF);
  • Carriage of visual volumetric video-based coding data;
  • Carriage of geometry-based point cloud compression data;
  • … to be continued!

This is MPEG’s fourth Technology & Engineering Emmy® Award (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, and MPEG-2 Transport Stream in 2013) and sixth overall Emmy® Award including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017, respectively.

Essential Video Coding (EVC) verification test finalized

At the 133rd MPEG meeting, a verification testing assessment of the Essential Video Coding (EVC) standard was completed. The first part of the EVC verification test using high dynamic range (HDR) and wide color gamut (WCG) was completed at the 132nd MPEG meeting. A subjective quality evaluation was conducted comparing the EVC Main profile to the HEVC Main 10 profile and the EVC Baseline profile to AVC High 10 profile, respectively:

  • Analysis of the subjective test results showed that the average bitrate savings for EVC Main profile are approximately 40% compared to HEVC Main 10 profile, using UHD and HD SDR content encoded in both random access and low delay configurations.
  • The average bitrate savings for the EVC Baseline profile compared to the AVC High 10 profile is approximately 40% using UHD SDR content encoded in the random-access configuration and approximately 35% using HD SDR content encoded in the low delay configuration.
  • Verification test results using HDR content had shown average bitrate savings for EVC Main profile of approximately 35% compared to HEVC Main 10 profile.

By providing significantly improved compression efficiency compared to HEVC and earlier video coding standards while encouraging the timely publication of licensing terms, the MPEG-5 EVC standard is expected to meet the market needs of emerging delivery protocols and networks, such as 5G, enabling the delivery of high-quality video services to an ever-growing audience. 

In addition to the verification tests, EVC, along with VVC and CMAF, was subject to further improvements to its systems support.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. Additionally, the availability of (efficient) open-source implementations (e.g., x264, x265, soon x266, VVenC, aomenc, etc.) is vital for its adoption in the (academic) research community.

MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)

At the 133rd MPEG meeting, MPEG Systems promoted Amendment 2 of the Common Media Application Format (CMAF) to Committee Draft Amendment (CDAM) status, the first major milestone in the ISO/IEC approval process. This amendment defines:

  • constraints to (i) Versatile Video Coding (VVC) and (ii) Essential Video Coding (EVC) video elementary streams when carried in a CMAF video track;
  • codec parameters to be used for CMAF switching sets with VVC and EVC tracks; and
  • support of the newly introduced MPEG-H 3D Audio profile.

It is expected to reach its final milestone in early 2022. For research aspects related to CMAF, the reader is referred to the next section about DASH.

MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)

At the 133rd MPEG meeting, MPEG Systems promoted Part 8 of Dynamic Adaptive Streaming over HTTP (DASH) also referred to as “Session-based DASH” to its final stage of standardization (i.e., Final Draft International Standard (FDIS)).

Historically, in DASH, every client uses the same Media Presentation Description (MPD), as it best serves the scalability of the service. However, there have been increasing requests from the industry to enable customized manifests for enabling personalized services. MPEG Systems has standardized a solution to this problem without sacrificing scalability. Session-based DASH adds a mechanism to the MPD to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests.
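The idea of deriving request URLs from per-session values can be sketched as follows. This Python example is a simplified illustration only: it does not reproduce the actual SBD document syntax, and it assumes the session information has already been parsed into a plain dictionary. The function name, the key names, and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def apply_session_values(segment_url, session_values):
    """Return segment_url with per-session key/value pairs set in its query string.

    Existing keys are overwritten in place; new keys are appended.
    """
    parts = urlsplit(segment_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    existing = {key for key, _ in query}
    for key, value in session_values.items():
        if key in existing:
            query = [(k, value if k == key else v) for k, v in query]
        else:
            query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Every client shares the same MPD-derived URL; only the session values differ.
shared_url = "https://cdn.example.com/video/seg-1.m4s"
print(apply_session_values(shared_url, {"sessionId": "abc123"}))
```

Keeping the MPD identical for all clients and moving the personalised part into a small per-session lookup is what lets this approach preserve cacheability of the manifest.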

An updated overview of DASH standards/features can be found in the Figure below.

MPEG DASH Status as of January 2021.

Research aspects: CMAF is most likely becoming the main segment format to be used in the context of HTTP adaptive streaming (HAS) and, thus, also DASH (hence also the name common media application format). Supporting a plethora of media coding formats will inevitably result in a multi-codec dilemma to be addressed in the near future, as there will be no flag day where everyone will switch to a new coding format. Thus, designing efficient bitrate ladders for multi-codec delivery will be an interesting research aspect, which needs to include device/player support (i.e., some devices/players will support only a subset of available codecs), storage capacity/costs within the cloud as well as within the delivery network, and network distribution capacity/costs (i.e., CDN costs).

The 134th MPEG meeting will again be an online meeting in April 2021. Click here for more information about MPEG meetings and their developments.

JPEG Column: 90th JPEG Meeting

JPEG AI becomes a new work item of ISO/IEC

The 90th JPEG meeting was held online from 18 to 22 January 2021. This meeting was distinguished by very relevant activities, notably the new JPEG AI standardization project planning, and the analysis of the Call for Evidence on JPEG Pleno Point Cloud Coding.

The new JPEG AI Learning-based Image Coding System has become an official new work item registered under ISO/IEC 6048 and aims at providing compression efficiency in addition to supporting image processing and computer vision tasks without the need for decompression.

The response to the Call for Evidence on JPEG Pleno Point Cloud Coding was a learning-based method that was found to offer state of the art compression efficiency.  Considering this response, the JPEG Pleno Point Cloud activity will analyse the possibility of preparing a future call for proposals on learning-based coding solutions that will also consider new functionalities, building on the relevant use cases already identified that require machine learning tasks processed in the compressed domain.

Meanwhile the new JPEG XL coding system has reached FDIS stage and it is ready for adoption. JPEG XL offers compression efficiency similar to the best state of the art in image coding, the best lossless compression performance, affordable low complexity and integration with the legacy JPEG image coding standard allowing a friendly transition between the two standards.

The new JPEG AI logo.

The 90th JPEG meeting had the following highlights:

  • JPEG AI,
  • JPEG Pleno Point Cloud response to the Call for Evidence,
  • JPEG XL Core Coding System reaches FDIS stage,
  • JPEG Fake Media exploration,
  • JPEG DNA continues the exploration on image coding suitable for DNA storage,
  • JPEG systems,
  • JPEG XS 2nd edition of Profiles reaches DIS stage.

JPEG AI

The scope of the JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline.

JPEG AI has made several advances during the 90th technical meeting. During this meeting, the JPEG AI Use Cases and Requirements were discussed and collaboratively defined. Moreover, the JPEG AI vision and the overall system framework of an image compression solution with efficient compressed domain representation was defined. Following this approach, a set of exploration experiments were defined to assess the capabilities of the compressed representation generated by learning-based image codecs, considering some specific computer vision and image processing tasks.

Moreover, the performance assessment of the most popular objective quality metrics, using subjective scores obtained during the call for evidence, was discussed, as well as anchors and some techniques to perform spatial prediction and entropy coding.

JPEG Pleno Point Cloud response to the Call for Evidence

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 90th JPEG meeting, the JPEG Committee reached an exciting major milestone and reviewed the results of its Final Call for Evidence on JPEG Pleno Point Cloud Coding. With an innovative Deep Learning based point cloud codec supporting scalability and random access submitted, the Call for Evidence results highlighted the emerging role of Deep Learning in point cloud representation and processing. Between the 90th and 91st meetings, the JPEG Committee will be refining the scope and direction of this activity in light of the results of the Call for Evidence.

JPEG XL Core Coding System reaches FDIS stage

The JPEG Committee has finalized JPEG XL Part 1 (Core Coding System), which is now at FDIS stage. The committee has defined new core experiments to determine appropriate profiles and levels for the codec, as well as appropriate criteria for defining conformance. With Part 1 complete, and Part 2 close to completion, JPEG XL is ready for evaluation and adoption by the market.

JPEG Fake Media exploration

The JPEG Committee initiated the JPEG Fake Media exploration study with the objective of creating a standard that can facilitate secure and reliable annotation of media asset generation and modifications. The initiative aims to support usage scenarios that are in good faith as well as those with malicious intent. During the 90th JPEG meeting, the committee released a new version of the document entitled “JPEG Fake Media: Context Use Cases and Requirements”, which is available on the JPEG website. A first workshop on the topic was organized on the 15th of December 2020. The program, presentations and a video recording of this workshop are available on the JPEG website. A second workshop will be organized around March 2021. More details will be made available soon on JPEG.org. JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and subscribe to the mailing list via http://listregistration.jpeg.org.

JPEG DNA continues the exploration on image coding suitable for DNA storage

The JPEG Committee continued its exploration for coding of images in quaternary representation, particularly suitable for DNA storage. After a second successful workshop presentation by stakeholders, additional requirements were identified, and a new version of the JPEG DNA overview document was issued and made publicly available. It was decided to continue this exploration by organising a third workshop and further outreach to stakeholders, as well as a proposal for an updated version of the JPEG overview document. Interested parties are invited to refer to the following URL and to consider joining the effort by registering to the mailing list of JPEG DNA here: https://jpeg.org/jpegdna/index.html.

JPEG Systems

The JUMBF (ISO/IEC 19566-5) Amendment 1 draft review is complete and it is proceeding to international standard and subsequent publication; additional features to support new applications are under consideration. Likewise, the JPEG 360 (ISO/IEC 19566-6) Amendment 1 draft review is complete, and it is proceeding to international standard and subsequent publication. The JLINK (ISO/IEC 19566-7) standard completed the committee draft review and is preparing a DIS study text ahead of the 91st meeting. JPEG Snack (ISO/IEC 19566-8) will proceed to a second working draft. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.

JPEG XS 2nd edition of Profiles reaches DIS stage

The 2nd edition of Part 2 (Profiles) is now at the DIS stage and defines the required new profiles and levels to support the compression of raw Bayer content, mathematically lossless coding of up to 12-bit per component images, and 4:2:0 sampled image content. With the second editions of Parts 1, 2, and 3 completed, and the scheduled second editions of Part 4 (Conformance) and 5 (Reference Software), JPEG XS will soon have received a complete backwards-compatible revision of its entire suite of standards. Moreover, the committee defined a new exploration study to create new coding tools for improving the HDR and mathematically lossless compression capabilities, while still honoring the low-complexity and low-latency requirements.

Final Quote

“The official approval of JPEG AI by JPEG Parent Bodies ISO and IEC is a strong signal of support of this activity and its importance in the creation of AI-based imaging applications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No 91, will be held online from April 19 to 23, 2021.
  • No 92, will be held online from July 7 to 13, 2021.

ITU-T Standardization Activities Targeting Gaming Quality of Experience

Motivation for Research in the Gaming Domain

The gaming industry has eminently managed to intrinsically motivate users to interact with their services. According to the latest report of Newzoo, there will be an estimated total of 2.7 billion players across the globe by the end of 2020. The global games market will generate revenues of $159.3 billion in 2020 [1]. This is about four times the value of the movie industry (box offices and streaming services) and almost three times that of the music industry [2].

The rapidly growing domain of online gaming emerged in the late 1990s and early 2000s, allowing social relatedness to a great number of players. During traditional online gaming, typically, the game logic and the game user interface are locally executed and rendered on the player’s hardware. The client device is connected via the internet to a game server to exchange information influencing the game state, which is then shared and synchronized with all other players connected to the server. However, in 2009 a new concept called cloud gaming emerged that is comparable to the rise of Netflix for video consumption and Spotify for music consumption. In contrast to traditional online gaming, cloud gaming is characterized by the execution of the game logic, rendering of the virtual scene, and video encoding on a cloud server, while the player’s client is solely responsible for video decoding and capturing of client input [3].

For online gaming and cloud gaming services, in contrast to applications such as voice, video, and web browsing, little information existed on factors influencing the Quality of Experience (QoE) of online video games, on subjective methods for assessing gaming QoE, or on instrumental prediction models to plan and manage QoE during service set-up and operation. For this reason, Study Group (SG) 12 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has decided to work on these three interlinked research tasks [4]. This was especially required since the evaluation of gaming applications is fundamentally different compared to task-oriented human-machine interactions. Traditional aspects such as effectiveness and efficiency as part of usability cannot be directly applied to gaming applications, as a game without any challenges and time passing would result in boredom and, thus, a bad player experience (PX). The absence of standardized assessment methods, as well as of knowledge about the quantitative and qualitative impact of influence factors, resulted in a situation where many researchers tended to use their own self-developed research methods. This makes collaborative work through reliable, valid, and comparable research very difficult. Therefore, it is the aim of this report to provide an overview of the achievements reached by ITU-T standardization activities targeting gaming QoE.

Theory of Gaming QoE

As a basis for the gaming research carried out, in 2013 a taxonomy of gaming QoE aspects was proposed by Möller et al. [5]. The taxonomy is divided into two layers, of which the top layer contains various influencing factors grouped into user (also human), system (also content), and context factors. The bottom layer consists of game-related aspects including hedonic concepts such as appeal, pragmatic concepts such as learnability and intuitivity (part of playing quality, which can be considered as a kind of game usability), and finally, the interaction quality. The latter is composed of output quality (e.g., audio and video quality), as well as input quality and interactive behaviour. Interaction quality can be understood as the playability of a game, i.e., the degree to which all functional and structural elements of a game (hardware and software) enable a positive PX. The second part of the bottom layer summarizes concepts related to the PX such as immersion (see [6]), positive and negative affect, as well as the well-known concept of flow, which describes an equilibrium between requirements (i.e., challenges) and abilities (i.e., competence). Consequently, based on the theory depicted in the taxonomy, the question arises which of these aspects are relevant (i.e., dominant), how they can be assessed, and to which extent they are impacted by the influencing factors.

Fig. 1: Taxonomy of gaming QoE aspects. Upper panel: Influence factors and interaction performance aspects; lower panel: quality features (cf. [5]).

Introduction to Standardization Activities

Building upon this theory, the SG 12 of the ITU-T has decided during the 2013-2016 Study Period to start work on three new work items called P.GAME, G.QoE-gaming, and G.OMG. However, there are also other related activities at the ITU-T summarized in Fig. 2 about evaluation methods (P.CrowdG), and gaming QoE modelling activities (G.OMMOG and P.BBQCG).

Fig. 2: Overview of ITU-T SG12 recommendations and on-going work items related to gaming services.

The efforts on the three initial work items continued during the 2017-2020 Study Period resulting in the recommendations G.1032, P.809, and G.1072, for which an overview will be given in this section.

ITU-T Rec. G.1032 (G.QoE-gaming)

The ITU-T Rec. G.1032 aims at identifying the factors which potentially influence gaming QoE. For this purpose, the Recommendation provides an overview table and then roughly classifies the influence factors into (A) human, (B) system, and (C) context influence factors. This classification is based on [7] but is now detailed with respect to cloud and online gaming services. Furthermore, the recommendation considers whether an influencing factor carries an influence mainly in a passive viewing-and-listening scenario, in an interactive online gaming scenario, or in an interactive cloud gaming scenario. This classification helps evaluators decide which type of impact may be evaluated with which type of test paradigm [4]. An overview of the influencing factors identified for the ITU-T Rec. G.1032 is presented in Fig. 3. For subjective user studies, in most cases the human and context factors should be controlled and their influence should be reduced as much as possible. For example, even though it might be a highly impactful aspect of today’s gaming domain, within the scope of the ITU-T cloud gaming modelling activities, only single-player user studies are conducted to reduce the impact of social aspects, which are very difficult to control. On the other hand, as network operators and service providers are the intended stakeholders of gaming QoE models, the relevant system factors must be included in the development process of the models, in particular the game content as well as network and encoding parameters.

Fig. 3: Overview of influencing factors on gaming QoE summarized in ITU-T Rec. G.1032 (cf. [3]).

ITU-T Rec. P.809 (P.GAME)

The aim of the ITU-T Rec. P.809 is to describe subjective evaluation methods for gaming QoE. Since there is no single standardized evaluation method available that would cover all aspects of gaming QoE, the recommendation mainly summarizes the state of the art of subjective evaluation methods in order to help to choose suitable methods to conduct subjective experiments, depending on the purpose of the experiment. In its main body, the draft consists of five parts: (A) Definitions for games considered in the Recommendation, (B) definitions of QoE aspects relevant in gaming, (C) a description of test paradigms, (D) a description of the general experimental set-up, recommendations regarding passive viewing-and-listening tests and interactive tests, and (E) a description of questionnaires to be used for gaming QoE evaluation. It is amended by two paragraphs regarding performance and physiological response measurements and by (non-normative) appendices illustrating the questionnaires, as well as an extensive list of literature references [4].

Fundamentally, the ITU-T Rec. P.809 defines two test paradigms to assess gaming quality:

  • Passive tests with predefined audio-visual stimuli passively observed by a participant.
  • Interactive tests with game scenarios interactively played by a participant.

The passive paradigm can be used for gaming quality assessment when the impairment does not influence the interaction of players. This method suggests a short stimulus duration of 30s which allows investigating a great number of encoding conditions while reducing the influence of user behaviours on the stimulus due to the absence of their interaction. Even for passive tests, as the subjective ratings will be merged with those derived from interactive tests for QoE model developments, it is recommended to give instruction about the game rules and objectives to allow participants to have similar knowledge of the game. The instruction should also explain the difference between video quality and graphic quality (e.g., graphical details such as abstract and realistic graphics), as this is one of the common mistakes of participants in video quality assessment of gaming content.

The interactive test should be used when other quality features such as interaction quality, playing quality, immersion, and flow are under investigation. While for the interaction quality, a duration of 90s is proposed, a longer duration of 5-10min is suggested in the case of research targeting engagement concepts such as flow. Finally, the recommendation provides information about the selection of game scenarios as stimulus material for both test paradigms, e.g., ability to provide repetitive scenarios, balanced difficulty, representative scenes in terms of encoding complexity, and avoiding ethically questionable content.

ITU-T Rec. G.1072 (G.OMG)

The quality management of gaming services would require quantitative prediction models. Such models should be able to predict either “overall quality” (e.g., in terms of a Mean Opinion Score), or individual QoE aspects from characteristics of the system, potentially considering the player characteristics and the usage context. ITU-T Rec. G.1072 aims at the development of quality models for cloud gaming services based on the impact of impairments introduced by typical Internet Protocol (IP) networks on the quality experienced by players. G.1072 is a network planning tool that estimates the gaming QoE based on the assumption of network and encoding parameters as well as game content.

The impairment factors are derived from subjective ratings of the corresponding quality aspects, e.g., spatial video quality or interaction quality, and modelled by non-linear curve fitting. For the prediction of the overall score, linear regression is used. To create the impairment factors and regression, a data transformation from the MOS values of each test condition to the R-scale was performed, similar to the well-known E-model [8]. The R-scale, which results from an s-shaped conversion of the MOS scale, promises benefits regarding the additivity of the impairments and compensation for the fact that participants tend to avoid using the extremes of rating scales [3].
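To make the MOS-to-R transformation concrete, the following minimal sketch uses the R-to-MOS mapping of the E-model [8] and inverts it numerically. It is an illustration only: the function names are hypothetical, and G.1072 may use a differently scaled variant of this conversion.

```python
def r_to_mos(r: float) -> float:
    """MOS from transmission rating R, using the E-model (ITU-T G.107) mapping."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def mos_to_r(mos: float, tol: float = 1e-6) -> float:
    """Invert r_to_mos numerically by bisection (hypothetical helper).

    The forward mapping increases monotonically on [6.5, 100], so the
    search is restricted to that interval; MOS is clamped to [1.0, 4.5].
    """
    mos = min(max(mos, 1.0), 4.5)
    lo, hi = 6.5, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid  # target lies in the upper half
        else:
            hi = mid  # target lies in the lower half
    return (lo + hi) / 2.0
```

The s-shape is visible in the cubic term: it compresses the mapping near both ends of the scale, so a fixed MOS difference near the extremes corresponds to a larger difference on the R-scale than the same MOS difference in the middle.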

As the impact of the input parameters, e.g. delay, was shown to be highly content-dependent, the model uses two modes. If no assumption on a game sensitivity class towards degradations is available to the user of the model (e.g. a network provider), the “default” mode of operation should be used, which considers the highest (sensitivity) game class. The “default” mode of operation will result in a pessimistic quality prediction for games that are not of high complexity and sensitivity. If the user of the model can make an assumption about the game class (e.g. a service provider), the “extended” mode can predict the quality with a higher degree of accuracy based on the assigned game classes.
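The choice between the two modes can be sketched as a simple fallback rule. The class labels and the function name below are hypothetical placeholders, not the identifiers used in the Recommendation; the sketch only illustrates that the absence of an assumption leads to the most sensitive class.

```python
from typing import Optional, Tuple

# Hypothetical class labels, ordered from least to most sensitive.
SENSITIVITY_CLASSES = ("low", "medium", "high")


def select_game_class(assumed_class: Optional[str] = None) -> Tuple[str, str]:
    """Return (mode, game_class) following the two-mode logic described above."""
    if assumed_class is None:
        # No assumption available: fall back to the most sensitive class,
        # which gives a pessimistic prediction for less sensitive games.
        return "default", SENSITIVITY_CLASSES[-1]
    if assumed_class not in SENSITIVITY_CLASSES:
        raise ValueError(f"unknown game class: {assumed_class!r}")
    return "extended", assumed_class
```

For example, a network provider without knowledge of the game would call `select_game_class()`, while a service provider that knows its title is slow-paced could call `select_game_class("low")`.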

On-going Activities

While the three recommendations provide a basis for researchers, as well as network operators and cloud gaming service providers towards improving gaming QoE, the standardization activities continue by initiating new work items focusing on QoE assessment methods and gaming QoE model development for cloud gaming and online gaming applications. Thus, three work items have been established within the past two years.

ITU-T P.BBQCG

P.BBQCG is a work item that aims at the development of a bitstream model predicting cloud gaming QoE. The model will use bitstream information from the header and payload of packets to reach a higher accuracy of audiovisual quality prediction than G.1072. In addition, three different types of codecs and a wider range of network parameters will be considered to develop a generalizable model. The model will be trained and validated for H.264, H.265, and AV1 video codecs and video resolutions up to 4K. For the development of the model, two test paradigms, passive and interactive, will be followed. The passive paradigm will be considered to cover a high range of encoding parameters, while the interactive paradigm will cover the network parameters that might strongly influence the interaction of players with the game.

ITU-T P.CrowdG

A gaming QoE study is a challenging task in itself due to the multidimensionality of the QoE concept and the large number of influence factors. It becomes even more challenging if the test follows a crowdsourcing approach, which is of particular interest in times of the COVID-19 pandemic or if subjective ratings are required from a highly diverse audience, e.g., for the development or investigation of questionnaires. The aim of the P.CrowdG work item is to develop a framework that describes the best practices and guidelines that have to be considered for gaming QoE assessment using a crowdsourcing approach. In particular, the crowd gaming framework has to provide the means to ensure reliable and valid results despite the absence of an experimenter, a controlled network, and visual observation of test participants. In addition to the crowd gaming framework, guidelines will be given on collecting valid and reliable results, addressing issues such as how to make sure workers put enough focus on the gaming and rating tasks. While a possible framework for interactive tests of simple web-based games is already presented in [9], more work is required to complete the ITU-T work item for more advanced setups and passive tests.

ITU-T G.OMMOG

G.OMMOG is a work item that focuses on the development of an opinion model predicting gaming Quality of Experience (QoE) for mobile online gaming services. The work item is a possible extension of the ITU-T Rec. G.1072. In contrast to G.1072, the games are not executed on a cloud server but on a gaming server that exchanges game states with the user’s clients instead of a video stream. This more traditional gaming concept represents a very popular service, especially considering multiplayer gaming such as recently published AAA titles of the Multiplayer Online Battle Arena (MOBA) and battle royale genres.

So far, it has been decided to follow a model structure similar to that of ITU-T Rec. G.1072. However, the component of spatial video quality, which was a major part of G.1072, will be removed, and the corresponding game type information will not be used. In addition, for the development of the model, it was decided to investigate the impact of variable delay and packet loss bursts, especially as their interaction can have a high impact on the gaming QoE. It is assumed that more variability of these factors and their interplay will weaken the error handling of mobile online gaming services. Due to missing information on the server caused by packet loss or strong delays, the gameplay is assumed to be no longer smooth (in the gaming domain, this is called ‘rubber banding’), which will lead to reduced temporal video quality.

About ITU-T SG12

ITU-T Study Group 12 is the expert group responsible for the development of international standards (ITU-T Recommendations) on performance, quality of service (QoS), and quality of experience (QoE). This work spans the full spectrum of terminals, networks, and services, ranging from speech over fixed circuit-switched networks to multimedia applications over mobile and packet-based networks.

In this article, the previous achievements of the ITU-T SG12 with respect to gaming QoE are described. The focus was in particular on subjective assessment methods, influencing factors, and modelling of gaming QoE. We hope that this information will significantly improve the work and research in this domain by enabling more reliable, comparable, and valid findings. Lastly, the report also points out many on-going activities in this rapidly changing domain, in which everyone is warmly invited to participate.

More information about the SG12, which will host its next E-meeting from 4-13 May 2021, can be found at ITU Study Group (SG) 12.

For more information about the gaming activities described in this report, please contact Sebastian Möller (sebastian.moeller@tu-berlin.de).

References

[1] T. Wijman, The World’s 2.7 Billion Gamers Will Spend $159.3 Billion on Games in 2020; The Market Will Surpass $200 Billion by 2023, 2020.

[2] S. Stewart, Video Game Industry Silently Taking Over Entertainment World, 2019.

[3] S. Schmidt, Assessing the Quality of Experience of Cloud Gaming Services, Ph.D. dissertation, Technische Universität Berlin, 2021.

[4] S. Möller, S. Schmidt, and S. Zadtootaghaj, “New ITU-T Standards for Gaming QoE Evaluation and Management”, in 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2018.

[5] S. Möller, S. Schmidt, and J. Beyer, “Gaming Taxonomy: An Overview of Concepts and Evaluation Methods for Computer Gaming QoE”, in 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2013.

[6] A. Perkis and C. Timmerer, Eds., QUALINET White Paper on Definitions of Immersive Media Experience (IMEx), European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting, 2020.

[7] P. Le Callet, S. Möller, and A. Perkis, Eds, Qualinet White Paper on Definitions of Quality of Experience, COST Action IC 1003, 2013.

[8] ITU-T Recommendation G.107, The E-model: A Computational Model for Use in Transmission Planning. Geneva: International Telecommunication Union, 2015.

[9] S. Schmidt, B. Naderi, S. S. Sabet, S. Zadtootaghaj, and S. Möller, “Assessing Interactive Gaming Quality of Experience Using a Crowdsourcing Approach”, in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2020.

Multidisciplinary Column: An Interview with Alex Thayer

Profile picture of Alex Thayer, PhD

Alex, could you tell us a bit about your background, and what the road to your current position was?

Alex Thayer, PhD. Head of Research, Amazon (Search); Affiliate Assistant Professor, University of Washington

Sure! I began my career in the tech industry in 1998, when I interned at the IBM Silicon Valley Lab in San Jose, California. Back then it was called the Santa Teresa Lab, and I completed a year-long internship because I wanted to get a richer professional experience than a single school quarter would provide. I also wanted to find an internship at a company that future employers would recognize when they saw my resume. 

At the time, I thought about my career as a narrative that would span decades: What story would I want to tell about my employment history 20 or 30 years later? In a sense, each job would become a “chapter” in that story. As I have learned over the years, this metaphor holds up and each chapter has a slightly different theme: from drama to comedy to Greek tragedy. After about 13 different tech industry jobs, I think I’ve got a lot of genres covered. 

After the year at IBM, I returned to Seattle and spent another year completing my degrees in Technical Communication (College of Engineering) and Art History (College of Art). After graduation, I focused on building my career as a technical writer. I worked at a voice recognition startup, then at a consulting firm, and I wound up doing a lot of “UX work” that was not quite codified into specific roles yet. For example, in a typical week I might work on the design of a UI component, rewrite the JavaScript for a website, change the physical layout of a printed user manual, and write copy for a tutorial. I went back to the University of Washington in 2002 to get a Master of Science degree in the Technical Communication program, and to try teaching courses at the college level. 

Eventually I began working full-time at Microsoft in 2006. It was during my time there that I realized technical writing was not my passion. I decided to “adjust my career narrative” and shift toward UX design and research. I was able to make that happen partly because I worked on a cross-disciplinary team at Microsoft: We had interaction design, industrial design, user research, and content publishing included in the same team. I worked on software and hardware projects in a variety of capacities. For one project, I helped design the physical product packaging; on another project, I collaborated with my teammates on the vision for an adaptive keyboard. 

Eventually I hit the limits of what I could do professionally without returning to school and advancing my knowledge about people and their practices. I returned to the University of Washington and spent 4 years working on my PhD in Human Centered Design & Engineering. I moved with my family to the Bay Area in California near the conclusion of my PhD work, and I looked for a role with a focus on emerging technology and interfaces. I found that role at Intel, where I stayed for a year and a half before shifting to a very different research role at VMware. When an opportunity to work at HP Labs arose, I decided to make another career move after a year and a half. It was never my intention to work for different companies so quickly, but I thought about the career narrative perspective and the story I wanted to tell. That perspective helped me make my decision to change roles and work at HP.

What is the professional role of interdisciplinarity in your experience?

Because I have an interdisciplinary skill set, I have discovered that it can be tricky to find a job! As a “T-shaped” person, it’s not always easy to know how to bring my full set of skills to a specific role or organization. In my experience, companies are looking for experts who can go deep in a particular area, but who can also span a variety of topics and skills as needed. In practice, this means collaborating with colleagues who have an assortment of technical backgrounds and methodologies. In a typical week at my current role, I engage with product managers, designers, design technologists, business leaders, engineers, economists, and scientists. All of these roles have different requirements and dialects, which means I am constantly surrounded by “interdisciplinarity,” if that makes sense!

Also, because of my academic research focus on how people collaborate, it’s hard for me to imagine a world without “interdisciplinarity.” That’s how I think about the “role” of interdisciplinarity: It’s more of a fabric or texture that underpins the teams on which I work. And as a leader, I need to consider how different members of a team or organization come together and bring their unique skills and backgrounds to bear on the tasks at hand. 

As a tangible example, we had a terrific undergraduate intern at HP who was working on Computer Science and Humanities degrees at Stanford. His approach to his education resonated with me since I had taken a similar Engineering/Arts path in my own undergrad education. It was fun to watch him apply his thought processes and knowledge on a team of senior engineers, designers, and researchers. I believe he was successful in his intern role because he could reframe problems or goals in creative ways.  

In 2012, you successfully defended your dissertation on “Understanding University Students’ Use of Tools and Artifacts in Support of Collaborative Project Work”. Almost a decade later: what are your thoughts on today’s use of (multimedia) tools and devices at a university level? 

This is a great segue from the question about interdisciplinarity and collaboration! 

As a social scientist, I am excited to see how new tools and processes “come with” students as they graduate and enter the workforce. The space of design prototyping is evolving rapidly, for example, as recent grads expect to use the same tools on the job that they learned how to use while in school. My role at HP included people management, and I had a number of conversations about how to get access to the specific software and hardware tools that employees needed to achieve their vision. Some of these discussions were easy: one of my colleagues asked if he could buy an iron and an ironing board, for example. I said yes. Other discussions required more planning, like when our team wanted to purchase a laser cutter. So perhaps I am taking this question in an unexpected direction, but I do see an opportunity to bridge a gap between the tools and devices in use at the university level and the availability of those same tools and devices in industry.

To be honest, I have a lot to learn about how students are doing their work today. It’s been several years since I finished my PhD. I spent an entire academic quarter observing a class of advanced design students. When I think about how they were doing their project work nearly a decade ago, and when I think about how I saw students working at Yale a couple of years ago, it’s easy for me to see the advances in technology. Or when we took a trip to Wellesley a few years ago, I watched my young daughter play with the VR headsets and try her hand at archaeology. And yet we still love whiteboards and paper! Once university students are able to safely return to in-person learning, I’m sure we will keep using whiteboards and paper as two of our main tools for learning and collaboration.

Looking at your impressive set of published patents: your inventions draw from and actually span many different disciplines. 

Thanks! All of those patents represent the work of teams: I have been lucky to have worked with amazing people who, quite frankly, did the hard work to make those patents happen. So, returning to that topic of interdisciplinarity, I can only point to these published patents because of the amazing work of my colleagues. 

One anecdote stands out for me now, as I think back about my experience at HP Labs in particular. I was meeting with one of my teammates, an amazing colleague named Ian Robinson, and we were having our weekly one-on-one meeting. We were talking about tracking digital pen devices in Virtual Reality (VR) spaces. At one point we began riffing on the idea of a “low-cost” VR controller, and then we had a realization: rather than putting a lot of expensive technology inside a single pen, what if you designed a pair of objects that relied on a different VR tracking method? We could conceivably eliminate the need for some of the guts of the single object if we had two objects moving in virtual space. We stopped our meeting and walked over to our desks, hoping to catch some of our teammates. We described the essential concept to a few of our peers and that was the genesis of the “VR Grabbers” idea. Jackie Yang was a Stanford grad student who was working as an intern in our lab at the time, and he did an incredible amount of work on the project from that point on. His effort culminated in our UIST 2018 paper on which Jackie was the first author! 

How do you work across disciplines?

Continuing that “VR Grabbers” story, I was lucky enough to have a stimulating conversation with a really smart person in a place that enabled us to pursue the idea. Ian and I came from different professional backgrounds. We happened to find ourselves working together and, on that project, we made the most of our different skills. My role after that initial conversation was to evangelize the project inside the organization rather than develop the prototype, for example. So, while it was great to help a team come together around an idea, my involvement on the project was quite different than it would have been if I were earlier in my career.

I said a bit about collaboration earlier, but I’d like to go a bit deeper on this topic. In my dissertation I spent a lot of time in the literature review section exploring the different types of collaboration. I am a big believer in “contested collaboration,” which occurs when a team of people come from different backgrounds and bring their specific perspectives and experiences to bear on a project. It is certainly more challenging to lead a team that engages in contested collaboration: It would be a lot easier if everyone agreed all the time! I’m not saying anything new here, of course.

Could you name a grand research challenge in your current field of work?

I recently saw the 2021 AI Index Report from Stanford (https://aiindex.stanford.edu/report/) and I thought each topic raised in the summary of that report could represent a “grand research challenge.” On the topic of “generative everything”, I am particularly curious about the future of ideas. In 2019 I delivered one of the keynote presentations at the IEEE Games, Entertainment, and Media (IEEE GEM) conference at Yale University in New Haven, Connecticut. In part of my presentation, I raised the question about attribution of ideas and intellectual property when we “partner” with AI. I can imagine a future where it seems less clear “who” came up with an idea: the person or the AI agent? Thinking about the “VR Grabbers” story I told earlier, I wonder how that same story will play out 20 years from now. In my capacity as an affiliate assistant professor at the University of Washington, I’m excited to continue thinking about this topic!  

How and in what form do you feel we as academics can be most impactful?

I think academics need to keep doing what they’re doing. Perhaps that’s a trite answer, but as a society we need to preserve and protect the ability of academics to do their work, to ask very basic questions and be surprised by what they find. I’m not just talking about the need for basic R&D so we can find the next penicillin. I’m also talking about how companies incentivize the effort to identify and use academic work.

I also think others know a lot more about this topic, though! I’d suggest reviewing the 2017 DIS paper, Translational Resources: Reducing the Gap Between Academic Research and HCI Practice, as a useful starting point. Lucas Colusso recently completed his PhD in Human Centered Design & Engineering at the University of Washington, and he was the first author on that paper. Thanks to Professor Gary Hsieh in that department, I became aware of Lucas’ work and now I reference it with my team members when we talk about how to pursue research topics that will have lasting impact. I believe academics are the experts at generating knowledge, and in industry we can apply similar approaches on our projects. 


Bios

Alex Thayer, PhD is the Head of Research for Amazon (Search) in Palo Alto. He completed his PhD in Human Centered Design & Engineering at the University of Washington, where he is currently an Affiliate Assistant Professor. Prior to joining Amazon, Alex was the Chief Experience Architect for HP Labs. He has also worked at VMware, Intel, Microsoft, YouTube, and a voice recognition startup that was partly funded by James Doohan (Scotty from Star Trek). Alex’s professional work focuses on explorations of the social-technical gap and how we make sense of people’s habits, practices, and messy lives. His academic work spans topics from AR/VR to professional collaboration to digital gaming. He has published 12 patents on medical testing, haptic feedback systems, 3D and 4D printing, immersive displays, and wearable technology. He also co-leads his daughter’s Girl Scout troop.

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

Dr. Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com