VQEG Column: VQEG Meeting Dec. 2020 (virtual/online)

Introduction

Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 14 to 18 December 2020. Given the current circumstances, it was organized fully online for the second time, with multiple sessions spread over five to six hours each day to allow remote participation of people from different time zones. About 130 participants from 24 different countries registered for the meeting and attended the various presentations and discussions that took place in all working groups.
This column provides an overview of this meeting; all the information, minutes, files (including the presented slides), and video recordings from the meeting are available online on the VQEG meeting website. As highlights of interest for the SIGMM community, apart from several interesting presentations of state-of-the-art works, relevant contributions to ITU recommendations related to multimedia quality assessment were reported by various groups (e.g., on adaptive bitrate streaming services, on subjective quality assessment of 360-degree videos, on statistical analysis of quality assessments, on gaming applications, etc.), the new group on quality assessment for health applications was launched, an interesting session on 5G use cases took place, and a workshop was dedicated to user testing during Covid-19. In addition, new efforts were launched on quality metrics for live media streaming applications and on providing guidelines for implementing objective video quality metrics (beyond PSNR) to the video compression community.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD/P.NATS Phase 2 project was a joint collaboration between VQEG and ITU-T SG12, whose goal was to develop a set of objective models, varying in terms of complexity, type of input, and use cases, for the assessment of video quality in adaptive bitrate streaming services over reliable transport up to 4K. The report of this project, which finished in January 2020, was approved at this meeting. In summary, it resulted in 10 model categories, with models trained and validated on 26 subjective datasets. The activity produced four ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during the effort, and a journal publication reporting details on the validation tests [5]. In this sense, a presentation by Alexander Raake (TU Ilmenau) provided details on the P.NATS Phase 2 project and the resulting ITU recommendations, while details of the processing chain used in the project were presented by Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson).
In addition to this activity, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on research into QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project on this topic within the AVHD group.

Psycho-Physiological Quality Assessment (PsyPhyQA)

The chairs of the PsyPhyQA group provided an update on the activities carried out. A test plan for psychophysiological video quality assessment has been established, and the group is currently working out how to run quality assessment tests with psychophysiological measures in times of a pandemic, as well as collecting and discussing ideas for possible joint work. In addition, the project is trying to learn about physiological correlates of simulator sickness; in this sense, a presentation was delivered by J.P. Tauscher (Technische Universität Braunschweig) on exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on visual fidelity of tone mapping operators from gaze data using HMM [7].

Quality Assessment for Health applications (QAH)

This was the first meeting for the new QAH group. The chairs reported on the first audio call, which took place in November to launch the project, gauge how many people are interested, and learn what each member has already done on medical images and what each would like to do within this joint project.
The plenary meeting served to collect ideas about possible joint works and to share experiences on related studies. In this sense, Lucie Lévêque (Université Gustave Eiffel) presented a review on subjective assessment of the perceived quality of medical images and videos, Maria Martini (Kingston University London) talked about the suitability of VMAF for quality assessment of medical videos (ultrasound & wireless capsule endoscopy), and Jorge Caviedes (ASU) delivered a presentation on cognition inspired diagnostic image quality models.

Statistical Analysis Methods (SAM)

The update report from the SAM group presented the ongoing progress on new methods for data analysis, including the discussion with ITU-T (P.913 [9]) and ITU-R (BT.500 [8]) about including such new methods in the respective recommendations.
Several interesting presentations related to the ongoing work within SAM were delivered. For instance, Jakub Nawala (AGH University) presented “su-JSON”, a uniform JSON-based subjective data format, as well as his work on describing subjective experiment consistency by p-value p–p plots. An interesting discussion was raised by Lucjan Janowski (AGH University) on how to define the quality of a single sequence, analyzing different perspectives (e.g., crowd, experts, psychology, etc.). Also, Babak Naderi (TU Berlin) presented an analysis of the relation between Mean Opinion Scores (MOS) and rank-based statistics. Recent advances on the Netflix quality metric VMAF were presented by Zhi Li (Netflix), especially on the properties of VMAF in the presence of image enhancement. Finally, two more presentations addressed progress on statistical analyses of quality assessment data, one by Margaret Pinson (NTIA/ITS) on the computation of confidence intervals, and one by Suiyi Ling (Université de Nantes) on a probabilistic model to recover the ground truth and annotators’ behavior.
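As an aside for readers less familiar with these analyses, the short Python sketch below illustrates one textbook way of computing a MOS together with its 95% confidence interval using the Student's t-distribution (assuming NumPy and SciPy are available). It is only an illustrative baseline, not the specific procedure standardized in ITU-T P.913 or discussed in the SAM presentations.

import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """Return the Mean Opinion Score of one stimulus and the half-width
    of its confidence interval, based on the Student's t-distribution.
    Generic textbook computation, given here for illustration only."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mos = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)               # standard error of the mean
    t = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)   # two-sided t-quantile
    return mos, t * sem

# Example: 24 simulated ratings on a 5-point ACR scale.
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4, 4, 5, 3, 4, 4, 4, 5, 4, 3, 4, 4, 4, 5, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f} (95% CI)")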

Computer Generated Imagery (CGI)

The report from the chairs of the CGI group covered the progress on the research on assessment methodologies for quality assessment of gaming services (e.g., ITU-T P.809 [10]), on crowdsourcing quality assessment for gaming applications (ITU-T P.808 [11]), on quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and on models (signal-, bitstream-, and parametric-based models) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.).
In terms of planned activities, the group is targeting the generation of new gaming datasets and tools for metrics to assess gaming QoE, and it is also aiming to identify other topics of interest in CGI beyond gaming content.
In addition, there was a presentation on updates on gaming standardization activities and deep-learning models for gaming quality prediction by Saman Zadtootaghaj (TU Berlin), another on subjective multi-dimensional aesthetic assessment of mobile game images by Suiyi Ling (Université de Nantes), and one addressing quality assessment of gaming videos compressed via AV1 by Maria Martini (Kingston University London), leading to interesting discussions on these topics.

No Reference Metrics (NORM)

The session of the NORM group included a presentation by Cosmin Stejerean (Facebook) on the differences among existing implementations of the spatial and temporal perceptual information indices (SI and TI, as defined in ITU-T P.910 [13]). The open discussion that followed led to the agreement to launch an effort to clarify the ambiguous details that have led to different implementations (and different results), to generate test vectors for reference and validation of the implementations, and to address the computation of these indicators for HDR content. In addition, Margaret Pinson (NTIA/ITS) presented the paradigm of no-reference metric research, analyzing design problems and presenting a framework for collaborative development of no-reference metrics for image and video quality. Finally, Ioannis Katsavounidis (Facebook) delivered a talk addressing the addition of video quality metadata in compressed bitstreams. Further discussions on these topics are planned within the group in the coming months.
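To make the SI/TI discussion more tangible, the Python sketch below (assuming NumPy and SciPy) follows one common reading of the ITU-T P.910 definitions: SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI as the maximum over time of the standard deviation of successive frame differences. Exactly such implementation details (Sobel kernel, border handling, bit-depth scaling) are what the newly launched effort intends to clarify, so this should be read as an illustrative sketch rather than a reference implementation.

import numpy as np
from scipy import ndimage

def si_ti(luma_frames):
    """Spatial Information (SI) and Temporal Information (TI) for a sequence
    of 2D luminance frames, following one common interpretation of ITU-T P.910.
    Illustrative only: border handling and scaling choices differ between
    existing implementations, which is precisely the ambiguity discussed above."""
    si_values, ti_values = [], []
    prev = None
    for frame in luma_frames:
        frame = frame.astype(np.float64)
        gx = ndimage.sobel(frame, axis=1)   # horizontal gradient
        gy = ndimage.sobel(frame, axis=0)   # vertical gradient
        si_values.append(np.hypot(gx, gy).std())
        if prev is not None:
            ti_values.append((frame - prev).std())
        prev = frame
    ti = max(ti_values) if ti_values else 0.0
    return max(si_values), ti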

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently collaborating with Sky Group on determining when video quality metrics are likely to predict the MOS inaccurately, and on modelling single observers’ quality perception based on artificial intelligence techniques. In this sense, Lohic Fotio (Politecnico di Torino) presented his work on artificial intelligence-based observers for media quality assessment. Together with Florence Agboma (Sky UK), he also presented their work comparing commercial and open-source video quality metrics for HD constant bitrate videos. Finally, Dariusz Grabowski (AGH University) presented his work on comparing full-reference video quality metrics using cluster analysis.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group announced Lu Zhang (INSA Rennes) as its new third co-chair, who will also work in the near future on a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently finished project on an objective video quality assessment method for recognition tasks, carried out in collaboration with Huawei through its Innovation Research Programme.

5G Key Performance Indicators (5GKPI)

The 5GKPI session was oriented towards identifying possible interested partners and joint works (e.g., contributions to the ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). In this sense, it included four presentations on use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote-controlled use cases by Kjell Brunnström (RISE).

Immersive Media Group (IMG)

A report on the updates within the IMG group was presented first, especially covering the current joint work investigating the subjective quality assessment of 360-degree video. In particular, a cross-lab test involving 10 different labs was carried out at the beginning of 2020, resulting in relevant outcomes including various contributions to ITU-T SG12/Q13 and the MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T Recommendation P.919 [15], related to subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] and ITU-T P.910 [13]), was approved in mid-October and was supported by the results of these cross-lab tests.
Furthermore, since these tests have already finished, there was a presentation by Pablo Pérez (Nokia Bell Labs) on possible future joint activities within IMG, which led to an open discussion that will continue in future audio calls.
In addition, a total of four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of the TU Ilmenau on immersive media topics, and a presentation of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image by Ali Ak and Patrick Le Callet (Université de Nantes) [16]. Also, there were two presentations related to 3D graphical contents, one addressing the perceptual characterization of 3D graphical contents based on visual attention patterns by Mona Abid (Université de Nantes), and another one comparing subjective methods for quality assessment of 3D graphics in virtual reality by Yana Nehmé (INSA Lyon). 

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview of the progress and recent work within ITU-R WP6C on HDR-related topics and within ITU-T SG12 Questions 9, 13, 14 and 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item within ITU-T SG9 was announced: end-to-end network characteristics requirements for video services (J.pcnp-char [17]).
From the discussions raised during this session, a new dedicated group was set up to work on introducing objective video quality metrics (beyond PSNR) to the video compression community and on providing guidelines for their implementation. The group was named “Implementers Guide for Video Quality Metrics (IGVQM)” and will be chaired by Ioannis Katsavounidis (Facebook), with the involvement of several VQEG members.
After the IRG-AVQA session, the Q19 interim meeting took place with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) on an update on improvements on subjective experiment data analysis process.

Other updates

Apart from the aforementioned groups, the Human Factors for Visual Experiences (HFVE) group is still active, coordinating VQEG activities in liaison with the IEEE Standards Association working groups on HFVE, especially on perceptual quality assessment of 3D, UHD and HD contents, quality of experience assessment for VR and MR, quality assessment of light-field imaging contents, and deep-learning-based assessment of visual experience based on human factors. In this sense, there are ongoing contributions from VQEG members to IEEE standards.
In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation on precaution for lab experiments by Kjell Brunnström (RISE), another presentation by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussions on the topic.

Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.

References

[1] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K, 2020.
[2] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, 2020.
[3] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information, 2020.
[4] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information, 2020.
[5] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[6] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[7] W. Ellahi, T. Vigier and P. Le Callet, “HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures, 2019.
[9] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution, 2016.
[10] ITU-T Rec. P.809. Subjective evaluation methods for gaming quality, 2018.
[11] ITU-T Rec. P.808. Subjective evaluation of speech quality with a crowdsourcing approach, 2018.
[12] ITU-T Rec. G.1072. Opinion model predicting gaming quality of experience for cloud gaming services, 2020.
[13] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[14] ITU-T Rec. G.QoE-5G. QoE factors for new services in 5G networks, 2020 (under study).
[15] ITU-T Rec. P.919. Subjective test methodologies for 360º video on head-mounted displays, 2020.
[16] A. Ak, S. Ling and P. Le Callet, “No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[17] ITU-T Rec. J.pcnp-char. E2E Network Characteristics Requirement for Video Services, 2020 (under study).

JPEG Column: 89th JPEG Meeting

JPEG initiates standardisation of image compression based on AI

The 89th JPEG meeting was held online from 5 to 9 October 2020.

During this meeting, multiple JPEG standardisation activities and explorations were discussed and progressed. Notably, the call for evidence on learning-based image coding was successfully completed, and evidence was found that this technology promises several new functionalities while at the same time offering compression efficiency superior to the state of the art. A new work item, JPEG AI, which will use learning-based image coding as its core technology, has been proposed, enlarging the already wide family of JPEG standards.

Figure 1. JPEG Families of standards and JPEG AI.

The 89th JPEG meeting had the following highlights:

  • JPEG AI call for evidence report
  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud Coding reviews the status of the call for evidence
  • JPEG Pleno Holography call for proposals timeline
  • JPEG DNA identifies use cases and requirements
  • JPEG XL standard defines the final specification
  • JPEG Systems JLINK reaches committee draft stage
  • JPEG XS 2nd Edition Parts 1, 2 and 3.

JPEG AI

At the 89th meeting, the four submissions received in response to the Call for Evidence on learning-based image coding were presented, and the results of their subjective evaluation were reported and discussed in detail by experts. It was agreed that there is strong evidence that learning-based image coding solutions can outperform the already defined anchors in terms of compression efficiency when compared to state-of-the-art conventional image coding architectures. Thus, it was decided to create a new standardisation activity, JPEG AI, on a learning-based image coding system that applies machine learning tools to achieve substantially better compression efficiency than current image coding systems, while offering unique features desirable for efficient distribution and consumption of images. This type of approach should allow obtaining an efficient compressed-domain representation not only for visualisation but also for machine-learning-based image processing and computer vision. JPEG AI has released to the public the results of the objective and subjective evaluations as well as the first version of common test conditions for assessing the performance of learning-based image coding systems.

JPEG explores standardization needs to address fake media

Recent advances in media modification, particularly deep-learning-based approaches, can produce near-realistic media content that is almost indistinguishable from authentic content. These developments open opportunities for the production of new types of media content that are useful for many creative industries, but they also increase the risk of spreading maliciously modified content (e.g., ‘deepfakes’), which can lead to social unrest, the spreading of rumours, or the encouragement of hate crimes. The JPEG Committee is interested in exploring whether a JPEG standard can facilitate a secure and reliable annotation of media modifications, both in good-faith and malicious usage scenarios.

The JPEG Committee is currently in discussion with stakeholders from academia, industry and other organisations to explore the use cases that will define a roadmap identifying the requirements leading to a potential standard. The Committee has received significant interest and has released a public document outlining the context, use cases and requirements. JPEG invites experts and technology users to actively participate in this activity and to attend a workshop to be held online in December 2020. Details on the activities of JPEG in this area can be found on the JPEG.org website. Interested parties are notably encouraged to register to the mailing list of the ad hoc group that has been set up to facilitate the discussions and coordination on this topic.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 89th JPEG meeting, the JPEG Committee reviewed expressions of interest in the Final Call for Evidence on JPEG Pleno Point Cloud Coding. This Call for Evidence focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between its 89th and 90th meetings, the JPEG Committee will be actively promoting this activity and collecting submissions to participate in the Call for Evidence.

JPEG Pleno Holography

At the 89th meeting, the JPEG Committee released an updated draft of the Call for Proposals for JPEG Pleno Holography; the final Call for Proposals will be released in April 2021. JPEG Pleno Holography is seeking compression solutions for holographic content. The scope of the activity is quite large and addresses diverse use cases such as holographic microscopy and tomography, but also holographic displays and printing. Current activities are centred around refining the objective and subjective quality assessment procedures. Interested parties are invited to participate in these activities already at this stage.

JPEG DNA

JPEG standards are used in the storage and archival of digital pictures. This puts the JPEG Committee in a good position to address the challenges of DNA-based storage by proposing an efficient image coding format to create artificial DNA molecules. JPEG DNA has been established as an exploration activity within the JPEG Committee to study use cases, identify requirements and assess the state of the art in DNA storage for the purpose of image archival, with the goal of launching a standardization activity. To this end, a first workshop was organised on 30 September 2020. Presentations made at the workshop are available from the following URL: http://ds.jpeg.org/proceedings/JPEG_DNA_1st_Workshop_Proceedings.zip.
At its 89th meeting, the JPEG Committee released a second version of a public document that describes its findings regarding the storage of digital images using artificial DNA. In this framework, the JPEG DNA ad hoc group was re-established in order to continue its activities, further refine the above-mentioned document and organise a second workshop. Interested parties are invited to join this activity by participating in the AHG through the following URL: http://listregistration.jpeg.org.

JPEG XL

Final technical comments by national bodies have been addressed and incorporated into the JPEG XL specification (ISO/IEC 18181-1) and the reference implementation. A draft FDIS study text has been prepared and final validation experiments are planned.

JPEG Systems

The JLINK (ISO/IEC 19566-7) standard has reached the committee draft stage and will be made public. The JPEG Committee invites technical feedback on the document, which is available on the JPEG website. Development of the JPEG Snack (ISO/IEC 19566-8) standard has begun to support the defined use cases and requirements. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.

JPEG XS

The JPEG Committee is finalizing its work on the 2nd editions of JPEG XS Part 1, Part 2 and Part 3. Part 1 defines new coding tools required to efficiently compress raw Bayer images. The observed quality gains of compressing raw Bayer data over compressing in the RGB domain can be as high as 5 dB PSNR. Moreover, the second edition adds support for mathematically lossless image compression and allows compression of 4:2:0 sub-sampled images. Part 2 defines new profiles for such content. With support for low-complexity, high-quality compression of raw Bayer (or Color Filter Array) data, JPEG XS proves to also be an excellent compression scheme for the professional and consumer digital camera market, as well as for the machine vision and automotive industries.

Final Quote

“JPEG AI will be a new work item completing the collection of JPEG standards. JPEG AI relies on artificial intelligence to compress images. This standard not only will offer superior compression efficiency beyond the current state of the art but also will open new possibilities for vision tasks by machines and computational imaging for humans,” said Prof. Touradj Ebrahimi, Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No 90, will be held online from January 18 to 22, 2021.
  • No 91, will be held online from April 19 to 23, 2021.

MPEG Column: 132nd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.

The new structure, including Convenors and the position of each group within the former WG 11 structure, is as follows:

  • AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
  • WG 2 MPEG Technical Requirements (Convenor: Dr. Igor Curcio; former Requirements subgroup)
  • WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
  • WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
  • WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
  • WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
  • WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda; former 3DG subgroup)
  • WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
  • AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
  • AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup).

The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.

The official press release can be found here and comprises the following items:

  • Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
  • MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
  • MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
  • MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
  • MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
  • MPEG Completes Standard on Harmonization of DASH and CMAF
  • MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
  • MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard

In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.

Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone

MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High-Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):

  • Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
  • Using VVenC, additional bit rate savings of more than 10% relative to VTM were observed, while VVenC at the same time runs significantly faster than the reference software implementation.

Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open-source implementation will be available soon. Together with AOMedia’s AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.
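The verification test above relied on formal subjective assessment, but for day-to-day objective comparisons the community commonly reports Bjøntegaard-delta (BD) rate figures. The sketch below shows a minimal BD-rate computation in Python (assuming NumPy), fitting cubic polynomials to quality versus log-rate points and integrating over the overlapping quality range; it illustrates the general technique only and is not the methodology used in the MPEG verification test.

import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Approximate Bjontegaard-delta rate: the average bitrate difference (in %)
    of a test codec relative to a reference codec at equal quality.
    Each argument holds at least four rate-distortion points (bitrate in kbps,
    quality in dB PSNR). Negative results mean the test codec saves bitrate.
    Illustrative implementation of the common cubic-fit variant."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)      # log-rate as a cubic in quality
    p_test = np.polyfit(psnr_test, lr_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))      # overlapping quality interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0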

MPEG Completes Geometry-based Point Cloud Compression Standard

MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.

By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.

Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency, but one should not dismiss their delivery aspects, including dynamic adaptive streaming. A recent paper on this topic has been published in the IEEE Communications Magazine and is entitled “From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom”.

MPEG Finalizes the Harmonization of DASH and CMAF

MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with the Common Media Application Format (CMAF), featuring a DASH profile for use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019, 4th edition).

CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use the Media Presentation Description (MPD) as a manifest format.
Additional tools added in this amendment include:

  • DASH events and timed metadata track timing and processing models with in-band event streams,
  • a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
  • an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
  • content protection enhancements for efficient signalling.

It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.

Research aspects: one of the features enabled by CMAF is low latency streaming that is actively researched within the multimedia systems community (e.g., here). The main research focus has been related to the ABR logic while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).
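To make the ABR discussion a bit more concrete, the sketch below shows a deliberately simple, throughput- and buffer-based bitrate selection rule of the kind studied in this area (in Python; all function names and thresholds are illustrative assumptions, not taken from any specific player). Production ABR logic, especially for low-latency CMAF chunked transfer, is considerably more elaborate.

def select_bitrate(bitrate_ladder_kbps, throughput_estimate_kbps,
                   buffer_level_s, safety_factor=0.8, low_buffer_s=5.0):
    """Pick the bitrate of the next segment with a simple hybrid rule:
    stay below a fraction of the estimated throughput, and fall back to the
    lowest representation when the playback buffer runs low (stall protection).
    Illustrative sketch only; real players combine many more signals."""
    ladder = sorted(bitrate_ladder_kbps)
    if buffer_level_s < low_buffer_s:
        return ladder[0]                      # protect against stalls
    budget = safety_factor * throughput_estimate_kbps
    feasible = [b for b in ladder if b <= budget]
    return feasible[-1] if feasible else ladder[0]

# Example: 4 Mbps estimated throughput and a healthy 12-second buffer.
print(select_bitrate([400, 1200, 2500, 5000, 8000], 4000, 12.0))  # -> 2500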

MPEG Completes 2nd Edition of the Omnidirectional Media Format

MPEG completed the standardization of the 2nd edition of the Omnidirectional MediA Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status, including the following features:

  • “Late binding” technologies to deliver and present only the part of the content that corresponds to the user’s dynamically changing viewport. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream is dynamically generated that, by combining the received portions of the bitstream, covers only the user’s viewport on the client.
  • Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
  • Enhanced use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.

Research aspects: standards usually define formats to enable interoperability but various informative aspects are left open for industry competition and subject to research and development. The same holds for OMAF and its 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.

MPEG Completes the Low Complexity Enhancement Video Coding Standard

MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.

  • LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video, with effective compression efficiency and limited complexity, by building on top of existing and future video codecs.
  • LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
  • LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.

Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders, which allows for a certain flexibility as part of regular software life-cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect for future work.
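As a rough intuition of how such an enhancement layer operates, the toy Python sketch below (assuming NumPy) mimics the basic idea: run a downscaled version of the frame through an arbitrary base codec, then carry the residual in an enhancement layer that the decoder adds back after upscaling. The base_encode/base_decode placeholders and the whole structure are purely illustrative assumptions; the normative LCEVC tool set (transforms, quantization, temporal prediction) is substantially more involved.

import numpy as np

def toy_enhancement_encode(frame, base_encode, base_decode, scale=2):
    """Conceptual base + enhancement split (not the normative LCEVC process).
    `frame` is a 2D luminance array; `base_encode`/`base_decode` stand in for
    any existing codec implementation."""
    h, w = frame.shape
    # Downscale by block averaging and feed the result to the base codec.
    low = frame[:h - h % scale, :w - w % scale].reshape(
        h // scale, scale, w // scale, scale).mean(axis=(1, 3))
    base_bitstream = base_encode(low)
    # Reconstruct what the decoder will see and upscale it back.
    upscaled = np.kron(base_decode(base_bitstream), np.ones((scale, scale)))
    # The enhancement layer carries the (here unquantized) residual.
    residual = frame[:upscaled.shape[0], :upscaled.shape[1]] - upscaled
    return base_bitstream, residual

def toy_enhancement_decode(base_bitstream, residual, base_decode, scale=2):
    upscaled = np.kron(base_decode(base_bitstream), np.ones((scale, scale)))
    return upscaled + residual

# Toy "base codec": coarse quantization standing in for a real encoder.
base_encode = lambda x: np.round(x / 8.0)
base_decode = lambda bits: bits * 8.0
frame = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
bits, res = toy_enhancement_encode(frame, base_encode, base_decode)
print(np.allclose(toy_enhancement_decode(bits, res, base_decode), frame))  # True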

The 133rd MPEG meeting will be again an online meeting in January 2021.

Click here for more information about MPEG meetings and their developments.

Immersive Media Experiences – Why finding Consensus is Important

An introduction to the QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) [1].

Introduction

Immersive media are reshaping the way users experience reality. They are increasingly incorporated across enterprise and consumer sectors to offer experiential solutions to a diverse range of industries. Current technologies that afford an immersive media experience (IMEx) include Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and 360-degree video. Popular uses can be found in enhancing connectivity applications, supporting knowledge-based tasks, learning & skill development, as well as adding immersive and interactive dimensions to the retail, business, and entertainment industries. Whereas the evolution of immersive media can be traced over the past 50 years, its current popularity boost is primarily owed to significant advances of the last decade brought about by improved connectivity and superior computing and device capabilities; in particular, advancements in display technologies, visualizations, interaction & tracking devices, recognition technologies, platform development, new media formats, and increasing user demand for real-time & dynamic content across platforms.

Though still in its infancy, the immersive economy is growing into a dynamic and confident sector. Being an emerging sector, it is hard to find official data, but some estimates project the global immersive media market to continue its upward growth at around 30% CAGR, reaching USD 180 billion by 2022 [2,3]. Country-wise, the USA is expected to secure one third of the global immersive media market share, followed by China, Japan, Germany, and the UK as the markets where significant spending is anticipated. Consumer products and devices are poised to be the largest contributing segment. The growth in immersive consumer products is expected to continue as Head-Mounted Displays (HMD) become commonplace and interest in mobile augmented reality increases [4]. However, immersive media are no longer just a pursuit of alternative display technologies; they are pushing towards holistic ecosystems that seek contributions from hardware manufacturers, application & platform developers, content producers, and users. These ecosystems are making way for sophisticated content creation available on platforms that allow user participation, interaction, and skill integration through advanced tools.

Immersive media experiences (IMEx), today, are not only about how users view media but represent a transformative way to consume media altogether, drawing considerable interest from multiple disciplines. As the number of stakeholders increases, the need for clarity and coherence on definitions and concepts becomes all the more important. In this article, we provide an overview and a brief survey of some of the key definitions that are central to IMEx, including its Quality of Experience (QoE), application areas, influencing factors, and assessment methods. Our aim is to provide some clarity and initiate consensus on topics related to IMEx that can be useful for researchers and practitioners working both in academia and in industry.

Why understand IMEx?

IMEx combines reality with technology, enabling emplaced multimedia experiences of standard media (film, photographic, or animated) as well as synthetic and interactive environments for users. These experiences utilize visual, auditory, and haptic feedback to stimulate physical senses such that users psychologically feel immersed within these multidimensional media environments. This sense of “being there” is also referred to as presence.

As mentioned earlier, the enthusiasm for IMEx is mainly driven by the gaming, entertainment, retail, healthcare, digital marketing, and skill training industries. So far, research has tilted favourably towards innovation, with a particular interest in image capture, recognition, mapping, and display technologies over the past few years. However, the prevalence of IMEx has also ushered in a plethora of definitions, frameworks, and models to understand the psychological and phenomenological concepts associated with these media forms. Central, of course, are the closely related concepts of immersion and presence, which are interpreted differently across fields, for example, when one moves from literature to narratology to computer science. With immersive media, these three separate fields come together inside interactive digital narrative applications, where immersive narratives are used to solve real-world problems. This is when noticeable interdisciplinary differences regarding definitions, scope, and constituents require urgent attention to achieve a coherent understanding of the concepts used. Such consensus is vital for giving directionality to the future of immersive media that can be shared by all.

A White Paper on IMEx

A recent White Paper [1] by QUALINET, the European Network on Quality of Experience in Multimedia Systems and Services [5], is a contribution to the discussions related to Immersive Media Experience (IMEx). It attempts to build consensus around ideas and concepts that are related to IMEx but originate from multidisciplinary groups with a joint interest in multimedia experiences.

The QUALINET community aims at extending the notion of network-centric Quality of Service (QoS) in multimedia systems, by relying on the concept of Quality of Experience (QoE). The main scientific objective is the development of methodologies for subjective and objective quality metrics considering current and new trends in multimedia communication systems as witnessed by the appearance of new types of content and interactions.

The white paper was created based on an activity launched at the 13th QUALINET meeting on June 4, 2019, in Berlin as part of Task Force 7, Immersive Media Experiences (IMEx). The paper received contributions from 44 authors under 10 section leads, which were consolidated into a first draft and released among all section leads and editors for internal review. After incorporating the feedback from all section leads, the editors initially released the White Paper within the QUALINET community for review. Following feedback from QUALINET at large, the editors distributed the White Paper widely for an open, public community review (e.g., research communities/committees in ACM and IEEE, standards development organizations, various open email reflectors related to this topic). The feedback received from this public consultation process resulted in the final version which has been approved during the 14th QUALINET meeting on May 25, 2020.

Understanding the White Paper

The White Paper surveys definitions and concepts that contribute to IMEx. It describes the Quality of Experience (QoE) for immersive media by establishing a relationship between the concepts of QoE and IMEx. This article provides an outline of these concepts by looking at:

  • Survey of definitions of immersion and presence discusses various frameworks and conceptual models that are most relevant to these phenomena in terms of multimedia experiences.
  • Definition of immersive media experience describes experiential determinants for IMEx characterized through its various technological contexts.
  • Quality of experience for immersive media applies existing QoE concepts to understand the user-centric subjective feelings of “a sense of being there”, “a sense of agency”, and “cybersickness”.
  • The application area for immersive media experience presents an overview of immersive technologies in use within gaming, omnidirectional content, interactive storytelling, health, entertainment, and communications.
  • Influencing factors on immersive media experience looks at the three existing influence factors on QoE, with a pronounced emphasis on the human influence factor as being of very high relevance to IMEx.
  • Assessment of immersive media experience underscores the importance of proper examination of multimedia systems, including IMEx, by highlighting three methods currently in use, i.e., subjective, behavioural, and psychophysiological.
  • Standardization activities discuss the three clusters of activities currently underway to achieve interoperability for IMEx: (i) data representation & formats; (ii) guidelines, systems standards, & APIs; and (iii) Quality of Experience (QoE).

Conclusions

Immersive media have significantly changed the use and experience of new digital media. These innovative technologies transcend traditional formats and present new ways to interact with digital information inside synthetic or enhanced realities, which include VR, AR, MR, and haptic communications. Earlier, the need for a multidisciplinary consensus on definitions of IMEx was discussed. The QUALINET White Paper provides such “a toolbox of definitions” for IMEx. It stands out for bringing together insights from multimedia groups spread across academia and industry, specifically the Video Quality Experts Group (VQEG) and the Immersive Media Group (IMG). This makes it a valuable asset for those working in the field of IMEx going forward.

References

[1] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[2] Mateos-Garcia, J., Stathoulopoulos, K., & Thomas, N. (2018). The immersive economy in the UK (Rep. No. 18.1137.020). Innovate UK.
[3] Infocomm Media 2025 Supplementary Information (pp. 31-43, Rep.). (2015). Singapore: Ministry of Communications and Information.
[4] Hadwick, A. (2020). XR Industry Insight Report 2019-2020 (Rep.). San Francisco: VRX Conference & Expo.
[5] http://www.qualinet.eu/