VQEG Column: VQEG Meeting Dec. 2020 (virtual/online)

Introduction

Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 14 to 18 December 2020. Given the current circumstances, it was held entirely online for the second time, with multiple sessions distributed over five to six hours each day to allow remote participation from different time zones. About 130 participants from 24 countries registered for the meeting and could attend the presentations and discussions that took place in all working groups.
This column provides an overview of this meeting, while all the information, minutes, files (including the presented slides), and video recordings from the meeting are available online on the VQEG meeting website. As highlights of interest for the SIGMM community, apart from several interesting presentations of state-of-the-art works, relevant contributions to ITU recommendations related to multimedia quality assessment were reported from various groups (e.g., on adaptive bitrate streaming services, on subjective quality assessment of 360-degree videos, on statistical analysis of quality assessments, and on gaming applications), the new group on quality assessment for health applications was launched, and an interesting session on 5G use cases took place, as well as a workshop dedicated to user testing during Covid-19. In addition, new efforts have been launched on research into quality metrics for live media streaming applications, and on providing guidelines on implementing objective video quality metrics (beyond PSNR) for the video compression community.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD/P.NATS2 project was a joint collaboration between VQEG and ITU SG12, whose goal was to develop a multitude of objective models, varying in terms of complexity, type of input, and use cases, for the assessment of video quality in adaptive bitrate streaming services over reliable transport up to 4K. The report of this project, which finished in January 2020, was approved in this meeting. In summary, it resulted in 10 model categories with models trained and validated on 26 subjective datasets. This activity resulted in 4 ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during this effort, and a journal publication reporting details on the validation tests [5]. In this sense, one presentation by Alexander Raake (TU Ilmenau) provided details on the P.NATS Phase 2 project and the resulting ITU recommendations, while details of the processing chain used in the project were presented by Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson).
In addition to this activity, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on the research on QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project within the AVHD group on this topic.

Psycho-Physiological Quality Assessment (PsyPhyQA)

The chairs of the PsyPhyQA group provided an update on the activities carried out. In this sense, a test plan for psychophysiological video quality assessment was established, and the group is currently aiming to develop ideas for running quality assessment tests with psychophysiological measures in times of a pandemic and to collect and discuss ideas about possible joint works. In addition, the project is trying to learn about physiological correlates of simulator sickness, and in this sense, a presentation was delivered by J.P. Tauscher (Technische Universität Braunschweig) on exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on visual fidelity of tone mapping operators from gaze data using HMM [7].

Quality Assessment for Health applications (QAH)

This was the first meeting for this new QAH group. The chairs reported on the first audio call, which took place in November to launch the project and to find out how many people are interested in it, what each member has already done on medical images, what each member wants to do in this joint project, etc.
The plenary meeting served to collect ideas about possible joint works and to share experiences on related studies. In this sense, Lucie Lévêque (Université Gustave Eiffel) presented a review on subjective assessment of the perceived quality of medical images and videos, Maria Martini (Kingston University London) talked about the suitability of VMAF for quality assessment of medical videos (ultrasound & wireless capsule endoscopy), and Jorge Caviedes (ASU) delivered a presentation on cognition inspired diagnostic image quality models.

Statistical Analysis Methods (SAM)

The update report from the SAM group presented the ongoing progress on new methods for data analysis, including the discussion with ITU-T (P.913 [9]) and ITU-R (BT.500 [8]) about including a new method in the recommendations.
Several interesting presentations related to the ongoing work within SAM were delivered. For instance, Jakub Nawala (AGH University) presented the “su-JSON”, a uniform JSON-based subjective data format, as well as his work on describing subjective experiment consistency by p-value p–p plots. An interesting discussion was raised by Lucjan Janowski (AGH University) on how to define the quality of a single sequence, analyzing different perspectives (e.g., crowd, experts, psychology, etc.). Also, Babak Naderi (TU Berlin) presented an analysis of the relation between Mean Opinion Score (MOS) and rank-based statistics. Recent advances on the Netflix quality metric VMAF were presented by Zhi Li (Netflix), especially on the properties of VMAF in the presence of image enhancement. Finally, two more presentations addressed the progress on statistical analyses of quality assessment data, one by Margaret Pinson (NTIA/ITS) on the computation of confidence intervals, and one by Suiyi Ling (Université de Nantes) on a probabilistic model to recover the ground truth and annotator’s behavior.

Computer Generated Imagery (CGI)

The report from the chairs of the CGI group covered the progress on the research on assessment methodologies for quality assessment of gaming services (e.g., ITU-T P.809 [10]), on crowdsourcing quality assessment for gaming applications (P.808 [11]), on quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and on models (signal-, bitstream-, and parametric-based models) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.).
In terms of planned activities, the group is targeting the generation of new gaming datasets and tools for metrics to assess gaming QoE, and is also aiming to identify topics of interest in CGI other than gaming content.
In addition, there was a presentation on updates on gaming standardization activities and deep learning models for gaming quality prediction by Saman Zadtootaghaj (TU Berlin), another one on subjective assessment of multi-dimensional aesthetic assessment for mobile game images by Suiyi Ling (Université de Nantes), and one addressing quality assessment of gaming videos compressed via AV1 by Maria Martini (Kingston University London), leading to interesting discussions on those topics.

No Reference Metrics (NORM)

The session for NORM group included a presentation on the differences among existing implementations of spatial and temporal perceptual information indices (SI and TI as defined in ITU-T P.910 [13]) by Cosmin Stejerean (Facebook), which led to an open discussion and to the agreement on launching an effort to clarify the ambiguous details that have led to different implementations (and different results), to generate test vectors for reference and validation of the implementations and to address the computation of these indicators for HDR content. In addition, Margaret Pinson (NTIA/ITS) presented the paradigm of no-reference metric research analyzing design problems and presenting a framework for collaborative development of no-reference metrics for image and video quality. Finally, Ioannis Katsavounidis (Facebook) delivered a talk on addressing the addition of video quality metadata in compressed bitstreams. Further discussions on these topics are planned in the next month within the group.
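To illustrate why implementations of SI and TI can diverge, the following is a minimal sketch of the two indicators in the form usually quoted from ITU-T P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. The function names are hypothetical, and two of the choices made here are assumptions rather than anything stated in this column: border pixels are skipped in the Sobel step, and the population standard deviation is used. Choices of exactly this kind are a plausible source of differing results between implementations.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitudes of a 2D luminance frame (list of rows).

    Assumption: only interior pixels are filtered; border pixels are skipped.
    """
    height, width = len(frame), len(frame[0])
    magnitudes = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    return magnitudes


def spatial_information(frames):
    """SI: maximum over frames of the spatial std of the Sobel-filtered frame."""
    # Assumption: population standard deviation (pstdev), not sample std.
    return max(pstdev(sobel_magnitude(frame)) for frame in frames)


def temporal_information(frames):
    """TI: maximum over time of the spatial std of consecutive-frame differences."""
    if len(frames) < 2:
        return 0.0
    per_frame = []
    for previous, current in zip(frames, frames[1:]):
        difference = [c - p
                      for row_c, row_p in zip(current, previous)
                      for c, p in zip(row_c, row_p)]
        per_frame.append(pstdev(difference))
    return max(per_frame)


# Tiny synthetic frames (hypothetical data, 6x6 pixels).
flat = [[128] * 6 for _ in range(6)]
brighter = [[138] * 6 for _ in range(6)]
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]

print(spatial_information([flat, edge]))
print(temporal_information([flat, brighter]))
```

Note that a uniform brightness change between frames gives a TI of zero in this sketch, because the difference image is constant and its spatial standard deviation vanishes.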

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working in collaboration with Sky Group on determining when video quality metrics are likely to inaccurately predict the MOS and on modelling single observers’ quality perception based on artificial intelligence techniques. In this sense, Lohic Fotio (Politecnico di Torino) presented his work on artificial intelligence-based observers for media quality assessment. Also, together with Florence Agboma (Sky UK), he presented work on comparing commercial and open source video quality metrics for HD constant bitrate videos. Finally, Dariusz Grabowski (AGH University) presented his work on comparing full-reference video quality metrics using cluster analysis.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group announced Lu Zhang (INSA Rennes) as its new third co-chair, who will also work in the near future on a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently finished project related to an objective video quality assessment method for recognition tasks, carried out in collaboration with Huawei through its Innovation Research Programme.

5G Key Performance Indicators (5GKPI)

The 5GKPI session was oriented to identify possible interested partners and joint works (e.g., contribution to ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). In this sense, it included four presentations of use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote controlled use cases by Kjell Brunnström (RISE).

Immersive Media Group (IMG)

A report on the updates within the IMG group was initially presented, especially covering the current joint work investigating the subjective quality assessment of 360-degree video. In particular, a cross-lab test involving 10 different labs was carried out at the beginning of 2020, resulting in relevant outcomes including various contributions to ITU SG12/Q13 and the MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T recommendation P.919 [15], related to subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] or ITU-T P.910 [13]), was approved in mid-October, and was supported by the results of these cross-lab tests.
Furthermore, since these tests have already finished, there was a presentation by Pablo Pérez (Nokia Bell-Labs) on possible future joint activities within IMG, followed by an open discussion that will continue in future audio calls.
In addition, a total of four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of the TU Ilmenau on immersive media topics, and a presentation of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image by Ali Ak and Patrick Le Callet (Université de Nantes) [16]. Also, there were two presentations related to 3D graphical contents, one addressing the perceptual characterization of 3D graphical contents based on visual attention patterns by Mona Abid (Université de Nantes), and another one comparing subjective methods for quality assessment of 3D graphics in virtual reality by Yana Nehmé (INSA Lyon). 

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview on the progress and recent works within ITU-R WP6C in HDR related topics and ITU-T SG12 Questions 9, 13, 14, 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item was announced within ITU-T SG9: End-to-end network characteristics requirements for video services (J.pcnp-char [17]).
From the discussions raised during this session, a new dedicated group was set up to work on introducing objective video quality metrics beyond PSNR to the video compression community and on providing guidelines for implementing them. The group was named “Implementers Guide for Video Quality Metrics (IGVQM)” and will be chaired by Ioannis Katsavounidis (Facebook), with the involvement of several people from VQEG.
After the IRG-AVQA session, the Q19 interim meeting took place with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) giving an update on improvements to the subjective experiment data analysis process.

Other updates

Apart from the aforementioned groups, the Human Factors for Visual Experience (HFVE) group is still active, coordinating VQEG activities in liaison with the IEEE Standards Association Working Groups on HFVE, especially on perceptual quality assessment of 3D, UHD and HD contents, quality of experience assessment for VR and MR, quality assessment of light-field imaging contents, and deep-learning-based assessment of visual experience based on human factors. In this sense, there are ongoing contributions from VQEG members to IEEE Standards.
In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation on precaution for lab experiments by Kjell Brunnström (RISE), another presentation by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussions on the topic.

Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.

References

[1] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K, 2020.
[2] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, 2020.
[3] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information, 2020.
[4] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information, 2020.
[5] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[6] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[7] W. Ellahi, T. Vigier and P. Le Callet, “HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures, 2019.
[9] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2016.
[10] ITU-T Rec. P.809. Subjective evaluation methods for gaming quality, 2018.
[11] ITU-T Rec. P.808. Subjective evaluation of speech quality with a crowdsourcing approach, 2018.
[12] ITU-T Rec. G.1072. Opinion model predicting gaming quality of experience for cloud gaming services, 2020.
[13] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[14] ITU-T Rec. G.QoE-5G. QoE factors for new services in 5G networks, 2020 (under study).
[15] ITU-T Rec. P.919. Subjective test methodologies for 360° video on head-mounted displays, 2020.
[16] A. Ak, S. Ling and P. Le Callet, “No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[17] ITU-T Rec. J.pcnp-char. E2E Network Characteristics Requirement for Video Services, 2020 (under study).

JPEG Column: 89th JPEG Meeting

JPEG initiates standardisation of image compression based on AI

The 89th JPEG meeting was held online from 5 to 9 October 2020.

During this meeting, multiple JPEG standardisation activities and explorations were discussed and progressed. Notably, the call for evidence on learning-based image coding was successfully completed and evidence was found that this technology promises several new functionalities while offering at the same time superior compression efficiency, beyond the state of the art. A new work item, JPEG AI, that will use learning-based image coding as core technology has been proposed, enlarging the already wide families of JPEG standards.

Figure 1. JPEG Families of standards and JPEG AI.

The 89th JPEG meeting had the following highlights:

  • JPEG AI call for evidence report
  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud Coding reviews the status of the call for evidence
  • JPEG Pleno Holography call for proposals timeline
  • JPEG DNA identifies use cases and requirements
  • JPEG XL standard defines the final specification
  • JPEG Systems JLINK reaches committee draft stage
  • JPEG XS 2nd Edition Parts 1, 2 and 3.

JPEG AI

At the 89th meeting, the submissions to the Call for Evidence on learning-based image coding were presented and discussed. Four submissions were received in response to the Call for Evidence. The results of the subjective evaluation of the submissions were reported and discussed in detail by experts. It was agreed that there is strong evidence that learning-based image coding solutions can outperform the already defined anchors, which represent state-of-the-art conventional image coding architectures, in terms of compression efficiency. Thus, it was decided to create a new standardisation activity, JPEG AI, on a learning-based image coding system that applies machine learning tools to achieve substantially better compression efficiency compared to current image coding systems, while offering unique features desirable for efficient distribution and consumption of images. This type of approach should allow obtaining an efficient compressed domain representation not only for visualisation but also for machine learning-based image processing and computer vision. JPEG AI has released to the public the results of the objective and subjective evaluations as well as the first version of common test conditions for assessing the performance of learning-based image coding systems.

JPEG explores standardization needs to address fake media

Recent advances in media modification, particularly deep learning-based approaches, can produce near realistic media content that is almost indistinguishable from authentic content. These developments open opportunities for production of new types of media contents that are useful for many creative industries but also increase risks of spread of maliciously modified content (e.g., ‘deepfake’) leading to social unrest, spreading of rumours or encouragement of hate crimes. The JPEG Committee is interested in exploring if a JPEG standard can facilitate a secure and reliable annotation of media modifications, both in good faith and malicious usage scenarios. 

The JPEG Committee is currently in discussion with stakeholders from academia, industry and other organisations to explore the use cases that will define a roadmap to identify the requirements leading to a potential standard. The Committee has received significant interest and has released a public document outlining the context, use cases and requirements. JPEG invites experts and technology users to actively participate in this activity and attend a workshop, to be held online in December 2020. Details on the activities of JPEG in this area can be found on the JPEG.org website. Interested parties are notably encouraged to subscribe to the mailing list of the ad hoc group that has been set up to facilitate the discussions and coordination on this topic.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 89th JPEG meeting, the JPEG Committee reviewed expressions of interest in the Final Call for Evidence on JPEG Pleno Point Cloud Coding. This Call for Evidence focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between its 89th and 90th meetings, the JPEG Committee will be actively promoting this activity and collecting submissions to participate in the Call for Evidence.

JPEG Pleno Holography

At the 89th meeting, the JPEG Committee released an updated draft of the Call for Proposals for JPEG Pleno Holography. A final Call for Proposals on JPEG Pleno Holography will be released in April 2021. JPEG Pleno Holography is seeking compression solutions for holographic content. The scope of the activity is quite large and addresses diverse use cases such as holographic microscopy and tomography, but also holographic displays and printing. Current activities are centred around refining the objective and subjective quality assessment procedures. Interested parties are already invited at this stage to participate in these activities.

JPEG DNA

JPEG standards are used in storage and archival of digital pictures. This puts the JPEG Committee in a good position to address the challenges of DNA-based storage by proposing an efficient image coding format to create artificial DNA molecules. JPEG DNA has been established as an exploration activity within the JPEG Committee to study use cases, to identify requirements and to assess the state of the art in DNA storage for the purpose of image archival using DNA in order to launch a standardization activity. To this end, a first workshop was organised on 30 September 2020. Presentations made at the workshop are available from the following URL: http://ds.jpeg.org/proceedings/JPEG_DNA_1st_Workshop_Proceedings.zip.
At its 89th meeting, the JPEG Committee released a second version of a public document that describes its findings regarding storage of digital images using artificial DNA. In this framework, the JPEG DNA ad hoc group was renewed in order to continue its activities, to further refine the above-mentioned document and to organise a second workshop. Interested parties are invited to join this activity by participating in the AHG through the following URL: http://listregistration.jpeg.org.

JPEG XL

Final technical comments by national bodies have been addressed and incorporated into the JPEG XL specification (ISO/IEC 18181-1) and the reference implementation. A draft FDIS study text has been prepared and final validation experiments are planned.

JPEG Systems

The JLINK (ISO/IEC 19566-7) standard has reached the committee draft stage and will be made public. The JPEG Committee invites technical feedback on the document, which is available on the JPEG website. Development of the JPEG Snack (ISO/IEC 19566-8) standard has begun to support the defined use cases and requirements. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.

JPEG XS

The JPEG Committee is finalizing its work on the 2nd Editions of JPEG XS Part 1, Part 2 and Part 3. Part 1 defines new coding tools required to efficiently compress raw Bayer images. The observed quality gains of raw Bayer compression over compressing in the RGB domain can be as high as 5 dB PSNR. Moreover, the second edition adds support for mathematically lossless image compression and allows compression of 4:2:0 sub-sampled images. Part 2 defines new profiles for such content. With the support for low-complexity high-quality compression of raw Bayer (or Color-Filtered Array) data, JPEG XS proves to also be an excellent compression scheme in the professional and consumer digital camera market, as well as in the machine vision and automotive industry.
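As a rough aid for reading the 5 dB figure, the sketch below uses the standard definition PSNR = 10·log10(MAX² / MSE) to convert a PSNR gain into the corresponding reduction in mean squared error. The function names and the 8-bit peak value of 255 are assumptions chosen for illustration; the JPEG XS experiments may use other bit depths and measurement conditions.

```python
import math


def psnr_db(mse, max_value=255.0):
    """PSNR in dB for a given mean squared error and peak signal value."""
    if mse <= 0:
        raise ValueError("mse must be positive (PSNR is unbounded for mse == 0)")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


ratio = mse_ratio_for_gain(5.0)
print(f"A 5 dB PSNR gain corresponds to an MSE about {ratio:.2f} times smaller")
```

Under this definition, a 5 dB gain means the mean squared error is reduced by a factor of about 3.16, independent of the peak value.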

Final Quote

“JPEG AI will be a new work item completing the collection of JPEG standards. JPEG AI relies on artificial intelligence to compress images. This standard not only will offer superior compression efficiency beyond the current state of the art but also will open new possibilities for vision tasks by machines and computational imaging for humans,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 90 will be held online from January 18 to 22, 2021.
  • No. 91 will be held online from April 19 to 23, 2021.

MPEG Column: 132nd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.

The new structure is as follows (including Convenors and each group's position within the former WG 11 structure):

  • AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
  • WG 2 MPEG Technical Requirements (Convenor: Dr. Igor Curcio; former Requirements subgroup)
  • WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
  • WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
  • WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
  • WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
  • WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda; former 3DG subgroup)
  • WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
  • AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
  • AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup).

The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.

The official press release comprises the following items:

  • Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
  • MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
  • MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
  • MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
  • MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
  • MPEG Completes Standard on Harmonization of DASH and CMAF
  • MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
  • MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard

In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.

Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone

MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High-Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):

  • Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
  • Using VVenC, additional bit rate savings of more than 10% relative to VTM were observed, while VVenC at the same time runs significantly faster than the reference software implementation.
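Because the second figure is expressed relative to VTM rather than relative to HEVC, the two savings compound multiplicatively instead of adding up. The sketch below works through that arithmetic; the function name is hypothetical, and the calculation assumes that both percentages refer to the same content and the same subjective quality level, which the verification test report would need to confirm.

```python
def combined_saving(*savings):
    """Overall bit rate saving when each saving applies to the previous result.

    Each value is a fraction in [0, 1), e.g. 0.45 for 45%.
    """
    remaining = 1.0
    for saving in savings:
        if not 0.0 <= saving < 1.0:
            raise ValueError("each saving must be a fraction in [0, 1)")
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Roughly 45% (VTM vs. HEVC) followed by 10% (VVenC vs. VTM).
total = combined_saving(0.45, 0.10)
print(f"Combined saving relative to HEVC: {total:.1%}")
```

With the rounded figures quoted above, this gives roughly 50.5% relative to HEVC, not 55%. Since the VVenC gain is reported as "more than 10%", this is a lower bound under the stated assumptions.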

Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open source implementation will be available soon. Together with AOMedia’s AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.

MPEG Completes Geometry-based Point Cloud Compression Standard

MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.

By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.

Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency, but one should not dismiss their delivery aspects, including dynamic adaptive streaming. A recent paper on this topic, entitled “From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom”, has been published in the IEEE Communications Magazine.

MPEG Finalizes the Harmonization of DASH and CMAF

MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with Common Media Application Format (CMAF) featuring a DASH profile for use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019 4th edition).

CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use Media Presentation Description (MPD) as a manifest format.
Additional tools added to this amendment include

  • DASH events and timed metadata track timing and processing models with in-band event streams,
  • a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
  • an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
  • content protection enhancements for efficient signalling.
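
To give an idea of what a partial MPD update could look like, the following minimal sketch applies RFC 5261-style `add`, `replace`, and `remove` operations (each addressed through a `sel` selector) to a toy MPD. It is an illustration under several assumptions, not the normative DASH patch format: the MPD is namespace-free, the selectors are restricted to the XPath subset supported by Python's `xml.etree.ElementTree`, and the root element name `patch`, the toy MPD content, and the function `apply_patch` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy, namespace-free MPD (real MPDs use the DASH XML namespace).
MPD_XML = """
<MPD publishTime="2020-10-16T10:00:00Z">
  <Period id="p0">
    <AdaptationSet id="video">
      <SegmentTemplate media="seg-$Number$.m4s" startNumber="1"/>
      <Representation id="v0" bandwidth="500000"/>
      <Representation id="v1" bandwidth="1500000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

# Hypothetical patch document using RFC 5261-style operations.
PATCH_XML = """
<patch>
  <replace sel="Period/AdaptationSet/SegmentTemplate/@startNumber">42</replace>
  <add sel="Period/AdaptationSet"><Representation id="v2" bandwidth="3000000"/></add>
  <remove sel="Period/AdaptationSet/Representation[@id='v0']"/>
</patch>
"""


def apply_patch(mpd: ET.Element, patch: ET.Element) -> None:
    """Apply a simplified subset of XML patch operations in place."""
    for op in patch:
        sel = op.get("sel", "")
        # A trailing "/@name" selects an attribute of the addressed element.
        path, has_attr, attr = sel.partition("/@")
        target = mpd.find(path)
        if target is None:
            raise ValueError(f"selector matched nothing: {sel}")
        if op.tag == "replace" and has_attr:
            target.set(attr, (op.text or "").strip())
        elif op.tag == "add":
            target.extend(list(op))  # append new children to the target
        elif op.tag == "remove":
            # ElementTree has no parent pointers, so build a child -> parent map.
            parents = {child: parent for parent in mpd.iter() for child in parent}
            parents[target].remove(target)
        else:
            raise ValueError(f"unsupported operation: {op.tag}")


mpd = ET.fromstring(MPD_XML.strip())
apply_patch(mpd, ET.fromstring(PATCH_XML.strip()))
print(ET.tostring(mpd, encoding="unicode"))
```

The point of the design is that the client receives only the three small operations instead of the complete manifest, which matters most for long live presentations with frequently updated MPDs.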

It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.

Research aspects: one of the features enabled by CMAF is low latency streaming that is actively researched within the multimedia systems community (e.g., here). The main research focus has been related to the ABR logic while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).

MPEG Completes 2nd Edition of the Omnidirectional Media Format

MPEG completed the standardization of the 2nd edition of the Omnidirectional MediA Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status including the following features:

  • “Late binding” technologies to deliver and present only that part of the content that adapts to the dynamically changing users’ viewpoint. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream is dynamically generated that, by combining the received portions of the bitstream, covers only the users’ viewport on the client.
  • Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
  • Enhanced use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.
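
The viewport-dependent delivery mentioned in the first item can be illustrated with a small sketch of the selection logic a client might use. It does not reflect OMAF syntax or bitstream rewriting; it only assumes a hypothetical equirectangular frame split into equally wide tile columns and picks the columns that overlap the current horizontal field of view. The function name and the tiling layout are assumptions for illustration.

```python
def visible_tile_columns(yaw_deg: float, hfov_deg: float, n_cols: int) -> list[int]:
    """Return indices of tile columns overlapping the viewport.

    Column c is assumed to cover the yaw range [c * w, (c + 1) * w) with
    w = 360 / n_cols; the viewport is centred at yaw_deg and spans hfov_deg.
    """
    tile_width = 360.0 / n_cols
    visible = []
    for col in range(n_cols):
        center = (col + 0.5) * tile_width
        # Signed angular distance from viewport centre, wrapped to [-180, 180).
        distance = (center - yaw_deg + 180.0) % 360.0 - 180.0
        # Two arcs on a circle overlap if their centres are closer than
        # the sum of their half-widths.
        if abs(distance) < (hfov_deg + tile_width) / 2.0:
            visible.append(col)
    return visible


# A viewer looking at yaw 0 with a 90-degree field of view and 8 tile columns
# only needs the columns on both sides of the wrap-around seam.
print(visible_tile_columns(yaw_deg=0.0, hfov_deg=90.0, n_cols=8))
```

In practice a client would typically also fetch a low-quality version of the remaining tiles to cope with fast head movements; that trade-off is exactly the kind of aspect left open for research.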

Research aspects: standards usually define formats to enable interoperability, but various informative aspects are left open for industry competition and are subject to research and development. The same holds for OMAF: its 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.

MPEG Completes the Low Complexity Enhancement Video Coding Standard

MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.

  • LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video with an effective compression efficiency of limited complexity by building on top of existing and future video codecs.
  • LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
  • LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.

Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders that allow for certain flexibility as part of regular software life cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect for future work.

The 133rd MPEG meeting will again be an online meeting in January 2021.

Click here for more information about MPEG meetings and their developments.

VQEG Column: Recent contributions to ITU recommendations

Welcome to the second column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG plays a major role in research and the development of standards on video quality and this column presents examples of recent contributions to International Telecommunication Union (ITU) recommendations, as well as ongoing contributions to recommendations to come in the near future. In addition, the formation of a new group within VQEG addressing Quality Assessment for Health Applications (QAH) has been announced.  

VQEG website: www.vqeg.org
Authors: 
Jesús Gutiérrez (jesus.gutierrez@upm.es), Universidad Politécnica de Madrid (Spain)
Kjell Brunnström (kjell.brunnstrom@ri.se), RISE (Sweden) 
Thanks to Lucjan Janowski (AGH University of Science and Technology), Alexander Raake (TU Ilmenau) and Shahid Satti (Opticom) for their help and contributions.

Introduction

VQEG is an international and independent organisation that provides a forum for technical experts in perceptual video quality assessment from industry, academia, and standardization organisations. Although VQEG does not develop or publish standards, several activities (e.g., validation tests, multi-lab test campaigns, objective quality models developments, etc.) carried out by VQEG groups have been instrumental in the development of international recommendations and standards. VQEG contributions have been mainly submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), but also to other standardization bodies, such as MPEG, ITU-R SG6, ATIS, IEEE P.3333 and P.1858, DVB, and ETSI. 

In our first column on the ACM SIGMM Records we provided a table summarizing the VQEG studies that have resulted in ITU Recommendations. In this new column, we describe in more detail the latest contributions to recent ITU standards, and we provide an insight into the ongoing contributions that may result in ITU recommendations in the near future.

ITU Recommendations with recent inputs from VQEG

ITU-T Rec. P.1204 standard series

A campaign within the ITU-T Study Group (SG) 12 (Question 14) in collaboration with the VQEG AVHD group resulted in the development of three new video quality model standards for the assessment of sequences of up to UHD/4K resolution. This campaign was carried out over more than two years under the project “AVHD-AS / P.NATS Phase 2”. While “P.NATS Phase 1” (finalized in 2016 and resulting in the standards series ITU-T Rec. P.1203, P.1203.1, P.1203.2 and P.1203.3) addressed the development of improved bitstream-based models for the prediction of the overall quality of long (1-5 minutes) video streaming sessions, the second phase addressed the development of short-term video quality models covering a wider scope with bitstream-based, pixel-based and hybrid models. The P.NATS Phase 2 project was executed as a competition between nine participating institutions in different tracks, resulting in the aforementioned three types of video quality models.

For the competition, a total of 26 databases were created, 13 used for training and 13 for validation and selection of the winning models. In order to establish the ground truth, subjective video quality tests were performed on four different display devices (PC-monitors, 55-75” TVs, mobile, and tablet) with at least 24 subjects each and using the 5-point Absolute Category Rating (ACR) scale. In total, about 5000 test sequences with a duration of around 8 seconds were evaluated, containing a variety of resolutions, encoding configurations, bitrates, and framerates using the codecs H.264/AVC, H.265/HEVC and VP9.   
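
As a minimal sketch of how ground truth is commonly derived from such tests, the snippet below computes a Mean Opinion Score (MOS) and a confidence interval from 5-point ACR ratings of one sequence. The ratings are invented for illustration, the function name is hypothetical, and the interval uses a normal approximation (an assumption; with 24 subjects a Student's t quantile would give a slightly wider interval).

```python
from statistics import NormalDist, mean, stdev


def mos_with_ci(ratings: list[int], confidence: float = 0.95) -> tuple[float, float]:
    """Return (MOS, half-width of the confidence interval) for ACR ratings."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ACR ratings must lie on the 5-point scale 1..5")
    n = len(ratings)
    mos = mean(ratings)
    # Normal approximation of the quantile (assumption, see lead-in).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(ratings) / n ** 0.5
    return mos, half_width


# Invented ratings from 24 hypothetical subjects for a single test sequence.
ratings = [4] * 12 + [5] * 6 + [3] * 6
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```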

More details about the whole workflow and results of the competition can be found in [1]. As a result of this competition, the new standard series ITU-T Rec. P.1204 [2] has been recently published, including a bitstream-based model (ITU-T Rec. P.1204.3 [3]), a pixel-based model (ITU-T Rec. P.1204.4 [4]) and a hybrid model (ITU-T Rec. P.1204.5 [5]).

ITU-T Rec. P.1401

ITU-T Rec. P.1401 [6] covers statistical analysis, evaluation and reporting guidelines of quality measurements and was revised in January 2020. Based on the article by Brunnström and Barkowsky [7], VQEG pointed out that this very useful Recommendation lacked a section on multiple comparisons and their potential impact on the performance evaluation of objective quality methods. In the latest revision, Section 7.6.5 covers this topic.
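
To illustrate why multiple comparisons matter when several objective models are compared pairwise, the sketch below computes the family-wise error rate for a set of tests and the per-test significance level given by a Bonferroni correction. The Bonferroni correction is used here only as a well-known example; we do not claim it is the procedure prescribed in Section 7.6.5. The number of models is hypothetical and the error-rate formula assumes independent tests.

```python
from math import comb


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate below alpha."""
    return alpha / n_tests


n_models = 10                 # hypothetical number of objective models
n_tests = comb(n_models, 2)   # all pairwise comparisons
alpha = 0.05

print(f"{n_tests} pairwise comparisons")
print(f"uncorrected family-wise error rate: {familywise_error_rate(alpha, n_tests):.2f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha(alpha, n_tests):.4f}")
```

With ten models and no correction, at least one spurious "significant" difference becomes very likely, which is the effect the added section addresses.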

Ongoing VQEG Inputs to ITU Recommendations

ITU-T Rec. P.919

ITU has been working on a recommendation for subjective test methodologies for 360° video on Head-Mounted Displays (HMDs), under SG12 Question 13 (Q13). The Immersive Media Group (IMG) of VQEG has collaborated in this effort by carrying out Phase 1 of the Test Plan for Quality Assessment of 360-degree Video. In particular, Phase 1 of this test plan addresses the assessment of short sequences (less than 30 seconds), in the spirit of ITU-R BT.500 [8] and ITU-T P.910 [9], and considers the evaluation of audiovisual quality and simulator sickness. Phase 2 of the test plan (envisioned for the near future) covers the assessment of other factors that can be more influential with longer sequences (several minutes), such as immersiveness and presence.

Within Phase 1, the IMG designed and executed a cross-lab test with the participation of ten international laboratories: AGH University of Science and Technology (Poland), Centrum Wiskunde & Informatica (The Netherlands), Ghent University (Belgium), Nokia Bell-Labs (Spain), Roma TRE University (Italy), RISE Acreo (Sweden), TU Ilmenau (Germany), Universidad Politécnica de Madrid (Spain), University of Surrey (England), and Wuhan University (China).

This test was aimed at assessing and validating subjective evaluation methodologies for 360° video. Thus, the single-stimulus methodology Absolute Category Rating (ACR) and the double-stimulus Degradation Category Rating (DCR) were considered to evaluate the audiovisual quality of 360° videos distorted with uniform and non-uniform degradations. In particular, different configurations of uniform and tile-based coding were applied to eight video sources with different spatial, temporal and exploration properties. Other influence factors were also studied, such as the sequence duration (from 10 to 30 s) and the test setup (considering different HMDs and methods to collect the observers’ ratings, using audio or not, etc.). Finally, in addition to the evaluation of audiovisual quality, the assessment of simulator sickness symptoms was addressed by studying the use of different questionnaires. As a result of this work, the IMG of VQEG presented two contributions to the recommendation ITU-T Rec. P.919 (ex P.360-VR), which was consented at the last SG12 meeting (7-11 September 2020) and is envisioned to be published soon. In addition, the results and the annotated dataset coming from the cross-lab test will be published soon.

ITU-T Rec. P.913

Another upcoming contribution is being prepared by the Statistical Analysis Group (SAM). The main goal of the proposal is to increase the precision of the subjective experiment analysis by describing a subjective answer as a random variable. The random variable is described by three key influencing factors: the sequence quality, a subject bias, and a subject precision. It is a further development of the ITU-T P.913 [10] recommendation, where subject bias was introduced. Adding subject precision brings two benefits: better handling of unreliable subjects and an easier estimation procedure.

Current standards describe a way to remove an unreliable subject. The problem is that the methods proposed in BT.500 [8] and P.913 [10] are different and point to different subjects. Also, both methods have some arbitrary parameters (e.g., thresholds) deciding when a subject should be removed. This means that two subjects can be similarly imprecise, yet one is over the threshold, so we accept all of his answers as correct, while the other is under the threshold, so we remove all of her answers. The proposed method instead weights the impact of each subject’s answers depending on the subject’s precision. As a consequence, each subject is to some extent both removed and kept. The balance between how much information we keep and how much we remove depends on the subject precision.

The estimation procedure of the proposed model described in the literature is Maximum Likelihood Estimation (MLE). Such estimation is computationally costly and needs a careful setup to obtain a reliable solution. Therefore, we proposed an Alternating Projection (AP) solver, which is less general than MLE but works as well as MLE for the subject model estimation. This solver is called “alternating projection” because, in a loop, we alternate between projecting (or averaging) the opinion scores along the subject dimension and the stimulus dimension. It increases the precision of the obtained model parameters step by step, giving more weight to the information coming from the more precise subjects. More details can be found in the white paper in [11].
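
The alternating loop described above can be sketched as follows. This is a simplified illustration of the idea, not the reference implementation accompanying [11]: it assumes a complete rating matrix (every subject rated every stimulus), uses the population standard deviation of the residuals as the per-subject inconsistency, and introduces a floor `min_std` to avoid division by zero. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def alternating_projection(scores, max_iter=100, tol=1e-8, min_std=1e-6):
    """Estimate stimulus quality, subject bias and subject inconsistency.

    scores[i][j] is the rating given by subject i to stimulus j.
    """
    n_subj, n_stim = len(scores), len(scores[0])
    # Start from the plain MOS of each stimulus.
    quality = [mean([scores[i][j] for i in range(n_subj)]) for j in range(n_stim)]
    bias = [0.0] * n_subj
    inconsistency = [1.0] * n_subj
    for _ in range(max_iter):
        # Average along the stimulus dimension: per-subject bias.
        bias = [mean([scores[i][j] - quality[j] for j in range(n_stim)])
                for i in range(n_subj)]
        # Spread of what remains: per-subject inconsistency (inverse precision).
        inconsistency = [
            max(pstdev([scores[i][j] - quality[j] - bias[i] for j in range(n_stim)]),
                min_std)
            for i in range(n_subj)
        ]
        # Average along the subject dimension, trusting precise subjects more.
        weights = [1.0 / (s * s) for s in inconsistency]
        total = sum(weights)
        new_quality = [
            sum(w * (scores[i][j] - bias[i]) for i, w in enumerate(weights)) / total
            for j in range(n_stim)
        ]
        change = max(abs(a - b) for a, b in zip(new_quality, quality))
        quality = new_quality
        if change < tol:
            break
    return quality, bias, inconsistency


# Invented example: three consistent subjects and one erratic subject.
scores = [[1, 2, 3, 4, 5]] * 3 + [[3, 1, 5, 2, 4]]
quality, bias, inconsistency = alternating_projection(scores)
print([round(q, 3) for q in quality])
print([round(s, 3) for s in inconsistency])
```

Note that no subject is discarded: the erratic subject simply ends up with a large inconsistency and therefore a small weight, which is the behaviour motivated in the previous paragraph.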

Other updates 

A new VQEG group has been recently established related to Quality Assessment for Health Applications (QAH), with the motivation to study visual quality requirements for medical imaging and telemedicine. The main goals of this new group are:

  • Assemble all the existing publicly accessible databases on medical quality.
  • Develop databases with new diagnostic tasks and new objective quality assessment models.
  • Provide methodologies, recommendations and guidelines for subjective tests of medical image quality.
  • Study the quality requirements and Quality of Experience in the context of telemedicine and other telehealth services.

For any further questions or expressions of interest to join this group, please contact QAH Chair Lu Zhang (lu.ge@insa-rennes.fr), Vice Chair Meriem Outtas (Meriem.Outtas@insa-rennes.fr), and Vice Chair Hantao Liu (hantao.liu@cs.cardiff.ac.uk).

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204” , IEEE Access, 2020 (Available online soon).   
[2] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. Geneva, Switzerland: ITU, 2020.
[3] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. Geneva, Switzerland: ITU, 2020.
[4] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. Geneva, Switzerland: ITU, 2020.
[5] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. Geneva, Switzerland: ITU, 2020.
[6] ITU-T Rec. P.1401. Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. Geneva, Switzerland: ITU, 2020.
[7] K. Brunnström and M. Barkowsky, “Statistical quality of experience analysis: on planning the sample size and statistical significance testing”, Journal of Electronic Imaging, vol. 27, no. 5, p. 11, Sep. 2018 (DOI: 10.1117/1.JEI.27.5.053013).
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland: ITU, 2019.
[9] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. Geneva, Switzerland: ITU, 2008.
[10] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment. Geneva, Switzerland: ITU, 2016.
[11] Z. Li, C. G. Bampis, L. Janowski, I. Katsavounidis, “A simple model for subject behavior in subjective experiments”, arXiv:2004.02067, Apr. 2020.

JPEG Column: 88th JPEG Meeting

The 88th JPEG meeting, initially planned to be held in Geneva, Switzerland, was held online because of the Covid-19 outbreak.

JPEG experts organised a large number of sessions spread over day and night to allow the remote participation of multiple time zones. This very intense activity has resulted in multiple outputs and initiatives. In particular, two new exploration activities were initiated. The first explores possible standardisation needs to address the growing emergence of fake media by introducing appropriate security features to prevent the misuse of media content. The second considers the use of DNA for media content archival.

Furthermore, JPEG has started the work on the new Part 8 of the JPEG Systems standard, called JPEG Snack, for interoperable rich image experiences, and it is holding two Calls for Evidence, on JPEG AI and on JPEG Pleno Point Cloud coding.

Despite travel restrictions, the JPEG Committee has managed to keep up with the majority of its plans, defined prior to the COVID-19 outbreak. An overview of the different activities is represented in Fig. 1.

Figure 1 – JPEG Planned Timeline.

The 88th JPEG meeting had the following highlights:

  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud call for evidence
  • JPEG DNA – based archival of media content using DNA
  • JPEG AI call for evidence
  • JPEG XL standard evolves to a final specification
  • JPEG Systems Part 8, named JPEG Snack, progresses
  • JPEG XS Part-1 2nd Edition first ballot.

JPEG explores standardization needs to address fake media

Recent advances in media manipulation, particularly deep learning-based approaches, can produce near realistic media content that is almost indistinguishable from authentic content to the human eye. These developments open opportunities for production of new types of media contents that are useful for the entertainment industry and other business usage, e.g., creation of special effects or artificial natural scene production with actors in the studio. However, this also leads to issues relating to fake media generation undermining the integrity of the media (e.g., deepfakes), copyright infringements and defamation to mention a few examples. Misuse of manipulated media can cause social unrest, spread rumours for political gain or encourage hate crimes. In this context, the term ‘fake’ is used here to refer to any manipulated media, independently of its ‘good’ or ‘bad’ intention.

In many application domains, fake media producers may want or may be required to declare the type of manipulations performed, in contrast to other situations where the intention is to ‘hide’ the mere existence of such manipulations. This is already leading various governmental organizations to plan new legislation, and companies (especially social media platforms or news outlets) to develop mechanisms that would clearly detect and annotate manipulated media contents when they are shared. While growing efforts are noticeable in developing technologies, there is a need for a standard for the media/metadata format, e.g., a JPEG standard that facilitates a secure and reliable annotation of fake media, both in good faith and malicious usage scenarios. To better understand the fake media ecosystem and needs in terms of standardization, the JPEG Committee has initiated an in-depth analysis of fake media use cases, naturally independently of the “intentions”.

More information on the initiative is available on the JPEG website. Interested parties are invited to join the above AHG through the following URL: http://listregistration.jpeg.org.

JPEG Pleno Point Cloud

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 88th JPEG meeting, the JPEG Committee released a Final Call for Evidence on JPEG Pleno Point Cloud Coding that focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between the 88th and 89th meetings, the JPEG Committee will be actively promoting this activity and collecting registrations to participate in the Call for Evidence.

JPEG DNA

In digital media information, notably images, the relevant representation symbols, e.g. quantized DCT coefficients, are expressed in bits (i.e., binary units), but they could be expressed in any other units, for example DNA units, which follow a 4-ary representation basis. This would mean that DNA molecules may be created with a specific configuration of DNA units that stores some media representation symbols, e.g. the symbols of a JPEG image, thus leading to DNA-based media storage as a form of molecular data storage. JPEG standards have been used in storage and archival of digital pictures as well as moving images. While the legacy JPEG format is widely used for photo storage in SD cards, as well as archival of pictures by consumers, JPEG 2000 as described in ISO/IEC 15444 is used in many archival applications, notably for preservation of cultural heritage in the form of visual data as pictures and video in digital format. This puts the JPEG Committee in a unique position to address the challenges in DNA-based storage by creating a standard image representation and coding for such applications. To explore the latter, an AHG has been established. Interested parties are invited to join the above AHG through the following URL: http://listregistration.jpeg.org.
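
To make the 4-ary representation mentioned above concrete, the following sketch maps every pair of bits to one of the four nucleotides and back. This is only the most naive possible mapping, chosen for illustration: the assignment of bit pairs to letters is arbitrary, the function names are hypothetical, and a practical DNA storage code would additionally have to respect biochemical constraints (e.g. avoiding long runs of the same nucleotide) and add error correction, none of which is attempted here.

```python
NUCLEOTIDES = "ACGT"  # arbitrary assignment: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides (two bits per nucleotide)."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for symbol in strand[start:start + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(symbol)
        out.append(byte)
    return bytes(out)


# The two bytes that start every JPEG file (the SOI marker 0xFFD8).
soi = bytes([0xFF, 0xD8])
strand = bytes_to_dna(soi)
print(strand)
assert dna_to_bytes(strand) == soi
```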

JPEG AI

At the 88th meeting, the submissions to the Call for Evidence were reported and analysed. Six submissions were received in response to the Call for Evidence made in coordination with the IEEE MMSP 2020 Challenge. The submissions along with the anchors were already evaluated using objective quality metrics. Following this initial process, subjective experiments have been designed to compare the performance of all submissions. Thus, during this meeting, the main focus of JPEG AI was on the presentation and discussion of the objective performance evaluation of all submissions as well as the definition of the methodology for the subjective evaluation that will be made next.

JPEG XL

The standardization of the JPEG XL image coding system is nearing completion. Final technical comments by national bodies have been received for the codestream (Part 1); the DIS has been approved and an FDIS text is under preparation. The container file format (Part 2) is progressing to the DIS stage. A white paper summarizing key features of JPEG XL is available at http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf.

JPEG Systems

ISO/IEC has approved the JPEG Snack initiative to deliver interoperable rich image experiences. As a result, the JPEG Systems Part 8 (ISO/IEC 19566-8) has been created to define the file format construction and the metadata signalling and descriptions which enable animation with transition effects. A Call for Participation (CfP) and updated use cases and requirements have been issued. The use cases and requirements document and the CfP are available at http://ds.jpeg.org/documents/wg1n87035-REQ-JPEG_Snack_Use_Cases_and_Requirements_v2_2.pdf and http://ds.jpeg.org/documents/wg1n88032-SI-CfP_JPEG_Snack.pdf respectively.

An updated working draft for the JLINK initiative was completed. Interested parties are encouraged to review the JLINK Working Draft 3.0 available at http://ds.jpeg.org/documents/wg1n88031-SI-JLINK_WD_3_0.pdf.

JPEG XS

The JPEG committee is pleased to announce a significant step in the standardization of an efficient Bayer image compression scheme, with the first ballot of the 2nd Edition of JPEG XS Part-1.

The new edition of this visually lossless low-latency and lightweight compression scheme now includes image sensor coding tools allowing efficient compression of Color-Filtered Array (CFA) data. This compression enables better quality and lower complexity than the corresponding compression in the RGB domain.  It can be used as a mezzanine codec in various markets such as real-time video storage in and outside of cameras, and data compression onboard autonomous cars.

Final Quote

“Fake Media has become a challenge with the wide-spread manipulated contents in the news. JPEG is determined to mitigate this problem by providing standards that can securely identify manipulated contents.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 89 will be held online from October 5 to 9, 2020.

MPEG Column: 131st MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 131st MPEG meeting concluded on July 3, 2020, again online, but with a press release comprising an impressive list of news items, led by “MPEG Announces VVC – the Versatile Video Coding Standard”. Right in the middle of the restructuring of SC 29 (i.e., MPEG’s parent body within ISO), MPEG successfully ratified, jointly with ITU-T’s VCEG within JVET, its next-generation video codec, among other interesting results from the 131st MPEG meeting:

Standards progressing to final approval ballot (FDIS)

  • MPEG Announces VVC – the Versatile Video Coding Standard
  • Point Cloud Compression – MPEG promotes a Video-based Point Cloud Compression Technology to the FDIS stage
  • MPEG-H 3D Audio – MPEG promotes Baseline Profile for 3D Audio to the final stage

Call for Proposals

  • Call for Proposals on Technologies for MPEG-21 Contracts to Smart Contracts Conversion
  • MPEG issues a Call for Proposals on extension and improvements to ISO/IEC 23092 standard series

Standards progressing to the first milestone of the ISO standard development process

  • Widening support for storage and delivery of MPEG-5 EVC
  • Multi-Image Application Format adds support of HDR
  • Carriage of Geometry-based Point Cloud Data progresses to Committee Draft
  • MPEG Immersive Video (MIV) progresses to Committee Draft
  • Neural Network Compression for Multimedia Applications – MPEG progresses to Committee Draft
  • MPEG issues Committee Draft of Conformance and Reference Software for Essential Video Coding (EVC)

The corresponding press release of the 131st MPEG meeting can be found here: https://mpeg-standards.com/meetings/mpeg-131/. This report focused on video coding featuring VVC as well as PCC and systems aspects (i.e., file format, DASH).

MPEG Announces VVC – the Versatile Video Coding Standard

MPEG is pleased to announce the completion of the new Versatile Video Coding (VVC) standard at its 131st meeting. The document has been progressed to its final approval ballot as ISO/IEC 23090-3 and will also be known as H.266 in the ITU-T.

VVC Architecture (from IEEE ICME 2020 tutorial of Mathias Wien and Benjamin Bross)

VVC is the latest in a series of very successful standards for video coding that have been jointly developed with ITU-T, and it is the direct successor to the well-known and widely used High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC) standards (see architecture in the figure above). VVC provides a major benefit in compression over HEVC. Plans are underway to conduct a verification test with formal subjective testing to confirm that VVC achieves an estimated 50% bit rate reduction versus HEVC for equal subjective video quality. Test results have already demonstrated that VVC typically provides about a 40% bit rate reduction for 4K/UHD video sequences in tests using objective metrics (i.e., PSNR, VMAF, MS-SSIM). Application areas especially targeted for the use of VVC include:

  • ultra-high definition 4K and 8K video,
  • video with a high dynamic range and wide colour gamut, and
  • video for immersive media applications such as 360° omnidirectional video.

Furthermore, VVC is designed for a wide variety of types of video such as camera-captured, computer-generated, and mixed content for screen sharing, adaptive streaming, game streaming, video with scrolling text, etc. Conventional standard-definition and high-definition video content are also supported with similar gains in compression. In addition to improving coding efficiency, VVC also provides highly flexible syntax supporting such use cases as (i) subpicture bitstream extraction, (ii) bitstream merging, (iii) temporal sub-layering, and (iv) layered coding scalability.

The current performance of VVC compared to HEVC-HM is shown in the figure below, which confirms the statement above but also highlights the increased complexity. Please note that VTM9 is not optimized for speed but for functionality (i.e., compression efficiency).

Performance of VVC, VTM9 vs. HM (taken from https://bit.ly/mpeg131).
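To make the bit rate figures quoted above concrete, here is a minimal sketch that applies the roughly 40% (objective metrics) and estimated 50% (subjective) reductions to an HEVC bitrate ladder, as one might do when planning an HTTP adaptive streaming setup. The ladder values and the function names are assumptions chosen purely for illustration; they do not come from the MPEG report or from any encoder.

```python
# Hypothetical HEVC bitrate ladder in kbit/s (illustrative values only,
# not taken from the MPEG report).
HEVC_LADDER_KBPS = {"1080p": 6000.0, "2160p": 16000.0}


def equivalent_bitrate(reference_kbps: float, reduction: float) -> float:
    """Bitrate needed for equal quality, given a fractional bit rate reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return reference_kbps * (1.0 - reduction)


def scaled_ladder(ladder: dict, reduction: float) -> dict:
    """Apply the same reduction to every rung of a bitrate ladder."""
    return {name: equivalent_bitrate(kbps, reduction) for name, kbps in ladder.items()}


# About 40% reduction reported with objective metrics for 4K/UHD content.
objective_ladder = scaled_ladder(HEVC_LADDER_KBPS, 0.40)
# Estimated 50% reduction for equal subjective quality (still to be verified).
subjective_ladder = scaled_ladder(HEVC_LADDER_KBPS, 0.50)

for name in HEVC_LADDER_KBPS:
    print(name, HEVC_LADDER_KBPS[name], objective_ladder[name], subjective_ladder[name])
```

In practice the saving varies with content and resolution, so a single factor per ladder is only a first approximation.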

MPEG also announces completion of ISO/IEC 23002-7 “Versatile supplemental enhancement information for coded video bitstreams” (VSEI), developed jointly with ITU-T as Rec. ITU-T H.274. The new VSEI standard specifies the syntax and semantics of video usability information (VUI) parameters and supplemental enhancement information (SEI) messages for use with coded video bitstreams. VSEI is especially intended for use with VVC, although it is drafted to be generic and flexible so that it may also be used with other types of coded video bitstreams. Once specified in VSEI, different video coding standards and systems-environment specifications can re-use the same SEI messages without the need for defining special-purpose data customized to the specific usage context.

At the same time, the Media Coding Industry Forum (MC-IF) announces a VVC patent pool fostering effort with an initial meeting on September 1, 2020. The aim of this meeting is to identify tasks and to propose a schedule for VVC pool fostering, with the goal of selecting a pool facilitator/administrator by the end of 2020. MC-IF itself is not facilitating or administering a patent pool.

At the time of writing this blog post, it is probably too early to make an assessment of whether VVC will share the fate of HEVC or AVC (w.r.t. patent pooling). AVC is still the most widely used video codec but with AVC, HEVC, EVC, VVC, LCEVC, AV1, (AV2), and probably also AVS3 — did I miss anything? — the competition and pressure are certainly increasing.

Research aspects: from a research perspective, reduction of time-complexity (for a variety of use cases) while maintaining quality and bitrate at acceptable levels is probably the most relevant aspect. Improvements in individual building blocks of VVC by using artificial neural networks (ANNs) are another area of interest but also end-to-end aspects of video coding using ANNs will probably pave the roads towards the/a next generation of video codec(s). Utilizing VVC and its features for HTTP adaptive streaming (HAS) is probably most interesting for me but maybe also for others…

MPEG promotes a Video-based Point Cloud Compression Technology to the FDIS stage

At its 131st meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to the Final Draft International Standard (FDIS) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colors and reflectance. Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass-market applications. However, the relative ease to capture and render spatial information as point clouds compared to other volumetric video representations makes point clouds increasingly popular to present immersive volumetric data. With the current V-PCC encoder implementation providing compression in the range of 100:1 to 300:1, a dynamic point cloud of one million points could be encoded at 8 Mbit/s with good perceptual quality. Real-time decoding and rendering of V-PCC bitstreams have also been demonstrated on current mobile hardware. The V-PCC standard leverages video compression technologies and the video ecosystem in general (hardware acceleration, transmission services, and infrastructure) while enabling new kinds of applications. The V-PCC standard contains several profiles that leverage existing AVC and HEVC implementations, which may make them suitable to run on existing and emerging platforms. The standard is also extensible to upcoming video specifications such as Versatile Video Coding (VVC) and Essential Video Coding (EVC).
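The compression figures above can be checked with a back-of-the-envelope calculation. The sketch below assumes 10 bits per geometry coordinate, 8 bits per colour component and 30 frames per second; none of these values are stated in the report, they are only plausible assumptions for a dynamic point cloud.

```python
# Assumptions (not from the MPEG report): 10-bit x/y/z geometry,
# 8-bit R/G/B colour, 30 frames per second.
GEOMETRY_BITS = 3 * 10
COLOUR_BITS = 3 * 8
BITS_PER_POINT = GEOMETRY_BITS + COLOUR_BITS  # 54 bits per point
FRAMES_PER_SECOND = 30
POINTS_PER_FRAME = 1_000_000


def raw_bitrate_mbps(points: int, bits_per_point: int, fps: int) -> float:
    """Uncompressed bitrate of a dynamic point cloud in Mbit/s."""
    return points * bits_per_point * fps / 1e6


raw = raw_bitrate_mbps(POINTS_PER_FRAME, BITS_PER_POINT, FRAMES_PER_SECOND)

# Compressed bitrate at the two ends of the quoted range.
compressed_at_100 = raw / 100
compressed_at_300 = raw / 300

# Compression ratio implied by the quoted 8 Mbit/s operating point.
implied_ratio = raw / 8.0

print(f"raw: {raw:.0f} Mbit/s")
print(f"100:1 -> {compressed_at_100:.1f} Mbit/s, 300:1 -> {compressed_at_300:.1f} Mbit/s")
print(f"8 Mbit/s corresponds to about {implied_ratio:.0f}:1")
```

Under these assumptions the raw stream is about 1.6 Gbit/s, so 8 Mbit/s corresponds to a ratio of roughly 200:1, which sits inside the quoted 100:1 to 300:1 range.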

The V-PCC standard is based on Visual Volumetric Video-based Coding (V3C), which is expected to be re-used by other MPEG-I volumetric codecs under development. MPEG is also developing a standard for the carriage of V-PCC and V3C data (ISO/IEC 23090-10), which was promoted to DIS status at the 130th MPEG meeting.

By providing high-level immersiveness at currently available bandwidths, the V-PCC standard is expected to enable several types of applications and services such as six Degrees of Freedom (6 DoF) immersive media, virtual reality (VR) / augmented reality (AR), immersive real-time communication and cultural heritage.

Research aspects: as V-PCC is video-based, we can probably state similar research aspects as for video codecs such as improving efficiency both for encoding and rendering as well as reduction of time complexity. During the development of V-PCC mainly HEVC (and AVC) has/have been used but it is definitely interesting to use also VVC for PCC. Finally, the dynamic adaptive streaming of V-PCC data is still in its infancy despite some articles published here and there.

MPEG Systems related News

Finally, I’d like to share news related to MPEG systems and the carriage of video data as depicted in the figure below. In particular, the carriage of VVC (and also EVC) has now been enabled in MPEG-2 Systems (specifically within the transport stream) and in the various file formats (specifically within the NAL file format). The latter is also used in CMAF and DASH, which makes VVC (and also EVC) ready for HTTP adaptive streaming (HAS).

Carriage of Video in MPEG Systems Standards (taken from https://bit.ly/mpeg131).

What about DASH and CMAF?

CMAF maintains a so-called “technologies under consideration” document which contains — among other things — a proposed VVC CMAF profile. Additionally, there are two exploration activities related to CMAF, i.e., (i) multi-stream support and (ii) storage, archiving, and content management for CMAF files.

DASH works on potential improvement for the first amendment to ISO/IEC 23009-1 4th edition related to CMAF support, events processing model, and other extensions. Additionally, there’s a working draft for a second amendment to ISO/IEC 23009-1 4th edition enabling bandwidth change signalling track and other enhancements. Furthermore, ISO/IEC 23009-8 (Session-based DASH operations) has been advanced to Draft International Standard (see also my last report).

An overview of the current status of MPEG-DASH can be found in the figure below.

The next meeting will be again an online meeting in October 2020.

Finally, MPEG organized a Webinar presenting results from the 131st MPEG meeting. The slides and video recordings are available here: https://bit.ly/mpeg131.


Standards Column: VQEG

Welcome to the first column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG is an international and independent organisation of technical experts in perceptual video quality assessment from industry, academia, and government organisations.
This column briefly introduces the mission and main activities of VQEG, establishing a starting point of a series of columns that will provide regular updates of the advances within the current ongoing projects, as well as reports of the VQEG meetings. 
The editors of these columns are Jesús Gutiérrez (jesus.gutierrez@upm.es), co-chair of the Immersive Media Group of VQEG, and Kjell Brunnström (kjell.brunnstrom@ri.se), general co-chair of VQEG. Feel free to contact them with any further questions, comments or requests for information, and also to check the VQEG website: www.vqeg.org.

Introduction

The Video Quality Experts Group (VQEG) was born from a need to bring together experts in subjective video quality assessment and objective quality measurement. The first VQEG meeting, held in Turin in 1997, was attended by a small group of experts drawn from ITU-T and ITU-R Study Groups. VQEG was first grounded in basic subjective methodology and objective tool development/verification for video quality assessment, such that the industry could be moved forward with standardization and implementation. At the beginning it focused on measuring perceived video quality, since the distribution paths for video and audio were limited and known.

Over the 20 years since the formation of VQEG, the ecosystem has changed dramatically, and so must the work. Multimedia is now pervasive on all devices and methods of distribution, from broadcast to cellular data networks. This shift has led the expertise within VQEG to move from the visual (no-audio) quality of video to Quality of Experience (QoE).

The march forward of technologies means that VQEG needs to react and be on the leading edge of developing, defining and deploying methods and tools that help address these new technologies and move the industry forward. This also means that we need to embrace both qualitative and quantitative ways of defining these new spaces and terms. Taking a holistic approach to QoE will enable VQEG to drive forward, and faster, with unprecedented collaboration and execution.

VQEG is open to all interested from industry, academia, government organizations and Standard-Developing Organizations (SDOs). There are no fees involved, no membership applications and no invitations are needed to participate in VQEG activities. Subscription to the main VQEG email list (ituvidq@its.bldrdoc.gov) constitutes membership in VQEG.

VQEG conducts work via discussions over email reflectors, regularly scheduled conference calls and, in general, two face-to-face meetings per year. There are currently more than 500 people registered across 11 email reflectors, including a main reflector for general announcements relevant to the entire group, and different project reflectors dedicated to technical discussions of specific projects. A LinkedIn group exists as well.

Objectives

The main objectives of VQEG are: 

  • To provide a forum, via email lists and face-to-face meetings for video quality assessment experts to exchange information and work together on common goals. 
  • To formulate test plans that clearly and specifically define the procedures for performing subjective assessment tests and objective model validations.
  • To produce open source databases of multimedia material and test results, as well as software tools. 
  • To conduct subjective studies of multimedia and immersive technologies and provide a place for collaborative model development to take place.

Projects

Currently, several working groups are active within VQEG, classified under four main topics:

  1. Subjective Methods: Based on collaborative efforts to improve subjective video quality test methods.
    • Audiovisual HD (AVHD), project “Advanced Subjective Methods” (AVHD-SUB): This group investigates improved audiovisual subjective quality testing methods. This effort may lead to a revision of ITU-T Rec. P.911. As examples of its activities, the group has investigated alternative experiment designs for subjective tests, to validate subjective testing of long video sequences that are only viewed once by each subject. In addition, it conducted a joint investigation into the impact of the environment on mean opinion scores (MOS).
    • Psycho-Physiological Quality Assessment (PsyPhyQA): The aim of this project is to establish novel psychophysiology-based techniques and methodologies for video quality assessment and real-time interaction of humans with advanced video communication environments. Specifically, some of the aspects that the project is looking at include: video quality assessment based on human psychophysiology (including eye gaze, EEG, EKG, EMG, GSR, etc.), computational video quality models based on psychophysiological measurements, signal processing and machine learning techniques for psychophysiology-based video quality assessment, experimental design and methodologies for psychophysiological assessment, and correlates of psychophysics and psychophysiology. PsyPhyQA has published a dataset and test plan for a common framework for the evaluation of psychophysiological visual quality assessment.
    • Statistical Analysis Methods (SAM): This group addresses problems related to how to better analyze and improve the quality of data coming from subjective experiments and how to consider uncertainty in the development of objective media quality predictors/models. Its main goals are: to improve methods used to draw conclusions from subjective experiments, to understand the process of expressing an opinion in a subjective experiment, to improve subjective experiment design to facilitate analysis and applications, to improve the analysis of objective model performance, and to revisit standardised methods for assessing the performance of objective models.
  2. Objective Metrics: Working towards developing and validating objective video quality metrics.
    • Audiovisual HD (AVHD), project “AVHD-AS / P.NATS phase 2”: This is a joint project of VQEG and ITU Study Group 12 Question 14. The main goal is to develop a multitude of objective models, varying in terms of complexity/type of input/use-cases, for the assessment of video quality in HTTP/TCP-based adaptive bitrate streaming services (e.g., YouTube, Vimeo, Amazon Video, Netflix, etc.). For these services, the quality experienced by the end user is affected by video coding degradations and by delivery degradations due to initial buffering, re-buffering and media adaptations caused by changes in bitrate, resolution, and frame rate.
    • Computer Generated Imagery (CGI): This group focuses on computer-generated content for both image and video material. The main goals are as follows: creating a large database of computer-generated content, analyzing the content (feature extraction before and after rendering), analyzing the performance of objective quality metrics, evaluating/developing existing/new quality metrics/models for CGI material, and studying rendering adaptation techniques (depending on the network constraints). This activity is in line with the ITU-T work item P.BBQCG (Parametric Bitstream-based Quality Assessment of Cloud Gaming Services).
    • No Reference Metrics (NORM): This group is an open collaboration for developing no-reference metrics and methods for monitoring use-case-specific visual service quality. The NORM group is a complementary, industry-driven alternative to QoE for automatically measuring visual quality using perceived indicators. Its main activities are to maintain a list of real-world use cases for visual quality monitoring, a list of potential algorithms and methods for no-reference MOS and/or key indicators (visual artifact detection) for each use case, a list of methods (including datasets) to train and validate the algorithms for each use case, and a list of methods to provide root cause indication for each use case. In addition, the group encourages open discussions and knowledge sharing on all aspects related to no-reference metric research and development.
    • Joint Effort Group (JEG) – Hybrid: This group is an open collaboration working together to develop a robust Hybrid Perceptual/Bit-Stream model. It has developed and made available routines to create and capture bit-stream data and parse bit-streams into HMIX files. Efforts are underway into developing subjectively rated video quality datasets with bit-stream data that can be used by all JEG researchers. The goal is to produce one model that combines metrics developed separately by a variety of researchers. 
    • Quality Assessment for Computer Vision Applications (QACoViA): The goal of this group is to study the visual quality requirements for computer vision methods, especially focusing on: testing methodologies and frameworks to identify the limits of computer vision methods with respect to the visual quality of the ingest; the minimum quality requirements and objective visual quality measures to estimate whether visual content is in the operating region of computer vision; and delivering implementable algorithms that serve as a proof of concept for the newly proposed objective video quality assessment methods for recognition tasks.
  3. Industry and Applications: Focused on seeking improved understanding of new video technologies and applications.
    • 5G Key Performance Indicators (5GKPI): Studies the relationship between the Key Performance Indicators (KPI) of new communication networks (namely 5G, but extensible to others) and the QoE of the video services on top of them. With this aim, this group addresses: the definition of relevant use cases (e.g., video for industrial applications, or mobility scenarios), the study of global QoE aspects for video in mobility and industrial scenarios, the identification of the relevant network KPIs (e.g., bitrate, latency, etc.) and application-level video KPIs (e.g., picture quality, A/V sync, etc.), and the generation of open datasets for algorithm testing and training.
    • Immersive Media Group (IMG): This group researches quality assessment of immersive media, with the main goals of generating datasets of immersive media content, validating subjective test methods, and establishing baseline quality assessment of immersive systems, providing guidelines for QoE evaluation. The technologies covered by this group include: 360-degree content, virtual/augmented/mixed reality, stereoscopic 3D content, Free Viewpoint Video, multiview technologies, light field content, etc.
  4. Support and Outreach: Responsible for the support for VQEG’s activities.
    • eLetter: The goal of the VQEG eLetter is to provide up-to-date technical advances on video quality related topics. Each issue of the VQEG eLetter features a collection of papers authored by well-known researchers. These papers are contributed by invited authors or authors responding to a call for papers, and they can be: technical papers, summaries/reviews of other publications, best practice anthologies, reprints of difficult-to-obtain articles, and responses to other articles. VQEG wants the eLetter to be interactive in nature.
    • Human Factors for Visual Experiences (HFVE): The objective of this group is to uphold the liaison relation between VQEG and the IEEE standardization group P3333.1. Some examples of the activities going on within this group are the standard for the (deep learning-based) assessment based on human factors of visual experiences with virtual/augmented/mixed reality and the standards on human factors for the quality assessment of light field imaging (IEEE P3333.1.4) and on quality assessment of high dynamic range technologies.
    • Independent Lab Group (ILG): The ILG members act as independent arbitrators, whose generous contributions make the VQEG validation tests possible. Their goal is to ensure that all VQEG validation testing is unbiased and done to high quality standards.
    • Joint Effort Group (JEG): This is an activity within VQEG that promotes collaborative efforts to: validate metrics through both subjective dataset completion and metric design, extend subjective datasets in order to better identify the limitations of quality metrics, improve subjective methodologies to address new scenarios and use cases that involve QoE issues, and increase the knowledge about both subjective and objective video quality assessment.
    • Joint Qualinet-VQEG team on Immersive Media: The objectives of this joint team from Qualinet and VQEG are: to uphold the liaison relation between both bodies, to inform both QUALINET and VQEG on the activities in respective organizations (especially on the topic of immersive media), to promote collaborations on other topics (i.e., form new joint teams), and to uphold the liaison relation with ITU-T SG12, in particular on topics around interactive, augmented and virtual reality QoE.
    • Tools and Subjective Labs Setup: The objective of this project is to provide the video quality research community with a wide variety of software tools and guidance in order to facilitate research. Tools are available in the following categories: quality analysis (software to run quality analyses), encoding (video encoding tools), streaming (streaming and extracting information from video streams), subjective test software (tools for running and analyzing subjective tests), and helper tools (miscellaneous helper tools).
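Several of the groups listed above work with mean opinion scores (MOS) collected in subjective tests. As a minimal sketch of the basic computation, the snippet below derives the MOS of one test condition together with an approximate 95% confidence interval. It assumes a 5-point rating scale and a normal approximation (z = 1.96); the function name and the ratings are hypothetical and are not taken from any VQEG tool or dataset.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, lower bound, upper bound) for one test condition.

    Uses the sample standard deviation and a normal approximation for
    the confidence interval; z = 1.96 corresponds to roughly 95%.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    mos = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(n)
    return mos, mos - half_width, mos + half_width


# Hypothetical ratings from eight subjects on a 1-to-5 scale.
example_ratings = [5, 4, 4, 3, 4, 5, 3, 4]
mos, ci_low, ci_high = mos_with_ci(example_ratings)
print(f"MOS = {mos:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

With small panels, a Student's t quantile in place of 1.96 gives a slightly wider and more conservative interval.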

In addition, the Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) studies topics related to video and audiovisual quality assessment (both subjective and objective) among ITU-R Study Group 6 and ITU-T Study Group 12. VQEG colocates meetings with the IRG-AVQA to encourage a wider range of experts to contribute to Recommendations. 

For more details and previous closed projects please check: https://www.its.bldrdoc.gov/vqeg/projects-home.aspx

Major achievements

VQEG activities are documented in reports and submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), and other SDOs as appropriate. Several VQEG studies have resulted in ITU Recommendations.

| VQEG Project | Description | ITU Recommendations |
|---|---|---|
| Full Reference Television (FRTV) Phase I | Examined the performance of FR and NR models on standard definition video. The test materials used in this test plan and the subjective tests data are freely available to researchers. | ITU-T J.143 (2000), ITU-T J.144 (2001), ITU-T J.149 (2004) |
| Full Reference Television (FRTV) Phase II | Examined the performance of FR and NR models on standard definition video, using the DSCQS methodology. | ITU-T J.144 (2004), ITU-R BT.1683 (2004) |
| Multimedia (MM) Phase I | Examined the performance of FR, RR and NR models for VGA, CIF and QCIF video (no audio). | ITU-T J.148 (2003), ITU-T P.910 (2008), ITU-T J.246 (2008), ITU-T J.247 (2008), ITU-T J.340 (2010), ITU-R BT.1683 (2004) |
| Reduced Reference / No Reference Television (RRNR-TV) | Examined the performance of RR and NR models on standard definition video. | ITU-T J.244 (2008), ITU-T J.249 (2010), ITU-R BT.1885 (2011) |
| High Definition Television (HDTV) | Examined the performance of FR, RR and NR models for HDTV. Some of the video sequences used in this test are publicly available in the Consumer Digital Video Library. | ITU-T J.341 (2011), ITU-T J.342 (2011) |
| QART | Studied the subjective quality evaluation of video used for recognition tasks and task-based multimedia applications. | ITU-T P.912 (2008) |
| Hybrid Perceptual Bitstream | Examined the performance of Hybrid models for VGA/WVGA and HDTV. | ITU-T J.343 (2014), ITU-T J.343.1-6 (2014) |
| 3DTV | Investigated how to assess 3DTV subjective video quality, covering methodologies, display requirements and evaluation of visual discomfort and fatigue. | ITU-T P.914 (2016), ITU-T P.915 (2016), ITU-T P.916 (2016) |
| Audiovisual HD (AVHD) | On one side, addressed the subjective evaluation of audio-video quality metrics. On the other side, developed model standards for video quality assessment of streaming services over reliable transport for resolutions up to 4K/UHD, in collaboration with ITU-T SG12. | ITU-T P.913 (2014), ITU-T P.1204 (2020), ITU-T P.1204.3 (2020), ITU-T P.1204.4 (2020), ITU-T P.1204.5 (2020) |

The contribution to current ITU standardization efforts is still ongoing. For example, updated texts have been contributed by VQEG on statistical analysis in ITU-T Rec. P.1401, and on subjective quality assessment of 360-degree video in ITU-T P.360-VR. 

Apart from this, VQEG supports research on QoE by providing tools and datasets to the research community. For instance, the VQEG Tools and Subjective Labs Setup project provides a wide variety of software tools and guidance via GitHub to facilitate research. Another example is the VQEG Image Quality Evaluation Tool (VIQET), an objective no-reference photo quality evaluation tool. Finally, several datasets have been published, which can be found on the websites of the corresponding projects, in the Consumer Digital Video Library or in other repositories.

General articles about the work of VQEG for the interested reader, especially covering its earlier work, are [1, 2].

References

[1] Q. Huynh-Thu, A. Webster, K. Brunnström, and M. Pinson, “VQEG: Shaping Standards on Video Quality”, in 1st International Conference on Advanced Imaging, Tokyo, Japan, 2015.
[2] K. Brunnström, D. Hands, F. Speranza, and A. Webster, “VQEG Validation and ITU Standardisation of Objective Perceptual Video Quality Metrics”, IEEE Signal Processing Magazine, vol. 26, no. 3, pp. 96-101, May 2009.

JPEG Column: 87th JPEG Meeting

The 87th JPEG meeting, initially planned to be held in Erlangen, Germany, was held online from 25 to 30 April 2020 because of the Covid-19 outbreak. JPEG experts participated in a number of online meetings, attempting to make them as effective as possible while considering participation from different time zones, ranging from Australia to California, U.S.A.

JPEG decided to proceed with a Second Call for Evidence on JPEG Pleno Point Cloud Coding and continued work to prepare for contributions to the previous Call for Evidence on Learning-based Image Coding Technologies (JPEG AI).

The 87th JPEG meeting had the following highlights:

  • JPEG Pleno Point Cloud Coding issues a Call for Evidence on coding solutions supporting scalability and random access of decoded point clouds.
  • JPEG AI defines evaluation methodologies of the Call for Evidence on machine learning based image coding solutions.
  • JPEG XL defines the file format compatible with existing formats. 
  • JPEG exploration on Media Blockchain releases use cases and requirements.
  • JPEG Systems releases a first version of JPEG Snack use cases and requirements.
  • JPEG XS announces significant improvement of the quality of raw-Bayer image sensor data compression.

JPEG Pleno Point Cloud

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 87th JPEG meeting, the JPEG Committee released a Second Call for Evidence on JPEG Pleno Point Cloud Coding that focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. The Second Call for Evidence on JPEG Pleno Point Cloud Coding has a revised timeline reflecting changes in the activity due to the 2020 COVID-19 Pandemic. A Final Call for Evidence on JPEG Pleno Point Cloud Coding is planned to be released in July 2020.

JPEG AI

The main focus of JPEG AI was on the promotion and definition of the submission and evaluation methodologies of the Call for Evidence (in coordination with the IEEE MMSP 2020 Challenge) that was issued as an outcome of the 86th JPEG meeting, held in Sydney, Australia.

JPEG XL

The File Format has been defined for JPEG XL (ISO/IEC 18181-1) codestream, metadata and extensions. The file format enables compatibility with ISOBMFF, JUMBF, XMP, Exif and other existing standards. Standardization has now reached the Committee Draft stage and the DIS ballot is ongoing. A white paper about JPEG XL’s features and tools was approved at this meeting and is available on the jpeg.org website.

JPEG exploration on Media Blockchain – Call for feedback on use cases and requirements

JPEG has determined that blockchain and distributed ledger technologies (DLT) have great potential as a technology component to address many privacy and security related challenges in digital media applications. These include digital rights management, privacy and security, integrity verification, and authenticity, which impact society in several ways, including the loss of income in the creative sector due to piracy, the spread of fake news, or evidence tampering for fraud purposes.

JPEG is exploring standardization needs related to media blockchain to ensure seamless interoperability and integration of blockchain technology with widely accepted media standards. In this context, the JPEG Committee announces a call for feedback from interested stakeholders on the first public release of the use cases and requirements document.

JPEG Systems initiates standardisation of JPEG Snack

Media “snacking”, the consumption of multimedia in short bursts (less than 15 minutes), has become globally popular. JPEG recognizes the need for standardizing how snack images are constructed to ensure interoperability. A first version of the JPEG Snack use cases and requirements is now complete and publicly available on the JPEG website, inviting feedback from stakeholders.

JPEG made progress on a fundamental capability of the JPEG file structure with enhancements to JPEG Universal Metadata Box Format (JUMBF) to support embedding common file types; the DIS text for JUMBF Amendment 1 is ready for ballot. Likewise JPEG 360 Amendment 1 DIS text is ready for ballot; this amendment supports stereoscopic 360 degree images, accelerated rendering for regions-of-interest, and removes the XMP signature block from the metadata description.

JPEG XS – The JPEG committee is pleased to announce significant improvement of the quality of its upcoming Bayer compression.

Over the past year, an improvement of around 2 dB has been observed for the new coding tools currently being developed for image sensor compression within JPEG XS. This visually lossless, low-latency and lightweight compression scheme can be used as a mezzanine codec in various markets such as real-time video storage inside and outside of cameras, and data compression onboard autonomous cars. Mathematically lossless capability is also being investigated, and encapsulation within MXF or SMPTE ST2110-22 is currently being finalized.
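To give a feel for what a gain of around 2 dB means, the sketch below assumes that the improvement is expressed as PSNR, which the announcement does not state explicitly. Under that assumption, a PSNR gain at a fixed bit rate translates directly into a reduction of the mean squared error (MSE). The function names and the example MSE value are illustrative only.

```python
import math


def psnr_db(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for a given mean squared error and peak sample value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE changes for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


# Assumption: the ~2 dB improvement is a PSNR gain at the same bit rate.
ratio = mse_ratio_for_gain(2.0)

# Illustrative check with an arbitrary starting MSE.
baseline_mse = 10.0
gain = psnr_db(baseline_mse * ratio) - psnr_db(baseline_mse)

print(f"MSE is multiplied by {ratio:.3f}, i.e. reduced by {100 * (1 - ratio):.0f}%")
print(f"PSNR gain recovered from the scaled MSE: {gain:.2f} dB")
```

In other words, under the PSNR assumption a 2 dB gain corresponds to cutting the squared error by a little over a third.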

Final Quote

“JPEG is committed to the development of new standards that provide state of the art imaging solutions to the largest spectrum of stakeholders. During the 87th meeting, held online because of the Covid-19 pandemic, JPEG progressed well with its current and even launched new activities. Although some timelines had to be revisited, overall, no disruptions of the workplan have occurred.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and JPEG XL families of imaging standards.

More information about JPEG and its work is available at jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans (pr@jpeg.org) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on http://jpeg-news-list.jpeg.org.  

Future JPEG meetings are planned as follows:

  • No 88, initially planned in Geneva, Switzerland, July 4 to 10, 2020, will be held online from July 7 to 10, 2020.

MPEG Column: 130th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 130th MPEG meeting concluded on April 24, 2020, in Alpbach, Austria … well, not exactly, unfortunately. The 130th MPEG meeting concluded on April 24, 2020, but not in Alpbach, Austria.

I attended the 130th MPEG meeting remotely.

Because of the Covid-19 pandemic, the 130th MPEG meeting was converted from a physical meeting to a fully online meeting, the first in MPEG’s 30+ years of history. Approximately 600 experts attending from 19 time zones worked in tens of Zoom meeting sessions supported by an online calendar and by collaborative tools that involved MPEG experts in both online and offline sessions. For example, input contributions had to be registered and uploaded ahead of the meeting to allow for efficient scheduling of two-hour meeting slots, which were distributed from early morning to late night in order to accommodate experts working in different time zones. These input contributions were then mapped to GitLab issues for offline discussions, and the actual meeting slots were primarily used for organizing the meeting, resolving conflicts, and making decisions, including approving output documents. Although the productivity of the online meeting could not reach the level of regular face-to-face meetings, the results posted in the press release show that MPEG experts managed the challenge quite well, specifically:

  • MPEG ratifies MPEG-5 Essential Video Coding (EVC) standard;
  • MPEG issues the Final Draft International Standards for parts 1, 2, 4, and 5 of MPEG-G 2nd edition;
  • MPEG expands the coverage of ISO Base Media File Format (ISOBMFF) family of standards;
  • A new standard for large scale client-specific streaming with MPEG-DASH;

Other Important Activities at the 130th MPEG meeting: (i) the carriage of visual volumetric video-based coding data, (ii) Network-Based Media Processing (NBMP) function templates, (iii) the conversion from MPEG-21 contracts to smart contracts, (iv) deep neural network-based video coding, (v) Low Complexity Enhancement Video Coding (LCEVC) reaching DIS stage, and (vi) a new level of the MPEG-4 Audio ALS Simple Profile for high-resolution audio, among others.

The corresponding press release of the 130th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/130. This report focuses on video coding (EVC) and systems aspects (file format, DASH).

MPEG ratifies MPEG-5 Essential Video Coding Standard

At its 130th meeting, MPEG announced the completion of the new ISO/IEC 23094-1 standard, which is referred to as MPEG-5 Essential Video Coding (EVC) and has been promoted to Final Draft International Standard (FDIS) status. There is a constant demand for more efficient video coding technologies (e.g., due to the increased usage of video on the internet), but coding efficiency is not the only factor determining the industry’s choice of video coding technology for products and services. The EVC standard offers improved compression efficiency compared to existing video coding standards and is based on the statements of all contributors to the standard, who have committed to announcing their license terms for the MPEG-5 EVC standard no later than two years after the FDIS publication date.

MPEG-5 EVC defines two profiles: the “Baseline Profile” and the “Main Profile”. The “Baseline Profile” contains only technologies that are older than 20 years or otherwise freely available for use in the standard. The “Main Profile” adds a small number of additional tools, each of which can be either cleanly disabled or switched to the corresponding baseline tool on an individual basis.

It will be interesting to see how the EVC profiles (baseline and main) will find their way into products and services given the number of codecs already in use (e.g., AVC, HEVC, VP9, AV1) and those still under development but close to ratification (e.g., VVC, LCEVC). That is, in total, we may end up with about seven video coding formats that probably need to be considered for future video products and services. In other words, the multi-codec scenario I envisioned some time ago is becoming reality, raising some interesting challenges to be addressed in the future.

Research aspects: as for all video coding standards, the most important research aspect is certainly coding efficiency. For EVC, it might also be interesting to research the usability of the built-in tool switching mechanism within a practical setup. Furthermore, regarding the multi-codec issue, the ratification of EVC adds another facet to the already existing video coding standards in use and/or under development.

MPEG expands the Coverage of ISO Base Media File Format (ISOBMFF) Family of Standards

At the 130th WG11 (MPEG) meeting, the ISOBMFF family of standards has been significantly amended with new tools and functionalities. The standards in question are as follows:

  • ISO/IEC 14496-12: ISO Base Media File Format;
  • ISO/IEC 14496-15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format;
  • ISO/IEC 23008-12: Image File Format; and
  • ISO/IEC 23001-16: Derived visual tracks in the ISO base media file format.

In particular, three new amendments to the ISOBMFF family have reached their final milestone, i.e., Final Draft Amendment (FDAM):

  1. Amendment 4 to ISO/IEC 14496-12 (ISO Base Media File Format) allows the use of a more compact version of metadata for movie fragments;
  2. Amendment 1 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) adds support for HEVC slice segment data tracks and additional extractor types for HEVC such as track reference and track groups; and
  3. Amendment 2 to ISO/IEC 23008-12 (Image File Format) adds support for more advanced features related to the storage of short image sequences such as burst and bracketing shots.

At the same time, new amendments have reached their first milestone, i.e., Committee Draft Amendment (CDAM):

  1. Amendment 2 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) extends its scope to newly developed video coding standards such as Essential Video Coding (EVC) and Versatile Video Coding (VVC); and
  2. the first edition of ISO/IEC 23001-16 (Derived visual tracks in the ISO base media file format) allows a new type of visual track whose content can be dynamically generated at the time of presentation by applying some operations to the content in other tracks, such as crossfading over two tracks.

Both are expected to reach their final milestone in mid-2021.
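The derived visual track idea described in item 2 above can be illustrated with a toy sketch. This is not the ISO/IEC 23001-16 syntax or any real file format API; the class names `StoredTrack` and `DerivedTrack`, the `make_crossfade` helper, and the representation of a frame as a list of numbers are all assumptions made for illustration. The point is only that a derived track stores no samples of its own: each frame is computed at presentation time from the frames of its input tracks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for a decoded picture


@dataclass
class StoredTrack:
    """Hypothetical track whose frames are physically stored."""
    frames: Sequence[Frame]

    def frame_at(self, index: int) -> Frame:
        return list(self.frames[index])


@dataclass
class DerivedTrack:
    """Hypothetical track that stores an operation instead of frames."""
    inputs: Sequence[StoredTrack]
    operation: Callable[[Sequence[Frame], int], Frame]

    def frame_at(self, index: int) -> Frame:
        # Frames are generated on demand from the input tracks.
        input_frames = [track.frame_at(index) for track in self.inputs]
        return self.operation(input_frames, index)


def make_crossfade(num_frames: int) -> Callable[[Sequence[Frame], int], Frame]:
    """Build a linear crossfade from the first input to the second."""
    def crossfade(frames: Sequence[Frame], index: int) -> Frame:
        first, second = frames
        weight = index / (num_frames - 1) if num_frames > 1 else 1.0
        return [(1 - weight) * a + weight * b for a, b in zip(first, second)]
    return crossfade


track_a = StoredTrack([[0.0, 0.0]] * 5)
track_b = StoredTrack([[100.0, 100.0]] * 5)
derived = DerivedTrack([track_a, track_b], make_crossfade(5))
for i in range(5):
    print(i, derived.frame_at(i))
```

In this sketch, changing the transition only requires replacing the stored operation, while the two input tracks remain untouched.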

Finally, the final text for the ISO/IEC 14496-12 6th edition Final Draft International Standard (FDIS) is now ready for ballot after converting MP4RA to the Maintenance Agency. WG11 (MPEG) notes that Apple Inc. has been appointed as the Maintenance Agency, and MPEG appreciates its valuable efforts over the many years in which it has already acted as the official registration authority for the ISOBMFF family of standards, i.e., MP4RA (https://mp4ra.org/). The 6th edition of ISO/IEC 14496-12 is expected to be published by ISO by the end of this year.

Research aspects: the ISOBMFF family of standards basically offers certain tools and functionalities to satisfy the given use case requirements. The task of the multimedia systems research community could be to scientifically validate these tools and functionalities with respect to the use cases and maybe even beyond, e.g., try to adopt these tools and functionalities for novel applications and services.

A New Standard for Large Scale Client-specific Streaming with DASH

Historically, in ISO/IEC 23009 (Dynamic Adaptive Streaming over HTTP; DASH), every client has used the same Media Presentation Description (MPD), as this best serves the scalability of the service (e.g., for cache efficiency in content delivery networks). However, there have been increasing requests from the industry to enable customized manifests for more personalized services. Consequently, MPEG has studied a solution to this problem without sacrificing scalability, and it has reached the first milestone of its standardization at the 130th MPEG meeting.

ISO/IEC 23009-8 adds a mechanism to the Media Presentation Description (MPD) to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests. This standard is expected to reach its final milestone in mid-2021.
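A minimal sketch of this idea follows, assuming the per-session variables are simply appended to the segment URL as query parameters. It does not reproduce the actual SBD document format or processing rules of ISO/IEC 23009-8; the function name `derive_request_url`, the variable names, and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def derive_request_url(segment_url: str, sbd_variables: dict) -> str:
    """Combine a segment URL from the shared MPD with per-session values.

    Assumption: the variables are added as query parameters; existing
    query parameters on the segment URL are preserved.
    """
    parts = urlsplit(segment_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(sorted(sbd_variables.items()))  # sorted for stable output
    return urlunsplit(parts._replace(query=urlencode(query)))


# The same (cacheable) segment URL from the MPD is used by every client ...
mpd_segment_url = "https://cdn.example.com/video/seg-1.m4s"

# ... while each session has its own hypothetical set of SBD variables.
session_one = {"sessionId": "abc123"}
session_two = {"sessionId": "xyz789"}

print(derive_request_url(mpd_segment_url, session_one))
print(derive_request_url(mpd_segment_url, session_two))
```

Both sessions share one MPD, which keeps it cacheable, and only the small per-session variable set differs between clients.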

Research aspects: SBD’s goal is to enable personalization while maintaining scalability, which calls for a tradeoff, i.e., which kind of information to put into the MPD and what should be conveyed within the SBD. This tradeoff per se could already be considered a research question that will hopefully be addressed in the near future.

An overview of the current status of MPEG-DASH can be found in the figure below.

The next MPEG meeting will take place from June 29th to July 3rd and will again be an online meeting. I am looking forward to a productive AhG period and an online meeting later this year. I am sure that MPEG will further improve its online meeting capabilities and can certainly become a role model for other groups within ISO/IEC and probably also beyond.

MPEG Column: 129th MPEG Meeting in Brussels, Belgium

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 129th MPEG meeting concluded on January 17, 2020 in Brussels, Belgium with the following topics:

  • Coded representation of immersive media – WG11 promotes Network-Based Media Processing (NBMP) to the final stage
  • Coded representation of immersive media – Publication of the Technical Report on Architectures for Immersive Media
  • Genomic information representation – WG11 receives answers to the joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • Open font format – WG11 promotes Amendment of Open Font Format to the final stage
  • High efficiency coding and media delivery in heterogeneous environments – WG11 progresses Baseline Profile for MPEG-H 3D Audio
  • Multimedia content description interface – Conformance and Reference Software for Compact Descriptors for Video Analysis promoted to the final stage

Additional Important Activities at the 129th WG 11 (MPEG) meeting

The 129th WG 11 (MPEG) meeting was attended by more than 500 experts from 25 countries working on important activities including (i) a scene description for MPEG media, (ii) the integration of Video-based Point Cloud Compression (V-PCC) and Immersive Video (MIV), (iii) Video Coding for Machines (VCM), and (iv) a draft call for proposals for MPEG-I Audio among others.

The corresponding press release of the 129th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/129. This report focuses on network-based media processing (NBMP), architectures of immersive media, compact descriptors for video analysis (CDVA), and an update about adaptive streaming formats (i.e., DASH and CMAF).

MPEG picture at Friday plenary; © Rob Koenen (Tiledmedia).

Coded representation of immersive media – WG11 promotes Network-Based Media Processing (NBMP) to the final stage

At its 129th meeting, MPEG promoted ISO/IEC 23090-8, Network-Based Media Processing (NBMP), to Final Draft International Standard (FDIS). The FDIS stage is the final vote before a document is officially adopted as an International Standard (IS). During the FDIS vote, publications and national bodies are only allowed to place a Yes/No vote and are no longer able to make any technical changes. However, project editors are able to fix typos and make other necessary editorial improvements.

What is NBMP? The NBMP standard defines a framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud by using libraries of pre-built 3rd party functions. The framework includes an abstraction layer to be deployed on top of existing commercial cloud platforms and is designed to integrate with 5G core and edge computing. The NBMP workflow manager is another essential part of the framework, enabling the composition of multiple media processing tasks to process incoming media and metadata from a media source and to produce processed media streams and metadata that are ready for distribution to media sinks.
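The composition of tasks into a workflow can be pictured with a small toy sketch. It does not use the NBMP APIs or description formats defined in ISO/IEC 23090-8; the `WorkflowManager` class, the `FUNCTION_REPOSITORY` dictionary, the function names, and the dictionary-based media representation are all hypothetical and serve only to illustrate chaining pre-built functions between a media source and a media sink.

```python
from typing import Any, Callable, Dict, List

MediaUnit = Dict[str, Any]  # toy stand-in for media plus metadata
Task = Callable[[MediaUnit], MediaUnit]

# Hypothetical library of pre-built processing functions.
FUNCTION_REPOSITORY: Dict[str, Task] = {
    "downscale": lambda media: {**media, "height": media["height"] // 2},
    "transcode": lambda media: {**media, "codec": "hevc"},
    "package": lambda media: {**media, "container": "cmaf"},
}


class WorkflowManager:
    """Hypothetical manager that chains repository functions into a workflow."""

    def __init__(self, repository: Dict[str, Task]) -> None:
        self.repository = repository

    def compose(self, task_names: List[str]) -> Task:
        # Look up each task up front; an unknown name raises KeyError.
        tasks = [self.repository[name] for name in task_names]

        def workflow(media: MediaUnit) -> MediaUnit:
            for task in tasks:
                media = task(media)  # output of one task feeds the next
            return media

        return workflow


manager = WorkflowManager(FUNCTION_REPOSITORY)
workflow = manager.compose(["downscale", "transcode", "package"])

source_media = {"codec": "avc", "height": 2160}
print(workflow(source_media))
```

The provider only names the tasks and their order; how and where each function runs is left to the manager, which is the separation the framework description above points at.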

Why NBMP? With the increasing complexity and sophistication of media services and the incurred media processing, offloading complex media processing operations to the cloud/network is becoming critically important in order to keep receiver hardware simple and power consumption low.

Research aspects: NBMP reminds me a bit of what was done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG now targets APIs rather than pure metadata formats, which is a step in the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

Coded representation of immersive media – Publication of the Technical Report on Architectures for Immersive Media

At its 129th meeting, WG11 (MPEG) published an updated version of its technical report on architectures for immersive media. This technical report, which is the first part of the ISO/IEC 23090 (MPEG-I) suite of standards, introduces the different phases of MPEG-I standardization and gives an overview of the parts of the MPEG-I suite. It also documents use cases and defines architectural views on the compression and coded representation of elements of immersive experiences. Furthermore, it describes the coded representation of immersive media and the delivery of a full, individualized immersive media experience. MPEG-I enables scalable and efficient individual delivery as well as mass distribution while adjusting to the rendering capabilities of consumption devices. Finally, this technical report breaks down the elements that contribute to a fully immersive media experience and assigns quality requirements as well as quality and design objectives for those elements.

Research aspects: This technical report provides a kind of reference architecture for immersive media, which may help identify research areas and research questions to be addressed in this context.

Multimedia content description interface – Conformance and Reference Software for Compact Descriptors for Video Analysis promoted to the final stage

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors that can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. ISO/IEC 15938-15:2019 – the MPEG Compact Descriptors for Video Analysis (CDVA) standard – defines such descriptors. CDVA includes highly efficient descriptor components using features resulting from a Deep Neural Network (DNN) and uses predictive coding over video segments. The standard is being adopted by the industry. At its 129th meeting, WG11 (MPEG) has finalized the conformance guidelines and reference software. The software provides the functionality to extract, match, and index CDVA descriptors. For easy deployment, the reference software is also provided as Docker containers.

Research aspects: The availability of reference software helps to conduct reproducible research (i.e., reference software is typically publicly available for free) and the Docker container even further contributes to this aspect.

DASH and CMAF

The 4th edition of DASH has already been published and is available as ISO/IEC 23009-1:2019. As with previous editions, MPEG’s aim was to make the newest edition of DASH publicly available for free to foster industry-wide adoption and adaptation. During the most recent MPEG meeting, we worked towards implementing the first amendment, which will include additional (i) CMAF support and (ii) event processing models with minor updates; these amendments are currently in draft and will be finalized at the 130th MPEG meeting in Alpbach, Austria. An overview of all DASH standards and updates is depicted in the figure below:

ISO/IEC 23009-8 or “session-based DASH operations” is the newest variation of MPEG-DASH. The goal of this part of DASH is to allow customization during certain times of a DASH session while maintaining the underlying media presentation description (MPD) for all other sessions. Thus, MPDs should be cacheable within content distribution networks (CDNs) while additional information should be customizable on a per session basis within a newly added session-based description (SBD). It is understood that the SBD should have an efficient representation to avoid file size issues and it should not duplicate information typically found in the MPD.

The 2nd edition of the CMAF standard (ISO/IEC 23000-19) will be available soon (currently under FDIS ballot), and MPEG is currently reviewing additional tools in the so-called ‘technologies under consideration’ document. Therefore, amendments were drafted for additional HEVC media profiles and exploration activities on the storage and archiving of CMAF contents.

The next meeting will bring MPEG back to Austria (for the 4th time) and will be hosted in Alpbach, Tyrol.