JPEG Committee launches a Call for Proposals on Learning-based Point Cloud Coding
The 93rd JPEG meeting was held online from 18 to 22 October 2021. The JPEG Committee continued its work on the development of new standardised solutions for the representation of visual information. Notably, the JPEG Committee has decided to release a new call for proposals on point cloud coding based on machine learning technologies that targets both compression efficiency and effective performance for 3D processing as well as machine and computer vision tasks. This activity will be conducted in parallel with JPEG AI standardization. Furthermore, it was also decided to pursue the development of a new standard in the context of the JPEG Fake Media exploration.
Considering the response to the Call for Proposals on JPEG Pleno Holography, a first standard for compression of digital holograms has entered its collaborative phase. The response to the call for proposals identified a reliable coding solution for this type of visual information that overcomes the limitations of the state of the art coding solutions for holographic data compression.
The 93rd JPEG meeting had the following highlights:
JPEG Pleno Point Cloud Coding draft of the Call for Proposals;
JPEG Pleno Holography;
JPEG AI drafts of the Call for Proposals and Common Training and Test Conditions;
JPEG Fake Media defines the standardisation timeline;
JPEG NFT collects use cases;
JPEG AIC explores standardisation of near-visually lossless quality models;
JPEG XS new profiles and sub-levels;
JPEG XL explores fixed point implementations;
JPEG DNA considers image quaternary representations suitable for DNA storage.
The following provides an overview of the major achievements of the 93rd JPEG meeting.
JPEG Pleno Point Cloud Coding
JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 93rd JPEG meeting, the JPEG Committee released a Draft Call for Proposals on JPEG Pleno Point Cloud Coding. This call addresses learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. A Final Call for Proposals on JPEG Pleno Point Cloud Coding is planned to be released in January 2022.
JPEG Pleno Holography
At its 93rd JPEG meeting, the committee reviewed the response to the Call for Proposals on JPEG Pleno Holography, which is the first standardization effort aspiring to a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing and display and their associated hologram types. The coding technology selected provides excellent rate-distortion performance for lossy coding, in addition to supporting lossless coding and random access via a space-frequency segmentation approach. The selected technology will serve as a baseline for the standard specification to be developed. This final specification is planned to be published as an international standard in early 2024.
JPEG AI scope is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks.
During the 93rd JPEG meeting, the JPEG AI project activities were focused on the analysis of the results of the exploration studies as well as refinements and improvements on common training and test conditions, especially the performance assessment of the image classification and super-resolution tasks. A related topic that received much attention was device interoperability, which was thoroughly analyzed and discussed. Also, the JPEG AI Third Draft Call for Proposals is now available with improvements on evaluation conditions and proposal composition and requirements. A final Call for Proposals is expected to be issued at the 94th meeting (17-21 January 2022), with a first Working Draft expected by October 2022.
JPEG Fake Media
The scope of the JPEG Fake Media exploration is to assess standardization needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 93rd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. The new version includes an extended set of definitions and a new section related to threat vectors. In addition, the requirements have been substantially enhanced, in particular those related to media asset authenticity and integrity. Given the progress of the exploration, an initial timeline for the standardization process was proposed:
Non-Fungible Tokens (NFTs) have recently attracted substantial interest. Numerous digital assets associated with NFTs are encoded in existing JPEG formats or can be represented in JPEG-developed current and future representations. Additionally, several trust and security concerns have been raised about NFTs and the underlying digital assets. The JPEG Committee has established the JPEG NFT exploration initiative to better understand user requirements for media formats. JPEG NFT’s mission is to provide effective specifications that enable various applications that rely on NFTs applied to media assets. The standard shall be secure, trustworthy, and environmentally friendly, enabling an interoperable ecosystem based on NFTs within or across applications. The group seeks to engage stakeholders from various backgrounds, including technical, legal, creative, and end-user communities, to develop use cases and requirements. On October 12th, 2021, a second JPEG NFT Workshop was organized in this context. The presentations and video footage from the workshop are now available on the JPEG website. In January 2022, a third workshop will focus on commonalities with the JPEG Fake Media exploration. JPEG encourages interested parties to visit its website frequently for the most up-to-date information and to subscribe to the JPEG NFT Ad Hoc Group’s (AhG) mailing list to participate in this effort.
During the 93rd JPEG Meeting, work was initiated on the first draft of a document on use cases and requirements regarding Assessment of Image Coding. The scope of AIC activities was defined to target standards or best practices with respect to subjective and objective image quality assessment methodologies that target a range from high quality to near-visually lossless quality. This is a range of visual qualities where artefacts are not noticeable by an average non-expert viewer without presenting an original reference image but are detectable by a flicker test.
The JPEG Committee created an updated document “Use Cases and Requirements for JPEG XS V3.0”. It describes new use cases and refines the requirements to allow improving the coding efficiency and to provide additional functionality w.r.t. HDR content, random access and more. In addition, the JPEG XS second editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) went to the final ballot before ISO publication stage. In the meantime, the Committee continued working on the second editions of Part 4 (Conformance Testing) and Part 5 (Reference Software), which are now ready as Draft International Standards. In addition, the decision was made to create an amendment to Part 2 that will add a High420.12 profile and a new sublevel at 4 bpp, to swiftly address market demands.
JPEG XL Part 3 (Conformance testing) has proceeded to the DIS stage. Core experiments were discussed to investigate hardware coding, in particular fixed-point implementations, and will be continued. Work on a second edition of Part 1 (Core coding system) was initiated. With preliminary support in major web browsers, image viewing and editing software, JPEG XL is ready for wide-scale adoption.
The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. An important step forward in this activity is the implementation of experimentation software to simulate the coding/decoding of images in quaternary code. A thorough explanation of the package has been created, and a wiki for documentation and a link to the code can be found here. A successful fifth workshop on JPEG DNA was held prior to the 93rd JPEG meeting, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA experimentation software to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, as well as improving the JPEG DNA overview document. Interested parties are invited to consider joining the effort by registering to the mailing list of JPEG DNA.
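To make the idea of a quaternary representation concrete, the following is a minimal sketch in Python that maps bytes to the four nucleotide symbols A, C, G and T, two bits per symbol. It is not the JPEG DNA experimentation software: the mapping table and the function names `bytes_to_quaternary` and `quaternary_to_bytes` are assumptions made purely for illustration. Practical DNA storage codes typically add constraints, such as limiting runs of identical nucleotides, that this sketch ignores.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping (assumed, not JPEG DNA).
NUCLEOTIDES = "ACGT"  # hypothetical assignment: 00->A, 01->C, 10->G, 11->T


def bytes_to_quaternary(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def quaternary_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_quaternary; the strand length must be a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            # str.index raises ValueError for symbols outside A, C, G, T.
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)
```

In this toy mapping every byte of a compressed image becomes exactly four nucleotides, so a round trip through `bytes_to_quaternary` and `quaternary_to_bytes` returns the original bytes.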
“Aware of the importance of timely standards in AI-powered imaging applications, the JPEG Committee is moving forward with two concurrent calls for proposals addressing both image and point cloud coding based on machine learning”, said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Upcoming JPEG meetings are planned as follows:
No 94, to be held online during 17-21 January 2022.
The 92nd JPEG meeting was held online from 7 to 13 July 2021. This meeting has consolidated JPEG’s exploration on standardisation needs related to Non-Fungible Tokens (NFTs). Recently, there has been a growing interest in the use of NFTs in many applications, notably in the trade of digital art and collectables.
Other notable results of the 92nd JPEG meeting have been the release of an update to the Call for Proposals on JPEG Pleno Holography and an initiative to revisit opportunities for standardisation of image quality assessment methodologies and metrics.
The 92nd JPEG meeting had the following highlights:
JPEG NFT exploration;
JPEG Fake Media defines context, use cases and requirements;
JPEG Pleno Holography call for proposals;
JPEG AI prepares Call for Proposals;
JPEG AIC explores new quality models;
The following provides an overview of the major achievements of the 92nd JPEG meeting.
JPEG NFT exploration
Recently, Non-Fungible Tokens (NFTs) have garnered considerable interest. Numerous digital assets linked with NFTs are either encoded in existing JPEG formats or can be represented in JPEG-developed current and forthcoming representations. Additionally, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee has launched the JPEG NFT exploration initiative. The mission of JPEG NFT is to provide effective specifications that enable various applications that rely on NFTs applied to media assets. A JPEG NFT standard shall be secure, trustworthy, and eco-friendly, enabling an interoperable ecosystem based on NFTs within or across applications. The committee strives to engage stakeholders from diverse backgrounds, including the technical, legal, artistic, and end-user communities, to establish use cases and requirements. In this context, the first JPEG NFT Workshop was held on July 1st, 2021. The workshop’s presentations and video footage are now accessible on the JPEG website, and a second workshop will be held in the near future. JPEG encourages interested parties to frequently visit its website for the most up-to-date information and to subscribe to the mailing list of the JPEG NFT Ad Hoc Group (AhG) in order to participate in this effort.
JPEG Fake Media
The scope of the JPEG Fake Media exploration is to assess standardisation needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 92nd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. This new version includes an improved and extended set of requirements covering three main categories: media creation and modification descriptions, metadata embedding & referencing and authenticity verification. In addition, the document contains several improvements including an extended set of definitions covering key terminologies. The JPEG Committee welcomes feedback to the document and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in the discussion.
Currently, a Call for Proposals is open for JPEG Pleno Holography, which is the first standardisation effort aspiring to provide a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing, and display, and their associated hologram types. Key desired functionalities include support for both lossy and lossless coding, scalability, random access, and integration within the JPEG Pleno system architecture, with the goal of supporting a royalty-free baseline. In support of this Call for Proposals, a Common Test Conditions document and accompanying software have been released, enabling elaborate stress testing from the rate-distortion, functionality and visual rendering quality perspectives. For the latter, numerical reconstruction software has been released enabling viewport rendering from holographic data. References to software and documentation can be found on the JPEG website.
JPEG Pleno Point Cloud continues to progress towards a Call for Proposals on learning-based point cloud coding solutions with the release at the 92nd JPEG meeting of an updated Use Cases and Requirements document. This document details how the JPEG Committee envisions learning-based point cloud coding solutions meeting the requirements of rapidly emerging use cases in this field. This document continues the focus on solutions supporting scalability and random access while detailing new requirements for 3D processing and computer vision tasks performed in the compressed domain to support emerging applications such as autonomous driving and robotics.
JPEG AI scope is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualisation with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks. At the 92nd JPEG meeting, several activities were carried out towards the launch of the final JPEG AI Call for Proposals. This has included improvements of the training and test conditions for learning-based image coding, especially in the areas of the JPEG AI training dataset, target bitrates, computation of quality metrics, subjective quality evaluation, and complexity assessment. A software package called the JPEG AI objective quality assessment framework, with a reference implementation of all objective quality metrics, has been made available. Moreover, the results of the JPEG AI exploration experiments for image processing and computer vision tasks defined at the previous 91st JPEG meeting were presented and discussed, including their impact on Common Test Conditions.
Moreover, the JPEG AI Use Cases and Requirements were refined with two new core requirements regarding reconstruction reproducibility and hardware platform independence. The second draft of the Call for Proposals was produced and the timeline of the JPEG AI work item was revised. It was decided that the final Call for Proposals will be issued as an outcome of the 94th JPEG Meeting. The deadline for expression of interest and registration is 5 February 2022, and the submission of bitstreams and decoded images for the test dataset is due on 30 April 2022.
Image quality assessment remains an essential component in the development of image coding technologies. A new activity has been initiated in the JPEG AIC framework to study the assessment of image coding quality, with particular attention to crowd-sourced subjective evaluation methodologies and image coding at fidelity targets relevant for end-user image delivery on the web and consumer-grade photo archival.
JUMBF (ISO/IEC 19566-5 AMD1) and JPEG 360 (ISO/IEC 19566-6 AMD1) are now published standards available through ISO. A request to create the second amendment of JUMBF (ISO/IEC 19566-5) has been produced; this amendment will further extend the functionality to cover use cases and requirements under development in the JPEG Fake Media exploration initiative. The Systems software efforts are progressing on the development of a file parser for most JPEG standards and will include support for metadata within JUMBF boxes. Interested parties are invited to subscribe to the mailing list of the JPEG Systems AhG in order to monitor and contribute to JPEG Systems activities.
JPEG XS aims at the standardization of a visually lossless low-latency and lightweight compression that can be used as a mezzanine codec in various markets. With the second editions of Part 1 (core coding system), Part 2 (profiles and buffer models), and Part 3 (transport and container formats) under ballot to become International Standards, the work during this JPEG meeting went into the second edition of Part 4 (Conformance Testing) and Part 5 (Reference Software). The second edition primarily brings new coding and signalling capabilities to support raw Bayer sensor content, mathematically lossless coding of images with up to 12 bits per colour component sample, and 4:2:0-sampled image content. In addition, the JPEG Committee continued its initial exploration to study potential future improvements to JPEG XS, while still honouring its low-complexity and low-latency requirements. Among such improvements are better support for high dynamic range (HDR), better support for raw Bayer sensor content, and overall improved compression efficiency. The compression efficiency work also targets improved handling of computer-screen content and artificially-generated rendered content.
JPEG XL aims at standardization for image coding that offers high compression efficiency, along with features desirable for web distribution and efficient compression of high-quality images. JPEG XL Part 3 (Conformance testing) has been promoted to the Committee Draft stage of the ISO/IEC approval process. New core experiments were defined to investigate hardware-based coding, in particular including fixed-point implementations. With preliminary support in major web browsers, image viewing and manipulation libraries and tools, JPEG XL is ready for wide-scale adoption.
The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. Two new use cases were identified, as well as the sequencing noise models and simulators to use for DNA digital storage. The fourth workshop featured successful presentations by stakeholders, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by organising the fifth workshop and conducting further outreach to stakeholders, as well as to continue improving the JPEG DNA overview document. Moreover, it was also decided to produce software to simulate an end-to-end image storage pipeline using DNA storage for future exploration experiments. Interested parties are invited to consider joining the effort by registering to the mailing list of JPEG DNA.
“The JPEG Committee is considering standardisation needs for timely and effective specifications that can best support the use of NFTs in applications where media assets can be represented with JPEG formats.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Upcoming JPEG meetings are planned as follows:
No 93, to be held online during 18-22 October 2021.
No 94, to be held online during 17-21 January 2022.
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 135th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:
MPEG Video Coding promotes MPEG Immersive Video (MIV) to the FDIS stage
Verification tests for more application cases of Versatile Video Coding (VVC)
MPEG Systems reaches first milestone for Video Decoding Interface for Immersive Media
MPEG Systems further enhances the extensibility and flexibility of Network-based Media Processing
MPEG Systems completes support of Versatile Video Coding and Essential Video Coding in High Efficiency Image File Format
Two MPEG White Papers:
Versatile Video Coding (VVC)
MPEG-G and its application of regulation and privacy
In this column, I’d like to focus on MIV and VVC including systems-related aspects as well as a brief update about DASH (as usual).
MPEG Immersive Video (MIV)
At the 135th MPEG meeting, MPEG Video Coding has promoted the MPEG Immersive Video (MIV) standard to the Final Draft International Standard (FDIS) stage. MIV was developed to support compression of immersive video content in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables storage and distribution of immersive video content over existing and future networks for playback with 6 Degrees of Freedom (6DoF) of view position and orientation.
From a technical point of view, MIV is a flexible standard for multiview video with depth (MVD) that leverages the strong hardware support for commonly used video codecs to code volumetric video. The actual views may choose from three projection formats: (i) equirectangular, (ii) perspective, or (iii) orthographic. By packing and pruning views, MIV can achieve bit rates around 25 Mb/s and a pixel rate equivalent to HEVC Level 5.2.
The MIV standard is designed as a set of extensions and profile restrictions for the Visual Volumetric Video-based Coding (V3C) standard (ISO/IEC 23090-5). The main body of this standard is shared between MIV and the Video-based Point Cloud Coding (V-PCC) standard (ISO/IEC 23090-5 Annex H). It may potentially be used by other MPEG-I volumetric codecs under development. The carriage of MIV is specified through the Carriage of V3C Data standard (ISO/IEC 23090-10).
At the same time, MPEG Systems has begun developing the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) for video decoders’ input and output interfaces to provide more flexible use of the video decoder resources for such applications. At the 135th MPEG meeting, MPEG Systems reached the first formal milestone of developing the ISO/IEC 23090-13 standard by promoting the text to Committee Draft ballot status. The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures in such a way that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions that are actually to be presented to the users rather than considering only the number of video elementary streams in use.
Research aspects: It seems that visual compression and systems standards enabling immersive media applications and services are becoming mature. However, the Quality of Experience (QoE) of such applications and services is still in its infancy. The QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) provides a survey of definitions of immersion and presence, which leads to a definition of Immersive Media Experience (IMEx). Consequently, the next step is working towards QoE metrics in this domain, which requires subjective quality assessments that pose various challenges during the current COVID-19 pandemic.
Versatile Video Coding (VVC) updates
The third round of verification testing for Versatile Video Coding (VVC) has been completed. This includes the testing of High Dynamic Range (HDR) content of 4K ultra-high-definition (UHD) resolution using the Hybrid Log-Gamma (HLG) and Perceptual Quantization (PQ) video formats. The test was conducted using state-of-the-art high-quality consumer displays, emulating an internet streaming-type scenario.
On average, VVC showed approximately 50% bit rate reduction compared to High Efficiency Video Coding (HEVC).
Additionally, the ISO/IEC 23008-12 Image File Format has been amended to support images coded using Versatile Video Coding (VVC) and Essential Video Coding (EVC).
Research aspects: The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards, including their evaluation. For example, the trade-off between compression efficiency and encoding runtime (time complexity) for live and video on-demand scenarios is always an interesting research aspect.
The latest MPEG-DASH Update
Finally, I’d like to provide a brief update on MPEG-DASH! At the 135th MPEG meeting, MPEG Systems issued a draft amendment to the core MPEG-DASH specification (i.e., ISO/IEC 23009-1) that provides further improvements to Preroll, which is renamed to Preperiod and will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Additionally, this amendment includes some minor improvements for nonlinear playback. The so-called Technologies under Consideration (TuC) document comprises new proposals that have not yet reached consensus for promotion to any official standards documents (e.g., amendments to existing DASH standards or new parts). Currently, proposals for minimizing initial delay are being discussed, among others. Finally, libdash has been updated to support the MPEG-DASH schema according to the 5th edition.
An updated overview of DASH standards/features can be found in the Figure below.
Research aspects: The informative aspects of MPEG-DASH such as the adaptive bitrate (ABR) algorithms have been subject to research for many years. New editions of the standard mostly introduced incremental improvements but disruptive ideas rarely reached the surface. Perhaps it’s time to take a step back and re-think how streaming should work for today’s and future media applications and services.
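As a reminder of what such an informative, client-side ABR algorithm can look like, here is a minimal throughput-based sketch in Python. It is not taken from the MPEG-DASH specification or from any particular player: the function name `select_bitrate`, the harmonic-mean estimator and the 0.8 safety factor are all assumptions chosen for illustration.

```python
def select_bitrate(ladder_kbps, throughput_samples_kbps, safety=0.8):
    """Pick the highest representation whose bitrate fits the estimated budget.

    ladder_kbps: available representation bitrates (hypothetical values).
    throughput_samples_kbps: recent per-segment download throughputs.
    safety: assumed fraction of the estimate the client is willing to use.
    """
    if not ladder_kbps or not throughput_samples_kbps:
        raise ValueError("need at least one representation and one sample")
    # The harmonic mean damps the influence of short throughput spikes.
    harmonic_mean = len(throughput_samples_kbps) / sum(
        1.0 / sample for sample in throughput_samples_kbps
    )
    budget = safety * harmonic_mean
    candidates = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    # Fall back to the lowest representation when nothing fits the budget.
    return candidates[-1] if candidates else min(ladder_kbps)
```

With a ladder of 500, 1000, 2500 and 5000 kbit/s and a steady 4000 kbit/s throughput, this rule selects 2500 kbit/s. Buffer-based and hybrid schemes make the decision differently, which is where much of the research mentioned above has taken place.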
The 136th MPEG meeting will again be an online meeting in October 2021, but MPEG is aiming to meet in person again in January 2022 (if possible). Click here for more information about MPEG meetings and their developments.
The perceived visual quality is of utmost importance in the context of visual media compression, such as 2D, 3D, immersive video, and point clouds. The trade-off between compression efficiency and computational/implementation complexity has a crucial impact on the success of a compression scheme. This specifically holds for the development of visual media compression standards, which typically aim at maximum compression efficiency using state-of-the-art coding technology. In MPEG, the subjective and objective assessment of visual quality has always been an integral part of the standards development process. Due to the significant effort required for formal subjective evaluations, the standardization process typically relies on such formal tests in the starting phase and for verification, while objective metrics are used in the development phase. In the new MPEG structure, established in 2020, a dedicated advisory group has been installed for the purpose of providing, maintaining, and developing visual quality assessment methods suitable for use in the standardization process.
This column lays out the scope and tasks of this advisory group and reports on its first achievements and developments. After a brief overview of the organizational structure, current projects and initial results are presented.
MPEG: A Group of Groups in ISO/IEC JTC 1/SC 29
The Moving Picture Experts Group (MPEG) is a standardization group that develops standards for coded representation of digital audio, video, 3D Graphics and genomic data. Since its establishment in 1988, the group has produced standards that enable the industry to offer interoperable devices for an enhanced digital media experience. In its new structure as defined in 2020, MPEG is established as a set of Working Groups (WGs) and Advisory Groups (AGs) in Sub-Committee (SC) 29 “Coding of audio, picture, multimedia and hypermedia information” of the Joint Technical Committee (JTC) 1 of ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). The lists of WGs and AGs in SC 29 are shown in Figure 1. Besides MPEG, SC 29 also includes JPEG (the Joint Photographic Experts Group, WG 1) as well as an Advisory Group for Chair Support Team and Management (AG 1) and an Advisory Group for JPEG and MPEG Collaboration (AG 4), thereby covering the wide field of media compression and transmission. Within this structure, the focus of AG 5 MPEG Visual Quality Assessment (MPEG VQA) is on interaction and collaboration with the working groups directly working on MPEG visual media compression, including WG 4 (Video Coding), WG 5 (JVET), and WG 7 (3D Graphics).
Setting the Field for MPEG VQA: The Terms of Reference
SC 29 has defined Terms of Reference (ToR) for all its WGs and AGs. The scope of AG5 MPEG Visual Quality Assessment is to support needs for quality assessment testing in close coordination with the relevant MPEG Working Groups, dealing with visual quality, with the following activities:
to assess the visual quality of new technologies to be considered to begin a new standardization project;
to contribute to the definition of Calls for Proposals (CfPs) for new standardization work items;
to select and design subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies, e.g., in the context of a Call for Evidence (CfE) and CfP;
to contribute to the selection of test material and coding conditions for a CfP;
to define the procedures useful to assess the visual quality of the submissions to a CfP;
to design and conduct visual quality tests, process, and analyze the raw data, and make the report of the evaluation results available conclusively;
to support in the assessment of the final status of a standard, verifying its performance compared to the existing standard(s);
to maintain databases of test material;
to recommend guidelines for selection of testing laboratories (verifying their current capabilities);
to liaise with ITU and other relevant organizations on the creation of new Quality Assessment standards or the improvement of the existing ones.
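The step of processing and analyzing raw data from a visual quality test can be illustrated with a small Python sketch that turns the opinion scores collected for one test condition into a mean opinion score (MOS) with a confidence interval. This is not AG5's procedure: the function name `mos_with_ci` and the normal-approximation factor of 1.96 are assumptions for illustration, and small viewing panels often call for a Student's t factor instead.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (MOS, half-width of the confidence interval) for one condition.

    scores: opinion scores from individual viewers, e.g. on a 1 to 5 scale.
    z: assumed normal-approximation factor for a roughly 95% interval.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed for an interval")
    mean = statistics.fmean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    stdev = statistics.stdev(scores)
    half_width = z * stdev / math.sqrt(n)
    return mean, half_width
```

Two codecs can then be compared condition by condition: when their intervals around the MOS overlap, the test gives no clear evidence of a visible quality difference at that rate point.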
Way of Working
Given the fact that MPEG Visual Quality Assessment is an advisory group, and given the above-mentioned ToR, the goal of AG5 is not to produce new standards on its own. Instead, AG5 strives to communicate and collaborate with relevant SDOs in the field, applying existing standards and recommendations and potentially contributing to further development by reporting results and working practices to these groups.
In terms of meetings, AG5 adopts the common MPEG meeting cycle of typically four MPEG AG/WG meetings per year, which, due to the ongoing pandemic situation, have so far all been held online. The meetings are held to review the progress of work, agree on recommendations, and decide on further plans. During the meetings, AG5 closely collaborates with the MPEG WGs and conducts experts viewing sessions in various MPEG standardization activities. The focus of such activities includes the preparation of new standardization projects, the performance verification of completed projects, as well as support of ongoing projects, where frequent subjective evaluation results are required in the decision process. Between meetings, AG5 work is carried out in the context of Ad-hoc Groups (AhGs) which are established from meeting to meeting with well-defined tasks.
Due to the broad field of ongoing standardization activities, AG5 has established so-called focus groups which cover the relevant fields of development. The focus group structure and the appointed chairs are shown in Figure 2.
The focus groups are mandated to coordinate with other relevant MPEG groups and other standardization bodies on activities of mutual interest, and to facilitate the formal and informal assessment of the visual media type under their consideration. The focus groups are described as follows:
Standard Dynamic Range Video (SDR): This is the ‘classical’ video quality assessment domain. The group strives to support, design, and conduct testing activities on SDR content at any resolution and coding condition, and to maintain existing testing methods and best practice procedures.
High Dynamic Range Video (HDR): The focus group on HDR strives to facilitate the assessment of HDR video quality using different devices with combinations of spatial resolution, colour gamut, and dynamic range, and further to maintain and refine methodologies for measuring HDR video quality. A specific focus of the starting phase was on the preparation of the verification tests for Versatile Video Coding (VVC, ISO/IEC 23090-3 / ITU-T H.266).
360° Video: The omnidirectional characteristics of 360° video content have to be taken into account for visual quality assessment. The group’s focus is on continuing the development of 360° video quality assessment methodologies, including those using head-mounted devices. As with the focus group on HDR, the verification tests for VVC had priority in the starting phase.
Immersive Video (MPEG Immersive Video, MIV): Since MIV allows for movement of the user with six degrees of freedom, the assessment of this type of content poses even more challenges, and the variability of the user’s perception of the media has to be factored in. Given the absence of an original reference or ground truth for the synthetically rendered scene, evaluation with conventional objective metrics is a challenge. The focus group strives to develop appropriate subjective expert viewing methods to support the development process of the standard, and also evaluates and improves objective metrics in the context of MIV.
Ad hoc Groups
AG5 currently has three AhGs defined which are briefly presented with their mandates below:
Quality of immersive visual media (chaired by Christian Timmerer of AAU/Bitmovin, Joel Jung of Tencent, and Aljosa Smolic of Trinity College Dublin): Study Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (AG 05/N00013) with respect to new updates presented at this meeting; Solicit inputs for subjective evaluation methods and objective metrics for immersive video (e.g., 360, MIV, V-PCC, G-PCC); Organize public online workshop(s) on Quality of Immersive Media: Assessment and Metrics.
Learning-based quality metrics for 2D video (chaired by Yan Ye of Alibaba and Mathias Wien of RWTH Aachen University): Compile and maintain a list of video databases suitable and available to be used in AG5’s studies; Compile a list of learning-based quality metrics for 2D video to be studied; Evaluate the correlation between the learning-based quality metrics and subjective quality scores in the databases.
Guidelines for subjective visual quality evaluation (chaired by Mathias Wien of RWTH Aachen University, Lu Yu of Zhejiang University and Convenor of MPEG Video Coding (ISO/IEC JTC1 SC29/WG4), and Joel Jung of Tencent): Prepare the third draft of the Guidelines for Verification Testing of Visual Media Specifications; Prepare the second draft of the Guidelines for remote experts viewing test methods for use in the context of Ad-hoc Groups, and Core or Exploration Experiments.
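The correlation evaluation named in the mandate of the AhG on learning-based quality metrics can be illustrated with a short Python sketch. It is not AG5's procedure: the choice of Pearson and Spearman correlation is an assumption (both are commonly used for this purpose), and the function names and the sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson linear correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: metric outputs and subjective scores for four clips.
metric_scores = [0.2, 0.4, 0.5, 0.9]
subjective_mos = [2.0, 3.5, 3.0, 4.8]
plcc = pearson(metric_scores, subjective_mos)
srocc = spearman(metric_scores, subjective_mos)
```

The rank-based value only checks whether the metric orders the clips as the viewers did, while the linear value also depends on the shape of the relationship, so the two are usually reported together.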
AG 5 First Achievements
Reports and Guidelines
The results of the work of the AhGs are aggregated in AG5 output documents which are public (or will become public soon) in order to allow for feedback also from outside of the MPEG community.
The AhG on “Quality for Immersive Visual Media” maintains a report “Overview of Quality Metrics and Methodologies for Immersive Visual Media” which documents the state of the art in the field and shall serve as a reference for MPEG working groups in their work on compression standards in this domain. The AhG further organizes a public workshop on “Quality of Immersive Media: Assessment and Metrics” which takes place in an online form at the beginning of October 2021. The scope of this workshop is to raise awareness about MPEG efforts in the context of quality of immersive visual media and to invite experts outside of MPEG to present new techniques relevant to the scope of this workshop.
The AhG on “Guidelines for Subjective Visual Quality Evaluation” currently develops two guideline documents supporting the MPEG standardization work. The “Guidelines for Verification Testing of Visual Media Specifications” define the process of assessing the performance of a completed standard after its publication. Verification testing has been established MPEG working practice for its media compression standards since the 1990s. The document is intended to formalize the process, describe the steps and conditions for the verification tests, and set the requirements to meet MPEG procedural quality expectations.
The AhG has further released a first draft of “Guidelines for Remote Experts Viewing Sessions” with the intention to establish a formalized procedure for the ad-hoc generation of subjective test results as input to the standards development process. This activity has been driven by the ongoing pandemic situation, which has forced MPEG to continue its work in virtual online meetings since early 2020. The procedure for remote experts viewing is intended to be applied during the (online) meeting phase or in the AhG phase and to provide measurable and reproducible subjective results as input to the decision-making process in the project under consideration.
With Essential Video Coding (EVC), Low Complexity Enhancement Video Coding (LCEVC) of ISO/IEC, and the joint coding standard Versatile Video Coding (VVC) of ISO/IEC and ITU-T, a significant number of new video coding standards have recently been released. Since its first meeting in October 2020, AG5 has been engaged in the preparation and conduct of verification tests for these video coding specifications. Further verification tests for MPEG Immersive Video (MIV) and Video-based Point Cloud Compression (V-PCC) are under preparation and more are to come. Results of the verification test activities which have been completed in the first year of AG5 are summarized in the following subsections. All reported results have been achieved by formal subjective assessments according to established assessment protocols and performed by qualified test laboratories. The bitstreams were generated with reference software encoders of the specification under consideration using established encoder configurations with comparable settings for both the reference and the evaluated coding schemes. It has to be noted that all testing had to be done under the constrained conditions of the ongoing pandemic situation, which posed an additional challenge for the test laboratories in charge.
MPEG-5 Part 1: Essential Video Coding (EVC)
The EVC standard was developed with the goal to provide a royalty-free Baseline profile and a Main profile with higher compression efficiency compared to High-Efficiency Video Coding (HEVC). Verification tests were conducted for standard dynamic range (SDR) and high dynamic range (HDR, BT.2100 PQ) video content at both HD (1920×1080 pixels) and UHD (3840×2160 pixels) resolution. The tests revealed around 40% bitrate savings at a comparable visual quality for the Main profile when compared to HEVC, and around 36% bitrate savings for the Baseline profile when compared to Advanced Video Coding (AVC), both for SDR content. For HDR PQ content, the Main profile provided around 35% bitrate savings for both resolutions.
MPEG-5 Part 2: Low-Complexity Enhancement Video Coding (LCEVC)
The LCEVC standard follows a layered approach where an LCEVC enhancement layer is added to a lower resolution base layer of an existing codec in order to achieve the full resolution video. Since the base layer codec operates at a lower resolution and the separate enhancement layer decoding process is relatively lightweight, the computational complexity of the decoding process is typically lower compared to decoding of the full resolution with the base layer codec. The addition of the enhancement layer would typically be provided on top of the established base layer decoder implementation by an additional decoding entity, e.g., in a browser.
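The layered principle described above can be illustrated with a toy Python sketch on a one-dimensional signal. It is not the LCEVC bitstream or decoding process: the pair-averaging downsampler, the sample-repetition upsampler, the coarse quantizer standing in for the base codec, and all function names are assumptions chosen for illustration, and the residual is kept lossless here for simplicity.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring sample pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def upsample(signal):
    """Double the resolution by repeating every sample."""
    return [sample for sample in signal for _ in range(2)]

def encode(signal, base_step=8):
    """Return (base layer, enhancement residual) for an even-length signal."""
    low_res = downsample(signal)
    # Coarse quantization stands in for a lossy base codec at half resolution.
    base = [round(sample / base_step) * base_step for sample in low_res]
    predicted = upsample(base)
    # The enhancement layer carries what the upsampled base layer misses.
    residual = [orig - pred for orig, pred in zip(signal, predicted)]
    return base, residual

def decode(base, residual):
    """Upsample the base layer and add the enhancement residual."""
    return [pred + res for pred, res in zip(upsample(base), residual)]

# Hypothetical eight-sample signal.
source = [10, 12, 40, 44, 90, 94, 20, 22]
base_layer, enhancement = encode(source)
reconstruction = decode(base_layer, enhancement)
```

The base layer holds half as many samples as the source, and the decoder only needs an upsampling step plus an addition on top of the base decoder output, which mirrors the lightweight enhancement decoding the paragraph describes.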
For verification testing, LCEVC was evaluated using AVC, HEVC, EVC, and VVC base layer bitstreams at half resolution, comparing the performance to the respective schemes with full resolution coding as well as half-resolution coding with a simple upsampling tool. For UHD resolution, the bitrate savings for LCEVC at comparable visual quality were 46% when compared to full resolution AVC and 31% when compared to full resolution HEVC. The comparison to the more recent and more efficient EVC and VVC coding schemes led to partially overlapping confidence intervals of the subjective scores of the test subjects. The curves still revealed some benefits for the application of LCEVC. Compared to half-resolution coding with simple upsampling, LCEVC provided approximately 28%, 34%, 38%, and 33% bitrate savings at comparable visual quality, demonstrating the benefit of LCEVC enhancement layer coding compared to straightforward plain upsampling.
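To illustrate what overlapping confidence intervals mean for such a comparison, the following Python sketch computes a mean opinion score with a 95% confidence interval and tests two conditions for overlap. It is a minimal sketch, not the procedure of the verification test reports: the normal-approximation factor 1.96, the function names, and the viewer scores are assumptions.

```python
import math
import statistics

def mos_with_ci(scores, z=1.96):
    """Return (mean, lower bound, upper bound) of an approximate 95% interval."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width

def intervals_overlap(first, second):
    """True if the two (mean, lower, upper) intervals share any value."""
    return first[1] <= second[2] and second[1] <= first[2]

# Hypothetical viewer scores for three coding conditions at one rate point.
condition_a = mos_with_ci([7, 8, 7, 8, 7, 8])
condition_b = mos_with_ci([8, 8, 7, 8, 9, 8])
condition_c = mos_with_ci([4, 5, 4, 5, 4, 5])

a_vs_b = intervals_overlap(condition_a, condition_b)  # close conditions
a_vs_c = intervals_overlap(condition_a, condition_c)  # clearly separated
```

When the intervals overlap, the test cannot claim a visible quality difference between the two conditions at that rate point, which is why a bitrate saving is harder to state for the stronger base codecs.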
MPEG-I Part 3 / ITU-T H.266: Versatile Video Coding (VVC)
VVC is the most recent video coding standard in the historical line of joint specifications of ISO/IEC and ITU-T, such as AVC and HEVC. The development focus for VVC was on compression efficiency improvement at a moderate increase of decoding complexity as well as the versatility of the design. Versatility features include tools designed to address HDR, WCG, resolution-adaptive multi-rate video streaming services, 360-degree immersive video, bitstream extraction and merging, temporal scalability, gradual decoding refresh, and multilayer coding to deliver layered video content to support application features such as multiview, alpha maps, depth maps, and spatial and quality scalability.
A series of verification tests has been conducted covering SDR UHD and HD, HDR PQ and HLG, as well as 360° video content. An early open-source encoder (VVenC) was additionally assessed in some categories. For SDR coding, both the VVC reference software (VTM) and the open-source VVenC were evaluated against the HEVC reference software (HM). The results revealed bit rate savings of around 46% (SDR UHD, VTM and VVenC), 50% (SDR HD, VTM and VVenC), 49% (HDR UHD, PQ and HLG), 52%, and 50-56% (360° with different projection formats) at a similar visual quality compared to HEVC. In Figure 3, pooled MOS (Mean Opinion Score) over bit rate points for the mentioned categories are provided. The MOS values range from 10 (imperceptible impairments) down to 0 (everywhere severely annoying impairments). Pooling was done by computing the geometric mean of the bitrates and the arithmetic mean of the MOS scores across the test sequences of each test category. The results reveal a consistent benefit of VVC over its predecessor HEVC in terms of visual quality over the required bitrate.
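The pooling rule stated above (geometric mean of the bitrates, arithmetic mean of the MOS values) can be sketched in a few lines of Python. The function name and the example numbers are hypothetical; only the two averaging rules come from the text.

```python
import math

def pool_rate_point(bitrates_kbps, mos_scores):
    """Pool one rate point across the test sequences of a category.

    Returns (geometric mean of the bitrates, arithmetic mean of the MOS values).
    """
    if not bitrates_kbps or len(bitrates_kbps) != len(mos_scores):
        raise ValueError("need one bitrate and one MOS value per sequence")
    # Geometric mean via logarithms, which avoids overflow of a long product.
    log_mean = sum(math.log(rate) for rate in bitrates_kbps) / len(bitrates_kbps)
    pooled_rate = math.exp(log_mean)
    pooled_mos = sum(mos_scores) / len(mos_scores)
    return pooled_rate, pooled_mos

# Hypothetical rate point measured on two test sequences.
pooled_rate, pooled_mos = pool_rate_point([1000.0, 4000.0], [6.0, 8.0])
```

The geometric mean suits bitrates because sequences of very different coding difficulty can differ in rate by an order of magnitude, and an arithmetic mean would let the most demanding sequence dominate the pooled point.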
This column presented an overview of the organizational structure and the activities of the Advisory Group on MPEG Visual Quality Assessment, ISO/IEC JTC 1/SC 29/AG 5, which was formed about one year ago. The work items of AG5 include the application, documentation, evaluation, and improvement of objective quality metrics and subjective quality assessment procedures. In its first year of existence, the group has produced an overview on immersive quality metrics, draft guidelines for verification tests and for remote experts viewing sessions, as well as reports of formal subjective quality assessments for the verification tests of EVC, LCEVC, and VVC. The work of the group will continue towards studying and developing quality metrics suitable for the assessment tasks emerging from the development of the various MPEG visual media coding standards and towards subjective quality evaluation in upcoming and future verification tests and new standardization projects.
References
MPEG website, https://www.mpegstandards.org/.
ISO/IEC JTC1 SC29, “Terms of Reference of SC 29/WGs and AGs,” Doc. SC29N19020, July 2020.
ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (v2)”, doc. AG5N13, 2nd meeting: January 2021.
MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics, https://multimediacommunication.blogspot.com/2021/08/mpeg-ag-5-workshop-on-quality-of.html, October 5th, 2021.
ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for Verification Testing of Visual Media Specifications (draft 2)”, doc. AG5N30, 4th meeting: July 2021.
ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for remote experts viewing sessions (draft 1)”, doc. AG5N31, 4th meeting: July 2021.
ISO/IEC 23094-1:2020, “Information technology — General video coding — Part 1: Essential video coding”, October 2020.
ISO/IEC 23094-2, “Information technology – General video coding — Part 2: Low complexity enhancement video coding”, September 2021.
ISO/IEC 23090-3:2021, “Information technology — Coded representation of immersive media — Part 3: Versatile video coding”, February 2021.
ITU-T H.266, “Versatile Video Coding”, August 2020. https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-H.266-202008-I.
ISO/IEC 23090-5:2021, “Information technology — Coded representation of immersive media — Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC)”, June 2021.
ITU-T P.910 (2008), Subjective video quality assessment methods for multimedia applications.
ITU-R BT.500-14 (2019), Methodologies for the subjective assessment of the quality of television images.
Fraunhofer HHI VVenC software repository. [Online]. Available: https://github.com/fraunhoferhhi/vvenc.
K. Choi, J. Chen, D. Rusanovskyy, K.-P. Choi and E. S. Jang, “An overview of the MPEG-5 essential video coding standard [standards in a nutshell]”, IEEE Signal Process. Mag., vol. 37, no. 3, pp. 160-167, May 2020.
ISO/IEC 23008-2:2020, “Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 2: High efficiency video coding”, August 2020.
ITU-T H.265, “High Efficiency Video Coding”, August 2021.
ISO/IEC 14496-10:2020, “Information technology — Coding of audio-visual objects — Part 10: Advanced video coding”, December 2020.
ITU-T H.264, “Advanced Video Coding”, August 2021.
ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for SDR Content”, doc WG4N47, 2nd meeting: January 2021.
ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for HDR/WCG content”, doc WG4N30, 1st meeting: October 2020.
G. Meardi et al., “MPEG-5—Part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation”, Proc. SPIE, vol. 11510, pp. 238-257, Aug. 2020.
ISO/IEC JTC1 SC29/WG4, “Verification Test Report on the Compression Performance of Low Complexity Enhancement Video Coding”, doc. WG4N76, 3rd meeting: April 2020.
Benjamin Bross, Jianle Chen, Jens-Rainer Ohm, Gary J. Sullivan, and Ye-Kui Wang, “Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC)”, Proceedings of the IEEE, Vol. 109, Issue 9, pp. 1463–1493, doi 10.1109/JPROC.2020.3043399, Sept. 2021 (open access publication), available at https://ieeexplore.ieee.org/document/9328514.
Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Gary J. Sullivan, and Jens-Rainer Ohm, “Overview of the Versatile Video Coding (VVC) Standard and its Applications”, IEEE Trans. Circuits & Systs. for Video Technol. (open access publication), available online at https://ieeexplore.ieee.org/document/9395142.
Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for Ultra High Definition (UHD) Standard Dynamic Range (SDR) Video Content”, doc. JVET-T2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 20th meeting: October 2020.
Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for High Definition (HD) and 360° Standard Dynamic Range (SDR) Video Content”, doc. JVET-V2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 22nd meeting: April 2021.
Mathias Wien and Vittorio Baroncini, “VVC verification test report for high dynamic range video content”, doc. JVET-W2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 23rd meeting: July 2021.
Welcome to the fifth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG). The last VQEG plenary meeting took place online from 7 to 11 June 2021. Like the previous meeting, held in December 2020, it was organized online (this time by Kingston University) with multiple sessions spread over five days, allowing remote participation of people from 22 different countries of America, Asia, and Europe. More than 100 participants registered for the meeting and could attend the 40 presentations and several discussions that took place in all working groups. This column provides an overview of the recently completed VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online on the VQEG meeting website.
Several presentations of state-of-the-art works can be of interest to the SIGMM community, in addition to the contributions to several working items of ITU from various VQEG groups. Also worth noting are the progress on the new activities launched in the last VQEG plenary meeting (in relation to Live QoE assessment, SI/TI clarification, implementers guide for video quality metrics for coding applications, and the inclusion of video quality metrics as metadata in compressed streams) and the proposal for a new joint work on evaluation of immersive communication systems from a task-based or interactive perspective within the Immersive Media Group.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD group works on improved subjective and objective methods for video-only and audiovisual quality of commonly available systems. Currently, after the project AVHD/P.NATS2 (a joint collaboration between VQEG and ITU SG12) finished in 2020, two projects are ongoing within the AVHD group: QoE Metrics for Live Video Streaming Applications (Live QoE), which was launched in the last plenary meeting, and Advanced Subjective Methods (AVHD-SUB). The main discussion during the AVHD sessions was related to the Live QoE project, which was led by Shahid Satti (Opticom) and Rohit Puri (Twitch). In addition to the presentation of the project proposal, the main decisions reached so far were presented (e.g., use of videos of 20-30 seconds with 1080p resolution and frame rates up to 60 fps, use of ACR as the subjective test methodology, generation of test conditions, etc.), and open questions were brought up for discussion, especially in relation to how to acquire premium content and network traces. In addition to this discussion, Steve Göring (TU Ilmenau) presented an open-source platform (AVrate Voyager) for crowdsourcing/online subjective tests, and Shahid Satti (Opticom) presented the performance results of the Opticom models on the project AVHD/P.NATS Phase 2. Finally, Ioannis Katsavounidis (Facebook) presented the subjective testing validation of the AV1 performance from the Alliance for Open Media (AOM) to gather feedback on the test plan and to identify possibly interested testing labs from VQEG. It is also worth noting that this session was recorded to be used as raw multimedia data for the Live QoE project.
Quality Assessment for Health applications (QAH)
The session related to the QAH group included three presentations apart from the project summary provided by Lucie Lévêque (Polytech Nantes). In particular, Meriem Outtas (INSA Rennes) provided a review on objective quality assessment of medical images and videos. This is one of the topics jointly addressed by the group, which is working on an overview paper in line with the recent review on subjective medical image quality assessment. Moreover, Zohaib Amjad Khan (Université Sorbonne Paris Nord) presented a work on video quality assessment of laparoscopic videos, while Aditja Raj and Maria Martini (Kingston University) presented their work on a multivariate regression-based convolutional neural network model for fundus image quality assessment.
Statistical Analysis Methods (SAM)
The SAM session consisted of three presentations followed by discussions on the topics. One of these was related to the description of subjective experiment consistency by p-value p-p plot, which was presented by Jakub Nawała (AGH University of Science and Technology). In addition, Zhi Li (Netflix) and Rafał Figlus (AGH University of Science and Technology) presented the progress on the contribution from SAM to the ITU-T to modify the recommendation P.913 to include the MLE model for subject behavior in subjective experiments and the recently available implementation of this model in Excel. Finally, Pablo Pérez (Nokia Bell Labs) and Lucjan Janowski (AGH University of Science and Technology) presented their work on the possibility of performing subjective experiments with four subjects.
Computer Generated Imagery (CGI)
Nabajeet Barman (Kingston University) presented a report on the current activities of the CGI group. The main current working topics are related to gaming quality assessment methodologies and quality prediction, and codec comparison for CG content. This group is closely collaborating with the ITU-T SG12, as reflected by its support for the completion of three work items: ITU-T Rec. G.1032 on influence factors on gaming quality of experience, ITU-T Rec. P.809 on subjective evaluation methods for gaming quality, and ITU-T Rec. G.1072 on an opinion model for gaming applications. Furthermore, CGI is contributing to three new work items: ITU-T work item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services, ITU-T work item G.OMMOG on opinion models for mobile online gaming applications, and ITU-T work item P.CROWDG on subjective evaluation of gaming quality with a crowdsourcing approach. In addition, four presentations were scheduled during the CGI slots. The first one was delivered by Joel Jung (Tencent Media Lab) and David Lindero (Ericsson), who presented the details of the ITU-T work item P.BBQCG. Another one was related to the evaluation of MPEG-5 Part 2 (LCEVC) for gaming video streaming applications, which was presented by Nabajeet Barman (Kingston University) and Saman Zadtootaghaj (Dolby Laboratories). Nabajeet Barman, together with Maria Martini (Kingston University), also presented a dataset, codec comparison and challenges related to user-generated HDR gaming video streaming. Finally, JP Tauscher (Technische Universität Braunschweig) presented his work on EEG-based detection of deep fake images.
5G Key Performance Indicators (5GKPI)
The 5GKPI session consisted of a presentation by Pablo Pérez (Nokia Bell Labs) of the progress achieved by the group since the last plenary meeting in the following efforts: 1) the contribution to ITU-T Study Group 12 Question 13 through the Technical Report about QoE in 5G video services (GSTR-5GQoE), which addresses QoE requirements and factors for some use cases like Tele-operated Driving (ToD), wireless content production, mixed reality offloading and first responder networks; 2) the contribution to the 5G Automotive Association (5GAA) through a high-level contribution on general QoE requirements for remote driving, considering for the near future the execution of subjective tests for ToD video quality; and 3) the long-term plan to work on a methodology to create simple opinion models to estimate average QoE for a network and use case.
Immersive Media Group (IMG)
Several presentations were delivered during the IMG session, divided into two blocks: one covering technologies and studies related to the evaluation of immersive communication systems from a task-based or interactive perspective, and another one covering other topics related to the assessment of QoE of immersive media. The first set of presentations is related to a new proposal for a joint work within IMG related to the ITU-T work item P.QXM on QoE assessment of eXtended Reality meetings. Thus, Irene Viola (CWI) presented an overview of this work item. In addition, Carlos Cortés (Universidad Politécnica de Madrid) presented his work on evaluating the impact of delay on QoE in immersive interactive environments, Irene Viola (CWI) presented a dataset of point cloud dynamic humans for immersive telecommunications, Pablo César (CWI) presented their pipeline for social virtual reality, and Narciso García (Universidad Politécnica de Madrid) presented their real-time free-viewpoint video system (FVVLive). After these presentations, Jesús Gutiérrez (Universidad Politécnica de Madrid) led the discussion on joint next steps with IMG, which, in addition to identifying parties interested in joining the effort to study the evaluation of immersive communication systems, also covered the further analyses to be done from the subjective tests carried out with short 360-degree videos and the studies carried out to assess quality and other factors (e.g., presence) with long omnidirectional sequences. In this sense, Marta Orduna (Universidad Politécnica de Madrid) presented her subjective study to validate a methodology to assess quality, presence, empathy, attitude, and attention in Social VR. Future progress on these joint activities will be discussed in the group audio calls.
Within the other block of presentations related to immersive media topics, Maria Martini (Kingston University), Chulhee Lee (Yonsei University), and Patrick Le Callet (Université de Nantes) presented the status of IEEE standardization on QoE for immersive experiences (IEEE P3333.1.4 – Light Field, and IEEE P3333.1.3, deep learning-based quality assessment), Kjell Brunnström (RISE) presented their work on legibility and readability in augmented reality, Abdallah El Ali (CWI) presented his work on investigating the relationship between momentary emotion self-reports and head and eye movements in HMD-based 360° videos, Elijs Dima (Mid Sweden University) presented his study on quality of experience in augmented telepresence considering the effects of viewing positions and depth-aiding augmentation, Silvia Rossi (UCL) presented her work towards behavioural analysis of 6-DoF users when consuming immersive media, and Yana Nehme (INSA Lyon) presented a study on exploring crowdsourcing for subjective quality assessment of 3D Graphics.
Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting
During the IRG-AVQA session, an overview of the progress and recent works within ITU-R SG6 and ITU-T SG12 was provided. In particular, Chulhee Lee (Yonsei University), in collaboration with other ITU rapporteurs, presented the progress of ITU-R WP6C on recommendations for HDR content and the work items within ITU-T SG12: Question 9 on audio-related work items, Question 13 on gaming and immersive technologies (e.g., augmented/extended reality) among others, Question 14 on recommendations and work items related to the development of video quality models, and Question 19 on work items related to television and multimedia. In addition, the progress of the group “Implementers Guide for Video Quality Metrics (IGVQM)”, launched in the last plenary meeting by Ioannis Katsavounidis (Facebook), was discussed, addressing specific points to push the collection of video quality models and datasets to be used to develop an implementer’s guide for objective video quality metrics for coding applications.
The next VQEG plenary meeting will take place online in December 2021.
In addition, VQEG is investigating the possibility of disseminating the videos of all the talks from these plenary meetings via platforms such as YouTube and Facebook.
JPEG Committee issues a Call for Proposals on Holography coding
The 91st JPEG meeting was held online from 19 to 23 April 2021. This meeting saw several activities relating to holographic coding, notably the release of the JPEG Pleno Holography Call for Proposals, consolidated with the definition of the use cases and requirements for holographic coding and common test conditions that will assure the evaluation of the future proposals.
The 91st meeting was also marked by the start of a new exploration initiative on Non-Fungible Tokens (NFTs), due to the recent interest in this technology in a large number of applications and in particular in digital art. Since NFTs rely on decentralized networks and JPEG has been analysing the implications of Blockchains and distributed ledger technologies in imaging, it is a natural next step to explore how JPEG standardization can facilitate interoperability between applications that make use of NFTs.
The following presents an overview of the major achievements carried out during the 91st JPEG meeting.
The 91st JPEG meeting had the following highlights:
JPEG launches call for proposals for the first standard in holographic coding,
JPEG Fake Media,
JPEG Reference Software.
Call for proposals for the first standard in holographic coding
JPEG Pleno aims to provide a standard framework for representing new
imaging modalities, such as light field, point cloud, and holographic content.
JPEG Pleno Holography is the first standardization effort for a versatile
solution to efficiently compress holograms for a wide range of applications ranging
from holographic microscopy to tomography, interferometry, and printing and
display, as well as their associated hologram types. Key functionalities
include support for both lossy and lossless coding, scalability, random access,
and integration within the JPEG Pleno system architecture, with the goal of
supporting a royalty-free baseline.
The final Call for Proposals (CfP) on JPEG Pleno Holography – a
milestone in the roll-out of the JPEG Pleno framework – has been issued as the
main result of the 91st JPEG meeting, Online, 19-23 April 2021. The deadline
for expressions of interest and registration is 1 August 2021. Submissions to
the Call for Proposals are due on 1 September 2021.
A second milestone reached at this meeting was the promotion to International Standard of JPEG Pleno Part 2: Light Field Coding (ISO/IEC 21794-2). This standard provides tools for coding light fields originating from either microlens cameras or camera arrays. Part 1 of this standard, which was promoted to International Standard earlier, provides the overall file format syntax supporting light field, holography and point cloud modalities.
During the 91st JPEG meeting, the JPEG Committee officially
began an exciting phase of JPEG Pleno Point Cloud coding standardisation with a
focus on learning-based point cloud coding.
The scope of the JPEG Pleno Point Cloud activity is the creation of a learning-based
coding standard for point clouds and associated attributes, offering a
single-stream, compact compressed domain representation, supporting advanced
flexible data access functionalities. The JPEG Pleno Point Cloud standard
targets both interactive human visualization, with significant compression
efficiency over state of the art point cloud coding solutions commonly used at
equivalent subjective quality, and also enables effective performance for 3D
processing and computer vision tasks. The JPEG Committee expects the standard
to support a royalty-free baseline.
The standard is envisioned to provide a number of unique benefits,
including an efficient single point cloud representation for both humans and
machines. The intent is to provide humans with the ability to visualise and
interact with the point cloud geometry and attributes while providing machines
the ability to perform 3D processing and computer vision tasks in the
compressed domain, enabling lower complexity and higher accuracy through the
use of compressed domain features extracted from the original instead of the
lossily decoded point cloud.
Non-Fungible Tokens (NFTs) have been the focus of much attention in recent months. Several digital assets that NFTs point to are either in existing JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. Furthermore, several trust and security issues have been raised regarding NFTs and the digital assets they rely on. Here again, the JPEG Committee has a significant track record in security and trust in imaging applications. Building on this background, the JPEG Committee has launched a new exploration initiative around NFTs to better understand the needs in terms of imaging requirements and how existing as well as potential JPEG standards can help bring security and trust to NFTs in a wide range of applications, notably those that rely on content represented in JPEG formats, including still and animated pictures and 3D content. The first steps in this initiative involve outreach to NFT stakeholders and the organization of a workshop to discuss challenges and current solutions in NFTs, notably in the context of applications relevant to the scope of the JPEG Standardization Committee. The JPEG Committee invites interested parties to subscribe to the mailing list of the JPEG NFT exploration via http://listregistration.jpeg.org.
JPEG Fake Media
The JPEG Fake Media exploration activity continues its work to assess standardization needs to facilitate secure and reliable annotation of media asset creation and modifications in good faith usage scenarios as well as in those with malicious intent. At the 91st meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. This new version includes several refinements including an improved and coherent set of definitions covering key terminology. The requirements have been extended and reorganized into three main identified categories: media creation and modification descriptions, metadata embedding framework and authenticity verification framework. The presentations and video recordings of the 2nd Workshop on JPEG Fake Media are now available on the JPEG website. JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and subscribe to the mailing list via http://listregistration.jpeg.org.
At the 91st meeting, the results of the JPEG AI exploration experiments
for the image processing and computer vision tasks defined at the previous 90th
meeting were presented and discussed. Based on the analysis of the results, the
exploration experiments description was improved. This activity will allow the definition
of a performance assessment framework for using the latent representation of learning-based
image codecs in several visual analysis tasks, such as compressed
domain image classification and compressed domain material and texture
recognition. Moreover, the impact of such experiments on the current version of
the Common Test Conditions (CTC) was discussed.
In addition, the draft of the Call for Proposals was analysed, notably regarding the training dataset and training procedures as well as the submission requirements. The timeline of the JPEG AI work item was discussed and it was agreed that the final Call for Proposals (CfP) will be issued as an outcome of the 93rd JPEG Meeting. The deadline for expression of interest and registration is 5 November 2021. Further, the submission of bitstreams and decoded images for the test dataset is due on 30 January 2022.
During the 91st meeting, the Draft International Standard (DIS) text of
JLINK (ISO/IEC 19566-7) and Committee Draft (CD) text of JPEG Snack (ISO/IEC
19566-8) were completed and will be submitted for ballot. Amendments for JUMBF
(ISO/IEC 19566-5 AMD1) and JPEG 360 (ISO/IEC 19566-6 AMD1) received a final
review and are being released for publication. In addition, new extensions to
JUMBF (ISO/IEC 19566-5) are under consideration to support rapidly emerging use
cases related to content authenticity and integrity; updated use cases and
requirements are being drafted. Finally, discussions have started to create
awareness on how to interact with JUMBF boxes and the information they contain,
without breaking integrity or interoperability. Interested parties are invited
to subscribe to the mailing list of the JPEG Systems AHG in order to contribute
to the above activities via http://listregistration.jpeg.org.
The second editions of JPEG XS Part 1 (Core coding system) and Part 3
(Transport and container formats) were prepared for Final Draft International
Standard (FDIS) balloting, with the intention of having both standards
published by October 2021. The second editions integrate new coding and signalling
capabilities to support RAW Bayer colour filter array (CFA) images, 4:2:0
sampled images and mathematically lossless coding of up to 12 bits per
component. The associated profiles and buffer models are handled in Part 2,
which is currently under DIS ballot. The focus has now shifted to work on the
second editions of Part 4 (Conformance testing) and Part 5 (Reference
software). Finally, the JPEG Committee defined a study to investigate future
improvements to high dynamic range (HDR) and mathematically lossless
compression capabilities, while still honouring the low-complexity and
low-latency requirements. In particular, for RAW Bayer CFA content, the JPEG Committee
will work on extensions of JPEG XS supporting lossless compression of CFA
patterns at sample bit depths above 12 bits.
The JPEG Committee has finalized JPEG XL Part 2 (File format), which is now at the FDIS stage. A Main profile has been specified in draft Amendment 1 to Part 1, which entered the draft amendment (DAM) stage of the approval process at the current meeting. The draft Main profile has two levels: Level 5 for end-user image delivery and Level 10 for generic use cases, including image authoring workflows. Now that the criteria for conformance have been determined, the JPEG Committee has defined new core experiments to define a set of test codestreams that provides full coverage of the coding tools. Part 4 (Reference software) is now at the DIS stage. With the first edition FDIS texts of both Part 1 and Part 2 now complete, JPEG XL is ready for wide adoption.
The JPEG Committee has continued its exploration of coding of images in quaternary representation, particularly suitable for DNA storage. After a successful third workshop presentation by stakeholders, two new use cases were identified along with a large number of new requirements, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by organizing a fourth workshop and conducting further outreach to stakeholders, as well as continuing to improve the JPEG DNA overview document.
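To illustrate what a quaternary representation means in practice, the sketch below maps each byte to four base-4 symbols written as the nucleotides A, C, G and T. The fixed 2-bit mapping and the function names are assumptions made purely for illustration; this is not the JPEG DNA coding method, and practical DNA storage codes also have to respect biochemical constraints that this sketch ignores.

```python
# Illustrative only: a fixed 2-bits-per-nucleotide mapping (an assumption,
# not taken from any JPEG DNA document).
NUCLEOTIDES = "ACGT"


def bytes_to_quaternary(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def quaternary_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_quaternary; the strand length must be a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


# The two bytes 0xFF 0xD8 expressed as nucleotides.
strand = bytes_to_quaternary(b"\xff\xd8")
print(strand)
```

Each byte always expands to exactly four symbols here, so the mapping is trivially reversible; any real scheme would trade some of that density for robustness.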
Interested parties are invited to refer to the following URL and to
consider joining the effort by registering to the mailing list of JPEG DNA
JPEG Reference Software
The JPEG Committee is pleased to announce that its standard on the JPEG
reference software, 2nd edition, reached the state of International Standard
and will be publicly available from both ITU and ISO/IEC.
This standard, to appear as ITU-T T.873 | ISO/IEC 10918-7 (2nd Edition), provides
reference implementations of the first JPEG standard, used daily throughout the
world. The software included in this document guides vendors on how JPEG
(ISO/IEC 10918-1) can be implemented and may serve as a baseline and starting
point for JPEG encoders or decoders.
This second edition updates the two reference implementations to their latest versions, fixing minor defects in the software.
“JPEG standards continue to be a motor of innovation and an enabler of new applications in imaging as witnessed by the release of the first standard for coding of holographic content,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Future JPEG meetings are planned as follows:
No. 92 will be held online from 7 to 13 July 2021.
No. 93 is planned to be held in Berlin, Germany, from 16 to 22 October 2021.
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 134th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:
First International Standard on Neural Network Compression for Multimedia Applications
Completion of the carriage of VVC and EVC
Completion of the carriage of V3C in ISOBMFF
Calls for Proposals: (a) New Advanced Genomics Features and Technologies, (b) MPEG-I Immersive Audio, and (c) Coded Representation of Haptics
MPEG evaluated Responses on Incremental Compression of Neural Networks
Progression of MPEG 3D Audio Standards
The first milestone of development of Open Font Format (2nd amendment)
Verification tests: (a) Low Complexity Enhancement Video Coding (LCEVC) verification test and (b) more application cases of Versatile Video Coding (VVC)
Standardization work on Version 2 of VVC and VSEI started
In this column, the focus is on streaming-related aspects including a brief update about MPEG-DASH.
First International Standard on Neural Network Compression for Multimedia Applications
Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, or image and video coding. The trained neural networks for these applications contain many parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to several clients (e.g., mobile phones, smart cameras) benefits from a compressed representation of neural networks.
At the 134th MPEG meeting, MPEG Video ratified the first international standard on Neural Network Compression for Multimedia Applications (ISO/IEC 15938-17), designed as a toolbox of compression technologies. The specification contains different methods for
parameter reduction (e.g., pruning, sparsification, matrix decomposition),
parameter transformation (e.g., quantization), and
entropy coding methods that can be assembled into encoding pipelines combining one or more (in the case of reduction) methods from each group.
The results show that trained neural networks for many common multimedia problems such as image or audio classification or image compression can be compressed by a factor of 10-20 with no performance loss and even by more than 30 with a performance trade-off. The specification is not limited to a particular neural network architecture and is independent of the neural network exchange format choice. The interoperability with common neural network exchange formats is described in the annexes of the standard.
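As a rough illustration of why parameter transformation shrinks a model, the sketch below applies symmetric uniform 8-bit quantization to a float32 weight array using NumPy. The function names, the random weights and the choice of a single scale per tensor are assumptions for illustration; this is not the coding pipeline defined in ISO/IEC 15938-17, and on its own it only yields the factor of 4 that comes from storing 8 bits instead of 32 per weight.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Symmetric uniform quantization to int8 with one scale per tensor (assumed design)."""
    qmax = 127
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float32 weights from the quantized values."""
    return q.astype(np.float32) * np.float32(scale)


# Hypothetical weight matrix standing in for one layer of a trained network.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

ratio = weights.nbytes / q.nbytes                      # float32 -> int8 storage
max_error = float(np.max(np.abs(weights - restored)))  # bounded by about scale / 2
print(f"compression factor: {ratio:.1f}, max abs error: {max_error:.6f}")
```

Reaching the factors of 10 to 30 reported above requires combining quantization with the other method groups of the toolbox; the sketch only shows the trade between bit width and reconstruction error.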
As neural networks are becoming increasingly important, their communication over heterogeneous networks to a plethora of devices raises various challenges, including the need for efficient compression, which this standard addresses. ISO/IEC 15938 is commonly referred to as MPEG-7 (or the “multimedia content description interface”) and this standard now becomes Part 17 of MPEG-7.
Research aspects: As for all compression-related standards, research aspects are related to compression efficiency (lossy/lossless), computational complexity (runtime, memory), and quality-related aspects. Furthermore, the compression of neural networks for multimedia applications probably enables new types of applications and services to be deployed in the (near) future. Finally, simultaneous delivery and consumption (i.e., streaming) of neural networks including incremental updates thereof will become a requirement for networked media applications and services.
Carriage of Media Assets
At the 134th MPEG meeting, MPEG Systems completed the carriage of various media assets in MPEG-2 Systems (Transport Stream) and the ISO Base Media File Format (ISOBMFF).
In particular, the standards for the carriage of Versatile Video Coding (VVC) and Essential Video Coding (EVC) over both MPEG-2 Transport Stream (M2TS) and ISO Base Media File Format (ISOBMFF) reached their final stages of standardization:
For M2TS, the standard defines constraints on elementary streams of VVC and EVC to carry them in packetized elementary stream (PES) packets. Additionally, buffer management mechanisms and transport system target decoder (T-STD) model extensions are also defined.
For ISOBMFF, the carriage of codec initialization information for VVC and EVC is defined in the standard. Additionally, it also defines samples and sub-samples reflecting the high-level bitstream structure and independently decodable units of both video codecs. For VVC, signaling and extraction of a certain operating point are also supported.
Finally, MPEG Systems completed the standard for the carriage of Visual Volumetric Video-based Coding (V3C) data using ISOBMFF. It supports media comprising multiple independent component bitstreams and takes into account that only some portions of immersive media assets need to be rendered according to the users’ position and viewport. Thus, the metadata indicating the relationship between the region in the 3D spatial data to be rendered and its location in the bitstream is defined. In addition, the delivery of an ISOBMFF file containing V3C content over DASH and MMT is also specified in this standard.
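All of the ISOBMFF carriage formats above build on the same container principle: a file is a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. The sketch below walks these generic box headers in a small synthetic byte string. The helper names `iter_boxes` and `make_box` and the sample data are assumptions for illustration; the sketch does not parse any of the VVC, EVC or V3C-specific boxes defined by the carriage standards.

```python
import struct


def iter_boxes(data: bytes, offset: int = 0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 signals a 64-bit "largesize" after the type field.
            if offset + 16 > end:
                raise ValueError(f"truncated box header at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (helper for the synthetic sample)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic sample: file type box, a movie box containing one track box, media data.
sample = (
    make_box(b"ftyp", b"isom" + bytes(4))
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", bytes(16))
)

top_level = [name for name, _, _ in iter_boxes(sample)]
moov_start, moov_end = next((s, e) for name, s, e in iter_boxes(sample) if name == "moov")
children = [name for name, _, _ in iter_boxes(sample, moov_start, moov_end)]
print(top_level, children)
```

Because container boxes simply nest further boxes in their payload, the same function can be reused recursively, which is how codec-specific structures are typically located inside a file.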
Research aspects: Carriage of VVC, EVC, and V3C using M2TS or ISOBMFF provides an essential building block within the so-called multimedia systems layer, resulting in a plethora of research challenges as it typically offers an interoperable interface to the actual media assets. Thus, these standards enable efficient and flexible provisioning and/or use of these media assets; how to provision and use them is deliberately not defined in these standards and is subject to competition.
Call for Proposals and Verification Tests
At the 134th MPEG meeting, MPEG issued three Calls for Proposals (CfPs) that are briefly highlighted in the following:
Coded Representation of Haptics: Haptics provide an additional layer of entertainment and sensory immersion beyond audio and visual media. This CfP aims to specify a coded representation of haptics data, e.g., to be carried using ISO Base Media File Format (ISOBMFF) files in the context of MPEG-DASH or other MPEG-I standards.
MPEG-I Immersive Audio: Immersive Audio will complement other parts of MPEG-I (i.e., Part 3, “Immersive Video” and Part 2, “Systems Support”) in order to provide a suite of standards that will support a Virtual Reality (VR) or an Augmented Reality (AR) presentation in which the user can navigate and interact with the environment using 6 degrees of freedom (6 DoF), that being spatial navigation (x, y, z) and user head orientation (yaw, pitch, roll).
New Advanced Genomics Features and Technologies: This CfP aims to collect submissions of new technologies that can (i) provide improvements to the current compression, transport, and indexing capabilities of the ISO/IEC 23092 standards suite, particularly applied to data consisting of very long reads generated by 3rd generation sequencing devices, (ii) provide the support for representation and usage of graph genome references, (iii) include coding modes relying on machine learning processes, satisfying data access modalities required by machine learning and providing higher compression, and (iv) support interfaces with existing standards for the interchange of clinical data.
Detailed information, including instructions on how to respond to the call for proposals, the requirements that must be considered, the test data to be used, and the submission and evaluation procedures for proponents, is available at www.mpeg.org.
Calls for proposals typically mark the beginning of the formal standardization work whereas verification tests are conducted once a standard has been completed. At the 134th MPEG meeting and despite the difficulties caused by the pandemic situation, MPEG completed verification tests for Versatile Video Coding (VVC) and Low Complexity Enhancement Video Coding (LCEVC).
For LCEVC, verification tests measured the benefits of enhancing four existing codecs of different generations (i.e., AVC, HEVC, EVC, VVC) using tools as defined in LCEVC within two sets of tests:
The first set of tests compared LCEVC-enhanced encoding with full-resolution single-layer anchors. The average bit rate savings produced by LCEVC when enhancing AVC were determined to be approximately 46% for UHD and 28% for HD. When enhancing HEVC, the savings were approximately 31% for UHD and 24% for HD. Test results tend to indicate an overall benefit also when using LCEVC to enhance EVC and VVC.
The second set of tests confirmed that LCEVC provided a more efficient means of resolution enhancement of half-resolution anchors than unguided up-sampling. Comparing LCEVC full-resolution encoding with the up-sampled half-resolution anchors, the average bit-rate savings when using LCEVC with AVC, HEVC, EVC and VVC were calculated to be approximately 28%, 34%, 38%, and 32% for UHD and 27%, 26%, 21%, and 21% for HD, respectively.
For VVC, this was already the second round of verification testing, which covered the following aspects:
360-degree video for equirectangular and cubemap formats, where VVC shows on average more than 50% bit rate reduction compared to the previous major generation of MPEG video coding standard known as High Efficiency Video Coding (HEVC), developed in 2013,
low-delay applications such as compression of conversational (teleconferencing) and gaming content, where the compression benefit is about 40% on average, and
HD video streaming, with an average bit rate reduction of close to 50%.
A previous set of tests for 4K UHD content completed in October 2020 had shown similar gains. These verification tests used formal subjective visual quality assessment testing with “naïve” human viewers. The tests were performed under a strict hygienic regime in two test laboratories to ensure safe conditions for the viewers and test managers.
Research aspects: CfPs offer a unique possibility for researchers to propose research results for adoption into future standards. Verification tests provide objective and/or subjective evaluations of standardized tools and typically conclude the life cycle of a standard. The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards including the evaluation thereof.
Finally, I’d like to provide a brief update on MPEG-DASH! At the 134th MPEG meeting, MPEG Systems recommended the approval of ISO/IEC FDIS 23009-1 5th edition. That is, the MPEG-DASH core specification will be available as 5th edition sometime this year. Additionally, MPEG requests that this specification becomes freely available which also marks an important milestone in the development of the MPEG-DASH standard. Most importantly, the 5th edition of this standard incorporates CMAF support as well as other enhancements defined in the amendment of the previous edition. Additionally, the MPEG-DASH subgroup of MPEG Systems is already working on the first amendment to its 5th edition entitled preroll, nonlinear playback, and other extensions. It is expected that the 5th edition will also impact related specifications within MPEG but also in other Standards Developing Organizations (SDOs) such as DASH-IF, i.e., defining interoperability points (IOPs) for various codecs and others, or CTA WAVE (Web Application Video Ecosystem), i.e., defining device playback capabilities such as the Common Media Client Data (CMCD). Both DASH-IF and CTA WAVE provide means for (conformance) test infrastructure for DASH and CMAF.
An updated overview of DASH standards/features can be found in the Figure below.
Research aspects: MPEG-DASH was ratified almost ten years ago, which resulted in a plethora of research articles, mostly related to adaptive bitrate (ABR) algorithms and their impact on the streaming performance including the Quality of Experience (QoE). An overview of bitrate adaptation schemes is provided here including a list of open challenges and issues.
The 135th MPEG meeting will again be an online meeting in July 2021. Click here for more information about MPEG meetings and their developments.
Welcome to the fourth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG). During the last VQEG plenary meeting (14-18 Dec. 2020) various interesting discussions arose regarding new topics not addressed up to then by VQEG groups, which led to launching three new sub-projects and a new project related to: 1) clarifying the computation of spatial and temporal information (SI and TI), 2) including video quality metrics as metadata in compressed bitstreams, 3) Quality of Experience (QoE) metrics for live video streaming applications, and 4) providing guidelines on implementing objective video quality metrics to the video compression community. The following sections provide more details about these new activities and try to encourage interested readers to follow and get involved in any of them by subscribing to the corresponding reflectors.
SI and TI Clarification
The VQEG No-Reference Metrics (NORM) group has recently focused on the topic of spatio-temporal complexity, revisiting the Spatial Information and Temporal Information (SI/TI) indicators, which are described in ITU-T Rec. P.910. They were originally developed for the T1A1 dataset in 1994. The metrics have found good use over the last 25 years, mostly employed for checking the complexity of video sources in datasets. However, SI/TI definitions contain ambiguities, so the goal of this sub-project is to provide revised definitions eliminating implementation inconsistencies.
Three main topics are discussed by VQEG in a series of online meetings:
Comparison of existing publicly available implementations for SI/TI: a comparison was made between several public open-source implementations for SI/TI, based on initial feedback from members of Facebook. Bugs and inconsistencies were identified with the handling of video frame borders, treatment of limited vs. full range content, as well as the reporting of TI values for the first frame. Also, the lack of standardized test vectors was brought up as an issue. As a consequence, a new reference library was developed in Python by members of TU Ilmenau, incorporating all bug fixes that were previously identified, and introducing a new test suite, to which the public is invited to contribute material. VQEG is now actively looking for specific test sequences that will be useful both for validating existing SI/TI implementations and for extending the scope of the metrics, which is related to the next issue described below.
Study on how to apply SI/TI on different content formats: the description of SI/TI was found to be not suitable for extended applications such as video with a higher bit depth (> 8 Bit), HDR content, or spherical/3D video. Also, the question was raised on how to deal with the presence of scene changes in content. The community concluded that for content with higher bit depth, SI/TI functions should be calculated as specified, but that the output values could be mapped back to the original 8-Bit range to simplify comparisons. As for HDR, no conclusion was reached, given the inherent complexity of the subject. It was also preliminarily concluded that the treatment of scene changes should not be part of an SI/TI recommendation, instead focusing on calculating SI/TI for short sequences without scene changes, since the way scene changes would be dealt with may depend on the final application of the metrics.
Discussion on other relevant uses of SI/TI: SI/TI have been widely used for checking video datasets in terms of diversity and for classifying content. Also, SI/TI have been used in some no-reference metrics as content features. The question was raised whether SI/TI could be used for predicting how well content could be encoded. The group noted that different encoders would deal with sources differently, e.g. related to noise in the video. It was stated that it would be desirable to find a metric that is purely related to content and not affected by encoding or representation.
As a first step, this revision of the topic of SI/TI has resulted in a harmonized implementation and in the identification of future application areas. Discussions on these topics will continue in the next months through audio-calls that are open to interested readers.
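For readers unfamiliar with the indicators, the following minimal NumPy sketch computes SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI as the maximum over time of the spatial standard deviation of the difference between consecutive frames. It is not the reference library mentioned above. The function names, the choice to evaluate the Sobel filter only on interior pixels (one possible answer to the border-handling ambiguity), and the linear mapping of results back to the 8-bit range for higher bit depths are all assumptions made for illustration.

```python
import numpy as np


def sobel_magnitude(frame: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude on interior pixels only (no border padding)."""
    f = frame.astype(np.float64)
    # Horizontal gradient: right column minus left column, weighted 1-2-1 vertically.
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:])
    return np.hypot(gx, gy)


def si_ti(frames: np.ndarray, bit_depth: int = 8):
    """Return (SI, TI) for luma frames of shape (T, H, W), mapped to the 8-bit range."""
    frames = np.asarray(frames, dtype=np.float64)
    scale = 255.0 / (2 ** bit_depth - 1)  # assumed linear mapping back to 8 bit
    si_values = [sobel_magnitude(f).std() for f in frames]
    # TI has no value for the first frame, so differences start at frame 1.
    ti_values = [(frames[n] - frames[n - 1]).std() for n in range(1, len(frames))]
    si = scale * max(si_values)
    ti = scale * max(ti_values) if ti_values else 0.0
    return float(si), float(ti)


# Two synthetic 4x6 frames: a flat frame followed by a vertical step edge.
frames = np.zeros((2, 4, 6))
frames[1, :, 3:] = 10
si, ti = si_ti(frames)
print(si, ti)
```

The sketch assumes a sequence without scene changes, in line with the preliminary conclusion above, and it does not address limited versus full range content.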
Video Quality Metadata Standard
Also within the NORM group, another topic was launched related to the inclusion of video quality metadata in compressed streams.
Almost all modern transcoding pipelines use full-reference video quality metrics to decide on the most appropriate encoding settings. The computation of these quality metrics is demanding in terms of time and computational resources. In addition, estimation errors propagate and accumulate when quality metrics are recomputed several times along the transcoding pipeline. Thus, retaining the results of these metrics with the video can alleviate these constraints, requiring very little space and providing a “greener” way of estimating video quality. With this goal, the new sub-project has started working towards the definition of a standard format to include video quality metrics metadata both at video bitstream level and system layer.
In this sense, the experts involved in the new sub-project are working on the following items:
Identification of existing proposals and working groups within other standardisation bodies and organisations that address similar topics, and proposal of amendments including new requirements. For example, MPEG has already worked on adding video quality metrics (e.g., PSNR, SSIM, MS-SSIM, VQM, PEVQ, MOS, FISG) metadata at system level (e.g., in MPEG2 streams, HTTP, etc.).
Identification of quality metrics to be considered in the standard. In principle, validated and standardized metrics are of interest, although other metrics can also be considered after a validation process on a standard set of subjective data (e.g., using existing datasets). Metrics beyond those used in previous approaches (e.g., VMAF, FB-MOS) are of special interest.
Consideration of the computation of multiple generations of full-reference metrics at different steps of the transcoding chain, of the use of metrics at different resolutions, different spatio-temporal aggregation methods, etc.
Definition of a standard video quality metadata payload, including relevant fields such as metric name (e.g., “SSIM”), version (e.g., “v0.6.1”), raw score (e.g., “0.9256”), mapped-to-MOS score (e.g., “3.89”), scaling method (e.g., “Lanczos-5”), temporal reference (e.g., “0-3” frames), aggregation method (e.g., “arithmetic mean”), etc.
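The payload fields listed in the last item could be modelled as in the following sketch, which uses the example values given above. The class name, field names, types and the JSON serialization are assumptions for illustration only; the sub-project has not defined a format yet, and an actual payload would more likely be a compact binary structure.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class QualityMetadata:
    """Hypothetical container for one quality measurement of a video segment."""
    metric_name: str            # e.g. "SSIM"
    version: str                # e.g. "v0.6.1"
    raw_score: float            # e.g. 0.9256
    mos_score: float            # raw score mapped to a MOS scale, e.g. 3.89
    scaling_method: str         # e.g. "Lanczos-5"
    temporal_reference: tuple   # (first_frame, last_frame), e.g. (0, 3)
    aggregation_method: str     # e.g. "arithmetic mean"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "QualityMetadata":
        fields = json.loads(text)
        # JSON has no tuple type, so the frame range comes back as a list.
        fields["temporal_reference"] = tuple(fields["temporal_reference"])
        return cls(**fields)


record = QualityMetadata(
    metric_name="SSIM",
    version="v0.6.1",
    raw_score=0.9256,
    mos_score=3.89,
    scaling_method="Lanczos-5",
    temporal_reference=(0, 3),
    aggregation_method="arithmetic mean",
)
payload = record.to_json()
print(len(payload.encode("utf-8")), "bytes:", payload)
```

Carrying the metric version and scaling method alongside the score is what lets a later stage of the transcoding chain decide whether a stored value can be reused or has to be recomputed.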
More details and information on how to join this activity can be found on the NORM webpage.
The success of a live multimedia streaming session is defined by the experience of a participating audience. Both the content communicated by the media and the quality at which it is delivered matter – for the same content, the quality delivered to the viewer is a differentiating factor. Live media streaming systems undertake a lot of investment and operate under very tight service availability and latency constraints to support multimedia sessions for their audience. Both to measure the return on investment and to make sound investment decisions, it is paramount that we be able to measure the media quality offered by these systems. In this sense, given the large scale and complexity of media streaming systems, objective metrics are needed to measure QoE.
Therefore, the following topics have been identified and are being studied:
Creation of a high quality dataset, including media clips and subjective scores, which will be used to tune, train and develop objective QoE metrics. This dataset should represent the conditions that take place in typical live media streaming situations; therefore, conditions and impairments comprising audio and video tracks (independently and jointly) will be considered. In addition, this dataset should cover a diverse set of content categories, including premium content (e.g., sports, movies, concerts, etc.) and user generated content (e.g., music, gaming, real life content, etc.).
Development of QoE objective metrics, especially focusing on no-reference or near-no-reference metrics, given the lack of access to the original video at various points in the live media streaming chain. Different types of models will be considered including signal-based (operate on the decoded signal), metadata-based (operate on available metadata, e.g. codecs, resolution, framerate, bitrate, etc.), bitstream-based (operate on the parsed bitstream), and hybrid models (combining signal and metadata). Also, machine-learning-based models will be explored.
Certain challenges are envisioned to be faced when dealing with these two topics, such as separating “content” from “quality” (taking into account that content plays a big role in engagement and acceptability), spectrum expectations, role of network impairments and the collection of enough data to develop robust models. Readers interested in joining this effort are encouraged to visit the AVHD webpage for more details.
Implementer’s Guide to Video Quality Metrics
In the last meeting, a new dedicated group on Implementer’s Guide to Video Quality Metrics (IGVQM) was set up to introduce objective video quality metrics to the video compression community and to provide guidelines on implementing them.
During the development of new video coding standards, peak-signal-to-noise-ratio (PSNR) has traditionally been used as the main objective metric to determine which new coding tools should be adopted. It has furthermore been used to establish the bitrate savings that a new coding standard offers over its predecessor through the so-called “BD-rate” metric, which still relies on PSNR for measuring quality.
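For context, BD-rate is commonly computed by fitting a cubic polynomial to the logarithm of the bitrate as a function of quality for each codec, integrating both fits over the overlapping quality interval, and converting the average difference into a percentage. The sketch below follows that commonly used formulation with NumPy. The function name and the rate/PSNR points are hypothetical, and the sketch is not the reference implementation used by any standardisation group.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bitrate difference (%) of test vs. anchor at equal PSNR.

    Negative values mean the test codec needs less bitrate than the anchor.
    """
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)
    log_rate_a = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rate_t = np.log10(np.asarray(rate_test, dtype=float))

    # Cubic fit of log-rate as a function of quality for each codec.
    poly_a = np.polyfit(psnr_a, log_rate_a, 3)
    poly_t = np.polyfit(psnr_t, log_rate_t, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    if hi <= lo:
        raise ValueError("the two curves have no overlapping quality range")
    area_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    area_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    avg_log_diff = (area_t - area_a) / (hi - lo)
    return float((10.0 ** avg_log_diff - 1.0) * 100.0)


# Hypothetical rate (kbit/s) and PSNR (dB) points: the test codec reaches the
# same PSNR at exactly half the bitrate of the anchor.
anchor_rates = [1000, 2000, 4000, 8000]
test_rates = [500, 1000, 2000, 4000]
psnr_points = [32.0, 35.0, 38.0, 41.0]
result = bd_rate(anchor_rates, psnr_points, test_rates, psnr_points)
print(f"BD-rate: {result:.2f}%")
```

Nothing in the calculation is specific to PSNR: any quality metric can be placed on the quality axis, which is one reason the choice of metric matters for the savings that get reported.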
Although this choice was fully justified for the first image/video coding standards – JPEG (1992), MPEG1 (1994), MPEG2 (1996), JPEG2000 and even H.264/AVC (2004) – since there was simply no other alternative at that time, its continuing use for the development of H.265/HEVC (2013), VP9 (2013), AV1 (2018) and most recently EVC and VVC (2020) is questionable, given the rapid and continuous evolution of more perceptual image/video objective quality metrics, such as SSIM (2004), MS-SSIM (2004), and VMAF (2015).
This project attempts to offer some guidance to the video coding community, including standards setting organisations, on how to better utilise existing objective video quality metrics to better capture the improvements offered by video coding tools. For this, the following goals have been envisioned:
Address video compression and scaling impairments only.
Explore and use “state-of-the-art” full-reference (pixel) objective metrics, examine applicability of no-reference objective metrics, and obtain reference implementations of them.
Offer temporal aggregation methods of image quality metrics into video quality metrics.
Present statistical analysis of existing subjective datasets, constraining them to compression and scaling artifacts.
Highlight differences among objective metrics and use-cases. For example, in case of very small differences, which metric is more sensitive? Which quality range is better served by what metric?
Offer standard logistic mappings of objective metrics to a normalised linear scale.
More details can be found in the working document that has been set up to launch the project and on the VQEG website.
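Two of the goals listed above, temporal aggregation and logistic mapping, can be illustrated with a short sketch. It is not the IGVQM group's method: the pooling options, the function names `pool_scores` and `logistic_map`, and the `midpoint` and `slope` parameters are all hypothetical. In practice such parameters would be fitted against subjective scores.

```python
import math
from statistics import fmean, harmonic_mean


def pool_scores(frame_scores, method="mean"):
    """Aggregate per-frame quality scores into one score for the whole video."""
    if not frame_scores:
        raise ValueError("need at least one frame score")
    if method == "mean":
        return fmean(frame_scores)
    if method == "harmonic":
        # The harmonic mean weights low-quality frames more heavily
        # than the arithmetic mean does (scores must be positive).
        return harmonic_mean(frame_scores)
    if method == "min":
        return min(frame_scores)
    raise ValueError(f"unknown pooling method: {method}")


def logistic_map(score, midpoint, slope, low=0.0, high=100.0):
    """Map a raw metric score onto the range [low, high] with a logistic curve."""
    return low + (high - low) / (1.0 + math.exp(-slope * (score - midpoint)))


# Hypothetical per-frame PSNR values (dB) with one badly degraded frame.
frame_psnr = [40.0, 38.0, 25.0, 39.0]
for method in ("mean", "harmonic", "min"):
    pooled = pool_scores(frame_psnr, method)
    mapped = logistic_map(pooled, midpoint=35.0, slope=0.3)
    print(f"{method}: pooled={pooled:.2f} dB, mapped={mapped:.1f}")
```

The choice of pooling method matters most when quality fluctuates over time: the arithmetic mean hides the degraded frame, while the harmonic mean and minimum expose it.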
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 133rd MPEG meeting was once again held as an online meeting, and this time, kicked off with great news, that MPEG is one of the organizations honored as a 72nd Annual Technology & Engineering Emmy® Awards Recipient, specifically the MPEG Systems File Format Subgroup and its ISO Base Media File Format (ISOBMFF) et al.
The official press release can be found here and comprises the following items:
6th Emmy® Award for MPEG Technology: MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
Essential Video Coding (EVC) verification test finalized
MPEG issues a Call for Evidence on Video Coding for Machines
Neural Network Compression for Multimedia Applications – MPEG calls for technologies for incremental coding of neural networks
MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
MPEG Systems reached the first milestone to carry event messages in tracks of the ISO Base Media File Format
In this report, I’d like to focus on ISOBMFF, EVC, CMAF, and DASH.
MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
MPEG is pleased to report that the File Format subgroup of MPEG Systems is being recognized this year by the National Academy of Television Arts and Sciences (NATAS) with a Technology & Engineering Emmy® for their 20 years of work on the ISO Base Media File Format (ISOBMFF). This format was first standardized in 1999 as part of the MPEG-4 Systems specification and is now in its 6th edition as ISO/IEC 14496-12. It has been used and adopted by many other specifications, e.g.:
MP4 and 3GP file formats;
Carriage of NAL unit structured video in the ISO Base Media File Format which provides support for AVC, HEVC, VVC, EVC, and probably soon LCEVC;
MPEG-21 file format;
Dynamic Adaptive Streaming over HTTP (DASH) and Common Media Application Format (CMAF);
High-Efficiency Image Format (HEIF);
Timed text and other visual overlays in ISOBMFF;
Common encryption format;
Carriage of timed metadata metrics of media;
Derived visual tracks;
Event message track format;
Carriage of uncompressed video;
Omnidirectional Media Format (OMAF);
Carriage of visual volumetric video-based coding data;
Carriage of geometry-based point cloud compression data;
… to be continued!
This is MPEG’s fourth Technology & Engineering Emmy® Award (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, and MPEG-2 Transport Stream in 2013) and sixth overall Emmy® Award including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017, respectively.
Essential Video Coding (EVC) verification test finalized
At the 133rd MPEG meeting, a verification testing assessment of the Essential Video Coding (EVC) standard was completed. The first part of the EVC verification test using high dynamic range (HDR) and wide color gamut (WCG) was completed at the 132nd MPEG meeting. A subjective quality evaluation was conducted comparing the EVC Main profile to the HEVC Main 10 profile and the EVC Baseline profile to AVC High 10 profile, respectively:
Analysis of the subjective test results showed that the average bitrate savings for EVC Main profile are approximately 40% compared to HEVC Main 10 profile, using UHD and HD SDR content encoded in both random access and low delay configurations.
The average bitrate savings for the EVC Baseline profile compared to the AVC High 10 profile are approximately 40% using UHD SDR content encoded in the random-access configuration and approximately 35% using HD SDR content encoded in the low delay configuration.
Verification test results using HDR content had shown average bitrate savings for EVC Main profile of approximately 35% compared to HEVC Main 10 profile.
By providing significantly improved compression efficiency compared to HEVC and earlier video coding standards while encouraging the timely publication of licensing terms, the MPEG-5 EVC standard is expected to meet the market needs of emerging delivery protocols and networks, such as 5G, enabling the delivery of high-quality video services to an ever-growing audience.
In addition to verification tests, EVC, along with VVC and CMAF, was subject to further improvements to its support systems.
Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. Additionally, the availability of (efficient) open-source implementations (e.g., x264, x265, soon x266, VVenC, aomenc, et al.) is vital for its adoption in the (academic) research community.
MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
At the 133rd MPEG meeting, MPEG Systems promoted Amendment 2 of the Common Media Application Format (CMAF) to Committee Draft Amendment (CDAM) status, the first major milestone in the ISO/IEC approval process. This amendment defines:
constraints to (i) Versatile Video Coding (VVC) and (ii) Essential Video Coding (EVC) video elementary streams when carried in a CMAF video track;
codec parameters to be used for CMAF switching sets with VVC and EVC tracks; and
support of the newly introduced MPEG-H 3D Audio profile.
It is expected to reach its final milestone in early 2022. For research aspects related to CMAF, the reader is referred to the next section about DASH.
MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
At the 133rd MPEG meeting, MPEG Systems promoted Part 8 of Dynamic Adaptive Streaming over HTTP (DASH), also referred to as “Session-based DASH”, to its final stage of standardization (i.e., Final Draft International Standard (FDIS)).
Historically, in DASH, every client uses the same Media Presentation Description (MPD), as it best serves the scalability of the service. However, there have been increasing requests from the industry to enable customized manifests for enabling personalized services. MPEG Systems has standardized a solution to this problem without sacrificing scalability. Session-based DASH adds a mechanism to the MPD to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests.
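The idea of deriving request URLs from per-session values can be sketched as follows. This is only an illustration of the concept and does not reproduce the SBD document syntax or processing rules defined in the standard: the function name `apply_session_values`, the key names, and the example URL are hypothetical, and the sketch simply appends the session's key-value pairs to the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_session_values(segment_url, session_values):
    """Return segment_url with the session's key-value pairs added to its query."""
    parts = urlsplit(segment_url)
    # Keep any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(session_values.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical values a client might have obtained for its session.
session_values = {"sessionId": "abc123", "region": "eu"}
url = apply_session_values("https://cdn.example.com/video/seg-1.m4s?v=2", session_values)
print(url)
# https://cdn.example.com/video/seg-1.m4s?v=2&sessionId=abc123&region=eu
```

Because the shared MPD stays identical for all clients and only the small per-session document differs, the manifest itself remains cacheable, which is how personalization can coexist with scalability.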
An updated overview of DASH standards/features can be found in the Figure below.
Research aspects: CMAF is most likely becoming the main segment format to be used in the context of HTTP adaptive streaming (HAS) and, thus, also DASH (hence also the name common media application format). Supporting a plethora of media coding formats will inevitably result in a multi-codec dilemma to be addressed in the near future as there will be no flag day where everyone will switch to a new coding format. Thus, designing efficient bitrate ladders for multi-codec delivery will be an interesting research aspect, which needs to include device/player support (i.e., some devices/players will support only a subset of available codecs), storage capacity/costs within the cloud as well as within the delivery network, and network distribution capacity/costs (i.e., CDN costs).
The 134th MPEG meeting will be again an online meeting in April 2021. Click here for more information about MPEG meetings and their developments.
The 90th JPEG meeting was held online from 18 to 22 January 2021. This meeting was distinguished by very relevant activities, notably the new JPEG AI standardization project planning, and the analysis of the Call for Evidence on JPEG Pleno Point Cloud Coding.
The new JPEG AI Learning-based Image Coding System has become an official new work item registered under ISO/IEC 6048 and aims at providing compression efficiency in addition to support for image processing and computer vision tasks without the need for decompression.
The response to the Call for Evidence on JPEG Pleno Point Cloud Coding was a learning-based method that was found to offer state of the art compression efficiency. Considering this response, the JPEG Pleno Point Cloud activity will analyse the possibility of preparing a future call for proposals on learning-based coding solutions that will also consider new functionalities, building on the relevant use cases already identified that require machine learning tasks processed in the compressed domain.
Meanwhile the new JPEG XL coding system has reached FDIS stage and it is ready for adoption. JPEG XL offers compression efficiency similar to the best state of the art in image coding, the best lossless compression performance, affordable low complexity and integration with the legacy JPEG image coding standard allowing a friendly transition between the two standards.
The 90th JPEG meeting had the following highlights:
JPEG Pleno Point Cloud response to the Call for Evidence,
JPEG XL Core Coding System reaches FDIS stage,
JPEG Fake Media exploration,
JPEG DNA continues the exploration on image coding suitable for DNA storage,
JPEG XS 2nd edition of Profiles reaches DIS stage.
The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline.
JPEG AI has made several advances during the 90th technical meeting. During this meeting, the JPEG AI Use Cases and Requirements were discussed and collaboratively defined. Moreover, the JPEG AI vision and the overall system framework of an image compression solution with efficient compressed domain representation were defined. Following this approach, a set of exploration experiments was defined to assess the capabilities of the compressed representation generated by learning-based image codecs, considering some specific computer vision and image processing tasks. In addition, the performance assessment of the most popular objective quality metrics, using subjective scores obtained during the Call for Evidence, was discussed, as well as anchors and some techniques to perform spatial prediction and entropy coding.
JPEG Pleno Point Cloud response to the Call for Evidence
JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 90th JPEG meeting, the JPEG Committee reached an exciting major milestone and reviewed the results of its Final Call for Evidence on JPEG Pleno Point Cloud Coding. With an innovative Deep Learning based point cloud codec supporting scalability and random access submitted, the Call for Evidence results highlighted the emerging role of Deep Learning in point cloud representation and processing. Between the 90th and 91st meetings, the JPEG Committee will be refining the scope and direction of this activity in light of the results of the Call for Evidence.
JPEG XL Core Coding System reaches FDIS stage
The JPEG Committee has finalized JPEG XL Part 1 (Core Coding System), which is now at FDIS stage. The committee has defined new core experiments to determine appropriate profiles and levels for the codec, as well as appropriate criteria for defining conformance. With Part 1 complete, and Part 2 close to completion, JPEG XL is ready for evaluation and adoption by the market.
JPEG Fake Media exploration
The JPEG Committee initiated the JPEG Fake Media exploration study with the objective to create a standard that can facilitate secure and reliable annotation of media asset generation and modifications. The initiative aims to cover usage scenarios that are in good faith as well as those with malicious intent. During the 90th JPEG meeting, the committee released a new version of the document entitled “JPEG Fake Media: Context, Use Cases and Requirements”, which is available on the JPEG website. A first workshop on the topic was organized on the 15th of December 2020. The program, presentations and a video recording of this workshop are available on the JPEG website. A second workshop will be organized around March 2021. More details will be made available soon on JPEG.org.
JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and subscribe to the mailing list.
JPEG DNA continues the exploration on image coding suitable for DNA storage
The JPEG Committee continued its exploration for coding of images in quaternary representation, particularly suitable for DNA storage. After a second successful workshop presentation by stakeholders, additional requirements were identified, and a new version of the JPEG DNA overview document was issued and made publicly available. It was decided to continue this exploration by organising a third workshop and further outreach to stakeholders, as well as a proposal for an updated version of the JPEG DNA overview document. Interested parties are invited to consider joining the effort by registering to the JPEG DNA mailing list.
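To make the notion of a quaternary representation concrete: DNA has four nucleotides (A, C, G, T), so each symbol can carry two bits. The sketch below is a naive fixed mapping written for illustration only. It is not the JPEG DNA approach, and the specific bit-to-nucleotide assignment and function names are assumptions. Practical DNA storage codes typically also respect biochemical constraints, such as avoiding long runs of the same nucleotide, which this sketch ignores.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_quaternary(data):
    """Encode bytes as a nucleotide string, two bits per symbol, high bits first."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def quaternary_to_bytes(strand):
    """Decode a nucleotide string produced by bytes_to_quaternary."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


strand = bytes_to_quaternary(b"\x1b")  # 0b00011011
print(strand)  # ACGT
assert quaternary_to_bytes(strand) == b"\x1b"
```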
JUMBF (ISO/IEC 19566-5) Amendment 1 draft review is complete and it is proceeding to international standard and subsequent publication; additional features to support new applications are under consideration. Likewise, JPEG 360 (ISO/IEC 19566-6) Amendment 1 draft review is complete, and it is proceeding to international standard and subsequent publication. The JLINK (ISO/IEC 19566-7) standard completed the committee draft review and is preparing a DIS study text ahead of the 91st meeting. JPEG Snack (ISO/IEC 19566-8) will produce a second working draft. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.
JPEG XS 2nd edition of Profiles reaches DIS stage
The 2nd edition of Part 2 (Profiles) is now at the DIS stage and defines the required new profiles and levels to support the compression of raw Bayer content, mathematically lossless coding of up to 12-bit per component images, and 4:2:0 sampled image content. With the second editions of Parts 1, 2, and 3 completed, and the scheduled second editions of Part 4 (Conformance) and 5 (Reference Software), JPEG XS will soon have received a complete backwards-compatible revision of its entire suite of standards. Moreover, the committee defined a new exploration study to create new coding tools for improving the HDR and mathematically lossless compression capabilities, while still honoring the low-complexity and low-latency requirements.
“The official approval of JPEG AI by JPEG Parent Bodies ISO and IEC is a strong signal of support of this activity and its importance in the creation of AI-based imaging applications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Future JPEG meetings are planned as follows:
No 91, will be held online from April 19 to 23, 2021.
No 92, will be held online from July 7 to 13, 2021.