Author: Antonio Pinheiro
Affiliation: Instituto de Telecomunicacoes (IT) and Universidade da Beira Interior (UBI), Portugal
JPEG analyses the responses of the Calls for Proposals for the standardisation of the first codecs based on machine learning
The 96th JPEG meeting was held online from 25 to 29 July 2022. The meeting was one of the most productive in the recent history of JPEG, with the analysis of the responses to two Calls for Proposals (CfP) for machine learning-based coding solutions, namely JPEG AI and JPEG Pleno Point Cloud Coding. The superior performance of the CfP responses compared to the state-of-the-art anchors leaves little doubt that coding technologies will become dominated by machine learning-based solutions, with the expected consequences for the standardisation pathway. A new era of multimedia coding standardisation has begun. Both activities have defined a verification model and are pursuing a collaborative process that will select the best technologies for the definition of the new machine learning-based standards.
The 96th JPEG meeting had the following highlights:
- JPEG AI response to the Call for Proposals
- JPEG Pleno Point Cloud begins the collaborative standardisation phase
- JPEG Fake Media and NFT
- JPEG Systems
- JPEG Pleno Light Field
- JPEG AIC
- JPEG XS
- JPEG 2000
- JPEG DNA
The following summarises the major achievements of the 96th JPEG meeting.
JPEG AI response to the Call for Proposals
The 96th JPEG meeting represents an important milestone for the JPEG AI standardisation as it marks the beginning of the collaborative phase of this project. The main JPEG AI objective is to design a solution that offers significant compression efficiency improvement over coding standards in common use at equivalent subjective quality and an effective compressed domain processing for machine learning-based image processing and computer vision tasks.
During the 96th JPEG meeting, several activities occurred, notably the presentation of the eleven responses to all tracks of the Call for Proposals (CfP). Furthermore, discussions took place on the evaluation process used to assess submissions to the CfP, namely subjective, objective and complexity assessment, as well as the identification of device interoperability issues by cross-checking. For the standard reconstruction track, several contributions showed significantly higher compression efficiency in both subjective quality methodologies and objective metrics when compared to the best-performing conventional image coding.
From the analysis and discussion of the results obtained, the most promising technologies were identified and a new JPEG AI verification model under consideration (VMuC) was approved. The VMuC corresponds to a combination of two proponents’ solutions (following the ‘one tool for one functionality’ principle), selected by consensus and considering the CfP decision criteria and factors. In addition, a set of JPEG AI Core Experiments were defined to obtain further improvements in both performance efficiency and complexity, notably the use of learning-based GAN training, alternative analysis/synthesis transforms and an evaluation study for the compressed-domain denoising as an image processing task. Several further activities were also discussed and defined, such as the design of a compressed domain image classification decoder VMuC, the creation of a large screen content dataset for the training of learning-based image coding solutions and the definition of a new and larger JPEG AI test set.
JPEG Pleno Point Cloud begins collaborative standardisation phase
JPEG Pleno integrates various modalities of plenoptic content under a single framework in a seamless manner. Efficient and powerful point cloud representation is a key feature of this vision. A point cloud refers to data representing positions of points in space, expressed in a given three-dimensional coordinate system, the so-called geometry. This geometrical data can be accompanied by per-point attributes of varying nature (e.g. color or reflectance). Such datasets are usually acquired with a 3D scanner, LIDAR or created using 3D design software and can subsequently be used to represent and render 3D surfaces. Combined with other types of data (like light field data), point clouds open a wide range of new opportunities, notably for immersive browsing and virtual reality applications.
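The definition above (geometry plus optional per-point attributes) can be pictured with a small data structure. The following is only an illustrative sketch: the `PointCloud` class, its field names and the `bounding_box` helper are hypothetical and are not taken from JPEG Pleno or from any codec.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in some 3D coordinate system


@dataclass
class PointCloud:
    """Hypothetical container: geometry plus optional per-point attributes."""

    geometry: List[Point]
    # Attribute name -> one value per point, e.g. "color" -> [(r, g, b), ...]
    attributes: Dict[str, List[tuple]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every attribute must provide exactly one value per point.
        for name, values in self.attributes.items():
            if len(values) != len(self.geometry):
                raise ValueError(
                    f"attribute {name!r} has {len(values)} values "
                    f"for {len(self.geometry)} points"
                )

    def bounding_box(self) -> Tuple[Point, Point]:
        """Return the (min corner, max corner) of a non-empty geometry."""
        xs, ys, zs = zip(*self.geometry)
        return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


cloud = PointCloud(
    geometry=[(0.0, 0.0, 0.0), (1.0, 0.5, 2.0), (-1.0, 2.0, 0.5)],
    attributes={"color": [(255, 0, 0), (0, 255, 0), (0, 0, 255)]},
)
```

Keeping geometry and attributes in separate, index-aligned lists mirrors the distinction the text draws between the two kinds of data.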
Learning-based solutions are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g., image classification, face recognition and object segmentation, but also 3D processing tasks, e.g. visual enhancement and super-resolution. Recently, learning-based point cloud coding solutions have shown great promise to achieve competitive compression efficiency compared to available conventional point cloud coding solutions at equivalent subjective quality. Building on a history of successful and widely adopted coding standards, JPEG is well positioned to develop a standard for learning-based point cloud coding.
During its 94th meeting, the JPEG Committee released a Final Call for Proposals on JPEG Pleno Point Cloud Coding. This call addressed learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. During its 96th meeting, the JPEG Committee evaluated 5 codecs submitted in response to this Call. Following a comprehensive evaluation process, the JPEG Committee selected one of the proposals to form the basis of a future standard and initialised a sub-division to form Part 6 of ISO/IEC 21794. The selected submission was a learning-based approach to point cloud coding that met the requirements of the Call and showed competitive performance, both in terms of coding geometry and color, against existing solutions.
JPEG Fake Media and NFT
At the 96th JPEG meeting, 6 pre-registrations to the Final Call for Proposals (CfP) on JPEG Fake Media were received. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The CfP welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents who have not yet made a pre-registration are still welcome to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.
In parallel with the work on Fake Media, JPEG explores use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases of the two topics differ, there is significant overlap in terms of requirements and relevant solutions. The presentations and video recordings of the joint 5th JPEG NFT and Fake Media Workshop that took place prior to the 96th meeting are available on the JPEG website. In addition, a new version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.
JPEG Systems
During the 96th JPEG Meeting, the IS texts for both JLINK (ISO/IEC 19566-7) and JPEG Snack (ISO/IEC 19566-8) were prepared and submitted for final publication. JLINK specifies a format to store multiple images inside of JPEG files and supports interactive navigation between them. JLINK addresses use cases such as virtual museum tours, real estate visits, hotspot zoom into other images and many others. JPEG Snack on the other hand enables self-running multimedia experiences such as animated image sequences and moving image overlays. Both standards are based on the JPEG Universal Metadata Box Format (JUMBF, ISO/IEC 19566-5) for which a second edition is in progress. This second edition adds extensions to the native support of CBOR (Concise Binary Object Representation) and attaches private fields to the JUMBF Description Box.
JPEG Pleno Light Field
During its 96th meeting, the JPEG Committee released the “JPEG Pleno Second Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices for light field subjective quality evaluation methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among JPEG experts following a collaborative process approach. The Final Call for Contributions will be issued at the 97th JPEG meeting. The deadline for submission of contributions is 1 April 2023.
A JPEG Pleno Light Field ad hoc group (AhG) has also started preparing two workshops: a first workshop on Subjective Light Field Quality Assessment, to exchange experiences and present technological advances and research results on light field subjective quality assessment, and a second workshop on Learning-based Light Field Coding, to present technological advances and research results on learning-based coding solutions for light field data.
JPEG AIC
During its 96th meeting, the JPEG Committee issued a Second Draft Call for Contributions on Subjective Image Quality Assessment. The final Call for Contributions is now planned to be issued at the 97th JPEG meeting. The standardization process will be collaborative from the very beginning, i.e. all submissions will be considered in developing the next extension of the JPEG AIC standard. The deadline for submissions has been extended to 1 April 2023 at 23:59 UTC. Multiple types of contributions are accepted, namely subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature. A dataset of sample images with compression-based distortions in the target quality range is planned to be prepared for the 97th JPEG meeting.
JPEG XS
With the 2nd edition of JPEG XS now in place, the JPEG Committee continues with the development of the 3rd edition of JPEG XS Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition at half of the required bandwidth for specific content such as screen content. In this respect, experiments have indicated that it is possible to increase the quality in static regions of an image sequence by more than 10 dB when compared to the 2nd edition. Based on the input contributions, a first working draft for 21122-1 has been created, along with the necessary core experiments for further evaluation and verification.
In addition, JPEG has finalized the work on the amendment for Part 2 2nd edition that defines a new High 4:2:0 profile and the new sublevel Sublev4bpp. This amendment is now ready for publication by ISO. In the context of Part 4 (Conformance testing) and Part 5 (Reference software), the JPEG Committee decided to make both parts publicly available.
Finally, the JPEG Committee decided to create a series of public documents, called the “JPEG XS in-depth series” that will explain various features and applications of JPEG XS to a broad audience. The first document in this series explains the advantages of using JPEG XS for raw image compression and will be published soon on jpeg.org.
JPEG 2000
The JPEG Committee published a case study that compares HTJ2K, ProRes and JPEG 2000 Part 1 when processing motion picture content with widely available commercial software tools running on notebook computers. The case study is available at https://ds.jpeg.org/documents/jpeg2000/wg1n100269-096-COM-JPEG_Case_Study_HTJ2K_performance_on_laptop_desktop_PCs.pdf
JPEG 2000 is widely used in the media and entertainment industry for Digital Cinema distribution, studio video masters and broadcast contribution links. High Throughput JPEG 2000 (HTJ2K or JPEG 2000 Part 15) is an update to JPEG 2000 that provides an order-of-magnitude speed-up over legacy JPEG 2000 Part 1.
JPEG DNA
The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as these are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of a storage process based on DNA synthetic polymers. During the 96th JPEG meeting, a new version of the overview document on Use Cases and Requirements for DNA-based Media Storage was issued and has been made publicly available. The JPEG Committee also updated two additional documents, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions, in order to allow concrete exploration experiments to take place. This will allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and, in particular, to include biochemical noise simulation, which is an essential element in practical implementations. A new branch has been created in the JPEG GitLab that now contains two anchors and two JPEG DNA benchmark codecs.
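To make the idea of a quaternary representation concrete, the sketch below maps each byte to four nucleotide symbols, two bits per symbol. This is purely illustrative and is not the JPEG DNA benchmark codec: the symbol mapping (0 to A, 1 to C, 2 to G, 3 to T) and all function names are assumptions, and the mapping ignores biochemical constraints. The `longest_run` helper only shows how one commonly cited constraint, long runs of a repeated nucleotide, could be checked.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 0 -> A, 1 -> C, 2 -> G, 3 -> T


def bytes_to_quaternary(data: bytes) -> str:
    """Encode each byte as four base-4 symbols, most significant pair first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def quaternary_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_quaternary; the strand length must be a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            # str.index raises ValueError for any symbol outside "ACGT".
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


def longest_run(strand: str) -> int:
    """Length of the longest run of one repeated symbol (0 for an empty strand)."""
    best = current = 0
    previous = None
    for symbol in strand:
        current = current + 1 if symbol == previous else 1
        best = max(best, current)
        previous = symbol
    return best
```

For example, the byte `0x1B` (binary `00 01 10 11`) becomes `ACGT` under this assumed mapping, while a zero byte becomes `AAAA`, which is exactly the kind of repeated-symbol run that a constraint-aware code would need to avoid.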
“After successful calls for contributions, the JPEG Committee sets precedence by launching the collaborative phase of two learning based visual information coding standards, hence announcing the start of a new era in coding technologies relying on AI.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Upcoming JPEG meetings are planned as follows:
- No 97 will be held online from 24 to 28 October 2022.
- No 98 will be held in Sydney, Australia, from 14 to 20 January 2023.