MPEG Column: 142nd MPEG Meeting in Antalya, Türkiye

The 142nd MPEG meeting was held as a face-to-face meeting in Antalya, Türkiye. The official press release can be found here and comprises the following items:

  • MPEG issues Call for Proposals for Feature Coding for Machines
  • MPEG finalizes the 9th Edition of MPEG-2 Systems
  • MPEG reaches the First Milestone for Storage and Delivery of Haptics Data
  • MPEG completes 2nd Edition of Neural Network Coding (NNC)
  • MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video
  • MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling

The press release text has been modified to match the target audience of ACM SIGMM and highlight research aspects targeting researchers in video technologies. This column focuses on the 9th edition of MPEG-2 Systems, storage and delivery of haptics data, neural network coding (NNC), MPEG immersive video (MIV), and updates on MPEG-DASH.


Feature Coding for Video Coding for Machines (FCVCM)

At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks. This work on “Feature Coding for Video Coding for Machines (FCVCM)” aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the networks increase in complexity, use cases such as computational offload become more relevant to facilitate the widespread deployment of applications utilizing such networks. Over the last four years, initially as part of the “Video Coding for Machines” activity, MPEG has investigated potential technologies for the efficient compression of feature data encountered within neural networks. This activity has resulted in a set of ‘feature anchors’ that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.

Research aspects: FCVCM is about compression, and the central research aspect here is compression efficiency, which can be tested against a commonly agreed dataset (anchors). Additionally, it might be interesting to investigate which features are relevant for video coding for machines (VCM) and which quality metrics apply in this emerging domain. One might wonder whether, in the future, robots or other AI systems will participate in subjective quality assessments.

9th Edition of MPEG-2 Systems

MPEG-2 Systems was first standardized in 1994, defining two container formats: program stream (e.g., used for DVDs) and transport stream. The latter, also known as MPEG-2 Transport Stream (M2TS), is used for broadcast and internet TV applications and services. MPEG-2 Systems was awarded a Technology and Engineering Emmy® in 2013, and at the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of ISO/IEC 13818-1 MPEG-2 Systems. The new edition adds support for Low Complexity Enhancement Video Coding (LCEVC), the youngest member of the MPEG family of video coding standards, on top of the more than 50 media stream types already supported, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.
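
To make the transport stream container concrete, here is a minimal sketch of parsing the fixed 4-byte header of a 188-byte M2TS packet (sync byte, PID, continuity counter). It illustrates the packet layout only, not the full ISO/IEC 13818-1 syntax:

```python
def parse_ts_header(packet: bytes) -> dict:
    """Extract the fixed 4-byte header fields of a 188-byte TS packet."""
    if len(packet) != 188 or packet[0] != 0x47:  # 0x47 is the sync byte
        raise ValueError("not a valid TS packet")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit packet ID
        "scrambling": (packet[3] >> 6) & 0x03,
        "adaptation_field": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }

# Example: a null packet (PID 0x1FFF) with a payload-only adaptation field.
null_packet = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)
print(parse_ts_header(null_packet)["pid"])  # 8191
```

A real demultiplexer would go on to route packets by PID and reassemble PES packets, but the fixed header above is all that is needed to follow the container discussion.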

Research aspects: MPEG container formats such as MPEG-2 Systems and the ISO Base Media File Format are necessary for storing and delivering multimedia content but are often neglected in research. Thus, I would like to take up the cudgels on behalf of the MPEG Systems working group and argue that researchers should pay more attention to these container formats and conduct research and experiments on their efficient use with respect to multimedia storage and delivery.

Storage and Delivery of Haptics Data

At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering the nature of haptics data, which is composed of spatial and temporal components, a data unit comprising various spatial or temporal data packets serves as the basic entity, analogous to an access unit of audio-visual media. Additionally, an explicit indication of silent periods, accounting for the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.
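
Since the haptics carriage builds on ISOBMFF, it may help to recall that an ISOBMFF file is a tree of boxes, each prefixed by a 32-bit size and a four-character type. A minimal, illustrative box walker (ignoring 64-bit 'largesize' boxes and nested traversal) could look like:

```python
import struct

def iter_boxes(data: bytes):
    """Yield (type, payload) for each top-level box in an ISOBMFF buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        yield box_type.decode("ascii"), data[offset + 8:offset + size]
        offset += size

# A toy buffer: an 'ftyp' box (major brand 'isom') followed by an empty
# 'free' box; a real file would continue with 'moov', 'mdat', etc.
buf = (struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0)
       + struct.pack(">I4s", 8, b"free"))
print([t for t, _ in iter_boxes(buf)])  # ['ftyp', 'free']
```

Haptics tracks defined by ISO/IEC 23090-32 reuse exactly this box machinery, which is why existing ISOBMFF tooling carries over.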

Research aspects: Coding (ISO/IEC 23090-31) and carriage (ISO/IEC 23090-32) of haptics data go hand in hand and need further investigation concerning compression efficiency and storage/delivery performance with respect to various use cases.

Neural Network Coding (NNC)

Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.

Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes in the neural network parameters but may also involve structural changes of the neural network (e.g., when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, requiring much more bandwidth over time than, e.g., the one-off deployment of a trained neural network.

The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and extends the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to 10-20% and, for several architectures, even to below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.
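
To illustrate the principle behind such parameter compression (not the standardized NNC codec itself, which adds entropy coding on top of these steps), the toy sketch below applies magnitude-based pruning and uniform scalar quantization to a set of weights; the small number of resulting integer symbols is what an entropy coder subsequently exploits:

```python
import random

def quantize_weights(weights, step, prune_threshold=0.0):
    """Magnitude-based pruning followed by uniform scalar quantization."""
    indices = []
    for w in weights:
        if abs(w) < prune_threshold:
            w = 0.0                      # parameter reduction (pruning)
        indices.append(round(w / step))  # uniform quantization index
    return indices

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(10_000)]
step = 0.01
indices = quantize_weights(weights, step, prune_threshold=0.005)
recon = [i * step for i in indices]

# 10,000 real-valued weights collapse to a few dozen integer symbols,
# while the reconstruction error stays bounded by half the step size.
print("distinct symbols:", len(set(indices)))
print("max error:", max(abs(w - r) for w, r in zip(weights, recon)))
```

The quantization step controls the rate/accuracy trade-off: a coarser step yields fewer symbols (and thus fewer bits after entropy coding) at the cost of larger reconstruction error in the weights.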

Research aspects: The incremental compression of neural networks enables various new use cases, providing research opportunities for media coding and communication, including their optimization.

MPEG Immersive Video

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.

MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-plane image (MPI) that leverages strong hardware support for commonly used video formats to compress volumetric video.

ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. The specification includes 23 verified and validated conformance bitstreams spanning all profiles, as well as encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual.

Research aspects: Conformance and reference software are usually provided to facilitate product conformance testing, but they also provide researchers with a common platform and dataset, allowing for the reproducibility of their research efforts. Luckily, conformance and reference software are typically publicly available under an appropriate open-source license.

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which has gained a new part, namely redundant encoding and packaging for segmented live media (REAP; ISO/IEC 23009-9). The following figure shows the reference workflow for redundant encoding and packaging of live segmented media.

Reference workflow for redundant encoding and packaging of live segmented media.

The reference workflow comprises (i) an Ingest Media Presentation Description (I-MPD), (ii) a Distribution Media Presentation Description (D-MPD), and (iii) a Storage Media Presentation Description (S-MPD), among others; each defines constraints on the MPD and on the tracks of the ISO Base Media File Format (ISOBMFF).
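
For readers unfamiliar with DASH manifests: an MPD (including the I-MPD/D-MPD/S-MPD variants that REAP constrains) is an XML document in the urn:mpeg:dash:schema:mpd:2011 namespace. A minimal sketch of reading representations out of a hypothetical MPD with the Python standard library (the ids and bitrates below are made up):

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal dynamic (live) MPD for illustration only.
MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="720p" bandwidth="3000000"/>
      <Representation id="1080p" bandwidth="6000000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

ns = {"dash": "urn:mpeg:dash:schema:mpd:2011"}
root = ET.fromstring(MPD)
reps = [(r.get("id"), int(r.get("bandwidth")))
        for r in root.findall(".//dash:Representation", ns)]
print(reps)  # [('720p', 3000000), ('1080p', 6000000)]
```

REAP's contribution is precisely a set of additional constraints on such documents (and on the referenced ISOBMFF tracks) so that redundant encoders/packagers can produce interchangeable output.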

Additionally, the MPEG-DASH Break-out Group discussed various technologies under consideration, such as (a) combining HTTP GET requests, (b) signaling common media client data (CMCD) and common media server data (CMSD) in an MPEG-DASH MPD, (c) image and video overlays in DASH, and (d) updates on lower latency.
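
As a brief illustration of item (b): CMCD (CTA-5004) serializes client state as comma-separated key/value pairs, carried, e.g., in a 'CMCD' query argument. The sketch below is a simplified serializer covering only a few of the rules (alphabetically ordered keys, quoted string values, boolean true as a bare key), not a complete implementation:

```python
from urllib.parse import quote

def cmcd_query(data):
    """Serialize CMCD key/value pairs into a 'CMCD=' query parameter."""
    parts = []
    for key in sorted(data):                  # keys in alphabetical order
        value = data[key]
        if value is True:
            parts.append(key)                 # boolean true: bare key name
        elif isinstance(value, str):
            parts.append(f'{key}="{value}"')  # string values are quoted
        else:
            parts.append(f"{key}={value}")
    return "CMCD=" + quote(",".join(parts), safe="")

# e.g., encoded bitrate br (kbps), buffer length bl (ms), and a session id:
print(cmcd_query({"br": 3200, "bl": 21300, "sid": "6e2fb550"}))
```

What the break-out group discussed goes one step further: signaling in the MPD itself whether and how a client should attach such data to its requests.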

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: The REAP committee draft (CD) is publicly available, and feedback from academia and industry is appreciated. In particular, first performance evaluations and/or reports from proof-of-concept implementations/deployments would be insightful for the next steps in the standardization of REAP.

The 143rd MPEG meeting will be held in Geneva from July 17-21, 2023. Click here for more information about MPEG meetings and their developments.

VQEG Column: Emerging Technologies Group (ETG)

Introduction

This column provides an overview of the new Video Quality Experts Group (VQEG) group called the Emerging Technologies Group (ETG), which was created during the last VQEG plenary meeting in December 2022. For an introduction to VQEG, please check the VQEG homepage or this presentation.

The work addressed by this new group can be of interest to the SIGMM community since it relates to AI-based technologies for image and video processing, greening of streaming, blockchain in media and entertainment, and ongoing related standardization activities.

About ETG

The main objective of this group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. Through its activities, the group aims to provide a common platform where people can gather and discuss new and emerging topics and ideas, and explore possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc. The topics addressed are not necessarily directly related to “video quality” but rather focus on any ongoing work in the field of multimedia which can indirectly impact the work addressed as part of VQEG.

Scope

During the creation of the group, the following topics were tentatively identified to be of possible interest to the members of this group and VQEG in general: 

  • AI-based technologies:
    • Super Resolution
    • Learning-based video compression
    • Video coding for machines, etc.
    • Enhancement, denoising, and other pre- and post-filter techniques
  • Greening of streaming and related trends
    • For example, trade-off between HDR and SDR to save energy and its impact on visual quality
  • Ongoing Standards Activities (which might impact the QoE of end users and hence will be relevant for VQEG)
    • 3GPP, SVTA, CTA WAVE, UHDF, etc.
    • MPEG/JVET
  • Blockchain in Media and Entertainment

Since the creation of the group, four talks on various topics have been organized, an overview of which is summarized next.

Overview of the Presentations

We briefly provide a summary of various talks that have been organized by the group since its inception.

On the work by MPEG Systems Smart Contracts for Media Subgroup

The first presentation was on the topic of the recent work by MPEG Systems on Smart Contracts for Media [1], delivered by Dr Panos Kudumakis, Head of the UK Delegation to ISO/IEC JTC1/SC29 and Chair of the British Standards Institute (BSI) IST/37. In this talk, Dr Kudumakis highlighted MPEG's efforts over the last few years towards developing several standardized ontologies catering to the needs of the media industry with respect to the codification of Intellectual Property Rights (IPR) information toward the fair trade of media. However, since the inference and reasoning capabilities normally associated with ontology use cannot naturally be performed in DLT environments, there is huge potential to unlock the Semantic Web and, in turn, the creative economy by bridging this interoperability gap [2]. In that direction, the ISO/IEC 21000-23 Smart Contracts for Media standard specifies the means (e.g., APIs) for converting MPEG IPR ontologies to smart contracts that can be executed on existing DLT environments [3]. The talk discussed the recent work done as part of this effort, as well as the ongoing efforts towards the design of a full-fledged ISO/IEC 23000-23 Decentralized Media Rights Application Format standard based on MPEG technologies (e.g., audio-visual codecs, file formats, streaming protocols, and smart contracts) and non-MPEG technologies (e.g., DLTs, content, and creator IDs).
The recording of the presentation is available here, and the slides can be accessed here.

Introduction to NTIRE Workshop on Quality Assessment for Video Enhancement

The second presentation was given by Xiaohong Liu and Yuxuan Gao from Shanghai Jiao Tong University, China, about one of the CVPR challenge workshops, the NTIRE 2023 Quality Assessment of Video Enhancement Challenge. The presentation described the motivation for starting this challenge and its relevance to the video community in general. The presenters then described the dataset, including its creation process, the subjective tests to obtain ratings, and the reasoning behind the split of the dataset into training, validation, and test sets. The results of this challenge are scheduled to be presented at the upcoming spring meeting at the end of June 2023. The presentation recording is available here.

Perception: The Next Milestone in Learned Image Compression

Johannes Ballé from Google was the third presenter, on the topic of “Perception: The Next Milestone in Learned Image Compression.” In the first part, Ballé discussed learned compression and described nonlinear transforms [4] and how they can achieve a higher image compression rate than linear transforms. Next, they emphasized the importance of perceptual metrics in comparison to distortion metrics by introducing the difference between perceptual quality and reconstruction quality [5]. As an example of generative image compression, HiFiC [6] was presented, which combines the two criteria of a distortion metric and a perceptual metric (named a realism criterion). Finally, the talk concluded with an introduction to perceptual spaces and an example of a perceptual metric, PIM [7]. The presentation slides can be found here.

Compression with Neural Fields

Emilien Dupont (DeepMind) was the fourth presenter. He started the talk with a short introduction to the emergence of neural compression, which fits a signal, e.g., an image or video, to a neural network. He then discussed two recent works on neural compression that he was involved in, COIN [8] and COIN++ [9], and gave a short overview of other implicit neural representations in the video domain, such as NeRV [10] and NIRVANA [11]. The slides for the presentation can be found here.
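
To give a flavour of the neural-field idea behind COIN, the toy sketch below overfits a tiny one-hidden-layer network to a 1-D signal, so that the signal is "stored" as the network's parameters. This is only an illustration of the principle, not the actual COIN method (which uses SIREN-style sine activations and weight quantization, and meta-learning in COIN++):

```python
import math, random

random.seed(0)
H = 8                                    # hidden units ("model size")
w1 = [random.uniform(-1, 1) for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

xs = [i / 31 for i in range(32)]         # coordinates in [0, 1]
ys = [math.sin(2 * math.pi * x) for x in xs]  # "pixel" values to memorize

def forward(x):
    h = [math.tanh(w1[j] * x + b1[j]) for j in range(H)]
    return sum(w2[j] * h[j] for j in range(H)) + b2, h

def mse():
    return sum((forward(x)[0] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial = mse()
lr = 0.02
for _ in range(500):                     # deliberately overfit the signal
    for x, y in zip(xs, ys):
        out, h = forward(x)
        g = 2 * (out - y)
        for j in range(H):
            gh = g * w2[j] * (1 - h[j] ** 2)   # backprop through tanh
            w2[j] -= lr * g * h[j]
            w1[j] -= lr * gh * x
            b1[j] -= lr * gh
        b2 -= lr * g
final = mse()
print(f"MSE {initial:.4f} -> {final:.4f} with {3 * H + 1} parameters")
```

The compression argument is that transmitting the 3H+1 parameters (after quantization) can be cheaper than transmitting the samples, and the function can be evaluated at any coordinate, which is what makes neural fields attractive for images and video.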

Upcoming Presentations

As part of the ongoing efforts of the group, the following talks/presentations are scheduled in the next two months. For an updated schedule and list of presentations, please check the ETG homepage here.

Sustainable/Green Video Streaming

Given the increasing carbon footprint of streaming services and the climate crisis, many new collaborative efforts have started recently, such as the Greening of Streaming alliance, the Ultra HD Sustainability forum, etc. In addition, research has recently started focussing on how to make video streaming greener/more sustainable. A talk providing an overview of the recent works and progress in this direction is tentatively scheduled for mid-May 2023.

Panel discussion at VQEG Spring Meeting (June 26-30, 2023), Sony Interactive Entertainment HQ, San Mateo, US

During the next face-to-face VQEG meeting in San Mateo, there will be a panel discussion on the topic of “Deep Learning in Video Quality and Compression.” The goal is to invite machine learning experts to VQEG and bring the two communities closer. ETG will organize the panel discussion, and the following four panellists are currently invited to join this event: Zhi Li (Netflix), Ioannis Katsavounidis (Meta), Richard Zhang (Adobe), and Mathias Wien (RWTH Aachen). Before this panel discussion, two talks are tentatively scheduled, the first one on video super-resolution and the second one focussing on learned image compression.
The meeting will take place in hybrid mode, allowing for participation both in person and online. For further information about the meeting, please check the details here and, if interested, register for the meeting.

Joining and Other Logistics

While participation in the talks is open to everyone, to get notified about upcoming talks and participate in the discussions, please consider subscribing to the etg@vqeg.org email reflector and joining the Slack channel using this link. The meeting minutes are available here. We are always looking for new ideas to improve. If you have suggestions on topics we should focus on or recommendations for presenters, please reach out to the chairs (Nabajeet and Saman).

References

[1] White paper on MPEG Smart Contracts for Media.
[2] DLT-based Standards for IPR Management in the Media Industry.
[3] DLT-agnostic Media Smart Contracts (ISO/IEC 21000-23).
[4] [2007.03034] Nonlinear Transform Coding.
[5] [1711.06077] The Perception-Distortion Tradeoff.
[6] [2006.09965] High-Fidelity Generative Image Compression.
[7] [2006.06752] An Unsupervised Information-Theoretic Perceptual Quality Metric.
[8] Coin: Compression with implicit neural representations.
[9] COIN++: Neural compression across modalities.
[10] Nerv: Neural representations for videos.
[11] NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling.

JPEG Column: 98th JPEG meeting in Sydney, Australia

JPEG explores standardization in event-based imaging

The 98th JPEG meeting was held in Sydney, Australia, from 16 to 20 January 2023. This was a welcome return to face-to-face meetings after a long period of online meetings due to the Covid-19 pandemic. Interestingly, the previous face-to-face meeting of the JPEG Committee was also held in Sydney, in January 2020. The face-to-face 98th JPEG meeting was complemented with online connections to allow the remote participation of those who could not be present.

The recent calls for proposals, such as JPEG Fake Media, JPEG AI and JPEG Pleno Learning Based Point Cloud Coding, resulted in a very dynamic and participative meeting in Sydney, with multiple technical sessions and decisions. Exploration activities such as JPEG DNA and JPEG NFT also produced drafts of future calls for proposals as a consequence of reaching sufficient maturity.

Furthermore, and considering the current trends in machine-based imaging applications, the JPEG Committee initiated an exploration on standardization in event-based imaging.

98th JPEG Meeting first plenary.

The 98th JPEG meeting had the following highlights:

  • New JPEG exploration in event-based imaging;
  • JPEG Fake Media and NFT;
  • JPEG AI;
  • JPEG Pleno Learning-based Point Cloud Coding improves its Verification Model;
  • JPEG AIC prepares the analysis of the responses to the Call for Contribution;
  • JPEG XL second editions;
  • JPEG Systems;
  • JPEG DNA prepares its call for proposals;
  • JPEG XS 3rd Edition;
  • JPEG 2000 guidelines.

The following summarizes the major achievements during the 98th JPEG meeting.

New JPEG exploration in event-based imaging

The JPEG Committee has started a new exploration activity on event-based imaging named JPEG XE.

Event-based imaging revolves around a new and emerging image modality created by event-based visual sensors. Event-based sensors are the foundation for a new class of cameras that allow the efficient capture of visual information at high speed while requiring low computational cost, a requirement that is common in many machine vision applications. Such sensors are modeled on the mechanisms of the human visual system for detecting scene changes and capturing those changes asynchronously. This means that every pixel works individually to detect scene changes and creates the associated events. If nothing happens, no events are generated. This contrasts with conventional image sensors, where pixels are sampled in a continuous and periodic manner, so that images are generated regardless of any changes in the scene, with the risk of reacting with delay or even missing quick changes.
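
The per-pixel change-detection principle can be sketched as follows. Note that real event sensors fire asynchronously whenever a pixel's (log-)intensity deviates from its value at the pixel's last event; the simplified illustration below instead compares two discrete frames, which conveys the idea of sparse, change-driven output without modeling the asynchronous timing:

```python
def events_between_frames(prev, curr, threshold=0.15):
    """Compare two intensity frames; return (x, y, polarity) events for
    pixels whose change exceeds the contrast threshold."""
    events = []
    for y, (row_p, row_c) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_p, row_c)):
            if b - a > threshold:
                events.append((x, y, +1))   # brightness increased
            elif a - b > threshold:
                events.append((x, y, -1))   # brightness decreased
    return events

frame0 = [[0.2, 0.2, 0.2],
          [0.2, 0.9, 0.2],
          [0.2, 0.2, 0.2]]
frame1 = [[0.2, 0.2, 0.2],
          [0.2, 0.2, 0.9],
          [0.2, 0.2, 0.2]]  # the bright spot moved one pixel to the right

# Only the two changed pixels fire; the seven static pixels produce no data.
print(events_between_frames(frame0, frame1))  # [(1, 1, -1), (2, 1, 1)]
```

This sparsity is precisely what makes a standard representation interesting: the output is a stream of (x, y, t, polarity) tuples rather than dense frames, and no interoperable format for such streams exists yet.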

The JPEG Committee recognizes that this new image modality opens doors to a large number of applications where capture and processing of visual information is needed. Currently, there is no standard format to represent event-based information, and therefore existing and emerging applications are fragmented and lack interoperability. The new JPEG XE activity focuses on establishing a scope and relevant definitions, collecting use cases and their associated requirements, and investigating the role that JPEG can play in the definition of timely standards in the near and long term. To start, an Ad-hoc Group has been established. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.

JPEG Fake Media and NFT

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modification. During the 98th meeting, the JPEG Committee finalised the evaluation of the six submitted proposals and initiated the process of establishing a new standard.

The JPEG Committee also continues to explore use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases for both topics are very different, there is a clear commonality in terms of requirements and relevant solutions. An updated version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

To stay informed about the activities, please join the mailing list of the Ad-hoc Group and regularly check the JPEG website for the latest information.

JPEG AI

Following the creation of the JPEG AI Verification Model (VM) at the previous 97th JPEG meeting, further discussions took place at the 98th meeting to improve coding efficiency and reduce complexity, especially on the decoder side. The JPEG AI VM has several unique characteristics, such as a parallelizable context model for latent prediction, decoupling of prediction and sample reconstruction, and rate adaptation, among others. The JPEG AI VM shows up to 31% compression gain over VVC Intra for natural content. A new JPEG AI test set was released during the 98th meeting: a dataset of 50 images for the evaluation of the JPEG AI VM, with the objective of tracking performance improvements at every meeting. The JPEG AI Common Training and Test Conditions were updated to include this new dataset. At this meeting, it was also decided to integrate several changes into the JPEG AI VM, speeding up training, improving performance at high rates, and fixing bugs. A set of core experiments was established targeting RD performance and complexity improvements. The JPEG AI VM Software Guidelines were approved, describing the initial setup of the JPEG AI VM repository, how to obtain the JPEG AI dataset, and how to run tests and training. A description of the structure of the JPEG AI VM repository was also made available.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a number of technical submissions for improvements to the VM in the areas of colour coding, artefact processing, and coding speed. In addition, the JPEG Committee released the “Call for Content for JPEG Pleno Point Cloud Coding” to expand the current training and test set with new point clouds representing key use cases. Prior to the 99th JPEG Meeting, JPEG experts will promote the Call for Content as well as investigate possible advancements to the VM in the areas of auto-regressive entropy encoding, sparse tensor convolution, metadata-controlled post-filtering of colour, and a flexible split geometry and colour coding framework for the VM.

JPEG AIC

During the 98th JPEG meeting in Sydney, Australia, Exploration Study 1 on JPEG AIC was established. This exploration study will collect results from three types of previously standardized subjective evaluation methodologies in order to provide an informative reference for the JPEG AIC submissions to the Call for Contributions that are due by April 1st, 2023. Corrections and additions to the JPEG AIC Common Test Conditions were issued in order to reflect the addition of a new codec for testing content generation and a new anchor subjective quality assessment methodology.

The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will focus on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by any previous AIC standards.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have reached the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Also, an updated version of the JPEG XL White Paper has been published and is freely available through jpeg.org.

JPEG Systems

The JLINK standard (19566-7:2022) is now published by ISO. JLINK specifies an image file format capable of linking multiple media elements, such as image and text in any JPEG file format. It enables enhanced curated experiences of a set of images for education, training, virtual museum tours, travelogs, and similar visually-oriented content.

The JPEG Snack (19566-8) standard is expected to be published in February 2023. JPEG Snack specifies the coding of audio, picture, multimedia and hypermedia information, enabling rich, image-based, short-form animated experiences for social media.

The second edition of JUMBF (JPEG Universal Metadata Box Format, 19566-5) is progressing to IS stage; the second edition brings new capabilities and support for additional types of media.

JPEG DNA

The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 98th JPEG meeting, a draft Call for Proposals for JPEG DNA was issued and made public, as a first concrete step towards standardisation. The draft call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future responses to the Call for Proposals. The final Call for Proposals for JPEG DNA is expected to be released at the conclusion of the 99th JPEG meeting in April 2023, after a set of exploration experiments have validated the procedures outlined in the draft Call for Proposals for JPEG DNA and JPEG DNA Common Test Conditions. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023 with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.

JPEG XS

The JPEG Committee continued with the definition of JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Committee Draft for Part 1 (Core coding system) will proceed to ISO ballot. This means that the standard is now technically defined, and all the new coding tools are known. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. This new coding tool is of extreme importance for remote desktop applications and screen sharing. In addition, mathematically lossless coding can now support up to 16 bits precision (up from 12 bits). For Part 2 (Profiles and buffer models), the committee created a second Working Draft and issued further core experiments to proceed and support this work. Meanwhile, ISO approved the creation of a new edition of Part 3 (Transport and container formats) that is needed to address the changes of Part 1 and Part 2.

JPEG 2000

The JPEG Committee has published two sets of guidelines for implementers of JPEG 2000, available on jpeg.org.

The first describes an algorithm for controlling JPEG 2000 coding quality using a single number (Qfactor) between 1 (worst quality) and 100 (best quality), as is commonly done with JPEG.

The second explains how to create, parse and use HTJ2K placeholder passes and HT Sets. These features are an integral part of HTJ2K and enable mathematically lossless transcoding between HT- and J2K-based codestreams, among other applications.

Final Quote

“The interest in event-based imaging has been rising with several products designed and offered by the industry. The JPEG Committee believes in interoperable solutions and has initiated an exploration for standardization of event-based imaging in order to accelerate creation of an ecosystem.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 99, to be held online from 24 to 28 April 2023
  • No. 100, to be held in Covilhã, Portugal from 17 to 21 July 2023

VQEG Column: VQEG Meeting December 2022

Introduction

This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries registered for the meeting, which was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented in this meeting may be relevant for the SIGMM community working on quality assessment. Particularly interesting are the proposals to update and merge ITU-T Recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and the greening of streaming and related trends.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors to follow their activities and get involved.

Group picture of the VQEG Meeting 12-16 December 2022 (online).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK) presented her work on converting between video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting a content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and addressed video codec profiling across video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for the modelling and prediction of the Satisfied User Ratio for compressed videos [2].
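The SSIM-PSNR relationship studied in [1] builds on the standard PSNR definition, 10·log10(MAX²/MSE). For reference, a minimal sketch (pure Python, treating grayscale images as flat sequences of pixel values):

```python
import math

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length
    sequences of pixel values: 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Works such as [1] model how a given PSNR maps to SSIM as a function of content characteristics, since the same PSNR can correspond to very different perceived quality on different content.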

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently, there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research was announced, proposed by Elizabeth Krupinski, which will take place at the European Congress of Radiology (Vienna, Austria, Mar. 2023).

In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with the evaluation of the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France) presented his work on deep-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition-inspired diagnostic image quality models, emphasising the need to distinguish among interpretability (e.g., a medical professional is confident in making a diagnosis), adequacy (e.g., the capture technique shows the right area for assessment), and visual quality (e.g., MOS) in the quality assessment of medical content.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T Recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete and to make P.913 the only ITU-T recommendation on subjective video quality assessment. The group worked on the liaison document to be sent to ITU-T SG12, which will be available in the meeting files.

In addition, Mohsen Jenadeleh (University of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with the Flicker Test and QUEST+.

Computer Generated Imagery (CGI)

The CGI group is devoted to analysing and evaluating computer-generated content, with a particular focus on gaming. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, they are working on two new work items: G.OMMOG on Opinion Model for Mobile Online Gaming applications and P.CROWDG on Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying topics and interests in CGI beyond gaming content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata. 

In relation to the first topic, Margaret Pinson (NTIA/ITS, US), talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3] and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedeman (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.

Regarding the computation of complexity indices, Maria Martini (Kingston University, UK) presented a study comparing 12 metrics (and possible combinations) for assessing video content complexity. Vignesh V. Menon (University of Klagenfurt, Austria) presented a summary of live per-title encoding approaches using video complexity features. Ioannis Katsavounidis and Cosmin Stejerean (Meta, US) presented their work on using motion search to order videos by coding complexity, also making the software available as open source. In addition, they led a discussion on supplementing the classic SI and TI with improved complexity metrics (VCA, motion search, etc.).
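For reference, ITU-T Rec. P.910 defines SI as the maximum over time of the standard deviation of the Sobel-filtered luma frames, and TI as the maximum over time of the standard deviation of successive luma frame differences. A minimal NumPy sketch of these definitions (illustration only; P.910 also specifies details such as border handling):

```python
import numpy as np

def sobel_magnitude(frame):
    # Sobel gradient magnitude on the luma plane (borders left at zero)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = frame[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

def si_ti(frames):
    """SI/TI per ITU-T Rec. P.910:
    SI = max over frames of std(Sobel(frame)),
    TI = max over successive frame pairs of std(frame_n - frame_{n-1})."""
    si = max(float(np.std(sobel_magnitude(f))) for f in frames)
    ti = max(float(np.std(frames[n] - frames[n - 1]))
             for n in range(1, len(frames)))
    return si, ti
```

The discussion above is precisely about the limits of these two numbers: a single spatial and a single temporal statistic cannot fully characterize coding complexity, hence the interest in supplementing them with metrics such as VCA or motion-search cost.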

Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of video metric metadata into encoded video streams. In addition, a liaison with AOMedia will be established, which may go beyond this particular topic and include best practices on subjective testing, IMG topics, etc.

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group was initially focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. Currently, the group is working on research problems rather than on algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources.

Two examples of research problems addressed by the group were illustrated by the two presentations given by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The first presentation addressed the training of artificial intelligence observers for a wide range of applications, while the second provided guidelines to train, validate, and publish DNN-based objective measures.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.

Immersive Media Group (IMG)

The IMG group focuses on research on the quality assessment of immersive media. The main joint activity ongoing within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed at the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.

In addition to this, a variety of topics related to immersive media technologies were covered in the works presented during the meeting. For example, Yaosi Hu (Wuhan University, China) presented her work on video quality assessment based on quality aggregation networks. In relation to light field imaging, Maria Martini (Kingston University, UK) discussed the main problems with currently available light field quality assessment datasets and presented a new dataset. Also, there were three talks by researchers from CWI (Netherlands) dealing with point cloud QoE assessment: Silvia Rossi presented a behavioural analysis in a 6-DoF VR system, taking into account the influence of content, quality, and user disposition [4]; Shishir Subramanyam presented his work on the subjective QoE evaluation of user-centered adaptive streaming of dynamic point clouds [5]; and Irene Viola presented a point cloud objective quality assessment method using PCA-based descriptors (PointPCA). Another presentation related to point cloud quality assessment was delivered by Marouane Tliba (Université d’Orleans, France), who presented an efficient deep-based graph objective metric.

In addition, Shirin Rafiei (RISE, Sweden) gave a talk on UX and QoE aspects of remote control operations using a laboratory platform. Marta Orduna (Universidad Politécnica de Madrid, Spain) presented her work comparing ACR, SSDQE, and SSCQE in long-duration 360-degree videos, whose results will be used to submit a proposal to extend ITU-T Rec. P.919 for long sequences. Finally, Ali Ak (Nantes Université, France) presented his work on just noticeable differences in HDR/SDR image and video quality.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work related to the assessment of rail 8K-UHD CCTV facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study on video coding for machines through a large-scale evaluation of DNN robustness to compression artefacts for semantic segmentation [8].

Other updates

In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of the IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video at simulated low temperatures on LCD vehicle displays.

In addition, a new group called the Emerging Technologies Group (ETG) was created at this meeting; its main objective is to address aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. Two major topics of interest have been identified so far: AI-based technologies and the greening of streaming and related trends. More broadly, the group aims to provide a common platform for people to gather and discuss emerging topics and possible collaborations in the form of joint survey papers, whitepapers, funding proposals, etc.

Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest in this effort resumes. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective lab setups is still online and kept updated.

The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.

References

[1] Maria G. Martini, “On the relationship between SSIM and PSNR for DCT-based compressed images and video: SSIM as content-aware PSNR”, TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21725390.v1, 2022.
[2] J. Zhu, P. Le Callet, A. Perrin, S. Sethuraman, K. Rahul, “On The Benefit of Parameter-Driven Approaches for the Modeling and the Prediction of Satisfied User Ratio for Compressed Video”, IEEE International Conference on Image Processing (ICIP), Oct. 2022.
[3] Margaret H. Pinson, “Why No Reference Metrics for Image and Video Quality Lack Accuracy and Reproducibility”, Frontiers in Signal Processing, Jul. 2022.
[4] S. Rossi, I. Viola, P. Cesar, “Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition”, Proceedings of the 1st Workshop on Interactive eXtended Reality, Oct. 2022.
[5] S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, P. Cesar, “Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds”, International Conference on Quality of Multimedia Experience (QoMEX), Sep. 2022.
[6] M. Leszczuk, L. Janowski, J. Nawała, and A. Boev, “Method for Assessing Objective Video Quality for Automatic License Plate Recognition Tasks”, Communications in Computer and Information Science, Oct. 2022.
[7] L. Lévêque, F. Villoteau, E. V. B. Sampaio, M. Perreira Da Silva, and P. Le Callet, “Comparing the Robustness of Humans and Deep Neural Networks on Facial Expression Recognition”, Electronics, 11(23), Dec. 2022.
[8] A. Marie, K. Desnos, L. Morin, and L. Zhang, “Video Coding for Machines: Large-Scale Evaluation of Deep Neural Networks Robustness to Compression Artifacts for Semantic Segmentation”, IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2022.

MPEG Column: 140th MPEG Meeting in Mainz, Germany

After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany, and the official press release can be found here and comprises the following items:

  • MPEG evaluates the Call for Proposals on Video Coding for Machines
  • MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
  • MPEG reaches the First Milestone for Haptics Coding
  • MPEG completes a New Standard for Video Decoding Interface for Immersive Media
  • MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
  • MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content, with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive amounts of video data are increasingly analyzed without human intervention, leading to a new paradigm referred to as Video Coding for Machines (VCM), which targets both (i) conventional video coding and (ii) feature coding (see here for further details).

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency, in terms of bit rate reduction for equivalent task performance, of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection. Notably, all requirements defined by WG 2 were addressed by various proposals.

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:

  • For the tested video dataset, increases in compression efficiency of up to 87% compared to the video anchor and over 90% compared to the feature anchor were reported.
  • For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.

Research aspects: the main research area is still the same as described in my last column, i.e., compression efficiency (including runtime, sometimes referred to as complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation or object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).

Research aspects: VDI is also a promising standard for enabling the implementation of viewport-adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, amendments, Defects under Investigation (DuI), and Technologies under Consideration (TuC) are output documents, as well as a new working draft called Redundant encoding and packaging for segmented live media (REAP), which eventually will become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 97th JPEG Meeting

JPEG initiates specification on fake media based on responses to its call for proposals

The 97th JPEG meeting was held online from 24 to 28 October 2022. JPEG received responses to the Call for Proposals (CfP) on JPEG Fake Media, the first international multimedia standard designed to facilitate the secure and reliable annotation of media asset creation and modification. In total, six responses were received, addressing different requirements in the scope of this standardization initiative. Moreover, relevant advances were made on the standardization of learning-based coding, notably the learning-based coding of images (JPEG AI) and JPEG Pleno point cloud coding. Furthermore, the explorations on quality assessment of images (JPEG AIC) and of JPEG Pleno light fields made relevant advances with the definition of their Calls for Contributions and Common Test Conditions.

Also relevant, the 98th JPEG meeting will be held in Sydney, Australia, marking a return to physical meetings after the long COVID pandemic; the last physical meeting was also held in Sydney, in January 2020.

The 97th JPEG meeting had the following highlights:

  • JPEG Fake Media responses to the Call for Proposals analysed,
  • JPEG AI Verification Model,
  • JPEG Pleno Learning-based Point Cloud coding Verification Model,
  • JPEG Pleno Light Field issues a Call for Contributions on Subjective Light Field Quality Assessment,
  • JPEG AIC issues a Call for Contributions on Subjective Image Quality Assessment,
  • JPEG DNA releases a draft of Common Test Conditions,
  • JPEG XS prepares third edition of core coding system, and profiles and buffer models,
  • JPEG 2000 conformance is under development.
Fig. 1: Fake Media application scenarios: Good faith vs Malicious intent.

The following summarises the major achievements of the 97th JPEG meeting.

JPEG Fake Media

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modification. The standard shall address use cases that are in good faith as well as those with malicious intent. During the 97th meeting in October 2022, the following six responses to the call were presented:

  1. Adobe/C2PA: C2PA Specification
  2. Huawei: Provenance and Right Management for Digital Contents in JPEG Fake Media
  3. Sony Group Corporation: Methods to keep track provenance of media asset and signing data
  4. Vrije Universiteit Brussel/imec: Media revision history tracking via asset decomposition and serialization
  5. UPC: MIPAMS Provenance module
  6. Newcastle University: Response to JPEG Fake Media standardization call

In the coming months, these proposals will be thoroughly evaluated following a process that is open, transparent, fair, and unbiased, and that allows deep technical discussions to assess which proposals best address the identified requirements. Based on the conclusions of these discussions, a new standard will be produced to address fake media and provide solutions for transparency related to media authenticity. The standard will combine the best elements of the six proposals.

To stay informed about the activities please join the JPEG Fake Media & NFT AHG mailing list and regularly check the JPEG website for the latest information.

JPEG AI

JPEG AI (ISO/IEC 6048) aims at the development of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over state-of-the-art image coding standards at similar subjective quality, and improved performance for image processing and computer vision tasks. The evaluation of the Call for Proposals responses had already confirmed the industry interest, and the subjective tests presented at the 96th JPEG meeting showed results that significantly outperform conventional image compression solutions. 

The JPEG AI verification model has been issued as the outcome of this meeting and follows the integration effort of several neural networks and tools. Several characteristics make the JPEG AI Verification Model (VM) unique, such as the decoupling of the entropy decoding from the sample reconstruction, and the exploitation of the spatial correlation between latents using a prediction and a fusion network as well as a massively parallelized auto-regressive network. The performance evaluation has shown significant RD performance improvements (BD-rate savings of up to 32.2% over H.266/VVC) with competitive decoding complexity. Other functionalities, such as rate adaptation and device interoperability, have also been addressed with the use of gain units and the quantization of the weights in the entropy decoding module. Moreover, the adoption process for architectural changes and for new or improved coding tools in the JPEG AI VM was approved. A set of core experiments has been defined for improving the JPEG AI VM, targeting improved coding efficiency and reduced encoding and decoding complexity. The core experiments represent a set of promising technologies, such as learning-based GAN training, simplification of the analysis/synthesis transform, an adaptive entropy coding alphabet, and even encoder-only tools and procedures for training speed-up.
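BD-rate figures such as the 32.2% above come from the Bjøntegaard delta-rate metric, which averages the bitrate difference between two rate-distortion curves over their overlapping quality range. A sketch of the classic computation (cubic polynomial fit of log-rate versus PSNR; the exact anchors, metrics, and configurations used for JPEG AI are defined in its common test conditions, not here):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjoentegaard delta-rate: average bitrate difference (%) between two
    RD curves at equal quality. Negative values mean the test codec needs
    less bitrate than the anchor for the same quality."""
    lr_a = np.log10(rate_anchor)
    lr_t = np.log10(rate_test)
    # cubic fit of log-rate as a function of quality (PSNR)
    pa = np.polyfit(psnr_anchor, lr_a, 3)
    pt = np.polyfit(psnr_test, lr_t, 3)
    # integrate both fits over the overlapping quality range
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_t = (np.polyval(it, hi) - np.polyval(it, lo)) / (hi - lo)
    return (10 ** (avg_t - avg_a) - 1) * 100
```

For example, a test codec reaching the same PSNR points at exactly half the bitrate of the anchor yields a BD-rate of -50%.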

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with the successful validation of the Verification Model under Consideration (VMuC). The VMuC was confirmed as the Verification Model (VM) to form the core of the future standard; ISO/IEC 21794 Part 6 JPEG Pleno: Learning-based Point Cloud Coding. The JPEG Committee has commenced work on the Working Draft of the standard, with initial text reviewed at this meeting. Prior to the next 98th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the area of auto-regressive entropy encoding and sparse tensor convolution as well as sourcing additional point clouds for the JPEG Pleno Point Cloud test set.

JPEG Pleno Light Field

During the 97th meeting, the JPEG Committee released the “JPEG Pleno Final Call for Contributions on Subjective Light Field Quality Assessment”, to collect new procedures and best practices regarding light field subjective quality evaluation methodologies to assess artifacts induced by coding algorithms. All contributions, including test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach. The deadline for submission of contributions is April 1, 2023.


The JPEG Committee organized its 1st workshop on light field quality assessment to discuss challenges and current solutions for subjective light field quality assessment, explore relevant use cases and requirements, and provide a forum for researchers to discuss the latest findings in this area. The JPEG Committee also promoted its 2nd workshop on learning-based light field coding to exchange experiences and to present technological advances in learning-based coding solutions for light field data. The proceedings and video footage of both workshops are now accessible on the JPEG website.

JPEG AIC

At the 97th JPEG Meeting, a new JPEG AIC Final Call for Contributions on Subjective Image Quality Assessment was issued. The JPEG Committee is continuing its previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will focus on methodologies for the quality assessment of images in the range from high quality to near-visually lossless quality, which is not covered by the previous AIC standards.

The Call for Contributions on Subjective Image Quality Assessment is asking for contributions to the standardization process that will be collaborative from the very beginning. In this context, all received contributions will be considered for the development of the standard by consensus among the JPEG experts.

The JPEG Committee will release a new JPEG AIC-3 dataset on 15 December 2022, and the deadline for submitting contributions to the call is 1 April 2023, 23:59 UTC. Contributors will present their contributions at the 99th JPEG Meeting in April 2023.

The Call for Contributions on Subjective Image Quality Assessment addresses the development of a suitable subjective evaluation methodology standard. A second stage will address the objective perceptual visual quality evaluation models that perform well and have a good discriminative power in the high quality to near-visually lossless quality range.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on synthetic DNA polymers. During the 97th JPEG meeting, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions were updated to allow additional concrete experiments to take place prior to issuing a draft call for proposals at the next meeting. This will also allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA, in particular including biochemical noise simulation, which is an essential element in practical implementations.

JPEG XS

The 2nd edition of JPEG XS is now fully completed and published. The JPEG Committee continues its work on the 3rd edition of JPEG XS, starting with Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. During the 97th JPEG meeting, a new Working Draft of Part 1 and a first Working Draft of Part 2 were created. To support the work a new Core Experiment was also issued to further test the proposed technology. Finally, an update to the JPEG XS White Paper has been published.

JPEG 2000

A new edition of Rec. ITU-T T.803 | ISO/IEC 15444-4 (JPEG 2000 conformance) is under development.

This new edition proposes to relax the maximum allowable errors so that well-designed 16-bit fixed-point implementations pass all compliance tests; adds two test codestreams to facilitate testing of inverse wavelet and component decorrelating transform accuracy; and adds several codestreams and files conforming to Rec. ITU-T T.801 | ISO/IEC 15444-2 to facilitate the implementation of decoders and file format readers.
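Conceptually, a conformance test of this kind decodes a test codestream with the implementation under test and checks that every sample stays within the maximum allowable error of the reference decode. A minimal sketch of that check (with an illustrative tolerance value; the actual limits are specified in ISO/IEC 15444-4, and the function name is hypothetical):

```python
import numpy as np

def within_tolerance(reference: np.ndarray, decoded: np.ndarray,
                     max_abs_error: int) -> bool:
    """Check that every decoded sample is within max_abs_error of the
    reference decode (illustrative; real tolerances come from the spec)."""
    diff = np.abs(reference.astype(np.int64) - decoded.astype(np.int64))
    return int(diff.max()) <= max_abs_error
```

Relaxing the maximum allowable error, as this edition proposes, simply widens that per-sample bound so that rounding behaviour of 16-bit fixed-point transforms no longer causes spurious failures.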

Codestreams and test files can be found on the JPEG GitLab repository at: https://gitlab.com/wg1/htj2k-codestreams/-/merge_requests/14

Final Quote

“Motivated by the consumers’ concerns of manipulated contents, the JPEG Committee has taken concrete steps to define a new standard that provides interoperable solutions for a secure and reliable annotation of media assets creation and modifications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 98, to be held in Sydney, Australia, from 14-20 January 2023.

VQEG Column: VQEG Meeting May 2022

Introduction

Welcome to this new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG), which provides an overview of the last VQEG plenary meeting, held from 9 to 13 May 2022. It was organized by INSA Rennes (France), and it was the first face-to-face meeting after the series of online meetings due to the Covid-19 pandemic. Remote attendance was also offered, which made it possible for around 100 participants from 17 different countries to attend the meeting (more than 30 of them in person). During the meeting, more than 40 presentations were given, and interesting discussions took place. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented at this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting are the proposals to update the ITU-T Recommendations P.910 and P.913, as well as the publicly available datasets that were presented. We encourage readers interested in any of the activities of the working groups to check their websites and subscribe to the corresponding reflectors to follow them and get involved.

Group picture of the VQEG Meeting 9-13 May 2022 in Rennes (France).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this sense, the group continues working on extensions of the ITU-T Recommendation P.1204 to cover other encoders (e.g., AV1) apart from H.264, HEVC, and VP9. In addition, the group's projects on Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and on Advanced Subjective Methods (AVHD-SUB) are still ongoing.

In this meeting, several AVHD-related topics were discussed, supported by six different presentations. In the first one, Mikolaj Leszczuk (AGH University, Poland) presented an analysis of how experiment conditions, such as video sequence order, variation, and repeatability, influence the subjective assessment of video transmission quality, as they can entail a "learning" process for the test participants during the test. In the second presentation, Lucjan Janowski (AGH University, Poland) presented two proposals towards more ecologically valid experiment designs: the first one using the Absolute Category Rating [1] without a scale but in a "think aloud" manner, and the second one, called "Your YouTube, our lab", in which users select the content they prefer and a quality question appears during the viewing experience through a specifically designed interface. Also dealing with the study of testing methodologies, Babak Naderi (TU Berlin, Germany) presented work on subjective evaluation of video quality with a crowdsourcing approach, while Pierre David (Capacités, France) presented a three-lab experiment, involving Capacités (France), RISE (Sweden), and AGH University (Poland), on quality evaluation of social media videos. Kjell Brunnström (RISE, Sweden) continued by giving an overview of video quality assessment of Video Assistant Refereeing (VAR) systems, and lastly, Olof Lindman (SVT, Sweden) presented another effort to reduce the lack of open datasets with the Swedish Television (SVT) Open Content.

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. In this meeting, Lucie Lévêque (Nantes Université, France) provided an overview of the recent activities of the group, including a submitted review paper on objective quality assessment for medical images, a special session accepted for the IEEE International Conference on Image Processing (ICIP) that will take place in October in Bordeaux (France), and a paper submitted to IEEE ICIP on quality assessment through a COVID-19 pneumonia detection task. The work described in this paper was also presented by Meriem Outtas (INSA Rennes, France).

In addition, there were two more presentations related to the quality assessment of medical images. First, Yuhao Sun (University of Edinburgh, UK) presented their research on a no-reference image quality metric for visual distortions in Computed Tomography (CT) scans [2]. Then, Marouane Tliba (Université d'Orléans, France) presented his studies on quality assessment of medical images through deep-learning techniques using domain adaptation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on a proposal to update the ITU-T Recommendation P.913, including new testing methods for subjective quality assessment and statistical analysis of the results. Margaret Pinson presented this work during the meeting.   
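At its core, the statistical analysis of subjective test results builds on per-stimulus statistics such as the mean opinion score (MOS) and its confidence interval. The sketch below shows these basics using a normal approximation; it is illustrative only, since P.913 specifies considerably more (e.g., subject screening and the handling of different rating scales), and the function name is invented.

```python
import math

def mos_with_ci(scores):
    """Mean opinion score and 95% confidence interval half-width for one
    stimulus, using a normal approximation (illustrative, not the full
    P.913 procedure)."""
    n = len(scores)
    mos = sum(scores) / n
    var = sum((s - mos) ** 2 for s in scores) / (n - 1)  # sample variance
    half_width = 1.96 * math.sqrt(var / n)
    return mos, half_width
```

Proposals like the Generalised Score Distribution discussed below go beyond this by modeling the full distribution of subject responses rather than only its first two moments.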

In addition, five presentations were delivered addressing topics related to the group activities. Jakub Nawała (AGH University, Poland) presented the Generalised Score Distribution to accurately describe responses from subjective quality experiments. Three presentations were provided by members of Nantes Université (France): Ali Ak presented his work on spammer detection in pairwise comparison experiments, Andreas Pastor talked about how to improve the maximum likelihood difference scaling method in order to measure the inter-content scale, and Chama El Majeny presented the functionalities of a subjective test analysis tool, whose code will be publicly available. Finally, Dietmar Saupe (University of Konstanz, Germany) delivered a presentation on subjective image quality assessment with boosted triplet comparisons.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a particular focus on gaming. Currently, the group is working on the ITU-T Work Item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. Apart from this, Jerry (Xiangxu) Yu (University of Texas at Austin, US) presented work on subjective and objective quality assessment of user-generated gaming videos, and Nasim Jamshidi (TUB, Germany) presented a deep-learning bitstream-based video quality model for CG content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and on the development of a standard for video quality metadata.  
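For reference, SI and TI as defined in ITU-T Rec. P.910 are the maxima over time of the standard deviation of the Sobel-filtered luma plane and of the inter-frame luma difference, respectively. The sketch below is a simplified version that ignores exactly the kinds of details (e.g., border handling, bit-depth scaling) the NORM clarification work is pinning down.

```python
import numpy as np

# Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def _convolve2d(img, kernel):
    """3x3 'valid' correlation; sign flips cancel in the magnitude."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def si_ti(frames):
    """frames: list of 2-D float arrays (luma planes).
    SI: max over frames of std of the Sobel gradient magnitude.
    TI: max over frame pairs of std of the frame difference."""
    si = max(np.std(np.hypot(_convolve2d(f, SOBEL_X),
                             _convolve2d(f, SOBEL_Y))) for f in frames)
    ti = max(np.std(b - a) for a, b in zip(frames, frames[1:]))
    return si, ti
```

Seemingly minor choices here (where exactly the Sobel filter is applied, how edges are cropped, how limited-range video is scaled) change the resulting numbers, which is why the clarification activity feeds an update of P.910 itself.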

At this meeting, this was one of the most active groups, and the corresponding sessions included several presentations and discussions. Firstly, Yiannis Andreopoulos (iSIZE, UK) presented their work on domain-specific fusion of multiple objective quality metrics. Then, Werner Robitza (AVEQ GmbH/TU Ilmenau, Germany) presented the updates on the SI/TI clarification activities, which are leading to an update of the ITU-T Recommendation P.910. In addition, Lukas Krasula (Netflix, US) presented their investigations on the relation between banding annoyance and the overall quality perceived by the viewers. Hadi Amirpour (University of Klagenfurt, Austria) delivered two presentations related to their Video Complexity Analyzer and their Video Complexity Dataset, which are both publicly available. Finally, Mikołaj Leszczuk (AGH University, Poland) gave two talks on their research related to User-Generated Content (UGC) (a.k.a. in-the-wild video content) recognition and on advanced video quality indicators to characterise video content.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. A report on the ongoing activities of the group was presented by Enrico Masala (Politecnico di Torino, Italy), which included the release of a new website to reflect the evolution that happened in the last few years within the group. Although currently the group is not directly seeking the development of new metrics or tools readily available for VQA, it is still working on related topics such as the studies by Lohic Fotio Tiotsop (Politecnico di Torino, Italy) on the sensitivity of artificial intelligence-based observers to input signal modification.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia, Spain) presented an extended report on the group activities, from which it is worth noting the joint work on a contribution to the ITU-T Work Item G.QoE-5G.

Immersive Media Group (IMG)

The IMG group is focused on research on the quality assessment of immersive media. Currently, the main joint activity of the group is the development of a test plan for evaluating the QoE of immersive interactive communication systems. In this sense, Pablo Pérez (Nokia, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented a follow-up on this test plan, including an overview of the state of the art on related works and a taxonomy classifying the existing systems [3]. This test plan is closely related to the work carried out by the ITU-T on QoE assessment of eXtended Reality meetings, so Gunilla Berndtsson (Ericsson, Sweden) presented the latest advances in the development of P.QXM.

Apart from this, there were four presentations related to the quality assessment of immersive media. Shirin Rafiei (RISE, Sweden) presented a study on QoE assessment of an augmented remote operating system for scaling in smart mining applications. Zhengyu Zhang (INSA Rennes, France) gave a talk on a no-reference quality metric for light field images based on deep-learning and exploiting angular and spatial information. Ali Ak (Nantes Université, France) presented a study on the effect of temporal sub-sampling on the accuracy of the quality assessment of volumetric video. Finally, Waqas Ellahi (Nantes Université, France) showed their research on a machine-learning framework to predict Tone-Mapping Operator (TMO) preference based on image and visual attention features [4].

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods. In this meeting, there were three presentations related to this topic. Mikołaj Leszczuk (AGH University, Poland) presented an objective video quality assessment method for face recognition tasks. Also, Alban Marie (INSA Rennes, France) showed an analysis of the correlation of quality metrics with artificial intelligence accuracy. Finally, Lucie Lévêque (Nantes Université, France) gave an overview of a study on the reliability of existing algorithms for facial expression recognition [5].

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

The IRG-AVQA group studies topics related to video and audiovisual quality assessment (both subjective and objective) among ITU-R Study Group 6 and ITU-T Study Group 12. In this sense, Chulhee Lee (Yonsei University, South Korea) and Alexander Raake (TU Ilmenau, Germany) provided an overview on ongoing activities related to quality assessment within ITU-R and ITU-T.

Other updates

In addition, the Human Factors for Visual Experiences (HFVE) group, whose objective is to maintain the liaison between VQEG and the IEEE standardization group P3333.1, presented its advances in relation to two standards: IEEE P3333.1.3 (deep-learning-based assessment of visual experiences based on human factors), which has been approved and published, and IEEE P3333.1.4 on light field imaging, which has been submitted and is in the process of being approved. Also, although there was not much activity at this meeting within the Implementer's Guide for Video Quality Metrics (IGVQM) and the Psycho-Physiological Quality Assessment (PsyPhyQA) projects, they are still active. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place online in December 2022. Please, see VQEG Meeting information page for more information.

References

[1] ITU, “Subjective video quality assessment methods for multimedia applications”, ITU-T Recommendation P.910, Jul. 2022.
[2] Y. Sun, G. Mogos, "Impact of Visual Distortion on Medical Images", IAENG International Journal of Computer Science, 49(1), Mar. 2022.
[3] P. Pérez, E. González-Sosa, J. Gutiérrez, N. García, "Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment", Frontiers in Signal Processing, Jul. 2022.
[4] W. Ellahi, T. Vigier, P. Le Callet, “A machine-learning framework to predict TMO preference based on image and visual attention features”, International Workshop on Multimedia Signal Processing, Oct. 2021.
[5] E. M. Barbosa Sampaio, L. Lévêque, P. Le Callet, M. Perreira Da Silva, “Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance”, ACM International Conference on Interactive Media Experiences, Jun. 2022.

JPEG Column: 96th JPEG Meeting

JPEG analyses the responses of the Calls for Proposals for the standardisation of the first codecs based on machine learning

The 96th JPEG meeting was held online from 25 to 29 July 2022. The meeting was one of the most productive in the recent history of JPEG, with the analysis of the responses to two Calls for Proposals (CfP) for machine learning-based coding solutions, notably JPEG AI and JPEG Pleno Point Cloud Coding. The superior performance of the CfP responses compared to the state-of-the-art anchors leaves little doubt that coding technologies will become dominated by machine learning-based solutions, with the expected consequences for the standardisation pathway. A new era of multimedia coding standardisation has begun. Both activities have defined a verification model and are pursuing a collaborative process that will select the best technologies for the definition of the new machine learning-based standards.

The 96th JPEG meeting had the following highlights, led by JPEG AI and JPEG Pleno Point Cloud, the first two machine learning-based coding standards under development by JPEG:

  • JPEG AI response to the Call for Proposals;
  • JPEG Pleno Point Cloud begins the collaborative standardisation phase;
  • JPEG Fake Media and NFT;
  • JPEG Systems;
  • JPEG Pleno Light Field;
  • JPEG AIC;
  • JPEG XS;
  • JPEG 2000;
  • JPEG DNA.

The following summarises the major achievements of the 96th JPEG meeting.

JPEG AI

The 96th JPEG meeting represents an important milestone for the JPEG AI standardisation as it marks the beginning of the collaborative phase of this project. The main JPEG AI objective is to design a solution that offers significant compression efficiency improvement over coding standards in common use at equivalent subjective quality and an effective compressed domain processing for machine learning-based image processing and computer vision tasks. 
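Learned image codecs of the kind targeted by JPEG AI are typically trained with a rate-distortion objective of the form L = R + λ·D, trading bitrate against reconstruction quality. The sketch below estimates the rate term from the Shannon entropy of quantized latents; it is an illustrative simplification, not the JPEG AI training recipe, and both helper names are hypothetical.

```python
import math
from collections import Counter

def estimated_rate_bits(latents):
    """Shannon-entropy estimate (bits/symbol) of a sequence of quantized
    latent values; real codecs use a learned entropy model instead."""
    counts = Counter(latents)
    n = len(latents)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def rd_loss(latents, mse, lam):
    """Rate-distortion objective L = R + lambda * D (illustrative)."""
    return estimated_rate_bits(latents) + lam * mse
```

Sweeping λ traces out the rate-distortion curve of a learned codec, which is also how the CfP responses were compared against the conventional anchors.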

During the 96th JPEG meeting, several activities occurred, notably the presentation of the eleven responses to all tracks of the Call for Proposals (CfP). Furthermore, discussions took place on the evaluation process used to assess submissions to the CfP, namely subjective, objective, and complexity assessment, as well as the identification of device interoperability issues by cross-checking. For the standard reconstruction track, several contributions showed significantly higher compression efficiency, in both subjective quality methodologies and objective metrics, when compared to the best-performing conventional image coding.

From the analysis and discussion of the results obtained, the most promising technologies were identified and a new JPEG AI verification model under consideration (VMuC) was approved. The VMuC corresponds to a combination of two proponents’ solutions (following the ‘one tool for one functionality’ principle), selected by consensus and considering the CfP decision criteria and factors. In addition, a set of JPEG AI Core Experiments were defined to obtain further improvements in both performance efficiency and complexity, notably the use of learning-based GAN training, alternative analysis/synthesis transforms and an evaluation study for the compressed-domain denoising as an image processing task. Several further activities were also discussed and defined, such as the design of a compressed domain image classification decoder VMuC, the creation of a large screen content dataset for the training of learning-based image coding solutions and the definition of a new and larger JPEG AI test set.

JPEG Pleno Point Cloud begins collaborative standardisation phase

JPEG Pleno integrates various modalities of plenoptic content under a single framework in a seamless manner. Efficient and powerful point cloud representation is a key feature of this vision. A point cloud refers to data representing positions of points in space, expressed in a given three-dimensional coordinate system, the so-called geometry. This geometrical data can be accompanied by per-point attributes of varying nature (e.g. color or reflectance). Such datasets are usually acquired with a 3D scanner, LIDAR or created using 3D design software and can subsequently be used to represent and render 3D surfaces. Combined with other types of data (like light field data), point clouds open a wide range of new opportunities, notably for immersive browsing and virtual reality applications.
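The data layout described above can be pictured as an N×3 geometry array plus per-point attributes (e.g., RGB color). Voxelization, i.e., quantizing coordinates to an integer grid and removing duplicates, is a typical first step in point cloud codecs; the minimal sketch below is illustrative only and not part of ISO/IEC 21794.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Quantize an (N, 3) array of float coordinates to an integer voxel
    grid and drop duplicate occupied voxels (illustrative sketch)."""
    grid = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(grid, axis=0)  # one row per occupied voxel
```

In a full codec, the occupied voxels (geometry) are coded first, and per-point attributes such as color are then coded conditioned on that geometry; learning-based proposals replace the hand-designed transforms in both steps.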

Learning-based solutions are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g., image classification, face recognition and object segmentation, but also 3D processing tasks, e.g. visual enhancement and super-resolution. Recently, learning-based point cloud coding solutions have shown great promise to achieve competitive compression efficiency compared to available conventional point cloud coding solutions at equivalent subjective quality. Building on a history of successful and widely adopted coding standards, JPEG is well positioned to develop a standard for learning-based point cloud coding.

During its 94th meeting, the JPEG Committee released a Final Call for Proposals on JPEG Pleno Point Cloud Coding. This call addressed learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. During its 96th meeting, the JPEG Committee evaluated 5 codecs submitted in response to this Call. Following a comprehensive evaluation process, the JPEG Committee selected one of the proposals to form the basis of a future standard and initialised a sub-division to form Part 6 of ISO/IEC 21794. The selected submission was a learning-based approach to point cloud coding that met the requirements of the Call and showed competitive performance, both in terms of coding geometry and color, against existing solutions.

JPEG Fake Media and NFT

At the 96th JPEG meeting, 6 pre-registrations to the Final Call for Proposals (CfP) on JPEG Fake Media were received. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The CfP welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents who have not yet made a pre-registration are still welcome to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

In parallel with the work on Fake Media, JPEG explores use cases and requirements related to Non Fungible Tokens (NFTs). Although the use cases between both topics are different, there is a significant overlap in terms of requirements and relevant solutions. The presentations and video recordings of the joint 5th JPEG NFT and Fake Media Workshop that took place prior to the 96th meeting are available on the JPEG website. In addition, a new version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

JPEG Systems

During the 96th JPEG Meeting, the IS texts for both JLINK (ISO/IEC 19566-7) and JPEG Snack (ISO/IEC 19566-8) were prepared and submitted for final publication. JLINK specifies a format to store multiple images inside JPEG files and supports interactive navigation between them. JLINK addresses use cases such as virtual museum tours, real estate visits, hotspot zooms into other images, and many others. JPEG Snack, on the other hand, enables self-running multimedia experiences such as animated image sequences and moving image overlays. Both standards are based on the JPEG Universal Metadata Box Format (JUMBF, ISO/IEC 19566-5), for which a second edition is in progress. This second edition adds extensions for native support of CBOR (Concise Binary Object Representation) and for attaching private fields to the JUMBF Description Box.

JPEG Pleno Light Field

During its 96th meeting, the JPEG Committee released the “JPEG Pleno Second Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices for light field subjective quality evaluation methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among JPEG experts following a collaborative process approach. The Final Call for Contributions will be issued at the 97th JPEG meeting. The deadline for submission of contributions is 1 April 2023.

A JPEG Pleno Light Field AhG has also started the preparation of two workshops: a first one on subjective light field quality assessment and a second one on learning-based light field coding. Their goals are to exchange experiences and to present technological advances and research results on light field subjective quality assessment and on learning-based coding solutions for light field data, respectively.

JPEG AIC

During its 96th meeting, a Second Draft Call for Contributions on Subjective Image Quality Assessment was issued. The final Call for Contributions is now planned to be issued at the 97th JPEG meeting. The standardization process will be collaborative from the very beginning, i.e. all submissions will be considered in developing the next extension of the JPEG AIC standard. The deadline for submissions has been extended to 1 April 2023 at 23:59 UTC. Multiple types of contributions are accepted, namely subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature. A dataset of sample images with compression-based distortions in the target quality range is planned to be prepared for the 97th JPEG meeting.

JPEG XS

With the 2nd edition of JPEG XS now in place, the JPEG Committee continues with the development of the 3rd edition of JPEG XS Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but for specific content such as screen content with half of the required bandwidth. In this respect, experiments have indicated that it is possible to increase the quality in static regions of an image sequence by more than 10 dB when compared to the 2nd edition. Based on the input contributions, a first working draft for ISO/IEC 21122-1 has been created, along with the necessary core experiments for further evaluation and verification.

In addition, JPEG has finalized the work on the amendment for Part 2 2nd edition that defines a new High 4:2:0 profile and the new sublevel Sublev4bpp. This amendment is now ready for publication by ISO. In the context of Part 4 (Conformance testing) and Part 5 (Reference software), the JPEG Committee decided to make both parts publicly available.

Finally, the JPEG Committee decided to create a series of public documents, called the “JPEG XS in-depth series” that will explain various features and applications of JPEG XS to a broad audience. The first document in this series explains the advantages of using JPEG XS for raw image compression and will be published soon on jpeg.org.

JPEG 2000

The JPEG Committee published a case study that compares HT2K, ProRes and JPEG 2000 Part 1 when processing motion picture content with widely available commercial software tools running on notebook computers, available at https://ds.jpeg.org/documents/jpeg2000/wg1n100269-096-COM-JPEG_Case_Study_HTJ2K_performance_on_laptop_desktop_PCs.pdf

JPEG 2000 is widely used in the media and entertainment industry for Digital Cinema distribution, studio video masters, and broadcast contribution links. High Throughput JPEG 2000 (HTJ2K, or JPEG 2000 Part 15) is an update to JPEG 2000 that provides an order-of-magnitude speed-up over legacy JPEG 2000 Part 1.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as it is particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 96th JPEG meeting, a new version of the overview document on Use Cases and Requirements for DNA-based Media Storage was issued and has been made publicly available. The JPEG Committee also updated two additional documents, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions, in order to allow for concrete exploration experiments to take place. This will allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and, in particular, to include biochemical noise simulation, which is an essential element in practical implementations. A new branch has been created in the JPEG GitLab that now contains two anchors and two JPEG DNA benchmark codecs.

Final Quote

“After successful calls for contributions, the JPEG Committee sets precedence by launching the collaborative phase of two learning based visual information coding standards, hence announcing the start of a new era in coding technologies relying on AI.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 97, will be held online from 24-28 October 2022.
  • No 98, to be held in Sydney, Australia from 14-20 January 2023.

MPEG Column: 139th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 139th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Issues Call for Evidence for Video Coding for Machines (VCM)
  • MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption
  • MPEG Completes the Third Edition of the Common Media Application Format (CMAF) by adding Support for 8K and High Frame Rate for High Efficiency Video Coding
  • MPEG Scene Descriptions adds Support for Immersive Media Codecs
  • MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering
  • MPEG Starts New Edition of Video Coding-Independent Code Points Standard
  • MPEG White Paper on the Third Edition of the Common Media Application Format

In this report, I’d like to focus on VCM, Green Metadata, CMAF, VSEI, and a brief update about DASH (as usual).

Video Coding for Machines (VCM)

MPEG’s exploration work on Video Coding for Machines (VCM) aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. With newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. They also create a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage may differ from conventional human-viewing-oriented applications to achieve optimized performance.

With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has rapidly grown. Typical use cases include intelligent transportation, smart city technology, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing features from a video is essential for efficient transmission and storage. Feature compression technology solicited in this Call for Evidence (CfE) can also be helpful in other regards, such as computational offloading and privacy protection.

Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines. The evaluation framework of VCM depicted below comprises neural network tasks (typically informative) at both ends as well as a VCM encoder and a VCM decoder. The normative part of VCM typically includes the bitstream syntax, which implicitly defines the decoder, whereas other parts are usually left open for industry competition and research.
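To make the rate-distortion-based metrics concrete, the sketch below computes a Bjøntegaard-style delta-rate in which a machine-task metric (e.g., detection mAP) takes the place of PSNR on the distortion axis. The data points and the cubic-fit approach are illustrative only; they are not the official VCM evaluation software or its anchor results.

```python
# Illustrative Bjontegaard-style delta-rate over (bitrate, task accuracy)
# points, comparing a hypothetical feature codec against an anchor.
import numpy as np

def bd_rate(anchor, test):
    """Average bitrate difference (%) at equal task accuracy.

    anchor, test: lists of (bitrate_kbps, accuracy) points.
    """
    r_a = np.log(np.array([p[0] for p in anchor]))
    d_a = np.array([p[1] for p in anchor])
    r_t = np.log(np.array([p[0] for p in test]))
    d_t = np.array([p[1] for p in test])
    # Fit cubic polynomials of log-rate as a function of accuracy.
    p_a = np.polyfit(d_a, r_a, 3)
    p_t = np.polyfit(d_t, r_t, 3)
    lo, hi = max(d_a.min(), d_t.min()), min(d_a.max(), d_t.max())
    # Integrate both fits over the overlapping accuracy interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100  # negative means the test codec saves rate

# Invented numbers: the test codec reaches each accuracy at 80% of the anchor rate.
anchor = [(100, 0.40), (200, 0.45), (400, 0.48), (800, 0.50)]
test   = [(80, 0.40), (160, 0.45), (320, 0.48), (640, 0.50)]
print(round(bd_rate(anchor, test), 1))  # -20.0
```

The design mirrors classical BD-rate evaluation; only the distortion axis changes, which is exactly why machine-oriented codecs need task-specific anchors rather than PSNR curves.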

Further details about the CfP and how interested parties can respond can be found in the official press release here.

Research aspects: the main research area for coding-related standards is certainly compression efficiency (and probably runtime). However, this video coding standard will not target humans but machines as video consumers. Thus, video quality and, in particular, Quality of Experience need to be interpreted differently, which could be another worthwhile research dimension to study in the future.

Green Metadata

MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption to the complexity of the bitstream. Many modern video decoder implementations can adjust their operating voltage or clock frequency to match the required computational power. Thus, if the decoder implementation knows the variation in the complexity of the incoming bitstream, it can adjust its power consumption level accordingly. This allows less energy use in general and extended video playback on battery-powered devices.
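The client-side idea can be sketched as follows: the bitstream signals the complexity of upcoming pictures, and the decoder picks the lowest operating point that still meets the frame deadline. The complexity units, frequency table, and cycle model below are invented for illustration; the actual metadata syntax is defined in the Green Metadata standard (ISO/IEC 23001-11).

```python
# Hypothetical DVFS selection driven by signalled decoding complexity.
FREQ_LEVELS_MHZ = [300, 600, 900, 1200]   # assumed DVFS operating points
CYCLES_PER_COMPLEXITY_UNIT = 20_000       # assumed decoder cost model
FRAME_DEADLINE_MS = 33.3                  # ~30 fps real-time budget

def pick_frequency(signaled_complexity: int) -> int:
    """Return the lowest frequency (MHz) that decodes the frame in time."""
    cycles = signaled_complexity * CYCLES_PER_COMPLEXITY_UNIT
    for f in FREQ_LEVELS_MHZ:
        decode_time_ms = cycles / (f * 1_000)  # MHz -> cycles per millisecond
        if decode_time_ms <= FRAME_DEADLINE_MS:
            return f
    return FREQ_LEVELS_MHZ[-1]  # saturate at the highest operating point

# A low-complexity frame lets the decoder stay at a low clock:
print(pick_frequency(400))   # 300
# A demanding frame forces a higher operating point:
print(pick_frequency(1500))  # 1200
```

The energy saving comes precisely from the advance knowledge: without the metadata, a decoder must provision for the worst case and run at the high clock throughout.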

The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.

Research aspects: reducing global greenhouse gas emissions will certainly be a challenge for humanity in the upcoming years. The data volume on today’s internet is dominated by video, all of which consumes energy from production to consumption. Therefore, there is a strong need for explicit research efforts to make video streaming environmentally friendly in all its facets.

Third Edition of Common Media Application Format (CMAF)

The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, a media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier, 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile is branded as ‘c8k0’ and supports video with up to 7680×4320 pixels (8K) at up to 60 frames per second. Regarding the latter, another media profile has been added, branded as ‘c8k1’, which supports HEVC-encoded video with up to 8K resolution at up to 120 frames per second. Finally, chroma location indication support has been added to the third edition of CMAF.
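In practice, a player discovers such media profiles through the brand list of the ISOBMFF `ftyp` box. The sketch below parses a synthetic `ftyp` box and checks for the new 8K brands; the file bytes are constructed for demonstration and do not come from a real CMAF asset.

```python
# Illustrative ftyp parsing to detect the new CMAF 8K HEVC brands
# ('c8k0' for up to 60 fps, 'c8k1' for HFR up to 120 fps).
import struct

def compatible_brands(data: bytes):
    """Parse the leading 'ftyp' box and return (major brand, compatible brands)."""
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("file does not start with an ftyp box")
    major = data[8:12].decode("ascii")
    # After the 4-byte minor_version, the rest of the box holds
    # the compatible brands, 4 bytes each.
    compat = [data[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return major, compat

# Synthetic 28-byte ftyp box: major brand 'cmf2', minor_version 0,
# compatible brands 'c8k0', 'hev1', 'cmfc'.
ftyp = (struct.pack(">I4s", 28, b"ftyp")
        + b"cmf2" + b"\x00\x00\x00\x00"
        + b"c8k0" + b"hev1" + b"cmfc")
major, brands = compatible_brands(ftyp)
print(major, brands)     # cmf2 ['c8k0', 'hev1', 'cmfc']
print("c8k0" in brands)  # True
```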

Research aspects: basically, CMAF serves two purposes: (i) harmonizing DASH and HLS at the segment format level by adopting the ISOBMFF and (ii) enabling low latency streaming applications by introducing chunks (that are smaller than segments). The third edition supports resolutions up to 8K and HFR, which raises the question of how low latency can be achieved for 8K/HFR applications and services and under which conditions.

New Amendment for Versatile Supplemental Enhancement Information (VSEI) containing Technology for Neural Network-based Post Filtering

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the Supplemental Enhancement Information (SEI) message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication, which is relevant for variable-resolution video streaming, this new amendment contains two SEI messages for describing and activating post filters using neural network technology in video bitstreams. Such filters can be used for coding-noise reduction, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).
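The two-message design (one describing a filter, one activating it) can be sketched at a conceptual level as below. The class names, fields, and the toy "filter" are invented for illustration and do not follow the VSEI syntax; in the real amendment, the filter description carries an NNC-coded network (ISO/IEC 15938-17).

```python
# Conceptual sketch of decoder-side handling of the two new SEI kinds.
from dataclasses import dataclass

@dataclass
class NNPostFilterCharacteristics:
    """Hypothetical stand-in for the SEI message describing a filter."""
    filter_id: int
    purpose: str      # e.g. "denoise" or "upsample"
    payload: bytes    # in the real SEI, an NNC-coded neural network

@dataclass
class NNPostFilterActivation:
    """Hypothetical stand-in for the SEI message activating a filter."""
    filter_id: int

def decode_with_post_filter(frames, sei_stream, filters):
    """Apply the currently activated post-filter to each decoded frame."""
    known = {}      # filter_id -> characteristics seen so far
    active = None   # currently activated filter_id
    out = []
    for frame, seis in zip(frames, sei_stream):
        for sei in seis:
            if isinstance(sei, NNPostFilterCharacteristics):
                known[sei.filter_id] = sei
            elif isinstance(sei, NNPostFilterActivation):
                active = sei.filter_id
        if active in known:
            frame = filters[active](frame)  # run the signalled network
        out.append(frame)
    return out

# Toy usage: a "filter" that brightens each sample by 1 stands in for the
# neural network that would be reconstructed from the SEI payload.
filters = {1: lambda f: [min(255, p + 1) for p in f]}
frames = [[10, 20], [30, 40]]
sei_stream = [
    [NNPostFilterCharacteristics(1, "denoise", b""), NNPostFilterActivation(1)],
    [],  # in this toy model, activation persists for subsequent pictures
]
print(decode_with_post_filter(frames, sei_stream, filters))  # [[11, 21], [31, 41]]
```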

Research aspects: quality enhancements such as reducing coding noise, upsampling, colour improvement, or denoising have been researched quite substantially, either with or without neural networks. Enabling such quality enhancements via (V)SEI messages enables system-level support for research and development efforts in this area, for example, integration into video streaming applications and/or conversational services, including performance evaluations.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 139th MPEG meeting, MPEG Systems issued a new working draft related to Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) have been updated. Finally, a new part has been added (ISO/IEC 23009-9), called encoder and packager synchronization, for which a working draft has also been produced. Publicly available documents (if any) can be found here.

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: in the Christian Doppler Laboratory ATHENA we aim to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the phases (i) multimedia content provisioning (i.e., video coding), (ii) content delivery (i.e., video networking), and (iii) content consumption (i.e., video player incl. ABR and QoE) in the media delivery chain, as well as for (iv) end-to-end aspects, with a focus on, but not limited to, HTTP Adaptive Streaming (HAS). Recent DASH-related publications include “Low Latency Live Streaming Implementation in DASH and HLS” and “Segment Prefetching at the Edge for Adaptive Video Streaming”, among others.

The 140th MPEG meeting will be face-to-face in Mainz, Germany, from October 24-28, 2022. Click here for more information about MPEG meetings and their developments.

JPEG Column: 95th JPEG Meeting

JPEG issues a call for proposals for JPEG Fake Media

The 95th JPEG meeting was held online from 25 to 29 April 2022. A Call for Proposals (CfP) was issued for JPEG Fake Media, which aims at a standardisation framework for secure annotation of modifications in media assets. With this new initiative, JPEG endeavours to provide standardised means for identifying the provenance of media assets that include imaging information. Assuring the provenance of the coded information is essential considering current trends and possibilities in multimedia technology.

JPEG Fake Media standardisation aims at the identification of image provenance.

This new initiative complements the ongoing standardisation of machine learning based codecs for images and point clouds. Both are expected to revolutionise coding standards, leading to compression rates beyond the current state of the art.

The 95th JPEG meeting had the following highlights:

  • JPEG Fake Media issues a Call for Proposals;
  • JPEG AI;
  • JPEG Pleno Point Cloud Coding;
  • JPEG Pleno Light Fields quality assessment;
  • JPEG AIC near perceptual lossless quality assessment;
  • JPEG NFT exploration;
  • JPEG DNA explorations;
  • JPEG XS 2nd edition published;
  • JPEG XL 2nd edition.

The following summarises the major achievements of the 95th JPEG meeting.

JPEG Fake Media

At its 95th JPEG meeting, the committee issued a Final Call for Proposals (CfP) on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The Call for Proposals welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents are highly encouraged to express their interest in submitting a proposal before 20 July 2022 and to submit their final proposals before 19 October 2022. Full details about the timeline, submission requirements, and evaluation processes are documented in the CfP available on jpeg.org.

JPEG AI

Following the JPEG AI joint ISO/IEC/ITU-T Call for Proposals issued after the 94th JPEG meeting, 14 registrations were received, among which 12 codecs were submitted for the standard reconstruction task. For computer vision and image processing tasks, several teams submitted compressed-domain decoders, notably 6 for image classification. Prior to the 95th JPEG meeting, the work focused on managing the Call for Proposals submissions, creating the test sets, and generating anchors for the standard reconstruction, image processing, and computer vision tasks. Moreover, a dry run of the subjective evaluation of the JPEG AI anchors was performed with expert subjects, and the results were analysed during this meeting, followed by additions and corrections to the JPEG AI Common Training and Test Conditions and the definition of several recommendations for the evaluation of the proposals, notably the selection of anchors, images, and bitrates. A procedure for cross-check evaluation was also discussed and approved. The work will now focus on the evaluation of the Call for Proposals submissions, which is expected to be finalized at the 96th JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including metaverse, autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 95th JPEG meeting, the JPEG Committee reviewed the responses to the Final Call for Proposals on JPEG Pleno Point Cloud Coding. Four responses have been received from three different institutions. At the upcoming 96th JPEG meeting, the responses to the Call for Proposals will be evaluated with a subjective quality evaluation and objective metric calculations.

JPEG Pleno Light Field

The JPEG Pleno standard tools provide a framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image modalities addressed by the current standardization activities are light field, holography, and point clouds, where these image modalities describe different sampled representations of the plenoptic function. Therefore, to properly assess the quality of these plenoptic modalities, specific subjective and objective quality assessment methods need to be designed.

In this context, JPEG has launched a new standardisation effort known as JPEG Pleno Quality Assessment. It aims at providing a quality assessment standard, defining a framework that includes subjective quality assessment protocols and objective quality assessment procedures for lossy decoded data of plenoptic modalities for multiple use cases and requirements. The first phase of this effort will address the light field modality.

To assist this task, JPEG has issued the “JPEG Pleno Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices with regard to light field subjective quality assessment methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach.

The Final Call for Contributions will be issued at the 96th JPEG meeting. The deadline for submission of contributions is 18 December 2022.

JPEG AIC

During the 95th JPEG Meeting, the committee released the Draft Call for Contributions on Subjective Image Quality Assessment.

The new JPEG AIC standard will be developed considering all the submissions to the Call for Contributions in a collaborative process. The deadline for submissions is 14 October 2022. Multiple types of contributions are accepted, notably subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature.

The JPEG AIC committee has also started the preparation of a workshop on subjective assessment methods for the investigated quality range, which will be held at the end of June. The workshop targets obtaining different views on the problem, and will include both internal and external speakers, as well as a Q&A panel. Experts in the field of quality assessment and stakeholders interested in the use cases are invited.

JPEG NFT

After the joint JPEG NFT and Fake Media workshops it became evident that even though the use cases between both topics are different, there is a significant overlap in terms of requirements and relevant solutions. For that reason, it was decided to create a single AHG that covers both JPEG NFT and JPEG Fake Media explorations. The newly established AHG JPEG Fake Media and NFT will use the JPEG Fake Media mailing list.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as these are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on DNA synthetic polymers. A new version of the overview document on DNA-based Media Storage: State-of-the-Art, Challenges, Use Cases and Requirements was issued and has been made publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, including biochemical noise simulation. During the 95th JPEG meeting, a new document describing the Use Cases and Requirements for DNA-based Media Storage was created and has been made publicly available. A timeline for the standardization process was also defined. Interested parties are invited to consider joining the effort by registering to the JPEG DNA AHG mailing list.
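To illustrate what "quaternary representation under biochemical constraints" means, the toy sketch below maps bytes onto the nucleotide alphabet {A, C, G, T} while avoiding homopolymer runs (repeated bases), a classic constraint in DNA storage since such runs are error-prone to synthesize and sequence. This rotating-code scheme is illustrative only and is not the JPEG DNA benchmark codec.

```python
# Toy DNA storage mapping: base-3 digits select one of the three bases
# that differ from the previously emitted base, so no base ever repeats.
ALPHABET = "ACGT"

def encode(data: bytes) -> str:
    """Map bytes to a nucleotide strand with no repeated base."""
    n = int.from_bytes(data, "big")
    trits = []
    while n:
        n, r = divmod(n, 3)
        trits.append(r)
    if not trits:
        trits = [0]
    trits.reverse()  # most-significant trit first
    prev, strand = None, []
    for t in trits:
        choices = [b for b in ALPHABET if b != prev]  # 3 legal next bases
        prev = choices[t]
        strand.append(prev)
    return "".join(strand)

def decode(strand: str, length: int) -> bytes:
    """Invert encode(); `length` restores any leading zero bytes."""
    prev, n = None, 0
    for base in strand:
        choices = [b for b in ALPHABET if b != prev]
        n = n * 3 + choices.index(base)
        prev = base
    return n.to_bytes(length, "big")

strand = encode(b"JPEG")
print(decode(strand, 4))  # b'JPEG'
# Constraint check: no two consecutive bases are equal.
print(any(a == b for a, b in zip(strand, strand[1:])))  # False
```

The cost of the constraint is visible in the rate: each nucleotide carries log2(3) ≈ 1.58 bits instead of the unconstrained 2 bits, which is exactly the kind of trade-off a JPEG DNA codec must balance against robustness.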

JPEG XS

The JPEG Committee is pleased to announce that the 2nd editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) were published in March 2022. Furthermore, the committee finalized the work on Part 4 (Conformance testing) and Part 5 (Reference software), which are now entering the final phase for publication. With these last two parts, the committee’s work on the 2nd edition of the JPEG XS standards comes to an end, allowing the focus to shift to further improving the standard. Meanwhile, in response to the latest Use Cases and Requirements for JPEG XS v3.1, the committee received a number of technology proposals from Fraunhofer and intoPIX that focus on improving the compression performance for desktop content sequences. The proposals will now be evaluated and thoroughly tested and will form the foundation of the work towards a 3rd edition of the JPEG XS suite of standards. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition with half of the required bandwidth.

JPEG XL

The second edition of JPEG XL Part 1 (Core coding system), with an improved numerical stability of the edge-preserving filter and numerous editorial improvements, has proceeded to the CD stage. Work on a second edition of Part 2 (File format) was initiated. Hardware coding was also further investigated. Preliminary software support has been implemented in major web browsers, image viewing and editing software, including popular tools such as FFmpeg, ImageMagick, libvips, GIMP, GDK and Qt. JPEG XL is now ready for wide-scale adoption.

Final Quote

“Recent development on creation and modification of visual information call for development of tools that can help protecting the authenticity and integrity of media assets. JPEG Fake Media is a standardised framework to deal with imaging provenance.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 96, to be held online from 25 to 29 July 2022.