Standards – Page 4 – ACM SIGMM Records

JPEG Column: 98th JPEG meeting in Sydney, Australia

By Antonio Pinheiro | April 12, 2023 - 10:14 |May 8, 2023 0223, Event Report, Feature, Standards

JPEG explores standardization in event-based imaging

The 98^th JPEG meeting was held in Sydney, Australia, from the 16^th to 20^th January 2023. This was a welcome return to face-to-face meetings after a long period of online meetings due to Covid-19 pandemics. Interestingly, the previous face-to-face meeting of the JPEG Committee was also held in Sydney, in January 2020. The face-to-face 98^th JPEG meeting was complemented with online connections to allow the remote participation of those who could not be present.

The recent calls for proposals, such as JPEG Fake Media, JPEG AI and JPEG Pleno Learning Based Point Cloud Coding, resulted in a very dynamic and participative meeting in Sydney, with multiple technical sessions and decisions. Exploration activities such as JPEG DNA and JPEG NFT also produced drafts of future calls for proposals as a consequence of reaching sufficient maturity.

Furthermore, and considering the current trends in machine-based imaging applications, the JPEG Committee initiated an exploration on standardization in event-based imaging.

The 98^th JPEG meeting had the following highlights:

New JPEG exploration in event-based imaging;
JPEG Fake Media and NFT;
JPEG AI;
JPEG Pleno Learning-based Point Cloud Coding improves its Verification Model;
JPEG AIC prepares the analysis of the responses to the Call for Contribution;
JPEG XL second editions;
JPEG Systems;
JPEG DNA prepares its call for proposals;
JPEG XS 3rd Edition;
JPEG 2000 guidelines.

The following summarizes the major achievements during the 98^th JPEG meeting.

New JPEG exploration in event-based imaging

The JPEG Committee has started a new exploration activity on event-based imaging named JPEG XE.

Event-based Imaging revolves around a new and emerging image modality created by event-based visual sensors. Event-based sensors are the foundation for a new class of cameras that allow the efficient capture of visual information at high speed while at the same time requiring low computational cost, a requirement which it is common in many machine vision applications. Such sensors are modeled based on the mechanisms of the human visual system for the detection of scene changes and the asynchronous capture of those changes. This means that every pixel works individually to detect scene changes and creates the associated events. If nothing happens, then no events are generated. This contrasts with conventional image sensors, where pixels are sampled in a continuous and periodic manner, with images generated regardless of any changes in the scene and a risk of reacting with delay and even missing quick changes.

The JPEG Committee recognizes that this new image modality opens doors to a large number of applications where capture and processing of visual information is needed. Currently, there is no standard format to represent event-based information, and therefore existing and emerging applications are fragmented and lack interoperability. The new JPEG XE activity focuses on establishing a scope and relevant definitions, collecting use cases and their associated requirements, and investigating the role that JPEG can play in the definition of timely standards in the near- and long-term. To start, an Ad-hoc Group has been established. To stay informed about the activities please join the event based imaging Ad-hoc Group mailing list.

JPEG Fake Media and NFT

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media assets creation and modifications. During the 98th meeting, the JPEG Committee finalised the evaluation of the six submitted proposals and initiated the process for establishing a new standard.

The JPEG Committee also continues to explore use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases for both topics are very different, there is a clear commonality in terms of requirements and relevant solutions. An updated version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

To stay informed about the activities, please join the mailing list of the Ad-hoc Group and regularly check the JPEG website for the latest information.

JPEG AI

Following the creation of the JPEG AI Verification Model at the previous 97th JPEG meeting, more discussions occurred at the 98th meeting to improve the coding efficiency, and complexity, especially on the decoder side. The JPEG AI VM has several unique characteristics, such as a parallelizable context model to perform latent prediction, decoupling of prediction and sample reconstruction, and rate adaptation, among others. JPEG AI VM shows up to 31% compression gain over VVC Intra for natural content. A new JPEG AI test set was released during the 98th meeting. This is a large dataset for the evaluation of the JPEG AI VM containing 50 images, with the objective of tracking the performance improvements at every meeting. The JPEG AI Common Training and Test Conditions were updated to include this new dataset. In this meeting, it was also decided to integrate several changes into the JPEG AI VM, speeding up training, improving performance at high rates and fixing bugs. A set of core experiments were established at this meeting targeting RD performance and complexity improvements. The JPEG AI VM Software Guidelines were approved, describing the initial setup repository of JPEG AI VM, how to obtain the JPEG AI dataset, and how to run tests and training. A description of the structure of the JPEG AI VM repository was also made available.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a number of technical submissions for improvements to the VM in the area of colour coding, artefact processing and improvements to coding speed. In addition, the JPEG Committee released the “Call for Content for JPEG Pleno Point Cloud Coding” to expand on the current training and test set with new point clouds representing key use cases. Prior to the 99^th JPEG Meeting, JPEG experts will promote the Call for Content as well as investigate possible advancements to the VM in the areas of auto-regressive entropy encoding, sparse tensor convolution, meta-data controlled post-filtering of colour and a flexible split geometry and colour coding framework for the VM.

JPEG AIC

During the 98th JPEG meeting in Sydney, Australia, Exploration Study 1 on JPEG AIC was established. This exploration study will collect results from three types of previously standardized subjective evaluation methodologies in order to provide an informative reference for the JPEG AIC submissions to the Call for Contributions that are due by April 1^st, 2023. Corrections and additions to the JPEG AIC Common Test Conditions were issued in order to reflect the addition of a new codec for testing content generation and a new anchor subjective quality assessment methodology.

The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will focus on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by any previous AIC standards.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have reached the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Also, an updated version of the JPEG XL White Paper has been published and is freely available through jpeg.org.

JPEG Systems

The JLINK standard (19566-7:2022) is now published by ISO. JLINK specifies an image file format capable of linking multiple media elements, such as image and text in any JPEG file format. It enables enhanced curated experiences of a set of images for education, training, virtual museum tours, travelogs, and similar visually-oriented content.

The JPEG Snack (19566-8) standard is expected to be published in February 2023. JPEG Snack specifies the coding of audio, picture, multimedia and hypermedia information, enabling a rich, image-based, short-form animated experiences for social media.

The second edition of JUMBF (JPEG Universal Metadata Box Format, 19566-5) is progressing to IS stage; the second edition brings new capabilities and support for additional types of media.

JPEG DNA

The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 98th JPEG meeting, a draft Call for Proposals for JPEG DNA was issued and made public, as a first concrete step towards standardisation. The draft call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future responses to the Call for Proposals. The final Call for Proposals for JPEG DNA is expected to be released at the conclusion of the 99th JPEG meeting in April 2023, after a set of exploration experiments have validated the procedures outlined in the draft Call for Proposals for JPEG DNA and JPEG DNA Common Test Conditions. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023 with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.

JPEG XS

The JPEG Committee continued with the definition of JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Committee Draft for Part 1 (Core coding system) will proceed to ISO ballot. This means that the standard is now technically defined, and all the new coding tools are known. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. This new coding tool is of extreme importance for remote desktop applications and screen sharing. In addition, mathematically lossless coding can now support up to 16 bits precision (up from 12 bits). For Part 2 (Profiles and buffer models), the committee created a second Working Draft and issued further core experiments to proceed and support this work. Meanwhile, ISO approved the creation of a new edition of Part 3 (Transport and container formats) that is needed to address the changes of Part 1 and Part 2.

JPEG 2000

The JPEG committee publishes two sets of guidelines for implementers of JPEG 2000, available on jpeg.org.

The first describes an algorithm for controlling JPEG 2000 coding quality using a single number (Qfactor) between 1 (worst quality) and 100 (best quality), as is commonly done with JPEG.

The second explains how to create, parse and use HTJ2K placeholder passes and HT Sets. These features are an integral part of HTJ2K and enable mathematically lossless transcoding between HT- and J2K-based codestreams, among other applications.

Final Quote

“The interest in event-based imaging has been rising with several products designed and offered by the industry. The JPEG Committee believes in interoperable solutions and has initiated an exploration for standardization of event-based imaging in order to accelerate creation of an ecosystem.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

No 99, will be online from 24-28 April 2023
No 100, will be in Covilhã, Portugal from 17-21 July 2023

VQEG Column: VQEG Meeting December 2022

By Jesús Gutiérrez | January 31, 2023 - 12:55 |January 31, 2023 0123, Event Report, Feature, Standards

Leave a comment

Introduction

This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries around the world registered for the meeting that was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on Youtube.

Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the proposals to update and merge ITU-T recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and greening of streaming and related trends.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 12-16 December 2022 (online).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK), presented her work on converting video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting the content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and dealt with video codec profiling with video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for the modelling and the prediction of a Satisfied User Ratio for compressed videos [2].

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research was announced, which was proposed by Elizabeth Krupinski and will take place in the European Congress of Radiology (Vienna, Austria, Mar. 2023).

In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with the evaluation of the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France), presented his work on deep-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition inspired diagnostic image quality models, emphasising the need of distinguishing among interpretability (e.g., medical professional is confident in making a diagnosis), adequacy (e.g., capture technique shows the right area for assessment), and visual quality (e.g., MOS) in quality assessment of medical contents.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete and make P.913 the only recommendation from ITU-T on subjective video quality assessments. The group worked on the liaison and document to be sent to ITU-T SG12 and will be available in the meeting files.

In addition, Mohsen Jenadeleh (Univerity of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with Flicker Test and QUEST+.

Computer Generated Imagery (CGI)

CGI group is devoted to analysing and evaluating computer-generated content, with a focus on gaming in particular. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, they are working on two new work items: G.OMMOG on Opinion Model for Mobile Online Gaming applications and P.CROWDG on Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying other topics and interests in CGI rather than gaming content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata.

In relation to the first topic, Margaret Pinson (NTIA/ITS, US), talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3] and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedeman (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.

Regarding the computation of complexity indices, Maria Martini (Kingston University, UK) presented a study comparing 12 metrics (and possible combinations) for assessing video content complexity. Vignesh V. Menon (University of Klagenfurt, Austria) presented a summary of live per-title encoding approaches using video complexity features. Ioannis Katsavounidis and Cosmin Stejerean (Meta, US) presented their work on using motion search to order videos by coding complexity, also making available the software in open source. In addition, they led a discussion on supplementing classic SI and TI with improved complexity metrics (VCA, motion search, etc.).

Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of metadata of video metrics into the encoded video streams. In addition, a liaison with AOMedia will be established that may go beyond this particular topic. And include best practices on subjective testing, IMG topics, etc.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on a joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. Currently, the group is working on research problems rather than algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources.

Two examples of research problems addressed by the group were shown by the two presentations given by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The topic of the first presentation was related to the training of artificial intelligence observers for a wide range of applications, while the second presentation provided guidelines to train, validate, and publish DNN-based objective measures.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.

Immersive Media Group (IMG)

The IMG group is focused on the research on quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed in the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.

In addition to this, a variety of different topics related to immersive media technologies were covered in the works presented during the meeting. For example, Yaosi Hu (Wuhan University, China) presented her work on video quality assessment based on quality aggregation networks. In relation to light field imaging, Maria Martini (Kingston University, UK) exposed the main problems related to what light field quality assessment datasets are currently meeting and presented a new dataset. Also, there were three talks by researchers from CWI (Netherlands) dealing with point cloud QoE assessment: Silvia Rossi presented a behavioral analysis in a 6-DoF VR system, taking into account the influence of content, quality and user disposition [4]; Shishir Subramanyam presented his work related to the subjective QoE evaluation of user-centered adaptive streaming of dynamic point clouds [5]; and Irene Viola presented a point cloud objective quality assessment using PCA-based descriptors (PointPCA). Another presentation related to point cloud quality assessment was delivered by Marouane Tliba (Université d’Orleans, France), who presented an efficient deep-based graph objective metric.

In addition, Shirin Rafiei (RISE, Sweden) gave a talk on UX and QoE aspects of remote control operations using a laboratory platform, Marta Orduna (Universidad Politécnica de Madrid, Spain) presented her work on comparing ACR, SSDQE, and SSCQE in long duration 360-degree videos, whose results will be used to submit a proposal to extend ITU-T Rec. P.919 for long sequences, and Ali Ak (Nantes Université, France) his work on just noticeable differences to HDR/SDR image/video quality.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work related to the assessment of rail 8K-UHD CCTV facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study video coding for machines through a large-scale evaluation of DNNs robustness to compression artefacts for semantic segmentation [8].

Other updates

In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video on simulated low temperatures in LCD vehicle displays.

In addition, a new group was created in this meeting called Emerging Technologies Group (ETG), whose main objective is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. In particular, two major topics of interest were currently identified: AI-based technologies and greening of streaming and related trends. Nevertheless, the group aims to provide a common platform for people to gather together and discuss new emerging topics, discuss possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest resumes in this effort. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into the JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.

References

[1] Maria G. Martini, “On the relationship between SSIM and PSNR for DCT-based compressed images and video: SSIM as content-aware PSNR”, TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21725390.v1, 2022.
[2] J. Zhu, P. Le Callet; A. Perrin, S. Sethuraman, K. Rahul, “On The Benefit of Parameter-Driven Approaches for the Modeling and the Prediction of Satisfied User Ratio for Compressed Video”, IEEE International Conference on Image Processing (ICIP), Oct. 2022.
[3] Margaret H. Pinson, “Why No Reference Metrics for Image and Video Quality Lack Accuracy and Reproducibility”, Frontiers in Signal Processing, Jul. 2022.
[4] S. Rossi, I. viola, P. Cesar, “Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition”, Proceedings of the 1st Workshop on Interactive eXtended Reality, Oct. 2022.
[5] S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, P. Cesar, “Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds”, International Conference on Quality of Multimedia Experience (QoMEX), Sep. 2022.
[6] M. Leszczuk, L. Janowski, J. Nawała, and A. Boev, “Method for Assessing Objective Video Quality for Automatic License Plate Recognition Tasks”, Communications in Computer and Information Science, Oct. 2022.
[7] L. Lévêque, F. Villoteau, E. V. B. Sampaio, M. Perreira Da Silva, and P. Le Callet, “Comparing the Robustness of Humans and Deep Neural Networks on Facial Expression Recognition”, Electronics, 11(23), Dec. 2022.
[8] A. Marie, K. Desnos, L. Morin, and Lu Zhang, “Video Coding for Machines: Large-Scale Evaluation of Deep Neural Networks Robustness to Compression Artifacts for Semantic Segmentation”, IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2022.

MPEG Column: 140th MPEG Meeting in Mainz, Germany

By Christian Timmerer | January 23, 2023 - 11:22 |January 26, 2023 0123, Event Report, Feature, Standards

Leave a comment

After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany, and the official press release can be found here and comprises the following items:

MPEG evaluates the Call for Proposals on Video Coding for Machines
MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
MPEG reaches the First Milestone for Haptics Coding
MPEG completes a New Standard for Video Decoding Interface for Immersive Media
MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive video data is more and more analyzed without human intervention leading to a new paradigm referred to as Video Coding for Machines (VCM) which targets both (i) conventional video coding and (ii) feature coding (see here for further details).

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, respectively, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by various proposals.

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:

For the tested video dataset increases in compression efficiency of up to 87% compared to the video anchor and over 90% compared to the feature anchor were reported.
For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.

Research aspects: the main research area is still the same as described in my last column, i.e., compression efficiency (incl. probably runtime, sometimes called complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation, object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).

Research aspect: VDI is also a promising standard to enable the implementation of viewport adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, amendments, Defects under Investigation (DuI), and Technologies under Consideration (TuC) are output documents, as well as a new working draft called Redundant encoding and packaging for segmented live media (REAP), which eventually will become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 97th JPEG Meeting

By Antonio Pinheiro | January 22, 2023 - 12:53 |March 1, 2023 0123, Event Report, Feature, Standards

Leave a comment

JPEG initiates specification on fake media based on responses to its call for proposals

The 97^th JPEG meeting was held online from 24 to 28 October 2022. JPEG received responses to the Call for Proposals (CfP) on JPEG Fake Media, the first multimedia international standard designed to facilitate the secure and reliable annotation of media assets creation and modifications. In total six responses were received addressing different requirements in the scope of this standardization initiative. Moreover, relevant advances were made on the standardization of learning-based coding, notably the learning-based coding of images, JPEG AI, and JPEG Pleno point cloud coding. Furthermore, the explorations on quality assessment of images, JPEG AIC, and of JPEG Pleno light field had relevant advances with the definition of their Calls for Contributions and Common Test Conditions.

Also relevant, the 98th JPEG meeting will be held in Sydney, Australia, representing a return to physical meetings after the long COVID pandemics. This is a return, as the last physical meeting was also held in January 2020 in the same location, in Sydney, Australia.

The 97^th JPEG meeting had the following highlights:

JPEG Fake Media responses to the Call for Proposals analysed,
JPEG AI Verification Model,
JPEG Pleno Learning-based Point Cloud coding Verification Model,
JPEG Pleno Light Field issues a Call for Contributions on Subjective Light Field Quality Assessment,
JPEG AIC issues a Call for Contributions on Subjective Image Quality Assessment,
JPEG DNA releases a draft of Common Test Conditions,
JPEG XS prepares third edition of core coding system, and profiles and buffer models,
JPEG 2000 conformance is under development.

Fig. 1: Fake Media application scenarios: Good faith vs Malicious intent.

The following summarises the major achievements of the 97^th JPEG meeting.

JPEG Fake Media

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media assets creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. During the 97th meeting in October 2022, the following six responses to the call were presented:

Adobe/C2PA: C2PA Specification
Huawei: Provenance and Right Management for Digital Contents in JPEG Fake Media
Sony Group Corporation: Methods to keep track provenance of media asset and signing data
Vrije Universiteit Brussel/imec: Media revision history tracking via asset decomposition and serialization
UPC: MIPAMS Provenance module
Newcastle University: Response to JPEG Fake Media standardization call

In the coming months, these proposals will be thoroughly evaluated following a process that is open, transparent, fair and unbiased and allows deep technical discussions to assess which proposals best address identified requirements. Based on the conclusions of these discussions, a new standard will be produced to address fake media and provide solutions for transparency related to media authenticity. The standard will combine the best elements of the six proposals.

To stay informed about the activities please join the JPEG Fake Media & NFT AHG mailing list and regularly check the JPEG website for the latest information.

JPEG AI

JPEG AI (ISO/IEC 6048) aims at the development of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over state-of-the-art image coding standards at similar subjective quality, and improved performance for image processing and computer vision tasks. The evaluation of the Call for Proposals responses had already confirmed the industry interest, and the subjective tests presented at the 96th JPEG meeting showed results that significantly outperform conventional image compression solutions.

The JPEG AI verification model has been issued as the outcome of this meeting and follows the integration effort of several neural networks and tools. There are several characteristics that make the JPEG AI Verification Model (VM) unique, such as the decoupling of the entropy decoding from the sample reconstruction and the exploitation of the spatial correlation between latents using a prediction and a fusion network as well as a massively parallelized auto-regressive network. The performance evaluation has shown significant RD performance improvements (as much as 32.2% of BD-rate over H.266/VVC) with competitive decoding complexity. Other functionalities such as rate adaptation and device interoperability have also been addressed with the use of gain units and the quantization of the weights in the entropy decoding module. Moreover, the adoption process for architectural changes and for new or improved coding tools in JPEG AI VM was approved. A set of core experiments have been defined for improving the JPEG AI VM and target the improvement of the coding efficiency and the reduction of the encoding and decoding complexity. The core experiments represent a set of promising technologies, such as learning-based GAN training, simplification of the analysis/synthesis transform, adaptive entropy coding alphabet, and even encoder-only tools and procedures for training speed-up.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with the successful validation of the Verification Model under Consideration (VMuC). The VMuC was confirmed as the Verification Model (VM) to form the core of the future standard; ISO/IEC 21794 Part 6 JPEG Pleno: Learning-based Point Cloud Coding. The JPEG Committee has commenced work on the Working Draft of the standard, with initial text reviewed at this meeting. Prior to the next 98^th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the area of auto-regressive entropy encoding and sparse tensor convolution as well as sourcing additional point clouds for the JPEG Pleno Point Cloud test set.

JPEG Pleno Light Field

During the 97^th meeting, the JPEG Committee released the “JPEG Pleno Final Call for Contributions on Subjective Light Field Quality Assessment”, to collect new procedures and best practices regarding light field subjective quality evaluation methodologies to assess artifacts induced by coding algorithms. All contributions, including test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach. The deadline for submission of contributions is April 1, 2023.

The JPEG Committee organized its 1^st workshop on light field quality assessment to discuss challenges and current solutions for subjective light field quality assessment, explore relevant use cases and requirements, and provide a forum for researchers to discuss the latest findings in this area. The JPEG Committee also promoted its 2^nd workshop on learning-based light field coding to exchange experiences and to present technological advances in learning-based coding solutions for light field data. The proceedings and video footage of both workshops are now accessible on the JPEG website.

JPEG AIC

At the 97^th JPEG Meeting, a new JPEG AIC Final Call for Contributions on Subjective Image Quality Assessment was issued. The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will be focusing on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by the previous AIC standards.

The Call for Contributions on Subjective Image Quality Assessment is asking for contributions to the standardization process that will be collaborative from the very beginning. In this context, all received contributions will be considered for the development of the standard by consensus among the JPEG experts.

The JPEG Committee will be releasing a new JPEG AIC-3 Dataset on the 15^th of December 2022. And the deadline for submitting contributions to the call is set to the 1^st of April 2023 23:59 UTC. The contributors will be presenting their contributions at the 99^th JPEG Meeting in April 2023.

The Call for Contributions on Subjective Image Quality Assessment addresses the development of a suitable subjective evaluation methodology standard. A second stage will address the objective perceptual visual quality evaluation models that perform well and have a good discriminative power in the high quality to near-visually lossless quality range.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as it is particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 97^th JPEG meeting, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions were updated to allow for additional concrete experiments to take place prior to issuing a draft call for proposals at the next meeting. This will also allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and in particular include biochemical noise simulation which is an essential element in practical implementations.

JPEG XS

The 2^nd edition of JPEG XS is now fully completed and published. The JPEG Committee continues its work on the 3^rd edition of JPEG XS, starting with Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3^rd edition is to deliver the same image quality as the 2^nd edition, but with half of the required bandwidth. During the 97^th JPEG meeting, a new Working Draft of Part 1 and a first Working Draft of Part 2 were created. To support the work a new Core Experiment was also issued to further test the proposed technology. Finally, an update to the JPEG XS White Paper has been published.

JPEG 2000

A new edition of Rec. ITU-T T.803 | ISO/IEC 15444-4 (JPEG 2000 conformance) is under development.

This new edition proposes to relax the maximum allowable errors so that well-designed 16-bit fixed-point implementations pass all compliance tests; adds two test codestreams to facilitate testing of inverse wavelet and component decorrelating transform accuracy, and adds several codestreams and files conforming to Rec. ITU-T 801 |ISO/IEC 15444-2 to facilitate the implementation of decoders and file format readers

Codestreams and test files can be found on the JPEG GitLab repository at: https://gitlab.com/wg1/htj2k-codestreams/-/merge_requests/14