VQEG Column: VQEG Meeting June 2023

Introduction

This column provides a report on the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 26 to 30 June 2023 in San Mateo (USA), hosted by Sony Interactive Entertainment. More than 90 participants worldwide registered for the hybrid meeting, with more than 40 of them attending in person. This meeting was co-located with the ITU-T SG12 meeting, which took place during the first two days of the week. In addition, more than 50 presentations related to the ongoing projects within VQEG were delivered, leading to interesting discussions among the researchers attending the meeting. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Several aspects of this meeting are relevant for the SIGMM community working on quality assessment. For instance, new work items and efforts to update existing recommendations were discussed in the co-located ITU-T SG12 meeting (see the section about the Intersector Rapporteur Group on Audiovisual Quality Assessment). In addition, there was an interesting panel on deep learning for video coding and video quality with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) (see the Emerging Technologies Group section). Also, a special session on Quality of Experience (QoE) for gaming was organized, involving researchers from several international institutions. Apart from this, readers may be interested in the presentation about MPEG activities on quality assessment and the different developments from industry and academia on tools, algorithms and methods for video quality assessment.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 26-30 June 2023 hosted by Sony Interactive Entertainment (San Mateo, USA).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this meeting, there were several presentations related to topics covered by this group, which were distributed in different sessions during the meeting.

Nabajeet Barman (Kingston University, UK) presented a datasheet for subjective and objective quality assessment datasets. Ali Ak (Nantes Université, France) delivered a presentation on the acceptability and annoyance of video quality in context. Mikołaj Leszczuk (AGH University, Poland) presented a crowdsourcing pixel quality study using non-neutral photos. Kamil Koniuch (AGH University, Poland) discussed the role of theoretical models in ecologically valid studies, covering the example of a video quality of experience model. Jingwen Zhu (Nantes Université, France) presented her work on evaluating viewers' streaming experience with Just Noticeable Difference (JND)-based encoding. Also, Lucjan Janowski (AGH University, Poland) talked about proposing a more ecologically valid experiment protocol using the YouTube platform.

In addition, there were four presentations by researchers from the industry sector. Hojat Yeganeh (SSIMWAVE/IMAX, USA) talked about how more accurate video quality assessment metrics would lead to more savings. Lukas Krasula (Netflix, USA) delivered a presentation on subjective video quality for 4K HDR-WCG content using a browser-based approach for at-home testing. Also, Christos Bampis (Netflix, USA) presented the work done by Netflix on improving video quality with neural networks. Finally, Pranav Sodhani (Apple, USA) talked about how to evaluate videos with the Advanced Video Quality Tool (AVQT).

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. The group is currently working towards an ITU-T recommendation for the assessment of medical contents. In this sense, Meriem Outtas (INSA Rennes, France) led an editing session of a draft of this recommendation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910.

Apart from this, several researchers presented their work on related topics. For instance, Pablo Pérez (Nokia XR Lab, Spain) presented (not so) new findings about the transmission rating scale and subjective scores. Also, Jingwen Zhu (Nantes Université, France) presented ZREC, an approach for mean and percentile opinion score recovery. In addition, Andreas Pastor (Nantes Université, France) presented three works: 1) on the accuracy of open video quality metrics for local decisions in the AV1 video codec, 2) on recovering quality scores in noisy pairwise subjective experiments using negative log-likelihood, and 3) on guidelines for subjective haptic quality assessment, considering a case study on quality assessment of compressed haptic signals. Lucjan Janowski (AGH University, Poland) discussed experiment precision, proposing experiment precision measures and methods for comparing experiments. Finally, there were three presentations from members of the University of Konstanz (Germany). Dietmar Saupe presented the JPEG AIC-3 activity on fine-grained assessment of the subjective quality of compressed images, Mohsen Jenadeleh talked about how relaxed forced choice improves the performance of visual quality assessment methods, and Mirko Dulfer presented his work on quantization for Mean Opinion Score (MOS) recovery in Absolute Category Rating (ACR) experiments.
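For readers less familiar with the terminology, the quantities that these recovery methods target can be summarized as follows (a standard textbook formulation in the spirit of ITU-T ACR tests; the notation here is ours, not taken from the presentations):

```latex
% Mean Opinion Score over N subject ratings r_i, with its 95% confidence interval
\mathrm{MOS} = \frac{1}{N}\sum_{i=1}^{N} r_i,
\qquad
\mathrm{CI}_{95} = t_{0.975,\,N-1}\,\frac{s}{\sqrt{N}},
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\big(r_i-\mathrm{MOS}\big)^2}
```

Recovery methods such as ZREC aim to estimate these per-stimulus statistics more robustly when ratings are noisy or subjects are unreliable.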

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a particular focus on gaming. In this meeting, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) and Nabajeet Barman (Kingston University, UK) organized a special gaming session, in which researchers from several international institutions presented their work on this topic. Among them, Yu-Chih Chen (UT Austin LIVE Lab, USA) presented GAMIVAL, a video quality prediction model for mobile cloud gaming content. Also, Urvashi Pal (Akamai, USA) delivered a presentation on web streaming quality assessment via computer vision applications over cloud. Mathias Wien (RWTH Aachen University, Germany) provided updates on the ITU-T P.BBQCG work item, dataset, and model development. Avinab Saha (UT Austin LIVE Lab, USA) presented a study of subjective and objective quality assessment of mobile cloud gaming videos. Finally, Irina Cotanis (Infovista, Sweden) and Karan Mitra (Luleå University of Technology, Sweden) presented their work towards QoE models for mobile cloud and virtual reality games.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. In this meeting, Margaret Pinson (NTIA, USA) and Ioannis Katsavounidis (Meta, USA), two of the chairs of the group, provided a summary of NORM successes and a discussion of current efforts towards an improved complexity metric. In addition, there were six presentations dealing with related topics. C.-C. Jay Kuo (University of Southern California, USA) talked about blind visual quality assessment for mobile/edge computing. Vignesh V. Menon (University of Klagenfurt, Austria) presented updates on the Video Quality Analyzer (VQA). Yilin Wang (Google/YouTube, USA) gave a talk on recent updates on the Universal Video Quality (UVQ) model. Farhad Pakdaman (Tampere University, Finland) and Li Yu (Nanjing University, China) presented a low-complexity no-reference image quality assessment method based on a multi-scale attention mechanism with natural scene statistics. Finally, Mikołaj Leszczuk (AGH University, Poland) presented his work on visual quality indicators adapted to resolution changes and on considering in-the-wild video content as a special case of user-generated content, together with a system for its recognition.

Emerging Technologies Group (ETG)

The main objective of the ETG group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. This group aims to provide a common platform for people to gather and discuss new emerging topics, as well as possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

One of the topics addressed by this group relates to the application of artificial-intelligence technologies to different domains, such as compression, super-resolution, and video quality assessment. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) organized a panel session with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) on deep learning in the video coding and video quality domains. In addition, Marcos Conde (Sony Interactive Entertainment, Germany) and David Minnen (Google, USA) gave a talk on generative compression and the challenges for quality assessment.

Another topic covered by this group is the greening of streaming and related trends. In this sense, Vignesh V. Menon and Samira Afzal (University of Klagenfurt, Austria) presented their work on green variable-framerate encoding for adaptive live streaming. Also, Prajit T. Rajendran (Université Paris Saclay, France) and Vignesh V. Menon (University of Klagenfurt, Austria) delivered a presentation on energy-efficient live per-title encoding for adaptive streaming. Finally, Berivan Isik (Stanford University, USA) talked about sandwiched video compression, which efficiently extends the reach of standard codecs with neural wrappers.

Joint Effort Group (JEG) – Hybrid

The JEG group was originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. In addition, the group will include under its activities the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM).

Apart from this, there were three presentations addressing related topics in this meeting. Nabajeet Barman (Kingston University, UK) presented a subjective dataset for multi-screen video streaming applications. Also, Lohic Fotio (Politecnico di Torino, Italy) presented his works on a “human-in-the-loop” training procedure for an artificial-intelligence-based observer (AIO) of a real subject, and on advances on the “template” for reporting DNN-based video quality metrics.

The website of the group includes a list of activities of interest, freely available publications, and other resources.

Immersive Media Group (IMG)

The IMG group focuses on research on the quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) provided a report on the status of the test plan, including the test proposals from 13 different groups that have joined the activity; the tests will be launched in September.

In addition to this, Shirin Rafiei (RISE, Sweden) delivered a presentation on her work on human interaction in industrial tele-operated driving through a laboratory investigation.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. In this meeting, Avrajyoti Dutta (AGH University, Poland) delivered a presentation dealing with the subjective quality assessment of video summarization algorithms through a crowdsourcing approach.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

This VQEG meeting was co-located with the rapporteur group meeting of ITU-T Study Group 12 – Question 19, coordinated by Chulhee Lee (Yonsei University, Korea). During the first two days of the week, the experts from ITU-T and VQEG worked together on various topics. For instance, there was an editing session on the VQEG proposal to merge the ITU-T Recommendations P.910, P.911, and P.913, including updates with new methods. Another topic addressed during this meeting was the work item “P.obj-recog”, related to the development of an object-recognition-rate-estimation model in surveillance video of autonomous driving. In this sense, a liaison statement was also discussed with the VQEG AVHD group. Also in relation to this group, another liaison statement was discussed on the new work item “P.SMAR” on subjective tests for evaluating the user experience of mobile Augmented Reality (AR) applications.

Other updates

One interesting presentation was given by Mathias Wien (RWTH Aachen University, Germany) on the quality evaluation activities carried out within the MPEG Visual Quality Assessment group, including the expert viewing tests. This presentation and the follow-up discussions will help to strengthen the collaboration between VQEG and MPEG on video quality evaluation activities.

The next VQEG plenary meeting will take place in autumn 2023 and will be announced soon on the VQEG website.

MPEG Column: 143rd MPEG Meeting in Geneva, Switzerland

The 143rd MPEG meeting took place in person in Geneva, Switzerland. The official press release can be accessed here and includes the following details:

  • MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
  • MPEG reaches the First Milestone for two ISOBMFF Enhancements
  • MPEG ratifies Third Editions of VVC and VSEI
  • MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
  • MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
  • MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression

We have adjusted the press release to suit the audience of ACM SIGMM and emphasized research on video technologies. This edition of the MPEG column centers around ISOBMFF and video codecs. As always, the column will conclude with an update on MPEG-DASH.

ISOBMFF Enhancements

The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.

ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.

ISO/IEC 14496-15 (based on ISOBMFF) provides the basis for “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.

Research aspects: While the former, the carriage of uncompressed video and images in ISOBMFF, seems an obvious feature to support within a file format, the latter enables the use of neural-network-based post-processing filters to enhance video quality after the decoding process, which is an active field of research. The current extensions within the file format provide a baseline for their evaluation (cf. also the next section).

Video Codec Enhancements

MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).

The new editions and amendments add support for several SEI messages. These include two systems-related SEI messages, (a) one for signaling of green metadata as specified in ISO/IEC 23001-11 and (b) the other for signaling of an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics SEI message and the neural-network post-processing filter activation SEI message have been added to AVC, HEVC, and VVC.

The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural-network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).
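To illustrate the general idea (not the normative NNPF syntax, which is defined by the SEI messages themselves), the following minimal sketch shows what such a post-filter does conceptually: a small residual CNN applied to the decoder output. All layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    """Toy residual CNN standing in for a neural-network post-filter."""
    def __init__(self, channels: int = 3, features: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, decoded: torch.Tensor) -> torch.Tensor:
        # Predict a residual and add it to the decoded picture.
        return decoded + self.body(decoded)

decoded_frame = torch.rand(1, 3, 64, 64)  # stand-in for a decoded picture (N, C, H, W)
filtered = PostFilter()(decoded_frame)
print(filtered.shape)  # torch.Size([1, 3, 64, 64])
```

In the standardized setting, the filter architecture and weights (or weight updates) would be conveyed via the NNPFC SEI message and activated per picture via the NNPFA SEI message, rather than hard-coded as above.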

Research aspects: SEI messages for neural network post-filters (NNPF) for AVC, HEVC, and VVC, including systems support within ISOBMFF, form a powerful tool(box) for interoperable visual quality enhancements at the client. This tool(box) will (i) allow for Quality of Experience (QoE) assessments and (ii) enable the analysis thereof across codecs once integrated within the corresponding reference software.

MPEG-DASH Updates

The current status of MPEG-DASH is depicted in the figure below:

The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:

  • ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
  • ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point, and the Draft Amendment (DAM) was approved at this meeting. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
  • ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.

Additionally, the MPEG-DASH Technologies under Consideration (TuC) document comprises a few new work items, such as content selection and adaptation logic based on device orientation and signalling of haptics data within DASH.

Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming meetings.

Research aspects: Random access has been extensively evaluated in the context of video coding but not (low latency) streaming. Additionally, the TuC item related to content selection and adaptation logic based on device orientation raises QoE issues to be further explored.

The 144th MPEG meeting will be held in Hannover from October 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 99th JPEG Meeting

JPEG Trust on a mission to re-establish trust in digital media

The 99th JPEG meeting was held online, from 24th to 28th April 2023.

Providing tools suitable for establishing the provenance, authenticity and ownership of multimedia content is one of the most difficult challenges faced nowadays, considering the technological models that allow effective multimedia data manipulation and generation. As in the past, the JPEG Committee is again answering the emerging challenges in multimedia. JPEG Trust is a standard offering solutions for media authenticity, provenance and ownership.

Furthermore, learning-based coding standards, JPEG AI and JPEG Pleno Learning-based Point Cloud Coding, continue their development. New verification models that incorporate the technological developments resulting from verification experiments and contributions have been approved.

Also relevant, the Calls for Contributions on the standardization of quality assessment models for JPEG AIC and JPEG Pleno Light Field Quality Assessment received responses, and a collaborative process to define the new standards has started.

The 99th JPEG meeting had the following highlights:

  • New JPEG Trust international standard targets media authenticity
  • JPEG AI new verification model
  • JPEG DNA releases its call for proposals
  • JPEG Pleno Light Field Quality Assessment analyses the response to the call for contributions
  • JPEG AIC analyses the response to the call for contributions
  • JPEG XE identifies use cases and requirements for event based vision
  • JPEG Systems: JUMBF second edition is progressing to publication stage
  • JPEG NFT prepares a call for proposals
  • JPEG XS progress for its third edition

The following summarizes the major achievements during the 99th JPEG meeting.

New JPEG Trust international standard targets media authenticity

Drawing reliable conclusions about the authenticity of digital media is complicated, and becoming more so as AI-based synthetic media such as Deep Fakes and Generative Adversarial Networks (GANs) start appearing. Consumers of social media are challenged to assess the trustworthiness of the media they encounter, and agencies that depend on the authenticity of media assets must be concerned with mistaking fake media for real, with risks of real-world consequences.

To address this problem and to provide leadership in global interoperable media asset authenticity, JPEG initiated the development of a new international standard: JPEG Trust. JPEG Trust defines a framework for establishing trust in media. This framework addresses aspects of authenticity, provenance and integrity through secure and reliable annotation of media assets throughout their life cycle. The first part, “Core foundation”, defines the JPEG Trust framework and provides building blocks for more elaborate use cases. It is expected that the standard will evolve over time and be extended with additional specifications.

JPEG Trust arises from a four-year exploration of requirements for addressing mis- and dis-information in online media, followed by a 2022 Call for Proposals, conducted by international experts from industry and academia from all over the world.

The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website for the latest information.

JPEG AI

The JPEG AI activity progressed at this meeting with more than 60 technical contributions submitted for improvements and additions to the Verification Model (VM), which, after discussion and analysis, resulted in several adoptions for integration into the future VM3.0. These adoptions target the speed-up of the decoding process, namely the replacement of the range coder by an asymmetric numeral system, support for multi-threading and/or single-instruction-multiple-data (SIMD) operations, and parallel decoding with sub-streams. The JPEG AI context module was significantly accelerated with a new network architecture, along with other synthesis transform and entropy decoding network simplifications. Moreover, a lightweight model was also adopted targeting mobile devices, providing 10%-15% compression efficiency gains over VVC Intra at just 20-30 kMAC/pxl. In this context, JPEG AI will start the development and evaluation of two JPEG AI VM configurations at two different operating points: lightweight and high.
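For readers unfamiliar with asymmetric numeral systems (ANS), which the adoptions above use in place of the range coder, the following is a minimal, non-streaming rANS sketch (arbitrary-precision state, educational only; the actual JPEG AI VM entropy coder is more elaborate and not reproduced here). The symbol frequencies are illustrative.

```python
# Toy rANS coder: frequencies must sum to M; the state is a Python big integer.
FREQ = {"a": 6, "b": 3, "c": 1}           # illustrative symbol frequencies
M = sum(FREQ.values())
CUM, acc = {}, 0
for sym, f in FREQ.items():               # cumulative frequencies
    CUM[sym] = acc
    acc += f

def encode(symbols):
    x = 1                                  # initial state
    for s in reversed(symbols):            # encode in reverse so decoding runs forward
        f, c = FREQ[s], CUM[s]
        x = (x // f) * M + (x % f) + c
    return x

def decode(x, n):
    out = []
    for _ in range(n):
        slot = x % M
        s = next(sym for sym in FREQ if CUM[sym] <= slot < CUM[sym] + FREQ[sym])
        out.append(s)
        x = FREQ[s] * (x // M) + slot - CUM[s]
    return out

msg = list("abacaabaca")
state = encode(msg)
assert decode(state, len(msg)) == msg
print(f"encoded state uses {state.bit_length()} bits for {len(msg)} symbols")
```

A practical decoder renormalizes the state into a fixed-width register and streams bits, which is what makes ANS attractive for fast, parallelizable decoding.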

At the 99th meeting, the JPEG AI requirements were reviewed and it was concluded that most of the key requirements will be achieved by the previously anticipated timeline for DIS (scheduled for Oct. 2023) and thus version 1 of the JPEG AI standard will go as planned without changes in its timeline and with a clear focus on image reconstruction. Some core requirements, such as those addressing computer vision and image processing tasks as well as progressive decoding, will be addressed in a version 2 along with other tools that further improve requirements already addressed in version 1, such as better compression efficiency.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a major improvement to its VM providing improved performance and control over the balance between the coding of geometry and colour via a split geometry and colour coding framework. Colour attribute information is encoded using JPEG AI resulting in enhanced performance and compatibility with the ecosystem of emerging high-performance JPEG codecs. Prior to the 100th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the areas of attention models, sparse tensor convolution, and support for residual lossless coding.

JPEG DNA

The JPEG Committee has been working on an exploration for the coding of images in quaternary representations, which are particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on synthetic DNA polymers. During the 99th JPEG meeting, a final call for proposals for JPEG DNA was issued and made public, as a first concrete step towards standardization.

The final call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future proposals to be submitted. A set of exploration studies has validated the procedures outlined in the final call for proposals for JPEG DNA. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023, with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.

JPEG Pleno Light Field Quality Assessment

At the 99th JPEG meeting two contributions were received in response to the JPEG Pleno Final Call for Contributions (CfC) on Subjective Light Field Quality Assessment.

  • Contribution 1: presents a 3-step subjective quality assessment framework, with a pre-processing step, a scoring step, and a data processing step. The contribution includes a software implementation of the quality assessment framework.
  • Contribution 2: presents a multi-view light field dataset, comprising synthetic light fields. It provides RGB + ground-truth depth data, realistic and challenging blender scenes, with various textures, fine structures, rich depth, specularities, non-Lambertian areas, and difficult materials (water, patterns, etc).

The received contributions will be considered in the development of a modular framework based on a collaborative process addressing the use cases and requirements under the JPEG Pleno Quality Assessment of light fields standardization effort.

JPEG AIC

Three contributions in response to the JPEG Call for Contributions (CfC) on Subjective Image Quality Assessment were received at the 99th JPEG meeting. One contribution presented a new subjective quality assessment methodology that combines relative and absolute data. The second contribution reported a new subjective quality assessment methodology based on triplet comparison with boosting techniques. Finally, the last contribution reported a new pairwise sampling methodology.

These contributions will be considered in the development of the standard, following a collaborative process. Several core experiments were designed to assist the creation of a Working Draft (WD) for the future JPEG AIC Part 3 standard.

JPEG XE

The JPEG committee continued with the exploration activity on Event-based Vision, called JPEG XE. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. At this meeting, the scope was defined to be the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The exploration activity is currently working on the definition of the use cases and requirements.

An Ad-hoc Group has been established. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the DIS stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Experiments are planned to prepare for a second edition of JPEG XL Part 3 (Conformance testing), including conformance testing of the independent implementations J40, jxlatte, and jxl-oxide.

JPEG Systems

The second edition of JUMBF (JPEG Universal Metadata Box Format, ISO/IEC 19566-5) is progressing to the IS publication stage; the second edition brings new capabilities and support for additional types of media.

JPEG NFT

Many Non-Fungible Tokens (NFTs) point to assets represented in JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. However, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee conducted an exploration on NFTs. The scope of JPEG NFT is the creation of effective specifications that support a wide range of applications relying on NFTs applied to media assets. The standard will be secure, trustworthy and eco-friendly, allowing for an interoperable ecosystem relying on NFT within a single application or across applications. As a result of the exploration, at the 99th JPEG Meeting the committee released a “Draft Call for Proposals on JPEG NFT” and associated updated “Use Cases and Requirements for JPEG NFT”. Both documents are made publicly available for review and feedback.

JPEG XS

The JPEG committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. For Part 1 – Core coding tools – the Draft International Standard will proceed to ISO/IEC ballot. This is a significant step in the standardization process with all the core coding technology now final. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. Furthermore, Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – will proceed to Committee Draft consultation. Part 2 is important as it defines the conformance points for JPEG XS compliance. Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.

Final Quote

“The creation of standardized tools to bring assurance of authenticity, provenance and ownership for multimedia content is the most efficient path to suppress the abusive use of fake media. JPEG Trust will be the first international standard that provides such tools.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No 100, will be in Covilhã, Portugal from 17-21 July 2023
  • No 101, will be online from 30 October – 3 November 2023

A zip package containing the official JPEG logo and logos of all JPEG standards can be downloaded here.

MPEG Column: 142nd MPEG Meeting in Antalya, Türkiye

The 142nd MPEG meeting was held as a face-to-face meeting in Antalya, Türkiye, and the official press release can be found here and comprises the following items:

  • MPEG issues Call for Proposals for Feature Coding for Machines
  • MPEG finalizes the 9th Edition of MPEG-2 Systems
  • MPEG reaches the First Milestone for Storage and Delivery of Haptics Data
  • MPEG completes 2nd Edition of Neural Network Coding (NNC)
  • MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video
  • MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling

The press release text has been modified to match the target audience of ACM SIGMM and highlight research aspects targeting researchers in video technologies. This column focuses on the Call for Proposals on Feature Coding for Video Coding for Machines (FCVCM), the 9th edition of MPEG-2 Systems, storage and delivery of haptics data, neural network coding (NNC), MPEG immersive video (MIV), and updates on MPEG-DASH.


Feature Coding for Video Coding for Machines (FCVCM)

At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks. This work on “Feature Coding for Video Coding for Machines (FCVCM)” aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the neural networks increase in complexity, use cases such as computational offload become more relevant to facilitate the widespread deployment of applications utilizing such networks. Initially as part of the “Video Coding for Machines” activity, over the last four years, MPEG has investigated potential technologies for efficient compression of feature data encountered within neural networks. This activity has resulted in establishing a set of ‘feature anchors’ that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.

Research aspects: FCVCM is about compression, and the central research aspect here is compression efficiency which can be tested against a commonly agreed dataset (anchors). Additionally, it might be attractive to research which features are relevant for video coding for machines (VCM) and quality metrics in this emerging domain. One might wonder whether, in the future, robots or other AI systems will participate in subjective quality assessments.

9th Edition of MPEG-2 Systems

MPEG-2 Systems was first standardized in 1994, defining two container formats: program stream (e.g., used for DVDs) and transport stream. The latter, also known as MPEG-2 Transport Stream (M2TS), is used for broadcast and internet TV applications and services. MPEG-2 Systems was awarded a Technology and Engineering Emmy® in 2013, and at the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of ISO/IEC 13818-1 MPEG-2 Systems. The new edition includes support for Low Complexity Enhancement Video Coding (LCEVC), the youngest member of the MPEG family of video coding standards, on top of the more than 50 media stream types already supported, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.

Research aspects: MPEG container formats such as MPEG-2 Systems and the ISO Base Media File Format are necessary for storing and delivering multimedia content but are often neglected in research. Thus, I would like to take up the cudgels on behalf of the MPEG Systems working group and argue that researchers should pay more attention to these container formats and conduct research and experiments on their efficient use with respect to multimedia storage and delivery.
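As a small, self-contained illustration of how approachable such container formats are for experimentation, the sketch below parses the fixed 4-byte header of an MPEG-2 Transport Stream packet (188 bytes, sync byte 0x47) as defined in ISO/IEC 13818-1; the sample packet is synthetic.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of a 188-byte MPEG-2 TS packet."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid TS packet (length 188, sync byte 0x47)")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error_indicator": (b1 >> 7) & 0x1,
        "payload_unit_start_indicator": (b1 >> 6) & 0x1,
        "transport_priority": (b1 >> 5) & 0x1,
        "pid": ((b1 & 0x1F) << 8) | b2,
        "transport_scrambling_control": (b3 >> 6) & 0x3,
        "adaptation_field_control": (b3 >> 4) & 0x3,
        "continuity_counter": b3 & 0x0F,
    }

# Synthetic packet: PID 0x0100, payload_unit_start set, payload only, counter 5.
pkt = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(pkt))
```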

Storage and Delivery of Haptics Data

At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering the nature of haptics data composed of spatial and temporal components, a data unit with various spatial or temporal data packets is used as a basic entity like an access unit of audio-visual media. Additionally, an explicit indication of a silent period considering the sparse nature of haptics data has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.

Research aspects: Coding (ISO/IEC 23090-31) and carriage (ISO/IEC 23090-32) of haptics data go hand in hand and need further investigation concerning compression efficiency and storage/delivery performance with respect to various use cases.

Neural Network Coding (NNC)

Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.

Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes in the neural network parameters but may also involve structural changes in the neural network (e.g. when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, such that much more bandwidth over time is required, e.g., in contrast to the initial deployment of trained neural networks.

The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and extending the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to at least 10-20% and, for several architectures, even below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.
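To make the idea of compressing incremental updates concrete, here is a minimal, hedged sketch (plain NumPy, not the NNC codec itself): the difference between an updated and a base parameter tensor is sparsified and uniformly quantized, which is the kind of signal the second edition of NNC is designed to represent efficiently. All sizes and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for one layer's weights before and after a training iteration.
base = rng.standard_normal((256, 256)).astype(np.float32)
updated = base + 0.01 * rng.standard_normal((256, 256)).astype(np.float32)

delta = updated - base                      # incremental update to be transmitted

# Drop tiny changes, then quantize the rest with a uniform step size.
threshold, step = 0.01, 0.005
sparse_delta = np.where(np.abs(delta) >= threshold, delta, 0.0)
q_indices = np.round(sparse_delta / step).astype(np.int16)   # integer symbols to entropy-code

# Receiver side: dequantize and apply the update to its copy of the base model.
reconstructed = base + q_indices.astype(np.float32) * step

nonzero_fraction = np.count_nonzero(q_indices) / q_indices.size
print(f"non-zero update symbols: {100 * nonzero_fraction:.1f}%")
print(f"max reconstruction error: {np.abs(reconstructed - updated).max():.4f}")
```

The actual standard adds, among other things, structured parameter reduction, context-adaptive entropy coding, and the synchronization mechanisms mentioned above.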

Research aspects: The incremental compression of neural networks enables various new use cases, which provides research opportunities for media coding and communication, including optimization thereof.

MPEG Immersive Video

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.

MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-plane image (MPI) content that leverages strong hardware support for commonly used video formats to compress volumetric video.

ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. This draft includes 23 verified and validated conformance bitstreams spanning all profiles and encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual.

Research aspects: Conformance and reference software are usually provided to facilitate product conformance testing, but it also provides researchers with a common platform and dataset, allowing for the reproducibility of their research efforts. Luckily, conformance and reference software are typically publicly available with an appropriate open-source license.

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which has gained a new part, namely redundant encoding and packaging for segmented live media (REAP; ISO/IEC 23009-9). The following figure provides the reference workflow for redundant encoding and packaging of live segmented media.

Reference workflow for redundant encoding and packaging of live segmented media.

The reference workflow comprises (i) Ingest Media Presentation Description (I-MPD), (ii) Distribution Media Presentation Description (D-MPD), and (iii) Storage Media Presentation Description (S-MPD), among others; each defining constraints on the MPD and tracks of ISO base media file format (ISOBMFF).

Additionally, the MPEG-DASH breakout group discussed various technologies under consideration, such as (a) combining HTTP GET requests, (b) signaling common media client data (CMCD) and common media server data (CMSD) in an MPEG-DASH MPD, (c) image and video overlays in DASH, and (d) updates on lower latency.

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: The REAP committee draft (CD) is publicly available, and feedback from academia and industry is appreciated. In particular, first performance evaluations and/or reports from proof-of-concept implementations/deployments would be insightful for the next steps in the standardization of REAP.

The 143rd MPEG meeting will be held in Geneva from July 17-21, 2023. Click here for more information about MPEG meetings and their developments.

VQEG Column: Emerging Technologies Group (ETG)

Introduction

This column provides an overview of the new Video Quality Experts Group (VQEG) group called the Emerging Technologies Group (ETG), which was created during the last VQEG plenary meeting in December 2022. For an introduction to VQEG, please check the VQEG homepage or this presentation.

The work addressed by this new group can be of interest to the SIGMM community, since it relates to AI-based technologies for image and video processing, greening of streaming, blockchain in media and entertainment, and ongoing related standardization activities.

About ETG

The main objective of this group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The group, through its activities, aims to provide a common platform for people to gather and discuss new emerging topics and ideas, as well as possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc. The topics addressed are not necessarily directly related to “video quality” but rather focus on any ongoing work in the field of multimedia which can indirectly impact the work addressed as part of VQEG.

Scope

During the creation of the group, the following topics were tentatively identified to be of possible interest to the members of this group and VQEG in general: 

  • AI-based technologies:
    • Super Resolution
    • Learning-based video compression
    • Video coding for machines, etc., 
    • Enhancement, Denoising and other pre- and post-filter techniques
  • Greening of streaming and related trends
    • For example, trade-off between HDR and SDR to save energy and its impact on visual quality
  • Ongoing Standards Activities (which might impact the QoE of end users and hence will be relevant for VQEG)
    • 3GPP, SVTA, CTA WAVE, UHDF, etc.
    • MPEG/JVET
  • Blockchain in Media and Entertainment

Since the creation of the group, four talks on various topics have been organized, an overview of which is summarized next.

Overview of the Presentations

We briefly provide a summary of various talks that have been organized by the group since its inception.

On the work by MPEG Systems Smart Contracts for Media Subgroup

The first presentation was on the topic of the recent work by MPEG Systems on Smart Contracts for Media [1], delivered by Dr Panos Kudumakis, Head of the UK Delegation, ISO/IEC JTC1/SC29, and Chair of the British Standards Institute (BSI) IST/37. In this talk, Dr Kudumakis highlighted the efforts in the last few years by MPEG towards developing several standardized ontologies catering to the needs of the media industry with respect to the codification of Intellectual Property Rights (IPR) information towards the fair trade of media. However, since inference and reasoning capabilities normally associated with ontology use cannot naturally be performed on DLT environments, there is a huge potential to unlock the Semantic Web and, in turn, the creative economy by bridging this interoperability gap [2]. In that direction, the ISO/IEC 21000-23 Smart Contracts for Media standard specifies the means (e.g., APIs) for converting MPEG IPR ontologies to smart contracts that can be executed on existing DLT environments [3]. The talk discussed the recent work done as part of this effort and also the ongoing efforts towards the design of a full-fledged ISO/IEC 23000-23 Decentralized Media Rights Application Format standard based on MPEG technologies (e.g., audio-visual codecs, file formats, streaming protocols, and smart contracts) and non-MPEG technologies (e.g., DLTs, content, and creator IDs).
The recording of the presentation is available here, and the slides can be accessed here.

Introduction to NTIRE Workshop on Quality Assessment for Video Enhancement

The second presentation was given by Xiaohong Liu and Yuxuan Gao from Shanghai Jiao Tong University, China, about one of the CVPR challenge workshops, the NTIRE 2023 Quality Assessment of Video Enhancement Challenge. The presentation described the motivation for starting this challenge and its relevance to the video community in general. The presenters then described the dataset, including the dataset creation process, the subjective tests conducted to obtain ratings, and the reasoning behind the split of the dataset into training, validation, and test sets. The results of this challenge are scheduled to be presented at the upcoming spring meeting at the end of June 2023. The presentation recording is available here.

Perception: The Next Milestone in Learned Image Compression

Johannes Ballé from Google was the third presenter, on the topic of “Perception: The Next Milestone in Learned Image Compression.” In the first part, Johannes discussed learned compression and described nonlinear transforms [4] and how they can achieve a higher image compression rate than linear transforms. Next, they emphasized the importance of perceptual metrics in comparison to distortion metrics by introducing the difference between perceptual quality and reconstruction quality [5]. Then, an example of generative image compression, HiFiC [6], was presented, in which the two criteria, a distortion metric and a perceptual metric (referred to as a realism criterion), are combined. Finally, the talk concluded with an introduction to perceptual spaces and an example of a perceptual metric, PIM [7]. The presentation slides can be found here.
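For context, the perception-distortion trade-off mentioned above, in its rate-distortion-perception form, can be written as follows (a standard formulation from the literature around [5]; the notation here is ours):

```latex
R(D, P) \;=\; \min_{p_{\hat{X}\mid Y}} \; I(Y;\hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X,\hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

Here Y is the compressed representation, Δ is a per-sample distortion measure (e.g., MSE), and d is a divergence between the distribution of natural images and that of reconstructions; demanding better realism (smaller P) generally forces a higher distortion or rate, which is why generative codecs such as HiFiC trade a little reconstruction fidelity for perceptually convincing outputs.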

Compression with Neural Fields

Emilien Dupont (DeepMind) was the fourth presenter. He started the talk with a short introduction to the emergence of neural compression that fits a signal, e.g., an image or video, to a neural network. He then discussed two recent works on neural compression that he was involved in, COIN [8] and COIN++ [9]. He then gave a short overview of other implicit neural representation works in the video domain, such as NeRV [10] and NIRVANA [11]. The slides for the presentation can be found here.
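The core idea behind COIN-style neural-field compression is simple enough to sketch: overfit a small coordinate-to-color network to a single image and treat the (quantized) weights as the code. The sketch below, in PyTorch, is only illustrative (a synthetic image, arbitrary layer sizes, no weight quantization or entropy coding) and is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Synthetic 32x32 RGB "image" so the sketch is self-contained.
H, W = 32, 32
yy, xx = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([yy, xx], dim=-1).reshape(-1, 2)   # (H*W, 2) pixel coordinates
image = torch.stack([torch.sin(3 * xx), torch.cos(3 * yy), xx * yy], dim=-1).reshape(-1, 3)

class CoordinateMLP(nn.Module):
    """Small SIREN-like MLP mapping (y, x) coordinates to RGB values."""
    def __init__(self, hidden: int = 32, layers: int = 3, w0: float = 30.0):
        super().__init__()
        dims = [2] + [hidden] * layers + [3]
        self.w0 = w0
        self.linears = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for lin in self.linears[:-1]:
            x = torch.sin(self.w0 * lin(x))
        return self.linears[-1](x)

model = CoordinateMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(2000):                      # overfit the network to this single image
    opt.zero_grad()
    loss = ((model(coords) - image) ** 2).mean()
    loss.backward()
    opt.step()

n_params = sum(p.numel() for p in model.parameters())
print(f"MSE after fitting: {loss.item():.5f}")
print(f"code size if weights were stored at 16 bit: {n_params * 2} bytes")
```

COIN++ and the video-oriented follow-ups build on this by meta-learning shared network components so that only small modulations need to be stored per signal.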

Upcoming Presentations

As part of the ongoing efforts of the group, the following talks/presentations are scheduled in the next two months. For an updated schedule and list of presentations, please check the ETG homepage here.

Sustainable/Green Video Streaming

Given the increasing carbon footprint of streaming services and the climate crisis, many new collaborative efforts have started recently, such as the Greening of Streaming alliance, the Ultra HD sustainability forum, etc. In addition, research works have recently started focussing on how to make video streaming greener/more sustainable. A talk providing an overview of the recent works and progress in this direction is tentatively scheduled around mid-May 2023.

Panel discussion at VQEG Spring Meeting (June 26-30, 2023), Sony Interactive Entertainment HQ, San Mateo, US

During the next face-to-face VQEG meeting in San Mateo there will be an interesting panel discussion on the topic of “Deep Learning in Video Quality and Compression.” The goal is to invite the machine learning experts to VQEG and bring the two groups closer. ETG will organize the panel discussion, and the following four panellists are currently invited to join this event: Zhi Li (Netflix), Ioannis Katsavounidis (Meta), Richard Zhang (Adobe), and Mathias Wien (RWTH Aachen). Before this panel discussion, two talks are tentatively scheduled, the first one on video super-resolution and the second one focussing on learned image compression. 
The meeting will take place in hybrid mode, allowing for participation both in person and online. For further information about the meeting, please check the details here and, if interested, register for the meeting.

Joining and Other Logistics

While participation in the talks is open to everyone, to get notified about upcoming talks and participate in the discussions, please consider subscribing to the etg@vqeg.org email reflector and joining the Slack channel using this link. The meeting minutes are available here. We are always looking for new ideas to improve. If you have suggestions on topics we should focus on or recommendations of presenters, please reach out to the chairs (Nabajeet and Saman).

References

[1] White paper on MPEG Smart Contracts for Media.
[2] DLT-based Standards for IPR Management in the Media Industry.
[3] DLT-agnostic Media Smart Contracts (ISO/IEC 21000-23).
[4] [2007.03034] Nonlinear Transform Coding.
[5] [1711.06077] The Perception-Distortion Tradeoff.
[6] [2006.09965] High-Fidelity Generative Image Compression.
[7] [2006.06752] An Unsupervised Information-Theoretic Perceptual Quality Metric.
[8] Coin: Compression with implicit neural representations.
[9] COIN++: Neural compression across modalities.
[10] Nerv: Neural representations for videos.
[11] NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling.

JPEG Column: 98th JPEG meeting in Sydney, Australia

JPEG explores standardization in event-based imaging

The 98th JPEG meeting was held in Sydney, Australia, from the 16th to 20th January 2023. This was a welcome return to face-to-face meetings after a long period of online meetings due to the Covid-19 pandemic. Interestingly, the previous face-to-face meeting of the JPEG Committee was also held in Sydney, in January 2020. The face-to-face 98th JPEG meeting was complemented with online connections to allow the remote participation of those who could not be present.

The recent calls for proposals, such as JPEG Fake Media, JPEG AI and JPEG Pleno Learning Based Point Cloud Coding, resulted in a very dynamic and participative meeting in Sydney, with multiple technical sessions and decisions. Exploration activities such as JPEG DNA and JPEG NFT also produced drafts of future calls for proposals as a consequence of reaching sufficient maturity.

Furthermore, and considering the current trends in machine-based imaging applications, the JPEG Committee initiated an exploration on standardization in event-based imaging.

98th JPEG Meeting first plenary.

The 98th JPEG meeting had the following highlights:

  • New JPEG exploration in event-based imaging;
  • JPEG Fake Media and NFT;
  • JPEG AI;
  •  JPEG Pleno Learning-based Point Cloud Coding improves its Verification Model;
  • JPEG AIC prepares the analysis of the responses to the Call for Contribution;
  • JPEG XL second editions;
  • JPEG Systems;
  • JPEG DNA prepares its call for proposals;
  • JPEG XS 3rd Edition;
  • JPEG 2000 guidelines.

The following summarizes the major achievements during the 98th JPEG meeting.

New JPEG exploration in event-based imaging

The JPEG Committee has started a new exploration activity on event-based imaging named JPEG XE.

Event-based Imaging revolves around a new and emerging image modality created by event-based visual sensors. Event-based sensors are the foundation for a new class of cameras that allow the efficient capture of visual information at high speed while at the same time requiring low computational cost, requirements that are common in many machine vision applications. Such sensors are modeled on the mechanisms of the human visual system for the detection of scene changes and the asynchronous capture of those changes. This means that every pixel works individually to detect scene changes and creates the associated events. If nothing happens, then no events are generated. This contrasts with conventional image sensors, where pixels are sampled in a continuous and periodic manner, with images generated regardless of any changes in the scene and a risk of reacting with delay or even missing quick changes.
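As a rough illustration of this modality (a simplification for intuition only, not anything defined by JPEG XE), the sketch below emulates an idealized event sensor from two conventional frames: an event with polarity +1 or -1 is emitted wherever the log-intensity change exceeds a contrast threshold. All parameter values are illustrative.

```python
import numpy as np

def frames_to_events(prev_frame, next_frame, contrast_threshold=0.2, eps=1e-3):
    """Emit (y, x, polarity) events where the log-intensity change exceeds the threshold."""
    log_prev = np.log(prev_frame.astype(np.float64) + eps)
    log_next = np.log(next_frame.astype(np.float64) + eps)
    diff = log_next - log_prev
    ys, xs = np.nonzero(np.abs(diff) >= contrast_threshold)
    polarities = np.sign(diff[ys, xs]).astype(np.int8)
    return list(zip(ys.tolist(), xs.tolist(), polarities.tolist()))

# Two synthetic 8x8 grayscale frames: a bright square moves one pixel to the right.
prev_frame = np.zeros((8, 8)); prev_frame[2:5, 2:5] = 1.0
next_frame = np.zeros((8, 8)); next_frame[2:5, 3:6] = 1.0

events = frames_to_events(prev_frame, next_frame)
print(f"{len(events)} events, e.g. {events[:3]}")   # static pixels produce no events
```

A real sensor produces such events asynchronously per pixel with microsecond timestamps; a future JPEG XE standard would be concerned with representing exactly this kind of sparse event stream interoperably.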

The JPEG Committee recognizes that this new image modality opens doors to a large number of applications where the capture and processing of visual information is needed. Currently, there is no standard format to represent event-based information, and therefore existing and emerging applications are fragmented and lack interoperability. The new JPEG XE activity focuses on establishing a scope and relevant definitions, collecting use cases and their associated requirements, and investigating the role that JPEG can play in the definition of timely standards in the near and long term. To start, an Ad-hoc Group has been established. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.

JPEG Fake Media and NFT

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. During the 98th meeting, the JPEG Committee finalised the evaluation of the six submitted proposals and initiated the process for establishing a new standard.

The JPEG Committee also continues to explore use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases for both topics are very different, there is a clear commonality in terms of requirements and relevant solutions. An updated version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

To stay informed about the activities, please join the mailing list of the Ad-hoc Group and regularly check the JPEG website for the latest information.

JPEG AI

Following the creation of the JPEG AI Verification Model (VM) at the previous 97th JPEG meeting, further discussions took place at the 98th meeting to improve coding efficiency and to reduce complexity, especially on the decoder side. The JPEG AI VM has several unique characteristics, such as a parallelizable context model to perform latent prediction, the decoupling of prediction and sample reconstruction, and rate adaptation, among others. The JPEG AI VM shows up to 31% compression gain over VVC Intra for natural content. A new JPEG AI test set was also released during the 98th meeting: a dataset of 50 images for the evaluation of the JPEG AI VM, with the objective of tracking performance improvements at every meeting. The JPEG AI Common Training and Test Conditions were updated to include this new dataset. In this meeting, it was also decided to integrate several changes into the JPEG AI VM, speeding up training, improving performance at high rates and fixing bugs. A set of core experiments was established at this meeting targeting RD performance and complexity improvements. The JPEG AI VM Software Guidelines were approved, describing the initial setup of the JPEG AI VM repository, how to obtain the JPEG AI dataset, and how to run tests and training. A description of the structure of the JPEG AI VM repository was also made available.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a number of technical submissions for improvements to the VM in the area of colour coding, artefact processing and improvements to coding speed. In addition, the JPEG Committee released the “Call for Content for JPEG Pleno Point Cloud Coding” to expand on the current training and test set with new point clouds representing key use cases. Prior to the 99th JPEG Meeting, JPEG experts will promote the Call for Content as well as investigate possible advancements to the VM in the areas of auto-regressive entropy encoding, sparse tensor convolution, meta-data controlled post-filtering of colour and a flexible split geometry and colour coding framework for the VM.

JPEG AIC

During the 98th JPEG meeting in Sydney, Australia, Exploration Study 1 on JPEG AIC was established. This exploration study will collect results from three types of previously standardized subjective evaluation methodologies in order to provide an informative reference for the JPEG AIC submissions to the Call for Contributions that are due by April 1st, 2023. Corrections and additions to the JPEG AIC Common Test Conditions were issued in order to reflect the addition of a new codec for testing content generation and a new anchor subjective quality assessment methodology.

The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will focus on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by any previous AIC standards.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have reached the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Also, an updated version of the JPEG XL White Paper has been published and is freely available through jpeg.org.

JPEG Systems

The JLINK standard (19566-7:2022) is now published by ISO. JLINK specifies an image file format capable of linking multiple media elements, such as image and text in any JPEG file format. It enables enhanced curated experiences of a set of images for education, training, virtual museum tours, travelogs, and similar visually-oriented content.

The JPEG Snack (19566-8) standard is expected to be published in February 2023. JPEG Snack specifies the coding of audio, picture, multimedia and hypermedia information, enabling rich, image-based, short-form animated experiences for social media.

The second edition of JUMBF (JPEG Universal Metadata Box Format, 19566-5) is progressing to IS stage; the second edition brings new capabilities and support for additional types of media.

JPEG DNA

The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 98th JPEG meeting, a draft Call for Proposals for JPEG DNA was issued and made public, as a first concrete step towards standardisation. The draft call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future responses to the Call for Proposals. The final Call for Proposals for JPEG DNA is expected to be released at the conclusion of the 99th JPEG meeting in April 2023, after a set of exploration experiments have validated the procedures outlined in the draft Call for Proposals for JPEG DNA and JPEG DNA Common Test Conditions. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023 with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.

JPEG XS

The JPEG Committee continued with the definition of JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Committee Draft for Part 1 (Core coding system) will proceed to ISO ballot. This means that the standard is now technically defined, and all the new coding tools are known. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. This new coding tool is of extreme importance for remote desktop applications and screen sharing. In addition, mathematically lossless coding can now support up to 16 bits precision (up from 12 bits). For Part 2 (Profiles and buffer models), the committee created a second Working Draft and issued further core experiments to proceed and support this work. Meanwhile, ISO approved the creation of a new edition of Part 3 (Transport and container formats) that is needed to address the changes of Part 1 and Part 2.

JPEG 2000

The JPEG Committee has published two sets of guidelines for implementers of JPEG 2000, both available on jpeg.org.

The first describes an algorithm for controlling JPEG 2000 coding quality using a single number (Qfactor) between 1 (worst quality) and 100 (best quality), as is commonly done with JPEG.
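
The published guideline defines the exact mapping from Qfactor to JPEG 2000 quantization settings. Purely as an illustration of the general idea, the sketch below reuses the classic baseline-JPEG convention for turning a 1-100 Qfactor into a quantization scaling factor; the function name and formula are assumptions borrowed from baseline JPEG practice, not the algorithm specified in the JPEG 2000 guideline.

    def qfactor_to_scale(qfactor: int) -> float:
        # Classic baseline-JPEG convention: Qfactor 1 (worst) .. 100 (best) is
        # mapped to a multiplier applied to base quantization step sizes.
        q = min(max(int(qfactor), 1), 100)
        scale = 5000.0 / q if q < 50 else 200.0 - 2.0 * q
        return scale / 100.0  # e.g., Q=50 -> 1.0, Q=100 -> 0.0 (finest steps)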

The second explains how to create, parse and use HTJ2K placeholder passes and HT Sets. These features are an integral part of HTJ2K and enable mathematically lossless transcoding between HT- and J2K-based codestreams, among other applications.

Final Quote

“The interest in event-based imaging has been rising with several products designed and offered by the industry. The JPEG Committee believes in interoperable solutions and has initiated an exploration for standardization of event-based imaging in order to accelerate creation of an ecosystem.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 99 will be held online from 24-28 April 2023
  • No. 100 will be held in Covilhã, Portugal, from 17-21 July 2023

VQEG Column: VQEG Meeting December 2022

Introduction

This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries around the world registered for the meeting that was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on Youtube.

Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the proposals to update and merge ITU-T recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and greening of streaming and related trends.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 12-16 December 2022 (online).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK) presented her work on converting video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting the content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and dealt with video codec profiling with video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for the modelling and the prediction of the Satisfied User Ratio for compressed videos [2].

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently, there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research was announced, which was proposed by Elizabeth Krupinski and will take place at the European Congress of Radiology (Vienna, Austria, Mar. 2023).

In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with the evaluation of the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France) presented his work on deep-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition-inspired diagnostic image quality models, emphasising the need to distinguish among interpretability (e.g., the medical professional is confident in making a diagnosis), adequacy (e.g., the capture technique shows the right area for assessment), and visual quality (e.g., MOS) in the quality assessment of medical content.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete and have P.913 as the only ITU-T recommendation on subjective video quality assessment. The group worked on the liaison document to be sent to ITU-T SG12, which will be available in the meeting files.

In addition, Mohsen Jenadeleh (University of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with the Flicker Test and QUEST+.

Computer Generated Imagery (CGI)

The CGI group is devoted to analysing and evaluating computer-generated content, with a focus on gaming in particular. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, they are working on two new work items: G.OMMOG on Opinion Model for Mobile Online Gaming applications and P.CROWDG on Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying other topics and interests in CGI beyond gaming content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata. 

In relation to the first topic, Margaret Pinson (NTIA/ITS, US) talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3] and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedeman (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.

Regarding the computation of complexity indices, Maria Martini (Kingston University, UK) presented a study comparing 12 metrics (and possible combinations) for assessing video content complexity. Vignesh V. Menon (University of Klagenfurt, Austria) presented a summary of live per-title encoding approaches using video complexity features. Ioannis Katsavounidis and Cosmin Stejerean (Meta, US) presented their work on using motion search to order videos by coding complexity, also making the software available as open source. In addition, they led a discussion on supplementing the classic SI and TI with improved complexity metrics (VCA, motion search, etc.).
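
As background for this discussion, the sketch below follows the classic SI/TI definitions from ITU-T Rec. P.910: the spatial index is the maximum over frames of the standard deviation of the Sobel-filtered luma plane, and the temporal index is the maximum standard deviation of consecutive-frame differences. The function name and the use of SciPy are our own choices, not part of the recommendation.

    import numpy as np
    from scipy import ndimage

    def si_ti(luma_frames):
        # luma_frames: iterable of 2D arrays (one luma plane per frame).
        si_values, ti_values = [], []
        prev = None
        for frame in luma_frames:
            f = frame.astype(float)
            # Spatial information: std of the Sobel gradient magnitude.
            sobel = np.hypot(ndimage.sobel(f, axis=0), ndimage.sobel(f, axis=1))
            si_values.append(sobel.std())
            # Temporal information: std of the difference with the previous frame.
            if prev is not None:
                ti_values.append((f - prev).std())
            prev = f
        return max(si_values), (max(ti_values) if ti_values else 0.0)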

Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of metadata of video metrics into the encoded video streams. In addition, a liaison with AOMedia will be established, which may go beyond this particular topic and include best practices on subjective testing, IMG topics, etc.

Joint Effort Group (JEG) – Hybrid

The JEG group initially focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. Currently, the group is working on research problems rather than on algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources.

Two examples of research problems addressed by the group were shown by the two presentations given by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The topic of the first presentation was related to the training of artificial intelligence observers for a wide range of applications, while the second presentation provided guidelines to train, validate, and publish DNN-based objective measures.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.

Immersive Media Group (IMG)

The IMG group is focused on the research on quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed in the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.

In addition to this, a variety of topics related to immersive media technologies were covered in the works presented during the meeting. For example, Yaosi Hu (Wuhan University, China) presented her work on video quality assessment based on quality aggregation networks. In relation to light field imaging, Maria Martini (Kingston University, UK) discussed the main problems that current light field quality assessment datasets face and presented a new dataset. Also, there were three talks by researchers from CWI (Netherlands) dealing with point cloud QoE assessment: Silvia Rossi presented a behavioral analysis in a 6-DoF VR system, taking into account the influence of content, quality and user disposition [4]; Shishir Subramanyam presented his work related to the subjective QoE evaluation of user-centered adaptive streaming of dynamic point clouds [5]; and Irene Viola presented a point cloud objective quality assessment approach using PCA-based descriptors (PointPCA). Another presentation related to point cloud quality assessment was delivered by Marouane Tliba (Université d’Orleans, France), who presented an efficient deep-learning-based objective graph metric.

In addition, Shirin Rafiei (RISE, Sweden) gave a talk on UX and QoE aspects of remote control operations using a laboratory platform, Marta Orduna (Universidad Politécnica de Madrid, Spain) presented her work comparing ACR, SSDQE, and SSCQE in long-duration 360-degree videos, whose results will be used to submit a proposal to extend ITU-T Rec. P.919 for long sequences, and Ali Ak (Nantes Université, France) presented his work on just noticeable differences in HDR/SDR image/video quality.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work related to the assessment of rail 8K-UHD CCTV facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study on video coding for machines through a large-scale evaluation of DNN robustness to compression artefacts for semantic segmentation [8].

Other updates

In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of the IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video at simulated low temperatures on LCD vehicle displays.

In addition, a new group was created in this meeting called the Emerging Technologies Group (ETG), whose main objective is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. In particular, two major topics of interest have been identified so far: AI-based technologies and the greening of streaming and related trends. Nevertheless, the group aims to provide a common platform for people to gather and discuss emerging topics and possible collaborations, for example in the form of joint survey papers/whitepapers, funding proposals, etc.

Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest resumes in this effort. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into the JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.

References

[1] Maria G. Martini, “On the relationship between SSIM and PSNR for DCT-based compressed images and video: SSIM as content-aware PSNR”, TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21725390.v1, 2022.
[2] J. Zhu, P. Le Callet, A. Perrin, S. Sethuraman, K. Rahul, “On The Benefit of Parameter-Driven Approaches for the Modeling and the Prediction of Satisfied User Ratio for Compressed Video”, IEEE International Conference on Image Processing (ICIP), Oct. 2022.
[3] Margaret H. Pinson, “Why No Reference Metrics for Image and Video Quality Lack Accuracy and Reproducibility”, Frontiers in Signal Processing, Jul. 2022.
[4] S. Rossi, I. Viola, P. Cesar, “Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition”, Proceedings of the 1st Workshop on Interactive eXtended Reality, Oct. 2022.
[5] S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, P. Cesar, “Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds”, International Conference on Quality of Multimedia Experience (QoMEX), Sep. 2022.
[6] M. Leszczuk, L. Janowski, J. Nawała, and A. Boev, “Method for Assessing Objective Video Quality for Automatic License Plate Recognition Tasks”, Communications in Computer and Information Science, Oct. 2022.
[7] L. Lévêque, F. Villoteau, E. V. B. Sampaio, M. Perreira Da Silva, and P. Le Callet, “Comparing the Robustness of Humans and Deep Neural Networks on Facial Expression Recognition”, Electronics, 11(23), Dec. 2022.
[8] A. Marie, K. Desnos, L. Morin, and Lu Zhang, “Video Coding for Machines: Large-Scale Evaluation of Deep Neural Networks Robustness to Compression Artifacts for Semantic Segmentation”, IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2022.

MPEG Column: 140th MPEG Meeting in Mainz, Germany

After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany, and the official press release can be found here and comprises the following items:

  • MPEG evaluates the Call for Proposals on Video Coding for Machines
  • MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
  • MPEG reaches the First Milestone for Haptics Coding
  • MPEG completes a New Standard for Video Decoding Interface for Immersive Media
  • MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
  • MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive amounts of video data have increasingly been analyzed without human intervention, leading to a new paradigm referred to as Video Coding for Machines (VCM), which targets both (i) conventional video coding and (ii) feature coding (see here for further details).

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by various proposals.

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:

  • For the tested video dataset, increases in compression efficiency of up to 87% compared to the video anchor and of over 90% compared to the feature anchor were reported.
  • For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.

Research aspects: the main research area is still the same as described in my last column, i.e., compression efficiency (probably including runtime, sometimes referred to as complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation or object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).

Research aspect: VDI is also a promising standard to enable the implementation of viewport adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, the output documents include amendments, Defects under Investigation (DuI), Technologies under Consideration (TuC), and a new working draft called Redundant encoding and packaging for segmented live media (REAP), which will eventually become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 97th JPEG Meeting

JPEG initiates specification on fake media based on responses to its call for proposals

The 97th JPEG meeting was held online from 24 to 28 October 2022. JPEG received responses to the Call for Proposals (CfP) on JPEG Fake Media, the first multimedia international standard designed to facilitate the secure and reliable annotation of media asset creation and modifications. In total, six responses were received, addressing different requirements in the scope of this standardization initiative. Moreover, relevant advances were made on the standardization of learning-based coding, notably the learning-based coding of images (JPEG AI) and JPEG Pleno point cloud coding. Furthermore, the explorations on quality assessment of images (JPEG AIC) and of JPEG Pleno light fields made relevant progress with the definition of their Calls for Contributions and Common Test Conditions.

Also relevant, the 98th JPEG meeting will be held in Sydney, Australia, marking a return to physical meetings after the long COVID-19 pandemic. Notably, the last physical meeting was also held in Sydney, in January 2020.

The 97th JPEG meeting had the following highlights:

  • JPEG Fake Media responses to the Call for Proposals analysed,
  • JPEG AI Verification Model,
  • JPEG Pleno Learning-based Point Cloud coding Verification Model,
  • JPEG Pleno Light Field issues a Call for Contributions on Subjective Light Field Quality Assessment,
  • JPEG AIC issues a Call for Contributions on Subjective Image Quality Assessment,
  • JPEG DNA releases a draft of Common Test Conditions,
  • JPEG XS prepares third edition of core coding system, and profiles and buffer models,
  • JPEG 2000 conformance is under development.
Fig. 1: Fake Media application scenarios: Good faith vs Malicious intent.

The following summarises the major achievements of the 97th JPEG meeting.

JPEG Fake Media

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. During the 97th meeting in October 2022, the following six responses to the call were presented:

  1. Adobe/C2PA: C2PA Specification
  2. Huawei: Provenance and Right Management for Digital Contents in JPEG Fake Media
  3. Sony Group Corporation: Methods to keep track provenance of media asset and signing data
  4. Vrije Universiteit Brussel/imec: Media revision history tracking via asset decomposition and serialization
  5. UPC: MIPAMS Provenance module
  6. Newcastle University: Response to JPEG Fake Media standardization call

In the coming months, these proposals will be thoroughly evaluated following a process that is open, transparent, fair and unbiased and allows deep technical discussions to assess which proposals best address identified requirements. Based on the conclusions of these discussions, a new standard will be produced to address fake media and provide solutions for transparency related to media authenticity. The standard will combine the best elements of the six proposals.

To stay informed about the activities please join the JPEG Fake Media & NFT AHG mailing list and regularly check the JPEG website for the latest information.

JPEG AI

JPEG AI (ISO/IEC 6048) aims at the development of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over state-of-the-art image coding standards at similar subjective quality, and improved performance for image processing and computer vision tasks. The evaluation of the Call for Proposals responses had already confirmed the industry interest, and the subjective tests presented at the 96th JPEG meeting showed results that significantly outperform conventional image compression solutions. 

The JPEG AI verification model has been issued as the outcome of this meeting and follows the integration effort of several neural networks and tools. There are several characteristics that make the JPEG AI Verification Model (VM) unique, such as the decoupling of the entropy decoding from the sample reconstruction and the exploitation of the spatial correlation between latents using a prediction and a fusion network, as well as a massively parallelized auto-regressive network. The performance evaluation has shown significant RD performance improvements (as much as 32.2% BD-rate savings over H.266/VVC) with competitive decoding complexity. Other functionalities, such as rate adaptation and device interoperability, have also been addressed with the use of gain units and the quantization of the weights in the entropy decoding module. Moreover, the adoption process for architectural changes and for new or improved coding tools in the JPEG AI VM was approved. A set of core experiments has been defined for improving the JPEG AI VM, targeting the improvement of coding efficiency and the reduction of encoding and decoding complexity. The core experiments represent a set of promising technologies, such as learning-based GAN training, simplification of the analysis/synthesis transform, adaptive entropy coding alphabet, and even encoder-only tools and procedures for training speed-up.
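
The BD-rate figure quoted above is the usual Bjøntegaard delta rate between two rate-distortion curves. As a reminder of how such numbers are typically obtained, the sketch below fits log-rate as a cubic polynomial of quality and averages the difference over the overlapping quality range; the function and variable names are our own, and this is the common textbook formulation rather than the exact tooling used in the JPEG AI evaluations.

    import numpy as np

    def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
        # Bjøntegaard delta rate (%): average relative bitrate difference of the
        # test codec versus the reference at equal quality (e.g., PSNR in dB).
        p_ref = np.polyfit(quality_ref, np.log(rates_ref), 3)
        p_test = np.polyfit(quality_test, np.log(rates_test), 3)
        lo = max(min(quality_ref), min(quality_test))
        hi = min(max(quality_ref), max(quality_test))
        # Integrate both cubic fits over the common quality interval.
        int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
        int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
        avg_log_diff = (int_test - int_ref) / (hi - lo)
        return (np.exp(avg_log_diff) - 1.0) * 100.0  # negative means bitrate savings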

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with the successful validation of the Verification Model under Consideration (VMuC). The VMuC was confirmed as the Verification Model (VM) to form the core of the future standard; ISO/IEC 21794 Part 6 JPEG Pleno: Learning-based Point Cloud Coding. The JPEG Committee has commenced work on the Working Draft of the standard, with initial text reviewed at this meeting. Prior to the next 98th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the area of auto-regressive entropy encoding and sparse tensor convolution as well as sourcing additional point clouds for the JPEG Pleno Point Cloud test set.

JPEG Pleno Light Field

During the 97th meeting, the JPEG Committee released the “JPEG Pleno Final Call for Contributions on Subjective Light Field Quality Assessment”, to collect new procedures and best practices regarding light field subjective quality evaluation methodologies to assess artifacts induced by coding algorithms. All contributions, including test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach. The deadline for submission of contributions is April 1, 2023.


The JPEG Committee organized its 1st workshop on light field quality assessment to discuss challenges and current solutions for subjective light field quality assessment, explore relevant use cases and requirements, and provide a forum for researchers to discuss the latest findings in this area. The JPEG Committee also promoted its 2nd workshop on learning-based light field coding to exchange experiences and to present technological advances in learning-based coding solutions for light field data. The proceedings and video footage of both workshops are now accessible on the JPEG website.

JPEG AIC

At the 97th JPEG Meeting, a new JPEG AIC Final Call for Contributions on Subjective Image Quality Assessment was issued. The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will be focusing on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by the previous AIC standards.

The Call for Contributions on Subjective Image Quality Assessment is asking for contributions to the standardization process that will be collaborative from the very beginning. In this context, all received contributions will be considered for the development of the standard by consensus among the JPEG experts.

The JPEG Committee will release a new JPEG AIC-3 dataset on 15 December 2022. The deadline for submitting contributions to the call is 1 April 2023, 23:59 UTC. The contributors will present their contributions at the 99th JPEG Meeting in April 2023.

The Call for Contributions on Subjective Image Quality Assessment addresses the development of a suitable subjective evaluation methodology standard. A second stage will address the objective perceptual visual quality evaluation models that perform well and have a good discriminative power in the high quality to near-visually lossless quality range.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as these are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of the storage process based on DNA synthetic polymers. During the 97th JPEG meeting, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions were updated to allow additional concrete experiments to take place prior to issuing a draft call for proposals at the next meeting. This will also allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and, in particular, to include biochemical noise simulation, which is an essential element in practical implementations.

JPEG XS

The 2nd edition of JPEG XS is now fully completed and published. The JPEG Committee continues its work on the 3rd edition of JPEG XS, starting with Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. During the 97th JPEG meeting, a new Working Draft of Part 1 and a first Working Draft of Part 2 were created. To support the work a new Core Experiment was also issued to further test the proposed technology. Finally, an update to the JPEG XS White Paper has been published.

JPEG 2000

A new edition of Rec. ITU-T T.803 | ISO/IEC 15444-4 (JPEG 2000 conformance) is under development.

This new edition proposes to relax the maximum allowable errors so that well-designed 16-bit fixed-point implementations pass all compliance tests; adds two test codestreams to facilitate testing of inverse wavelet and component decorrelating transform accuracy; and adds several codestreams and files conforming to Rec. ITU-T T.801 | ISO/IEC 15444-2 to facilitate the implementation of decoders and file format readers.

Codestreams and test files can be found on the JPEG GitLab repository at: https://gitlab.com/wg1/htj2k-codestreams/-/merge_requests/14

Final Quote

“Motivated by the consumers’ concerns of manipulated contents, the JPEG Committee has taken concrete steps to define a new standard that provides interoperable solutions for a secure and reliable annotation of media assets creation and modifications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 98 will be held in Sydney, Australia, from 14-20 January 2023

VQEG Column: VQEG Meeting May 2022

Introduction

Welcome to this new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG), which provides an overview of the last VQEG plenary meeting that took place from 9 to 13 May 2022. It was organized by INSA Rennes (France), and it was the first face-to-face meeting after the series of online meetings due to the Covid-19 pandemic. Remote attendance was also offered, which made it possible for around 100 participants from 17 different countries to attend the meeting (more than 30 of them attended in person). During the meeting, more than 40 presentations were delivered and interesting discussions took place. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on Youtube.

Many of the works presented at this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the proposals to update the ITU-T Recommendations P.910 and P.913, as well as the presented publicly available datasets. We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 9-13 May 2022 in Rennes (France).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this sense, the group continues working on extensions of the ITU-T Recommendation P.1204 to cover other encoders (e.g., AV1) apart from H.264, HEVC, and VP9. In addition, the project’s Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB) are still ongoing. 

In this meeting, several AVHD-related topics were discussed, supported by six different presentations. In the first one, Mikolaj Leszczuk (AGH University, Poland) presented an analysis of how experiment conditions, such as video sequence order, variation and repeatability, influence the subjective assessment of video transmission quality, as they can entail a “learning” process of the test participants during the test. In the second presentation, Lucjan Janowski (AGH University, Poland) presented two proposals towards more ecologically valid experiment designs: the first one using the Absolute Category Rating [1] without a scale but in a “think aloud” manner, and the second one, called “Your Youtube, our lab”, in which users select the content that they prefer and a quality question appears during the viewing experience through a specifically designed interface. Also dealing with the study of testing methodologies, Babak Naderi (TU-Berlin, Germany) presented work on subjective evaluation of video quality with a crowdsourcing approach, while Pierre David (Capacités, France) presented a three-lab experiment, involving Capacités (France), RISE (Sweden) and AGH University (Poland), on quality evaluation of social media videos. Kjell Brunnström (RISE, Sweden) continued by giving an overview of video quality assessment of Video Assistant Refereeing (VAR) systems, and lastly, Olof Lindman (SVT, Sweden) presented another effort to reduce the lack of open datasets with the Swedish Television (SVT) Open Content.

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. In this meeting, Lucie Lévêque (Nantes Université, France) provided an overview of the recent activities of the group, including a submitted review paper on objective quality assessment for medical images, a special session accepted for the IEEE International Conference on Image Processing (ICIP), which will take place in October in Bordeaux (France), and a paper submitted to IEEE ICIP on quality assessment through the detection task of COVID-19 pneumonia. The work described in this paper was also presented by Meriem Outtas (INSA Rennes, France).

In addition, there were two more presentations related to the quality assessment of medical images. Firstly, Yuhao Sun (University of Edinburgh, UK) presented their research on a no-reference image quality metric for visual distortions on Computed Tomography (CT) scans [2]. Finally, Marouane Tliba (Université d’Orleans, France) presented his studies on quality assessment of medical images through deep-learning techniques using domain adaptation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on a proposal to update the ITU-T Recommendation P.913, including new testing methods for subjective quality assessment and statistical analysis of the results. Margaret Pinson presented this work during the meeting.   

In addition, five presentations were delivered addressing topics related to the group activities. Jakub Nawała (AGH University, Poland) presented the Generalised Score Distribution to accurately describe responses from subjective quality experiments. Three presentations were provided by members of Nantes Université (France): Ali Ak presented his work on spammer detection on pairwise comparison experiments, Andreas Pastor talked about how to improve the maximum likelihood difference scaling method in order to measure the inter-content scale, and Chama El Majeny presented the functionalities of a subjective test analysis tool, whose code will be publicly available. Finally, Dietmar Saupe (University of Konstanz, Germany) delivered a presentation on subjective image quality assessment with boosted triplet comparisons.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a focus on gaming in particular. Currently, the group is working on the ITU-T Work Item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. Apart from this, Jerry (Xiangxu) Yu (University of Texas at Austin, US) presented work on subjective and objective quality assessment of user-generated gaming videos, and Nasim Jamshidi (TUB, Germany) presented a deep-learning bitstream-based video quality model for CG content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and on the development of a standard for video quality metadata.  

At this meeting, this was one of the most active groups and the corresponding sessions included several presentations and discussions. Firstly, Yiannis Andreopoulos (iSIZE, UK) presented their work on domain-specific fusion of multiple objective quality metrics. Then, Werner Robitza (AVEQ GmbH/TU Ilmenau, Germany) presented the updates on the SI/TI clarification activities, which are leading to an update of the ITU-T Recommendation P.910. In addition, Lukas Krasula (Netflix, US) presented their investigations on the relation between banding annoyance and the overall quality perceived by the viewers. Hadi Amirpour (University of Klagenfurt, Austria) delivered two presentations related to their Video Complexity Analyzer and their Video Complexity Dataset, which are both publicly available. Finally, Mikołaj Leszczuk (AGH University, Poland) gave two talks on their research related to User-Generated Content (UGC, a.k.a. in-the-wild video content) recognition and on advanced video quality indicators to characterise video content.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. A report on the ongoing activities of the group was presented by Enrico Masala (Politecnico di Torino, Italy), which included the release of a new website to reflect the evolution that happened in the last few years within the group. Although currently the group is not directly seeking the development of new metrics or tools readily available for VQA, it is still working on related topics such as the studies by Lohic Fotio Tiotsop (Politecnico di Torino, Italy) on the sensitivity of artificial intelligence-based observers to input signal modification.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia, Spain) presented an extended report on the group activities, from which it is worth noting the joint work on a contribution to the ITU-T Work Item G.QoE-5G.

Immersive Media Group (IMG)

The IMG group is focused on the research on the quality assessment of immersive media. Currently, the main joint activity of the group is the development of a test plan for evaluating the QoE of immersive interactive communication systems. In this sense, Pablo Pérez (Nokia, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented a follow up on this test plan including an overview of the state-of-the-art on related works and a taxonomy classifying the existing systems [3]. This test plan is closely related to the work carried out by the ITU-T on QoE Assessment of eXtended Reality Meetings, so Gunilla Berndtsson (Ericsson, Sweden) presented the latest advances on the development of the P.QXM.  

Apart from this, there were four presentations related to the quality assessment of immersive media. Shirin Rafiei (RISE, Sweden) presented a study on QoE assessment of an augmented remote operating system for scaling in smart mining applications. Zhengyu Zhang (INSA Rennes, France) gave a talk on a no-reference quality metric for light field images based on deep-learning and exploiting angular and spatial information. Ali Ak (Nantes Université, France) presented a study on the effect of temporal sub-sampling on the accuracy of the quality assessment of volumetric video. Finally, Waqas Ellahi (Nantes Université, France) showed their research on a machine-learning framework to predict Tone-Mapping Operator (TMO) preference based on image and visual attention features [4].

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods. In this meeting, there were three presentations related to this topic. Mikołaj Leszczuk (AGH University, Poland) presented an objective video quality assessment method for face recognition tasks. Also, Alban Marie (INSA Rennes, France) showed an analysis of the correlation of quality metrics with artificial intelligence accuracy. Finally, Lucie Lévêque (Nantes Université, France) gave an overview of a study on the reliability of existing algorithms for facial expression recognition [5].

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

The IRG-AVQA group studies topics related to video and audiovisual quality assessment (both subjective and objective) among ITU-R Study Group 6 and ITU-T Study Group 12. In this sense, Chulhee Lee (Yonsei University, South Korea) and Alexander Raake (TU Ilmenau, Germany) provided an overview on ongoing activities related to quality assessment within ITU-R and ITU-T.

Other updates

In addition, the Human Factors for Visual Experiences (HFVE) group, whose objective is to uphold the liaison relation between VQEG and the IEEE standardization group P3333.1, presented its advances in relation to two standards: IEEE P3333.1.3 – Deep-Learning-based assessment of visual experience based on human factors, which has been approved and published, and IEEE P3333.1.4 on light field imaging, which has been submitted and is in the process of being approved. Also, although there were not many activities in this meeting within the Implementer’s Guide for Video Quality Metrics (IGVQM) and the Psycho-Physiological Quality Assessment (PsyPhyQA) projects, they are still active. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place online in December 2022. Please, see VQEG Meeting information page for more information.

References

[1] ITU, “Subjective video quality assessment methods for multimedia applications”, ITU-T Recommendation P.910, Jul. 2022.
[2] Y. Sun, G. Mogos, “Impact of Visual Distortion on Medical Images”, IAENG International Journal of Computer Science, 1:49, Mar. 2022.
[3] P. Pérez, E. González-Sosa, J. Gutiérrez, N. García, “Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment”, Frontiers in Signal Processing, Jul. 2022.
[4] W. Ellahi, T. Vigier, P. Le Callet, “A machine-learning framework to predict TMO preference based on image and visual attention features”, International Workshop on Multimedia Signal Processing, Oct. 2021.
[5] E. M. Barbosa Sampaio, L. Lévêque, P. Le Callet, M. Perreira Da Silva, “Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance”, ACM International Conference on Interactive Media Experiences, Jun. 2022.