The 153rd MPEG meeting took place online from January 19-23, 2026. The official MPEG press release can be found here. This report highlights key outcomes from the meeting, with a focus on research directions relevant to the ACM SIGMM community:
MPEG Roadmap
Exploration on MPEG Gaussian Splat Coding (GSC)
MPEG Immersive Video 2nd edition (new white paper)
MPEG Roadmap
MPEG released an updated roadmap showing continued convergence of immersive and “beyond video” media with deployment-ready systems work. Near-term priorities include 6DoF experiences (MPEG Immersive Video v2 and 6DoF audio), volumetric representations (dynamic meshes, solid point clouds, LiDAR, and emerging Gaussian splat coding), and “coding for machines,” which treats visual and audio signals as inputs to downstream analytics rather than only for human consumption.
Research aspects: The most promising research opportunities sit at the intersections: renderer and device-aware rate-distortion-complexity optimization for volumetric content; adaptive streaming and packaging evolution (e.g., MPEG-DASH / CMAF) for interactive 6DoF services under tight latency constraints; and cross-cutting themes such as media authenticity and provenance, green and energy metadata, and exploration threads on neural-network-based compression and compression of neural networks that foreshadow AI-native multimedia pipelines.
MPEG Gaussian Splat Coding (GSC)
Gaussian Splat Coding (GSC) is MPEG’s effort to standardize how 3D Gaussian Splatting (3DGS) content is encoded, decoded, and evaluated so it can be exchanged and rendered consistently across platforms. Such scenes are represented as sparse “Gaussian splats” combining geometry with rich attributes (scale and rotation, opacity, and spherical-harmonics appearance for view-dependent rendering). The main motivation is interoperability for immersive media pipelines: enabling reproducible results, shared benchmarks, and comparable rate-distortion-complexity trade-offs for use cases spanning telepresence and immersive replay to mobile XR and digital twins, while retaining the visual strengths that made 3DGS attractive compared to heavier neural scene representations.
The work remains in an exploration phase, coordinated across ISO/IEC JTC 1/SC 29 groups WG 4 (MPEG Video Coding) and WG 7 (MPEG Coding for 3D Graphics and Haptics) through Joint Exploration Experiments covering datasets and anchors, new coding tools, software (renderer and metrics), and Common Test Conditions (CTC). A notable systems thread is “lightweight GSC” for resource-constrained devices (single-frame, low-latency tracks using geometry-based and video-based pipelines with explicit time and memory targets), alongside an “early deployment” path via amendments to existing MPEG point-cloud codecs to more natively carry Gaussian-splat parameters. In parallel, MPEG is testing whether splat-specific tools can outperform straightforward mappings in quality, bitrate, and compute for real-time and streaming-centric scenarios.
Research aspects: Relevant SIGMM directions include splat-aware compression tools and rate-distortion-complexity optimization (including tracked vs. non-tracked temporal prediction); QoE evaluation for 6DoF navigation (metrics for view and temporal consistency and splat-specific artifacts); decoder and renderer co-design for real-time and mobile lightweight profiles (progressive and LOD-friendly layouts, GPU-friendly decode); and networked delivery problems such as adaptive streaming, ROI and view-dependent transmission, and loss resilience for splat parameters. Additional opportunities include interoperability work on reproducible benchmarking, conformance testing, and practical packaging and signaling for deployment.
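To make the splat attribute set concrete, the sketch below uniformly quantizes a single splat's parameters into a fixed-size record, a toy stand-in for the kind of attribute coding the GSC exploration studies; the bit depths, bounding box, and record layout are invented for illustration and are not taken from any MPEG draft.

```python
import struct
from dataclasses import dataclass

@dataclass
class Splat:
    position: tuple   # (x, y, z) in scene units
    scale: tuple      # per-axis Gaussian extent
    rotation: tuple   # unit quaternion (w, x, y, z)
    opacity: float    # in [0, 1]
    sh_dc: tuple      # degree-0 spherical-harmonics term (base colour)

def quantize(value, lo, hi, bits):
    """Uniformly quantize a scalar in [lo, hi] to an integer code."""
    levels = (1 << bits) - 1
    code = round((value - lo) / (hi - lo) * levels)
    return max(0, min(levels, code))

def dequantize(code, lo, hi, bits):
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

def encode_splat(s, bbox_lo=-10.0, bbox_hi=10.0):
    """Pack one splat: 16-bit positions, 8-bit opacity and colour.

    A toy layout: real GSC proposals additionally code scale, rotation,
    and higher-order SH, and exploit spatial/temporal prediction.
    """
    pos_codes = [quantize(p, bbox_lo, bbox_hi, 16) for p in s.position]
    opa_code = quantize(s.opacity, 0.0, 1.0, 8)
    col_codes = [quantize(c, 0.0, 1.0, 8) for c in s.sh_dc]
    return struct.pack("<3H4B", *pos_codes, opa_code, *col_codes)
```

Even this naive layout shows the core rate-distortion lever: each extra bit per component halves the reconstruction error bound while growing every record.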
MPEG Immersive Video 2nd edition (white paper)
The second edition of MPEG Immersive Video defines an interoperable bitstream and decoding process for efficient 6DoF immersive scene playback, supporting translational and rotational movement with motion parallax to reduce discomfort often associated with pure 3DoF viewing. The second edition primarily extends functionality (without changing the high-level bitstream structure), adding capabilities such as capture-device information, additional projection types, and support for Simple Multi-Plane Image (MPI), alongside tools that better support geometry and attribute handling and depth-related processing.
Architecturally, MIV ingests multiple (unordered) camera views with geometry (depth and occupancy) and attributes (e.g., texture), then reduces inter-view redundancy by extracting patches and packing them into 2D “atlases” that are compressed using conventional video codecs. MIV-specific metadata signals how to reconstruct views from the atlases. The standard is built as an extension of the common Visual Volumetric Video-based Coding (V3C) bitstream framework shared with V-PCC, with profiles that preserve backward compatibility while introducing a new profile for added second-edition functionality and a tailored profile for full-plane MPI delivery.
Research aspects: Key SIGMM topics include systems-efficient 6DoF delivery (better view and patch selection and atlas packing under latency and bandwidth constraints); rate-distortion-complexity-QoE optimization that accounts for decode and render cost (especially on HMD and mobile) and motion-parallax comfort; adaptive delivery strategies (representation ladders, viewport and pose-driven bit allocation, robust packetization and error resilience for atlas video plus metadata); renderer-aware metrics and subjective protocols for multi-view temporal consistency; and deployment-oriented work such as profile and level tuning, codec-group choices (HEVC / VVC), conformance testing, and exploiting second-edition features (capture device info, depth tools, Simple MPI) for more reliable reconstruction and improved user experience.
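The patch-to-atlas packing step in the MIV architecture can be illustrated with a minimal shelf-packing routine; real MIV encoders use far more sophisticated packing (patch rotation, gap reuse, temporal stability across frames), so this is only a sketch of the idea.

```python
def shelf_pack(patches, atlas_w):
    """Place (w, h) patches into horizontal shelves of an atlas.

    patches is a list of (width, height) tuples; atlas_w is the fixed
    atlas width. Returns (placements, atlas_h), where placements maps
    patch index -> (x, y) of its top-left corner.
    """
    # Sort tallest-first so each shelf's height is set by its first patch.
    order = sorted(range(len(patches)), key=lambda i: -patches[i][1])
    placements = {}
    shelf_y, shelf_h, cursor_x = 0, 0, 0
    for i in order:
        w, h = patches[i]
        if cursor_x + w > atlas_w:      # row full: open a new shelf below
            shelf_y += shelf_h
            cursor_x, shelf_h = 0, 0
        placements[i] = (cursor_x, shelf_y)
        cursor_x += w
        shelf_h = max(shelf_h, h)
    return placements, shelf_y + shelf_h
```

The resulting atlas height directly drives the video-codec rate spent on the packed texture and geometry components, which is why packing efficiency matters for MIV delivery.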
Concluding Remarks
The meeting outcomes highlight a clear shift toward immersive and AI-enabled media systems where compression, rendering, delivery, and evaluation must be co-designed. These developments offer timely opportunities for the ACM SIGMM community to contribute reproducible benchmarks, perceptual metrics, and end-to-end streaming and systems research that can directly influence emerging standards and deployments.
The 154th MPEG meeting will be held in Santa Eulària, Spain, from April 27 to May 1, 2026. Click here for more information about MPEG meetings and ongoing developments.
JPEG XS developers awarded the Engineering, Science and Technology Emmy®.
The 109th JPEG meeting was held in Nuremberg, Germany, from 12 to 17 October 2025.
This JPEG meeting began with the excellent news that JPEG XS developers Fraunhofer IIS and intoPIX were awarded the Engineering, Science and Technology Emmy® for their contributions to the development of the JPEG XS standard.
Furthermore, the 109th JPEG meeting was marked by several major achievements: JPEG Trust Part 2 on Trust Profiles and Reports, complementing Part 1 with several profiles for various usage scenarios, reached Committee Draft stage; JPEG AIC Part 3 was finalised for publication by ISO; JPEG XE Part 1 reached Draft International Standard (DIS) stage; and the Calls for Proposals on objective quality assessment for JPEG AIC-4 and for JPEG Pleno Light Fields received several responses.
The following sections summarise the main highlights of the 109th JPEG meeting:
Fraunhofer IIS and intoPIX representatives with the awarded Engineering, Science and Technology Emmy®.
JPEG Trust Part 2 on Trust Profiles and Reports reaches Committee Draft stage.
JPEG AIC-4 receives responses to the Call for Proposals on Objective Image Quality Assessment.
JPEG XE Part 1, the core coding system, reaches DIS stage.
JPEG XS Part 1 AMD 1 reaches DIS stage.
JPEG AI Part 2 (Profiling), Part 3 (Reference Software), and Part 5 (File Format) approved as International Standards.
JPEG DNA designs the wet-lab experiments, including DNA synthesis/sequencing.
JPEG Pleno receives responses to the Call for Proposals on Objective Metrics for Light Field Quality Assessment.
JPEG RF establishes frameworks for coding and quality assessment of radiance fields.
JPEG XL initiates embedding of JPEG XL in ISOBMFF/HEIF.
JPEG Trust
At the 109th JPEG Meeting, the JPEG Committee reached a key milestone with the completion of the Committee Draft (CD) for JPEG Trust Part 2 – Trust Profiles and Reports (ISO/IEC 21617-2). Building on the framework established in Part 1 (Core Foundation), this new specification further refines Trust Profiles and Trust Reports and provides several example profiles and reusable profile snippets for adoption in diverse usage scenarios.
Compared to earlier drafts, the new Trust Profiles specification introduces templates and dynamic metadata blocks, offering enhanced flexibility while maintaining full backwards compatibility for existing profiles. This flexibility is also reflected in the updated Trust Reports, which can now be more easily tailored to specific usage scenarios. This new specification sets the stage for user communities to build their own Trust Profiles and customise them to their specific needs.
In addition to the CD on Part 2, the committee also produced a CD of Part 4 – Reference Software. This specification provides a reference implementation and reference dataset of the Core Foundation. The reference software will be extended with additional implementations in the future.
Finally, the committee also advanced Part 3 – Media Asset Watermarking. The Terms and Definitions and Use Cases and Requirements documents are now publicly available on the JPEG website. The development of Part 3 is progressing on schedule, with the Committee Draft stage targeted for January 2026.
JPEG AIC
The JPEG AIC-3 standard, which specifies a methodology for fine-grained subjective image quality assessment in the range from good quality up to mathematically lossless, was finalised at the 109th JPEG meeting and will be published as International Standard ISO/IEC 29170-3.
In response to the JPEG AIC-4 Call for Proposals on Objective Image Quality Assessment, four proposals were received and presented. A large-scale subjective experiment has been prepared in order to evaluate the proposals.
JPEG XE
JPEG XE is a joint effort between ITU-T SG21 and ISO/IEC JTC1/SC29/WG1 and will become the first specification for the coding of events endorsed by the major international standardization bodies ITU-T, ISO, and IEC. It aims to establish a robust and interoperable format for efficient representation and coding of events in the context of machine vision and related applications. To expand the reach of JPEG XE, the JPEG Committee has closely coordinated its activities with the MIPI Alliance with the intention of developing a cross-compatible coding mode, allowing MIPI ESP signals to be decoded effectively by JPEG XE decoders.
At the 109th JPEG Meeting, the DIS of JPEG XE Part 1, the core coding system, was prepared. This part specifies the low-complexity and low-latency lossless coding technology that will be the foundation of JPEG XE. Reaching DIS stage is a major milestone and freezes the core coding technology for the first edition of JPEG XE. The JPEG Committee plans to further improve the coding performance and to provide additional lossless and lossy coding modes, scheduled to be developed in 2026. While the DIS of Part 1 is under ballot for approval as an International Standard, the JPEG Committee initiated the work on Part 2 of JPEG XE to define the profiles and levels. A DIS of Part 2 is planned to be ready for ballot in January 2026.
With JPEG XE Part 1 under ballot and Part 2 in the pipeline, the JPEG Committee remains committed to the development of a comprehensive and industry-aligned standard that meets the growing demand for event-based vision technologies. The collaborative approach between multiple standardisation organisations underscores a shared vision for a unified, international standard to accelerate innovation and interoperability in this emerging field.
JPEG XS
The JPEG Committee is extremely proud to announce that the two companies behind the development of JPEG XS, intoPIX and Fraunhofer IIS, were awarded an Emmy® for Engineering, Science, and Technology for their role in the development of the standard. The awards ceremony was held on October 14th, 2025, at the Television Academy’s Saban Media Center in North Hollywood, California. The award recognizes JPEG XS as a state-of-the-art image compression format that transmits high-quality, visually near-lossless images with minimal latency and low resource consumption. It affirms that JPEG XS is a fundamental game changer for real-time video transmission in live, professional video, and broadcast applications, and that it is being widely adopted by the industry.
Nevertheless, the work to further improve JPEG XS continues. In this context, the DIS of AMD 1 of JPEG XS Part 1 is currently under ballot at ISO and is expected to be ready by January 2026. This amendment enables the embedding of sub-frame metadata into JPEG XS, as required by augmented and virtual reality applications currently discussed within VESA. The JPEG Committee also initiated the steps to start an amendment of Part 2 (Profiles and buffer models) that will define additional sublevels needed to support on-the-fly proxy-level extraction (i.e., deriving lower-resolution streams from a master stream) without recompression. The amendment is planned to go to DIS ballot at the next 110th JPEG meeting in Sydney, Australia.
JPEG AI
During the 109th JPEG meeting, the JPEG AI project achieved major milestones, with Part 2 (Profiling), Part 3 (Reference Software), and Part 5 (File Format) approved as International Standards. Meanwhile, Part 4 (Conformance) is proceeding to publication after a positive ballot. The Core Experiments confirmed that JPEG AI outperforms state-of-the-art codecs in compression efficiency and demonstrated a decoder implementation based on the SADL library.
JPEG DNA
During the 109th JPEG meeting, the JPEG Committee designed the wet-lab experiments, including DNA synthesis/sequencing, with results expected by January 2026. The primary objective of the wet-lab experiments is to validate the technical specifications outlined in the current DIS study text of ISO/IEC 25508-1 under realistic DNA media storage procedures. Additional efforts are underway in a new Core Experiment to study the performance of the codec-dependent unequal error correction technique, which is expected to result in the future publication of JPEG DNA Part 2 – Profiles and levels.
JPEG Pleno
JPEG Pleno marked a pivotal step toward the forthcoming ISO/IEC 21794-7 standard, Light Field Quality Assessment. The new Part 7 was officially approved for inclusion in the ISO/IEC work programme, confirming international support for standardizing light field quality assessment methodologies. Moreover, in response to the Call for Proposals on Objective Metrics for Light Field Quality Assessment, three proposals were received and presented. In preparation for the evaluation of the proposals submitted in response to the CfP, an evaluation dataset was released and discussed during the meeting. The next milestone is the execution of a Subjective Quality Assessment on the evaluation dataset to evaluate the proposed objective metrics by the 110th JPEG meeting in Sydney. To this end, the methodological design and preparation of the subjective test were discussed and finalized, marking an important step toward developing the standardization framework for objective light field quality assessment.
The JPEG Pleno Workshop on Emerging Coding Technologies for Plenoptic Modalities was conducted at the 109th meeting with presentations from Touradj Ebrahimi (JPEG Convenor), Peter Schelkens (JPEG Plenoptic Coding and Quality Sub-Group Chair), Aljosa Smolic (Hochschule Luzern), Søren Otto Forchhammer (Danmarks Tekniske Universitet), Giuseppe Valenzise (Université Paris-Saclay), Amr Rizk (Leibniz Universität Hannover), Michael Rudolph (Leibniz Universität Hannover), and Irene Viola (Centrum Wiskunde & Informatica).
JPEG RF
At the 109th JPEG Meeting, the exploration activity on JPEG Radiance Fields (JPEG RF) continued its progress toward establishing frameworks for coding and quality assessment of radiance fields. The group updated the drafts of the Use Cases and Requirements and the Common Test Conditions, and reviewed the outcomes of an Exploration Study that examined the impact of camera trajectory design on human perception during subjective quality assessment. These discussions refined methodological guidelines for trajectory generation and the subjective assessment procedures. Building on this progress, Exploration Study 6 was launched to benchmark the complete assessment framework through a subjective experiment using the developed protocols. Outreach activities were also planned to engage additional stakeholders and support further development ahead of the next 110th JPEG Meeting in Sydney, Australia.
JPEG XL
At the 109th JPEG meeting, work has started on an embedding of JPEG XL in ISOBMFF/HEIF. It will be described in a new edition of ISO/IEC 18181-2, which has been initiated.
Final Quote
“During the 109th JPEG Meeting, the JPEG Committee reached several important milestones. In particular, JPEG Trust continues its development with the addition of new Parts towards the creation of a reliable and effective standard that restores authenticity and provenance of the multimedia information.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
This column introduces the now completed ITU-T P.1204 video quality model standards for assessing sequences up to UHD/4K resolution. Initially developed over two years by ITU-T Study Group 12 (Question Q14/12) and VQEG, the work used a large dataset of 26 subjective tests (13 for training, 13 for validation), each involving at least 24 participants rating sequences on the 5-point ACR scale. The tests covered diverse encoding settings, bitrates, resolutions, and framerates for H.264/AVC, H.265/HEVC, and VP9 codecs. The resulting 5,000-sequence dataset forms the largest lab-based source for model development to date. Initially standardized were P.1204.3, a no-reference bitstream-based model with full bitstream access, P.1204.4, a pixel-based, reduced-/full-reference model, and P.1204.5, a no-reference hybrid model. The current record focuses on the latest additions to the series, namely P.1204.1, a parametric, metadata-based model using only information about which codec was used, plus bitrate, framerate and resolution, and P.1204.2, which in addition uses frame-size and frame-type information to include video-content aspects into the predictions.
Introduction
Video quality under specific encoding settings is central to applications such as VoD, live streaming, and audiovisual communication. In HTTP-based adaptive streaming (HAS) services, bitrate ladders define video representations across resolutions and bitrates, balancing screen resolution and network capacity. Video quality, a key contributor to users’ Quality of Experience (QoE), can vary with bandwidth fluctuations, buffer delays, or playback stalls.
While such quality fluctuations and broader QoE aspects are discussed elsewhere, this record focuses on short-term video quality as modeled by ITU-T P.1204 for HAS-type content. These models assess segments of around 10s under reliable transport (e.g., TCP, QUIC), covering resolution, framerate, and encoding effects, but excluding pixel-level impairments from packet loss under unreliable transport.
Because video quality is perceptual, subjective tests, laboratory or crowdsourced, remain essential, especially at high resolutions such as 4K UHD under controlled viewing conditions (1.5H or 1.6H viewing distance). Yet, studies show limited perceptual gain between HD and 4K, depending on source content, underlining the need for representative test materials. Given the high cost of such tests, objective (instrumental) models are required for scalable, automated assessment supporting applications like bitrate ladder design and service monitoring.
Four main model classes exist: metadata-based, bitstream-based, pixel-based, and hybrid. Metadata-based models use codec parameters (e.g., resolution, bitrate) and are lightweight; bitstream-based models analyze encoded streams without decoding, as in ITU-T P.1203 and P.1204.3 [1][2][3][7]. Pixel-based models compare decoded frames and include Full Reference and Reduced Reference models (e.g., P.1204.4, and also PSNR [9], SSIM [10], VMAF [11][12]), as well as No Reference variants. Finally, hybrid models combine pixel and bitstream or metadata inputs, exemplified by the ITU-T P.1204.5 standard. These three standards, P.1204.3, P.1204.4, and P.1204.5, formed the initial P.1204 Recommendation series finalized in 2020.
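Among the pixel-based full-reference measures mentioned, PSNR [9] is the simplest; a minimal pure-Python version illustrates the full-reference principle of comparing decoded frames against the pristine reference:

```python
import math

def psnr(ref, dist, max_val=255):
    """Peak signal-to-noise ratio between two equally sized frames.

    ref and dist are flat sequences of pixel values; max_val is the
    peak pixel value (255 for 8-bit video). Returns a value in dB,
    or +inf for identical frames.
    """
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)
```

Metadata-based models such as P.1204.1, by contrast, never touch pixels at all, which is exactly what makes them deployable at scale in monitoring pipelines.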
ITU-T P.1204 series completed with P.1204.1 and P.1204.2
The respective standardization project, under the work item name P.NATS Phase 2 (read: Peanuts), was a unique video quality model development competition conducted in collaboration between ITU-T Study Group 12 (SG12) and the Video Quality Experts Group (VQEG). The target use cases covered resolutions up to UHD/4K, presented on PC/TV or mobile/tablet (MO/TA) devices. For the first time, bitstream-, pixel-based, and hybrid models were jointly developed, trained, and validated using a large common subjective dataset comprising 26 tests, each with at least 24 participants (see, e.g., [1] for details). The P.NATS Phase 2 work built on the earlier “P.NATS Phase 1” project, which resulted in the ITU-T Rec. P.1203 standards series (P.1203, P.1203.1, P.1203.2, P.1203.3). In the P.NATS Phase 2 project, video quality models in five different categories were evaluated, and different candidates were found eligible to be recommended as standards. The initially standardized three models out of the five categories were the aforementioned P.1204.3, P.1204.4, and P.1204.5. However, due to a lack of consensus between the winning proponents, no models were initially recommended as standards for the categories “bitstream Mode 0”, with access to high-level metadata only (video codec, resolution, framerate, and bitrate), and “bitstream Mode 1”, with further access to frame-size information that can be used for content-complexity estimation.
For the latest model additions of P.1204.1 and P.1204.2, subsets of the databases initially used in the P.NATS Phase 2 project were employed for model training. Two different datasets belonging to the two contexts PC/TV and MO/TA were used for training the models. AVT-PNATS-UHD-1 is the dataset for the PC/TV use case and ERCS-PNATS-UHD-1 the dataset used for the MO/TA use case.
AVT-PNATS-UHD-1 [7] consists of four different subjective tests conducted by TU Ilmenau as part of the P.NATS Phase 2 competition. The target resolution of these tests was 3840×2160 pixels. ERCS-PNATS-UHD-1 [1] is a dataset targeting the MO/TA use case. It consists of one subjective test conducted by Ericsson as part of the P.NATS Phase 2 competition. The target resolution of this test was 2560×1440 pixels.
For model performance evaluation, beyond AVT-PNATS-UHD-1, further externally available video-quality test databases were used, as outlined in the following.
AVT-VQDB-UHD-1: This is a publicly available dataset and consists of four different subjective tests. All four tests had a full-factorial design. In total, 17 different SRCs with a duration of 7-10 s were used across all four tests. All the sources had a resolution of 3840×2160 pixels and a framerate of 60 fps. For HRC design, bitrate was selected in fixed (i.e., non-adaptive) values per PVS between 200 kbps and 40,000 kbps, resolution between 360p and 2160p, and framerate between 15 fps and 60 fps. In all the tests, a 2-pass encoding approach was used to encode the videos, with the medium preset for H.264 and H.265, and the speed parameter for VP9 set to the default value “0”. A total of 104 participants took part in the four tests.
GVS: This dataset consists of 24 SRCs that have been extracted from 12 different games. The SRCs have a resolution of 1920×1080 pixels, a framerate of 30 fps, and a duration of 30 s. The HRC design included three different resolutions, namely 480p, 720p, and 1080p. 90 PVSs resulting from 15 bitrate-resolution pairs were used for subjective evaluation. A total of 25 participants rated all 90 PVSs.
KUGVD: Six SRCs out of the 24 SRCs from GVS were used to develop KUGVD. The same bitrate-resolution pairs from GVS were used to define the HRCs. In total, 90 PVSs were used in the subjective evaluation, and 17 participants took part in the test.
CGVDS: This dataset consists of SRCs captured at 60fps from 15 different games. For designing the HRCs, three resolutions, namely, 480p, 720p and 1080p at three different framerates of 20, 30, and 60fps were considered. To ensure that the SRCs from all the games could be assessed by test subjects, the overall test was split into 5 different subjective tests, with a minimum of 72 PVSs being rated in each of the tests. A total of over 100 participants took part over the five different tests, with a minimum of 20 participants per test.
Twitch: The Twitch Dataset consists of 36 different games, with 6 games each representing one out of 6 pre-defined genres. The dataset consists of streams directly downloaded from Twitch. A total of 351 video sequences of approximately 50s duration across all representations were downloaded. 90 video sequences out of these 351 video sequences were selected for subjective evaluation. Only the first 30s of the chosen 90 PVSs were considered for subjective testing. Six different resolutions between 160p and 1080p at framerates of 30 and 60fps were used. 29 participants rated all the 90 PVSs.
BBQCG: This is the training dataset developed as part of the P.BBQCG work item. This dataset consists of nine subjective test databases. Three out of these nine test databases consisted of processed video sequences (PVSs) up to 1080p/120fps and the remaining had PVSs up to 4K/60fps. Three codecs, namely, H.264, H.265, and AV1 were used to encode the videos. Overall 900 different PVSs were created from 12 sources (SRCs) by encoding the SRCs with different encoding settings.
AVT-VQDB-UHD-1-VD: This dataset consists of 16 source contents encoded using a CRF-based encoding approach. Overall, 192 PVSs were generated by encoding all 16 sources in four resolutions (360p, 720p, 1080p, and 2160p) with three CRF values (22, 30, 38) each. A total of 40 subjects participated in the study.
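The full-factorial HRC designs described above simply cross every encoding dimension with every other. A sketch with illustrative values (not the actual test points of any of these datasets) shows how quickly such a grid grows:

```python
from itertools import product

# Illustrative values only; the real datasets used different test points.
bitrates_kbps = [200, 1000, 4000, 15000, 40000]
resolutions = ["360p", "720p", "1080p", "2160p"]
framerates = [15, 30, 60]
codecs = ["h264", "hevc", "vp9"]

def hrc_grid():
    """Enumerate every codec/bitrate/resolution/framerate combination,
    i.e., a full-factorial HRC design over the lists above."""
    return [
        {"codec": c, "bitrate": b, "resolution": r, "fps": f}
        for c, b, r, f in product(codecs, bitrates_kbps, resolutions, framerates)
    ]
```

Multiplying each grid point by the number of SRCs gives the PVS count, which is why full-factorial designs need careful pruning to stay within feasible subjective-test durations.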
ITU-T P.1204.1 and P.1204.2 model prediction performance
The performance figures of the two new models P.1204.1 and P.1204.2 models on the different datasets are indicated in Table 1 (P.1204.1) and Table 2 (P.1204.2) below.
Table 1: Performance of P.1204.1 (Mode 0) on the evaluation datasets, in terms of Root Mean Square Error (RMSE; the measure used as winning criterion in the ITU-T/VQEG modelling competition), Pearson Correlation Coefficient (PCC), Spearman Rank Correlation Coefficient (SRCC), and Kendall’s tau.
Dataset              RMSE                          PCC     SRCC    Kendall
AVT-VQDB-UHD-1       0.499                         0.890   0.877   0.684
KUGVD                0.840                         0.590   0.570   0.410
GVS                  0.690                         0.670   0.650   0.490
CGVDS                0.470                         0.780   0.750   0.560
Twitch               0.430                         0.920   0.890   0.710
BBQCG                0.598 (on a 7-point scale)    0.841   0.843   0.647
AVT-VQDB-UHD-1-VD    0.650                         0.814   0.813   0.617
Table 2: Performance of P.1204.2 (Mode 1) on the evaluation datasets, in terms of Root Mean Square Error (RMSE; the measure used as winning criterion in the ITU-T/VQEG modelling competition), Pearson Correlation Coefficient (PCC), Spearman Rank Correlation Coefficient (SRCC), and Kendall’s tau.
Dataset              RMSE                          PCC     SRCC    Kendall
AVT-VQDB-UHD-1       0.476                         0.901   0.900   0.730
KUGVD                0.500                         0.870   0.860   0.690
GVS                  0.420                         0.890   0.870   0.710
CGVDS                0.360                         0.900   0.880   0.690
Twitch               0.370                         0.940   0.930   0.770
BBQCG                0.737 (on a 7-point scale)    0.745   0.746   0.547
AVT-VQDB-UHD-1-VD    0.598                         0.845   0.845   0.654
For all databases except BBQCG and KUGVD, the Mode 0 model P.1204.1 performs solidly, as shown in Table 1. With the information about frame types and sizes available to the Mode 1 model P.1204.2, performance improves considerably, as shown in Table 2. For performance results of all three previously standardized models, P.1204.3, P.1204.4, and P.1204.5, the reader is referred to [1] and the individual standards [4][5][6]. For the P.1204.3 model, complementary performance information is presented in, e.g., [2][7]. For P.1204.4, additional model performance information is available in [8], including results for AV1, AVS2, and VVC.
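The four measures reported in Tables 1 and 2 can be reproduced with a short pure-Python sketch; note that it uses Kendall's tau-a without tie correction, which may differ slightly from the tie-corrected variant used in published evaluations:

```python
import math

def rmse(x, y):
    """Root mean square error between predictions x and ratings y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def _ranks(v):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson on the ranks."""
    return pearson(_ranks(x), _ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) over all pairs."""
    n = len(x)
    s = sum(
        (1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else
         -1 if (x[i] - x[j]) * (y[i] - y[j]) < 0 else 0)
        for i in range(n) for j in range(i + 1, n)
    )
    return s / (n * (n - 1) / 2)
```

RMSE penalizes absolute prediction error on the MOS scale, while the rank-based measures only reward getting the ordering of conditions right, which is why both families are reported side by side.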
The following plots provide an illustration of how the new P.1204.1 Mode 0 model may be used. Here, bitrate-ladder-type graphs are presented, with the predicted Mean Opinion Score on a 5-point scale plotted over log bitrate.
(Bitrate-ladder plots of predicted MOS over log bitrate for the codecs H.264, H.265, and VP9.)
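The characteristic shape of such bitrate-ladder curves, rising monotonically in log bitrate and saturating toward the top of the MOS scale, can be mimicked with a simple logistic mapping; the coefficients below are made up for illustration and are not the fitted P.1204.1 parameters:

```python
import math

def predicted_mos(bitrate_kbps, a=1.2, b=8.0):
    """Hypothetical logistic mapping from log bitrate to MOS on [1, 5].

    a and b are invented shape parameters; the sketch only reproduces
    the typical ladder-curve shape, not the standardized model.
    """
    z = a * math.log(bitrate_kbps) - b
    return 1.0 + 4.0 / (1.0 + math.exp(-z))

# Example ladder rungs (illustrative bitrates in kbps):
ladder = [200, 750, 1850, 4300, 8000, 16000]
curve = [(br, round(predicted_mos(br), 2)) for br in ladder]
```

Plotting `curve` against log bitrate yields the saturating S-shape seen in such graphs, and comparing curves per codec makes the efficiency ordering (e.g., H.265 above H.264 at the same bitrate) directly visible.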
Conclusions and Outlook
The P.1204 standards series now comprises the complete initially planned set of models, namely:
ITU-T P.1204.1: Bitstream Mode 0, i.e., metadata-based model with access to information about video codec, resolution, framerate and bitrate used.
ITU-T P.1204.2: Bitstream Mode 1, i.e., metadata-based model with access to information about video codec, resolution, framerate and bitrate used, plus information about video frame types and sizes.
Extensions of some of these models beyond the initial scope of codecs (H.264/AVC, H.265/HEVC, VP9) have been added over the last few years: P.1204.5 has been extended, and P.1204.4 evaluated, to also cover the AV1 video codec. Work in ITU-T SG12 (Q14/12) is ongoing to extend P.1204.1, P.1204.2, and P.1204.3 to newer codecs such as AV1, and all five models are planned to be extended to also cover VVC. It is noted that for P.1204.3, P.1204.4, and P.1204.5, long-term quality integration modules that generate per-session scores for streaming sessions of up to 5 min are described in appendices of the respective Recommendations. For P.1204.1 and P.1204.2, this extension still has to be completed. Initial evaluations of similar Mode 0 and Mode 1 models that use the P.1204.3-type long-term integration can be found in [7].
The 152nd MPEG meeting took place in Geneva, Switzerland, from October 7 to October 11, 2025. The official MPEG press release can be found here. This column highlights key points from the meeting, amended with research aspects relevant to the ACM SIGMM community:
MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF). A separate press release regarding this achievement is available here.
JVET ratified new editions of VSEI, VVC, and HEVC
The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized
Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated
MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF)
On September 18, 2025, the National Academy of Television Arts & Sciences (NATAS) announced that the MPEG Systems Working Group (ISO/IEC JTC 1/SC 29/WG 3) had been selected as a recipient of a Technology & Engineering Emmy® Award for standardizing the Common Media Application Format (CMAF). But what is CMAF? CMAF (ISO/IEC 23000-19) is a media format standard designed to simplify and unify video streaming workflows across different delivery protocols and devices. Before CMAF, streaming services often had to produce multiple container formats: ISO Base Media File Format (ISOBMFF) for MPEG-DASH and MPEG-2 Transport Stream (TS) for Apple HLS. This duplication resulted in additional encoding, packaging, and storage costs. I wrote a blog post about this some time ago here. CMAF’s main goal is to define a single, standardized segmented media format usable by both HLS and DASH, enabling “encode once, package once, deliver everywhere.”
At its core, CMAF is based on ISOBMFF, the foundation of the MP4 format. Each CMAF presentation consists of CMAF headers, CMAF media segments, and CMAF tracks (a logical sequence of segments for one stream, e.g., video or audio). CMAF enables low-latency streaming by allowing progressive segment transfer, adopting chunked transfer encoding via CMAF chunks. CMAF also defines interoperable profiles for codecs and presentation types for video, audio, and subtitles. Thanks to its compatibility with and adoption within existing streaming standards, CMAF bridges the gap between DASH and HLS, creating a unified ecosystem.
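Because CMAF segments are plain ISOBMFF, their coarse structure can be inspected by walking the top-level boxes, each of which begins with a 4-byte big-endian size followed by a 4-byte type code. A minimal, illustrative Python sketch (not normative; real segments also contain nested boxes that this walker does not descend into):

```python
import struct

def list_boxes(data: bytes):
    """Walk top-level ISOBMFF boxes (e.g., in a CMAF segment)
    and return (type, size) pairs."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4:offset + 8].decode("ascii")
        if size == 1:    # 64-bit 'largesize' follows the type field
            size, = struct.unpack_from(">Q", data, offset + 8)
        elif size == 0:  # box extends to the end of the file
            size = len(data) - offset
        boxes.append((box_type, size))
        offset += size
    return boxes

# A CMAF chunk typically starts with a 'moof' box followed by 'mdat'.
segment = (
    struct.pack(">I", 16) + b"moof" + b"\x00" * 8 +
    struct.pack(">I", 12) + b"mdat" + b"dat!"
)
print(list_boxes(segment))  # [('moof', 16), ('mdat', 12)]
```

In low-latency delivery, each such moof/mdat pair forms one CMAF chunk that can be pushed to clients before the full segment is complete.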
Research aspects include – but are not limited to – low-latency tuning (segment/chunk size trade-offs, HTTP/3, QUIC), Quality of Experience (QoE) impact of chunk-based adaptation, synchronization of live and interactive CMAF streams, edge-assisted CMAF caching and prediction, and interoperability testing and compliance tools.
JVET ratified new editions of VSEI, VVC, and HEVC
At its 40th meeting, the Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) concluded the standardization work on the next editions of three key video coding standards, advancing them to the Final Draft International Standard (FDIS) stage. Corresponding twin-text versions have also been submitted to ITU-T for consent procedures. The finalized standards include:
Versatile Supplemental Enhancement Information (VSEI) — ISO/IEC 23002-7 | ITU-T Rec. H.274
Versatile Video Coding (VVC) — ISO/IEC 23090-3 | ITU-T Rec. H.266
High Efficiency Video Coding (HEVC) — ISO/IEC 23008-2 | ITU-T Rec. H.265
The primary focus of these new editions is the extension and refinement of Supplemental Enhancement Information (SEI) messages, which provide metadata and auxiliary data to support advanced processing, interpretation, and quality management of coded video streams.
The updated VSEI specification introduces both new and refined SEI message types supporting advanced use cases:
AI-driven processing: Extensions for neural-network-based post-filtering and film grain synthesis offer standardized signalling for machine learning components in decoding and rendering pipelines.
Semantic and multimodal content: New SEI messages describe infrared, X-ray, and other modality indicators, region packing, and object mask encoding; creating interoperability points for multimodal fusion and object-aware compression research.
Pipeline optimization: Messages defining processing order and post-processing nesting support research on joint encoder-decoder optimization and edge-cloud coordination in streaming architectures.
Authenticity and generative media: A new set of messages supports digital signature embedding and generative-AI-based face encoding, raising questions for the SIGMM community about trust, authenticity, and ethical AI in media pipelines.
Metadata and interpretability: New SEIs for text description, image format metadata, and AI usage restriction requests could facilitate research into explainable media, human-AI interaction, and regulatory compliance in multimedia systems.
All VSEI features are fully compatible with the new VVC edition, and most are also supported in HEVC. The new HEVC edition further refines its multi-view profiles, enabling more robust 3D and immersive video use cases.
Research aspects of these new editions can be summarized as follows: (i) Define new standardized interfaces between neural post-processing and conventional video coding, fostering reproducible and interoperable research on learned enhancement models. (ii) Encourage exploration of metadata-driven adaptation and QoE optimization using SEI-based signals in streaming systems. (iii) Open possibilities for cross-layer system research, connecting compression, transport, and AI-based decision layers. (iv) Introduce a formal foundation for authenticity verification, content provenance, and AI-generated media signalling, relevant to current debates on trustworthy multimedia.
These updates highlight how ongoing MPEG/ITU standardization is evolving toward a more AI-aware, multimodal, and semantically rich media ecosystem, providing fertile ground for experimental and applied research in multimedia systems, coding, and intelligent media delivery.
The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized
MPEG Coding of 3D Graphics and Haptics (ISO/IEC JTC 1/SC 29/WG7) has advanced MPEG-I Part 5 – Visual Volumetric Video-based Coding (V3C and V-PCC) to the Final Draft International Standard (FDIS) stage, marking its fourth edition. This revision introduces major updates to the Video-based Coding of Volumetric Content (V3C) framework, particularly enabling support for an additional bitstream instance: V-DMC (Video-based Dynamic Mesh Compression).
Previously, V3C served as the structural foundation for V-PCC (Video-based Point Cloud Compression) and MIV (MPEG Immersive Video). The new edition extends this flexibility by allowing V-DMC integration, reinforcing V3C as a generic, extensible framework for volumetric and 3D video coding. All instances follow a shared principle, i.e., using conventional 2D video codecs (e.g., HEVC, VVC) for projection-based compression, complemented by specialized tools for mapping, geometry, and metadata handling.
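The shared projection principle can be illustrated with a toy example: 3D points are orthographically projected onto a 2D grid, keeping the nearest sample per pixel, and the resulting depth map could then be compressed with an ordinary 2D codec. This is only a sketch of the idea; actual V3C patch generation, atlas packing, and occupancy signalling are far more elaborate:

```python
def project_to_depth_map(points, width, height):
    """Orthographically project (x, y, z) points onto a width x height
    grid along the z axis, keeping the nearest (smallest z) depth per cell.
    Points hidden behind others would be captured by additional
    projection layers in a real V3C-style encoder."""
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            if depth[y][x] is None or z < depth[y][x]:
                depth[y][x] = z
    return depth

points = [(0, 0, 5.0), (0, 0, 2.0), (1, 1, 3.0)]
dm = project_to_depth_map(points, 2, 2)
print(dm)  # [[2.0, None], [None, 3.0]]
```

The None cells correspond to unoccupied pixels, which is why V3C additionally signals occupancy information alongside geometry and attribute maps.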
While V-PCC remains co-specified within Part 5, MIV (Part 12) and V-DMC (Part 29) are standardized separately. The progression to FDIS confirms the technical maturity and architectural stability of the framework.
This evolution opens new research directions as follows: (i) Unified 3D content representation, enabling comparative evaluation of point cloud, mesh, and view-based methods under one coding architecture. (ii) Efficient use of 2D codecs for 3D media, raising questions on mapping optimization, distortion modeling, and geometry-texture compression. (iii) Dynamic and interactive volumetric streaming, relevant to AR/VR, telepresence, and immersive communication research.
The fourth edition of MPEG-I Part 5 thus positions V3C as a cornerstone for future volumetric, AI-assisted, and immersive video systems, bridging standardization and cutting-edge multimedia research.
Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated
The Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) has completed the evaluation of submissions to its Call for Evidence (CfE) on video compression with capability beyond VVC. The CfE investigated coding technologies that may surpass the performance of the current Versatile Video Coding (VVC) standard in compression efficiency, computational complexity, and extended functionality.
A total of five submissions were assessed, complemented by ECM16 reference encodings and VTM anchor sequences with multiple runtime variants. The evaluation addressed both compression capability and encoding runtime, as well as low-latency and error-resilience features. All technologies were derived from VTM, ECM, or NNVC frameworks, featuring modified encoder configurations and coding tools rather than entirely new architectures.
Key Findings
In the compression capability test, 76 out of 120 test cases showed at least one submission with a non-overlapping confidence interval compared to the VTM anchor. Several methods outperformed ECM16 in visual quality and achieved notable compression gains at lower complexity. Neural-network-based approaches demonstrated clear perceptual improvements, particularly for 8K HDR content, while gains were smaller for gaming scenarios.
In the encoding runtime test, significant improvements were observed even under strict complexity constraints: 37 of 60 test points (at both 1× and 0.2× runtime) showed statistically significant benefits over VTM. Some submissions achieved faster encoding than VTM, with only a 35% increase in decoder runtime.
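The non-overlapping-confidence-interval criterion used above can be sketched as follows, assuming per-test-point subjective scores and a simple normal-approximation 95% interval (a simplified stand-in for the statistical methodology actually used by JVET):

```python
import statistics

def ci95(scores):
    """Mean and 95% confidence interval half-width (normal approximation)."""
    mean = statistics.mean(scores)
    half = 1.96 * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half

def intervals_overlap(a, b):
    """True if the 95% CIs of the two score sets overlap; a test point
    counts as significantly different when they do not."""
    (ma, ha), (mb, hb) = ci95(a), ci95(b)
    return abs(ma - mb) <= ha + hb

anchor = [6.1, 6.3, 5.9, 6.0, 6.2]       # hypothetical VTM anchor ratings
submission = [7.0, 7.2, 6.9, 7.1, 7.3]   # hypothetical submission ratings
print(intervals_overlap(anchor, submission))  # False -> significant difference
```

The score values here are invented for illustration; they are not from the CfE results.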
Research Relevance and Outlook
The CfE results illustrate a maturing convergence between model-based and data-driven video coding, raising research questions highly relevant for the ACM SIGMM community:
How can learned prediction and filtering networks be integrated into standard codecs while preserving interoperability and runtime control?
What methodologies can best evaluate perceptual quality beyond PSNR, especially for HDR and immersive content?
How can complexity-quality trade-offs be optimized for diverse hardware and latency requirements?
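As a point of reference for the second question, the PSNR baseline that such methodologies aim to move beyond is trivial to compute; a minimal sketch for 8-bit samples (illustrative only, operating on flat pixel lists):

```python
import math

def psnr(ref, dist, max_val=255):
    """Peak signal-to-noise ratio in dB between two equal-length 8-bit
    sample sequences; higher is better, infinite for identical frames."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 140, 160]
dist = [101, 119, 141, 159]
print(round(psnr(ref, dist), 2))  # 48.13
```

PSNR treats every pixel error equally, which is exactly why it correlates poorly with perception for HDR and immersive content and motivates the perceptual metrics discussed above.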
Building on these outcomes, JVET is preparing a Call for Proposals (CfP) for the next-generation video coding standard, with a draft planned for early 2026 and evaluation through 2027. Upcoming activities include refining test material, adding Reference Picture Resampling (RPR), and forming a new ad hoc group on hardware implementation complexity.
For multimedia researchers, this CfE marks a pivotal step toward AI-assisted, complexity-adaptive, and perceptually optimized compression systems, which are considered a key frontier where codec standardization meets intelligent multimedia research.
The 153rd MPEG meeting will be held online from January 19 to January 23, 2026. Click here for more information about MPEG meetings and their developments.
JPEG XE reaches Committee Draft stage at the 108th JPEG meeting
The 108th JPEG meeting was held in Daejeon, Republic of Korea, from 29 June to 4 July 2025.
During this meeting, the JPEG Committee finalised the Committee Draft of JPEG XE, an upcoming International Standard for lossless coding of visual events, which has been sent for consultation to ISO/IEC JTC1/SC29 national bodies. JPEG XE will be the first International Standard developed for the lossless representation and coding of visual events and is being developed under the auspices of ISO, IEC, and ITU.
Furthermore, the JPEG Committee was informed that the prestigious Joseph von Fraunhofer Prize 2025 was awarded to three JPEG Committee members Prof. Siegfried Fößel, Dr. Joachim Keinert and Dr. Thomas Richter, for their contributions to the development of the JPEG XS standard. The JPEG XS standard specifies a compression technology with very low latency at a low implementation complexity and with a very precise bit-rate control. A presentation video can be accessed here.
108th JPEG Meeting in Daejeon, Rep. of Korea.
The following sections summarise the main highlights of the 108th JPEG meeting:
JPEG XE Committee Draft sent for consultation
JPEG Trust second edition aligns with C2PA
JPEG AI parts 2, 3 and 5 proceed for publication as IS
JPEG DNA reaches DIS stage
JPEG AIC on Objective Image Quality Assessment
JPEG Pleno Learning-based Point Cloud Coding proceeds for publication as IS
JPEG XS Part 1 Amendment 1 proceeds to DIS stage
JPEG RF explores 3DGS coding and quality evaluation
JPEG XE
At the 108th JPEG Meeting, the Committee Draft of the first International Standard for lossless coding of events was issued and sent to ISO/IEC JTC1/SC29 national bodies for consultation. JPEG XE is being developed under the auspices of ISO/IEC and ITU-T and aims to establish a robust and interoperable format for efficient representation and coding of events in the context of machine vision and related applications. By reaching the Committee Draft stage, the JPEG Committee has attained a very important milestone. The Committee Draft was produced based on the five responses received to a Call for Proposals issued after the 104th JPEG Meeting held in July 2024. Two of these submissions meet the requirements for the constrained lossless coding of events and allow the implementation and operation of the coding model with limited resources, power, and complexity. The remaining three responses address the unconstrained coding mode and will be considered in a second phase of standardisation.
JPEG XE is the fruit of a joint effort between ISO/IEC JTC1/SC29/WG1 and ITU-T SG21, which is hoped to result in a broadly supported JPEG XE standard, improving compatibility and interoperability across applications, products, and services. Additionally, the JPEG Committee is in contact with the MIPI Alliance with the intention of developing a cross-compatible coding mode, allowing MIPI ESP signals to be decoded effectively by JPEG XE decoders.
The JPEG Committee remains committed to the development of a comprehensive and industry-aligned standard that meets the growing demand for event-based vision technologies. The collaborative approach between multiple standardisation organisations underscores a shared vision for a unified, international standard to accelerate innovation and interoperability in this emerging field.
JPEG Trust
JPEG Trust completed its second edition of JPEG Trust Part 1: Core Foundation, which brings JPEG Trust into alignment with the updated C2PA specification 2.1 and integrates aspects of Intellectual Property Rights (IPR). This second edition is now approved as a Draft International Standard for submission to ISO/IEC balloting, with an expected completion timeframe at the end of 2025.
Showcasing the adoption of JPEG Trust technology, JPEG Trust Part 4 – Reference software has now reached the Committee Draft stage.
Work continues on JPEG Trust Part 2: Trust profiles catalogue, a repository of Trust Profile and reporting snippets designed to assist implementers in constructing their Trust Profiles and Trust Reports, as well as JPEG Trust Part 3: Media asset watermarking.
JPEG AI
During the 108th JPEG meeting, JPEG AI Parts 2, 3, and 5 received positive DIS ballot results with only editorial comments, allowing them to proceed to publication as International Standards. These parts extend Part 1 by specifying stream and decoder profiles, reference software with usage documentation, and file format embedding for container formats such as ISOBMFF and HEIF.
The results from two Core Experiments were reviewed. The first evaluated gain map-based HDR coding, comparing it to simulcast methods and HEIC, while the second focused on implementing JPEG AI on smartphones using ONNX. Progressive decoding performance was assessed under channel truncation, and adaptive selection techniques were proposed to mitigate losses. Subjective and objective evaluations confirmed JPEG AI’s strong performance, often surpassing codecs such as VVC Intra, AVIF, and JPEG XL, and performing comparably to ECM in informal viewing tests.
Another contribution explored compressed-domain image classification using latent representations, demonstrating competitive accuracy across bitrates. A proposal to limit tile splits in JPEG AI Part 2 was also discussed, and experiments identified Model 2 as the most robust and efficient default model for levels that allow only one model at the decoder side.
JPEG DNA
During the 108th JPEG meeting, the JPEG Committee produced a study DIS text of JPEG DNA Part 1 (ISO/IEC 25508-1). The purpose of this text is to synchronise the current version of the Verification Model with the changes made to the Committee Draft document, reflecting the comments received from the consultation. The DIS balloting of Part 1 is scheduled to take place after the next JPEG meeting, starting in October 2025.
The JPEG Committee is also planning wet-lab experiments to validate that the current specification of JPEG DNA satisfies the conditions required by applications using the current state of the art in DNA synthesis and sequencing, such as biochemical constraints, decodability, coverage rate, and the impact of error-correcting codes on compression performance.
The goal still remains to reach International Standard (IS) status for Part 1 during 2026.
JPEG AIC
Part 4 of JPEG AIC deals with objective quality metrics for fine-grained assessment of high-fidelity compressed images. As of the 108th JPEG Meeting, the Call for Proposals on Objective Image Quality Assessment (JPEG AIC-4), which was launched in April 2025, has already resulted in four non-mandatory registrations of interest, which were reviewed. At this JPEG meeting, the technical details regarding the evaluation of the proposed metrics and of the anchor metrics were developed and finalised. The results have been integrated into the document “Common Test Conditions on Objective Image Quality Assessment v2.0”, available on the JPEG website. Moreover, the procedures to generate the evaluation image dataset were defined and will be carried out by JPEG experts. The responses to the Call for Proposals for JPEG AIC-4 are expected in September 2025, together with their application to the evaluation dataset, with the goal of creating a Working Draft of a new standard on objective quality assessment of high-fidelity images by April 2026.
JPEG Pleno
At the 108th JPEG meeting, significant progress was reported in the ongoing JPEG Pleno Quality Assessment activity for light fields. A Call for Proposals (CfP) on objective quality metrics for light fields is currently underway, with submissions to be evaluated using a new evaluation dataset. The JPEG Committee is also preparing the DIS of ISO/IEC 21794-7, which defines a standard for subjective quality assessment methodologies of light fields.
During the 108th JPEG meeting, the 2nd edition of ISO/IEC 21794-2 (“Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding”) advanced to the Draft International Standard (DIS) stage. This 2nd edition includes the specification of a third coding mode entitled Slanted 4D Transform Mode and its associated profile.
The 108th JPEG meeting also saw the successful completion of the Final Draft International Standard balloting and the impending publication of ISO/IEC 21794-6: Learning-based Point Cloud Coding. This is the world’s first international standard on learning-based point cloud coding. The publication of Part 6 of ISO/IEC 21794 is a crucial and notable milestone in the representation of point clouds. The publication of the International Standard is expected to take place during the second half of 2025.
JPEG XS
The JPEG Committee advanced Amendment 1 of JPEG XS Part 1 to the DIS stage; it allows the embedding of sub-frame metadata in JPEG XS, as required by augmented and virtual reality applications currently discussed within VESA. The 3rd edition of Part 5, the reference software of JPEG XS, was also approved for publication as an International Standard.
JPEG RF
During the 108th JPEG meeting, the JPEG Radiance Fields exploration advanced its work on discussing the procedures for reliable evaluation of potential proposals in the future, with a particular focus on refining subjective evaluation protocols. A key outcome was the initiation of Exploration Study 5, aimed at investigating how different test camera trajectories influence human perception during subjective quality assessment. The Common Test Conditions (CTC) document was also reviewed, with the subjective testing component remaining provisional pending the outcome of this exploration study. In addition, existing use cases and requirements for JPEG RF were re-examined, setting the stage for the development of revised drafts of both the Use Cases and Requirements document and the CTC. New mandates include conducting Exploration Study 5, revising documents, and expanding stakeholder engagement.
Final Quote
“The release of the Committee Draft of the JPEG XE standard for lossless coding of events at the 108th JPEG meeting is an impressive achievement and will accelerate deployment of products and applications relying on visual events,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
From May 5th to 9th, 2025, Meta hosted the plenary meeting of the Video Quality Experts Group (VQEG) at its headquarters in Menlo Park (CA, United States). Around 150 participants registered for the meeting, coming from industry and academic institutions in 26 different countries worldwide.
The meeting was dedicated to presenting updates and discussing topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.
All the topics mentioned below can be of interest to the SIGMM community working on quality assessment, but special attention can be devoted to the first activities of the group on Subjective and objective assessment of GenAI content (SOGAI) and to the advances in the contribution of the Immersive Media Group (IMG) to the International Telecommunication Union (ITU) towards Rec. ITU-T P.IXC for the evaluation of the Quality of Experience (QoE) of immersive interactive communication systems.
Readers of these columns who are interested in VQEG’s ongoing projects are encouraged to subscribe to the corresponding mailing lists to stay informed and get involved.
Group picture of the meeting
Overview of VQEG Projects
Immersive Media Group (IMG)
The IMG group researches the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented the status of this recommendation and the next steps to be addressed towards a new contribution to ITU-T at its next meeting in September 2025. Also, at this meeting, it was decided that Marta Orduna will replace Pablo Pérez as vice-chair of IMG. In addition, the following presentations related to IMG topics were delivered:
Gareth Rendle (Bauhaus-Universität Weimar, Germany) and Felix Immohr (TU Ilmenau, Germany) presented a user study on the influence of audiovisual realism on communication behaviour in group-to-group telepresence, showing that avatar realism has positive effects on subjective ratings of perceived message understanding and group cohesion, and yields behavioural differences that indicate more interactivity and engagement. Also, Anton Lammert (Bauhaus-Universität Weimar, Germany) presented his work (in collaboration with Gareth and Felix) on a system designed for the comprehensive analysis of social Virtual Reality (VR) studies, called Immersive Study Analyzer (ISA), which records all user actions, speech, and the contextual environment.
Kamil Koniuch, Norbert Barczyk, Lucjan Janowski, and Mateusz Olszewski (AGH University of Krakow, Poland) presented their work on developing VR games based on the circumplex model of group tasks for Quality of Experience (QoE) measurements.
Statistical Analysis Methods (SAM)
The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. In relation to these topics, the following presentations were delivered during the meeting:
Dietmar Saupe (University of Konstanz, Germany) delivered two presentations. The first one covered the updates on the JPEG Assessment of Image Coding (AIC) project, especially the JPEG AIC-3, which is a standard (currently under review at ISO/IEC) for fine-grained subjective assessment of image quality in the high-fidelity range. The second one focused on the robustness and accuracy of Mean Opinion Scores (MOSs) with hard and soft outlier detection and proposed two new outlier detection methods with low complexity and excellent worst-case performance.
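As background to the second talk, a classical hard outlier screen for MOS computation can be sketched as follows (a generic z-score approach on invented ratings, not the new low-complexity methods proposed in the presentation):

```python
import statistics

def mos_with_outlier_screen(ratings_per_subject, z_thresh=2.0):
    """Compute per-stimulus MOS after discarding subjects whose mean
    rating deviates from the panel mean by more than z_thresh stdevs."""
    subject_means = [statistics.mean(r) for r in ratings_per_subject]
    panel_mean = statistics.mean(subject_means)
    panel_std = statistics.stdev(subject_means)
    kept = [r for r, m in zip(ratings_per_subject, subject_means)
            if panel_std == 0 or abs(m - panel_mean) / panel_std <= z_thresh]
    n_stimuli = len(kept[0])
    return [statistics.mean(r[i] for r in kept) for i in range(n_stimuli)]

# Five subjects rate two stimuli on a 1-5 scale; the last subject is an outlier.
ratings = [[4, 5], [5, 5], [4, 4], [5, 4], [1, 1]]
print(mos_with_outlier_screen(ratings, z_thresh=1.5))  # [4.5, 4.5]
```

Soft outlier handling, as discussed in the talk, would instead down-weight suspicious subjects rather than discarding them outright.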
Mohsen Jenadeleh (University of Konstanz, Germany) and Jon Sneyers (Cloudinary, Belgium) presented their work on fine-grained High Dynamic Range (HDR) image quality assessment, introducing the AIC-HDR2025 dataset, which comprises 100 test images generated from five sources with different encoding configurations, and presenting the results of a subjective test with it. In addition, Mohsen also presented his research on subjective visual quality assessment for high-fidelity learning-based image compression, covering a comprehensive subjective assessment of JPEG AI-compressed images using the JPEG AIC-3 methodology, which quantifies differences in Just Noticeable Difference (JND) units.
Panagiotis Traganitis (Michigan State University, United States) presented a unified framework for learning from crowdsourced noisy labels, covering classical and modern methods for aggregating rankings while inferring annotator quality, as well as its application in ranking problems.
Joint Effort Group (JEG) – Hybrid
The JEG group addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training video quality models using full-reference metrics instead of subjective scores. The chair of this group, Enrico Masala (Politecnico di Torino, Italy), presented updates on the latest activities of the group, including the current results of the Implementer’s Guide for Video Quality Metrics (IGVQM) project. In addition to this, the following presentations were delivered:
Emerging Technologies Group (ETG)
The ETG group focuses on various aspects of multimedia that, although not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc. In this sense, the following topics were presented and discussed in the meeting:
Avinab Saha (UT Austin, United States) presented the dataset of perceived expression differences, FaceExpressions-70k, which contains 70,500 subjective expression comparisons rated by over 1,000 study participants obtained via crowdsourcing.
David Ronca (Meta Platforms Inc., United States) presented the Video Codec Acid Test (VCAT), which is a benchmarking tool for hardware and software decoders on Android devices.
Subjective and objective assessment of GenAI content (SOGAI)
The SOGAI group seeks to standardize both subjective testing methodologies and objective metrics for assessing the quality of GenAI-generated content. In this first meeting of the group since its foundation, the following topics were presented and discussed:
Ryan Lei and Qi Cai (Meta Platforms Inc., United States) presented their work on learning from subjective evaluation of Super Resolution (SR) in production use cases at scale, which included extensive benchmarking tests and subjective evaluations with external crowdsourcing vendors.
Ioannis Katsavounidis, Qi Cai, Elias Kokkinis, Shankar Regunathan (Meta Platforms Inc., United States) presented their work on learning from synergistic subjective/objective evaluation of auto dubbing in production use cases.
Kamil Koniuch (AGH University of Krakow, Poland) presented his research on a cognitive perspective on Absolute Category Rating (ACR) scale tests.
Patrick Le Callet (Nantes Universite, France) presented his work, in collaboration with researchers from SJTU (China) on perceptual quality assessment of AI-generated omnidirectional images, including the annotated dataset called AIGCOIQA2024.
Multimedia Experience and Human Factors (MEHF)
The MEHF group focuses on the human factors influencing audiovisual and multimedia experiences, facilitating a comprehensive understanding of how human factors impact the perceived quality of multimedia content. In this meeting, the following presentations were given:
Dawid Juszka (AGH University of Krakow, Poland) presented his study on the impact of valence and arousal of video content on subjective QoE assessment scores.
Tomasz Konaszyński (AGH University of Krakow, Poland) presented his research on human and contextual bias in QoE, addressing the impact of testers’ psychophysical condition, declared at the beginning of the research process.
5G Key Performance Indicators (5GKPI)
The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and the rest of the team presented a first draft of the VQEG white paper on QoE management in telecommunication networks, which shares insights and recommendations on actionable controls and performance metrics that Content Application Providers (CAPs) and Network Service Providers (NSPs) can use to infer, measure, and manage QoE.
In addition, Pablo Pérez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Kamil Koniuch (AGH University of Krakow, Poland) presented design guidelines and a proposal for a simple but practical QoE model for communication networks, with a focus on 5G/6G compatibility.
Quality Assessment for Health Applications (QAH)
The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. In this meeting, Lumi Xia (INSA Rennes, France) presented her research on task-based medical image quality assessment by numerical observer.
Other updates
Apart from this, Ajit Ninan (Meta Platforms Inc., United States) delivered a keynote on rethinking visual quality for perceptual display; a panel moderated by Narciso García (Universidad Politécnica de Madrid, Spain) with Christos Bampis (Netflix, United States), Denise Noyes (Meta Platforms Inc., United States), and Yilin Wang (Google, United States) addressed what remains to be done on optimizing video quality for adaptive streaming applications; and there was a co-located ITU-T Q19 interim meeting. In addition, although no progress was presented at this meeting, the groups on No Reference Metrics (NORM) and on Quality Assessment for Computer Vision Applications (QACoViA) remain active.
Finally, as already announced in the VQEG website, the next VQEG plenary meeting will be online or hybrid online/in-person, probably in November or December 2025.
JPEG assesses responses to its Call for Proposals on Lossless Coding of Visual Events
The 107th JPEG meeting was held in Brussels, Belgium, from April 12 to 18, 2025. During this meeting, the JPEG Committee assessed the responses to its call for proposals on JPEG XE, an International Standard for lossless coding of visual events. JPEG XE is being developed under the auspices of three major standardisation organisations: ISO, IEC, and ITU. It will be the first codec developed by the JPEG committee targeting lossless representation and coding of visual events.
The JPEG Committee is also working on various standardisation projects, such as JPEG AI, which uses learning technology to achieve high compression, JPEG Trust, which sets standards to combat fake media and misinformation while rebuilding trust in multimedia, and JPEG DNA, which represents digital images using DNA sequences for long-term storage.
The following sections summarise the main highlights of the 107th JPEG meeting:
JPEG XE
JPEG AI
JPEG Trust
JPEG AIC
JPEG Pleno
JPEG DNA
JPEG XS
JPEG RF
JPEG XE
This initiative focuses on a new imaging modality produced by event-based visual sensors. This effort aims to establish a standard that efficiently represents and codes events, thereby enhancing interoperability in sensing, storage, and processing for machine vision and related applications.
As a response to the JPEG XE Final Call for Proposals on lossless coding of events, the JPEG Committee received five innovative proposals for consideration. Their evaluation indicated that two among them meet the stringent requirements of the constrained case, where resources, power, and complexity are severely limited. The remaining three proposals can cater to the unconstrained case. During the 107th JPEG meeting, the JPEG Committee launched a series of Core Experiments to define a path forward based on the received proposals as a starting point for the development of the JPEG XE standard.
To streamline the standardisation process, the JPEG Committee will proceed with the JPEG XE initiative in three distinct phases. Phase 1 will concentrate on lossless coding for the constrained case, while Phase 2 will address the unconstrained case. Both phases will commence simultaneously, although Phase 1 will follow a faster timeline to enable a timely publication of the first edition of the standard. The JPEG Committee recognises the urgent industry demand for a standardised solution for the constrained case, aiming to produce a Committee Draft by as early as July 2025. The third phase will focus on lossy compression of event sequences. The discussions and preparations will be initiated soon.
In a significant collaborative effort between ISO/IEC JTC 1/SC 29/WG1 and ITU-T SG21, the JPEG Committee will proceed to specify a joint JPEG XE standard. This partnership will ensure that JPEG XE becomes a shared standard under ISO, IEC, and ITU-T, reflecting their mutual commitment to developing standards for event-based systems.
Additionally, the JPEG Committee is actively discussing and exploring lossy coding of visual events, exploring future evaluation methods for such advanced technologies. Stakeholders interested in JPEG XE are encouraged to access public documents available at jpeg.org. Moreover, a joint Ad-hoc Group on event-based vision has been formed between ITU-T Q7/21 and ISO/IEC JTC1 SC29/WG1, paving the way for continued collaboration leading up to the 108th JPEG meeting.
JPEG AI
At the 107th JPEG meeting, JPEG AI discussions focused on conformance (JPEG AI Part 4), which has now advanced to the Draft International Standard (DIS) stage. The specification defines three conformance points: the decoded residual tensor, the decoded latent space tensor (also referred to as feature space), and the decoded image. Strict conformance for the residual tensor is evaluated immediately after entropy decoding, while soft conformance for the latent space tensor is assessed after tensor decoding. Decoded image conformance is measured after converting the image to the output picture format, but before any post-processing filters are applied. Regarding the decoded image, two types have been defined: conformance Type A, which implies low tolerance, and conformance Type B, which allows for moderate tolerance.
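The idea of tolerance-based conformance can be sketched as follows. This is a minimal illustration only: the thresholds and the error measures are hypothetical placeholders, not the actual JPEG AI Part 4 criteria, which define their own tolerances for Type A (tight) and Type B (moderate) conformance.

```python
import numpy as np


def check_conformance(decoded, reference, max_abs_err, max_mse):
    """Toy conformance check: decoded image must match the reference
    within a per-sample bound and a mean-squared-error bound.

    max_abs_err and max_mse are hypothetical thresholds; a tighter pair
    would correspond to a Type-A-like check, a looser pair to Type B.
    """
    err = decoded.astype(np.int64) - reference.astype(np.int64)
    return bool(np.abs(err).max() <= max_abs_err and (err ** 2).mean() <= max_mse)
```

In this framing, a learning-based decoder on different hardware need not reproduce the reference bit-exactly, only stay within the declared tolerance, which is what makes soft conformance points practical for neural codecs.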
During the 107th JPEG meeting, the results of several subjective quality assessment experiments were also presented and discussed, using different methodologies and test conditions, from low to very high qualities, including both SDR and HDR images. These evaluations have shown that JPEG AI is highly competitive and, in many cases, outperforms existing state-of-the-art codecs such as VVC Intra, AVIF, and JPEG XL. A demonstration of a JPEG AI encoder running on a Huawei Mate50 Pro smartphone with a Qualcomm Snapdragon 8+ Gen1 chipset was also presented. This implementation supports tiling, high (4K) resolutions, and the base profile with level 20. Finally, the implementation status of all mandatory and desirable JPEG AI requirements was discussed, assessing whether each requirement had been fully met, partially addressed, or remained unaddressed. This helped to clarify the current maturity of the standard and identify areas for further refinement.
JPEG Trust
Building on the publication of JPEG Trust (ISO/IEC 21617) Part 1 – Core Foundation in January 2025, the JPEG Committee approved a Draft International Standard (DIS) for a 2nd edition of Part 1 – Core Foundation during the 107th JPEG meeting. This Part 1 – Core Foundation 2nd edition incorporates the signalling of identity and intellectual property rights to address three particular challenges:
achieving transparency, through the signaling of content provenance
identifying content that has been generated either by humans, machines or AI systems, and
enabling interoperability, for example, by standardising machine-readable terms of use of intellectual property, especially AI-related rights reservations.
Additionally, the JPEG Committee is currently developing Part 2 – Trust Profiles Catalogue. Part 2 provides a catalogue of trust profile snippets that can be used either on their own or in combination for the purpose of constructing trust profiles, which can then be used for assessing the trustworthiness of media assets in given usage scenarios. The Trust Profiles Catalogue also defines a collection of conformance points, which enables interoperability across usage scenarios through the use of associated trust profiles.
The Committee continues to develop JPEG Trust Part 3 – Media asset watermarking to build out additional requirements for identified use cases, including the emerging need to identify AIGC content.
Finally, during the 107th meeting, the JPEG Committee initiated Part 4 – Reference software, which will provide reference implementations of JPEG Trust that implementers can refer to when developing trust solutions based on the JPEG Trust framework.
JPEG AIC
The JPEG AIC Part 3 standard (ISO/IEC CD 29170-3) has received the revised title “Information technology — JPEG AIC Assessment of image coding — Part 3: Subjective quality assessment of high-fidelity images”. At the 107th JPEG meeting, the results of the last Core Experiments for the standard and the comments on its Committee Draft were addressed. The draft text was thoroughly revised and clarified, and has now advanced to the Draft International Standard (DIS) stage.
Furthermore, Part 4 of JPEG AIC deals with objective quality metrics, also of high-fidelity images, and at the 107th JPEG meeting, the technical details regarding anchor metrics as well as the testing and evaluation of proposed methods were discussed and finalised. The results have been compiled in the document “Common Test Conditions on Objective Image Quality Assessment”, available on the JPEG website. Moreover, the corresponding Final Call for Proposals on Objective Image Quality Assessment (AIC-4) has been issued. Proposals are expected at the end of Summer 2025. The first Working Draft for Objective Image Quality Assessment (AIC-4) is planned for April 2026.
JPEG Pleno
The JPEG Pleno Light Field activity discussed the Disposition of Comments Report (DoCR) for the submitted Committee Draft (CD) of the 2nd edition of ISO/IEC 21794-2 (“Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding”). This 2nd edition integrates AMD1 of ISO/IEC 21794-2 (“Profiles and levels for JPEG Pleno Light Field Coding”) and includes the specification of a third coding mode entitled Slanted 4D Transform Mode and its associated profile. It is expected that at the 108th JPEG meeting this new edition will advance to the Draft International Standard (DIS) stage.
Software tools have been created and tested to be added as Common Test Condition Tools to a reference software implementation for the standardized technologies within the JPEG Pleno framework, including the JPEG Pleno Part 2 (ISO/IEC 21794-2).
In the framework of the ongoing standardisation effort on quality assessment methodologies for light fields, significant progress was achieved during the 107th JPEG meeting. The JPEG Committee finalised the Committee Draft (CD) of the forthcoming standard ISO/IEC 21794-7 entitled JPEG Pleno Quality Assessment – Light Fields, representing an important step toward the establishment of reliable tools for evaluating the perceptual quality of light fields. This CD incorporates recent refinements to the subjective light field assessment framework and integrates insights from the latest core experiments.
The Committee also approved the Final Call for Proposals (CfP) on Objective Metrics for JPEG Pleno Quality Assessment – Light Fields. This initiative invites proposals of novel objective metrics capable of accurately predicting perceived quality of compressed light field content. The detailed submission timeline and required proposal components are outlined in the released final CfP document. To support this process, updated versions of the Use Cases and Requirements (v6.0) and Common Test Conditions (v2.0) related to this CfP were reviewed and made available. Moreover, several task forces have been established to address key proposal elements, including dataset preparation, codec configuration, objective metric evaluation, and the subjective experiments.
At this meeting, ISO/IEC 21794-6 (“Plenoptic image coding system (JPEG Pleno) Part 6: Learning-based point cloud coding”) progressed to the balloting of the Final Draft International Standard (FDIS) stage. Balloting will end on the 12th of June 2025 with the publication of the International Standard expected for August 2025.
The JPEG Committee held a workshop on Future Challenges in Compression of Holograms for XR Applications on April 16th, covering major applications from holographic cameras to holographic displays. A second workshop, on Future Challenges in Compression of Holograms for Metrology Applications, is planned for July.
JPEG DNA
The JPEG Committee continues to develop JPEG DNA, an ambitious initiative to standardize the representation of digital images using DNA sequences for long-term storage. Following a Call for Proposals launched at its 99th JPEG meeting, a Verification Model was established during the 102nd JPEG meeting, then refined through core experiments that led to the first Working Draft at the 103rd JPEG meeting.
New JPEG DNA logo.
At its 105th JPEG meeting, JPEG DNA was officially approved as a new ISO/IEC project (ISO/IEC 25508), structured into four parts: Core Coding System, Profiles and Levels, Reference Software, and Conformance. The Committee Draft (CD) of Part 1 was produced at the 106th JPEG meeting.
During the 107th JPEG meeting, the JPEG Committee reviewed the comments received on the CD of JPEG DNA standard and prepared a Disposition of Comments Report (DoCR). The goal remains to reach International Standard (IS) status for Part 1 by April 2026.
On this occasion, the official JPEG DNA logo was also unveiled, marking a new milestone in the visibility and identity of the project.
JPEG XS
The development of the third edition of the JPEG XS standard is nearing its final stages, marking significant progress for the standardisation of high-performance video coding. Notably, Part 4, focusing on conformance testing, has been officially accepted by ISO and IEC for publication. Meanwhile, Part 5, which provides reference software, is presently at Draft International Standard (DIS) ballot stage.
In a move that underscores the commitment to accessibility and innovation in media technology, both Part 4 and Part 5 will be made publicly available as free standards. This decision is expected to facilitate widespread adoption and integration of JPEG XS in relevant industries and applications.
Looking to the future, the JPEG Committee is exploring enhancements to the JPEG XS standard, particularly in supporting a master-proxy stream feature. This feature enables a high-fidelity master video stream to be accompanied by a lower-resolution proxy stream, ensuring minimal overhead. Such functionalities are crucial in optimising broadcast and content production workflows.
JPEG RF
The JPEG RF activity issued the proceedings of the Joint JPEG/MPEG Workshop on Radiance Fields, which was held on the 31st of January and featured world-renowned speakers discussing Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) from the perspectives of academia, industry, and standardisation groups. Video recordings and all related material were made publicly available on the JPEG website. Moreover, an improved version of the JPEG RF State of the Art and Challenges document was proposed, including an updated review of coding techniques for radiance fields as well as newly identified use cases and requirements. The group also defined an exploration study to investigate protocols for subjective and objective quality assessment, which are considered crucial to advance this activity towards a coding standard for radiance fields.
Final Quote
“A cost-effective and interoperable event-based vision ecosystem requires an efficient coding standard. The JPEG Committee embraces this new challenge by initiating a new standardisation project to achieve this objective,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
The last plenary meeting of the Video Quality Experts Group (VQEG) was held online by the Institute for Telecommunication Sciences (ITS) of the National Telecommunications and Information Administration (NTIA) from November 18th to 22nd, 2024. The meeting was attended by 70 participants from industry and academic institutions in 17 different countries worldwide.
The meeting was dedicated to presenting updates on and discussing topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.
All the topics mentioned below can be of interest to the SIGMM community working on quality assessment, but special attention can be devoted to the creation of a new group focused on Subjective and objective assessment of GenAI content (SOGAI), and to the recent contribution of the Immersive Media Group (IMG) to the International Telecommunication Union (ITU) towards Rec. ITU-T P.IXC for the evaluation of Quality of Experience (QoE) of immersive interactive communication systems. Finally, it is worth noting that Ioannis Katsavounidis (Meta, US) joins Kjell Brunnström (RISE, Sweden) as co-chair of VQEG, replacing Margaret Pinson (NTIA/ITS).
Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to their corresponding reflectors to follow the activities going on and to get involved in them.
Group picture of the online meeting
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. In this meeting, Lucjan Janowski (AGH University of Krakow, Poland) and Margaret Pinson (NTIA/ITS) presented their proposal to fix the wording related to experiment realism and validity, based on experience from the psychology domain; it addresses the important question of how far results from a laboratory experiment can be generalized outside the laboratory.
In addition, given that there are no current joint activities of the group, the AVHD project will become dormant, with the possibility to be activated when new activities are planned.
Statistical Analysis Methods (SAM)
The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. In addition to a discussion on the future activities of the group, led by its chairs Ioannis Katsavounidis (Meta, US), Zhi Li (Netflix, US), and Lucjan Janowski (AGH University of Krakow, Poland), the following presentations were delivered during the meeting:
Dietmar Saupe (University of Konstanz, Germany) delivered two presentations. The first one focused on maximum entropy and quantized metric models for absolute category ratings, based on the investigation of families of multinomial probability distributions parameterized by mean and variance that are used to fit the empirical rating distributions. To validate the proposed models, a comparison of the performance of these models and the state-of-the-art (given by the generalized score distribution) was done on two large datasets (KonIQ-10k and VQEG HDTV). The second presentation proposed a fine-grained subjective visual quality assessment method for high-fidelity compressed images, which is based on the current activities of the JPEG standardization project Advanced Image Coding (AIC). In addition to the assessment method, a dataset of high-quality compressed images and their corresponding crowdsourced visual quality ratings was presented.
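The family of rating-distribution models mentioned above can be illustrated with one simple member: a Gaussian quantized onto the five ACR categories, parameterized by mean and standard deviation. This sketch is an illustration of the general idea only, not the specific maximum-entropy or generalized-score-distribution models from the presentation; the brute-force grid fit is a deliberately simple stand-in for proper maximum-likelihood estimation.

```python
import math


def quantized_gaussian(mu, sigma, k=5):
    """Probability mass on ACR categories 1..k obtained by quantizing
    a Gaussian N(mu, sigma^2) at half-integer bin edges."""
    def cdf(x):
        if x == math.inf:
            return 1.0
        if x == -math.inf:
            return 0.0
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

    edges = [-math.inf] + [i + 0.5 for i in range(1, k)] + [math.inf]
    return [cdf(edges[i + 1]) - cdf(edges[i]) for i in range(k)]


def fit(counts):
    """Fit (mu, sigma) to an empirical ACR histogram by grid search,
    minimizing squared error between model and empirical probabilities."""
    n = sum(counts)
    emp = [c / n for c in counts]
    best = None
    for mu10 in range(10, 51):        # mu in 1.0 .. 5.0
        for s10 in range(2, 31):      # sigma in 0.2 .. 3.0
            model = quantized_gaussian(mu10 / 10, s10 / 10)
            err = sum((m - e) ** 2 for m, e in zip(model, emp))
            if best is None or err < best[0]:
                best = (err, mu10 / 10, s10 / 10)
    return best[1], best[2]
```

Fitting such a two-parameter model to each stimulus is what allows comparing model families on large datasets like KonIQ-10k by how well they reproduce the full empirical rating distributions, not just the mean opinion score.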
Emerging Technologies Group (ETG)
The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc. During this meeting, Abhijay Ghildyal (Portland State University, US), Saman Zadtootaghaj (Sony Interactive Entertainment, Germany), and Nabajeet Barman (Sony Interactive Entertainment, UK) presented their work on quality assessment of AI generated content and AI enhanced content. In addition, Matthias Wien (RWTH Aachen University, Germany) presented the approach, design and methodology for the evaluation of AI-based Point Cloud Compression in the corresponding Call for Proposals in MPEG. Finally, Abhijay Ghildyal (Portland State University, US) presented his work on how foundation models boost low-level perceptual similarity metrics, investigating the potential of using intermediate features or activations from these models for low-level image quality assessment, and showing that such metrics can outperform existing ones without requiring additional training.
Joint Effort Group (JEG) – Hybrid
The JEG group addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. In addition, the group includes the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM). The chair of this group, Enrico Masala (Politecnico di Torino, Italy), presented updates on the latest activities, including plans for experiments within the IGVQM project to get feedback from other VQEG members.
Immersive Media Group (IMG)
The IMG group researches the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Marta Orduna (Nokia XR Lab, Spain), and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented the status of Rec. ITU-T P.IXC, which the group has been drafting based on the joint test plan developed over the last months and which was submitted to ITU and discussed at its meeting in January 2025.
Also, in relation with this test plan, Lucjan Janowski (AGH University of Krakow, Poland) and Margaret Pinson (NTIA/ITS) presented an overview of ITU recommendations for interactive experiments that can be used in the IMG context.
In relation with other topics addressed by IMG, Emin Zerman (Mid Sweden University, Sweden) delivered two presentations. The first one presented the BASICS dataset, which contains a representative range of nearly 1500 point clouds assessed by thousands of participants to enable robust quality assessments for 3D scenes. The approach involved a careful selection of diverse source scenes and the application of specific “distortions” to simulate real-world compression impacts, including traditional and learning-based methods. The second presentation described a spherical light field database (SLFDB) for immersive telecommunication and telepresence applications, which comprises 60-view omnidirectional captures across 20 scenes, providing a comprehensive basis for telepresence research.
Quality Assessment for Computer Vision Applications (QACoViA)
The QACoViA group studies the visual quality requirements for computer vision methods, where the final user is an algorithm. In this meeting, Mehr un Nisa (AGH University of Krakow, Poland) presented a comparative performance analysis of deep learning architectures in underwater image classification. In particular, the study assessed the performance of the VGG-16, EfficientNetB0, and SimCLR models in classifying 5,000 underwater images. The results reveal each model’s strengths and weaknesses, providing insights for future improvements in underwater image analysis.
5G Key Performance Indicators (5GKPI)
The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Francois Blouin (Meta, US), and others presented the progress on the 5G-KPI White Paper, sharing some of the ideas on QoS-to-QoE modeling that the group has been working on to get feedback from other VQEG members.
Multimedia Experience and Human Factors (MEHF)
The MEHF group focuses on the human factors influencing audiovisual and multimedia experiences, facilitating a comprehensive understanding of how human factors impact the perceived quality of multimedia content. In this meeting, Dominika Wanat (AGH University of Krakow, Poland) presented MANIANA (Mobile Appliance for Network Interrupting, Analysis & Notorious Annoyance), an IoT device for testing the QoS and QoE of applications under home network conditions. It is built on a Raspberry Pi 4 minicomputer and open-source solutions, and enables safe, robust, and universal testing of applications.
Other updates
Apart from this, it is worth noting that, although no progress was presented at this meeting, the Quality Assessment for Health Applications (QAH) group is still active and focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches.
In addition, the Computer Generated Imagery (CGI) project became dormant, since its recent activities can be covered by other existing groups such as ETG and SOGAI.
Also, in this meeting Margaret Pinson (NTIA/ITS) stepped down as co-chair of VQEG, and Ioannis Katsavounidis (Meta, US) is the new co-chair together with Kjell Brunnström (RISE, Sweden).
The 106th JPEG meeting was held online from January 6 to 10, 2025. During this meeting, the first image coding standard based on machine learning technology, JPEG AI, was sent for publication as an International Standard. This is a major achievement, as it aligns JPEG with major trends in imaging technologies and provides an efficient standardized solution for image coding, with nearly 30% improvement over the most advanced state-of-the-art solutions. JPEG AI has been developed under the auspices of three major standardization organizations: ISO, IEC and ITU.
The following sections summarize the main highlights of the 106th JPEG meeting.
JPEG AI – the first International Standard for end-to-end learning-based image coding
JPEG Trust – a framework for establishing trust in digital media
JPEG XE – lossless coding of event-based vision
JPEG AIC – assessment of the visual quality of high-fidelity images
JPEG Pleno – standard framework for representing plenoptic data
At its 106th meeting, the JPEG Committee approved publication of the text of JPEG AI, the first International Standard for end-to-end learning-based image coding. This achievement marks a significant milestone in the field of digital imaging and compression, offering a new approach for efficient, high-quality image storage and transmission.
The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed-domain representation. It targets human visualization, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, as well as effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline.
The JPEG AI standard leverages deep learning algorithms that learn from vast amounts of image data the best way to compress images, allowing it to adapt to a wide range of content and offering enhanced perceptual visual quality and faster compression capabilities. The key benefits of JPEG AI are:
Superior compression efficiency: JPEG AI offers higher compression efficiency, leading to reduced storage requirements and faster transmission times compared to other state-of-the-art image coding solutions.
Implementation-friendly encoding and decoding: The JPEG AI codec supports a wide array of devices with different characteristics, including mobile platforms, through optimized encoding and decoding processes.
Compressed-domain image processing and computer vision tasks: JPEG AI’s architecture enables multi-purpose optimization for both human visualization and machine-driven tasks.
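The dual-use idea behind the last benefit can be sketched in miniature: one compact latent stream feeds both a human-facing reconstruction path and a machine-analysis path that never reconstructs pixels. This toy uses block averaging as a stand-in for the learned latent tensor; it is a conceptual illustration only, not the JPEG AI network or bitstream.

```python
import numpy as np


def encode(img, factor=4):
    """Toy 'latent': block-averaged image standing in for the learned
    latent tensor produced by a JPEG-AI-style analysis transform."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def decode_for_human(latent, factor=4):
    """Human path: reconstruct a viewable image from the single latent stream
    (here by nearest-neighbour upsampling)."""
    return np.kron(latent, np.ones((factor, factor)))


def features_for_machine(latent):
    """Machine path: derive analysis features directly from the same latent,
    without reconstructing pixels first."""
    return np.array([latent.mean(), latent.std()])
```

The design point this illustrates is that a vision task can consume the compressed-domain tensor directly, skipping full decoding, which is one reason a single-stream representation serving both humans and machines is attractive.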
By creating the JPEG AI International Standard, the JPEG Committee has opened the door to more efficient and versatile image compression solutions that will benefit industries ranging from digital media and telecommunications to cloud storage and visual surveillance. This standard provides a framework for image compression in the face of rapidly growing visual data demands, enabling more efficient storage, faster transmission, and higher-quality visual experiences.
As JPEG AI establishes itself as the new benchmark in image compression, its potential to reshape the future of digital imaging is undeniable, promising groundbreaking advancements in efficiency and versatility.
JPEG Trust
The first part of JPEG Trust, the “Core Foundation” (ISO/IEC 21617-1) was approved for publication in late 2024 and is in the process of being published as an International Standard by ISO. The JPEG Trust standard provides a proactive approach to trust management by defining a framework for establishing trust in digital media. The Core Foundation specifies three main pillars: annotating provenance, extracting and evaluating Trust Indicators, and handling privacy and security concerns.
At the 106th JPEG Meeting, the JPEG Committee produced a Committee Draft (CD) for a 2nd edition of the Core Foundation. The 2nd edition further extends and improves the standard with new functionalities, including important specifications for Intellectual Property Rights (IPR) management such as authorship and rights declarations. In addition, this new edition will align the specification with the upcoming ISO 22144 standard, which is a standard for Content Credentials based on the C2PA 2.1 specification.
In parallel with the work on the 2nd edition of the Core Foundation (Part 1), the JPEG Committee continues to work on Part 2 and Part 3, “Trust Profiles Catalogue” and “Media Asset Watermarking”, respectively.
JPEG XE
The JPEG XE initiative is currently awaiting the conclusion of the open Final Call for Proposals on lossless coding of events, which will close on March 31, 2025. This initiative focuses on a new and emerging image modality introduced by event-based visual sensors. JPEG aims to establish a standard that efficiently represents events, facilitating interoperability in sensing, storage, and processing for machine vision and other relevant applications.
To ensure the success of this emerging standard, the JPEG Committee has reached out to other standardization organizations. The JPEG Committee, already a collaborative group under ISO/IEC and ITU-T, is engaged in discussions with ITU-T’s SG21 to develop JPEG XE as a joint standard. This collaboration aligns perfectly with the objectives of both organizations, as SG21 is also dedicated to creating standards around event-based systems.
Additionally, the JPEG Committee continues its discussions and research on lossy coding of events, focusing on future evaluation methods for these technologies. Those interested in the JPEG XE initiative are encouraged to review the public documents available at jpeg.org. Furthermore, the Ad-hoc Group on event-based vision has been re-established to advance work leading up to the 107th JPEG meeting in Brussels. To stay informed about this activity, please join the event-based vision Ad-hoc Group mailing list.
JPEG AIC
Part 3 of JPEG AIC (AIC-3) defines a methodology for subjective assessment of the visual quality of high-fidelity images, and the forthcoming Part 4 of JPEG AIC deals with objective quality metrics, also of high-fidelity images. In this JPEG meeting, the document on Use Cases and Requirements that refers to both AIC-3 and AIC-4, was revised. It defines the scope of both anticipated standards and sets it into relation to the previous specifications for AIC-1 and AIC-2. While AIC-1 covers a broad quality range including low quality, it does not allow fine-grained quality assessment in the high-fidelity range. AIC-2 entails methods that determine a threshold separating visually lossless coded images from lossy ones. The quality range addressed by AIC-3 and AIC-4 is an interval that contains the AIC-2 threshold, reaching from high quality up to the numerically lossless case. The JPEG Committee is preparing the DIS text for AIC-3 and has launched the Second Draft Call for Proposals on Objective Image Quality Assessment (AIC-4) which includes the timeline for this JPEG activity. Proposals are expected at the end of Summer 2025. The first Working Draft for Objective Image Quality Assessment (AIC-4) is planned for April 2026.
JPEG Pleno
The 106th meeting marked a major milestone for the JPEG Pleno Point Cloud activity with the release of the Final Draft International Standard (FDIS) for ISO/IEC DIS 21794-6:2024 Information technology — Plenoptic image coding system (JPEG Pleno) — Part 6: Learning-based point cloud coding. Point cloud data supports a wide range of applications, including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research, and advanced sensing and analysis. The JPEG Committee considers this learning-based standard to be a powerful and efficient solution for point cloud coding. The standard is applicable to interactive human visualization, with competitive compression efficiency compared to state-of-the-art point cloud coding solutions in common use, and effective performance for 3D processing and machine-related computer vision tasks, with the goal of supporting a royalty-free baseline. It specifies a codestream format for storage of point clouds, provides information on the coding tools, and defines extensions to the JPEG Pleno File Format and associated metadata descriptors that are specific to point cloud modalities. With the release of the FDIS at the 106th JPEG meeting, the International Standard is expected to be published in July 2025.
The JPEG Pleno Light Field activity discussed the Committee Draft (CD) of the 2nd edition of ISO/IEC 21794-2 (“Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding”) that integrates AMD1 of ISO/IEC 21794-2 (“Profiles and levels for JPEG Pleno Light Field Coding”) and includes the specification of a third coding mode entitled Slanted 4D Transform Mode and its associated profile.
A White Paper on JPEG Pleno Light Field Coding has been released, providing the architecture of the current two JPEG Pleno Part-2 coding modes, as well as the coding architecture of its third coding mode, to be included in the 2nd edition of the standard. The White Paper also presents applications and use cases and briefly describes the JPEG Pleno Model (JPLM). The JPLM provides a reference implementation for the standardized technologies within the JPEG Pleno framework, including the JPEG Pleno Part 2 (ISO/IEC 21794-2). Improvements to JPLM have been implemented and tested, including a user-friendly interface that relies on well-documented JSON configuration files.
During the JPEG meeting week, significant progress was made in the JPEG Pleno Quality Assessment activity, which focuses on developing methodologies for subjective and objective quality assessment of plenoptic modalities. A Working Draft on subjective quality assessment, incorporating insights from extensive experiments conducted by JPEG experts, was discussed.
JPEG Systems
The reference software of JPEG Systems (ISO/IEC 19566-10) is now published as an International Standard and is available as open source on the JPEG website. This first edition implements the JPEG Universal Metadata Box Format (ISO/IEC 19566-5) and provides a reference dataset. An extended version of the reference software with support for additional Parts of JPEG Systems is currently under development. This new edition will add support for JPEG Privacy and Security, JPEG 360, JLINK, and JPEG Snack.
At its 106th meeting, the JPEG Committee also initiated a 3rd edition of the JPEG Universal Metadata Box Format (ISO/IEC 19566-5). This new edition will integrate the latest amendment that allows JUMBF boxes to exist as stand-alone files and adds support for payload compression. In addition, the 3rd edition will add a JUMBF validator and a scheme for JUMBF box retainment while transcoding from one JPEG format to another.
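JUMBF boxes follow the familiar ISO box layout: a 4-byte big-endian length, a 4-byte type code, and, when the length field equals 1, an 8-byte extended length. The parser below is a minimal sketch of that generic header layout, assuming a well-formed buffer; it does not implement the JUMBF payload structure or the new stand-alone-file and compression features of the 3rd edition.

```python
import struct


def read_box_header(buf, offset=0):
    """Parse an ISO-style box header (LBox, TBox, optional XLBox),
    the layout JUMBF superboxes use.

    Returns (box_type, total_box_length, header_size).
    """
    length, = struct.unpack_from(">I", buf, offset)   # LBox: 32-bit big-endian
    box_type = buf[offset + 4:offset + 8].decode("ascii")  # TBox: 4-char code
    header = 8
    if length == 1:  # XLBox: a 64-bit extended length follows the type
        length, = struct.unpack_from(">Q", buf, offset + 8)
        header = 16
    return box_type, length, header
```

Walking a file is then a matter of reading a header, skipping `length` bytes, and repeating, which is what makes box-structured formats easy to extend without breaking old readers.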
JPEG DNA
JPEG DNA is an initiative aimed at developing a standard capable of representing bi-level, continuous-tone grayscale, continuous-tone color, or multichannel digital samples in a format using nucleotide sequences to support DNA storage. The JPEG DNA Verification Model (VM) was created during the 102nd JPEG meeting based on performance assessments and descriptive analyses of the submitted solutions to a Call for Proposals, issued at the 99th JPEG meeting. Since then, several core experiments have been continuously conducted to validate and enhance this Verification Model. Such efforts led to the creation of the first Working Draft of JPEG DNA during the 103rd JPEG meeting. At the 105th JPEG meeting, the JPEG Committee officially introduced a New Work Item Proposal (NWIP) for JPEG DNA, elevating it to an officially sanctioned ISO/IEC Project. The proposal defined JPEG DNA as a multi-part standard: Part 1: Core Coding System, Part 2: Profiles and Levels, Part 3: Reference Software, Part 4: Conformance.
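The core idea of representing digital samples as nucleotide sequences can be illustrated with a naive mapping of two bits per base. This is only a sketch of the principle: the actual JPEG DNA codec uses constrained codes that also address biochemical constraints such as homopolymer runs and GC balance, and none of the names below come from the standard.

```python
# Illustrative sketch only: a fixed 2-bits-per-nucleotide mapping.
# The real JPEG DNA coding tools are more sophisticated (constrained
# codes avoiding long homopolymer runs and unbalanced GC content).

BASES = "ACGT"  # index 0..3 <-> 2-bit value

def bytes_to_nucleotides(data: bytes) -> str:
    """Map each byte to four nucleotides (2 bits per base, MSB first)."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(b >> shift) & 0b11])
    return "".join(out)

def nucleotides_to_bytes(seq: str) -> bytes:
    """Inverse mapping: four bases back to one byte."""
    vals = [BASES.index(c) for c in seq]
    out = bytearray()
    for i in range(0, len(vals), 4):
        b = 0
        for v in vals[i : i + 4]:
            b = (b << 2) | v
        out.append(b)
    return bytes(out)
```

For example, the byte `0x1b` (binary `00 01 10 11`) maps to the sequence `ACGT`, and the mapping round-trips losslessly.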
The JPEG Committee is targeting the International Standard (IS) stage for Part 1 by April 2026.
At its 106th meeting, the JPEG Committee made significant progress toward achieving this goal. Efforts were focused on producing the Committee Draft (CD) for Part 1, a crucial milestone in the standardization process. Additionally, JPEG DNA Part 1 has now been assigned the Project identification ISO/IEC 25508-01.
JPEG XS
The JPEG XS activity focused primarily on finalizing the third editions of JPEG XS Part 4 (Conformance testing) and Part 5 (Reference software). Recall that the 3rd editions of Parts 1, 2, and 3 are published and available for purchase. Part 4 is now at the FDIS stage and is expected to be approved as an International Standard around April 2025. For Part 5, work on the reference software was completed to implement TDC profile encoding functionality, making it feature complete and fully compliant with the 3rd edition of JPEG XS. As such, Part 5 is ready to be balloted as a DIS. However, work on the reference software will continue to bring further improvements. The reference software and Part 5 will become publicly and freely available, similar to Part 4.
JPEG XL
The second edition of Part 3 (conformance testing) of JPEG XL proceeded to publication as International Standard. Regarding Part 2 (file format), a third edition has been prepared, and it reached the DIS stage. The new edition will include support for embedding gain maps in JPEG XL files.
JPEG 2000
The JPEG Committee has begun work on adding support for the HTTP/3 transport to the JPIP protocol, which allows the interactive browsing of JPEG 2000 images over networks. HTTP/3 is the third major version of the Hypertext Transfer Protocol (HTTP) and allows for significantly lower latency operations compared to earlier versions. A Committee Draft ballot of the 3rd edition of the JPIP specifications (Rec. ITU-T T.808 | ISO/IEC 15444-9) is expected to start shortly, with the project completed sometime in 2026.
Separately, the 3rd edition of Rec. ITU-T T.815 | ISO/IEC 15444-16, which specifies the carriage of JPEG 2000 imagery in the ISOBMFF and HEIF file formats, has been approved for publication. This new edition adds support for more flexible color signaling and JPEG 2000 video tracks.
JPEG RF
At this meeting, the JPEG RF exploration issued “JPEG Radiance Fields State of the Art and Challenges”, a public document that describes the latest developments in Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) technologies and defines a scope for the activity, focusing on the creation of a coding standard. The JPEG Committee is also organizing a workshop on Radiance Fields jointly with MPEG, which will take place on January 31st and will feature key experts presenting various aspects of this exciting emerging technology.
Final Quote
“The newly approved JPEG AI, developed under the auspices of ISO, IEC and ITU, is the first image coding standard based on machine learning and is a breakthrough in image coding providing 30% compression gains over the most advanced solutions in state-of-the-art,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
The 150th MPEG meeting was held online from 31 March to 04 April 2025. The official press release can be found here. This column provides the following highlights:
Requirements: MPEG-AI strategy and white paper on MPEG technologies for metaverse
JVET: Draft Joint Call for Evidence on video compression with capability beyond Versatile Video Coding (VVC)
Video: Gaussian splat coding and video coding for machines
Audio: Audio coding for machines
3DGH: 3D Gaussian splat coding
MPEG-AI Strategy
The MPEG-AI strategy envisions a future where AI and neural networks are deeply integrated into multimedia coding and processing, enabling transformative improvements in how digital content is created, compressed, analyzed, and delivered. By positioning AI at the core of multimedia systems, MPEG-AI seeks to enhance both content representation and intelligent analysis. This approach supports applications ranging from adaptive streaming and immersive media to machine-centric use cases like autonomous vehicles and smart cities. AI is employed to optimize coding efficiency, generate intelligent descriptors, and facilitate seamless interaction between content and AI systems. The strategy builds on foundational standards such as ISO/IEC 15938-13 (CDVS), 15938-15 (CDVA), and 15938-17 (Neural Network Coding), which collectively laid the groundwork for integrating AI into multimedia frameworks.
Currently, MPEG is developing a family of standards under the ISO/IEC 23888 series that includes a vision document, machine-oriented video coding, and encoder optimization for AI analysis. Future work focuses on feature coding for machines and AI-based point cloud compression to support high-efficiency 3D and visual data handling. These efforts reflect a paradigm shift from human-centric media consumption to systems that also serve intelligent machine agents. MPEG-AI maintains compatibility with traditional media processing while enabling scalable, secure, and privacy-conscious AI deployments. Through this initiative, MPEG aims to define the future of multimedia as an intelligent, adaptable ecosystem capable of supporting complex, real-time, and immersive digital experiences.
MPEG White Paper on Metaverse Technologies
The MPEG white paper on metaverse technologies (cf. MPEG white papers) outlines the pivotal role of MPEG standards in enabling immersive, interoperable, and high-quality virtual experiences that define the emerging metaverse. It identifies core metaverse parameters – real-time operation, 3D experience, interactivity, persistence, and social engagement – and maps them to MPEG’s longstanding and evolving technical contributions. From early efforts like MPEG-4’s Binary Format for Scenes (BIFS) and Animation Framework eXtension (AFX) to MPEG-V’s sensory integration, and the advanced MPEG-I suite, these standards underpin critical features such as scene representation, dynamic 3D asset compression, immersive audio, avatar animation, and real-time streaming. Key technologies like point cloud compression (V-PCC, G-PCC), immersive video (MIV), and dynamic mesh coding (V-DMC) demonstrate MPEG’s capacity to support realistic, responsive, and adaptive virtual environments. Recent efforts include neural network compression for learned scene representations (e.g., NeRFs), haptic coding formats, and scene description enhancements, all geared toward richer user engagement and broader device interoperability.
The document highlights five major metaverse use cases – virtual environments, immersive entertainment, virtual commerce, remote collaboration, and digital twins – all supported by MPEG innovations. It emphasizes the foundational role of MPEG-I standards (e.g., Parts 12, 14, 29, 39) for synchronizing immersive content, representing avatars, and orchestrating complex 3D scenes across platforms. Future challenges identified include ensuring interoperability across systems, advancing compression methods for AI-assisted scenarios, and embedding security and privacy protections. With decades of multimedia expertise and a future-focused standards roadmap, MPEG positions itself as a key enabler of the metaverse – ensuring that emerging virtual ecosystems are scalable, immersive, and universally accessible.
The MPEG white paper on metaverse technologies highlights several research opportunities, including efficient compression of dynamic 3D content (e.g., point clouds, meshes, neural representations), synchronization of immersive audio and haptics, real-time adaptive streaming, and scene orchestration. It also points to challenges in standardizing interoperable avatar formats, AI-enhanced media representation, and ensuring seamless user experiences across devices. Additional research directions include neural network compression, cross-platform media rendering, and developing perceptual metrics for immersive Quality of Experience (QoE).
Draft Joint Call for Evidence (CfE) on Video Compression beyond Versatile Video Coding (VVC)
The latest JVET AHG report on ECM software development (AHG6), documented as JVET-AL0006, shows promising results. Specifically, in the “Overall” row and “Y” column, there is a 27.06% improvement in coding efficiency compared to VVC, as shown in the figure below.
The Draft Joint Call for Evidence (CfE) on video compression beyond VVC (Versatile Video Coding), identified as document JVET-AL2026 | N 355, is being developed to explore new advancements in video compression. The CfE seeks evidence in three main areas: (a) improved compression efficiency and associated trade-offs, (b) encoding under runtime constraints, and (c) enhanced performance in additional functionalities. This initiative aims to evaluate whether new techniques can significantly outperform the current state-of-the-art VVC standard in both compression and practical deployment aspects.
The visual testing will be carried out across seven categories, including various combinations of resolution, dynamic range, and use cases: SDR Random Access UHD/4K, SDR Random Access HD, SDR Low Bitrate HD, HDR Random Access 4K, HDR Random Access Cropped 8K, Gaming Low Bitrate HD, and UGC (User-Generated Content) Random Access HD. Sequences and rate points for testing have already been defined and agreed upon. For a fair comparison, rate-matched anchors using VTM (VVC Test Model) and ECM (Enhanced Compression Model) will be generated, with new configurations to enable reduced run-time evaluations. A dry-run of the visual tests is planned during the upcoming Daejeon meeting, with ECM and VTM as reference anchors, and the CfE welcomes additional submissions. Following this dry-run, the final Call for Evidence is expected to be issued in July, with responses due in October.
The Draft Joint Call for Evidence (CfE) on video compression beyond VVC invites research into next-generation video coding techniques that offer improved compression efficiency, reduced encoding complexity under runtime constraints, and enhanced functionalities such as scalability or perceptual quality. Key research aspects include optimizing the trade-off between bitrate and visual fidelity, developing fast encoding methods suitable for constrained devices, and advancing performance in emerging use cases like HDR, 8K, gaming, and user-generated content.
3D Gaussian Splat Coding
Gaussian splatting is a real-time radiance field rendering method that represents a scene with 3D Gaussians. Each Gaussian has parameters such as position, scale, color, opacity, and orientation, and together they approximate how light interacts with surfaces in a scene. Instead of ray marching (as in NeRF), it renders images by splatting the Gaussians onto a 2D image plane and blending them in a rasterization pipeline, which is GPU-friendly and much faster. Developed by Kerbl et al. (2023), it is capable of real-time rendering (60+ fps) and outperforms previous NeRF-based methods in speed and visual quality. Gaussian splat coding refers to the compression and streaming of 3D Gaussian representations for efficient storage and transmission. It is an active research area and is under standardization consideration in MPEG.
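The projection-and-blending idea can be sketched in a few lines. This is a deliberately simplified illustration, not the Kerbl et al. pipeline: it uses isotropic 2D footprints (ignoring per-Gaussian rotation and anisotropic covariance), a pinhole projection, and back-to-front “over” compositing; all names and parameters are invented for illustration.

```python
import numpy as np

# Minimal sketch of the splatting idea (NOT the full 3DGS rasterizer):
# isotropic 2D Gaussians, pinhole projection, back-to-front alpha blending.

def render_splats(positions, colors, opacities, scales,
                  focal=100.0, size=(64, 64)):
    h, w = size
    img = np.zeros((h, w, 3))
    order = np.argsort(-positions[:, 2])   # farthest first (painter's order)
    ys, xs = np.mgrid[0:h, 0:w]
    for i in order:
        x, y, z = positions[i]
        if z <= 0:
            continue  # behind the camera
        # Pinhole projection of the center; footprint shrinks with depth.
        u = focal * x / z + w / 2
        v = focal * y / z + h / 2
        s = focal * scales[i] / z
        # Evaluate an isotropic 2D Gaussian footprint on the pixel grid.
        g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2 * s ** 2))
        a = np.clip(opacities[i] * g, 0.0, 1.0)
        # "Over" compositing, back to front.
        img = img * (1 - a[..., None]) + colors[i] * a[..., None]
    return img
```

A single opaque red splat placed in front of the camera center produces a red blob centered in the image; the real rasterizer additionally sorts per tile and uses anisotropic covariances derived from each Gaussian's scale and rotation.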
The MPEG technical requirements working group, together with the MPEG video working group, started an exploration on Gaussian splat coding, while the MPEG coding of 3D graphics and haptics (3DGH) working group addresses 3D Gaussian splat coding. Draft Gaussian splat coding use cases and requirements are available, and various joint exploration experiments (JEEs) are conducted between meetings.
(3D) Gaussian splat coding is actively researched in academia, also in the context of streaming, e.g., in “LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming” or “LTS: A DASH Streaming System for Dynamic Multi-Layer 3D Gaussian Splatting Scenes”. The research aspects of 3D Gaussian splat coding and streaming span a wide range of areas across computer graphics, compression, machine learning, and systems for real-time immersive media, in particular the efficient representation and transmission of Gaussian-based scene representations for real-time rendering. Key areas include compression of Gaussian parameters (position, scale, color, opacity), perceptual and geometry-aware optimizations, and neural compression techniques such as learned latent coding. Streaming challenges involve adaptive, view-dependent delivery, level-of-detail management, and low-latency rendering on edge or mobile devices. Additional research directions include standardizing file formats, integrating with scene graphs, and ensuring interoperability with existing 3D and immersive media frameworks.
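The adaptive-delivery aspect can be sketched as a DASH-style rate decision over progressive splat layers, in the spirit of the layered approaches cited above. The function and all bitrate figures below are hypothetical and not taken from either paper or any MPEG draft.

```python
# Hypothetical sketch of layer selection for layered progressive 3DGS
# streaming: greedily add enhancement layers while the cumulative
# bitrate fits the measured bandwidth. The base layer is always taken.

def select_layers(layer_bitrates_kbps, bandwidth_kbps):
    """Return the indices of the layers to fetch and their total bitrate."""
    chosen, total = [], 0.0
    for i, rate in enumerate(layer_bitrates_kbps):
        if i == 0 or total + rate <= bandwidth_kbps:
            chosen.append(i)   # base layer, or an enhancement that still fits
            total += rate
        else:
            break              # layers are progressive: stop at first miss
    return chosen, total
```

With layers of 500, 300, 400, and 800 kbps and 1000 kbps of bandwidth, this picks the base layer plus the first enhancement (800 kbps total); a real client would additionally weigh view-dependence and per-layer quality contribution rather than bitrate alone.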
MPEG Audio and Video Coding for Machines
The Call for Proposals on Audio Coding for Machines (ACoM), issued by the MPEG audio coding working group, aims to develop a standard for efficiently compressing audio, multi-dimensional signals (e.g., medical data), or extracted features for use in machine-driven applications. The standard targets use cases such as connected vehicles, audio surveillance, diagnostics, health monitoring, and smart cities, where vast data streams must be transmitted, stored, and processed with low latency and high fidelity. The ACoM system is designed in two phases: the first focusing on near-lossless compression of audio and metadata to facilitate training of machine learning models, and the second expanding to lossy compression of features optimized for specific applications. The goal is to support hybrid consumption – by machines and, where needed, humans – while ensuring interoperability, low delay, and efficient use of storage and bandwidth.
The CfP outlines technical requirements, submission guidelines, and evaluation metrics. Participants must provide decoders compatible with Linux/x86 systems, demonstrate performance through objective metrics like compression ratio, encoder/decoder runtime, and memory usage, and undergo a mandatory cross-checking process. Selected proposals will contribute to a reference model and working draft of the standard. Proponents must register by August 1, 2025, with submissions due in September, and evaluation taking place in October. The selection process emphasizes lossless reproduction, metadata fidelity, and significant improvements over a baseline codec, with a path to merge top-performing technologies into a unified solution for standardization.
Research aspects of Audio Coding for Machines (ACoM) include developing efficient compression techniques for audio and multi-dimensional data that preserve key features for machine learning tasks, optimizing encoding for low-latency and resource-constrained environments, and designing hybrid formats suitable for both machine and human consumption. Additional research areas involve creating interoperable feature representations, enhancing metadata handling for context-aware processing, evaluating trade-offs between lossless and lossy compression, and integrating machine-optimized codecs into real-world applications like surveillance, diagnostics, and smart systems.
The MPEG video coding working group approved the Committee Draft (CD) for ISO/IEC 23888-2 video coding for machines (VCM). VCM aims to encode visual content in a way that maximizes machine task performance, such as computer vision, scene understanding, autonomous driving, smart surveillance, robotics, and IoT. Instead of preserving photorealistic quality, VCM seeks to retain features and structures important for machines, possibly at much lower bitrates than traditional video codecs. The CD introduces several new tools and enhancements aimed at improving machine-centric video processing efficiency. These include updates to spatial resampling, such as the signaling of the inner decoded picture size to better support scalable inference. For temporal resampling, the CD enables adaptive resampling ratios and introduces pre- and post-filters within the temporal resampler to maintain task-relevant temporal features. In the filtering domain, it adopts bit depth truncation techniques, integrating bit depth shifting, luma enhancement, and chroma reconstruction, to optimize both signaling efficiency and cross-platform interoperability. Luma enhancement is further refined through an integer-based implementation for luma distribution parameters, while chroma reconstruction is stabilized across different hardware platforms. Additionally, the CD proposes removing the neural network-based in-loop filter (NNLF) to simplify the pipeline. Finally, in terms of bitstream structure, it adopts a flattened structure with new signaling methods to support efficient random access and better coordination with system layers, aligning with the low-latency, high-accuracy needs of machine-driven applications.
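The bit-depth-shifting step behind the truncation tools can be illustrated as a simple pre-/post-filter pair: drop least-significant bits before encoding and shift back (with a mid-step offset) after decoding. This only sketches the shifting idea; the actual CD combines it with luma enhancement and chroma reconstruction, and the function names here are illustrative.

```python
import numpy as np

# Hedged sketch of the bit depth shifting idea mentioned for VCM.
# The real tools are more elaborate; this shows only the LSB truncation.

def truncate_bit_depth(frame: np.ndarray, shift: int) -> np.ndarray:
    """Pre-filter: discard `shift` LSBs (e.g., 10-bit -> 8-bit samples)."""
    return (frame >> shift).astype(frame.dtype)

def restore_bit_depth(frame: np.ndarray, shift: int) -> np.ndarray:
    """Post-filter: shift back, adding half a quantization step
    to center the reconstruction and reduce rounding bias."""
    return ((frame.astype(np.int64) << shift) + ((1 << shift) >> 1))
```

For `shift=2`, a 10-bit sample of 1023 is transmitted as 255 and reconstructed as 1022, i.e., the round-trip error stays within half a quantization step of 2^shift, which is the bitrate-versus-fidelity trade-off the tool exploits for machine tasks.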
Research in VCM focuses on optimizing video representation for downstream machine tasks, exploring task-driven compression techniques that prioritize inference accuracy over perceptual quality. Key areas include joint video and feature coding, adaptive resampling methods tailored to machine perception, learning-based filter design, and bitstream structuring for efficient decoding and random access. Other important directions involve balancing bitrate and task accuracy, enhancing robustness across platforms, and integrating machine-in-the-loop optimization to co-design codecs with AI inference pipelines.
Concluding Remarks
The 150th MPEG meeting marks significant progress across AI-enhanced media, immersive technologies, and machine-oriented coding. With ongoing work on MPEG-AI, metaverse standards, next-gen video compression, Gaussian splat representation, and machine-friendly audio and video coding, MPEG continues to shape the future of interoperable, intelligent, and adaptive multimedia systems. The research opportunities and standardization efforts outlined in this meeting provide a strong foundation for innovations that support real-time, efficient, and cross-platform media experiences for both human and machine consumption.
The 151st MPEG meeting will be held in Daejeon, Korea, from 30 June to 04 July 2025. Click here for more information about MPEG meetings and their developments.