Christian Timmerer – ACM SIGMM Records

About Christian Timmerer

Christian Timmerer is a researcher, entrepreneur, and teacher on immersive multimedia communication, streaming, adaptation, and Quality of Experience. He is an Assistant Professor at Alpen-Adria-Universität Klagenfurt, Austria. Follow him on Twitter at http://twitter.com/timse7 and subscribe to his blog at http://blog.timmerer.com.

MPEG Column: 154th MPEG Meeting

By Christian Timmerer | May 14, 2026 - 17:05 |May 15, 2026 0226, Event Report, Feature, Standards

Leave a comment

The 154th MPEG meeting took place in Santa Eulària, Spain, from April 27 to May 1, 2026. The official MPEG press release can be found here. This report highlights key outcomes from the meeting, with a focus on research directions relevant to the ACM SIGMM community:

Exploration on MPEG Gaussian Splat Coding (GSC)
Draft Joint Call for Proposals: Video Compression Beyond VVC
Energy-aware Streaming in MPEG-DASH
MPEG-AI: Vision and Scenarios for Artificial Intelligence in Multimedia
MPEG Roadmap

Exploration on MPEG Gaussian Splat Coding (GSC)

The MPEG WG 2 Technical Requirements group — jointly with WG 4 (Video Coding), WG 5 (JVET: Joint Video Coding Team(s) with ITU-T SG 16), and WG 7 (Coding of 3D Graphics and Haptics) — made progress toward standardizing Gaussian Splat Coding (GSC) regarding draft requirements and use cases subject to change. Gaussian splatting, first introduced in a landmark 2023 ACM SIGGRAPH paper by Kerbl et al. [Kerbl2023], represents 3D scenes as collections of anisotropic Gaussian primitives carrying geometry (x, y, z positions) and appearance attributes (opacity, scale, rotation, and spherical harmonics coefficients for view-dependent color), enabling photorealistic novel-view synthesis with real-time rendering. Because raw Gaussian splat data can be extremely large and the ecosystem of proprietary formats (.ply, .splat, .spz, etc.) is fragmented, MPEG has identified a clear need for interoperable, efficient compression standards. Two exploration tracks are currently being pursued: I-3DGS, which operates on Gaussian splats in the well-established “INRIA” format as a symmetric encode/decode pipeline, and A-3DGS, which allows alternative learned representations and training-integrated approaches.

The draft requirements, still evolving, currently cover representation, coding, and system aspects across both tracks, with an additional lightweight profile targeting resource-constrained devices such as mobile phones (Snapdragon 8 Gen 3/Elite) and HMDs (Snapdragon XR Gen2, e.g., Meta Quest 3). Among the coding requirements under consideration are lossy and lossless compression with variable bitrate, spatial and temporal random access, progressive and scalable decoding (quality, Level of Detail (LoD), attribute subsets), and error resilience. Notably, a lightweight profile currently proposes hard complexity constraints (i.e., real-time encode/decode on 2024/2025 mobile hardware, a 2GB runtime memory cap, and at most four concurrent video decoder sessions) reflecting MPEG’s intent to enable a fast-deployment path for interoperable interchange and storage of static Gaussian splat assets. Alongside the requirements, a draft set of 27 use cases has been identified, spanning consumer XR (telepresence, gaming, social media, retail), professional media (movie production, sports broadcasting, immersive journalism), industrial applications (digital twins, Building Information Modeling (BIM), structure inspection, disaster assessment), and emerging hybrid representations such as Gaussian splats attached to deformable meshes for avatar animation and rigging. Several of these use cases are motivating draft requirements around primitive ordering preservation and stable identifier signaling for external metadata associations, though the details of these provisions may still change.

Research aspects: Even at this early draft stage, the direction of MPEG’s GSC work opens a rich set of research opportunities. On the compression side, the dual-track structure raises open questions around rate-distortion-complexity optimization for both geometry-based and video-codec-based pipelines, including temporally coherent coding of dynamic (tracked and non-tracked) Gaussian sequences and attribute-group-aware progressive coding. The QoE angle is equally pressing: no widely accepted perceptual quality metric yet exists for 6DoF Gaussian splat rendering, and the community can contribute splat-artifact-aware metrics, view-consistency measures, and subjective evaluation methodologies. The envisioned lightweight profile points to a need for co-design of decoders and real-time renderers targeting mobile GPU architectures, offering opportunities in GPU-friendly bitstream layouts and LOD-driven streaming. From a systems and networking perspective, the spatial and temporal random-access provisions, combined with the breadth of use cases demanding adaptive streaming to diverse devices (HMDs, phones, TVs, browsers), map naturally onto adaptive bitrate research, ROI- and view-dependent segment delivery, and loss-resilient transmission of splat parameters. Finally, the emerging use cases around hybrid mesh-Gaussian avatars, scene editing, and semantic metadata associations introduce new multimedia content management and interactive media challenges that go well beyond traditional video streaming and are squarely within the scope of ACM SIGMM’s research community.

Draft Joint Call for Proposals: Video Compression Beyond VVC

MPEG’s Joint Video Experts Team (JVET) — operating jointly under ITU-T SG21 and ISO/IEC JTC 1/SC 29 — advanced a draft Joint Call for Proposals (CfP) for a new generation of video compression technology with capabilities that would substantially exceed those of the current Versatile Video Coding (VVC) standard (Rec. ITU-T H.266 | ISO/IEC 23090-3). The final CfP is planned for July 2026, with proposal submissions evaluated at a JVET meeting in January 2027 and a tentative target of a completed standard by October 2029. The overarching goal is to solicit compression technology that significantly improves upon VVC’s Main 10 Profile in terms of rate-distortion performance, encoder/decoder implementability, applicability to diverse content types, and additional features such as low latency, error robustness, and scalability, while explicitly recognizing that practical fast encoding is increasingly important across a growing range of applications.

The draft CfP defines four test cases. The primary test case targets improved compression without runtime constraints, spanning several content categories: SDR random-access at UHD/4K and HD resolutions, SDR low-delay HD (targeting conversational and gaming applications), HDR content under both PQ and HLG transfer functions at UHD, gaming low-delay HD, and user-generated content. Three additional test cases impose encoder runtime constraints relative to the VVC Test Model (VTM) reference encoder, enabling JVET to characterize the compression-versus-speed trade-off across submissions. Formal subjective evaluation will follow the degradation category rating (DCR) methodology per ITU-R BT.500. Importantly, the CfP explicitly addresses neural and learned components: proponents must disclose what training data was used and are prohibited from using any test sequence as training material, and source code (incl. training scripts or parameter derivation procedures) must be made available for accepted technologies entering the core experiments process. The draft notes that specific test sequences and target bitrates may still change before the final CfP is issued.

Research aspects: The runtime-constrained test cases create a natural framework for studying the compression-complexity Pareto frontier for both classical and learned codecs. The inclusion of user-generated content and gaming video as distinct categories invites research into content-adaptive coding tools and perceptual quality metrics tailored to these sources, as does the HDR coverage with its use of weighted PSNR alongside MS-SSIM. The explicit allowance for neural and learned components, with mandatory training data disclosure and source code requirements, signals that JVET anticipates hybrid and end-to-end learned codecs as serious contenders, making codec-agnostic adaptive streaming, QoE modeling for learned video codecs, and large-scale perceptual quality benchmarking timely topics for the ACM SIGMM community.

Energy-aware Streaming in MPEG-DASH

MPEG’s WG 3 (Systems/DASH) is developing a framework for integrating energy-related information into adaptive streaming workflows, currently documented as a Technology under Consideration (TuC) in the DASH specification. The proposed framework treats energy as a first-class design metric alongside QoE, latency, and throughput, and defines an end-to-end approach for assigning, aggregating, and propagating energy consumption data across the entire media delivery chain — from production and encoding through CDN distribution to the client. A key design principle is extensibility: rather than hardcoding specific metrics, the framework proposes a common registry of energy-related metrics (such as energy indices or carbon indices) identified via URNs or 4CC codes, inspired by existing registries like MP4RA and DASH-IF. Energy information may be carried through a variety of existing DASH mechanisms, including MPD descriptors at multiple granularity levels (Adaptation Set, Representation, Segment, Service Location), CMCD/CMSD extensions, metadata tracks, SAND messages, and event streams. A dedicated Energy descriptor in the MPD is proposed, analogous to existing Accessibility descriptors, to expose energy information to clients and applications for representation selection, user exposure, and reporting to back-end servers.

The April 2026 update reported significant progress on two related fronts. A 5G-MAG workshop co-organized with 3GPP SA4 and Greening of Streaming (March 2026) highlighted growing industry consensus around practical energy measurement, surfacing findings such as the dominant role of device eco-mode settings and content brightness over codec or resolution choices in determining end-device energy consumption, and the challenge of reproducible cloud-based energy measurement. In parallel, 3GPP’s Rel-20 study on media energy consumption exposure (FS_Energy_Ph2_MED) reached 80% completion and is expected to conclude in June 2026, with normative work to follow. Notably, 3GPP’s current draft conclusions focus on generic architectural enablers, specifically a new Energy Information Application Function, while explicitly deferring media-layer and client-driven energy optimization to external bodies such as MPEG, SVTA, and DVB. This positions MPEG-DASH’s manifest-based energy signaling work as the natural venue for maturing the streaming-level mechanisms that 3GPP may later reference.

Research aspects: This work opens several timely directions. Energy-aware ABR algorithm design, i.e., jointly optimizing QoE and energy across representation selection, CDN choice, and client device settings, is a natural extension of the existing adaptive streaming research agenda. The proposed metrics registry and MPD-level signaling create opportunities for dataset construction and benchmarking, building on emerging open datasets such as COCONUT [Tashtarian2024] and VEED [Linder2024]. The finding that device-side factors (eco-mode, display brightness) dominate energy consumption over codec and bitrate choices challenges some common assumptions and calls for more holistic QoE-energy modeling. Finally, the cross-SDO coordination between MPEG, 3GPP, IETF (GREEN working group), and Greening of Streaming presents opportunities for the ACM SIGMM community to contribute to the design of interoperable, standardized energy reporting APIs for streaming services.

MPEG-AI: Vision and Scenarios for Artificial Intelligence in Multimedia

The first edition of ISO/IEC TR 23888-1 serves as the foundational vision document for the MPEG-AI series (ISO/IEC 23888). The document maps out how AI and neural network technologies interact with multimedia standardization along two complementary axes: (i) AI as a multimedia coding tool (e.g., AI-based video compression, 3D point cloud coding) and (ii) multimedia as input for AI consumption (e.g., video coding optimized for machine vision tasks). Under this umbrella, the document surveys six technical areas. In AI-based video coding, neural network components are explored as hybrid additions to VVC-style codecs, covering in-loop filters, intra prediction, super-resolution via reference picture resampling, and content-adaptive postfilters transmitted via SEI messages using the Neural Network Coding standard (NNC, ISO/IEC 15938-17). In AI-based 3D graphics coding, the focus is on dynamic point clouds for immersive (XR, gaming) and machine-oriented (autonomous navigation, BIM) applications, where sparsity and geometric irregularity pose unique challenges beyond those faced by image/video AI codecs. AI model compression (NNC) addresses the bandwidth-efficient deployment and incremental updating of neural network weights to devices, with use cases ranging from adaptive streaming ABR models to federated learning and postfilter delivery. Video coding for machines (VCM) targets compression optimized for downstream AI tasks such as object detection, tracking, and content moderation, with applications in surveillance, intelligent transportation, smart cities, and industrial inspection. Feature coding for machines (FCM) extends this to split-inference architectures where intermediate feature maps — rather than reconstructed video — are compressed and transmitted between edge devices and servers. Finally, distributed AI media description addresses the interoperable representation and API-level exchange of AI inference results (e.g., bounding boxes, segmentation masks) between networked media analyzers, as specified in the MPEG-IoMT suite.

ISO/IEC TR 23888-1: AI as a multimedia coding tool and multimedia as input for AI consumption.

Research aspects: The hybrid codec paradigm raises open questions around joint optimization of traditional and learned tools and complexity-aware training for mobile targets. The VCM and FCM tracks call for new task-oriented quality metrics capturing machine-task performance as a function of bitrate, an area where the multimedia and computer vision communities can collaborate. The split-inference and feature coding scenarios introduce latency-constrained compression problems for edge-to-cloud pipelines, which naturally connect to adaptive streaming and IoT research. Finally, the reproducibility and bit-exactness challenges highlighted in the document — hardware-dependent inference, non-deterministic training, and the absence of standardized evaluation environments — present an opportunity for the community to develop shared benchmarking infrastructure for learned multimedia codecs.

MPEG Roadmap

MPEG released an updated roadmap at its 154th meeting, reflecting the current status and near-term trajectory of its standardization activities across three broad pillars. Under Media Coding, work nearing completion includes MPEG Immersive Video v.2, Feature Coding for Machines, Solid Point Cloud Coding, and Dynamic Mesh Compression, while longer-horizon efforts cover AI Graphics Compression, Video Coding for Machines, Lenslet video coding, and — directly relevant to this report — both Video-based and Geometry-based Gaussian Splat Coding tracks. Under Systems and Tools, near-term deliverables include DASH v.7, Green metadata v.4, and Carriage of Haptics Data, with CMAF v.4 and File Format (ISOBMFF) v.10 on a slightly longer timeline. The Beyond Media pillar continues to advance genomic data search and biomedical waveform coding (BWC), alongside media authenticity and provenance indication — underscoring MPEG’s expanding scope well beyond traditional audiovisual applications.

Research aspects: The roadmap highlights several intersecting research opportunities. The convergence of volumetric and neural representations (i.e., point clouds, dynamic meshes, Gaussian splats, and lenslet video; all progressing in parallel) raises open questions around unified rate-distortion frameworks and cross-format QoE evaluation for 6DoF experiences. The simultaneous progression of Video Coding for Machines and Feature Coding for Machines alongside traditional human-centric codecs calls for research into adaptive pipelines that can serve both human and machine consumers from a shared bitstream. The Green metadata track connects directly to the energy-aware streaming work discussed above, underscoring the need for end-to-end energy modeling that spans codec choice, packaging, delivery, and consumption. Finally, the Beyond Media thread (e.g., particularly genomic data and biomedical waveforms) signals an expanding definition of “multimedia” that the ACM SIGMM community may wish to engage with as compression, retrieval, and QoE methods developed for audiovisual content find applicability in life sciences.

Concluding Remarks

The 154th MPEG meeting in Santa Eularia reflects a standards body in active transition, broadening its scope from traditional audiovisual compression toward a richer landscape that encompasses neural scene representations, AI-native codecs, energy-aware delivery, and even biomedical data. The Gaussian Splat Coding exploration, the next-generation video compression Call for Proposals, the MPEG-AI vision document, and the energy-aware streaming framework each address distinct but interconnected challenges: how to represent, compress, deliver, and consume increasingly complex and diverse media efficiently and sustainably. For the ACM SIGMM community, this meeting offers both a map of where industry standardization is heading and a set of open research problems (i.e., spanning perceptual quality assessment, learned compression, edge inference, green streaming, and immersive media delivery) where academic contributions can meaningfully shape the next generation of multimedia standards.

The 155th MPEG meeting will be held in Geneva, Switzerland, from July 13 to 17, 2026. Click here for more information about MPEG meetings and ongoing developments.

References

[Kerbl, 2023] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 42, 4, Article 139 (August 2023), 14 pages. https://doi.org/10.1145/3592433
[Tashtarian, 2024] Farzad Tashtarian, Daniele Lorenzi, Hadi Amirpour, Samira Afzal, and Christian Timmerer. 2024. COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming. In Proceedings of the 15th ACM Multimedia Systems Conference (MMSys ’24). Association for Computing Machinery, New York, NY, USA, 346–352. https://doi.org/10.1145/3625468.3652179
[Linder, 2024] Sandro Linder, Samira Afzal, Christian Bauer, Hadi Amirpour, Radu Prodan, and Christian Timmerer. 2024. VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances. In Proceedings of the 15th ACM Multimedia Systems Conference (MMSys ’24). Association for Computing Machinery, New York, NY, USA, 332–338. https://doi.org/10.1145/3625468.3652178

MPEG Column: 153rd MPEG Meeting

By Christian Timmerer | February 2, 2026 - 11:36 |February 2, 2026 0126, Event Report, Feature, Standards

Leave a comment

The 153rd MPEG meeting took place online from January 19-23, 2026. The official MPEG press release can be found here. This report highlights key outcomes from the meeting, with a focus on research directions relevant to the ACM SIGMM community:

MPEG Roadmap
Exploration on MPEG Gaussian Splat Coding (GSC)
MPEG Immersive Video 2nd edition (new white paper)

MPEG Roadmap

MPEG released an updated roadmap showing continued convergence of immersive and “beyond video” media with deployment-ready systems work. Near-term priorities include 6DoF experiences (MPEG Immersive Video v2 and 6DoF audio), volumetric representations (dynamic meshes, solid point clouds, LiDAR, and emerging Gaussian splat coding), and “coding for machines,” which treats visual and audio signals as inputs to downstream analytics rather than only for human consumption.

Research aspects: The most promising research opportunities sit at the intersections: renderer and device-aware rate-distortion-complexity optimization for volumetric content; adaptive streaming and packaging evolution (e.g., MPEG-DASH / CMAF) for interactive 6DoF services under tight latency constraints; and cross-cutting themes such as media authenticity and provenance, green and energy metadata, and exploration threads on neural-network-based compression and compression of neural networks that foreshadow AI-native multimedia pipelines.

MPEG Gaussian Splat Coding (GSC)

Gaussian Splat Coding (GSC) is MPEG’s effort to standardize how 3D Gaussian Splatting content, scenes represented as sparse “Gaussian splats” with geometry plus rich attributes (scale and rotation, opacity, and spherical-harmonics appearance for view-dependent rendering), is encoded, decoded, and evaluated so it can be exchanged and rendered consistently across platforms. The main motivation is interoperability for immersive media pipelines: enabling reproducible results, shared benchmarks, and comparable rate-distortion-complexity trade-offs for use cases spanning telepresence and immersive replay to mobile XR and digital twins, while retaining the visual strengths that made 3DGS attractive compared to heavier neural scene representations.

The work remains in an exploration phase, coordinated across ISO/IEC JTC 1/SC 29 groups WG 4 (MPEG Video Coding) and WG 7 (MPEG Coding for 3D Graphics and Haptics) through Joint Exploration Experiments covering datasets and anchors, new coding tools, software (renderer and metrics), and Common Test Conditions (CTC). A notable systems thread is “lightweight GSC” for resource-constrained devices (single-frame, low-latency tracks using geometry-based and video-based pipelines with explicit time and memory targets), alongside an “early deployment” path via amendments to existing MPEG point-cloud codecs to more natively carry Gaussian-splat parameters. In parallel, MPEG is testing whether splat-specific tools can outperform straightforward mappings in quality, bitrate, and compute for real-time and streaming-centric scenarios.

Research aspects: Relevant SIGMM directions include splat-aware compression tools and rate-distortion-complexity optimization (including tracked vs. non-tracked temporal prediction); QoE evaluation for 6DoF navigation (metrics for view and temporal consistency and splat-specific artifacts); decoder and renderer co-design for real-time and mobile lightweight profiles (progressive and LOD-friendly layouts, GPU-friendly decode); and networked delivery problems such as adaptive streaming, ROI and view-dependent transmission, and loss resilience for splat parameters. Additional opportunities include interoperability work on reproducible benchmarking, conformance testing, and practical packaging and signaling for deployment.

MPEG Immersive Video 2nd edition (white paper)

The second edition of MPEG Immersive Video defines an interoperable bitstream and decoding process for efficient 6DoF immersive scene playback, supporting translational and rotational movement with motion parallax to reduce discomfort often associated with pure 3DoF viewing. The second edition primarily extends functionality (without changing the high-level bitstream structure), adding capabilities such as capture-device information, additional projection types, and support for Simple Multi-Plane Image (MPI), alongside tools that better support geometry and attribute handling and depth-related processing.

Architecturally, MIV ingests multiple (unordered) camera views with geometry (depth and occupancy) and attributes (e.g., texture), then reduces inter-view redundancy by extracting patches and packing them into 2D “atlases” that are compressed using conventional video codecs. MIV-specific metadata signals how to reconstruct views from the atlases. The standard is built as an extension of the common Visual Volumetric Video-based Coding (V3C) bitstream framework shared with V-PCC, with profiles that preserve backward compatibility while introducing a new profile for added second-edition functionality and a tailored profile for full-plane MPI delivery.

Research aspects: Key SIGMM topics include systems-efficient 6DoF delivery (better view and patch selection and atlas packing under latency and bandwidth constraints); rate-distortion-complexity-QoE optimization that accounts for decode and render cost (especially on HMD and mobile) and motion-parallax comfort; adaptive delivery strategies (representation ladders, viewport and pose-driven bit allocation, robust packetization and error resilience for atlas video plus metadata); renderer-aware metrics and subjective protocols for multi-view temporal consistency; and deployment-oriented work such as profile and level tuning, codec-group choices (HEVC / VVC), conformance testing, and exploiting second-edition features (capture device info, depth tools, Simple MPI) for more reliable reconstruction and improved user experience.

Concluding Remarks

The meeting outcomes highlight a clear shift toward immersive and AI-enabled media systems where compression, rendering, delivery, and evaluation must be co-designed. These developments offer timely opportunities for the ACM SIGMM community to contribute reproducible benchmarks, perceptual metrics, and end-to-end streaming and systems research that can directly influence emerging standards and deployments.

The 154th MPEG meeting will be held in Santa Eulària, Spain, from April 27 to May 1, 2026. Click here for more information about MPEG meetings and ongoing developments.

MPEG Column: 152nd MPEG Meeting

By Christian Timmerer | December 9, 2025 - 08:20 |December 9, 2025 0425, 0425, Event Report, Feature, September 2013, Standards

Leave a comment

The 152nd MPEG meeting took place in Geneva, Switzerland, from October 7 to October 11, 2025. The official MPEG press release can be found here. This column highlights key points from the meeting, amended with research aspects relevant to the ACM SIGMM community:

MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF). A separate press release regarding this achievement is available here.
JVET ratified new editions of VSEI, VVC, and HEVC
The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized
Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated

MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF)

On September 18, 2025, the National Academy of Television Arts & Sciences (NATAS) announced that the MPEG Systems Working Group (ISO/IEC JTC 1/SC 29/WG 3) had been selected as a recipient of a Technology & Engineering Emmy® Award for standardizing the Common Media Application Format (CMAF). But what is CMAF? CMAF (ISO/IEC 23000-19) is a media format standard designed to simplify and unify video streaming workflows across different delivery protocols and devices. Here’s a structured overview. Before CMAF, streaming services often had to produce multiple container formats, i.e., (i) ISO Base Media File Format (ISOBMFF) for MPEG-DASH and MPEG-2 Transport Stream (TS) for Apple HLS. This duplication resulted in additional encoding, packaging, and storage costs. I wrote a blog post about this some time ago here. CMAF’s main goal is to define a single, standardized segmented media format usable by both HLS and DASH, enabling “encode once, package once, deliver everywhere.”

The core concept of CMAF is that it is based on ISOBMFF, the foundation for MP4. Each CMAF stream consists of a CMAF header, CMAF media segments, and CMAF track files (a logical sequence of segments for one stream, e.g., video or audio). CMAF enables low-latency streaming by allowing progressive segment transfer, adopting chunked transfer encoding via CMAF chunks. CMAF defines interoperable profiles for codecs and presentation types for video, audio, and subtitles. Thanks to its compatibility with and adoption within existing streaming standards, CMAF bridges the gaps between DASH and HLS, creating a unified ecosystem.

Research aspects include – but are not limited to – low-latency tuning (segment/chunk size trade-offs, HTTP/3, QUIC), Quality of Experience (QoE) impact of chunk-based adaptation, synchronization of live and interactive CMAF streams, edge-assisted CMAF caching and prediction, and interoperability testing and compliance tools.

JVET ratified new editions of VSEI, VVC, and HEVC

At its 40th meeting, the Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) concluded the standardization work on the next editions of three key video coding standards, advancing them to the Final Draft International Standard (FDIS) stage. Corresponding twin-text versions have also been submitted to ITU-T for consent procedures. The finalized standards include:

Versatile Supplemental Enhancement Information (VSEI) — ISO/IEC 23002-7 | ITU-T Rec. H.274
Versatile Video Coding (VVC) — ISO/IEC 23090-3 | ITU-T Rec. H.266
High Efficiency Video Coding (HEVC) — ISO/IEC 23008-2 | ITU-T Rec. H.265

The primary focus of these new editions is the extension and refinement of Supplemental Enhancement Information (SEI) messages, which provide metadata and auxiliary data to support advanced processing, interpretation, and quality management of coded video streams.

The updated VSEI specification introduces both new and refined SEI message types supporting advanced use cases:

AI-driven processing: Extensions for neural-network-based post-filtering and film grain synthesis offer standardized signalling for machine learning components in decoding and rendering pipelines.
Semantic and multimodal content: New SEI messages describe infrared, X-ray, and other modality indicators, region packing, and object mask encoding; creating interoperability points for multimodal fusion and object-aware compression research.
Pipeline optimization: Messages defining processing order and post-processing nesting support research on joint encoder-decoder optimization and edge-cloud coordination in streaming architectures.
Authenticity and generative media: A new set of messages supports digital signature embedding and generative-AI-based face encoding, raising questions for the SIGMM community about trust, authenticity, and ethical AI in media pipelines.
Metadata and interpretability: New SEIs for text description, image format metadata, and AI usage restriction requests could facilitate research into explainable media, human-AI interaction, and regulatory compliance in multimedia systems.

All VSEI features are fully compatible with the new VVC edition, and most are also supported in HEVC. The new HEVC edition further refines its multi-view profiles, enabling more robust 3D and immersive video use cases.

Research aspects of these new standard’s editions can be summarized as follows: (i) Define new standardized interfaces between neural post-processing and conventional video coding, fostering reproducible and interoperable research on learned enhancement models. (ii) Encourage exploration of metadata-driven adaptation and QoE optimization using SEI-based signals in streaming systems. (iii) Open possibilities for cross-layer system research, connecting compression, transport, and AI-based decision layers. (iv) Introduce a formal foundation for authenticity verification, content provenance, and AI-generated media signalling, relevant to current debates on trustworthy multimedia.

These updates highlight how ongoing MPEG/ITU standardization is evolving toward a more AI-aware, multimodal, and semantically rich media ecosystem, providing fertile ground for experimental and applied research in multimedia systems, coding, and intelligent media delivery.

The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized

MPEG Coding of 3D Graphics and Haptics (ISO/IEC JTC 1/SC 29/WG7) has advanced MPEG-I Part 5 – Visual Volumetric Video-based Coding (V3C and V-PCC) to the Final Draft International Standard (FDIS) stage, marking its fourth edition. This revision introduces major updates to the Video-based Coding of Volumetric Content (V3C) framework, particularly enabling support for an additional bitstream instance: V-DMC (Video-based Dynamic Mesh Compression).

Previously, V3C served as the structural foundation for V-PCC (Video-based Point Cloud Compression) and MIV (MPEG Immersive Video). The new edition extends this flexibility by allowing V-DMC integration, reinforcing V3C as a generic, extensible framework for volumetric and 3D video coding. All instances follow a shared principle, i.e., using conventional 2D video codecs (e.g., HEVC, VVC) for projection-based compression, complemented by specialized tools for mapping, geometry, and metadata handling.

While V-PCC remains co-specified within Part 5, MIV (Part 12) and V-DMC (Part 29) are standardized separately. The progression to FDIS confirms the technical maturity and architectural stability of the framework.

This evolution opens new research directions as follows: (i) Unified 3D content representation, enabling comparative evaluation of point cloud, mesh, and view-based methods under one coding architecture. (ii) Efficient use of 2D codecs for 3D media, raising questions on mapping optimization, distortion modeling, and geometry-texture compression. (iii) Dynamic and interactive volumetric streaming, relevant to AR/VR, telepresence, and immersive communication research.

The fourth edition of MPEG-I Part 5 thus positions V3C as a cornerstone for future volumetric, AI-assisted, and immersive video systems, bridging standardization and cutting-edge multimedia research.

Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated

The Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) has completed the evaluation of submissions to its Call for Evidence (CfE) on video compression with capability beyond VVC. The CfE investigated coding technologies that may surpass the performance of the current Versatile Video Coding (VVC) standard in compression efficiency, computational complexity, and extended functionality.

A total of five submissions were assessed, complemented by ECM16 reference encodings and VTM anchor sequences with multiple runtime variants. The evaluation addressed both compression capability and encoding runtime, as well as low-latency and error-resilience features. All technologies were derived from VTM, ECM, or NNVC frameworks, featuring modified encoder configurations and coding tools rather than entirely new architectures.

Key Findings

In the compression capability test, 76 out of 120 test cases showed at least one submission with a non-overlapping confidence interval compared to the VTM anchor. Several methods outperformed ECM16 in visual quality and achieved notable compression gains at lower complexity. Neural-network-based approaches demonstrated clear perceptual improvements, particularly for 8K HDR content, while gains were smaller for gaming scenarios.
In the encoding runtime test, significant improvements were observed even under strict complexity constraints: 37 of 60 test points (at both 1× and 0.2× runtime) showed statistically significant benefits over VTM. Some submissions achieved faster encoding than VTM, with only a 35% increase in decoder runtime.

Research Relevance and Outlook

The CfE results illustrate a maturing convergence between model-based and data-driven video coding, raising research questions highly relevant for the ACM SIGMM community:

How can learned prediction and filtering networks be integrated into standard codecs while preserving interoperability and runtime control?
What methodologies can best evaluate perceptual quality beyond PSNR, especially for HDR and immersive content?
How can complexity-quality trade-offs be optimized for diverse hardware and latency requirements?

Building on these outcomes, JVET is preparing a Call for Proposals (CfP) for the next-generation video coding standard, with a draft planned for early 2026 and evaluation through 2027. Upcoming activities include refining test material, adding Reference Picture Resampling (RPR), and forming a new ad hoc group on hardware implementation complexity.

For multimedia researchers, this CfE marks a pivotal step toward AI-assisted, complexity-adaptive, and perceptually optimized compression systems, which are considered a key frontier where codec standardization meets intelligent multimedia research.

The 153rd MPEG meeting will be held online from January 19 to January 23, 2026. Click here for more information about MPEG meetings and their developments.

MPEG Column: 150th MPEG Meeting (Virtual/Online)

By Christian Timmerer | May 7, 2025 - 07:15 |June 4, 2025 0225, 0225, Event Report, Feature, Standards

Leave a comment

The 150th MPEG meeting was held online from 31 March to 04 April 2025. The official press release can be found here. This column provides the following highlights:

Requirements: MPEG-AI strategy and white paper on MPEG technologies for metaverse
JVET: Draft Joint Call for Evidence on video compression with capability beyond Versatile Video Coding (VVC)
Video: Gaussian splat coding and video coding for machines
Audio: Audio coding for machines
3DGH: 3D Gaussian splat coding

MPEG-AI Strategy

The MPEG-AI strategy envisions a future where AI and neural networks are deeply integrated into multimedia coding and processing, enabling transformative improvements in how digital content is created, compressed, analyzed, and delivered. By positioning AI at the core of multimedia systems, MPEG-AI seeks to enhance both content representation and intelligent analysis. This approach supports applications ranging from adaptive streaming and immersive media to machine-centric use cases like autonomous vehicles and smart cities. AI is employed to optimize coding efficiency, generate intelligent descriptors, and facilitate seamless interaction between content and AI systems. The strategy builds on foundational standards such as ISO/IEC 15938-13 (CDVS), 15938-15 (CDVA), and 15938-17 (Neural Network Coding), which collectively laid the groundwork for integrating AI into multimedia frameworks.

Currently, MPEG is developing a family of standards under the ISO/IEC 23888 series that includes a vision document, machine-oriented video coding, and encoder optimization for AI analysis. Future work focuses on feature coding for machines and AI-based point cloud compression to support high-efficiency 3D and visual data handling. These efforts reflect a paradigm shift from human-centric media consumption to systems that also serve intelligent machine agents. MPEG-AI maintains compatibility with traditional media processing while enabling scalable, secure, and privacy-conscious AI deployments. Through this initiative, MPEG aims to define the future of multimedia as an intelligent, adaptable ecosystem capable of supporting complex, real-time, and immersive digital experiences.

MPEG White Paper on Metaverse Technologies

The MPEG white paper on metaverse technologies (cf. MPEG white papers) outlines the pivotal role of MPEG standards in enabling immersive, interoperable, and high-quality virtual experiences that define the emerging metaverse. It identifies core metaverse parameters – real-time operation, 3D experience, interactivity, persistence, and social engagement – and maps them to MPEG’s longstanding and evolving technical contributions. From early efforts like MPEG-4’s Binary Format for Scenes (BIFS) and Animation Framework eXtension (AFX) to MPEG-V’s sensory integration, and the advanced MPEG-I suite, these standards underpin critical features such as scene representation, dynamic 3D asset compression, immersive audio, avatar animation, and real-time streaming. Key technologies like point cloud compression (V-PCC, G-PCC), immersive video (MIV), and dynamic mesh coding (V-DMC) demonstrate MPEG’s capacity to support realistic, responsive, and adaptive virtual environments. Recent efforts include neural network compression for learned scene representations (e.g., NeRFs), haptic coding formats, and scene description enhancements, all geared toward richer user engagement and broader device interoperability.

The document highlights five major metaverse use cases – virtual environments, immersive entertainment, virtual commerce, remote collaboration, and digital twins – all supported by MPEG innovations. It emphasizes the foundational role of MPEG-I standards (e.g., Parts 12, 14, 29, 39) for synchronizing immersive content, representing avatars, and orchestrating complex 3D scenes across platforms. Future challenges identified include ensuring interoperability across systems, advancing compression methods for AI-assisted scenarios, and embedding security and privacy protections. With decades of multimedia expertise and a future-focused standards roadmap, MPEG positions itself as a key enabler of the metaverse – ensuring that emerging virtual ecosystems are scalable, immersive, and universally accessible.

The MPEG white paper on metaverse technologies highlights several research opportunities, including efficient compression of dynamic 3D content (e.g., point clouds, meshes, neural representations), synchronization of immersive audio and haptics, real-time adaptive streaming, and scene orchestration. It also points to challenges in standardizing interoperable avatar formats, AI-enhanced media representation, and ensuring seamless user experiences across devices. Additional research directions include neural network compression, cross-platform media rendering, and developing perceptual metrics for immersive Quality of Experience (QoE).

Draft Joint Call for Evidence (CfE) on Video Compression beyond Versatile Video Coding (VVC)

The latest JVET AHG report on ECM software development (AHG6), documented as JVET-AL0006, shows promising results. Specifically, in the “Overall” row and “Y” column, there is a 27.06% improvement in coding efficiency compared to VVC, as shown in the figure below.

The Draft Joint Call for Evidence (CfE) on video compression beyond VVC (Versatile Video Coding), identified as document JVET-AL2026 | N 355, is being developed to explore new advancements in video compression. The CfE seeks evidence in three main areas: (a) improved compression efficiency and associated trade-offs, (b) encoding under runtime constraints, and (c) enhanced performance in additional functionalities. This initiative aims to evaluate whether new techniques can significantly outperform the current state-of-the-art VVC standard in both compression and practical deployment aspects.

The visual testing will be carried out across seven categories, including various combinations of resolution, dynamic range, and use cases: SDR Random Access UHD/4K, SDR Random Access HD, SDR Low Bitrate HD, HDR Random Access 4K, HDR Random Access Cropped 8K, Gaming Low Bitrate HD, and UGC (User-Generated Content) Random Access HD. Sequences and rate points for testing have already been defined and agreed upon. For a fair comparison, rate-matched anchors using VTM (VVC Test Model) and ECM (Enhanced Compression Model) will be generated, with new configurations to enable reduced run-time evaluations. A dry-run of the visual tests is planned during the upcoming Daejeon meeting, with ECM and VTM as reference anchors, and the CfE welcomes additional submissions. Following this dry-run, the final Call for Evidence is expected to be issued in July, with responses due in October.

The Draft Joint Call for Evidence (CfE) on video compression beyond VVC invites research into next-generation video coding techniques that offer improved compression efficiency, reduced encoding complexity under runtime constraints, and enhanced functionalities such as scalability or perceptual quality. Key research aspects include optimizing the trade-off between bitrate and visual fidelity, developing fast encoding methods suitable for constrained devices, and advancing performance in emerging use cases like HDR, 8K, gaming, and user-generated content.

3D Gaussian Splat Coding

Gaussian splatting is a real-time radiance field rendering method that represents a scene using 3D Gaussians. Each Gaussian has parameters like position, scale, color, opacity, and orientation, and together they approximate how light interacts with surfaces in a scene. Instead of ray marching (as in NeRF), it renders images by splatting the Gaussians onto a 2D image plane and blending them using a rasterization pipeline, which is GPU-friendly and much faster. Developed by Kerbl et al. (2023) it is capable of real-time rendering (60+ fps) and outperforms previous NeRF-based methods in speed and visual quality. Gaussian splat coding refers to the compression and streaming of 3D Gaussian representations for efficient storage and transmission. It’s an active research area and under standardization consideration in MPEG.

MPEG technical requirements working group together with MPEG video working group started an exploration on Gaussian splat coding and the MPEG coding of 3D graphics and haptics (3DGH) working group addresses 3D Gaussian splat coding, respectively. Draft Gaussian splat coding use cases and requirements are available and various joint exploration experiments (JEEs) are conducted between meetings.

(3D) Gaussian splat coding is actively researched in academia, also in the context of streaming, e.g., like in “LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming” or “LTS: A DASH Streaming System for Dynamic Multi-Layer 3D Gaussian Splatting Scenes”. The research aspects of 3D Gaussian splat coding and streaming span a wide range of areas across computer graphics, compression, machine learning, and systems for real-time immersive media. In particular, on efficiently representing and transmitting Gaussian-based neural scene representations for real-time rendering. Key areas include compression of Gaussian parameters (position, scale, color, opacity), perceptual and geometry-aware optimizations, and neural compression techniques such as learned latent coding. Streaming challenges involve adaptive, view-dependent delivery, level-of-detail management, and low-latency rendering on edge or mobile devices. Additional research directions include standardizing file formats, integrating with scene graphs, and ensuring interoperability with existing 3D and immersive media frameworks.

MPEG Audio and Video Coding for Machines

The Call for Proposals on Audio Coding for Machines (ACoM), issued by the MPEG audio coding working group, aims to develop a standard for efficiently compressing audio, multi-dimensional signals (e.g., medical data), or extracted features for use in machine-driven applications. The standard targets use cases such as connected vehicles, audio surveillance, diagnostics, health monitoring, and smart cities, where vast data streams must be transmitted, stored, and processed with low latency and high fidelity. The ACoM system is designed in two phases: the first focusing on near-lossless compression of audio and metadata to facilitate training of machine learning models, and the second expanding to lossy compression of features optimized for specific applications. The goal is to support hybrid consumption – by machines and, where needed, humans – while ensuring interoperability, low delay, and efficient use of storage and bandwidth.

The CfP outlines technical requirements, submission guidelines, and evaluation metrics. Participants must provide decoders compatible with Linux/x86 systems, demonstrate performance through objective metrics like compression ratio, encoder/decoder runtime, and memory usage, and undergo a mandatory cross-checking process. Selected proposals will contribute to a reference model and working draft of the standard. Proponents must register by August 1, 2025, with submissions due in September, and evaluation taking place in October. The selection process emphasizes lossless reproduction, metadata fidelity, and significant improvements over a baseline codec, with a path to merge top-performing technologies into a unified solution for standardization.

Research aspects of Audio Coding for Machines (ACoM) include developing efficient compression techniques for audio and multi-dimensional data that preserve key features for machine learning tasks, optimizing encoding for low-latency and resource-constrained environments, and designing hybrid formats suitable for both machine and human consumption. Additional research areas involve creating interoperable feature representations, enhancing metadata handling for context-aware processing, evaluating trade-offs between lossless and lossy compression, and integrating machine-optimized codecs into real-world applications like surveillance, diagnostics, and smart systems.

The MPEG video coding working group approved the committee draft (CD) for ISO/IEC 23888-2 video coding for machines (VCM). VCM aims to encode visual content in a way that maximizes machine task performance, such as computer vision, scene understanding, autonomous driving, smart surveillance, robotics and IoT. Instead of preserving photorealistic quality, VCM seeks to retain features and structures important for machines, possibly at much lower bitrates than traditional video codecs. The CD introduces several new tools and enhancements aimed at improving machine-centric video processing efficiency. These include updates to spatial resampling, such as the signaling of the inner decoded picture size to better support scalable inference. For temporal resampling, the CD enables adaptive resampling ratios and introduces pre- and post-filters within the temporal resampler to maintain task-relevant temporal features. In the filtering domain, it adopts bit depth truncation techniques – integrating bit depth shifting, luma enhancement, and chroma reconstruction – to optimize both signaling efficiency and cross-platform interoperability. Luma enhancement is further refined through an integer-based implementation for luma distribution parameters, while chroma reconstruction is stabilized across different hardware platforms. Additionally, the CD proposes removing the neural network-based in-loop filter (NNLF) to simplify the pipeline. Finally, in terms of bitstream structure, it adopts a flattened structure with new signaling methods to support efficient random access and better coordination with system layers, aligning with the low-latency, high-accuracy needs of machine-driven applications.

Research in VCM focuses on optimizing video representation for downstream machine tasks, exploring task-driven compression techniques that prioritize inference accuracy over perceptual quality. Key areas include joint video and feature coding, adaptive resampling methods tailored to machine perception, learning-based filter design, and bitstream structuring for efficient decoding and random access. Other important directions involve balancing bitrate and task accuracy, enhancing robustness across platforms, and integrating machine-in-the-loop optimization to co-design codecs with AI inference pipelines.

Concluding Remarks

The 150th MPEG meeting marks significant progress across AI-enhanced media, immersive technologies, and machine-oriented coding. With ongoing work on MPEG-AI, metaverse standards, next-gen video compression, Gaussian splat representation, and machine-friendly audio and video coding, MPEG continues to shape the future of interoperable, intelligent, and adaptive multimedia systems. The research opportunities and standardization efforts outlined in this meeting provide a strong foundation for innovations that support real-time, efficient, and cross-platform media experiences for both human and machine consumption.

The 151st MPEG meeting will be held in Daejeon, Korea, from 30 June to 04 July 2025. Click here for more information about MPEG meetings and their developments.

MPEG Column: 149th MPEG Meeting in Geneva, Switzerland

By Christian Timmerer | April 23, 2025 - 09:10 |April 23, 2025 0125, Feature, Standards

Leave a comment

The 149th MPEG meeting took place in Geneva, Switzerland, from January 20 to 24, 2025. The official press release can be found here. MPEG promoted three standards (among others) to Final Draft International Standard (FDIS), driving innovation in next-generation, immersive audio and video coding, and adaptive streaming:

MPEG-I Immersive Audio enables realistic 3D audio with six degrees of freedom (6DoF).
MPEG Immersive Video (Second Edition) introduces advanced coding tools for volumetric video.
MPEG-DASH (Sixth Edition) enhances low-latency streaming, content steering, and interactive media.

This column focuses on these new standards/editions based on the press release and amended with research aspect relevant for the ACM SIGMM community.

MPEG-I Immersive Audio

At the 149th MPEG meeting, MPEG Audio Coding (WG 6) promoted ISO/IEC 23090-4 MPEG-I immersive audio to Final Draft International Standard (FDIS), marking a major milestone in the development of next-generation audio technology.

MPEG-I immersive audio is a groundbreaking standard designed for the compact and highly realistic representation of spatial sound. Tailored for Metaverse applications, including Virtual, Augmented, and Mixed Reality (VR/AR/MR), it enables seamless real-time rendering of interactive 3D audio with six degrees of freedom (6DoF). Users can not only turn their heads in any direction (pitch/yaw/roll) but also move freely through virtual environments (x/y/z), creating an unparalleled sense of immersion.

True to MPEG’s legacy, this standard is optimized for efficient distribution – even over networks with severe bitrate constraints. Unlike proprietary VR/AR audio solutions, MPEG-I Immersive Audio ensures broad interoperability, long-term stability, and suitability for both streaming and downloadable content. It also natively integrates MPEG-H 3D Audio for high-quality compression.

The standard models a wide range of real-world acoustic effects to enhance realism. It captures detailed sound source properties (e.g., level, point sources, extended sources, directivity characteristics, and Doppler effects) as well as complex environmental interactions (e.g., reflections, reverberation, diffraction, and both total and partial occlusion). Additionally, it supports diverse acoustic environments, including outdoor spaces, multiroom scenes with connecting portals, and areas with dynamic openings such as doors and windows. Its rendering engine balances computational efficiency with high-quality output, making it suitable for a variety of applications.

Further reinforcing its impact, the upcoming ISO/IEC 23090-34 Immersive audio reference software will fully implement MPEG-I immersive audio in a real-time framework. This interactive 6DoF experience will facilitate industry adoption and accelerate innovation in immersive audio. The reference software is expected to reach FDIS status by April 2025.

With MPEG-I immersive audio, MPEG continues to set the standard for the future of interactive and spatial audio, paving the way for more immersive digital experiences.

Research aspects: Research can focus on optimizing the streaming and compression of MPEG-I immersive audio for constrained networks, ensuring efficient delivery without compromising spatial accuracy. Another key area is improving real-time 6DoF audio rendering by balancing computational efficiency and perceptual realism, particularly in modeling complex acoustic effects like occlusions, reflections, and Doppler shifts for interactive VR/AR/MR applications.

MPEG Immersive Video (Second Edition)

At the 149th MPEG meeting, MPEG Video Coding (WG 4) advanced the second edition of ISO/IEC 23090-12 MPEG immersive video (MIV) to Final Draft International Standard (FDIS), marking a significant step forward in immersive video technology.

MIV enables the efficient compression, storage, and distribution of immersive video content, where multiple real or virtual cameras capture a 3D scene. Designed for next-generation applications, the standard supports playback with six degrees of freedom (6DoF), allowing users to not only change their viewing orientation (pitch/yaw/roll) but also move freely within the scene (x/y/z). By leveraging strong hardware support for widely used video formats, MPEG immersive video provides a highly flexible framework for multi-view video plus depth (MVD) and multi-plane image (MPI) video coding, making volumetric video more accessible and efficient.

With the second edition, MPEG continues to expand the capabilities of MPEG immersive video, introducing a range of new technologies to enhance coding efficiency and support more advanced immersive experiences. Key additions include:

Geometry coding using luma and chroma planes, improving depth representation
Capture device information, enabling better reconstruction of the original scene
Patch margins and background views, optimizing scene composition
Static background atlases, reducing redundant data for stationary elements
Support for decoder-side depth estimation, enhancing depth accuracy
Chroma dynamic range modification, improving color fidelity
Piecewise linear normalized disparity quantization and linear depth quantization, refining depth precision

The second edition also introduces two new profiles: (1) MIV Simple MPI profile, allowing MPI content playback with a single 2D video decoder, and (2) MIV 2 profile, a superset of existing profiles that incorporates all newly added tools.

With these advancements, MPEG immersive video continues to push the boundaries of immersive media, providing a robust and efficient solution for next-generation video applications.

Research aspects: Possible research may explore advancements in MPEG immersive video to improve compression efficiency and real-time streaming while preserving depth accuracy and spatial quality. Another key area is enhancing 6DoF video rendering by leveraging new coding tools like decoder-side depth estimation and geometry coding, enabling more precise scene reconstruction and seamless user interaction in volumetric video applications.

MPEG-DASH (Sixth Edition)

At the 149th MPEG meeting, MPEG Systems (WG 3) advanced the sixth edition of MPEG-DASH (ISO/IEC 23009-1 Media presentation description and segment formats) by promoting it to the Final Draft International Standard (FDIS), the final stage of standards development. This milestone underscores MPEG’s ongoing commitment to innovation and responsiveness to evolving market needs.

The sixth edition introduces several key enhancements to improve the flexibility and efficiency of MPEG-DASH:

Alternative media presentation support, enabling seamless switching between main and alternative streams
Content steering signaling across multiple CDNs, optimizing content delivery
Enhanced segment sequence addressing, improving low-latency streaming and faster tune-in
Compact duration signaling using patterns, reducing MPD overhead
Support for Common Media Client Data (CMCD), enabling better client-side analytics
Nonlinear playback for interactive storylines, expanding support for next-generation media experiences

With these advancements, MPEG-DASH continues to evolve as a robust and scalable solution for adaptive streaming, ensuring greater efficiency, flexibility, and enhanced user experiences across a wide range of applications.

Research aspects: While advancing MPEG-DASH for more efficient and flexible adaptive streaming has been subject to research for a while, optimizing content delivery across multiple CDNs while minimizing latency and optimizing QoE remains an open issue. Another key area is enhancing interactivity and user experiences by leveraging new features like nonlinear playback for interactive storylines and improved client-side analytics through Common Media Client Data (CMCD).

The 150th MPEG meeting will be held online from March 31 to April 04, 2025. Click here for more information about MPEG meetings and their developments.

MPEG Column: 148th MPEG Meeting in Kemer, Türkiye

By Christian Timmerer | December 11, 2024 - 13:29 |December 11, 2024 0424, Event Report, Feature, Standards

Leave a comment

The 148th MPEG meeting took place in Kemer, Türkiye, from November 4 to 8, 2024. The official press release can be found here and includes the following highlights:

Point Cloud Coding: AI-based point cloud coding & enhanced G-PCC
MPEG Systems: New Part of MPEG DASH for redundant encoding and packaging, reference software and conformance of ISOBMFF, and a new structural CMAF brand profile
Video Coding: New part of MPEG-AI and 2nd edition of conformance and reference software for MPEG Immersive Video (MIV)
MPEG completes subjective quality testing for film grain synthesis using the Film Grain Characteristics SEI message

148th MPEG Meeting, Kemer, Türkiye, November 4-8, 2024.

Point Cloud Coding

At the 148^th MPEG meeting, MPEG Coding of 3D Graphics and Haptics (WG 7) launched a new AI-based Point Cloud Coding standardization project. MPEG WG 7 reviewed six responses to a Call for Proposals (CfP) issued in April 2024 targeting the full range of point cloud formats, from dense point clouds used in immersive applications to sparse point clouds generated by Light Detection and Ranging (LiDAR) sensors in autonomous driving. With bit depths ranging from 10 to 18 bits, the CfP called for solutions that could meet the precision requirements of these varied use cases.

Among the six reviewed proposals, the leading proposal distinguished itself with a hybrid coding strategy that integrates end-to-end learning-based geometry coding and traditional attribute coding. This proposal demonstrated exceptional adaptability, capable of efficiently encoding both dense point clouds for immersive experiences and sparse point clouds from LiDAR sensors. With its unified design, the system supports inter-prediction coding using a shared model with intra-coding, applicable across various bitrate requirements without retraining. Furthermore, the proposal offers flexible configurations for both lossy and lossless geometry coding.

Performance assessments highlighted the leading proposal’s effectiveness, with significant bitrate reductions compared to traditional codecs: a 47% reduction for dense, dynamic sequences in immersive applications and a 35% reduction for sparse dynamic sequences in LiDAR data. For combined geometry and attribute coding, it achieved a 40% bitrate reduction across both dense and sparse dynamic sequences, while subjective evaluations confirmed its superior visual quality over baseline codecs.

The leading proposal has been selected as the initial test model, which can be seen as a baseline implementation for future improvements and developments. Additionally, MPEG issued a working draft and common test conditions.

Research aspects: The initial test model, like those for other codec test models, is typically available as open source. This enables both academia and industry to contribute to refining various elements of the upcoming AI-based Point Cloud Coding standard. Of particular interest is how training data and processes are incorporated into the standardization project and their impact on the final standard.

Another point cloud-related project is called Enhanced G-PCC, which introduces several advanced features to improve the compression and transmission of 3D point clouds. Notable enhancements include inter-frame coding, refined octree coding techniques, Trisoup surface coding for smoother geometry representation, and dynamic Optimal Binarization with Update On-the-fly (OBUF) modules. These updates provide higher compression efficiency while managing computational complexity and memory usage, making them particularly advantageous for real-time processing and high visual fidelity applications, such as LiDAR data for autonomous driving and dense point clouds for immersive media.

By adding this new part to MPEG-I, MPEG addresses the industry’s growing demand for scalable, versatile 3D compression technology capable of handling both dense and sparse point clouds. Enhanced G-PCC provides a robust framework that meets the diverse needs of both current and emerging applications in 3D graphics and multimedia, solidifying its role as a vital component of modern multimedia systems.

MPEG Systems Updates

At its 148^th meeting, MPEG Systems (WG 3) worked on the following aspects, among others:

New Part of MPEG DASH for redundant encoding and packaging
Reference software and conformance of ISOBMFF
A new structural CMAF brand profile

The second edition of ISO/IEC 14496-32 (ISOBMFF) introduces updated reference software and conformance guidelines, and the new CMAF brand profile supports Multi-View High Efficiency Video Coding (MV-HEVC), which is compatible with devices like Apple Vision Pro and Meta Quest 3.

The new part of MPEG DASH, ISO/IEC 23009-9, addresses redundant encoding and packaging for segmented live media (REAP). The standard is designed for scenarios where redundant encoding and packaging are essential, such as 24/7 live media production and distribution in cloud-based workflows. It specifies formats for interchangeable live media ingest and stream announcements, as well as formats for generating interchangeable media presentation descriptions. Additionally, it provides failover support and mechanisms for reintegrating distributed components in the workflow, whether they involve file-based content, live inputs, or a combination of both.

Research aspects: With the FDIS of MPEG DASH REAP available, the following topics offer potential for both academic and industry-driven research aligned with the standard’s objectives (in no particular order or priority):

Optimization of redundant encoding and packaging: Investigate methods to minimize resource usage (e.g., computational power, storage, and bandwidth) in redundant encoding and packaging workflows. Explore trade-offs between redundancy levels and quality of service (QoS) in segmented live media scenarios.
Interoperability of live media Ingest formats: Evaluate the interoperability of the standard’s formats with existing live media workflows and tools. Develop techniques for seamless integration with legacy systems and emerging cloud-based media workflows.
Failover mechanisms for cloud-based workflows: Study the reliability and latency of failover mechanisms in distributed live media workflows. Propose enhancements to the reintegration of failed components to maintain uninterrupted service.
Standardized stream announcements and descriptions: Analyze the efficiency and scalability of stream announcement formats in large-scale live streaming scenarios. Research methods for dynamically updating media presentation descriptions during live events.
Hybrid workflow support: Investigate the challenges and opportunities in combining file-based and live input workflows within the standard. Explore strategies for adaptive workflow transitions between live and on-demand content.
Cloud-based workflow scalability: Examine the scalability of the REAP standard in high-demand scenarios, such as global live event streaming. Study the impact of cloud-based distributed workflows on latency and synchronization.
Security and resilience: Research security challenges related to redundant encoding and packaging in cloud environments. Develop techniques to enhance the resilience of workflows against cyberattacks or system failures.
Performance metrics and quality assessment: Define performance metrics for evaluating the effectiveness of REAP in live media workflows. Explore objective and subjective quality assessment methods for media streams delivered using this standard.

The current/updated status of MPEG-DASH is shown in the figure below.

Video Coding Updates

In terms of video coding, two noteworthy updates are described here:

Part 3 of MPEG-AI, ISO/IEC 23888-3 – Optimization of encoders and receiving systems for machine analysis of coded video content, reached Committee Draft Technical Report (CDTR) status
Second edition of conformance and reference software for MPEG Immersive Video (MIV). This draft includes verified and validated conformance bitstreams and encoding and decoding reference software based on version 22 of the Test model for MPEG immersive video (TMIV). The test model, objective metrics, and some other tools are publicly available at https://gitlab.com/mpeg-i-visual.

Part 3 of MPEG-AI, ISO/IEC 23888-3: This new technical report on “optimization of encoders and receiving systems for machine analysis of coded video content” is based on software experiments conducted by JVET, focusing on optimizing non-normative elements such as preprocessing, encoder settings, and postprocessing. The research explored scenarios where video signals, decoded from bitstreams compliant with the latest video compression standard, ISO/IEC 23090-3 – Versatile Video Coding (VVC), are intended for input into machine vision systems rather than for human viewing. Compared to the JVET VVC reference software encoder, which was originally optimized for human consumption, significant bit rate reductions were achieved when machine vision task precision was used as the performance criterion.

The report will include an annex with example software implementations of these non-normative algorithmic elements, applicable to VVC or other video compression standards. Additionally, it will explore the potential use of existing supplemental enhancement information messages from ISO/IEC 23002-7 – Versatile supplemental enhancement information messages for coded video bitstreams – for embedding metadata useful in these contexts.

Research aspects: (1) Focus on optimizing video encoding for machine vision tasks by refining preprocessing, encoder settings, and postprocessing to improve bit rate efficiency and task precision, compared to traditional approaches for human viewing. (2) Examine the use of metadata, specifically SEI messages from ISO/IEC 23002-7, to enhance machine analysis of compressed video, improving adaptability, performance, and interoperability.

Subjective Quality Testing for Film Grain Synthesis

At the 148^th MPEG meeting , the MPEG Joint Video Experts Team (JVET) with ITU-T SG 16 (WG 5 / JVET) and MPEG Visual Quality Assessment (AG 5) conducted a formal expert viewing experiment to assess the impact of film grain synthesis on the subjective quality of video content. This evaluation specifically focused on film grain synthesis controlled by the Film Grain Characteristics (FGC) supplemental enhancement information (SEI) message. The study aimed to demonstrate the capability of film grain synthesis to mask compression artifacts introduced by the underlying video coding schemes.

For the evaluation, FGC SEI messages were adapted to a diverse set of video sequences, including scans of original film material, digital camera noise, and synthetic film grain artificially applied to digitally captured video. The subjective performance of video reconstructed from VVC and HEVC bitstreams was compared with and without film grain synthesis. The results highlighted the effectiveness of film grain synthesis, showing a significant improvement in subjective quality and enabling bitrate savings of up to a factor of 10 for certain test points.

This study opens several avenues for further research:

Optimization of film grain synthesis techniques: Investigating how different grain synthesis methods affect the perceptual quality of video across a broader range of content and compression levels.
Compression artifact mitigation: Exploring the interaction between film grain synthesis and specific types of compression artifacts, with a focus on improving masking efficiency.
Adaptation of FGC SEI messages: Developing advanced algorithms for tailoring FGC SEI messages to dynamically adapt to diverse video characteristics, including real-time encoding scenarios.
Bitrate savings analysis: Examining the trade-offs between bitrate savings and subjective quality across various coding standards and network conditions.

The 149th MPEG meeting will be held in Geneva, Switzerland from January 20-24, 2025. Click here for more information about MPEG meetings and their developments.

MPEG Column: 147th MPEG Meeting in Sapporo, Japan

By Christian Timmerer | September 8, 2024 - 11:12 |October 15, 2024 0324, Event Report, Feature

Leave a comment

The 147th MPEG meeting was held in Sapporo, Japan from 15-19 July 2024, and the official press release can be found here. It comprises the following highlights:

ISO Base Media File Format*: The 8th edition was promoted to Final Draft International Standard, supporting seamless media presentation for DASH and CMAF.
Syntactic Description Language: Finalized as an independent standard for MPEG-4 syntax.
Low-Overhead Image File Format*: First milestone achieved for small image handling improvements.
Neural Network Compression*: Second edition for conformance and reference software promoted.
Internet of Media Things (IoMT): Progress made on reference software for distributed media tasks.

* … covered in this column and expanded with possible research aspects.

8th edition of ISO Base Media File Format

The ever-growing expansion of the ISO/IEC 14496-12 ISO base media file format (ISOBMFF) application area has continuously brought new technologies to the standards. During the last couple of years, MPEG Systems (WG 3) has received new technologies on ISOBMFF for more seamless support of ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH) and ISO/IEC 23000-19 Common Media Application Format (CMAF) leading to the development of the 8th edition of ISO/IEC14496-12.

The new edition of the standard includes new technologies to explicitly indicate the set of tracks representing various versions of the media presentation of a single media for seamless switching and continuous presentation. Such technologies will enable more efficient processing of the ISOBMFF formatted files for DASH manifest or CMAF Fragments.

Research aspects: The central research aspect of the 8th edition of ISOBMFF, which “will enable more efficient processing,” will undoubtedly be its evaluation compared to the state-of-the-art. Standards typically define a format, but how to use it is left open to implementers. Therefore, the implementation is a crucial aspect and will allow for a comparison of performance. One such implementation of ISOBMFF is GPAC, which most likely will be among the first to implement these new features.

Low-Overhead Image File Format

ISO/IEC 23008-12 image format specification defines generic structures for storing image items and sequences based on ISO/IEC 14496-12 ISO base media file format (ISOBMFF). As it allows the use of various high-performance video compression standards for a single image or a series of images, it has been adopted by the market quickly. However, it was challenging to use it for very small-sized images such as icons or emojis. While the initial design of the standard was versatile and useful for a wide range of applications, the size of headers becomes an overhead for applications with tiny images. Thus, Amendment 3 of ISO/IEC 23008-12 low-overhead image file format aims to address this use case by adding a new compact box for storing metadata instead of the ‘Meta’ box to lower the size of the overhead.

Research aspects: The issue regarding header sizes of ISOBMFF for small files or low bitrate (in the case of video streaming) was known for some time. Therefore, amendments in these directions are appreciated while further performance evaluations are needed to confirm design choices made at this initial step of standardization.

Neural Network Compression

An increasing number of artificial intelligence applications based on artificial neural networks, such as edge-based multimedia content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). For this purpose, MPEG developed a second edition of the standard for coding of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2024), adding syntax for differential coding of neural network parameters as well as new coding tools. Trained models can be compressed to at least 10-20% for several architectures, even below 3%, of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network.

In order to facilitate the implementation of the standard, the accompanying standard ISO/IEC 15938-18 has been updated to cover the second edition of ISO/IEC 15938-17. This standard provides a reference software for encoding and decoding NNC bitstreams, as well as a set of conformance guidelines and reference bitstreams for testing of decoder implementations. The software covers the functionalities of both editions of the standard, and can be configured to test different combinations of coding tools specified by the standard.

Research aspects: The reference software for NNC, together with the reference software for audio/video codecs, are vital tools for building complex multimedia systems and its (baseline) evaluation with respect to compression efficiency only (not speed). This is because reference software is usually designed for functionality (i.e., compression in this case) and not performance.

The 148th MPEG meeting will be held in Kemer, Türkiye, from November 04-08, 2024. Click here for more information about MPEG meetings and their developments.

MPEG Column: 146th MPEG Meeting in Rennes, France

By Christian Timmerer | May 21, 2024 - 10:38 |July 17, 2024 0224, Event Report, Feature, Standards

Leave a comment

The 146th MPEG meeting was held in Rennes, France from 22-26 April 2024, and the official press release can be found here. It comprises the following highlights:

AI-based Point Cloud Coding*: Call for proposals focusing on AI-driven point cloud encoding for applications such as immersive experiences and autonomous driving.
Object Wave Compression*: Call for interest in object wave compression for enhancing computer holography transmission.
Open Font Format: Committee Draft of the fifth edition, overcoming previous limitations like the 64K glyph encoding constraint.
Scene Description: Ratified second edition, integrating immersive media objects and extending support for various data types.
MPEG Immersive Video (MIV): New features in the second edition, enhancing the compression of immersive video content.
Video Coding Standards: New editions of AVC, HEVC, and Video CICP, incorporating additional SEI messages and extended multiview profiles.
Machine-Optimized Video Compression*: Advancement in optimizing video encoders for machine analysis.
MPEG-I Immersive Audio*: Reached Committee Draft stage, supporting high-quality, real-time interactive audio rendering for VR/AR/MR.
Video-based Dynamic Mesh Coding (V-DMC)*: Committee Draft status for efficiently storing and transmitting dynamic 3D content.
LiDAR Coding*: Enhanced efficiency and responsiveness in LiDAR data processing with the new standard reaching Committee Draft status.

* … covered in this column.

AI-based Point Cloud Coding

MPEG issued a Call for Proposals (CfP) on AI-based point cloud coding technologies as a result from ongoing explorations regarding use cases, requirements, and the capabilities of AI-driven point cloud encoding, particularly for dynamic point clouds.

With recent significant progress in AI-based point cloud compression technologies, MPEG is keen on studying and adopting AI methodologies. MPEG is specifically looking for learning-based codecs capable of handling a broad spectrum of dynamic point clouds, which are crucial for applications ranging from immersive experiences to autonomous driving and navigation. As the field evolves rapidly, MPEG expects to receive multiple innovative proposals. These may include a unified codec, capable of addressing multiple types of point clouds, or specialized codecs tailored to meet specific requirements, contingent upon demonstrating clear advantages. MPEG has therefore publicly called for submissions of AI-based point cloud codecs, aimed at deepening the understanding of the various options available and their respective impacts. Submissions that meet the requirements outlined in the call will be invited to provide source code for further analysis, potentially laying the groundwork for a new standard in AI-based point cloud coding. MPEG welcomes all relevant contributions and looks forward to evaluating the responses.

Research aspects: In-depth analysis of algorithms, techniques, and methodologies, including a comparative study of various AI-driven point cloud compression techniques to identify the most effective approaches. Other aspects include creating or improving learning-based codecs that can handle dynamic point clouds as well as metrics for evaluating the performance of these codecs in terms of compression efficiency, reconstruction quality, computational complexity, and scalability. Finally, the assessment of how improved point cloud compression can enhance user experiences would be worthwhile to consider here also.

Object Wave Compression

A Call for Interest (CfI) in object wave compression has been issued by MPEG. Computer holography, a 3D display technology, utilizes a digital fringe pattern called a computer-generated hologram (CGH) to reconstruct 3D images from input 3D models. Holographic near-eye displays (HNEDs) reduce the need for extensive pixel counts due to their wearable design, positioning the display near the eye. This positions HNEDs as frontrunners for the early commercialization of computer holography, with significant research underway for product development. Innovative approaches facilitate the transmission of object wave data, crucial for CGH calculations, over networks. Object wave transmission offers several advantages, including independent treatment from playback device optics, lower computational complexity, and compatibility with video coding technology. These advancements open doors for diverse applications, ranging from entertainment experiences to real- time two-way spatial transmissions, revolutionizing fields such as remote surgery and virtual collaboration. As MPEG explores object wave compression for computer holography transmission, a Call for Interest seeks contributions to address market needs in this field.

Research aspects: Apart from compression efficiency, lower computation complexity, and compatibility with video coding technology, there is a range of research aspects, including the design, implementation, and evaluation of coding algorithms within the scope of this CfI. The QoE of computer-generated holograms (CGHs) together with holographic near-eye displays (HNEDs) is yet another dimension to be explored.

Machine-Optimized Video Compression

MPEG started working on a technical report regarding to the “Optimization of Encoders and Receiving Systems for Machine Analysis of Coded Video Content”. In recent years, the efficacy of machine learning-based algorithms in video content analysis has steadily improved. However, an encoder designed for human consumption does not always produce compressed video conducive to effective machine analysis. This challenge lies not in the compression standard but in optimizing the encoder or receiving system. The forthcoming technical report addresses this gap by showcasing technologies and methods that optimize encoders or receiving systems to enhance machine analysis performance.

Research aspects: Video (and audio) coding for machines has been recently addressed by MPEG Video and Audio working groups, respectively. MPEG Joint Video Experts Team with ITU-T SG16, also known as JVET, joined this space with a technical report, but research aspects remain unchanged, i.e., coding efficiency, metrics, and quality aspects for machine analysis of compressed/coded video content.

MPEG-I Immersive Audio

MPEG Audio Coding is entering the “immersive space” with MPEG-I immersive audio and its corresponding reference software. The MPEG-I immersive audio standard sets a new benchmark for compact and lifelike audio representation in virtual and physical spaces, catering to Virtual, Augmented, and Mixed Reality (VR/AR/MR) applications. By enabling high-quality, real-time interactive rendering of audio content with six degrees of freedom (6DoF), users can experience immersion, freely exploring 3D environments while enjoying dynamic audio. Designed in accordance with MPEG’s rigorous standards, MPEG-I immersive audio ensures efficient distribution across bandwidth-constrained networks without compromising on quality. Unlike proprietary frameworks, this standard prioritizes interoperability, stability, and versatility, supporting both streaming and downloadable content while seamlessly integrating with MPEG-H 3D audio compression. MPEG-I’s comprehensive modeling of real-world acoustic effects, including sound source properties and environmental characteristics, guarantees an authentic auditory experience. Moreover, its efficient rendering algorithms balance computational complexity with accuracy, empowering users to finely tune scene characteristics for desired outcomes.

Research aspects: Evaluating QoE of MPEG-I immersive audio-enabled environments as well as the efficient audio distribution across bandwidth-constrained networks without compromising on audio quality are two important research aspects to be addressed by the research community.

Video-based Dynamic Mesh Coding (V-DMC)

Video-based Dynamic Mesh Compression (V-DMC) represents a significant advancement in 3D content compression, catering to the ever-increasing complexity of dynamic meshes used across various applications, including real-time communications, storage, free-viewpoint video, augmented reality (AR), and virtual reality (VR). The standard addresses the challenges associated with dynamic meshes that exhibit time-varying connectivity and attribute maps, which were not sufficiently supported by previous standards. Video-based Dynamic Mesh Compression promises to revolutionize how dynamic 3D content is stored and transmitted, allowing more efficient and realistic interactions with 3D content globally.

Research aspects: V-DMC aims to allow “more efficient and realistic interactions with 3D content”, which are subject to research, i.e., compression efficiency vs. QoE in constrained networked environments.

Low Latency, Low Complexity LiDAR Coding

Low Latency, Low Complexity LiDAR Coding underscores MPEG’s commitment to advancing coding technologies required by modern LiDAR applications across diverse sectors. The new standard addresses critical needs in the processing and compression of LiDAR-acquired point clouds, which are integral to applications ranging from automated driving to smart city management. It provides an optimized solution for scenarios requiring high efficiency in both compression and real-time delivery, responding to the increasingly complex demands of LiDAR data handling. LiDAR technology has become essential for various applications that require detailed environmental scanning, from autonomous vehicles navigating roads to robots mapping indoor spaces. The Low Latency, Low Complexity LiDAR Coding standard will facilitate a new level of efficiency and responsiveness in LiDAR data processing, which is critical for the real-time decision-making capabilities needed in these applications. This standard builds on comprehensive analysis and industry feedback to address specific challenges such as noise reduction, temporal data redundancy, and the need for region-based quality of compression. The standard also emphasizes the importance of low latency coding to support real-time applications, essential for operational safety and efficiency in dynamic environments.

Research aspects: This standard effectively tackles the challenge of balancing high compression efficiency with real-time capabilities, addressing these often conflicting goals. Researchers may carefully consider these aspects and make meaningful contributions.

The 147th MPEG meeting will be held in Sapporo, Japan, from July 15-19, 2024. Click here for more information about MPEG meetings and their developments.

Energy-Efficient Video Streaming: Open-Source Tools, Datasets, and Solutions

By Christian Timmerer | April 22, 2024 - 12:08 |June 26, 2024 0224, Feature, QoE Column

Leave a comment

Abstract: Energy efficiency has become a crucial aspect of today’s IT infrastructures, and video (streaming) accounts for over half of today’s Internet traffic. This column highlights open-source tools, datasets, and solutions addressing energy efficiency in video streaming presented at ACM Multimedia Systems 2024 and its co-located workshop ACM Green Multimedia Systems.

Introduction

Across various platforms, users seek the highest Quality of Experience (QoE) in video communication and streaming. Whether it’s a crucial business meeting or a relaxing evening of entertainment, individuals desire seamless and high-quality video experiences. However, meeting this demand for high-quality video comes with a cost: increased energy usage [1],[2]. This energy consumption occurs at every stage of the process, including content provision via cloud services and consumption on end users’ devices [3]. Unfortunately, this heightened energy consumption inevitably leads to higher CO2 emissions (except for renewable energy sources), posing environmental challenges. It emphasizes the need for studies to assess the carbon footprint of video streaming.

Content provision is a critical stage in video streaming, involving encoding videos into various formats, resolutions, and bitrates. Encoding demands computing power and energy, especially in cloud-based systems. Cloud computing has become famous for video encoding due to its scalability [4] to adjust cloud resources to handle changing workloads and flexibility [5] to scale their operations based on demand. However, this convenience comes at a cost. Data centers, the heart of cloud computing, consume a significant portion of global electricity, around 3% [6]. Video encoding is one of the biggest energy consumers within these data centers. Therefore, optimizing video encoding for lower energy consumption is crucial for reducing the environmental impact of cloud-based video delivery.

Content consumption [7] involves the device using the network interface card to request and download video segments from the server, decompressing them for playback, and finally rendering the decoded frames on the screen, where the energy consumption depends on the screen technology and brightness settings.

The GAIA project showcased its research on the environmental impact of video streaming at the recent 15th ACM Multimedia Systems Conference (April 15-18, Bari, Italy). We presented our findings at relevant conference sessions: Open-Source Software and Dataset and the Green Multimedia Systems (GMSys) workshop.

Open Source Software

GREEM: An Open-Source Benchmark Tool Measuring the Environmental Footprint of Video Streaming [PDF] [Github] [Poster]

GREEM (Gaia Resource Energy and Emission Monitoring) aims to measure energy usage during video encoding and decoding processes. GREEM tracks the effects of video processing on hardware performance and provides a suite of analytical scenarios. This tool offers easy-to-use scenarios covering the most common video streaming situations, such as measuring sequential and parallel video encoding and decoding.

Contributions:

Accessible: GREEM is available in a GitHub repository (https://github.com/cd-athena/GREEM) for energy measurement of video processing.
Automates experimentation: It allows users to easily configure and run various encoding scenarios with different parameters to compare results.
In-depth monitoring: The tool traces numerous hardware parameters, specifically monitoring energy consumption and GPU metrics, including core and memory utilization, temperature, and fan speed, providing a complete picture of video processing resource usage.
Visualization: GREEM offers scripts that generate analytic plots, allowing users to visualize and understand their measurement results easily.

Verifiable: GREEM empowers researchers with a tool that has earned the ACM Reproducibility Badge, which allows others to reproduce the experiments and results reported in the paper.

Open Source Datasets

VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances [PDF] [Github] [Poster]

As video encoding increasingly shifts to cloud-based services, concerns about the environmental impact of massive data centers arise. The Video Encoding Energy and CO2 Emissions Dataset (VEED) provides the energy consumption and CO2 emissions associated with video encoding on Amazon’s Elastic Compute Cloud (EC2) instances. Additionally, VEED goes beyond energy consumption as it also captures encoding duration and CPU utilization.

Contributions:

Findability: A comprehensive metadata description file ensures VEED’s discoverability for researchers.
Accessibility: VEED is open for download on GitHub (https://github.com/cd-athena/VEEDdataset), removing access barriers for researchers. Core findings in the research that leverages the VEED dataset have been independently verified (ACM Reproducibility Badge).
Interoperability: The dataset is provided in a comma-separated value (CSV) format, allowing integration with various analysis applications.
Reusability: Description files empower researchers to understand the data structure and context, facilitating its use in diverse analytical projects.

COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming [PDF] [Github]

COCONUT is a dataset comprising the energy consumption of video streaming across various devices and different HAS (HTTP Adaptive Streaming) players. COCONUT captures user data during MPEG-DASH video segment streaming on laptops, smartphones, and other client devices, measuring energy consumption at different stages of streaming, including segment retrieval through the network interface card, video decoding, and rendering on the device.This paper has been designated the ACM Artifacts Available badge, signifying that the COCONUT dataset is publicly accessible. COCONUT can be accessed at https://athena.itec.aau.at/coconut/.

Second International ACM Green Multimedia Systems Workshop — GMSys 2024

VEEP: Video Encoding Energy and CO2 Emission Prediction [pdf] [slides]

In VEEP, a machine learning (ML) scheme that empowers users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding.

Contributions:

Content-aware energy prediction: VEEP analyzes video content to extract features impacting encoding complexity. This understanding feeds an ML model that accurately predicts the energy consumption required for encoding the video on AWS EC2 instances. (High Accuracy: Achieves an R² score of 0.96)
Real-time carbon footprint: VEEP goes beyond energy. It also factors in real-time carbon intensity data based on the location of the cloud instance. This allows VEEP to calculate the associated CO2 emissions for your encoding tasks at encoding time.
Resulting impact: By carefully selecting the type and location of cloud instances based on VEEP’s predictions, CO2 emissions can be reduced by up to 375 times. This significant reduction signifies VEEP’s potential to contribute to greener video encoding.

Conclusions

This column provided an overview of the GAIA project’s research on the environmental impact of video streaming, presented at the 15th ACM Multimedia Systems Conference. GREEM measurement tool empowers developers and researchers to measure the energy and CO2 emissions of video processing. VEED provides valuable insights into energy consumption and CO2 emissions during cloud-based video encoding on AWS EC2 instances. COCONUT sheds light on energy usage during video playback on various devices and with different players, aiding in optimizing client-side video streaming. Furthermore, VEEP, a machine learning framework, takes energy efficiency a step further. It allows users to predict energy consumption and CO2 emissions associated with cloud-based video encoding, allowing users to select cloud instances that minimize environmental impact. These studies can help researchers, developers, and service providers to optimize video streaming for a more sustainable future. The focus on encoding and playback highlights the importance of a holistic approach considering the entire video streaming lifecycle. While these papers primarily focus on the environmental impact of video streaming, a strong connection exists between energy efficiency and QoE [8],[9],[10]. Optimizing video processing for lower energy consumption can sometimes lead to trade-offs regarding video quality. Future research directions could explore techniques for optimizing video processing while ensuring a consistently high QoE for viewers.

References

[1] A. Katsenou, J. Mao, and I. Mavromatis, “Energy-Rate-Quality Tradeoffs of State-of-the-Art Video Codecs.” arXiv, Oct. 02, 2022. Accessed: Oct. 06, 2022. [Online]. Available: http://arxiv.org/abs/2210.00618

[2] H. Amirpour, V. V. Menon, S. Afzal, R. Prodan, and C. Timmerer, “Optimizing video streaming for sustainability and quality: The role of preset selection in per-title encoding,” in 2023 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2023, pp. 1679–1684. Accessed: May 05, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10219577/

[3] S. Afzal, R. Prodan, C. Timmerer, “Green Video Streaming: Challenges and Opportunities – ACM SIGMM Records.” Accessed: May 05, 2024. [Online]. Available: https://records.sigmm.org/2023/01/08/green-video-streaming-challenges-and-opportunities/

[4] A. Atadoga, U. J. Umoga, O. A. Lottu, and E. O. Sodiya, “Evaluating the impact of cloud computing on accounting firms: A review of efficiency, scalability, and data security,” Glob. J. Eng. Technol. Adv., vol. 18, no. 2, pp. 065–075, Feb. 2024, doi: 10.30574/gjeta.2024.18.2.0027.

[5] B. Zeng, Y. Zhou, X. Xu, and D. Cai, “Bi-level planning approach for incorporating the demand-side flexibility of cloud data centers under electricity-carbon markets,” Appl. Energy, vol. 357, p. 122406, Mar. 2024, doi: 10.1016/j.apenergy.2023.122406.

[6] M. Law, “Energy efficiency predictions for data centers in 2023.” Accessed: May 03, 2024. [Online]. Available: https://datacentremagazine.com/articles/efficiency-to-loom-large-for-data-centre-industry-in-2023

[7] C. Yue, S. Sen, B. Wang, Y. Qin, and F. Qian, “Energy considerations for ABR video streaming to smartphones: Measurements, models and insights,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 153–165, doi: 10.1145/3339825.3391867.

[8] G. Bingöl, A. Floris, S. Porcu, C. Timmerer, and L. Atzori, “Are Quality and Sustainability Reconcilable? A Subjective Study on Video QoE, Luminance and Resolution,” in 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2023, pp. 19–24. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10178513/

[9] G. Bingöl, S. Porcu, A. Floris, and L. Atzori, “An Analysis of the Trade-Off Between Sustainability and Quality of Experience for Video Streaming,” in 2023 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2023, pp. 1600–1605. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10283614/

[10] C. Herglotz, W. Robitza, A. Raake, T. Hossfeld, and A. Kaup, “Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming.” arXiv, May 24, 2023. doi: 10.48550/arXiv.2305.15117.

MPEG Column: 145th MPEG Meeting (Virtual/Online)

By Christian Timmerer | February 22, 2024 - 12:26 |February 22, 2024 0124, Event Report, Feature, Standards

Leave a comment

The 145th MPEG meeting was held online from 22-26 January 2024, and the official press release can be found here. It comprises the following highlights:

Latest Edition of the High Efficiency Image Format Standard Unveils Cutting-Edge Features for Enhanced Image Decoding and Annotation
MPEG Systems finalizes Standards supporting Interoperability Testing
MPEG finalizes the Third Edition of MPEG-D Dynamic Range Control
MPEG finalizes the Second Edition of MPEG-4 Audio Conformance
MPEG Genomic Coding extended to support Transport and File Format for Genomic Annotations
MPEG White Paper: Neural Network Coding (NNC) – Efficient Storage and Inference of Neural Networks for Multimedia Applications

This column will focus on the High Efficiency Image Format (HEIF) and interoperability testing. As usual, a brief update on MPEG-DASH et al. will be provided.

High Efficiency Image Format (HEIF)

The High Efficiency Image Format (HEIF) is a widely adopted standard in the imaging industry that continues to grow in popularity. At the 145th MPEG meeting, MPEG Systems (WG 3) ratified its third edition, which introduces exciting new features, such as progressive decoding capabilities that enhance image quality through a sequential, single-decoder instance process. With this enhancement, users can decode bitstreams in successive steps, with each phase delivering perceptible improvements in image quality compared to the preceding step. Additionally, the new edition introduces a sophisticated data structure that describes the spatial configuration of the camera and outlines the unique characteristics responsible for generating the image content. The update also includes innovative tools for annotating specific areas in diverse shapes, adding a layer of creativity and customization to image content manipulation. These annotation features cater to the diverse needs of users across various industries.

Research aspects: Progressive coding has been a part of modern image coding formats for some time now. However, the inclusion of supplementary metadata provides an opportunity to explore new use cases that can benefit both user experience (UX) and quality of experience (QoE) in academic settings.

Interoperability Testing

MPEG standards typically comprise format definitions (or specifications) to enable interoperability among products and services from different vendors. Interestingly, MPEG goes beyond these format specifications and provides reference software and conformance bitstreams, allowing conformance testing.

At the 145^th MPEG meeting, MPEG Systems (WG 3) finalized two standards comprising conformance and reference software by promoting it to the Final Draft International Standard (FDIS), the final stage of standards development. The finalized standards, ISO/IEC 23090-24 and ISO/IEC 23090-25, showcase the pinnacle of conformance and reference software for scene description and visual volumetric video-based coding data, respectively.

ISO/IEC 23090-24 focuses on conformance and reference software for scene description, providing a comprehensive reference implementation and bitstream tailored for conformance testing related to ISO/IEC 23090-14, scene description. This standard opens new avenues for advancements in scene depiction technologies, setting a new standard for conformance and software reference in this domain.

Similarly, ISO/IEC 23090-25 targets conformance and reference software for the carriage of visual volumetric video-based coding data. With a dedicated reference implementation and bitstream, this standard is poised to elevate the conformance testing standards for ISO/IEC 23090-10, the carriage of visual volumetric video-based coding data. The introduction of this standard is expected to have a transformative impact on the visualization of volumetric video data.

At the same 145^th MPEG meeting, MPEG Audio Coding (WG6) celebrated the completion of the second edition of ISO/IEC 14496-26, audio conformance, elevating it to the Final Draft International Standard (FDIS) stage. This significant update incorporates seven corrigenda and five amendments into the initial edition, originally published in 2010.

ISO/IEC 14496-26 serves as a pivotal standard, providing a framework for designing tests to ensure the compliance of compressed data and decoders with the requirements outlined in ISO/IEC 14496-3 (MPEG-4 Audio). The second edition reflects an evolution of the original, addressing key updates and enhancements through diligent amendments and corrigenda. This latest edition, now at the FDIS stage, marks a notable stride in MPEG Audio Coding’s commitment to refining audio conformance standards and ensuring the seamless integration of compressed data within the MPEG-4 Audio framework.

These standards will be made freely accessible for download on the official ISO website, ensuring widespread availability for industry professionals, researchers, and enthusiasts alike.

Research aspects: Reference software and conformance bitstreams often serve as the basis for further research (and development) activities and, thus, are highly appreciated. For example, reference software of video coding formats (e.g., HM for HEVC, VM for VVC) can be used as a baseline when improving coding efficiency or other aspects of the coding format.

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below.

The following most notable aspects have been discussed at the 145th MPEG meeting and adopted into ISO/IEC 23009-1, which will eventually become the 6th edition of the MPEG-DASH standard:

It is now possible to pass CMCD parameters sid and cid via the MPD URL.
Segment duration patterns can be signaled using SegmentTimeline.
Definition of a background mode of operation, which allows a DASH player to receive MPD updates and listen to events without possibly decrypting or rendering any media.

Additionally, the technologies under consideration (TuC) document has been updated with means to signal maximum segment rate, extend copyright license signaling, and improve haptics signaling in DASH. Finally, REAP is progressing towards FDIS but not yet there and most details will be discussed in the upcoming AhG period.

The 146th MPEG meeting will be held in Rennes, France, from April 22-26, 2024. Click here for more information about MPEG meetings and their developments.