Author: Christian Timmerer, firstname.lastname@example.org
Affiliation: Alpen-Adria-Universität (AAU) Klagenfurt, Austria & Bitmovin Inc., San Francisco, CA, USA
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 139th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:
- MPEG Issues Call for Evidence for Video Coding for Machines (VCM)
- MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption
- MPEG Completes the Third Edition of the Common Media Application Format (CMAF) by adding Support for 8K and High Frame Rate for High Efficiency Video Coding
- MPEG Scene Descriptions adds Support for Immersive Media Codecs
- MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering
- MPEG Starts New Edition of Video Coding-Independent Code Points Standard
- MPEG White Paper on the Third Edition of the Common Media Application Format
In this report, I’d like to focus on VCM, Green Metadata, CMAF, VSEI, and a brief update about DASH (as usual).
Video Coding for Machines (VCM)
MPEG’s exploration work on Video Coding for Machines (VCM) aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. With the rise of newer network architectures being deployed amongst a heterogenous population of edge devices, such architectures bring flexibility to systems implementers. Due to such architectures, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage could differ from conventional human-viewing-oriented applications to achieve optimized performance. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has rapidly grown. Typical use cases include intelligent transportation, smart city technology, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing the feature from a video is essential for efficient transmission and storage. Feature compression technology solicited in this Call for Evidence (CfE) can also be helpful in other regards, such as computational offloading and privacy protection.
Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines. The evaluation framework of VCM depicted below comprises neural network tasks (typically informative) at both ends as well as VCM encoder and VCM decoder, respectively. The normative part of VCM typically includes the bitstream syntax which implicitly defines the decoder whereas other parts are usually left open for industry competition and research.
Further details about the CfP and how interested parties can respond can be found in the official press release here.
Research aspects: the main research area for coding-related standards is certainly compression efficiency (and probably runtime). However, this video coding standard will not target humans as video consumers but as machines. Thus, video quality and, in particular, Quality of Experience needs to be interpreted differently, which could be another worthwhile research dimension to be studied in the future.
MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption according to the complexity of the bitstream. Many modern implementations of video decoders can adjust their operating voltage or clock speed to adjust the power consumption level according to the required computational power. Thus, if the decoder implementation knows the variation in the complexity of the incoming bitstream, then the decoder can adjust its power consumption level to the complexity of the bitstream. This will allow less energy use in general and extended video playback for the battery-powered devices.
The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.
Research aspects: reducing global greenhouse gas emissions will certainly be a challenge for humanity in the upcoming years. The amount of data on today’s internet is dominated by video, which all consumes energy from production to consumption. Therefore, there is a strong need for explicit research efforts to make video streaming in all facets friendly to our environment.
Third Edition of Common Media Application Format (CMAF)
The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, the media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier with 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile will be branded as ‘c8k0’ and will support videos with up to 7680×4320 pixels (8K) and up to 60 frames per second. Regarding the latter, another media profile has been added to the list of CMAF media profiles, branded as ‘c8k1’ and supports HEVC encoded video with up to 8K resolution and up to 120 frames per second. Finally, chroma location indication support has been added to the 3rd edition of CMAF.
Research aspects: basically, CMAF serves two purposes: (i) harmonizing DASH and HLS at the segment format level by adopting the ISOBMFF and (ii) enabling low latency streaming applications by introducing chunks (that are smaller than segments). The third edition supports resolutions up to 8K and HFR, which raises the question of how low latency can be achieved for 8K/HFR applications and services and under which conditions.
New Amendment for Versatile Supplemental Enhancement Information (VSEI) containing Technology for Neural Network-based Post Filtering
At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the Supplemental Enhancement Information (SEI) message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication which is relevant for variable-resolution video streaming, this new amendment contains two SEI messages for describing and activating post filters using neural network technology in video bitstreams. This could reduce coding noise, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).
Research aspects: quality enhancements such as reducing coding noise, upsampling, colour improvement, or denoising have been researched quite substantially either with or without neural networks. Enabling such quality enhancements via (V)SEI messages enable system-level support for research and development efforts in this area. For example, integration in video streaming applications or/and conversational services, including performance evaluations.
The latest MPEG-DASH Update
Finally, I’d like to provide a brief update on MPEG-DASH! At the 139th MPEG meeting, MPEG Systems issued a new working draft related to Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) have been updated. Finally, a new part has been added (ISO/IEC 23009-9), which is called encoder and packager synchronization, for which also a working draft has been produced. Publicly available documents (if any) can be found here.
An updated overview of DASH standards/features can be found in the Figure below.
Research aspects: in the Christian Doppler Laboratory ATHENA we aim to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the phases (i) multimedia content provisioning (i.e., video coding), (ii) content delivery (i.e., video networking), and (iii) content consumption (i.e., video player incl. ABR and QoE) in the media delivery chain as well as for (iv) end-to-end aspects, with a focus on, but not being limited to, HTTP Adaptive Streaming (HAS). Recent DASH-related publications include “Low Latency Live Streaming Implementation in DASH and HLS” and “Segment Prefetching at the Edge for Adaptive Video Streaming” among others.
The 140th MPEG meeting will be face-to-face in Mainz, Germany, from October 24-28, 2022. Click here for more information about MPEG meetings and their developments.