About Christian Timmerer

Christian Timmerer is a researcher, entrepreneur, and teacher on immersive multimedia communication, streaming, adaptation, and Quality of Experience. He is an Assistant Professor at Alpen-Adria-Universität Klagenfurt, Austria. Follow him on Twitter at http://twitter.com/timse7 and subscribe to his blog at http://blog.timmerer.com.

Immersive Media Experiences – Why finding Consensus is Important

An introduction to the QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) [1].

Introduction

Immersive media are reshaping the way users experience reality. They are increasingly incorporated across enterprise and consumer sectors to offer experiential solutions to a diverse range of industries. Current technologies that afford an immersive media experience (IMEx) include Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and 360-degree video. Popular uses can be found in enhancing connectivity applications, supporting knowledge-based tasks, learning & skill development, as well as adding immersive and interactive dimensions to the retail, business, and entertainment industries. Whereas the evolution of immersive media can be traced over the past 50 years, its current popularity boost is primarily owed to significant advances in the last decade brought about by improved connectivity and superior computing and device capabilities. Specifically, advances have been witnessed in display technologies, visualizations, interaction & tracking devices, recognition technologies, platform development, new media formats, and increasing user demand for real-time & dynamic content across platforms.

Though still in its infancy, the immersive economy is growing into a dynamic and confident sector. Being an emerging sector, it is hard to find official data, but some estimations project the immersive media global market size to continue its upward growth at around 30% CAGR to reach USD 180 Bn by 2022 [2,3]. Country-wise, the USA is expected to secure one third of the global immersive media market share, followed by China, Japan, Germany, and the UK as likely immersive media markets where significant spending is anticipated. Consumer products and devices are poised to be the largest contributing segment. The growth in immersive consumer products is expected to continue as Head-Mounted Displays (HMD) become commonplace and interest in mobile augmented reality increases [4]. However, immersive media are no longer just a pursuit of alternative display technologies but are pushing towards holistic ecosystems that seek contributions from hardware manufacturers, application & platform developers, content producers, and users. These ecosystems are making way for sophisticated content creation available on platforms that allow user participation, interaction, and skill integration through advanced tools.

Immersive media experience (IMEx), today, is not only about how users view media but is in fact a transformative way to consume media altogether. It draws considerable interest from multiple disciplines. As stakeholders increase, the need for clarity and coherence on definitions and concepts becomes all the more important. In this article, we provide an overview and a brief survey of some of the key definitions that are central to IMEx, including its Quality of Experience (QoE), application areas, influencing factors, and assessment methods. Our aim is to enable some clarity and initiate consensus on topics related to IMEx that can be useful for researchers and practitioners working both in academia and industry.

Why understand IMEx?

IMEx combines reality with technology, enabling emplaced multimedia experiences of standard media (film, photographic, or animated) as well as synthetic and interactive environments for users. These media utilize visual, auditory, and haptic feedback to stimulate the physical senses such that users psychologically feel immersed within these multidimensional media environments. This sense of “being there” is also referred to as presence.

As mentioned earlier, the enthusiasm for IMEx is mainly driven by the gaming, entertainment, retail, healthcare, digital marketing, and skill training industries. So far, research has tilted favourably towards innovation, with a particular interest in image capture, recognition, mapping, and display technologies over the past few years. However, the prevalence of IMEx has also ushered in a plethora of definitions, frameworks, and models to understand the psychological and phenomenological concepts associated with these media forms. Central, of course, are the closely related concepts of immersion and presence, which are interpreted differently across fields; for example, when one moves from literature to narratology to computer science. With immersive media, these three separate fields come together inside interactive digital narrative applications where immersive narratives are used to solve real-world problems. This is when noticeable interdisciplinary differences regarding definitions, scope, and constituents require urgent redress to achieve a coherent understanding of the concepts used. Such consensus is vital for giving directionality to the future of immersive media that can be shared by all.

A White Paper on IMEx

A recent White Paper [1] by QUALINET, the European Network on Quality of Experience in Multimedia Systems and Services [5], is a contribution to the discussions related to Immersive Media Experience (IMEx). It attempts to build consensus around ideas and concepts that are related to IMEx but originate from multidisciplinary groups with a joint interest in multimedia experiences.

The QUALINET community aims at extending the notion of network-centric Quality of Service (QoS) in multimedia systems, by relying on the concept of Quality of Experience (QoE). The main scientific objective is the development of methodologies for subjective and objective quality metrics considering current and new trends in multimedia communication systems as witnessed by the appearance of new types of content and interactions.

The white paper was created based on an activity launched at the 13th QUALINET meeting on June 4, 2019, in Berlin as part of Task Force 7, Immersive Media Experiences (IMEx). The paper received contributions from 44 authors under 10 section leads, which were consolidated into a first draft and released among all section leads and editors for internal review. After incorporating the feedback from all section leads, the editors initially released the White Paper within the QUALINET community for review. Following feedback from QUALINET at large, the editors distributed the White Paper widely for an open, public community review (e.g., research communities/committees in ACM and IEEE, standards development organizations, various open email reflectors related to this topic). The feedback received from this public consultation process resulted in the final version which has been approved during the 14th QUALINET meeting on May 25, 2020.

Understanding the White Paper

The White Paper surveys definitions and concepts that contribute to IMEx. It describes the Quality of Experience (QoE) for immersive media by establishing a relationship between the concepts of QoE and IMEx. This article provides an outline of these concepts by looking at:

  • Survey of definitions of immersion and presence discusses various frameworks and conceptual models that are most relevant to these phenomena in terms of multimedia experiences.
  • Definition of immersive media experience describes experiential determinants for IMEx characterized through its various technological contexts.
  • Quality of experience for immersive media applies existing QoE concepts to understand the user-centric subjective feelings of “a sense of being there”, “a sense of agency”, and “cybersickness”.
  • Application area for immersive media experience presents an overview of immersive technologies in use within gaming, omnidirectional content, interactive storytelling, health, entertainment, and communications.
  • Influencing factors on immersive media experience looks at the three established categories of QoE influence factors (human, system, and context), with a pronounced emphasis on the human influence factor given its very high relevance to IMEx.
  • Assessment of immersive media experience underscores the importance of proper examination of multimedia systems, including IMEx, by highlighting three methods currently in use, i.e., subjective, behavioral, and psychophysiological.
  • Standardization activities discuss the three clusters of activities currently underway to achieve interoperability for IMEx: (i) data representation & formats; (ii) guidelines, systems standards, & APIs; and (iii) Quality of Experience (QoE).

Conclusions

Immersive media have significantly changed the use and experience of new digital media. These innovative technologies transcend traditional formats and present new ways to interact with digital information inside synthetic or enhanced realities, which include VR, AR, MR, and haptic communications. Earlier, we discussed the need for a multidisciplinary consensus on definitions of IMEx. The QUALINET white paper provides such “a toolbox of definitions” for IMEx. It stands out for bringing together insights from multimedia groups spread across academia and industry, specifically the Video Quality Experts Group (VQEG) and the Immersive Media Group (IMG). This makes it a valuable asset for those working in the field of IMEx going forward.

References

[1] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[2] Mateos-Garcia, J., Stathoulopoulos, K., & Thomas, N. (2018). The immersive economy in the UK (Rep. No. 18.1137.020). Innovate UK.
[3] Infocomm Media 2025 Supplementary Information (pp. 31-43, Rep.). (2015). Singapore: Ministry of Communications and Information.
[4] Hadwick, A. (2020). XR Industry Insight Report 2019-2020 (Rep.). San Francisco: VRX Conference & Expo.
[5] http://www.qualinet.eu/

MPEG Column: 132nd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.

In detail, the new structure is as follows (incl. Convenors and position within the former WG 11 structure):

  • AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
  • WG 2 MPEG Technical Requirements (Convenor: Dr. Igor Curcio; former Requirements subgroup)
  • WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
  • WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
  • WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
  • WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
  • WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda, former 3DG subgroup)
  • WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
  • AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
  • AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup).

The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.

The official press release can be found here and comprises the following items:

  • Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
  • MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
  • MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
  • MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
  • MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
  • MPEG Completes Standard on Harmonization of DASH and CMAF
  • MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
  • MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard

In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.

Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone

MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):

  • Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
  • Using VVenC, additional bit rate savings of more than 10% relative to VTM were observed; at the same time, VVenC runs significantly faster than the reference software implementation.
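As a back-of-the-envelope illustration (not part of the verification test itself), successive relative bit rate savings multiply rather than add. Under that simplifying assumption, combining the ~45% (VTM vs. HEVC) and ~10% (VVenC vs. VTM) figures sketches an overall saving of roughly 50%:

```python
def combined_savings(*savings):
    """Combine successive relative bit rate savings (given as fractions)."""
    remaining = 1.0
    for s in savings:
        remaining *= (1.0 - s)  # each step keeps (1 - s) of the previous rate
    return 1.0 - remaining

vtm_vs_hevc = 0.45   # VTM saves ~45% over HEVC (reported figure)
venc_vs_vtm = 0.10   # VVenC saves >10% over VTM (reported figure)

total = combined_savings(vtm_vs_hevc, venc_vs_vtm)
print(f"VVenC vs. HEVC: ~{total:.1%} bit rate savings")  # ~50.5%
```

Actual BD-rate results, of course, depend on content, configuration, and the quality metric used.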

Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open-source implementation will be available soon. Together with AOMedia’s AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.

MPEG Completes Geometry-based Point Cloud Compression Standard

MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.
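To get a feel for what these ratios mean per point, the sketch below assumes a hypothetical raw layout of three 32-bit float coordinates plus 8-bit RGB per point (120 bits); real capture formats vary, so the numbers are illustrative only:

```python
# Assumed raw layout: 3 x 32-bit float coordinates + 3 x 8-bit color.
RAW_BITS_PER_POINT = 3 * 32 + 3 * 8  # 120 bits per uncompressed point

def bits_per_point(compression_ratio):
    """Compressed size per point for a given overall compression ratio."""
    return RAW_BITS_PER_POINT / compression_ratio

lossless = bits_per_point(10)  # up to 10:1 lossless -> 12 bits/point
lossy = bits_per_point(35)     # up to 35:1 lossy    -> ~3.4 bits/point
print(f"lossless: {lossless:.1f} bits/point, lossy: {lossy:.2f} bits/point")
```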

By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.

Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency but one should not dismiss its delivery aspects including its dynamic, adaptive streaming. A recent paper on this topic has been published in the IEEE Communications Magazine and is entitled “From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom“.

MPEG Finalizes the Harmonization of DASH and CMAF

MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with Common Media Application Format (CMAF) featuring a DASH profile for use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019 4th edition).

CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use Media Presentation Description (MPD) as a manifest format.
Additional tools added to this amendment include

  • timing and processing models for DASH events and timed metadata tracks, including in-band event streams,
  • a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
  • an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
  • content protection enhancements for efficient signalling.
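The MPD patch idea can be illustrated with a toy example: instead of re-fetching the full manifest, the client applies a small patch document containing RFC 5261-style operations. The MPD snippet, the patch, and the attribute-only selector below are hypothetical simplifications; real MPDs are namespaced and the normative patch format is defined in the DASH specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free mini-MPD for illustration.
mpd = ET.fromstring(
    '<MPD publishTime="2020-10-12T00:00:00Z">'
    '<Period id="1"><SegmentTemplate media="seg-$Number$.m4s"/></Period>'
    '</MPD>'
)

# Hypothetical patch with a single RFC 5261-style <replace> operation.
patch = ET.fromstring(
    '<Patch>'
    '<replace sel="MPD/@publishTime">2020-10-12T00:00:10Z</replace>'
    '</Patch>'
)

def apply_patch(root, patch_doc):
    """Apply attribute <replace> operations only (a tiny subset of RFC 5261)."""
    for op in patch_doc.findall('replace'):
        path, attr = op.get('sel').rsplit('/@', 1)
        # In this toy selector, the root tag selects the document element itself.
        target = root if path == root.tag else root.find(path)
        target.set(attr, op.text.strip())

apply_patch(mpd, patch)
print(mpd.get('publishTime'))  # 2020-10-12T00:00:10Z
```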

It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.

Research aspects: one of the features enabled by CMAF is low latency streaming that is actively researched within the multimedia systems community (e.g., here). The main research focus has been related to the ABR logic while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).
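The client-side ABR logic mentioned above can be as simple as the following throughput-based sketch; the bitrate ladder and the safety margin are illustrative values, and production players combine many more signals (buffer level, viewport, latency targets):

```python
# Hypothetical bitrate ladder in kbit/s.
LADDER_KBPS = [235, 750, 1750, 4300, 8000]

def select_bitrate(throughput_kbps, margin=0.8):
    """Pick the highest rendition fitting within a safety margin of the
    measured throughput; fall back to the lowest rendition otherwise."""
    budget = throughput_kbps * margin
    candidates = [b for b in LADDER_KBPS if b <= budget]
    return max(candidates) if candidates else LADDER_KBPS[0]

print(select_bitrate(6000))  # 4300 (4800 kbit/s budget)
print(select_bitrate(100))   # 235  (nothing fits; lowest rendition)
```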

MPEG Completes 2nd Edition of the Omnidirectional Media Format

MPEG completed the standardization of the 2nd edition of the Omnidirectional MediA Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status, including the following features:

  • “Late binding” technologies to deliver and present only that part of the content that adapts to the dynamically changing users’ viewpoint. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream is dynamically generated that, by combining the received portions of the bitstream, covers only the users’ viewport on the client.
  • Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
  • Enhanced overlay support, i.e., the use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.
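The “late binding” idea can be sketched as the client fetching only the tiles that overlap the current viewport. The 8-column equirectangular tiling and yaw-only viewport below are simplifying assumptions; real deployments tile in two dimensions and handle pitch as well:

```python
NUM_TILES = 8                    # tiles across 360 degrees of yaw (assumed)
TILE_WIDTH = 360 / NUM_TILES     # 45 degrees per tile

def tiles_for_viewport(center_yaw, fov=90):
    """Indices of tiles overlapping [center - fov/2, center + fov/2],
    wrapping around at 360 degrees."""
    left = (center_yaw - fov / 2) % 360
    needed = set()
    deg = 0
    while deg < fov:
        needed.add(int(((left + deg) % 360) // TILE_WIDTH))
        deg += TILE_WIDTH / 2    # sample the viewport at half-tile steps
    needed.add(int(((left + fov) % 360) // TILE_WIDTH))
    return sorted(needed)

print(tiles_for_viewport(0))  # viewport straddling yaw 0 wraps to tile 7
```

Only these tiles would then be requested and merged into a conformant bitstream covering the viewport.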

Research aspects: standards usually define formats to enable interoperability but various informative aspects are left open for industry competition and subject to research and development. The same holds for OMAF and its 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.

MPEG Completes the Low Complexity Enhancement Video Coding Standard

MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.

  • LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video with effective compression efficiency and limited complexity by building on top of existing and future video codecs.
  • LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
  • LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.
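The layering principle can be sketched schematically: decode a low-resolution base layer with the existing codec, upsample it, and correct it with the enhancement residual. The 1-D signal and nearest-neighbour upsampler below are toy assumptions; the actual LCEVC specification defines 2-D upsamplers and transforms.

```python
def upsample_2x(samples):
    """Nearest-neighbour 2x upsampling of a 1-D 'signal' (toy stand-in
    for the spatial upsampler applied to the decoded base layer)."""
    return [s for s in samples for _ in range(2)]

def reconstruct(base, residual):
    """Add the decoded enhancement residual to the upsampled base layer."""
    up = upsample_2x(base)
    return [b + r for b, r in zip(up, residual)]

base = [10, 20]                     # decoded low-resolution base layer
residual = [0, 2, -1, 1]            # decoded enhancement residual
print(reconstruct(base, residual))  # [10, 12, 19, 21]
```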

Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders, which allows for certain flexibility as part of regular software life cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect for future work.

The 133rd MPEG meeting will be again an online meeting in January 2021.

Click here for more information about MPEG meetings and their developments.

MPEG Column: 131st MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 131st MPEG meeting concluded on July 3, 2020, online again, but with a press release comprising an impressive list of news items, which is led by “MPEG Announces VVC – the Versatile Video Coding Standard”. Just in the middle of the SC 29 (i.e., MPEG’s parent body within ISO) restructuring process, MPEG successfully ratified — jointly with ITU-T’s VCEG within JVET — its next-generation video codec, among other interesting results from the 131st MPEG meeting:

Standards progressing to final approval ballot (FDIS)

  • MPEG Announces VVC – the Versatile Video Coding Standard
  • Point Cloud Compression – MPEG promotes a Video-based Point Cloud Compression Technology to the FDIS stage
  • MPEG-H 3D Audio – MPEG promotes Baseline Profile for 3D Audio to the final stage

Call for Proposals

  • Call for Proposals on Technologies for MPEG-21 Contracts to Smart Contracts Conversion
  • MPEG issues a Call for Proposals on extension and improvements to ISO/IEC 23092 standard series

Standards progressing to the first milestone of the ISO standard development process

  • Widening support for storage and delivery of MPEG-5 EVC
  • Multi-Image Application Format adds support of HDR
  • Carriage of Geometry-based Point Cloud Data progresses to Committee Draft
  • MPEG Immersive Video (MIV) progresses to Committee Draft
  • Neural Network Compression for Multimedia Applications – MPEG progresses to Committee Draft
  • MPEG issues Committee Draft of Conformance and Reference Software for Essential Video Coding (EVC)

The corresponding press release of the 131st MPEG meeting can be found here: https://mpeg-standards.com/meetings/mpeg-131/. This report focuses on video coding featuring VVC as well as PCC and systems aspects (i.e., file format, DASH).

MPEG Announces VVC – the Versatile Video Coding Standard

MPEG is pleased to announce the completion of the new Versatile Video Coding (VVC) standard at its 131st meeting. The document has been progressed to its final approval ballot as ISO/IEC 23090-3 and will also be known as H.266 in the ITU-T.

VVC Architecture (from IEEE ICME 2020 tutorial of Mathias Wien and Benjamin Bross)

VVC is the latest in a series of very successful standards for video coding that have been jointly developed with ITU-T, and it is the direct successor to the well-known and widely used High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC) standards (see architecture in the figure above). VVC provides a major benefit in compression over HEVC. Plans are underway to conduct a verification test with formal subjective testing to confirm that VVC achieves an estimated 50% bit rate reduction versus HEVC for equal subjective video quality. Test results have already demonstrated that VVC typically provides about a 40% bit rate reduction for 4K/UHD video sequences in tests using objective metrics (i.e., PSNR, VMAF, MS-SSIM). Application areas especially targeted for the use of VVC include:

  • ultra-high definition 4K and 8K video,
  • video with a high dynamic range and wide colour gamut, and
  • video for immersive media applications such as 360° omnidirectional video.

Furthermore, VVC is designed for a wide variety of types of video such as camera-captured, computer-generated, and mixed content for screen sharing, adaptive streaming, game streaming, video with scrolling text, etc. Conventional standard-definition and high-definition video content are also supported with similar gains in compression. In addition to improving coding efficiency, VVC also provides highly flexible syntax supporting such use cases as (i) subpicture bitstream extraction, (ii) bitstream merging, (iii) temporal sub-layering, and (iv) layered coding scalability.

The current performance of VVC compared to HEVC-HM is shown in the figure below which confirms the statement above but also highlights the increased complexity. Please note that VTM9 is not optimized for speed but functionality (i.e., compression efficiency).

Performance of VVC, VTM9 vs. HM (taken from https://bit.ly/mpeg131).

MPEG also announces completion of ISO/IEC 23002-7 “Versatile supplemental enhancement information for coded video bitstreams” (VSEI), developed jointly with ITU-T as Rec. ITU-T H.274. The new VSEI standard specifies the syntax and semantics of video usability information (VUI) parameters and supplemental enhancement information (SEI) messages for use with coded video bitstreams. VSEI is especially intended for use with VVC, although it is drafted to be generic and flexible so that it may also be used with other types of coded video bitstreams. Once specified in VSEI, different video coding standards and systems-environment specifications can re-use the same SEI messages without the need for defining special-purpose data customized to the specific usage context.

At the same time, the Media Coding Industry Forum (MC-IF) announces a VVC patent pool fostering with an initial meeting on September 1, 2020. The aim of this meeting is to identify tasks and to propose a schedule for VVC pool fostering with the goal to select a pool facilitator/administrator by the end of 2020. MC-IF is not facilitating or administering a patent pool.

At the time of writing this blog post, it is probably too early to make an assessment of whether VVC will share the fate of HEVC or AVC (w.r.t. patent pooling). AVC is still the most widely used video codec but with AVC, HEVC, EVC, VVC, LCEVC, AV1, (AV2), and probably also AVS3 — did I miss anything? — the competition and pressure are certainly increasing.

Research aspects: from a research perspective, reduction of time complexity (for a variety of use cases) while maintaining quality and bitrate at acceptable levels is probably the most relevant aspect. Improvements in individual building blocks of VVC by using artificial neural networks (ANNs) are another area of interest, but end-to-end aspects of video coding using ANNs will probably also pave the road towards the next generation of video codecs. Utilizing VVC and its features for HTTP adaptive streaming (HAS) is probably most interesting for me but maybe also for others…

MPEG promotes a Video-based Point Cloud Compression Technology to the FDIS stage

At its 131st meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to the Final Draft International Standard (FDIS) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colors and reflectance. Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass-market applications. However, the relative ease of capturing and rendering spatial information as point clouds compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. With the current V-PCC encoder implementation providing compression in the range of 100:1 to 300:1, a dynamic point cloud of one million points could be encoded at 8 Mbit/s with good perceptual quality. Real-time decoding and rendering of V-PCC bitstreams have also been demonstrated on current mobile hardware. The V-PCC standard leverages video compression technologies and the video ecosystem in general (hardware acceleration, transmission services, and infrastructure) while enabling new kinds of applications. The V-PCC standard contains several profiles that leverage existing AVC and HEVC implementations, which may make them suitable to run on existing and emerging platforms. The standard is also extensible to upcoming video specifications such as Versatile Video Coding (VVC) and Essential Video Coding (EVC).
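A quick sanity check on these figures: assuming a hypothetical raw budget of 30 bits of geometry (3 x 10-bit) plus 24 bits of colour per point, a one-million-point dynamic point cloud at 30 fps lands the reported 8 Mbit/s squarely inside the 100:1 to 300:1 range. All layout numbers below are assumptions for illustration.

```python
BITS_PER_POINT = 30 + 24   # assumed: 3 x 10-bit geometry + 8-bit RGB colour
POINTS = 1_000_000         # points per frame, as in the text
FPS = 30                   # assumed frame rate

raw_bps = BITS_PER_POINT * POINTS * FPS  # uncompressed bits per second
compressed_bps = 8_000_000               # 8 Mbit/s, as reported
ratio = raw_bps / compressed_bps
print(f"raw: {raw_bps / 1e9:.2f} Gbit/s, compression ratio ~{ratio:.0f}:1")
```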

The V-PCC standard is based on Visual Volumetric Video-based Coding (V3C), which is expected to be re-used by other MPEG-I volumetric codecs under development. MPEG is also developing a standard for the carriage of V-PCC and V3C data (ISO/IEC 23090-10) which has been promoted to DIS status at the 130th MPEG meeting.

By providing high-level immersiveness at currently available bandwidths, the V-PCC standard is expected to enable several types of applications and services such as six Degrees of Freedom (6 DoF) immersive media, virtual reality (VR) / augmented reality (AR), immersive real-time communication and cultural heritage.

Research aspects: as V-PCC is video-based, we can probably state similar research aspects as for video codecs, such as improving efficiency both for encoding and rendering as well as reducing time complexity. During the development of V-PCC, mainly HEVC (and AVC) has been used, but it is definitely interesting to also use VVC for PCC. Finally, the dynamic adaptive streaming of V-PCC data is still in its infancy despite some articles published here and there.

MPEG Systems related News

Finally, I’d like to share news related to MPEG Systems and the carriage of video data, as depicted in the figure below. In particular, the carriage of VVC (and also EVC) has now been enabled in MPEG-2 Systems (specifically within the transport stream) and in the various file formats (specifically within the NAL file format). The latter is also used in CMAF and DASH, which makes VVC (and also EVC) ready for HTTP adaptive streaming (HAS).

Carriage of Video in MPEG Systems Standards (taken from https://bit.ly/mpeg131).

What about DASH and CMAF?

CMAF maintains a so-called “technologies under consideration” document which contains — among other things — a proposed VVC CMAF profile. Additionally, there are two exploration activities related to CMAF, i.e., (i) multi-stream support and (ii) storage, archiving, and content management for CMAF files.

DASH works on potential improvements for the first amendment to ISO/IEC 23009-1 4th edition related to CMAF support, the event processing model, and other extensions. Additionally, there’s a working draft for a second amendment to ISO/IEC 23009-1 4th edition enabling a bandwidth change signalling track and other enhancements. Furthermore, ISO/IEC 23009-8 (Session-based DASH operations) has been advanced to Draft International Standard (see also my last report).

An overview of the current status of MPEG-DASH can be found in the figure below.

The next meeting will be again an online meeting in October 2020.

Finally, MPEG organized a Webinar presenting results from the 131st MPEG meeting. The slides and video recordings are available here: https://bit.ly/mpeg131.

Click here for more information about MPEG meetings and their developments.

MPEG Column: 130th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 130th MPEG meeting concluded on April 24, 2020, in Alpbach, Austria … well, not exactly, unfortunately. The 130th MPEG meeting concluded on April 24, 2020, but not in Alpbach, Austria.

I attended the 130th MPEG meeting remotely.

Because of the Covid-19 pandemic, the 130th MPEG meeting was converted from a physical meeting to a fully online meeting, the first in MPEG’s 30+ years of history. Approximately 600 experts attending from 19 time zones worked in tens of Zoom meeting sessions supported by an online calendar and by collaborative tools that involved MPEG experts in both online and offline sessions. For example, input contributions had to be registered and uploaded ahead of the meeting to allow for efficient scheduling of two-hour meeting slots, which were distributed from early morning to late night in order to accommodate experts working in different time zones. These input contributions were then mapped to GitLab issues for offline discussion, and the actual meeting slots were primarily used for organizing the meeting, resolving conflicts, and making decisions, including approving output documents. Although the productivity of the online meeting could not reach the level of regular face-to-face meetings, the results posted in the press release show that MPEG experts managed the challenge quite well, specifically:

  • MPEG ratifies MPEG-5 Essential Video Coding (EVC) standard;
  • MPEG issues the Final Draft International Standards for parts 1, 2, 4, and 5 of MPEG-G 2nd edition;
  • MPEG expands the coverage of ISO Base Media File Format (ISOBMFF) family of standards;
  • A new standard for large scale client-specific streaming with MPEG-DASH;

Other important activities at the 130th MPEG meeting included (i) the carriage of visual volumetric video-based coding data, (ii) Network-Based Media Processing (NBMP) function templates, (iii) the conversion from MPEG-21 contracts to smart contracts, (iv) deep neural network-based video coding, (v) Low Complexity Enhancement Video Coding (LCEVC) reaching DIS stage, and (vi) a new level of the MPEG-4 Audio ALS Simple Profile for high-resolution audio, among others.

The corresponding press release of the 130th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/130. This report focuses on video coding (EVC) and systems aspects (file format, DASH).

MPEG ratifies MPEG-5 Essential Video Coding Standard

At its 130th meeting, MPEG announced the completion of the new ISO/IEC 23094-1 standard, referred to as MPEG-5 Essential Video Coding (EVC), which has been promoted to Final Draft International Standard (FDIS) status. There is a constant demand for more efficient video coding technologies (e.g., due to the increased usage of video on the internet), but coding efficiency is not the only factor determining the industry’s choice of video coding technology for products and services. The EVC standard offers improved compression efficiency compared to existing video coding standards and is based on the commitment of all contributors to the standard to announce their license terms for MPEG-5 EVC no later than two years after the FDIS publication date.

MPEG-5 EVC defines two important profiles: the “Baseline Profile” and the “Main Profile”. The “Baseline Profile” contains only technologies that are older than 20 years or otherwise freely available for use in the standard. The “Main Profile” adds a small number of additional tools, each of which can be either cleanly disabled or switched to the corresponding baseline tool on an individual basis.
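The per-tool switching concept can be sketched as follows. Note that the tool names and the configuration structure here are hypothetical illustrations of the principle, not actual EVC syntax elements:

```python
# Conceptual sketch of EVC Main Profile tool switching: each coding tool can be
# disabled individually, falling back to the corresponding Baseline behaviour.
# Tool names and the flag structure are hypothetical, not EVC bitstream syntax.

BASELINE_TOOLS = {"intra_prediction": "baseline", "transform": "baseline"}

def effective_toolset(main_profile_flags):
    """Resolve the toolset actually used, given per-tool enable flags."""
    tools = dict(BASELINE_TOOLS)
    for tool, enabled in main_profile_flags.items():
        tools[tool] = "main" if enabled else "baseline"
    return tools

# Disabling one Main Profile tool cleanly reverts it to its baseline variant:
print(effective_toolset({"intra_prediction": True, "transform": False}))
```

The design intent this illustrates is that licensing or implementation concerns about any single Main Profile tool need not block deployment of the rest.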

It will be interesting to see how the EVC profiles (baseline and main) will find their way into products and services, given the number of codecs already in use (e.g., AVC, HEVC, VP9, AV1) and those still under development but close to ratification (e.g., VVC, LCEVC). That is, in total, we may end up with about seven video coding formats that probably need to be considered for future video products and services. In other words, the multi-codec scenario I envisioned some time ago is becoming reality, raising some interesting challenges to be addressed in the future.

Research aspects: as for all video coding standards, the most important research aspect is certainly coding efficiency. For EVC, it might also be interesting to investigate the usability of the built-in tool-switching mechanism in a practical setup. Furthermore, regarding the multi-codec issue, the ratification of EVC adds another facet to the video coding standards already in use and/or under development.

MPEG expands the Coverage of ISO Base Media File Format (ISOBMFF) Family of Standards

At the 130th WG11 (MPEG) meeting, the ISOBMFF family of standards has been significantly amended with new tools and functionalities. The standards in question are as follows:

  • ISO/IEC 14496-12: ISO Base Media File Format;
  • ISO/IEC 14496-15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format;
  • ISO/IEC 23008-12: Image File Format; and
  • ISO/IEC 23001-16: Derived visual tracks in the ISO base media file format.

In particular, three new amendments to the ISOBMFF family have reached their final milestone, i.e., Final Draft Amendment (FDAM):

  1. Amendment 4 to ISO/IEC 14496-12 (ISO Base Media File Format) allows the use of a more compact version of metadata for movie fragments;
  2. Amendment 1 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) adds support of HEVC slice segment data track and additional extractor types for HEVC such as track reference and track groups; and
  3. Amendment 2 to ISO/IEC 23008-12 (Image File Format) adds support for more advanced features related to the storage of short image sequences such as burst and bracketing shots.

At the same time, new amendments have reached their first milestone, i.e., Committee Draft Amendment (CDAM):

  1. Amendment 2 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) extends its scope to newly developed video coding standards such as Essential Video Coding (EVC) and Versatile Video Coding (VVC); and
  2. the first edition of ISO/IEC 23001-16 (Derived visual tracks in the ISO base media file format) allows a new type of visual track whose content can be dynamically generated at the time of presentation by applying some operations to the content in other tracks, such as crossfading over two tracks.

Both are expected to reach their final milestone in mid-2021.

Finally, the final text of the ISO/IEC 14496-12 6th edition Final Draft International Standard (FDIS) is now ready for ballot, following the conversion of MP4RA into a Maintenance Agency. WG11 (MPEG) notes that Apple Inc. has been appointed as the Maintenance Agency and appreciates its valuable efforts over the many years it has already acted as the official registration authority for the ISOBMFF family of standards, i.e., MP4RA (https://mp4ra.org/). The 6th edition of ISO/IEC 14496-12 is expected to be published by ISO by the end of this year.

Research aspects: the ISOBMFF family of standards basically offers certain tools and functionalities to satisfy the given use case requirements. The task of the multimedia systems research community could be to scientifically validate these tools and functionalities with respect to the use cases and maybe even beyond, e.g., try to adopt these tools and functionalities for novel applications and services.

A New Standard for Large Scale Client-specific Streaming with DASH

Historically, in ISO/IEC 23009 (Dynamic Adaptive Streaming over HTTP; DASH), every client has used the same Media Presentation Description (MPD), as this best serves the scalability of the service (e.g., efficient caching in content delivery networks). However, there have been increasing requests from the industry to enable customized manifests for more personalized services. Consequently, MPEG has studied a solution to this problem without sacrificing scalability, and it has reached the first milestone of its standardization at the 130th MPEG meeting.

ISO/IEC 23009-8 adds a mechanism to the Media Presentation Description (MPD) to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests. This standard is expected to reach its final milestone in mid-2021.
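The basic idea can be sketched as follows. The template syntax, variable names, and SBD contents below are hypothetical illustrations, not the normative ISO/IEC 23009-8 syntax:

```python
# Sketch of session-based DASH: the MPD stays generic and cacheable, while a
# per-session SBD supplies variables that the client substitutes into segment
# URL templates before issuing HTTP GET requests. All names are illustrative.

import string

def resolve_segment_url(template, sbd_variables):
    """Substitute session-specific variables into a URL template."""
    return string.Template(template).substitute(sbd_variables)

mpd_template = "https://cdn.example.com/video/${sessionToken}/seg-${segNum}.m4s"
sbd = {"sessionToken": "abc123", "segNum": "42"}  # delivered per session
print(resolve_segment_url(mpd_template, sbd))
# e.g. https://cdn.example.com/video/abc123/seg-42.m4s
```

The cacheable MPD is identical for every client; only the small SBD differs per session, which is what preserves CDN scalability.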

Research aspects: SBD’s goal is to enable personalization while maintaining scalability which calls for a tradeoff, i.e., which kind of information to put into the MPD and what should be conveyed within the SBD. This tradeoff per se could be considered already a research question that will be hopefully addressed in the near future.

An overview of the current status of MPEG-DASH can be found in the figure below.

The next MPEG meeting will be from June 29th to July 3rd and will be again an online meeting. I am looking forward to a productive AhG period and an online meeting later this year. I am sure that MPEG will further improve its online meeting capabilities and can certainly become a role model for other groups within ISO/IEC and probably also beyond.

MPEG Column: 129th MPEG Meeting in Brussels, Belgium

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 129th MPEG meeting concluded on January 17, 2020 in Brussels, Belgium with the following topics:

  • Coded representation of immersive media – WG11 promotes Network-Based Media Processing (NBMP) to the final stage
  • Coded representation of immersive media – Publication of the Technical Report on Architectures for Immersive Media
  • Genomic information representation – WG11 receives answers to the joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • Open font format – WG11 promotes Amendment of Open Font Format to the final stage
  • High efficiency coding and media delivery in heterogeneous environments – WG11 progresses Baseline Profile for MPEG-H 3D Audio
  • Multimedia content description interface – Conformance and Reference Software for Compact Descriptors for Video Analysis promoted to the final stage

Additional Important Activities at the 129th WG 11 (MPEG) meeting

The 129th WG 11 (MPEG) meeting was attended by more than 500 experts from 25 countries working on important activities including (i) a scene description for MPEG media, (ii) the integration of Video-based Point Cloud Compression (V-PCC) and Immersive Video (MIV), (iii) Video Coding for Machines (VCM), and (iv) a draft call for proposals for MPEG-I Audio among others.

The corresponding press release of the 129th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/129. This report focused on network-based media processing (NBMP), architectures of immersive media, compact descriptors for video analysis (CDVA), and an update about adaptive streaming formats (i.e., DASH and CMAF).

MPEG picture at Friday plenary; © Rob Koenen (Tiledmedia).

Coded representation of immersive media – WG11 promotes Network-Based Media Processing (NBMP) to the final stage

At its 129th meeting, MPEG promoted ISO/IEC 23090-8, Network-Based Media Processing (NBMP), to Final Draft International Standard (FDIS). The FDIS stage is the final vote before a document is officially adopted as an International Standard (IS). During the FDIS ballot, national bodies are only allowed to cast a Yes/No vote and can no longer make any technical changes. However, project editors are able to fix typos and make other necessary editorial improvements.

What is NBMP? The NBMP standard defines a framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud by using libraries of pre-built 3rd party functions. The framework includes an abstraction layer to be deployed on top of existing commercial cloud platforms and is designed to be able to be integrated with 5G core and edge computing. The NBMP workflow manager is another essential part of the framework enabling the composition of multiple media processing tasks to process incoming media and metadata from a media source and to produce processed media streams and metadata that are ready for distribution to media sinks.
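To give a feel for how a workflow manager composes pre-built functions into a pipeline, here is a minimal sketch. The function names and this description format are hypothetical stand-ins; the actual NBMP workflow description is a JSON-based format defined in ISO/IEC 23090-8:

```python
# Conceptual sketch of an NBMP-style workflow: a workflow manager composes
# pre-built processing functions into a pipeline from media source to sink.
# Function names and the description structure are illustrative only.

workflow = {
    "source": "rtmp://ingest.example.com/live",
    "tasks": [
        {"function": "transcode", "params": {"codec": "hevc", "bitrate_kbps": 4000}},
        {"function": "overlay", "params": {"logo": "logo.png"}},
        {"function": "package", "params": {"format": "dash"}},
    ],
    "sink": "https://cdn.example.com/live/manifest.mpd",
}

def run_workflow(wf, media):
    """Apply each task in order (stubbed here as string annotations)."""
    for task in wf["tasks"]:
        media = f"{media}->{task['function']}"
    return media

print(run_workflow(workflow, "input"))
```

In a real deployment, each task would be a third-party function instantiated on the cloud platform, with the workflow manager wiring their inputs and outputs together.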

Why NBMP? With the increasing complexity and sophistication of media services and the incurred media processing, offloading complex media processing operations to the cloud/network is becoming critically important in order to keep receiver hardware simple and power consumption low.

Research aspects: NBMP reminds me a bit about what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG now targets APIs rather than pure metadata formats, which is a step forward in the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

Coded representation of immersive media – Publication of the Technical Report on Architectures for Immersive Media

At its 129th meeting, WG11 (MPEG) published an updated version of its technical report on architectures for immersive media. This technical report, which is the first part of the ISO/IEC 23090 (MPEG-I) suite of standards, introduces the different phases of MPEG-I standardization and gives an overview of the parts of the MPEG-I suite. It also documents use cases and defines architectural views on the compression and coded representation of elements of immersive experiences. Furthermore, it describes the coded representation of immersive media and the delivery of a full, individualized immersive media experience. MPEG-I enables scalable and efficient individual delivery as well as mass distribution while adjusting to the rendering capabilities of consumption devices. Finally, this technical report breaks down the elements that contribute to a fully immersive media experience and assigns quality requirements as well as quality and design objectives for those elements.

Research aspects: This technical report provides a kind of reference architecture for immersive media, which may help identify research areas and research questions to be addressed in this context.

Multimedia content description interface – Conformance and Reference Software for Compact Descriptors for Video Analysis promoted to the final stage

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors that can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. ISO/IEC 15938-15:2019 – the MPEG Compact Descriptors for Video Analysis (CDVA) standard – defines such descriptors. CDVA includes highly efficient descriptor components using features resulting from a Deep Neural Network (DNN) and uses predictive coding over video segments. The standard is being adopted by the industry. At its 129th meeting, WG11 (MPEG) has finalized the conformance guidelines and reference software. The software provides the functionality to extract, match, and index CDVA descriptors. For easy deployment, the reference software is also provided as Docker containers.
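The matching principle behind such descriptors can be illustrated with a toy example. This is not the CDVA descriptor format or its matching algorithm, just a sketch of instance search over compact feature vectors (random stand-ins for DNN features):

```python
# Sketch of instance search with compact video descriptors: one feature vector
# per segment (random stand-ins for DNN features here), matched against a
# query by cosine similarity. Illustrative only; not the CDVA algorithm.

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
database = {f"video_{i}": rng.standard_normal(128) for i in range(5)}
query = database["video_3"] + 0.05 * rng.standard_normal(128)  # noisy copy

best = max(database, key=lambda vid: cosine_similarity(query, database[vid]))
print(best)  # the noisy query matches its source segment
```

The point of standardization is that the 128-dimensional descriptors (rather than the far larger source videos) are all that needs to be exchanged between extraction and matching, even across devices and vendors.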

Research aspects: The availability of reference software helps to conduct reproducible research (i.e., reference software is typically publicly available for free) and the Docker container even further contributes to this aspect.

DASH and CMAF

The 4th edition of DASH has already been published and is available as ISO/IEC 23009-1:2019. Similar to previous iterations, MPEG’s goal was to make the newest edition of DASH publicly available for free, with the goal of industry-wide adoption and adaptation. During the most recent MPEG meeting, we worked towards implementing the first amendment, which will include (i) additional CMAF support and (ii) event processing models with minor updates; this amendment is currently in draft form and will be finalized at the 130th MPEG meeting in Alpbach, Austria. An overview of all DASH standards and updates is depicted in the figure below:

ISO/IEC 23009-8, or “session-based DASH operations”, is the newest part of MPEG-DASH. The goal of this part of DASH is to allow customization on a per-session basis while maintaining the same underlying media presentation description (MPD) across all sessions. Thus, MPDs remain cacheable within content distribution networks (CDNs), while additional information can be customized per session within a newly added session-based description (SBD). It is understood that the SBD should have an efficient representation to avoid file size issues and should not duplicate information typically found in the MPD.

The 2nd edition of the CMAF standard (ISO/IEC 23000-19) will be available soon (currently under FDIS ballot), and MPEG is currently reviewing additional tools in the so-called ‘technologies under consideration’ document. Furthermore, amendments were drafted for additional HEVC media profiles, and exploration activities on the storage and archiving of CMAF contents are ongoing.

The next meeting will bring MPEG back to Austria (for the 4th time) and will be hosted in Alpbach, Tyrol. For more information about the upcoming 130th MPEG meeting click here.

Click here for more information about MPEG meetings and their developments

Can the Multimedia Research Community via Quality of Experience contribute to a better Quality of Life?

Can the multimedia community contribute to a better Quality of Life? Delivering a higher-resolution and distortion-free media stream so you can enjoy the latest movie on Netflix or YouTube may provide instantaneous satisfaction, but does it make your long-term life better? Whilst the QoMEX conference series has traditionally considered the former, in more recent years and with a view to QoMEX 2020, research works that consider the latter are also welcome. In this context, rather than looking at what we do, reflecting on how we do it could offer opportunities for sustained rather than instantaneous impact in fields such as health, inclusive of assistive technologies (AT) and digital heritage among many others.

In this article, we ask if the concepts from the Quality of Experience (QoE) [1] framework model can be applied, adapted, and reimagined to inform and develop tools and systems that enhance our Quality of Life. The World Health Organisation (WHO) definition of health states that “[h]ealth is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity” [2]. This definition is well-aligned with the familiar yet ill-defined term, Quality of Life (QoL). Whilst QoL requires further work towards a concrete definition, the definition of QoE has been developed through work by the QUALINET EU COST Network [3]. This effort resulted in a white paper [1] that, using multimedia quality as a use case, describes the human, context, service, and system factors that influence the quality of experience for multimedia systems.

Fig. 1: (a) Quality of Experience and (b) Quality of Life. (reproduced from [2]).

The QoE formation process has been mapped to a conceptual model that allows systems and services to be evaluated and improved, and such models have been developed and used to predict QoE. Adapting and applying these methods to health-related QoL would allow predictive models for QoL to be developed.

In this context, the best paper award winner at QoMEX in 2017 [4] proposed such a mapping for QoL in stroke prevention, care and rehabilitation (Fig. 1) along with examining practical challenges for modeling and applications. The process of identifying and categorizing factors and features was illustrated using stroke patient treatment as an example use case and this work has continued through the European Union Horizon 2020 research project PRECISE4Q [5]. For medical practitioners, a QoL framework can assist in the development of decision support systems solutions, patient monitoring, and imaging systems.

At more of a “systems” level in e-health applications, the WHO defines assistive devices and technologies as “those whose primary purpose is to maintain or improve an individual’s functioning and independence to facilitate participation and to enhance overall well-being” [6]. A proposed application of immersive technologies as an assistive technology (AT) training solution applied QoE as a mechanism to evaluate the usability and utility of the system [7]. The assessment of immersive AT used a number of physiological signals: EEG, GSR/EDA, body surface temperature, accelerometry, heart rate (HR), and blood volume pulse (BVP). These allow objective analysis while the individual operates the wheelchair simulator. Performing such evaluations in an ecologically valid manner is a challenging task. However, the QoE framework provides a concrete mechanism to consider the human, context, and system factors that influence the usability and utility of such a training simulator. In particular, the use of implicit and objective metrics can complement qualitative approaches to evaluations.

In the same vein, another work presented at QoMEX 2017 [8] employed Augmented Reality (AR) and Virtual Reality (VR) as a clinical aid for the diagnosis of speech and language difficulties, specifically aphasia (see Fig. 2). It is estimated that speech or language difficulties affect more than 12% of people internationally [9]. Individuals who suffer from a stroke or traumatic brain injury (TBI) often experience symptoms of aphasia as a result of damage to the left frontal lobe. Anomic aphasia [10] is a mild form of aphasia in which patients experience word retrieval problems and semantic memory difficulties. Opportunities exist to digitalize well-accepted clinical approaches that can be augmented through QoE-based objective and implicit metrics. Understanding the user via advanced processing techniques is an area in dire need of further research, with significant opportunities to understand the user at the cognitive, interaction, and performance levels, moving far beyond the binary pass/fail of traditional approaches.

Fig. 2: Prototype System Framework (Reproduced from [8]). I. Physiological wearable sensors used to capture data. (a) Neurosky mindwave® device. (b) Empatica E4® wristband. II. Representation of user interaction with the wheelchair simulator. III. The compatibles displays. (a) Common screen. (b) Oculus Rift® HMD device. (c) HTC Vive® HMD device.

Moving beyond health, the QoE concept can also be extended to other areas such as digital heritage. Organizations such as broadcasters and national archives that collect media recordings are digitizing their material because the analog storage media degrade over time. Archivists, restoration experts, content creators, and consumers are all stakeholders but they have different perspectives when it comes to their expectations and needs. Hence their QoE for archive material can be very different, as discussed at QoMEX 2019 [11]. For people interested in media archives viewing quality through a QoE lens, QoE aids in understanding the issues and priorities of the stakeholders. Applying the QoE framework to explore the different stakeholders and the influencing factors that affect their QoE perceptions over time allows different kinds of models for QoE to be developed and used across the stages of the archived material lifecycle from digitization through restoration and consumption.

The QoE framework’s simple yet comprehensive conceptual model for the quality formation process has had a major impact on multimedia quality. The examples presented here highlight how it can be used as a blueprint in other domains and to reconcile different perspectives and attitudes to quality. With an eye on the next and future editions of QoMEX, will we see other use cases and applications of QoE to domains and concepts beyond multimedia quality evaluations? The QoMEX conference series has evolved and adapted based on emerging application domains, industry engagement, and approaches to quality evaluations. It is clear that the scope of QoE research has broadened significantly over the last 11 years. Please take a look at [12] for details on the conference topics and special sessions that the organizing team for QoMEX 2020 in Athlone, Ireland hopes will broaden the range of use cases that apply QoE towards QoL and other application domains in a spirit of inclusivity and diversity.

References:

[1] P. Le Callet, S. Möller, and A. Perkis, eds., “Qualinet White Paper on Definitions of Quality of Experience,” European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013.

[2] World Health Organization, “Preamble to the Constitution of the World Health Organization,” 1946. [Online]. Available: http://apps.who.int/gb/bd/PDF/bd47/EN/constitution-en.pdf. [Accessed: 21-Jan-2020].

[3] QUALINET [Online], Available: https://www.qualinet.eu. [Accessed: 21-Jan-2020].

[4] A. Hines and J. D. Kelleher, “A framework for post-stroke quality of life prediction using structured prediction,” 9th International Conference on Quality of Multimedia Experience, QoMEX 2017, Erfurt, Germany, June 2017.

[5] European Union Horizon 2020 research project PRECISE4Q, https://precise4q.eu/. [Accessed: 21-Jan-2020].

[6] “WHO | Assistive devices and technologies,” WHO, 2017. [Online]. Available: http://www.who.int/disabilities/technology/en/. [Accessed: 21-Jan-2020].

[7] D. Pereira Salgado, F. Roque Martins, T. Braga Rodrigues, C. Keighrey, R. Flynn, E. L. Martins Naves, and N. Murray, “A QoE assessment method based on EDA, heart rate and EEG of a virtual reality assistive technology system”, In Proceedings of the 9th ACM Multimedia Systems Conference (Demo Paper), pp. 517-520, 2018.

[8] C. Keighrey, R. Flynn, S. Murray, and N. Murray, “A QoE Evaluation of Immersive Augmented and Virtual Reality Speech & Language Assessment Applications”, 9th International Conference on Quality of Multimedia Experience, QoMEX 2017, Erfurt, Germany, June 2017.

[9] “Scope of Practice in Speech-Language Pathology,” 2016. [Online]. Available: http://www.asha.org/uploadedFiles/SP2016-00343.pdf. [Accessed: 21-Jan-2020].

[10] J. Reilly, “Semantic Memory and Language Processing in Aphasia and Dementia,” Seminars in Speech and Language, vol. 29, no. 1, pp. 3-4, 2008.

[11] A. Ragano, E. Benetos, and A. Hines, “Adapting the Quality of Experience Framework for Audio Archive Evaluation,” Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 2019.

[12] QoMEX 2020, Athlone, Ireland. [Online]. Available: https://www.qomex2020.ie. [Accessed: 21-Jan-2020].

MPEG Column: 128th MPEG Meeting in Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 128th MPEG meeting concluded on October 11, 2019 in Geneva, Switzerland with the following topics:

  • Low Complexity Enhancement Video Coding (LCEVC) Promoted to Committee Draft
  • 2nd Edition of Omnidirectional Media Format (OMAF) has reached the first milestone
  • Genomic Information Representation – Part 4 Reference Software and Part 5 Conformance Promoted to Draft International Standard

The corresponding press release of the 128th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/128. In this report we will focus on video coding aspects (i.e., LCEVC) and immersive media applications (i.e., OMAF). At the end, we will provide an update related to adaptive streaming (i.e., DASH and CMAF).

Low Complexity Enhancement Video Coding

Low Complexity Enhancement Video Coding (LCEVC) has been promoted to Committee Draft (CD), the first milestone in the ISO/IEC standardization process. LCEVC is part 2 of MPEG-5, or ISO/IEC 23094-2 if you prefer the always easy-to-remember ISO codes. We already introduced MPEG-5 in previous posts; LCEVC is a standardized video coding solution that leverages other video codecs in a manner that improves video compression efficiency while maintaining or lowering the overall encoding and decoding complexity.

The LCEVC standard uses a lightweight video codec to add up to two layers of encoded residuals. The aim of these layers is correcting artefacts produced by the base video codec and adding detail and sharpness for the final output video.
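A minimal sketch of this layered reconstruction idea follows. The nearest-neighbour upsampling and the way residuals are applied are simplified stand-ins for the actual LCEVC tools:

```python
# Conceptual sketch of LCEVC-style layered reconstruction: a low-resolution
# base codec output is corrected and upsampled, then full-resolution detail
# is added by a second residual layer. Simplified stand-in, not actual LCEVC.

import numpy as np

def upsample2x(frame):
    """Nearest-neighbour 2x upsampling."""
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def reconstruct(base_decoded, residual_l1, residual_l2):
    """base (low-res) + residual layer 1 -> upsample -> + residual layer 2."""
    corrected = base_decoded + residual_l1      # fix base-codec artefacts
    return upsample2x(corrected) + residual_l2  # add full-res detail/sharpness

base = np.full((2, 2), 100.0)  # decoded base layer (low resolution)
r1 = np.ones((2, 2))           # first enhancement layer (correction, low res)
r2 = np.full((4, 4), 0.5)      # second enhancement layer (detail, full res)
print(reconstruct(base, r1, r2))
```

The key property this illustrates is that the base layer remains an ordinary bitstream of the underlying codec, so existing decoders keep working while enhanced decoders add the residual layers on top.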

The target of this standard comprises software or hardware codecs with extra processing capabilities, e.g., mobile devices, set-top boxes (STBs), and personal-computer-based decoders. Additional benefits are a reduction in implementation complexity or a corresponding expansion in spatial resolution.

LCEVC is based on existing codecs which allows for backwards-compatibility with existing deployments. Supporting LCEVC enables “softwareized” video coding allowing for release and deployment options known from software-based solutions which are well understood by software companies and, thus, opens new opportunities in improving and optimizing video-based services and applications.

Research aspects: in video coding, research efforts are mainly related to coding efficiency and complexity (as usual). However, as MPEG-5 basically adds a software layer on top of what is typically implemented in hardware, all kinds of aspects related to software engineering could become an active area of research.

Omnidirectional Media Format

The scope of the Omnidirectional Media Format (OMAF) covers 360° video, images, audio, and associated timed text. It specifies (i) a coordinate system, (ii) projection and rectangular region-wise packing methods, (iii) storage of omnidirectional media and the associated metadata using ISOBMFF, (iv) encapsulation, signaling, and streaming of omnidirectional media in DASH and MMT, and (v) media profiles and presentation profiles.
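To illustrate items (i) and (ii), the following sketch maps a direction on the sphere (azimuth/elevation) to pixel coordinates in an equirectangular projection, the most common 360° layout. The angle conventions used here are an assumption for the demo; the normative OMAF definitions differ in detail.

```python
def sphere_to_equirect(azimuth_deg, elevation_deg, width, height):
    """Map a viewing direction to equirectangular pixel coordinates.

    Assumed (illustrative) conventions: azimuth in [-180, 180) increasing
    left-to-right, elevation in [-90, 90] increasing bottom-to-top.
    """
    x = (azimuth_deg + 180.0) / 360.0 * width   # full yaw range spans the width
    y = (90.0 - elevation_deg) / 180.0 * height  # zenith maps to the top row
    return x, y
```

For a 3840x1920 frame, the forward direction (0°, 0°) lands at the frame center, which is the usual sanity check for such mappings.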

At this meeting, the second edition of OMAF (ISO/IEC 23090-2) has been promoted to committee draft (CD) which includes

  • support of improved overlay of graphics or textual data on top of video,
  • efficient signaling of videos structured in multiple sub parts,
  • enabling more than one viewpoint, and
  • new profiles supporting dynamic bitstream generation according to the viewport.

As with the first edition, OMAF includes encapsulation and signaling in ISOBMFF as well as streaming of omnidirectional media (DASH and MMT). It will reach its final milestone by the end of 2020.

360° video is certainly a vital use case towards a fully immersive media experience. Devices to capture and consume such content are becoming increasingly available and will probably contribute to the dissemination of this type of content. However, it is also understood that the complexity increases significantly, specifically with respect to large-scale, scalable deployments due to increased content volume/complexity, timing constraints (latency), and quality of experience issues.

Research aspects: understanding the increased complexity of 360° video, or immersive media in general, is certainly an important aspect to be addressed towards enabling applications and services in this domain. One may even argue that 360° video already works (e.g., it is possible to capture it, upload it to YouTube, and consume it on many devices), but the devil is in the details when it comes to handling this complexity efficiently to enable a seamless and high quality of experience.

DASH and CMAF

The 4th edition of DASH (ISO/IEC 23009-1) will be published soon and MPEG is currently working towards a first amendment, which will be about (i) CMAF support and (ii) an event processing model. An overview of all DASH standards is depicted in the figure below, notably part 1 of MPEG-DASH, referred to as media presentation description and segment formats.

[Figure: Overview of the status of all MPEG-DASH standards]
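To give an impression of what part 1, the media presentation description (MPD), looks like, here is a schematic MPD built with Python's standard library. The element subset and attribute values are illustrative only; a real MPD must follow the full ISO/IEC 23009-1 schema.

```python
import xml.etree.ElementTree as ET

def build_minimal_mpd():
    """Build a schematic (not schema-complete) DASH media presentation description."""
    mpd = ET.Element("MPD", {
        "xmlns": "urn:mpeg:dash:schema:mpd:2011",
        "type": "static",                          # on-demand presentation
        "minBufferTime": "PT2S",
        "profiles": "urn:mpeg:dash:profile:isoff-on-demand:2011",
    })
    period = ET.SubElement(mpd, "Period")
    aset = ET.SubElement(period, "AdaptationSet", {"mimeType": "video/mp4"})
    for bw in (1_000_000, 3_000_000):              # two illustrative quality levels
        ET.SubElement(aset, "Representation", {
            "id": f"video-{bw}",
            "bandwidth": str(bw),
            "codecs": "avc1.640028",
        })
    return ET.tostring(mpd, encoding="unicode")

assert "<Period>" in build_minimal_mpd()
```

The essential structure is visible even in this reduced form: Periods contain Adaptation Sets, which group interchangeable Representations the client can switch between based on available bandwidth.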

The 2nd edition of the CMAF standard (ISO/IEC 23000-19) will become available very soon and MPEG is currently reviewing additional tools in the so-called technologies under consideration document as well as conducting various explorations. A working draft for additional media profiles is also under preparation.

Research aspects: with CMAF, low-latency support is added to DASH-like applications and services. However, the implementation specifics are actually not defined in the standard and are subject to competition (e.g., here). Interestingly, the Bitmovin video developer reports from both 2018 and 2019 highlight the need for low-latency solutions in this domain.
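A back-of-the-envelope model illustrates why chunked CMAF matters for latency: classic HTTP adaptive streaming buffers whole segments before playback, while chunked transfer lets the player consume sub-segment chunks as they arrive. The three-buffered-units heuristic below is a simplifying assumption for illustration, not something defined in the standards.

```python
def live_latency_estimate(segment_s, chunk_s, buffered_units=3):
    """Rough live end-to-end latency (seconds) under a simple buffering model.

    Assumption: the player keeps ~buffered_units of its smallest addressable
    media unit buffered (whole segments vs. CMAF chunks).
    """
    return {
        "segment-based": buffered_units * segment_s,  # must wait for full segments
        "chunked-cmaf": buffered_units * chunk_s,     # can start on partial segments
    }
```

With typical 6-second segments and 0.5-second chunks, this toy model already shows an order-of-magnitude gap, which matches the intuition behind the industry push for low-latency modes.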

At the ACM Multimedia Conference 2019 in Nice, France I gave a tutorial entitled “A Journey towards Fully Immersive Media Access” which includes updates related to DASH and CMAF. The slides are available here.

Outlook 2020

Finally, let me try giving an outlook for 2020, not so much content-wise but events planned for 2020 that are highly relevant for this column:

  • MPEG129, Jan 13-17, 2020, Brussels, Belgium
  • DCC 2020, Mar 24-27, 2020, Snowbird, UT, USA
  • MPEG130, Apr 20-24, 2020, Alpbach, Austria
  • NAB 2020, Apr 18-22, Las Vegas, NV, USA
  • ICASSP 2020, May 4-8, 2020, Barcelona, Spain
  • QoMEX 2020, May 26-28, 2020, Athlone, Ireland
  • MMSys 2020, Jun 8-11, 2020, Istanbul, Turkey
  • IMX 2020, June 17-19, 2020, Barcelona, Spain
  • MPEG131, Jun 29 – Jul 3, 2020, Geneva, Switzerland
  • NetSoft / QoE Mgmt Workshop, Jun 29 – Jul 3, 2020, Ghent, Belgium
  • ICME 2020, Jul 6-10, London, UK
  • ATHENA summer school, Jul 13-17, Klagenfurt, Austria
  • … and many more!

MPEG Column: 127th MPEG Meeting in Gothenburg, Sweden

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

Plenary of the 127th MPEG Meeting in Gothenburg, Sweden.

The 127th MPEG meeting concluded on July 12, 2019 in Gothenburg, Sweden with the following topics:

  • Versatile Video Coding (VVC) enters formal approval stage, experts predict 35-60% improvement over HEVC
  • Essential Video Coding (EVC) promoted to Committee Draft
  • Common Media Application Format (CMAF) 2nd edition promoted to Final Draft International Standard
  • Dynamic Adaptive Streaming over HTTP (DASH) 4th edition promoted to Final Draft International Standard
  • Carriage of Point Cloud Data Progresses to Committee Draft
  • JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition
  • Genomic information representation – WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • ISO/IEC 23005 (MPEG-V) 4th Edition – WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

The corresponding press release of the 127th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/127

Versatile Video Coding (VVC)

The Moving Picture Experts Group (MPEG) is pleased to announce that Versatile Video Coding (VVC) progresses to Committee Draft, experts predict 35-60% improvement over HEVC.

The development of the next major generation of video coding standard has achieved excellent progress, such that MPEG has approved the Committee Draft (CD, i.e., the text for formal balloting in the ISO/IEC approval process).

The new VVC standard will be applicable to a very broad range of applications and it will also provide additional functionalities. VVC will provide a substantial improvement in coding efficiency relative to existing standards, expected to be in the range of 35–60% bit rate reduction relative to HEVC for equivalent subjective video quality, although it has not yet been formally measured. This applies at picture resolutions such as 1080p HD, 4K, or 8K UHD, for both standard dynamic range and high dynamic range/wide color gamut content, at levels of quality appropriate for use in consumer distribution services. The focus during the development of the standard has primarily been on 10-bit 4:2:0 content; the 4:4:4 chroma format will also be supported.

The VVC standard is being developed in the Joint Video Experts Team (JVET), a group established jointly by MPEG and the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. In addition to a text specification, the project also includes the development of reference software, a conformance testing suite, and a new standard ISO/IEC 23002-7 specifying supplemental enhancement information messages for coded video bitstreams. The approval process for ISO/IEC 23002-7 has also begun, with the issuance of a CD consideration ballot.

Research aspects: VVC represents the next-generation video codec to be deployed in 2020+ and basically the same research aspects apply as for previous generations, i.e., coding efficiency, performance/complexity, and objective/subjective evaluation. Luckily, JVET documents are freely available, including the actual standard (committee draft), software (and its description), and common test conditions. Thus, researchers utilizing these resources are able to conduct reproducible research when contributing their findings and code improvements back to the community at large.

Essential Video Coding (EVC)

MPEG-5 Essential Video Coding (EVC) promoted to Committee Draft

Interestingly, at the same meeting as VVC, MPEG promoted MPEG-5 Essential Video Coding (EVC) to Committee Draft (CD). The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics.

The MPEG-5 EVC standard includes a baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be royalty-free. Additionally, a main profile adds a small number of additional tools, each providing a significant performance gain. All main profile tools can be individually switched off or individually switched over to a corresponding baseline tool. Organizations making proposals for the main profile have agreed to publish applicable licensing terms within two years of FDIS stage, either individually or as part of a patent pool.
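The tool-switching idea of the main profile can be sketched as follows. The tool names are invented for illustration and do not correspond to actual EVC coding tools; the point is only the mechanism: every main-profile tool has a baseline counterpart it can fall back to.

```python
# Hypothetical coding stages with baseline and main-profile variants.
BASELINE_TOOLS = {"intra_prediction": "baseline_intra", "loop_filter": "baseline_filter"}
MAIN_TOOLS = {"intra_prediction": "advanced_intra", "loop_filter": "advanced_filter"}

def select_tools(enabled_main_tools):
    """For each coding stage, use the main tool if enabled, else fall back to baseline."""
    return {
        stage: MAIN_TOOLS[stage] if stage in enabled_main_tools else BASELINE_TOOLS[stage]
        for stage in BASELINE_TOOLS
    }
```

For example, `select_tools(set())` yields a pure-baseline configuration, which is exactly the property that makes the licensing story of the main profile tractable: any contested tool can be disabled without breaking the codec.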

Research aspects: Similar research aspects as for VVC apply to EVC; from a software engineering perspective, it could also be interesting to further investigate the switching mechanism of individual tools and/or the fallback to baseline tools. Naturally, a comparison with next-generation codecs such as VVC is interesting per se. The licensing aspects are probably interesting for other disciplines, but that is another story…

Common Media Application Format (CMAF)

MPEG ratified the 2nd edition of the Common Media Application Format (CMAF)

The Common Media Application Format (CMAF) enables efficient encoding, storage, and delivery of digital media content (incl. audio, video, subtitles among others), which is key to scaling operations to support the rapid growth of video streaming over the internet. The CMAF standard is the result of widespread industry adoption of an application of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices within CMAF.

The 2nd edition of CMAF adds support for a number of specifications that were a result of significant industry interest. Those include

  • Advanced Audio Coding (AAC) multi-channel;
  • MPEG-H 3D Audio;
  • MPEG-D Unified Speech and Audio Coding (USAC);
  • Scalable High Efficiency Video Coding (SHVC);
  • IMSC 1.1 (Timed Text Markup Language Profiles for Internet Media Subtitles and Captions); and
  • additional HEVC video CMAF profiles and brands.

This edition also introduces CMAF supplemental data handling as well as new structural brands for CMAF that reflect the common practice resulting from the significant deployment of CMAF in industry. Companies adopting CMAF technology will find the specifications introduced in the 2nd edition particularly useful for further adoption and proliferation of CMAF in the market.

Research aspects: see below (DASH).

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG approves the 4th edition of Dynamic Adaptive Streaming over HTTP (DASH)

The 4th edition of MPEG-DASH comprises the following features:

  • a service description indicating how the service provider expects the service to be consumed;
  • a method to indicate the times corresponding to the production of associated media;
  • a mechanism to signal DASH profiles and features, employed codec and format profiles; and
  • supported protection schemes present in the Media Presentation Description (MPD).

It is expected that this edition will be published later this year. 

Research aspects: CMAF 2nd and DASH 4th editions come along with a rich feature set enabling a plethora of use cases. The underlying principles are still the same, and research issues arise from updated application and service requirements with respect to content complexity, timing (mainly delay/latency), and quality of experience (QoE). The DASH-IF presents the Excellence in DASH Award at the ACM Multimedia Systems conference and an overview of its academic efforts can be found here.

Carriage of Point Cloud Data

MPEG progresses the Carriage of Point Cloud Data to Committee Draft

At its 127th meeting, MPEG promoted the carriage of point cloud data to the Committee Draft stage, the first milestone of the ISO standard development process. This standard is the first to introduce support for volumetric media into the industry-famous ISO base media file format family of standards.

This standard supports the carriage of point cloud data comprising individually encoded video bitstreams within multiple file format tracks in order to support the intrinsic nature of the video-based point cloud compression (V-PCC). Additionally, it also allows the carriage of point cloud data in one file format track for applications requiring multiplexed content (i.e., the video bitstream of multiple components is interleaved into one bitstream).

This standard is expected to support efficient access and delivery of portions of a point cloud object, considering that in many cases the entire point cloud object may not be visible to the user, depending on the viewing direction or the location of the point cloud object relative to other objects. It is currently expected that the standard will reach its final milestone by the end of 2020.

Research aspects: MPEG’s Point Cloud Compression (PCC) comes in two flavors, video-based and geometry-based, but both still need to be packaged into file and delivery formats. MPEG’s choice here is the ISO base media file format, and the efficient carriage of point cloud data is characterized by both functionality (i.e., enabling the required use cases) and performance (such as low overhead).

MPEG 2 Systems/Transport Stream

JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition

At its 127th meeting, WG11 (MPEG) has extended ISO/IEC 13818-1 (MPEG-2 Systems) – in collaboration with WG1 (JPEG) – to support ISO/IEC 21122 (JPEG XS) in order to support industries using still image compression technologies for broadcasting infrastructures. The specification defines a JPEG XS elementary stream header and specifies how the JPEG XS video access unit (specified in ISO/IEC 21122-1) is put into a Packetized Elementary Stream (PES). Additionally, the specification also defines how the System Target Decoder (STD) model can be extended to support JPEG XS video elementary streams.

Genomic information representation

WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that have to be stored, transferred, and analyzed. So far, WG 11 (MPEG) and ISO TC 276/WG 5 have addressed the representation, compression, and transport of genome sequencing data by developing the ISO/IEC 23092 standard series, also known as MPEG-G. It provides a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for the access of sequencing data in the native compressed format.

An important element in the effective usage of sequencing data is the association of the data with the results of the analysis and the annotations generated by processing pipelines and analysts. At the moment, such association happens as a separate step; standard and effective ways of linking data and meta-information derived from sequencing data are not available.

At its 127th meeting, MPEG and ISO TC 276/WG 5 issued a joint Call for Proposals (CfP) addressing this problem. The call seeks submissions of technologies that can provide efficient representation and compression solutions for the processing of genomic annotation data.

Companies and organizations are invited to submit proposals in response to this call. Responses are expected to be submitted by January 8, 2020 and will be evaluated during the 129th WG 11 (MPEG) meeting. Detailed information, including how to respond to the call for proposals, the requirements that have to be considered, and the test data to be used, is reported in the documents N18648, N18647, and N18649 available at the 127th meeting website (http://mpeg.chiariglione.org/meetings/127). For any further question about the call, test conditions, required software or test sequences please contact: Joern Ostermann, MPEG Requirements Group Chair (ostermann@tnt.uni-hannover.de) or Martin Golebiewski, Convenor ISO TC 276/WG 5 (martin.golebiewski@h-its.org).

ISO/IEC 23005 (MPEG-V) 4th Edition

WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

At its 127th meeting, WG11 (MPEG) promoted the 4th edition of two parts of ISO/IEC 23005 (MPEG-V; Media Context and Control) standards to the Final Draft International Standard (FDIS). The new edition of ISO/IEC 23005-1 (architecture) enables ten new use cases, which can be grouped into four categories: 3D printing, olfactory information in virtual worlds, virtual panoramic vision in car, and adaptive sound handling. The new edition of ISO/IEC 23005-7 (conformance and reference software) is updated to reflect the changes made by the introduction of new tools defined in other parts of ISO/IEC 23005. More information on MPEG-V and its parts 1-7 can be found at https://mpeg.chiariglione.org/standards/mpeg-v.


Finally, the unofficial highlight of the 127th MPEG meeting we certainly found while scanning the scene in Gothenburg on Tuesday night…

[Photo: MPEG127_Metallica]

MPEG Column: 126th MPEG Meeting in Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:

  • Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
  • Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
  • Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
  • Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
  • MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
  • MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies

The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126

Three Degrees of Freedom Plus (3DoF+)

MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video

MPEG’s support for 360-degree video — also referred to as omnidirectional video — is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). It basically enables the utilization of the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content using a head mounted display (HMD). However, rendering flat 360-degree video may generate visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.
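Viewport-dependent streaming on top of HEVC tiles boils down to selecting the tiles that intersect the current viewport and fetching those at high quality. A simplified, yaw-only sketch follows; real schemes tile in two dimensions, handle elevation, and prefetch around the viewport, so this is a conceptual illustration rather than an actual OMAF profile.

```python
def tiles_in_viewport(viewport_center_deg, viewport_width_deg, num_tiles=8):
    """Pick which equal-width horizontal tiles of a 360° frame intersect the viewport.

    A 1-D (yaw-only) simplification; returns tile indices in [0, num_tiles).
    """
    tile_width = 360.0 / num_tiles
    lo = (viewport_center_deg - viewport_width_deg / 2) % 360
    hi = (viewport_center_deg + viewport_width_deg / 2) % 360
    selected = []
    for i in range(num_tiles):
        t_lo, t_hi = i * tile_width, (i + 1) * tile_width
        if lo <= hi:                       # viewport does not wrap around 0°/360°
            overlap = t_lo < hi and t_hi > lo
        else:                              # viewport wraps around the seam
            overlap = t_hi > lo or t_lo < hi
        if overlap:
            selected.append(i)
    return selected
```

With 8 tiles and a 90° viewport, only 2 of 8 tiles need to be delivered at full quality, which is the bandwidth argument behind tiled 360° streaming.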

At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.

Research aspects: Subjective evaluations in the context of 3DoF+, but also immersive media services in general, are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX), resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research on the one hand and standards developing organizations (SDOs) and industry fora on the other, which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (and also other SDOs) often organize public workshops; there will be one during the next meeting, on July 10, 2019 in Gothenburg, Sweden, about “Coding Technologies for Immersive Audio/Visual Experiences”. Further details are available here.

Neural Network Compression for Multimedia Applications

MPEG evaluates responses to the Call for Proposal and kicks off its technical work

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to the many clients that use them in applications (e.g., mobile phones, smart cameras) requires a compressed representation of neural networks.

At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for neural network compression. These technologies address the compression of neural network parameters in order to reduce their size for transmission and improve the efficiency of using them, while not or only moderately reducing their performance in specific multimedia applications.

After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).
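To give a flavor of what such compression pipelines build on, the following sketch uniformly quantizes floating-point weights to 8-bit integers. Going from 32-bit floats to 8 bits alone reduces size to 25% of the original; the CfP technologies combine more elaborate tools (pruning, entropy coding, etc.) to reach the 10% or less reported above. This is a generic textbook building block, not any specific CfP submission.

```python
def quantize_weights(weights, num_bits=8):
    """Uniformly quantize a list of float weights to num_bits-integer levels.

    Returns (quantized_indices, dequantized_weights); the reconstruction error
    is bounded by half a quantization step.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1                 # e.g., 255 for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    dequantized = [lo + qi * scale for qi in q]  # what the decoder reconstructs
    return q, dequantized

w = [0.0, 0.1, -0.2, 0.5]
q, w_hat = quantize_weights(w, num_bits=8)
# max reconstruction error stays within half a quantization step
assert max(abs(a - b) for a, b in zip(w, w_hat)) <= 0.5 * (0.5 - (-0.2)) / 255 + 1e-12
```

Evaluating such a scheme then means measuring the downstream task metric (classification accuracy, matching rate, PSNR) on the dequantized weights, as described in the evaluation above.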

Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.

Low Complexity Enhancement Video Coding

MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development

MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.

The target is to achieve:

  • coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
  • coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.

This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as starting point and test model for the standard. The new standard is expected to become part of the MPEG-5 suite of codecs and its development is expected to be completed in 2020.

Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG basically addressing requirements and needs going beyond HEVC. As usual, research mainly focuses on compression efficiency, but a general trend favoring software-based solutions over pure hardware coding tools is probably observable in video coding. As such, complexity at both encoder and decoder is becoming important, as is power efficiency; these are additional factors to be taken into account. Other issues are related to business aspects, which are typically discussed elsewhere, e.g., here.

Point Cloud Compression

MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage

MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is especially appropriate for sparse point clouds.

MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.

G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality achieves ratios of up to 35:1.
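The core of geometry-based coding is an octree over the point cloud: each node stores an 8-bit occupancy code, one bit per child octant that contains points, and only occupied children are subdivided further. A single level of that recursion might look like the sketch below; this is a conceptual illustration, not G-PCC's actual syntax or entropy coding.

```python
def octree_occupancy(points, origin, size):
    """Compute the 8-bit occupancy code of one octree node.

    points: iterable of (x, y, z) inside the cube [origin, origin + size)^3.
    Bit k is set if child octant k contains at least one point.
    """
    half = size / 2
    code = 0
    for x, y, z in points:
        # Octant index from one comparison per axis (x -> bit 0, y -> bit 1, z -> bit 2).
        octant = (int(x >= origin[0] + half)
                  | int(y >= origin[1] + half) << 1
                  | int(z >= origin[2] + half) << 2)
        code |= 1 << octant
    return code
```

Because sparse clouds leave most octants empty, the occupancy codes are highly compressible, which is exactly why this representation suits sparse point clouds representing large environments.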

Research aspects: After V-PCC, MPEG has now promoted G-PCC to CD, and, in principle, the same research aspects as discussed here are relevant. Thus, coding efficiency is the number-one performance metric, but coding complexity and power consumption also need to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically at ACM MM and ACM MMSys.

MPEG Media Transport (MMT)

MPEG approves 3rd Edition of Final Draft International Standard

MMT 3rd edition will introduce two aspects:

  • enhancements for mobile environments and
  • support of Contents Delivery Networks (CDNs).

The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.

Additionally, support for intelligent network entities involved in media services (i.e., Media Aware Network Entities (MANEs)) will make MMT-based services adapt to changes of the mobile network faster and better. As support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource update, and media request are being added in this edition.

Ongoing developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support of FCAST in the context of MMT.

Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing existing/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question, however, remains: what is the next big thing in multimedia delivery/transport? Currently, we are certainly in a phase where tools like HTTP adaptive streaming (HAS) have reached maturity, and the multimedia research community is eager to work on new topics in this domain.

MPEG Column: 125th MPEG Meeting in Marrakesh, Morocco

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:

  • Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
  • 3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
  • MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
  • ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
  • MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description

The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as the incurred media processing requires offloading complex media processing operations to the cloud/network to keep receiver hardware simple and power consumption low. Therefore, the NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (plus support for 5G core and edge computing) and (ii) a workflow manager to enable the composition of multiple media processing tasks (i.e., processing incoming media and metadata from a media source and producing processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard has now reached Committee Draft (CD) stage and its final milestone is targeted for early 2020.
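Conceptually, the workflow manager composes declared tasks between a media source and a media sink. The toy sketch below uses an invented dict-based description and made-up task names purely for illustration; the actual NBMP workflow description format and task APIs are defined in the standard itself.

```python
def run_workflow(workflow, media):
    """Apply each declared task of a workflow in order (source -> tasks -> sink)."""
    for task_name in workflow["tasks"]:
        media = TASK_REGISTRY[task_name](media)
    return media

# Illustrative task registry: each task transforms a media descriptor.
TASK_REGISTRY = {
    "transcode": lambda m: {**m, "codec": "hevc"},
    "package":   lambda m: {**m, "container": "cmaf"},
}

workflow = {"tasks": ["transcode", "package"]}  # declarative task composition
out = run_workflow(workflow, {"codec": "avc", "container": None})
assert out == {"codec": "hevc", "container": "cmaf"}
```

The declarative description is the key design point: the provider states *what* processing should happen, and the workflow manager decides *where* in the cloud/edge infrastructure each task runs.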

In particular, a standard like NBMP might come in handy in the context of 5G in combination with mobile edge computing (MEC), which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context, conditions, and needs, hopefully leading to better quality and user experience.

For further research aspects, please see one of my previous posts.

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on such a new video coding standard, which should have two profiles: (i) a royalty-free baseline profile and (ii) a main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or else switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there’s the need to support AVC, HEVC, VP9, and AV1; soon VVC will become important. In other words, we already have a multi-codec environment to support and one might argue one more codec is probably not a big issue. The main benefit of EVC will be a royalty-free baseline profile but with AV1 there’s already such a codec available and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will be also interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).
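Of the objective metrics mentioned, PSNR is the simplest to state; a minimal implementation over flattened sample lists is shown below (SSIM and VMAF require considerably more machinery).

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")   # identical signals: distortion-free
    return 10 * math.log10(max_value ** 2 / mse)
```

In codec comparisons, per-frame PSNR values like this are typically aggregated across rate points into Bjøntegaard-delta (BD-rate) figures, which is how percentage bit rate savings such as the 24% above are usually reported.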

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as basis for many file (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files are available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.