About Christian Timmerer

Christian Timmerer is a researcher, entrepreneur, and teacher on immersive multimedia communication, streaming, adaptation, and Quality of Experience. He is an Assistant Professor at Alpen-Adria-Universität Klagenfurt, Austria. Follow him on Twitter at http://twitter.com/timse7 and subscribe to his blog at http://blog.timmerer.com.

MPEG Column: 126th MPEG Meeting in Geneva, Switzerland

By Christian Timmerer | May 22, 2019 - 08:55 |July 6, 2019 0219, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:

Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies

The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126

Three Degrees of Freedom Plus (3DoF+)

MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video

MPEG’s support for 360-degree video — also referred to as omnidirectional video — is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). It basically enables the utilization of the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content using a head mounted display (HMD). However, rendering flat 360-degree video may generate visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.

At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.

Research aspects: Subjective evaluations in the context of 3DoF+ but also immersive media services in general are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX) resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research and standards developing organizations (SDOs) and industry fora which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (and also other SDOs) often organize public workshops and there will be one during the next meeting, specifically on July 10, 2019 in Gothenburg, Sweden which will be about “Coding Technologies for Immersive Audio/Visual Experiences”. Further details are available here.

Neural Network Compression for Multimedia Applications

MPEG evaluates responses to the Call for Proposal and kicks off its technical work

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to a number of clients using them in applications (e.g., mobile phones, smart cameras) requires compressed representation of neural networks.

At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for Neural Network Compression. These technologies address compressing neural network parameters in order to reduce their size for transmission and the efficiency of using them, while not or only moderately reducing their performance in specific multimedia applications.

After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).

Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.

Low Complexity Enhancement Video Coding

MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development

MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.

The target is to achieve:

coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.

This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as starting point and test model for the standard. The new standard is expected to become part of the MPEG-5 suite of codecs and its development is expected to be completed in 2020.

Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG basically addressing requirements and needs going beyond HEVC. As usual, research mainly focuses on compression efficiency but a general trend in video coding is probably observable that favors software-based solutions rather than pure hardware coding tools. As such, complexity — both at encoder and decoder — is becoming important as well as power efficiency which are additional factors to be taken into account. Other issues are related to business aspects which are typically discussed elsewhere, e.g., here.

Point Cloud Compression

MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage

MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is appropriate especially for sparse point clouds.

MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.

G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease to capture and render spatial information compared to other volumetric video representations makes point clouds increasingly popular to present immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio up to 10:1 and acceptable quality lossy coding of ratio up to 35:1.

Research aspects: After V-PCC MPEG has now promoted G-PCC to CD but, in principle, the same research aspects are relevant as discussed here. Thus, coding efficiency is the number one performance metric but also coding complexity and power consumption needs to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically ACM MM and ACM MMSys.

MPEG Media Transport (MMT)

MPEG approves 3rd Edition of Final Draft International Standard

MMT 3rd edition will introduce two aspects:

enhancements for mobile environments and
support of Contents Delivery Networks (CDNs).

The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.

Additionally, support for intelligent network entities involved in media services (i.e., Media Aware Network Entity (MANE)) will make MMT-based services adapt to changes of the mobile network faster and better. Understanding the support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource update, and media request is being added in this edition.

On going developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support of FCAST in the context of MMT.

Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing exiting/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question, however, remains, namely what is the next big thing in multimedia delivery/transport as currently we are certainly in a phase where tools like adaptive HTTP streaming (HAS) reached maturity and the multimedia research community is eager to work on new topics in this domain.

MPEG Column: 125th MPEG Meeting in Marrakesh, Morocco

By Christian Timmerer | April 12, 2019 - 21:38 |April 13, 2019 0119, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:

Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description

The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as the incurred media processing requires offloading complex media processing operations to the cloud/network to keep receiver hardware simple and power consumption low. Therefore, NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (+ support for 5G core and edge computing) and (ii) a workflow manager to enable composition of multiple media processing tasks (i.e., process incoming media and metadata from a media source and produce processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard now reached Committee Draft (CD) stage and final milestone is targeted for early 2020.

In particular, a standard like NBMP might become handy in the context of 5G in combination with mobile edge computing (MEC) which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context conditions and needs, hopefully leading to a better quality and user experience.

For further research aspects please see one of my previous posts.

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on such a new video coding standard that should have two profiles: (i) royalty-free baseline profile and (ii) main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or else switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there’s the need to support AVC, HEVC, VP9, and AV1; soon VVC will become important. In other words, we already have a multi-codec environment to support and one might argue one more codec is probably not a big issue. The main benefit of EVC will be a royalty-free baseline profile but with AV1 there’s already such a codec available and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will be also interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as basis for many file (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files are available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.

Solving Complex Issues through Immersive Narratives — Does QoE Play a Role?

By Christian Timmerer | April 11, 2019 - 09:27 |April 13, 2019 0119, Feature, QoE Column

Leave a comment

Introduction

A transdisciplinary dialogue and innovative research, including technical and artistic research as well as digital humanities are necessary to solve complex issues. We need to support and produce creative practices, and engage in a critical reflection about the social and ethical dimensions of our current technology developments. At the core is an understanding that no single discipline, technology, or field can produce knowledge capable of addressing the complexities and crises of the contemporary world. Moreover, we see the arts and humanities as critical tools for understanding this hyper-complex, mediated, and fragmented global reality. As a use case, we will consider the complexity of extreme weather events, natural disasters and failure of climate change mitigation and adaptation, which are the risks with the highest likelihood of occurrence and largest global impact (World Economic Forum, 2017). Through our project, World of Wild Waters (WoWW), we are using immersive narratives and gamification to create a simpler holistic understanding of cause and effect of natural hazards by creating immersive user experiences based on real data, realistic scenarios and simulations. The objective is to increase societal preparedness for a multitude of stakeholders. Quality of Experience (QoE) modeling and assessment of immersive media experiences are at the heart of the expected impact of the narratives, where we would expect active participation, engagement and change, to play a key role [1].

Here, we present our views of immersion and presence in light of Quality of Experience (QoE). We will discuss the technical and creative considerations needed for QoE modeling and assessment of immersive media experiences. Finally, we will provide some reflections on QoE being an important building block in immersive narratives in general, and especially towards considering Extended Realities (XR) as an instantiation of Digital storytelling.

But what is Immersion and an Immersive Media Experience?

Immersion and immersive media experiences are commonly used terms in industry and academia today to describe new digital media. However, there is a gap in definitions of the term between the two worlds that can lead to confusions. This gap needs to be filled for XR to become a success and finally hit the masses, and not simply vanish as it has done so many times before since the invention of VR in 1962 by Morton Heilig (The Sensorama, or «Experience Theatre»). Immersion, thus far, can be plainly put as submersion in a medium (representational, fictional or simulated). It refers to a sense of belief, or the suspension of disbelief, while describing the experience/event of being surrounded by an environment (artificial, mental, etc.). This view is contrasted by a data-oriented view often used by technophiles who regard immersion as a technological feat that ensures a multimodal sensory input to the user [2]. This is the objective description, which views immersion as quantifiable afforded or offered by the system (computer and head-mounted display (HMD), in this case).

Developing immersion on these lines risks favoring the typology of spatial immersion while alienating the rest (phenomenological, narrative, tactical, pleasure, etc.). This can be seen in recent VR applications that propel high-fidelity, low-latency, and precision-tracking products that aim to simulate the exactitude of sensorial information (visual, auditory, haptic) available in the real world to make the experience as ‘real’ as possible – a sense of realness, that is not necessarily immersive [3].

Another closely related phenomenon is that of presence, shortened from its original 1980’s form of telepresence [3]. It is a core phenomenon for immersive technologies describing an engagement via technology where one feels as oneself, even though physically removed. This definition was later appropriated for simulated/virtual environments where it was described as a “feeling of being transported” into the synthetic/artificial space of a simulated environment. It is for this reason that presence, a subjective sensation, is most often associated with spatial immersion. A renewed interest in presence research has invited fresh insights into conceptualizing presence.

Based on the technical or system approach towards immersion, we can refer to immersive media experiences through the definitions given in in Figure 1.

Figure 1. Evolution of current immersive media experiences

Figure 1. Definitions of current immersive media experiences

Much of the media considered today still consists of audio and visual presentations, but now enriched by new functionality such as 360 view, 3D and enabling interactivity. The ultimate goals are to create immersive media experiences by digitally creating real world presence by using available media technology and optimizing the experience as perceived by the participant [4].

Immersive Narratives for Solving Complex issues

The optimized immersive experience can be used in various domains to help solve complex issues by narration or gamification. Through World of Wild Waters (WoWW) we aim to focus on immersive narration and gamification of natural hazards. The project focuses on implication of immersive storytelling for disaster management by depicting extreme weather events and natural disasters. Immersive media experiences can present XR solutions for natural hazards by simulating real time data and providing people with a hands-on experience of how it feels to face an unexpected disaster. Immersive narratives can be used to allow people to be better prepared by experiencing the effects of different emergency scenarios while in a safe environment. However, QoE modeling and assessment for serious immersive narrations is a challenge and one need to carefully combine immersion, media technology and end user experiences for solving such complex issues.

Does QoE Play a Role?

Current state-of-the-art (SOTA) in immersive narratives from a technology point of view is by implementing virtual experience through Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR), commonly referred to as eXtended Realities (XR) seen as XR. Discussing the SOTA of XR is challenging as it exists across a large number of companies and sectors in form of fragmented domain specific products and services, and is changing from quarter to quarter. The definitions of immersion and presence differ, however, it is important to raise awareness of its generic building blocks to start a discussion on the way to move forward. The most important building blocks are the use of digital storytelling in the creation of the experience and the quality of the final experiences as perceived by the participants.

XR relies heavily on immersive narratives, stories where the experiences surround you providing a sense of realness as well as a sense of being there. Following Mel Slaters platform for VR [5], immersion consists of three parts:

the concrete technical system for production,
the illusions we are addressing and
the resulting experience as interpreted by the participant.

The illusions part of XR play on providing a sense of being in a different place, which through high quality media makes us perceive that this is really happening (plausibility). Providing a high-quality experience eventually make us feel as participants in the story (agency). Finally, by feeling we are really participating in the experience, we get body ownership in this place. To be able to achieve these high-quality future media technology experiences we need new work processes and work flows for immersive experiences, requiring a vibrant connection between artists, innovators and technologists utilizing creative narratives and interactivity. To validate their quality and usefulness and ultimately business success, we need to focus on research and innovation within quality modeling and assessment making it possibly for the creators to iteratively improve the performance of their XR experience.

A transdisciplinary approach to immersive media experiences amplifies the relevance of content. Current QoE models predominantly treat content as a system influence factor, which allows for evaluations limited to its format, i.e., nature (e.g., image, sound, motion, speech, etc.) and type (e.g., analog or digital). Such a definition seems insufficient given how much the overall perceptual quality of such media is important. With technologies becoming mainstream, there is a global push for engaging content. Successful XR applications require strong content to generate, and retain, interest. One-time adventures, such as rollercoaster rides, are now deal breakers. With technologies, users too have matured, as the novelty factor of such media diminishes so does the initial preoccupation with interactivity and simulations. Immersive experiences must rely on content for a lasting impression.

However, the social impact of this media saturated reality is yet to be completely understood. QoE modeling and assessment and business models are evolving as we see more and more experiences being used commercially. However, there is still a lot of work to be done in the fields of the legal, ethical, political, health and cultural domains.

Conclusion

Immersive media experiences make a significant impact on the use and experience of new digital media through new and innovative approaches. These services are capable of establishing advanced transferable and sustainable best practices, specifically in art and technology, for playful and liveable human centered experiences solving complex problems. Further, the ubiquity of such media is changing our understanding for mediums as they form liveable environments that envelop our lives as a whole. The effects of these experiences are challenging our traditional concepts of liveability, which is why it is imperative for us to approach them as a paradigmatic shift in the civilizational project. The path taken should merge work on the technical aspects (systems) with the creative considerations (content).

Reference and Bibliography Entries

[1] Le Callet, P., Möller, S. and Perkis, A., 2013. Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003). Version 1.2. Mar-2013. [URL]

[2] Perrin, A.F.N.M., Xu, H., Kroupi, E., Řeřábek, M. and Ebrahimi, T., 2015, October. Multimodal dataset for assessment of quality of experience in immersive multimedia. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 1007-1010). ACM. [URL]

[3] Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., Farcet, N., Frécon, E., Harvey, J., Kuijpers, N. and Magnenat-Thalmann, N., 1999. The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators & Virtual Environments, 8(2), pp.218-236. [URL]

[4] A. Perkis and A. Hameed, “Immersive media experiences – what do we need to move forward?,” SMPTE 2018, Westin Bonaventure Hotel & Suites, Los Angeles, California, 2018, pp. 1-12.
doi: 10.5594/M001846

[5] M. Slater, MV Sanchez-Vives, “Enhancing Our Lives with Immersive Virtual Reality”, Frontiers in Robotics and AI, 2016 – frontiersin.org

Note from the Editors:

Quality of Experience (QoE) in the context of immersive media applications and services are gaining momentum as such apps/services become available. Thus, it requires a deep integrated understanding of all involved aspects and corresponding scientific evaluations of the various dimensions (including but not limited to reproducibility). Therefore, the interested reader is referred to QUALINET and QoMEX, specifically QoMEX2019 which play a key role in this exciting application domain.

MPEG Column: 124th MPEG Meeting in Macau, China

By Christian Timmerer | December 16, 2018 - 11:12 |December 21, 2018 0418, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The MPEG press release comprises the following aspects:

Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage
Compressed Representation of Neural Networks – MPEG issues Call for Proposals
Low Complexity Video Coding Enhancements – MPEG issues Call for Proposals
New Video Coding Standard expected to have licensing terms timely available – MPEG issues Call for Proposals
Multi-Image Application Format (MIAF) promoted to Final Draft International Standard
3DoF+ Draft Call for Proposal goes Public

Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage

At its 124th meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to Committee Draft (CD) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colour. By leveraging existing and video ecosystems in general (hardware acceleration, transmission services and infrastructure), and future video codecs as well, the V-PCC technology enables new applications. The current V-PCC encoder implementation provides a compression of 125:1, which means that a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality.

A next step is the storage of V-PCC in ISOBMFF for which a working draft has been produced. It is expected that further details will be discussed in upcoming reports.

Research aspects: Video-based Point Cloud Compression (V-PCC) is at CD stage and a first working draft for the storage of V-PCC in ISOBMFF has been provided. Thus, a next consequence is the delivery of V-PCC encapsulated in ISOBMFF over networks utilizing various approaches, protocols, and tools. Additionally, one may think of using also different encapsulation formats if needed.

MPEG issues Call for Proposals on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, and many other fields. Their recent success is based on the feasibility of processing much larger and complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. Some applications require the deployment of a particular trained network instance to a potentially large number of devices and, thus, could benefit from a standard for the compressed representation of neural networks. Therefore, MPEG has issued a Call for Proposals (CfP) for compression technology for neural networks, focusing on the compression of parameters and weights, focusing on four use cases: (i) visual object classification, (ii) audio classification, (iii) visual feature extraction (as used in MPEG CDVA), and (iv) video coding.

Research aspects: As point out last time, research here will mainly focus around compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms may become relevant in the (near) future.

MPEG issues Call for Proposals on Low Complexity Video Coding Enhancements

Upon request from the industry, MPEG has identified an area of interest in which video technology deployed in the market (e.g., AVC, HEVC) can be enhanced in terms of video quality without the need to necessarily replace existing hardware. Therefore, MPEG has issued a Call for Proposals (CfP) on Low Complexity Video Coding Enhancements.

The objective is to develop video coding technology with a data stream structure defined by two component streams: a base stream decodable by a hardware decoder and an enhancement stream suitable for software processing implementation. The project is meant to be codec agnostic; in other words, the base encoder and base decoder can be AVC, HEVC, or any other codec in the market.

Research aspects: The interesting aspect here is that this use case assumes a legacy base decoder – most likely realized in hardware – which is enhanced with software-based implementations to improve coding efficiency or/and quality without sacrificing capabilities of the end user in terms of complexity and, thus, energy efficiency due to the software based solution.

MPEG issues Call for Proposals for a New Video Coding Standard expected to have licensing terms timely available

At its 124th meeting, MPEG issued a Call for Proposals (CfP) for a new video coding standard to address combinations of both technical and application (i.e., business) requirements that may not be adequately met by existing standards. The aim is to provide a standardized video compression solution which combines coding efficiency similar to that of HEVC with a level of complexity suitable for real-time encoding/decoding and the timely availability of licensing terms.

Research aspects: This new work item is more related to business aspects (i.e., licensing terms) than technical aspects of video coding.

Multi-Image Application Format (MIAF) promoted to Final Draft International Standard

The Multi-Image Application Format (MIAF) defines interoperability points for creation, reading, parsing, and decoding of images embedded in High Efficiency Image File (HEIF) format by (i) only defining additional constraints on the HEIF format, (ii) limiting the supported encoding types to a set of specific profiles and levels, (iii) requiring specific metadata formats, and (iv) defining a set of brands for signaling such constraints including specific depth map and alpha plane formats. For instance, it addresses use case like a capturing device may use one of HEIF codecs with a specific HEVC profile and level in its created HEIF files, while a playback device is only capable of decoding the AVC bitstreams.

Research aspects: MIAF is an application format which is defined as a combination of tools (incl. profiles and levels) of other standards (e.g., audio codecs, video codecs, systems) to address the needs of a specific application. Thus, the research is related to use cases enabled by this application format.

3DoF+ Draft Call for Proposal goes Public

Following investigations on the coding of “three Degrees of Freedom plus” (3DoF+) content in the context of MPEG-I, the MPEG video subgroup has provided evidence demonstrating the capability to encode a 3DoF+ content efficiently while maintaining compatibility with legacy HEVC hardware. As a result, MPEG decided to issue a draft Call for Proposal (CfP) to the public containing the information necessary to prepare for the final Call for Proposal expected to occur at the 125th MPEG meeting (January 2019) with responses due at the 126th MPEG meeting (March 2019).

Research aspects: This work item is about video (coding) and, thus, research is about compression efficiency.

What else happened at #MPEG124?

MPEG-DASH 3rd edition is still in the final editing phase and not yet available. Last time, I wrote that we expect final publication later this year or early next year and we hope this is still the case. At this meeting Amendment.5 is progressed to DAM and conformance/reference software for SRD, SAND and Server Push is also promoted to DAM. In other words, DASH is pretty much in maintenance mode.
MPEG-I (systems part) is working on immersive media access and delivery and I guess more updates will come on this after the next meeting. OMAF is working on a 2nd edition for which a working draft exists and phase 2 use cases (public document) and draft requirements are discussed.
Versatile Video Coding (VVC): working draft 3 (WD3) and test model 3 (VTM3) has been issued at this meeting including a large number of new tools. Both documents (and software) will be publicly available after editing periods (Nov. 23 for WD3 and Dec 14 for VTM3).

MPEG Column: 123rd MPEG Meeting in Ljubljana, Slovenia

By Christian Timmerer | September 14, 2018 - 11:16 |October 13, 2018 0318, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

MPEG issues Call for Evidence on Compressed Representation of Neural Networks
Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
MPEG releases software for MPEG-I visual activities
MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation and many other fields. Their recent success is based on the feasibility of processing much larger and complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in a quite large size (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a larger number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case, in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.

At its 123rd meeting, MPEG has issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, and memory consumption and the impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Visual Analysis) and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).

Research aspects: As this is about “compression” of structured data, research aspects will mainly focus around compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms may become relevant in the (near) future. Furthermore, additional use cases should be communicated towards MPEG until the next meeting.

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, or viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by the entities in the networks. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and makes these NBMP services accessible through Application Programming Interfaces (APIs).

NBMP will support the existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. MPEG issued a Call for Proposal (CfP) seeking technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.

After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.

Research aspects: I reported about NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123nd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along X, Y and Z axes), and 6DoF (3DoF with full translational movements along X, Y and Z axes) experiences but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.

Research aspects: ISO/IEC technical reports are typically publicly available and provides informative descriptions of what the standard is about. In MPEG-I this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)) and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest for the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six degrees of freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+ with slight body and head movements in a sitting position to 6DoF allowing some walking steps from a central position. At the 123nd MPEG meeting, an important progress has been achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, allowing to synthesize virtual viewpoints from an unlimited number of input views. RVS integrates code bases from Universite Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+ expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).

Research aspects: MPEG does not only produce text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. Thus, it is very much appreciated to have a new Reference View Synthesizer (RVS 2.0) and Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available which enables interoperability and reproducibility of R&D efforts/results in this area.

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF has reached the first milestone. Amendment 2 to ISO/IEC 14496-12 6th edition will add the option to have relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files and will allow creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment reached its first milestone is the first amendment to ISO/IEC 23001-7 3rd edition. It will allow use of multiple keys to a single sample and scramble some parts of AVC or HEVC video bitstreams without breaking conformance to the existing decoders. That is, the bitstream will be decodable by existing decoders, but some parts of the video will be scrambled. It is expected that these amendments will reach the final milestone in Q3 2019.

Research aspects: The ISOBMFF reference software is now available on Github, which is a valuable service to the community and allows for active standard’s participation even from outside of MPEG. It is recommended that interested parties have a look at it and consider contributing to this project.

What else happened at #MPEG123?

The MPEG-DASH 3rd edition is finally available as output document (N17813; only available to MPEG members) combining 2nd edition, four amendments, and 2 corrigenda. We expect final publication later this year or early next year.
There is a new DASH amendment and corrigenda items in pipeline which should progress to final stages also some time next year. The status of MPEG-DASH (July 2018) can be seen below.

MPEG received a rather interesting input document related to “streaming first” which resulted into a publicly available output document entitled “thoughts on adaptive delivery and access to immersive media”. The key idea here is to focus on streaming (first) rather than on file/encapsulation formats typically used for storage (and streaming second). This document should become available here.
Since a couple of meetings, MPEG maintains a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It definitely worth keeping this in mind when defining/updating your own roadmap.
JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG internal but JVET-documents are publicly accessible here: http://phenix.it-sudparis.eu/jvet/. An interesting aspect is that VTM2/WD2 should have >20% rate reduction compared to HEVC, all with reasonable complexity and the next benchmark set (BMS) should have close to 30% rate reduction vs. HEVC. Further improvements expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
In addition to VVC — which is a joint activity with VCEG –, MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low complexity enhancement video codec. Both topics will be further discussed within respective Ad-hoc Groups (AhGs) and further details are available here.
Finally, MPEG established an Ad-hoc Group (AhG) dedicated to the long-term planning which is also looking into application areas/domains other than media coding/representation.

In this context it is probably worth mentioning the following DASH awards at recent conferences

June 2018, DASH-IF awarded Excellence in DASH award at ACM MMSys 2018
July 2018, DASH-IF awarded Grand Challenge on Dynamic Adaptive Streaming over HTTP at IEEE ICME 2018

Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting

Quality of Experience Column: An Introduction

By Christian Timmerer | September 13, 2018 - 19:02 |October 13, 2018 0318, Feature, QoE Column

Leave a comment

“Quality of Experience (QoE) is the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and / or enjoyment of the application or service in the light of the user’s personality and current state.“ (Definition from the Qualinet Whitepaper 2013).

Research on Quality of Experience (QoE) has advanced significantly in recent years and attracts attention from various stakeholders. Different facets have been addressed by the research community like subjective user studies to identify QoE influence factors for particular applications like video streaming, QoE models to capture the effects of those influence factors on concrete applications, QoE monitoring approaches at the end user site but also within the network to assess QoE during service consumption and to provide means for QoE management for improved QoE. However, in order to progress in the area of QoE, new research directions have to be taken. The application of QoE in practice needs to consider the entire QoE eco-system and the stakeholders along the service delivery chain to the end user.

The term Quality of Experience dates back to a presentation in 2001 (interestingly, at a Quality of Service workshop) and Figure 1 depicts an overview of QoE showing some of the influence factors.

Figure 1. Quality of Experience (from Ebrahimi’09)

Different communities have been very active in the context of QoE. A long-established community is Qualinet which started in 2010. The Qualinet community (www.qualinet.eu) provided a definition of QoE in its [Qualinet Whitepaper] which is a contribution of the European Network on Quality of Experience in Multimedia Systems and Services, Qualinet (COST Action IC 1003), to the scientific discussion about the term QoE and its underlying concepts. The concepts and ideas cited in this paper mainly refer to the Quality of Experience of multimedia communication systems, but may be helpful also for other areas where QoE is an issue. Qualinet is organized in different task forces which address various research topics: Managing Web and Cloud QoE; Gaming; QoE in Medical Imaging and Healthcare; Crowdsourcing; Immersive Media Experiences (IMEx). There is also a liaison relation with VQEG and a task force on Qualinet Databases providing a platform with QoE-related dataset. The Qualinet database (http://dbq.multimediatech.cz/) is seen as a key for current and future developments in Quality of Experience, which resides in a rich and internationally recognized database of content of different sorts, and to share such a database with the scientific community at large.

Another example of the Qualinet activities is the Crowdsourcing task force. The goal of this task force is among others to identify the scientific challenges and problems for QoE assessment via crowdsourcing but also the strengths and benefits, and to derive a methodology and setup for crowdsourcing in QoE assessment including statistical approaches for proper analysis. Crowdsourcing is a popular approach that outsources tasks via the Internet to a large number of users. Commercial crowdsourcing platforms provide a global pool of users employed for performing short and simple online tasks. For quality assessment of multimedia services and applications, crowdsourcing enables new possibilities by moving the subjective test into the crowd resulting in larger diversity of the test subjects, faster turnover of test campaigns, and reduced costs due to low reimbursement costs of the participants. Further, crowdsourcing allows easily addressing additional features like real-life environments. Crowdsourced quality assessment however is not a straightforward implementation of existing subjective testing methodologies in an Internet-based environment. Additional challenges and differences to lab studies occur, in conceptual, technical, and motivational areas. The white paper [Crowdsourcing Best Practices] summarizes the recommendations and best practices for crowdsourced quality assessment of multimedia applications from the Qualinet Task Force on “Crowdsourcing” and is also discussed within the standardization ITU-T P.CROWD.

A selection of QoE related communities is provided in the following to give an overview on the pervasion of QoE in research.

Qualinet (http://www.qualinet.eu): European Network on Quality of Experience in Multimedia Systems and Services as outlined above. Qualinet is also technical sponsor of QoMEX.
QoMEX (http://qomex.org/). The International Conference on Quality of Multimedia Experience (QoMEX) is a top-ranked international conference and among the twenty-best conferences in Google Scholar for subcategory Multimedia. In 2019, the 11th International Conference on Quality of Multimedia Experience will be held in June 5th to 7th, 2019 in Berlin, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, quality of experience (QoE) and user experience (UX). This way, it will contribute towards an integrated view on QoE and UX, and foster the exchange between the so-far distinct communities.
ACM SIGMM (http://www.sigmm.org/): Within the ACM community, QoE plays also a significant role in the major events like ACM Multimedia (ACM MM), where “Experience” is one of the four major themes. ACM Multimedia Systems (MMSys) regularly publishes works on QoE, and included special sessions on those topics in the last years. ACM MMsys 2019 will held from June 18 – 21, 2019 in Amherst, Massachusetts, USA.
ICME: The IEEE International Conference on Multimedia and Expo (IEEE ICME 2019) will be held from July 8-12, 2019 in Shanghai, China. It includes in the call for papers topics such as Multimedia quality assessment and metrics, and Multi-modal media computing and human-machine interaction.
ACM SIGCOMM (http://www.sigcomm.com): Within ACM SIGCOMM, Internet-QoE workshops have been initiated in 2016 and 2017. The focus of the last edition was on QoE Measurements, QoE-based Traffic Monitoring and Analysis, QoE-based Network Management.
Tracking QoE in the Internet Workshop: A summary and the outcomes of the “Workshop on Tracking Quality of Experience in the Internet” at Princeton gives a very good impression on the QoE activities in US with a recent focus on QoE monitoring and measurable QoE parameters in the presence of constraints like encryption.
SPEC RG QoE (https://research.spec.org): The mission of SPEC’s Research Group (RG) is to promote innovative research in the area of quantitative system evaluation and analysis by serving as a platform for collaborative research efforts fostering the interaction between industry and academia in the field. The SPEC research group on QoE is the starting point for the release of QoE ideas, QoE approaches, QoE measurement tools, and QoE assessment paradigms.
QoENet (http://www.qoenet-itn.eu) is a Marie Curie project, whose focus is the analysis, design, optimization and management of the QoE in advanced multimedia services, creating a fully-integrated and multi-disciplinary network of 12 Early Stage Researchers working in and seconded by 7 academic institutions, 3 private companies and 1 standardization institute distributed in 6 European countries and in South Korea. The project is then fulfilling the major objective of training through research of the young fellows to broader the knowledge in the field of the new generation of researchers. Significant research results have been achieved in the field of: QoE for online gaming, social TV and storytelling, and adaptive video streaming; QoE management in collaborative ISP/OTT scenarios; models for HDR, VR/AR and 3D images and videos.
Many QoE-related activities at a national level are also happening. For example, a community of professors and researchers from Spain organize a yearly workshop entitled “QoS and QoE in Multimedia Communications” since 2015 (URL of its latest edition: https://bit.ly/2LSlb2N). This community is targeted at establishing collaborations, sharing resources, and discussing about the latest contributions and open issues. The community is also pursuing the creation of a national network on QoE (like the Spanish Qualinet), and then involving international researchers in that network.
There are several standardization-related activities ongoing e.g. in standardization groups ITU, JPEG, MPEG, VQEG. Their specific interest in QoE will be summarized in one of the upcoming QoE columns.

The first QoE column will discuss how to approach an integrated view of QoE and User Experience. While research on QoE has mostly been carried out in the area of multimedia communications, user experience (UX) has addressed hedonic and pragmatic usage aspects of interactive applications. In the case of QoE, the meaningfulness of the application to the user and the forces driving the use have been largely neglected, while in the UX field, respective research has been carried out but hardly been incorporated in a model combined with the pragmatic and hedonic aspects. In the first column will be dedicated to recent ideas “Toward an integrated view of QoE and User Experience”. To give the readers an impression on the expected contents, we foresee in the upcoming QoE columns topics to discuss about recent activities like

Point cloud subjective evaluation methodology
Complex, interactive narrative design for complexity
Large-Scale Visual Quality Assessment Databases
Status and upcoming QoE activities in standardization
Active Learning and Machine Learning for subjective testing and QoE modeling
QoE in 5G: QoE management in softwarized networks with big data analytics
Immersive Media Experiences e.g. for VR/AR/360° video applications

Our aim for SIGMM Records is to share insights from the QoE community and to highlight recent development, new research directions, but also lessons learned and best practices. If you are interested in writing for the QoE column, or have something you would like to know more about in this area, please do not hesitate to contact the editors. The SIGMM Records editors responsible for QoE are active in different communities and QoE research directions.

The QoE column is edited by Tobias Hoßfeld and Christian Timmerer.

[Qualinet Whitepaper] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.” Qualinet_QoE_whitepaper_v1.2

[Crowdsourcing Best Practices] Tobias Hoßfeld et al. “Best Practices and Recommendations for Crowdsourced QoE-Lessons learned from the Qualinet Task Force ‘Crowdsourcing’” (2014). Qualinet_CSLessonsLearned_29Oct2014

Tobias Hoßfeld is full professor at the University of Würzburg, Chair of Communication Networks, and is active in QoE research and teaching for more than 10 years. He finished his PhD in 2009 and his professorial thesis (habilitation) “Modeling and Analysis of Internet Applications and Services” in 2013 at the University of Würzburg. From 2014 to 2018, he was head of the Chair “Modeling of Adaptive Systems” at the University of Duisburg-Essen, Germany. He has published more than 100 research papers in major conferences and journals and received the Fred W. Ellersick Prize 2013 (IEEE Communications Society) for one of his articles on QoE. Among others, he is member of the advisory board of the ITC (International Teletraffic Congress), the editorial board of IEEE Communications Surveys & Tutorials and of Springer Quality and User Experience.

Christian Timmerer received his M.Sc. (Dipl.-Ing.) in January 2003 and his Ph.D. (Dr.techn.) in June 2006 (for research on the adaptation of scalable multimedia content in streaming and constrained environments) both from the Alpen-Adria-Universität (AAU) Klagenfurt. He joined the AAU in 1999 (as a system administrator) and is currently an Associate Professor at the Institute of Information Technology (ITEC) within the Multimedia Communication Group. His research interests include immersive multimedia communications, streaming, adaptation, Quality of Experience, and Sensory Experience. He was the general chair of WIAMIS 2008, QoMEX 2013, and MMSys 2016 and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, COST IC1003 QUALINET, and ICoSOLE. He also participated in ISO/MPEG work for several years, notably in the area of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH where he also served as standard editor. In 2012 he cofounded Bitmovin (http://www.bitmovin.com/) to provide professional services around MPEG-DASH where he holds the position of the Chief Innovation Officer (CIO).

MPEG Column: 122nd MPEG Meeting in San Diego, CA, USA

By Christian Timmerer | July 5, 2018 - 06:10 |July 14, 2018 0218, Event Report, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

MPEG122 Plenary, San Diego, CA, USA.

The MPEG press release comprises the following topics:

Versatile Video Coding (VVC) project starts strongly in the Joint Video Experts Team
MPEG issues Call for Proposals on Network-based Media Processing
MPEG finalizes 7th edition of MPEG-2 Systems Standard
MPEG enhances ISO Base Media File Format (ISOBMFF) with two new features
MPEG-G standards reach Draft International Standard for transport and compression technologies

Versatile Video Coding (VVC) – MPEG’ & VCEG’s new video coding project starts strong

The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include — but not limited to — 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Interestingly, some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well-beyond the targeted 50% for the final standard.

Research aspects: Compression tools and everything around it including its objective and subjective assessment. The main application area is clearly 360-degree and HDR. Watch out conferences like PCS and ICIP (later this year), which will be full of papers making references to VVC. Interestingly, VVC comes with a first draft, a test model for simulation experiments, and a technology benchmark set which is useful and important for any developments for both inside and outside MPEG as it allows for reproducibility.

MPEG issues Call for Proposals on Network-based Media Processing

This Call for Proposals (CfP) addresses advanced media processing technologies such as network stitching for VR service, super resolution for enhanced visual quality, transcoding, and viewport extraction for 360-degree video within the network environment that allows service providers and end users to describe media processing operations that are to be performed by the network. Therefore, the aim of network-based media processing (NBMP) is to allow end user devices to offload certain kinds of processing to the network. Therefore, NBMP describes the composition of network-based media processing services based on a set of media processing functions and makes them accessible through Application Programming Interfaces (APIs). Responses to the NBMP CfP will be evaluated on the weekend prior to the 123rd MPEG meeting in July 2018.

Research aspects: This project reminds me a lot about what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG targets APIs rather than pure metadata formats, which is a step forward into the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

7th edition of MPEG-2 Systems Standard and ISO Base Media File Format (ISOBMFF) with two new features

More than 20 years since its inception development of MPEG-2 systems technology (i.e., transport/program stream) continues. New features include support for: (i) JPEG 2000 video with 4K resolution and ultra-low latency, (ii) media orchestration related metadata, (iii) sample variance, and (iv) HEVC tiles.

The partial file format enables the description of an ISOBMFF file partially received over lossy communication channels. This format provides tools to describe reception data, the received data and document transmission information such as received or lost byte ranges and whether the corrupted/lost bytes are present in the file and repair information such as location of the source file, possible byte offsets in that source, byte stream position at which a parser can try processing a corrupted file. Depending on the communication channel, this information may be setup by the receiver or through out-of-band means.

ISOBMFF’s sample variants (2nd edition), which are typically used to provide forensic information in the rendered sample data that can, for example, identify the specific Digital Rights Management (DRM) client which has decrypted the content. This variant framework is intended to be fully compatible with MPEG’s Common Encryption (CENC) and agnostic to the particular forensic marking system used.

Research aspects: MPEG systems standards are mainly relevant for multimedia systems research with all its characteristics. The partial file format is specifically interesting as it targets scenarios with lossy communication channels.

MPEG-G standards reach Draft International Standard for transport and compression technologies

MPEG-G provides a set of standards enabling interoperability for applications and services dealing with high-throughput deoxyribonucleic acid (DNA) sequencing. At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to Draft International Standard (DIS) stage. Such parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport including streaming of genomic data by interoperable applications. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Research aspects: the main focus of this work item is compression and transport is still in its infancy. Therefore, research on the actual delivery for compressed DNA information as well as its processing is solicited.

What else happened at MPEG122?

Requirements is exploring new video coding tools dealing with low-complexity and process enhancements.
The activity around coded representation of neural networks has defined a set of vital use cases and is now calling for test data to be solicited until the next meeting.
The MP4 registration authority (MP4RA) has a new awesome web site http://mp4ra.org/.
MPEG-DASH is finally approving and working the 3rd edition comprising consolidated version of recent amendments and corrigenda.
CMAF started an exploration on multi-stream support, which could be relevant for tiled streaming and multi-channel audio.
OMAF kicked-off its activity towards a 2nd edition enabling support for 3DoF+ and social VR with the plan going to committee draft (CD) in Oct’18. Additionally, there’s a test framework proposed, which allows to assess performance of various CMAF tools. Its main focus is on video but MPEG’s audio subgroup has a similar framework to enable subjective testing. It could be interesting seeing these two frameworks combined in one way or the other.
MPEG-I architectures (yes plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closer at 3DoF+ defining common test conditions and a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF and compression of dense representation of light fields are ongoing and have been started, respectively.
Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools resulting into updated versions of the test model and working draft.

Research aspects: In this section I would like to focus on DASH, CMAF, and OMAF. Multi-stream support, as mentioned above, is relevant for tiled streaming and multi-channel audio which has been recently studied in the literature and is also highly relevant for industry. The efficient storage and streaming of such kind of content within the file format is an important aspect and often underrepresented in both research and standardization. The goal here is to keep the overhead low while maximizing the utility of the format to enable certain functionalities. OMAF now targets the social VR use case, which has been discussed in the research literature for a while and, finally, makes its way into standardization. An important aspect here is both user and quality of experience, which requires intensive subjective testing.

Finally, on May 10 MPEG will celebrate 30 years as its first meeting dates back to 1988 in Ottawa, Canada with around 30 attendees. The 122nd meeting had more than 500 attendees and MPEG has around 20 active work items. A total of more than 170 standards have been produces (that’s approx. six standards per year) where some standards have up to nine editions like the HEVC standards. Overall, MPEG is responsible for more that 23% of all JTC 1 standards and some of them showing extraordinary longevity regarding extensions, e.g., MPEG-2 systems (24 years), MPEG-4 file format (19 years), and AVC (15 years). MPEG standards serve billions of users (e.g., MPEG-1 video, MP2, MP3, AAC, MPEG-2, AVC, ISOBMFF, DASH). Some — more precisely five — standards have receive Emmy awards in the past (MPEG-1, MPEG-2, AVC (2x), and HEVC).

Thus, happy birthday MPEG! In today’s society starts the high performance era with 30 years, basically the time of “compression”, i.e., we apply all what we learnt and live out everything, truly optimistic perspective for our generation X (millennials) standards body!

MPEG Column: 121st MPEG Meeting in Gwangju, Korea

By Christian Timmerer | February 21, 2018 - 07:57 |March 23, 2018 0118, Event Report, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
MPEG-G standards reach Committee Draft for metadata and APIs
MPEG issues Calls for Visual Test Material for Immersive Applications
Internet of Media Things (IoMT) reaches Committee Draft level
MPEG finalizes its Media Orchestration (MORE) standard

At the end I will also briefly summarize what else happened with respect to DASH, CMAF, OMAF as well as discuss future aspects of MPEG.

Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level

The Committee Draft (CD) for CDVA has been approved at the 121st MPEG meeting, which is the first formal step of the ISO/IEC approval process for a new standard. This will become a new part of MPEG-7 to support video search and retrieval applications (ISO/IEC 15938-15).

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors which can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers, so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. The CDVA standard specifies descriptors that fulfil these needs and includes (i) the components of the CDVA descriptor, (ii) its bitstream representation and (iii) the extraction process. The final standard is expected to be finished in early 2019.

CDVA introduces a new descriptor based on features which are output from a Deep Neural Network (DNN). CDVA is robust against viewpoint changes and moderate transformations of the video (e.g., re-encoding, overlays), it supports partial matching and temporal localization of the matching content. The CDVA descriptor has a typical size of 2–4 KBytes per second of video. For typical test cases, it has been demonstrated to reach a correct matching rate of 88% (at 1% false matching rate).

Research aspects: There are probably endless research aspects in the visual descriptor space ranging from validation of the achieved to results so far to further improving informative aspects with the goal to increase correct matching rate (and consequently decreasing the false matching rating). In general, however, the question is whether there’s a need for descriptors in the era of bandwidth-storage-computing over-provisioning and the raising usage of artificial intelligence techniques such as machine learning and deep learning.

MPEG-G standards reach Committee Draft for metadata and APIs

In my previous report I introduced the MPEG-G standard for compression and transport technologies of genomic data. At the 121st MPEG meeting, metadata and APIs reached CD level. The former – metadata – provides relevant information associated to genomic data and the latter – APIs – allow for building interoperable applications capable of manipulating MPEG-G files. Additional standardization plans for MPEG-G include the CDs for reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-4), which are planned to be issued at the next 122nd MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.

Research aspects: Metadata typically enables certain functionality which can be tested and evaluated against requirements. APIs allow to build applications and services on top of the underlying functions, which could be a driver for research projects to make use of such APIs.

MPEG issues Calls for Visual Test Material for Immersive Applications

I have reported about the Omnidirectional Media Format (OMAF) in my previous report. At the 121st MPEG meeting, MPEG was working on extending OMAF functionalities to allow the modification of viewing positions, e.g., in case of head movements when using a head-mounted display, or for use with other forms of interactive navigation. Unlike OMAF which only provides 3 degrees of freedom (3DoF) for the user to view the content from a perspective looking outwards from the original camera position, the anticipated extension will also support motion parallax within some limited range which is referred to as 3DoF+. In the future with further enhanced technologies, a full 6 degrees of freedom (6DoF) will be achieved with changes of viewing position over a much larger range. To develop technology in these domains, MPEG has issued two Calls for Test Material in the areas of 3DoF+ and 6DoF, asking owners of image and video material to provide such content for use in developing and testing candidate technologies for standardization. Details about these calls can be found at https://mpeg.chiariglione.org/.

Research aspects: The good thing about test material is that it allows for reproducibility, which is an important aspect within the research community. Thus, it is more than appreciated that MPEG issues such a call and let’s hope that this material will become publicly available. Typically this kind of visual test material targets coding but it would be also interesting to have such test content for storage and delivery.

Internet of Media Things (IoMT) reaches Committee Draft level

The goal of IoMT is is to facilitate the large-scale deployment of distributed media systems with interoperable audio/visual data and metadata exchange. This standard specifies APIs providing media things (i.e., cameras/displays and microphones/loudspeakers, possibly capable of significant processing power) with the capability of being discovered, setting-up ad-hoc communication protocols, exposing usage conditions, and providing media and metadata as well as services processing them. IoMT APIs encompass a large variety of devices, not just connected cameras and displays but also sophisticated devices such as smart glasses, image/speech analyzers and gesture recognizers. IoMT enables the expression of the economic value of resources (media and metadata) and of associated processing in terms of digital tokens leveraged by the use of blockchain technologies.

Research aspects: The main focus of IoMT is APIs which provides easy and flexible access to the underlying device’ functionality and, thus, are an important factor to enable research within this interesting domain. For example, using these APIs to enable communicates among these various media things could bring up new forms of interaction with these technologies.

MPEG finalizes its Media Orchestration (MORE) standard

MPEG “Media Orchestration” (MORE) standard reached Final Draft International Standard (FDIS), the final stage of development before being published by ISO/IEC. The scope of the Media Orchestration standard is as follows:

It supports the automated combination of multiple media sources (i.e., cameras, microphones) into a coherent multimedia experience.
It supports rendering multimedia experiences on multiple devices simultaneously, again giving a consistent and coherent experience.
It contains tools for orchestration in time (synchronization) and space.

MPEG expects that the Media Orchestration standard to be especially useful in immersive media settings. This applies notably in social virtual reality (VR) applications, where people share a VR experience and are able to communicate about it. Media Orchestration is expected to allow synchronizing the media experience for all users, and to give them a spatially consistent experience as it is important for a social VR user to be able to understand when other users are looking at them.

Research aspects: This standard enables the social multimedia experience proposed in literature. Interestingly, the W3C is working on something similar referred to as timing object and it would be interesting to see whether these approaches have some commonalities.

What else happened at the MPEG meeting?

DASH is fully in maintenance mode and we are still waiting for the 3rd edition which is supposed to be a consolidation of existing corrigenda and amendments. Currently only minor extensions are proposed and conformance/reference software is being updated. Similar things can be said for CMAF where we have one amendment and one corrigendum under development. Additionally, MPEG is working on CMAF conformance. OMAF has reached FDIS at the last meeting and MPEG is working on reference software and conformance also. It is expected that in the future we will see additional standards and/or technical reports defining/describing how to use CMAF and OMAF in DASH.

Regarding the future video codec, the call for proposals is out since the last meeting as announced in my previous report and responses are due for the next meeting. Thus, it is expected that the 122nd MPEG meeting will be the place to be in terms of MPEG’s future video codec. Speaking about the future, shortly after the 121st MPEG, Leonardo Chiariglione published a blog post entitled “a crisis, the causes and a solution”, which is related to HEVC licensing, Alliance for Open Media (AOM), and possible future options. The blog post certainly caused some reactions within the video community at large and I think this was also intended. Let’s hope it will galvanice the video industry — not to push the button — but to start addressing and resolving the issues. As pointed out in one of my other blog posts about what to care about in 2018, the upcoming MPEG meeting in April 2018 is certainly a place to be. Additionally, it highlights some conferences related to various aspects also discussed in MPEG which I’d like to republish here:

QoMEX — Int’l Conf. on Quality of Multimedia Experience — will be hosted in Sardinia, Italy from May 29-31, which is THE conference to be for QoE of multimedia applications and services. Submission deadline is January 15/22, 2018.
MMSys — Multimedia Systems Conf. — and specifically Packet Video, which will be on June 12 in Amsterdam, The Netherlands. Packet Video is THE adaptive streaming scientific event 2018. Submission deadline is March 1, 2018.
Additionally, you might be interested in ICME (July 23-27, 2018, San Diego, USA), ICIP (October 7-10, 2018, Athens, Greece; specifically in the context of video coding), and PCS (June 24-27, 2018, San Francisco, CA, USA; also in the context of video coding).
The DASH-IF academic track hosts special events at MMSys (Excellence in DASH Award) and ICME (DASH Grand Challenge).
MIPR — 1st Int’l Conf. on Multimedia Information Processing and Retrieval — will be in Miami, Florida, USA from April 10-12, 2018. It has a broad range of topics including networking for multimedia systems as well as systems and infrastructures.

MPEG Column: 120th MPEG Meeting in Macau, China

By Christian Timmerer | December 19, 2017 - 18:21 |December 23, 2017 0417, Event Report, Feature, Standards

Leave a comment

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

MPEG Plenary Meeting

The MPEG press release comprises the following topics:

Point Cloud Compression – MPEG evaluates responses to call for proposal and kicks off its technical work
The omnidirectional media format (OMAF) has reached its final milestone
MPEG-G standards reach Committee Draft for compression and transport technologies of genomic data
Beyond HEVC – The MPEG & VCEG call to set the next standard in video compression
MPEG adds better support for mobile environment to MMT
New standard completed for Internet Video Coding
Evidence of new video transcoding technology using side streams

Point Cloud Compression

At its 120th meeting, MPEG analysed the technologies submitted by nine industry leaders as responses to the Call for Proposals (CfP) for Point Cloud Compression (PCC). These technologies address the lossless or lossy coding of 3D point clouds with associated attributes such as colour and material properties. Point clouds are referred to as unordered sets of points in a 3D space and typically captured using various setups of multiple cameras, depth sensors, LiDAR scanners, etc., but can also be generated synthetically and are in use in several industries. They have recently emerged as representations of the real world enabling immersive forms of interaction, navigation, and communication. Point clouds are typically represented by extremely large amounts of data providing a significant barrier for mass market applications. Thus, MPEG has issued a Call for Proposal seeking technologies that allow reduction of point cloud data for its intended applications. After a formal objective and subjective evaluation campaign, MPEG selected three technologies as starting points for the test models for static, animated, and dynamically acquired point clouds. A key conclusion of the evaluation was that state-of-the-art point cloud compression can be significantly improved by leveraging decades of 2D video coding tools and combining 2D and 3D compression technologies. Such an approach provides synergies with existing hardware and software infrastructures for rapid deployment of new immersive experiences.

Although the initial selection of technologies for point cloud compression has been concluded at the 120th MPEG meeting, it could be also seen as a kick-off for its scientific evaluation and various further developments including the optimization thereof. It is expected that various scientific conference will focus on point cloud compression and may open calls for grand challenges like for example at IEEE ICME 2018.

Omnidirectional Media Format (OMAF)

The understanding of the virtual reality (VR) potential is growing but market fragmentation caused by the lack of interoperable formats for the storage and delivery of such content stifles VR’s market potential. MPEG’s recently started project referred to as Omnidirectional Media Format (OMAF) has reached Final Draft of International Standard (FDIS) at its 120th meeting. It includes

equirectangular projection and cubemap projection as projection formats;
signalling of metadata required for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data; and
provides a selection of audio-visual codecs for this application.

It also includes technologies to arrange video pixel data in numerous ways to improve compression efficiency and reduce the size of video, a major bottleneck for VR applications and services. The standard also includes technologies for the delivery of OMAF content with MPEG-DASH and MMT.

MPEG has defined a format comprising a minimal set of tools to enable interoperability among implementers of the standard. Various aspects are deliberately excluded from the normative parts to foster innovation leading to novel products and services. This enables us — researcher and practitioners — to experiment with these new formats in various ways and focus on informative aspects where typically competition can be found. For example, efficient means for encoding and packaging of omnidirectional/360-degree media content and its adaptive streaming including support for (ultra-)low latency will become a big issue in the near future.

MPEG-G: Compression and Transport Technologies of Genomic Data

The availability of high throughput DNA sequencing technologies opens new perspectives in the treatment of several diseases making possible the introduction of new global approaches in public health known as “precision medicine”. While routine DNA sequencing in the doctor’s office is still not current practice, medical centers have begun to use sequencing to identify cancer and other diseases and to find effective treatments. As DNA sequencing technologies produce extremely large amounts of data and related information, the ICT costs of storage, transmission, and processing are also very high. The MPEG-G standard addresses and solves the problem of efficient and economical handling of genomic data by providing new

compression technologies (ISO/IEC 23092-2) and
transport technologies (ISO/IEC 23092-1),

which reached Committee Draft level at its 120th meeting.

Additionally, the Committee Drafts for

metadata and APIs (ISO/IEC 23092-3) and
reference software (ISO/IEC 23092-4)

are scheduled for the next MPEG meeting and the goal is to publish Draft International Standards (DIS) at the end of 2018.

This new type of (media) content, which requires compression and transport technologies, is emerging within the multimedia community at large and, thus, input is welcome.

Beyond HEVC – The MPEG & VCEG Call to set the Next Standard in Video Compression

The 120th MPEG meeting marked the first major step toward the next generation of video coding standard in the form of a joint Call for Proposals (CfP) with ITU-T SG16’s VCEG. After two years of collaborative informal exploration studies and a gathering of evidence that successfully concluded at the 118th MPEG meeting, MPEG and ITU-T SG16 agreed to issue the CfP for future video coding technology with compression capabilities that significantly exceed those of the HEVC standard and its current extensions. They also formalized an agreement on formation of a joint collaborative team called the “Joint Video Experts Team” (JVET) to work on development of the new planned standard, pending the outcome of the CfP that will be evaluated at the 122nd MPEG meeting in April 2018. To evaluate the proposed compression technologies, formal subjective tests will be performed using video material submitted by proponents in February 2018. The CfP includes the testing of technology for 360° omnidirectional video coding and the coding of content with high-dynamic range and wide colour gamut in addition to conventional standard-dynamic-range camera content. Anticipating a strong response to the call, a “test model” draft design is expected be selected in 2018, with development of a potential new standard in late 2020.

The major goal of a new video coding standard is to be better than its successor (HEVC). Typically this “better” is quantified by 50% which means, that it should be possible encode the video at the same quality with half of the bitrate or a significantly higher quality with the same bitrate including. However, at this time the “Joint Video Experts Team” (JVET) from MPEG and ITU-T SG16 faces competition from the Alliance for Open Media, which is working on AV1. In any case, we are looking forward to an exciting time frame from now until this new codec is ratified and how it will perform compared to AV1. Multimedia systems and applications will also benefit from new codecs which will gain traction as soon as first implementations of this new codec becomes available (note that AV1 is available as open source already and continuously further developed).

MPEG adds Better Support for Mobile Environment to MPEG Media Transport (MMT)

MPEG has approved the Final Draft Amendment (FDAM) to MPEG Media Transport (MMT; ISO/IEC 23008-1:2017), which is referred to as “MMT enhancements for mobile environments”. In order to reflect industry needs on MMT, which has been well adopted by broadcast standards such as ATSC 3.0 and Super Hi-Vision, it addresses several important issues on the efficient use of MMT in mobile environments. For example, it adds distributed resource identification message to facilitate multipath delivery and transition request message to change the delivery path of an active session. This amendment also introduces the concept of a MMT-aware network entity (MANE), which might be placed between the original server and the client, and provides a detailed description about how to use it for both improving efficiency and reducing delay of delivery. Additionally, this amendment provides a method to use WebSockets to setup and control an MMT session/presentation.

New Standard Completed for Internet Video Coding

A new standard for video coding suitable for the internet as well as other video applications, was completed at the 120th MPEG meeting. The Internet Video Coding (IVC) standard was developed with the intention of providing the industry with an “Option 1” video coding standard. In ISO/IEC language, this refers to a standard for which patent holders have declared a willingness to grant licenses free of charge to an unrestricted number of applicants for all necessary patents on a worldwide, non-discriminatory basis and under other reasonable terms and conditions, to enable others to make, use, and sell implementations of the standard. At the time of completion of the IVC standard, the specification contained no identified necessary patent rights except those available under Option 1 licensing terms. During the development of IVC, MPEG removed from the draft standard any necessary patent rights that it was informed were not available under such Option 1 terms, and MPEG is optimistic of the outlook for the new standard. MPEG encourages interested parties to provide information about any other similar cases. The IVC standard has roughly similar compression capability as the earlier AVC standard, which has become the most widely deployed video coding technology in the world. Tests have been conducted to verify IVC’s strong technical capability, and the new standard has also been shown to have relatively modest implementation complexity requirements.

Evidence of new Video Transcoding Technology using Side Streams

Following a “Call for Evidence” (CfE) issued by MPEG in July 2017, evidence was evaluated at the 120th MPEG meeting to investigate whether video transcoding technology has been developed for transcoding assisted by side data streams that is capable of significantly reducing the computational complexity without reducing compression efficiency. The evaluations of the four responses received included comparisons of the technology against adaptive bit-rate streaming using simulcast as well as against traditional transcoding using full video re-encoding. The responses span the compression efficiency space between simulcast and full transcoding, with trade-offs between the bit rate required for distribution within the network and the bit rate required for delivery to the user. All four responses provided a substantial computational complexity reduction compared to transcoding using full re-encoding. MPEG plans to further investigate transcoding technology and is soliciting expressions of interest from industry on the need for standardization of such assisted transcoding using side data streams.

MPEG currently works on two related topics which are referred to as network-distributed video coding (NDVC) and network-based media processing (NBMP). Both activities involve the network, which is more and more evolving to highly distributed compute and delivery platform as opposed to a bit pipe, which is supposed to deliver data as fast as possible from A to B. This phenomena could be also interesting when looking at developments around 5G, which is actually much more than just radio access technology. These activities are certainly worth to monitor as it basically contributes in order to make networked media resources accessible or even programmable. In this context, I would like to refer the interested reader to the December’17 theme of the IEEE Computer Society Computing Now, which is about Advancing Multimedia Content Distribution.

Publicly available documents from the 120th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Gwangju, Korea, January 22-26, 2018. Feel free to contact Christian Timmerer for any questions or comments.

Some of the activities reported above are considered within the Call for Papers at 23rd Packet Video Workshop (PV 2018) co-located with ACM MMSys 2018 in Amsterdam, The Netherlands. Topics of interest include (but are not limited to):

Adaptive media streaming, and content storage, distribution and delivery
Network-distributed video coding and network-based media processing
Next-generation/future video coding, point cloud compression
Audiovisual communication, surveillance and healthcare systems
Wireless, mobile, IoT, and embedded systems for multimedia applications
Future media internetworking: information-centric networking and 5G
Immersive media: virtual reality (VR), augmented reality (AR), 360° video and multi-sensory systems, and its streaming
Machine learning in media coding and streaming systems
Standardization: DASH, MMT, CMAF, OMAF, MiAF, WebRTC, MSE, EME, WebVR, Hybrid Media, WAVE, etc.
Applications: social media, game streaming, personal broadcast, healthcare, industry 4.0, education, transportation, etc.

Important dates

Submission deadline: March 1, 2018
Acceptance notification: April 9, 2018
Camera-ready deadline: April 19, 2018

Report from ACM MMSys 2017

By Christian Timmerer | October 2, 2017 - 06:22 |December 26, 2017 0317, Conference Report, Feature

Leave a comment

–A report from Christian Timmerer, AAU/Bitmovin Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. It is a unique event targeting “multimedia systems” from various angles and views across all domains instead of focusing on a specific aspect or data type. ACM MMSys’17 was held in Taipei, Taiwan in June 20-23, 2017.

MMSys is a single-track conference which hosts also a series of workshops, namely NOSSDAV, MMVE, and NetGames. Since 2016, it kicks off with overview talks and 2017 we’ve seen the following talks: “Geometric representations of 3D scenes” by Geraldine Morin; “Towards Understanding Truly Immersive Multimedia Experiences” by Niall Murray; “Rate Control In The Age Of Vision” by Ketan Mayer-Patel; “Humans, computers, delays and the joys of interaction” by Ragnhild Eg; “Context-aware, perception-guided workload characterization and resource scheduling on mobile phones for interactive applications” by Chung-Ta King and Chun-Han Lin.

Additionally, industry talks have been introduced: “Virtual Reality – The New Era of Future World” by WeiGing Ngang; “The innovation and challenge of Interactive streaming technology” by Wesley Kuo; “What challenges are we facing after Netflix revolutionized TV watching?” by Shuen-Huei Guan; “The overview of app streaming technology” by Sam Ding; “Semantic Awareness in 360 Streaming” by Shannon Chen; “On the frontiers of Video SaaS” by Sega Cheng.

An interesting set of keynotes presented different aspects related multimedia systems and its co-located workshops:

Henry Fuchs, The AR/VR Renaissance: opportunities, pitfalls, and remaining problems
Julien Lai, Towards Large-scale Deployment of Intelligent Video Analytics Systems
Dah Ming Chiu, Smart Streaming of Panoramic Video
Bo Li, When Computation Meets Communication: The Case for Scheduling Resources in the Cloud
Polly Huang, Measuring Subjective QoE for Interactive System Design in the Mobile Era – Lessons Learned Studying Skype Calls

The program included a diverse set of topics such as immersive experiences in AR and VR, network optimization and delivery, multisensory experiences, processing, rendering, interaction, cloud-based multimedia, IoT connectivity, infrastructure, media streaming, and security. A vital aspect of MMSys is dedicated sessions for showcasing latest developments in the area of multimedia systems and presenting datasets, which is important towards enabling reproducibility and sustainability in multimedia systems research.

The social events were a perfect venue for networking and in-depth discussion how to advance the state of the art. A welcome reception was held at “LE BLE D’OR (Miramar)”, the conference banquet at the Taipei World Trade Center Club, and finally a tour to the Shilin Night Market was organized.

ACM MMSys 2917 issued the following awards:

The Best Paper Award goes to “A Scalable and Privacy-Aware IoT Service for Live Video Analytics” by Junjue Wang (Carnegie Mellon University), Brandon Amos (Carnegie Mellon University), Anupam Das (Carnegie Mellon University), Padmanabhan Pillai (Intel Labs), Norman Sadeh (Carnegie Mellon University), and Mahadev Satyanarayanan (Carnegie Mellon University).
The Best Student Paper Award goes to “A Measurement Study of Oculus 360 Degree Video Streaming” by Chao Zhou (SUNY Binghamton), Zhenhua Li (Tsinghua University), and Yao Liu (SUNY Binghamton).
The NOSSDAV’17 Best Paper Award goes to “A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks” by Theodoros Karagkioules (Huawei Technologies France/Telecom ParisTech), Cyril Concolato (Telecom ParisTech), Dimitrios Tsilimantos (Huawei Technologies France), Stefan Valentin (Huawei Technologies France).

Excellence in DASH award sponsored by the DASH-IF

1st place: “SAP: Stall-Aware Pacing for Improved DASH Video Experience in Cellular Networks” by Ahmed Zahran (University College Cork), Jason J. Quinlan (University College Cork), K. K. Ramakrishnan (University of California, Riverside), and Cormac J. Sreenan (University College Cork)
2nd place: “Improving Video Quality in Crowded Networks Using a DANE” by Jan Willem Kleinrouweler, Britta Meixner and Pablo Cesar (Centrum Wiskunde & Informatica)
3rd place: “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP” by Mario Graf (Bitmovin Inc.), Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin Inc.), and Christopher Mueller (Bitmovin Inc.)

Finally, student travel grants awards have been sponsored by SIGMM. All details including nice pictures can be found here.

ACM MMSys 2018 will be held in Amsterdam, The Netherlands, June 12 – 15, 2018 and includes the following tracks:

Research track: Submission deadline on November 30, 2017
Demo track: Submission deadline on February 25, 2018
Open Dataset & Software Track: Submission deadline on February 25, 2018

MMSys’18 co-locates the following workshops (with submission deadline on March 1, 2018):

MMVE2018: 10th International Workshop on Immersive Mixed and Virtual Environment Systems,
NetGames2018: 16th Annual Worksop on Network and Systems Support for Games,
NOSSDAV2018: 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video,
PV2018: 23rd Packet Video Workshop

MMSys’18 includes the following special sessions (submission deadline on December 15, 2017):