MPEG Column: 134th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 134th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • First International Standard on Neural Network Compression for Multimedia Applications
  • Completion of the carriage of VVC and EVC
  • Completion of the carriage of V3C in ISOBMFF
  • Call for Proposals: (a) new Advanced Genomics Features and Technologies, (b) MPEG-I Immersive Audio, and (c) coded Representation of Haptics
  • MPEG evaluated Responses on Incremental Compression of Neural Networks
  • Progression of MPEG 3D Audio Standards
  • The first milestone of development of Open Font Format (2nd amendment)
  • Verification tests: (a) Low Complexity Enhancement Video Coding (LCEVC) and (b) more application cases of Versatile Video Coding (VVC)
  • Standardization work on Version 2 of VVC and VSEI started

In this column, the focus is on streaming-related aspects including a brief update about MPEG-DASH.

First International Standard on Neural Network Compression for Multimedia Applications

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, or image and video coding. The trained neural networks for these applications contain many parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to several clients (e.g., mobile phones, smart cameras) benefits from a compressed representation of neural networks.

At the 134th MPEG meeting, MPEG Video ratified the first international standard on Neural Network Compression for Multimedia Applications (ISO/IEC 15938-17), designed as a toolbox of compression technologies. The specification contains different methods for

  • parameter reduction (e.g., pruning, sparsification, matrix decomposition),
  • parameter transformation (e.g., quantization), and
  • entropy coding 

These methods can be assembled into encoding pipelines that combine one or more (in the case of reduction) methods from each group.

The results show that trained neural networks for many common multimedia problems such as image or audio classification or image compression can be compressed by a factor of 10-20 with no performance loss, and even by more than 30 when a performance trade-off is accepted. The specification is not limited to a particular neural network architecture and is independent of the choice of neural network exchange format. Interoperability with common neural network exchange formats is described in the annexes of the standard.
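To illustrate the toolbox idea (not the normative coding tools of ISO/IEC 15938-17 itself), a minimal sketch of such a pipeline, i.e., magnitude pruning, uniform quantization, and an entropy estimate of the quantized parameters, could look as follows; the sparsity level, bit depth, and toy layer size are illustrative assumptions.

```python
import numpy as np

def prune(weights: np.ndarray, sparsity: float = 0.6) -> np.ndarray:
    """Parameter reduction: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights: np.ndarray, bits: int = 8):
    """Parameter transformation: uniform scalar quantization to integer levels."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(weights / scale).astype(np.int32), scale

def entropy_bits_per_symbol(symbols: np.ndarray) -> float:
    """Entropy coding bound: Shannon entropy of the quantized symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Toy "layer" of one million float32 parameters (32 bits each).
w = np.random.randn(1_000_000).astype(np.float32)
q, scale = quantize(prune(w), bits=8)
compressed_bits = entropy_bits_per_symbol(q) * q.size
print(f"estimated compression factor: {32 * w.size / compressed_bits:.1f}x")
```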

As neural networks become increasingly important, communicating them over heterogeneous networks to a plethora of devices raises various challenges, which makes efficient compression inevitable; this is what the standard addresses. ISO/IEC 15938 is commonly referred to as MPEG-7 (or the “multimedia content description interface”), and this standard now becomes part 17 of MPEG-7.

Research aspects: Like for all compression-related standards, research aspects are related to compression efficiency (lossy/lossless), computational complexity (runtime, memory), and quality-related aspects. Furthermore, the compression of neural networks for multimedia applications probably enables new types of applications and services to be deployed in the (near) future. Finally, simultaneous delivery and consumption (i.e., streaming) of neural networks including incremental updates thereof will become a requirement for networked media applications and services.

Carriage of Media Assets

At the 134th MPEG meeting, MPEG Systems completed standards for the carriage of various media assets in MPEG-2 Systems (Transport Stream) and in the ISO Base Media File Format (ISOBMFF).

In particular, the standards for the carriage of Versatile Video Coding (VVC) and Essential Video Coding (EVC) over both MPEG-2 Transport Stream (M2TS) and ISO Base Media File Format (ISOBMFF) reached their final stages of standardization:

  • For M2TS, the standard defines constraints on elementary streams of VVC and EVC to carry them in packetized elementary stream (PES) packets. Additionally, buffer management mechanisms and an extension of the transport system target decoder (T-STD) model are defined.
  • For ISOBMFF, the carriage of codec initialization information for VVC and EVC is defined in the standard. It also defines samples and sub-samples reflecting the high-level bitstream structure and independently decodable units of both video codecs. For VVC, signaling and extraction of a certain operating point are also supported.

Finally, MPEG Systems completed the standard for the carriage of Visual Volumetric Video-based Coding (V3C) data using ISOBMFF. The standard supports media comprising multiple independent component bitstreams and takes into account that, depending on the user's position and viewport, only some portions of immersive media assets need to be rendered. Thus, it defines metadata indicating the relationship between a region of the 3D spatial data to be rendered and its location in the bitstream. In addition, the delivery of ISOBMFF files containing V3C content over DASH and MMT is also specified in this standard.

Research aspects: The carriage of VVC, EVC, and V3C using M2TS or ISOBMFF provides an essential building block within the so-called multimedia systems layer, as it offers an interoperable interface to the actual media assets and, with it, a plethora of research challenges. These standards enable efficient and flexible provisioning and/or use of the media assets, aspects that are deliberately not defined in the standards themselves and are thus subject to competition.

Call for Proposals and Verification Tests

At the 134th MPEG meeting, MPEG issued three Calls for Proposals (CfPs) that are briefly highlighted in the following:

  • Coded Representation of Haptics: Haptics provide an additional layer of entertainment and sensory immersion beyond audio and visual media. This CfP aims to specify a coded representation of haptics data, e.g., to be carried using ISO Base Media File Format (ISOBMFF) files in the context of MPEG-DASH or other MPEG-I standards.
  • MPEG-I Immersive Audio: Immersive Audio will complement other parts of MPEG-I (i.e., Part 3, “Immersive Video” and Part 2, “Systems Support”) in order to provide a suite of standards that will support a Virtual Reality (VR) or an Augmented Reality (AR) presentation in which the user can navigate and interact with the environment using 6 degrees of freedom (6 DoF), that being spatial navigation (x, y, z) and user head orientation (yaw, pitch, roll).
  • New Advanced Genomics Features and Technologies: This CfP aims to collect submissions of new technologies that can (i) provide improvements to the current compression, transport, and indexing capabilities of the ISO/IEC 23092 standards suite, particularly applied to data consisting of very long reads generated by 3rd generation sequencing devices, (ii) provide support for the representation and usage of graph genome references, (iii) include coding modes relying on machine learning processes, satisfying data access modalities required by machine learning and providing higher compression, and (iv) support interfaces with existing standards for the interchange of clinical data.

Detailed information, including instructions on how to respond to the call for proposals, the requirements that must be considered, the test data to be used, and the submission and evaluation procedures for proponents, is available at www.mpeg.org.

Calls for proposals typically mark the beginning of the formal standardization work, whereas verification tests are conducted once a standard has been completed. At the 134th MPEG meeting, and despite the difficulties caused by the pandemic situation, MPEG completed verification tests for Versatile Video Coding (VVC) and Low Complexity Enhancement Video Coding (LCEVC).

For LCEVC, verification tests measured the benefits of enhancing four existing codecs of different generations (i.e., AVC, HEVC, EVC, VVC) using tools as defined in LCEVC within two sets of tests:

  • The first set of tests compared LCEVC-enhanced encoding with full-resolution single-layer anchors. The average bit rate savings produced by LCEVC when enhancing AVC were determined to be approximately 46% for UHD and 28% for HD; when enhancing HEVC, approximately 31% for UHD and 24% for HD. Test results tend to indicate an overall benefit also when using LCEVC to enhance EVC and VVC.
  • The second set of tests confirmed that LCEVC provided a more efficient means of resolution enhancement of half-resolution anchors than unguided up-sampling. Comparing LCEVC full-resolution encoding with the up-sampled half-resolution anchors, the average bit-rate savings when using LCEVC with AVC, HEVC, EVC and VVC were calculated to be approximately 28%, 34%, 38%, and 32% for UHD and 27%, 26%, 21%, and 21% for HD, respectively.

For VVC, it was already the second round of verification testing including the following aspects:

  • 360-degree video for equirectangular and cubemap formats, where VVC shows on average more than 50% bit rate reduction compared to the previous major generation of MPEG video coding standards, High Efficiency Video Coding (HEVC), developed in 2013.
  • Low-delay applications such as compression of conversational (teleconferencing) and gaming content, where the compression benefit is about 40% on average.
  • HD video streaming, with an average bit rate reduction of close to 50%.

A previous set of tests for 4K UHD content completed in October 2020 had shown similar gains. These verification tests used formal subjective visual quality assessment testing with “naïve” human viewers. The tests were performed under a strict hygienic regime in two test laboratories to ensure safe conditions for the viewers and test managers.

Research aspects: CfPs offer a unique possibility for researchers to propose research results for adoption into future standards. Verification tests provide objective and/or subjective evaluations of standardized tools, which typically conclude the life cycle of a standard. The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards, including the evaluation thereof.

DASH Update!

Finally, I’d like to provide a brief update on MPEG-DASH! At the 134th MPEG meeting, MPEG Systems recommended the approval of ISO/IEC FDIS 23009-1 5th edition. That is, the MPEG-DASH core specification will become available as its 5th edition sometime this year. Additionally, MPEG requests that this specification become freely available, which also marks an important milestone in the development of the MPEG-DASH standard. Most importantly, the 5th edition of this standard incorporates CMAF support as well as other enhancements defined in the amendment of the previous edition. Additionally, the MPEG-DASH subgroup of MPEG Systems is already working on the first amendment to the 5th edition, entitled “Preroll, nonlinear playback, and other extensions”. It is expected that the 5th edition will have an impact not only on related specifications within MPEG but also in other Standards Developing Organizations (SDOs) such as DASH-IF, i.e., defining interoperability points (IOPs) for various codecs and others, or CTA WAVE (Web Application Video Ecosystem), i.e., defining device playback capabilities such as the Common Media Client Data (CMCD). Both DASH-IF and CTA WAVE provide (conformance) test infrastructure for DASH and CMAF.

An updated overview of DASH standards/features can be found in the Figure below.

MPEG-DASH status as of April 2021.

Research aspects: MPEG-DASH was ratified almost ten years ago, which has resulted in a plethora of research articles, mostly related to adaptive bitrate (ABR) algorithms and their impact on streaming performance, including the Quality of Experience (QoE). An overview of bitrate adaptation schemes is provided here, including a list of open challenges and issues.

The 135th MPEG meeting will be again an online meeting in July 2021. Click here for more information about MPEG meetings and their developments.

VQEG Column: New topics

Introduction

Welcome to the fourth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
During the last VQEG plenary meeting (14-18 Dec. 2020) various interesting discussions arose regarding new topics not addressed up to then by VQEG groups, which led to launching three new sub-projects and a new project related to: 1) clarifying the computation of spatial and temporal information (SI and TI), 2) including video quality metrics as metadata in compressed bitstreams, 3) Quality of Experience (QoE) metrics for live video streaming applications, and 4) providing guidelines on implementing objective video quality metrics to the video compression community.
The following sections provide more details about these new activities and try to encourage interested readers to follow and get involved in any of them by subscribing to the corresponding reflectors.

SI and TI Clarification

The VQEG No-Reference Metrics (NORM) group has recently focused on the topic of spatio-temporal complexity, revisiting the Spatial Information and Temporal Information (SI/TI) indicators, which are described in ITU-T Rec. P.910 [1]. They were originally developed for the T1A1 dataset in 1994 [2]. The metrics have found good use over the last 25 years – mostly employed for checking the complexity of video sources in datasets. However, SI/TI definitions contain ambiguities, so the goal of this sub-project is to provide revised definitions eliminating implementation inconsistencies.
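For readers unfamiliar with the indicators, a minimal sketch of the P.910-style computation is shown below: SI is the maximum over frames of the standard deviation of the Sobel-filtered luma plane, and TI is the maximum over frames of the standard deviation of the frame difference. Frame borders, value range, and the handling of the first frame are exactly the details the group is clarifying, so this sketch is illustrative only.

```python
import numpy as np
from scipy import ndimage

def si_ti(luma_frames):
    """Compute SI/TI in the spirit of ITU-T Rec. P.910.

    luma_frames: iterable of 2-D numpy arrays (one luma plane per frame).
    Returns (SI, TI) as the per-sequence maxima of the per-frame values.
    """
    si_values, ti_values = [], []
    previous = None
    for frame in luma_frames:
        frame = frame.astype(np.float64)
        # Spatial information: standard deviation of the Sobel gradient magnitude.
        sobel = np.hypot(ndimage.sobel(frame, axis=0), ndimage.sobel(frame, axis=1))
        si_values.append(sobel.std())
        # Temporal information: standard deviation of the luma difference to the previous frame.
        if previous is not None:
            ti_values.append((frame - previous).std())
        previous = frame
    return max(si_values), (max(ti_values) if ti_values else 0.0)
```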

Three main topics are discussed by VQEG in a series of online meetings:

  • Comparison of existing publicly available implementations for SI/TI: a comparison was made between several public open-source implementations for SI/TI, based on initial feedback from members of Facebook. Bugs and inconsistencies were identified in the handling of video frame borders, the treatment of limited vs. full range content, as well as the reporting of TI values for the first frame. Also, the lack of standardized test vectors was brought up as an issue. As a consequence, a new reference library was developed in Python by members of TU Ilmenau, incorporating all bug fixes that were previously identified and introducing a new test suite, to which the public is invited to contribute material. VQEG is now actively looking for specific test sequences that will be useful both for validating existing SI/TI implementations and for extending the scope of the metrics, which relates to the next issue described below.
  • Study on how to apply SI/TI to different content formats: the description of SI/TI was found not to be suitable for extended applications such as video with a higher bit depth (> 8 bit), HDR content, or spherical/3D video. Also, the question was raised on how to deal with the presence of scene changes in content. The community concluded that for content with higher bit depth, the SI/TI functions should be calculated as specified, but that the output values could be mapped back to the original 8-bit range to simplify comparisons. As for HDR, no conclusion was reached, given the inherent complexity of the subject. It was also preliminarily concluded that the treatment of scene changes should not be part of an SI/TI recommendation, focusing instead on calculating SI/TI for short sequences without scene changes, since the way scene changes are dealt with may depend on the final application of the metrics.
  • Discussion on other relevant uses of SI/TI: the indicators have been widely used for checking video datasets in terms of diversity and for classifying content. Also, SI/TI have been used in some no-reference metrics as content features. The question was raised whether SI/TI could be used for predicting how well content could be encoded. The group noted that different encoders would deal with sources differently, e.g., related to noise in the video. It was stated that it would be desirable to find a metric that is purely related to content and not affected by encoding or representation.

As a first step, this revision of the topic of SI/TI has resulted in a harmonized implementation and in the identification of future application areas. Discussions on these topics will continue in the next months through audio-calls that are open to interested readers.

Video Quality Metadata Standard

Also within the NORM group, another topic was launched related to the inclusion of video quality metadata in compressed streams [3].

Almost all modern transcoding pipelines use full-reference video quality metrics to decide on the most appropriate encoding settings. The computation of these quality metrics is demanding in terms of time and computational resources. In addition, estimation errors propagate and accumulate when quality metrics are recomputed several times along the transcoding pipeline. Thus, retaining the results of these metrics with the video can alleviate these constraints, requiring very little space and providing a “greener” way of estimating video quality. With this goal, the new sub-project has started working towards the definition of a standard format to include video quality metrics metadata both at video bitstream level and system layer [4].

In this sense, the experts involved in the new sub-project are working on the following items:

  • Identification of existing proposals and working groups within other standardisation bodies and organisations that address similar topics, and proposal of amendments including new requirements. For example, MPEG has already worked on adding video quality metrics (e.g., PSNR, SSIM, MS-SSIM, VQM, PEVQ, MOS, FISG) metadata at the system level (e.g., in MPEG-2 streams [5], HTTP [6], etc. [7]).
  • Identification of quality metrics to be considered in the standard. In principle, validated and standardized metrics are of interest, although other metrics can also be considered after a validation process on a standard set of subjective data (e.g., using existing datasets). Metrics that are new with respect to those used in previous approaches (e.g., VMAF [8], FB-MOS [9]) are of special interest.
  • Consideration of the computation of multiple generations of full-reference metrics at different steps of the transcoding chain, of the use of metrics at different resolutions, different spatio-temporal aggregation methods, etc.
  • Definition of a standard video quality metadata payload, including relevant fields such as metric name (e.g., “SSIM”), version (e.g., “v0.6.1”), raw score (e.g., “0.9256”), mapped-to-MOS score (e.g., “3.89”), scaling method (e.g., “Lanczos-5”), temporal reference (e.g., “0-3” frames), aggregation method (e.g., “arithmetic mean”), etc. [4] (a minimal sketch of such a payload is given after this list).
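Purely as an illustration of the fields listed above, and not the payload format the sub-project will eventually define, such a record could be modelled as follows; all field names and types are assumptions.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class QualityMetricRecord:
    """Illustrative video quality metadata record; not the normative payload."""
    metric: str          # e.g., "SSIM"
    version: str         # e.g., "v0.6.1"
    raw_score: float     # e.g., 0.9256
    mos_score: float     # e.g., 3.89, the raw score mapped to a MOS scale
    scaling_method: str  # e.g., "Lanczos-5"
    frame_range: str     # e.g., "0-3", temporal reference of the score
    aggregation: str     # e.g., "arithmetic mean" over the frame range

record = QualityMetricRecord("SSIM", "v0.6.1", 0.9256, 3.89,
                             "Lanczos-5", "0-3", "arithmetic mean")
# Such a record could then be serialized and carried at the bitstream or system layer.
print(json.dumps(asdict(record), indent=2))
```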

More details and information on how to join this activity can be found on the NORM webpage.

QoE metrics for live video streaming applications

The VQEG Audiovisual HD Quality (AVHD) group launched a new sub-project on QoE metrics for live media streaming applications (Live QoE) in the last VQEG meeting [10].

The success of a live multimedia streaming session is defined by the experience of a participating audience. Both the content communicated by the media and the quality at which it is delivered matter – for the same content, the quality delivered to the viewer is a differentiating factor. Live media streaming systems undertake a lot of investment and operate under very tight service availability and latency constraints to support multimedia sessions for their audience. Both to measure the return on investment and to make sound investment decisions, it is paramount that we be able to measure the media quality offered by these systems. In this sense, given the large scale and complexity of media streaming systems, objective metrics are needed to measure QoE.

Therefore, the following topics have been identified and are being studied [11]:

  • Creation of a high-quality dataset, including media clips and subjective scores, which will be used to tune, train, and develop objective QoE metrics. This dataset should represent the conditions that take place in typical live media streaming situations; therefore, conditions and impairments comprising audio and video tracks (independently and jointly) will be considered. In addition, this dataset should cover a diverse set of content categories, including premium content (e.g., sports, movies, concerts, etc.) and user-generated content (e.g., music, gaming, real-life content, etc.).
  • Development of objective QoE metrics, especially focusing on no-reference or near-no-reference metrics, given the lack of access to the original video at various points in the live media streaming chain. Different types of models will be considered, including signal-based (operating on the decoded signal), metadata-based (operating on available metadata, e.g., codecs, resolution, framerate, bitrate, etc.), bitstream-based (operating on the parsed bitstream), and hybrid models (combining signal and metadata) [12]; a toy example of the metadata-based class is sketched after this list. Also, machine-learning-based models will be explored.
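The sketch below is only meant to illustrate what a metadata-based (parametric) model of the kind mentioned above operates on, i.e., codec settings rather than pixels; the functional form and coefficients are arbitrary assumptions, not a validated or standardized model.

```python
import math

def toy_metadata_qoe(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Toy metadata-based quality estimate on a 1-5 (MOS-like) scale.

    Operates only on stream metadata (bitrate, resolution, framerate);
    the coefficients are purely illustrative.
    """
    bits_per_pixel = bitrate_kbps * 1000 / (width * height * fps)
    # Logistic mapping of coding "generosity" (bits per pixel) to a 1-5 score.
    return 1 + 4 / (1 + math.exp(-(math.log10(bits_per_pixel) + 1.3) / 0.25))

print(round(toy_metadata_qoe(6000, 1920, 1080, 30), 2))  # roughly 4 on the toy scale
```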

Certain challenges are envisioned when dealing with these two topics, such as separating “content” from “quality” (taking into account that content plays a big role in engagement and acceptability), the spectrum of expectations, the role of network impairments, and the collection of enough data to develop robust models [11]. Readers interested in joining this effort are encouraged to visit the AVHD webpage for more details.

Implementer’s Guide to Video Quality Metrics

In the last meeting, a new dedicated group on the Implementer’s Guide to Video Quality Metrics (IGVQM) was set up to introduce and provide guidelines on the use of objective video quality metrics to the video compression community.

During the development of new video coding standards, the peak signal-to-noise ratio (PSNR) has traditionally been used as the main objective metric to determine which new coding tools to adopt. It has furthermore been used to establish the bitrate savings that a new coding standard offers over its predecessor through the so-called “BD-rate” metric [13], which still relies on PSNR for measuring quality.
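Since BD-rate is central to this discussion, a compact sketch of its usual computation is given below: quality versus log-rate is fitted per codec with a cubic polynomial, and the horizontal (rate) difference is averaged over the overlapping quality range [13]. PSNR is used here, but any of the metrics discussed next could be substituted; the rate/PSNR points in the example are hypothetical.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjøntegaard delta rate: average bitrate difference (in %) of the test
    codec versus the reference codec at equal quality (cf. VCEG-M33 [13])."""
    log_ref, log_test = np.log(rates_ref), np.log(rates_test)
    # Fit log-rate as a cubic function of quality for each codec.
    p_ref = np.polyfit(psnr_ref, log_ref, 3)
    p_test = np.polyfit(psnr_test, log_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    # Average the log-rate difference over the overlapping quality interval.
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # negative values mean bitrate savings

# Hypothetical rate (kbps) / PSNR (dB) points for a reference and a test codec:
print(bd_rate([1000, 2000, 4000, 8000], [34, 37, 40, 43],
              [800, 1600, 3200, 6400], [34, 37, 40, 43]))  # about -20%
```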

Although this choice was fully justified for the first image/video coding standards – JPEG (1992), MPEG-1 (1994), MPEG-2 (1996), JPEG 2000, and even H.264/AVC (2004) – since there was simply no alternative at that time, its continuing use for the development of H.265/HEVC (2013), VP9 (2013), AV1 (2018), and most recently EVC and VVC (2020) is questionable, given the rapid and continuous evolution of more perceptual image/video objective quality metrics, such as SSIM (2004) [14], MS-SSIM (2003) [15], and VMAF (2015) [8].

This project attempts to offer some guidance to the video coding community, including standards-setting organisations, on how to better utilise existing objective video quality metrics to capture the improvements offered by video coding tools. For this, the following goals have been envisioned:

  • Address video compression and scaling impairments only.
  • Explore and use “state-of-the-art” full-reference (pixel) objective metrics, examine applicability of no-reference objective metrics, and obtain reference implementations of them.
  • Offer temporal aggregation methods of image quality metrics into video quality metrics.
  • Present statistical analysis of existing subjective datasets, constraining them to compression and scaling artifacts.
  • Highlight differences among objective metrics and use-cases. For example, in case of very small differences, which metric is more sensitive? Which quality range is better served by what metric?
  • Offer standard logistic mappings of objective metrics to a normalised linear scale (a minimal example of such a mapping is sketched after this list).
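As an illustration of the last goal, a common approach is to fit a four-parameter logistic function that maps raw metric scores to a normalised (e.g., MOS-like) scale against subjective data; the (raw score, MOS) pairs and starting values below are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic4(x, lower, upper, midpoint, slope):
    """Four-parameter logistic mapping from a raw metric score to a MOS-like scale."""
    return lower + (upper - lower) / (1 + np.exp(-(x - midpoint) / slope))

# Hypothetical (raw SSIM score, subjective MOS) pairs from some dataset:
raw = np.array([0.80, 0.88, 0.92, 0.95, 0.98])
mos = np.array([1.8, 2.7, 3.4, 4.0, 4.6])

params, _ = curve_fit(logistic4, raw, mos, p0=[1.0, 5.0, 0.9, 0.05], maxfev=10000)
print(logistic4(0.9256, *params))  # raw score mapped onto the normalised scale
```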

More details can be found in the working document that has been set up to launch the project [16] and on the VQEG website.

References

[1] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[2] M. H. Pinson and A. Webster, “T1A1 Validation Test Database,” VQEG eLetter, vol. 1, no. 2, 2015.
[3] I. Katsavounidis, “Video quality metadata in compressed bitstreams”, Presentation in VQEG Meeting, Dec. 2020.
[4] I. Katsavounidis et al., “A case for embedding video quality metrics as metadata in compressed bitstreams”, working document, 2019.
[5] ISO/IEC 13818-1:2015/AMD 6:2016 Carriage of Quality Metadata in MPEG2 Streams.
[6] ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH).
[7] ISO/IEC 23001-10, MPEG Systems Technologies – Part 10: Carriage of timed metadata metrics of media in ISO base media file format.
[8] Toward a practical perceptual video quality metric, Tech blog with VMAF’s open sourcing on Github, Jun. 6, 2016.
[9] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[10] R. Puri, “On a QoE metric for live media streaming applications”, Presentation in VQEG Meeting, Dec. 2020.
[11] R. Puri and S. Satti, “On a QoE metric for live media streaming applications”, working document, Jan. 2021.
[12] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204” , IEEE Access, vol. 8, Oct. 2020.
[13] G. Bjøntegaard, “Calculation of Average PSNR Differences Between RD-Curves”, Document VCEG-M33, ITU-T SG 16/Q6, 13th VCEG Meeting, Austin, TX, USA, Apr. 2001.
[14] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” in IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, April 2004.
[15] Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2003.
[16] I. Katsavounidis, “VQEG’s Implementer’s Guide to Video Quality Metrics (IGVQM) project”, working document, 2021.

MPEG Column: 133rd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 133rd MPEG meeting was once again held as an online meeting, and this time it kicked off with great news: MPEG is one of the organizations honored as a 72nd Annual Technology & Engineering Emmy® Awards recipient, specifically the MPEG Systems File Format Subgroup and its ISO Base Media File Format (ISOBMFF), among others.

The official press release can be found here and comprises the following items:

  • 6th Emmy® Award for MPEG Technology: MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
  • Essential Video Coding (EVC) verification test finalized
  • MPEG issues a Call for Evidence on Video Coding for Machines
  • Neural Network Compression for Multimedia Applications – MPEG calls for technologies for incremental coding of neural networks
  • MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
  • MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
  • MPEG Systems reached the first milestone to carry event messages in tracks of the ISO Base Media File Format

In this report, I’d like to focus on ISOBMFF, EVC, CMAF, and DASH.

MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award

MPEG is pleased to report that the File Format subgroup of MPEG Systems is being recognized this year by the National Academy for Television Arts and Sciences (NATAS) with a Technology & Engineering Emmy® for their 20 years of work on the ISO Base Media File Format (ISOBMFF). This format was first standardized in 1999 as part of the MPEG-4 Systems specification and is now in its 6th edition as ISO/IEC 14496-12. It has been used and adopted by many other specifications, e.g.:

  • MP4 and 3GP file formats;
  • Carriage of NAL unit structured video in the ISO Base Media File Format, which provides support for AVC, HEVC, VVC, EVC, and probably soon LCEVC;
  • MPEG-21 file format;
  • Dynamic Adaptive Streaming over HTTP (DASH) and Common Media Application Format (CMAF);
  • High-Efficiency Image Format (HEIF);
  • Timed text and other visual overlays in ISOBMFF;
  • Common encryption format;
  • Carriage of timed metadata metrics of media;
  • Derived visual tracks;
  • Event message track format;
  • Carriage of uncompressed video;
  • Omnidirectional Media Format (OMAF);
  • Carriage of visual volumetric video-based coding data;
  • Carriage of geometry-based point cloud compression data;
  • … to be continued!

This is MPEG’s fourth Technology & Engineering Emmy® Award (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, and MPEG-2 Transport Stream in 2013) and sixth overall Emmy® Award including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017, respectively.

Essential Video Coding (EVC) verification test finalized

At the 133rd MPEG meeting, a verification testing assessment of the Essential Video Coding (EVC) standard was completed. The first part of the EVC verification test, using high dynamic range (HDR) and wide color gamut (WCG) content, had been completed at the 132nd MPEG meeting. A subjective quality evaluation was conducted comparing the EVC Main profile to the HEVC Main 10 profile and the EVC Baseline profile to the AVC High 10 profile, respectively:

  • Analysis of the subjective test results showed that the average bitrate savings for EVC Main profile are approximately 40% compared to HEVC Main 10 profile, using UHD and HD SDR content encoded in both random access and low delay configurations.
  • The average bitrate savings for the EVC Baseline profile compared to the AVC High 10 profile is approximately 40% using UHD SDR content encoded in the random-access configuration and approximately 35% using HD SDR content encoded in the low delay configuration.
  • Verification test results using HDR content had shown average bitrate savings for EVC Main profile of approximately 35% compared to HEVC Main 10 profile.

By providing significantly improved compression efficiency compared to HEVC and earlier video coding standards while encouraging the timely publication of licensing terms, the MPEG-5 EVC standard is expected to meet the market needs of emerging delivery protocols and networks, such as 5G, enabling the delivery of high-quality video services to an ever-growing audience. 

In addition to the verification tests, the systems support for EVC and VVC, e.g., within CMAF, was subject to further improvements (see below).

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. Additionally, the availability of (efficient) open-source implementations (e.g., x264, x265, soon x266, VVenC, aomenc, etc.) is vital for its adoption in the (academic) research community.

MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)

At the 133rd MPEG meeting, MPEG Systems promoted Amendment 2 of the Common Media Application Format (CMAF) to Committee Draft Amendment (CDAM) status, the first major milestone in the ISO/IEC approval process. This amendment defines:

  • constraints to (i) Versatile Video Coding (VVC) and (ii) Essential Video Coding (EVC) video elementary streams when carried in a CMAF video track;
  • codec parameters to be used for CMAF switching sets with VVC and EVC tracks; and
  • support of the newly introduced MPEG-H 3D Audio profile.

It is expected to reach its final milestone in early 2022. For research aspects related to CMAF, the reader is referred to the next section about DASH.

MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)

At the 133rd MPEG meeting, MPEG Systems promoted Part 8 of Dynamic Adaptive Streaming over HTTP (DASH) also referred to as “Session-based DASH” to its final stage of standardization (i.e., Final Draft International Standard (FDIS)).

Historically, in DASH, every client uses the same Media Presentation Description (MPD), as it best serves the scalability of the service. However, there have been increasing requests from the industry to enable customized manifests for enabling personalized services. MPEG Systems has standardized a solution to this problem without sacrificing scalability. Session-based DASH adds a mechanism to the MPD to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests.
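To make the mechanism concrete, a minimal sketch of how a client could substitute SBD-provided variables into a URL template from the MPD is shown below; the variable names and URL pattern are illustrative assumptions, not the normative syntax of ISO/IEC 23009-8.

```python
from string import Template

# Hypothetical key/value pairs a DASH client could obtain from its
# Session-based Description (SBD) document; names are illustrative only.
sbd_values = {"sessionid": "a1b2c3", "abgroup": "low_latency"}

# Hypothetical segment URL template from the MPD that references SBD variables.
segment_template = Template("https://cdn.example.com/video/$sessionid/$abgroup/seg-42.m4s")

request_url = segment_template.substitute(sbd_values)
print(request_url)  # per-session URL used for the HTTP GET request
```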

An updated overview of DASH standards/features can be found in the Figure below.

MPEG DASH Status as of January 2021.

Research aspects: CMAF is most likely becoming the main segment format to be used in the context of HTTP adaptive streaming (HAS) and, thus, also DASH (hence also the name common media application format). Supporting a plethora of media coding formats will inevitably result in a multi-codec dilemma to be addressed in the near future, as there will be no flag day where everyone switches to a new coding format. Thus, designing efficient bitrate ladders for multi-codec delivery will be an interesting research aspect, which needs to include device/player support (i.e., some devices/players will support only a subset of available codecs), storage capacity/costs within the cloud as well as within the delivery network, and network distribution capacity/costs (i.e., CDN costs).

The 134th MPEG meeting will be again an online meeting in April 2021. Click here for more information about MPEG meetings and their developments.

JPEG Column: 90th JPEG Meeting

JPEG AI becomes a new work item of ISO/IEC

The 90th JPEG meeting was held online from 18 to 22 January 2021. This meeting was distinguished by very relevant activities, notably the new JPEG AI standardization project planning, and the analysis of the Call for Evidence on JPEG Pleno Point Cloud Coding.

The new JPEG AI Learning-based Image Coding System has become an official new work item registered under ISO/IEC 6048 and aims at providing compression efficiency in addition to supporting image processing and computer vision tasks without the need for decompression.

The response to the Call for Evidence on JPEG Pleno Point Cloud Coding was a learning-based method that was found to offer state-of-the-art compression efficiency. Considering this response, the JPEG Pleno Point Cloud activity will analyse the possibility of preparing a future call for proposals on learning-based coding solutions that will also consider new functionalities, building on the relevant use cases already identified that require machine learning tasks processed in the compressed domain.

Meanwhile, the new JPEG XL coding system has reached the FDIS stage and is ready for adoption. JPEG XL offers compression efficiency similar to the best state of the art in image coding, the best lossless compression performance, affordable low complexity, and integration with the legacy JPEG image coding standard, allowing a friendly transition between the two standards.

The new JPEG AI logo.

The 90th JPEG meeting had the following highlights:

  • JPEG AI,
  • JPEG Pleno Point Cloud response to the Call for Evidence,
  • JPEG XL Core Coding System reaches FDIS stage,
  • JPEG Fake Media exploration,
  • JPEG DNA continues the exploration on image coding suitable for DNA storage,
  • JPEG systems,
  • JPEG XS 2nd edition of Profiles reaches DIS stage.

JPEG AI

The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed-domain representation, targeting both human visualization, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline.

JPEG AI has made several advances during the 90th technical meeting. During this meeting, the JPEG AI Use Cases and Requirements were discussed and collaboratively defined. Moreover, the JPEG AI vision and the overall system framework of an image compression solution with an efficient compressed-domain representation were defined. Following this approach, a set of exploration experiments was defined to assess the capabilities of the compressed representation generated by learning-based image codecs, considering some specific computer vision and image processing tasks.

Moreover, the performance assessment of the most popular objective quality metrics, using subjective scores obtained during the call for evidence, was discussed, as well as anchors and some techniques to perform spatial prediction and entropy coding.

JPEG Pleno Point Cloud response to the Call for Evidence

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 90th JPEG meeting, the JPEG Committee reached an exciting major milestone and reviewed the results of its Final Call for Evidence on JPEG Pleno Point Cloud Coding. With an innovative Deep Learning based point cloud codec supporting scalability and random access submitted, the Call for Evidence results highlighted the emerging role of Deep Learning in point cloud representation and processing. Between the 90th and 91st meetings, the JPEG Committee will be refining the scope and direction of this activity in light of the results of the Call for Evidence.

JPEG XL Core Coding System reaches FDIS stage

The JPEG Committee has finalized JPEG XL Part 1 (Core Coding System), which is now at FDIS stage. The committee has defined new core experiments to determine appropriate profiles and levels for the codec, as well as appropriate criteria for defining conformance. With Part 1 complete, and Part 2 close to completion, JPEG XL is ready for evaluation and adoption by the market.

JPEG Fake Media exploration

The JPEG Committee initiated the JPEG Fake Media exploration study with the objective to create a standard that can facilitate the secure and reliable annotation of media asset generation and modifications. The initiative aims to support usage scenarios that are in good faith as well as those with malicious intent. During the 90th JPEG meeting, the committee released a new version of the document entitled “JPEG Fake Media: Context, Use Cases and Requirements”, which is available on the JPEG website. A first workshop on the topic was organized on the 15th of December 2020. The program, presentations, and a video recording of this workshop are available on the JPEG website. A second workshop will be organized around March 2021. More details will be made available soon on JPEG.org. JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and to subscribe to the mailing list via http://listregistration.jpeg.org.

JPEG DNA continues the exploration on image coding suitable for DNA storage

The JPEG Committee continued its exploration of the coding of images in quaternary representation, particularly suitable for DNA storage. After a second successful workshop with presentations by stakeholders, additional requirements were identified, and a new version of the JPEG DNA overview document was issued and made publicly available. It was decided to continue this exploration by organising a third workshop and further outreach to stakeholders, as well as a proposal for an updated version of the JPEG DNA overview document. Interested parties are invited to refer to the following URL and to consider joining the effort by registering to the mailing list of JPEG DNA here: https://jpeg.org/jpegdna/index.html.

JPEG Systems

JUMBF (ISO/IEC 19566-5) Amendment 1 draft review is complete and it is proceeding to International Standard and subsequent publication; additional features to support new applications are under consideration. Likewise, JPEG 360 (ISO/IEC 19566-6) Amendment 1 draft review is complete, and it is proceeding to International Standard and subsequent publication. The JLINK (ISO/IEC 19566-7) standard completed the committee draft review and is preparing a DIS study text ahead of the 91st meeting. JPEG Snack (ISO/IEC 19566-8) will proceed with a second working draft. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.

JPEG XS 2nd edition of Profiles reaches DIS stage

The 2nd edition of Part 2 (Profiles) is now at the DIS stage and defines the required new profiles and levels to support the compression of raw Bayer content, mathematically lossless coding of up to 12-bit per component images, and 4:2:0 sampled image content. With the second editions of Parts 1, 2, and 3 completed, and the scheduled second editions of Part 4 (Conformance) and 5 (Reference Software), JPEG XS will soon have received a complete backwards-compatible revision of its entire suite of standards. Moreover, the committee defined a new exploration study to create new coding tools for improving the HDR and mathematically lossless compression capabilities, while still honoring the low-complexity and low-latency requirements.

Final Quote

“The official approval of JPEG AI by JPEG Parent Bodies ISO and IEC is a strong signal of support of this activity and its importance in the creation of AI-based imaging applications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 91 will be held online from April 19 to 23, 2021.
  • No. 92 will be held online from July 7 to 13, 2021.

ITU-T Standardization Activities Targeting Gaming Quality of Experience

Motivation for Research in the Gaming Domain

The gaming industry has eminently managed to intrinsically motivate users to interact with its services. According to the latest report of Newzoo, there will be an estimated total of 2.7 billion players across the globe by the end of 2020, and the global games market will generate revenues of $159.3 billion in 2020 [1]. This surpasses the movie industry (box offices and streaming services) by a factor of four and is almost three times the value of the music industry market [2].

The rapidly growing domain of online gaming emerged in the late 1990s and early 2000s, allowing social relatedness to a great number of players. During traditional online gaming, the game logic and the game user interface are typically executed and rendered locally on the player’s hardware. The client device is connected via the internet to a game server to exchange information influencing the game state, which is then shared and synchronized with all other players connected to the server. However, in 2009 a new concept called cloud gaming emerged that is comparable to the rise of Netflix for video consumption and Spotify for music consumption. In contrast to traditional online gaming, cloud gaming is characterized by the execution of the game logic, rendering of the virtual scene, and video encoding on a cloud server, while the player’s client is solely responsible for video decoding and the capturing of client input [3].

For online gaming and cloud gaming services, in contrast to applications such as voice, video, and web browsing, little information existed on the factors influencing the Quality of Experience (QoE) of online video games, on subjective methods for assessing gaming QoE, or on instrumental prediction models to plan and manage QoE during service set-up and operation. For this reason, Study Group (SG) 12 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has decided to work on these three interlinked research tasks [4]. This was especially required since the evaluation of gaming applications is fundamentally different compared to task-oriented human-machine interactions. Traditional aspects such as effectiveness and efficiency as part of usability cannot be directly applied to gaming applications, since a game without any challenges in which time simply passes would result in boredom, and thus a bad player experience (PX). The absence of standardized assessment methods as well as of knowledge about the quantitative and qualitative impact of influence factors resulted in a situation where many researchers tended to use their own self-developed research methods. This makes collaborative work through reliable, valid, and comparable research very difficult. Therefore, it is the aim of this report to provide an overview of the achievements reached by ITU-T standardization activities targeting gaming QoE.

Theory of Gaming QoE

As a basis for the gaming research carried out, a taxonomy of gaming QoE aspects was proposed by Möller et al. in 2013 [5]. The taxonomy is divided into two layers, of which the top layer contains various influencing factors grouped into user (also human), system (also content), and context factors. The bottom layer consists of game-related aspects including hedonic concepts such as appeal, pragmatic concepts such as learnability and intuitivity (part of playing quality, which can be considered a kind of game usability), and finally, the interaction quality. The latter is composed of output quality (e.g., audio and video quality), as well as input quality and interactive behaviour. Interaction quality can be understood as the playability of a game, i.e., the degree to which all functional and structural elements of a game (hardware and software) enable a positive PX. The second part of the bottom layer summarizes concepts related to the PX, such as immersion (see [6]), positive and negative affect, as well as the well-known concept of flow that describes an equilibrium between requirements (i.e., challenges) and abilities (i.e., competence). Consequently, based on the theory depicted in the taxonomy, the question arises which of these aspects are relevant (i.e., dominant), how they can be assessed, and to what extent they are impacted by the influencing factors.

Fig. 1: Taxonomy of gaming QoE aspects. Upper panel: Influence factors and interaction performance aspects; lower panel: quality features (cf. [5]).

Introduction to Standardization Activities

Building upon this theory, SG 12 of the ITU-T decided during the 2013-2016 Study Period to start work on three new work items called P.GAME, G.QoE-gaming, and G.OMG. However, there are also other related activities at the ITU-T, summarized in Fig. 2, concerning evaluation methods (P.CrowdG) and gaming QoE modelling (G.OMMOG and P.BBQCG).

Fig. 2: Overview of ITU-T SG12 recommendations and on-going work items related to gaming services.

The efforts on the three initial work items continued during the 2017-2020 Study Period resulting in the recommendations G.1032, P.809, and G.1072, for which an overview will be given in this section.

ITU-T Rec. G.1032 (G.QoE-gaming)

The ITU-T Rec. G.1032 aims at identifying the factors which potentially influence gaming QoE. For this purpose, the Recommendation provides an overview table and then roughly classifies the influence factors into (A) human, (B) system, and (C) context influence factors. This classification is based on [7] but is now detailed with respect to cloud and online gaming services. Furthermore, the Recommendation considers whether an influencing factor carries an influence mainly in a passive viewing-and-listening scenario, in an interactive online gaming scenario, or in an interactive cloud gaming scenario. This classification helps evaluators to decide which type of impact may be evaluated with which type of test paradigm [4]. An overview of the influencing factors identified for ITU-T Rec. G.1032 is presented in Fig. 3. For subjective user studies, in most cases the human and context factors should be controlled and their influence should be reduced as much as possible. For example, even though it might be a highly impactful aspect of today’s gaming domain, within the scope of the ITU-T cloud gaming modelling activities only single-player user studies are conducted, to reduce the impact of social aspects which are very difficult to control. On the other hand, as network operators and service providers are the intended stakeholders of gaming QoE models, the relevant system factors must be included in the development process of the models, in particular the game content as well as network and encoding parameters.

Fig. 3: Overview of influencing factors on gaming QoE summarized in ITU-T Rec. G.1032 (cf. [3]).

ITU-T Rec. P.809 (P.GAME)

The aim of the ITU-T Rec. P.809 is to describe subjective evaluation methods for gaming QoE. Since there is no single standardized evaluation method available that would cover all aspects of gaming QoE, the Recommendation mainly summarizes the state of the art of subjective evaluation methods in order to help choose suitable methods for conducting subjective experiments, depending on the purpose of the experiment. In its main body, the Recommendation consists of five parts: (A) definitions for games considered in the Recommendation, (B) definitions of QoE aspects relevant in gaming, (C) a description of test paradigms, (D) a description of the general experimental set-up, with recommendations regarding passive viewing-and-listening tests and interactive tests, and (E) a description of questionnaires to be used for gaming QoE evaluation. It is amended by two paragraphs regarding performance and physiological response measurements and by (non-normative) appendices illustrating the questionnaires, as well as an extensive list of literature references [4].

Fundamentally, the ITU-T Rec. P.809 defines two test paradigms to assess gaming quality:

  • Passive tests with predefined audio-visual stimuli passively observed by a participant.
  • Interactive tests with game scenarios interactively played by a participant.

The passive paradigm can be used for gaming quality assessment when the impairment does not influence the interaction of players. This method suggests a short stimulus duration of 30 s, which allows investigating a great number of encoding conditions while reducing the influence of user behaviour on the stimulus due to the absence of interaction. Even for passive tests, as the subjective ratings will be merged with those derived from interactive tests for QoE model development, it is recommended to give instructions about the game rules and objectives to allow participants to have similar knowledge of the game. The instructions should also explain the difference between video quality and graphics quality (e.g., graphical details such as abstract vs. realistic graphics), as confusing the two is one of the common mistakes of participants in video quality assessment of gaming content.

The interactive test should be used when other quality features such as interaction quality, playing quality, immersion, and flow are under investigation. While for the interaction quality, a duration of 90s is proposed, a longer duration of 5-10min is suggested in the case of research targeting engagement concepts such as flow. Finally, the recommendation provides information about the selection of game scenarios as stimulus material for both test paradigms, e.g., ability to provide repetitive scenarios, balanced difficulty, representative scenes in terms of encoding complexity, and avoiding ethically questionable content.

ITU-T Rec. G.1072 (G.OMG)

The quality management of gaming services requires quantitative prediction models. Such models should be able to predict either the “overall quality” (e.g., in terms of a Mean Opinion Score) or individual QoE aspects from characteristics of the system, potentially considering the player characteristics and the usage context. ITU-T Rec. G.1072 aims at the development of quality models for cloud gaming services based on the impact of impairments introduced by typical Internet Protocol (IP) networks on the quality experienced by players. G.1072 is a network planning tool that estimates gaming QoE based on assumptions about network and encoding parameters as well as the game content.

The impairment factors are derived from subjective ratings of the corresponding quality aspects, e.g., spatial video quality or interaction quality, and modelled by non-linear curve fitting. For the prediction of the overall score, linear regression is used. To create the impairment factors and regression, a data transformation from the MOS values of each test condition to the R-scale was performed, similar to the well-known E-model [8]. The R-scale, which results from an s-shaped conversion of the MOS scale, promises benefits regarding the additivity of the impairments and compensation for the fact that participants tend to avoid using the extremes of rating scales [3].
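The s-shaped MOS-to-R conversion mentioned here follows the R-to-MOS curve of the E-model (ITU-T G.107 [8]); since that curve has no simple closed-form inverse, a sketch of the conversion by numerical inversion is shown below. How exactly G.1072 applies the transformation is simplified away here.

```python
from scipy.optimize import brentq

def r_to_mos(r: float) -> float:
    """E-model (ITU-T G.107) mapping from the transmission rating scale R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

def mos_to_r(mos: float) -> float:
    """MOS-to-R transformation by numerically inverting r_to_mos (valid for 1 < MOS < 4.5)."""
    return brentq(lambda r: r_to_mos(r) - mos, 0.0, 100.0)

print(round(mos_to_r(3.5), 1))  # a MOS of 3.5 corresponds to roughly R = 68
```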

As the impact of the input parameters, e.g., delay, was shown to be highly content-dependent, the model uses two modes. If no assumption about the sensitivity of the game class towards degradations is available to the user of the model (e.g., a network provider), the “default” mode of operation should be used, which assumes the highest (most sensitive) game class. The “default” mode of operation will result in a pessimistic quality prediction for games that are not of high complexity and sensitivity. If the user of the model can make an assumption about the game class (e.g., a service provider), the “extended” mode can predict the quality with a higher degree of accuracy based on the assigned game classes.

On-going Activities

While the three recommendations provide a basis for researchers, as well as network operators and cloud gaming service providers towards improving gaming QoE, the standardization activities continue by initiating new work items focusing on QoE assessment methods and gaming QoE model development for cloud gaming and online gaming applications. Thus, three work items have been established within the past two years.

ITU-T P.BBQCG

P.BBQCG is a work item that aims at the development of a bitstream model predicting cloud gaming QoE. The model will benefit from bitstream information, from the header and payload of packets, to reach a higher accuracy of audiovisual quality prediction compared to G.1072. In addition, three different types of codecs and a wider range of network parameters will be considered to develop a generalizable model. The model will be trained and validated for the H.264, H.265, and AV1 video codecs and video resolutions up to 4K. For the development of the model, the two paradigms of passive and interactive testing will be followed. The passive paradigm will be used to cover a wide range of encoding parameters, while the interactive paradigm will cover the network parameters that might strongly influence the interaction of players with the game.

ITU-T P.CrowdG

A gaming QoE study is per se a challenging task due to the multidimensionality of the QoE concept and the large number of influence factors. However, it becomes even more challenging if the test follows a crowdsourcing approach, which is of particular interest in times of the COVID-19 pandemic or if subjective ratings are required from a highly diverse audience, e.g., for the development or investigation of questionnaires. The aim of the P.CrowdG work item is to develop a framework that describes the best practices and guidelines that have to be considered for gaming QoE assessment using a crowdsourcing approach. In particular, the crowd gaming framework has to provide the means to ensure reliable and valid results despite the absence of an experimenter, a controlled network, and visual observation of the test participants. In addition to the crowd gaming framework, guidelines will be given to ensure the collection of valid and reliable results, addressing issues such as how to make sure that workers put enough focus on the gaming and rating tasks. While a possible framework for interactive tests of simple web-based games has already been presented in [9], more work is required to complete the ITU-T work item for more advanced setups and passive tests.

ITU-T G.OMMOG

G.OMMOG is a work item that focuses on the development of an opinion model predicting gaming Quality of Experience (QoE) for mobile online gaming services. The work item is a possible extension of ITU-T Rec. G.1072. In contrast to G.1072, the games are not executed on a cloud server but on a gaming server that exchanges game states with the users' clients instead of sending a video stream. This more traditional gaming concept represents a very popular service, especially considering multiplayer gaming such as recently published AAA titles of the Multiplayer Online Battle Arena (MOBA) and battle royale genres.

So far, it has been decided to follow a model structure similar to ITU-T Rec. G.1072. However, the component of spatial video quality, which was a major part of G.1072, will be removed, and the corresponding game type information will not be used. In addition, for the development of the model, it was decided to investigate the impact of variable delay and packet loss bursts, especially as their interaction can have a high impact on gaming QoE. It is assumed that more variability of these factors and their interplay will weaken the error handling of mobile online gaming services. Due to missing information on the server caused by packet loss or strong delays, the gameplay is assumed to no longer be smooth (in the gaming domain, this is called 'rubber banding'), which will lead to reduced temporal video quality.

About ITU-T SG12

ITU-T Study Group 12 is the expert group responsible for the development of international standards (ITU-T Recommendations) on performance, quality of service (QoS), and quality of experience (QoE). This work spans the full spectrum of terminals, networks, and services, ranging from speech over fixed circuit-switched networks to multimedia applications over mobile and packet-based networks.

In this article, the previous achievements of ITU-T SG12 with respect to gaming QoE have been described, with a particular focus on subjective assessment methods, influencing factors, and the modelling of gaming QoE. We hope that this information will significantly improve the work and research in this domain by enabling more reliable, comparable, and valid findings. Lastly, the report also points out the many on-going activities in this rapidly changing domain, in which everyone is warmly invited to participate.

More information about the SG12, which will host its next E-meeting from 4-13 May 2021, can be found at ITU Study Group (SG) 12.

For more information about the gaming activities described in this report, please contact Sebastian Möller (sebastian.moeller@tu-berlin.de).

Acknowledgement

The authors would like to thank all colleagues of ITU-T Study Group 12, as well as of the Qualinet gaming Task Force, for their support. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871793 and No 643072 as well as by the German Research Foundation (DFG) within project MO 1038/21-1.

References

[1] T. Wijman, The World’s 2.7 Billion Gamers Will Spend $159.3 Billion on Games in 2020; The Market Will Surpass $200 Billion by 2023, 2020.

[2] S. Stewart, Video Game Industry Silently Taking Over Entertainment World, 2019.

[3] S. Schmidt, Assessing the Quality of Experience of Cloud Gaming Services, Ph.D. dissertation, Technische Universität Berlin, 2021.

[4] S. Möller, S. Schmidt, and S. Zadtootaghaj, “New ITU-T Standards for Gaming QoE Evaluation and Management”, in 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2018.

[5] S. Möller, S. Schmidt, and J. Beyer, “Gaming Taxonomy: An Overview of Concepts and Evaluation Methods for Computer Gaming QoE”, in 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2013.

[6] A. Perkis and C. Timmerer, Eds., QUALINET White Paper on Definitions of Immersive Media Experience (IMEx), European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting, 2020.

[7] P. Le Callet, S. Möller, and A. Perkis, Eds, Qualinet White Paper on Definitions of Quality of Experience, COST Action IC 1003, 2013.

[8] ITU-T Recommendation G.107, The E-model: A Computational Model for Use in Transmission Planning. Geneva: International Telecommunication Union, 2015.

[9] S. Schmidt, B. Naderi, S. S. Sabet, S. Zadtootaghaj, and S. Möller, “Assessing Interactive Gaming Quality of Experience Using a Crowdsourcing Approach”, in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2020.

VQEG Column: VQEG Meeting Dec. 2020 (virtual/online)

Introduction

Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 14 to 18 December 2020. Given the current circumstances, it was organized fully online for the second time, with multiple sessions distributed over five to six hours each day, allowing remote participation of people from different time zones. About 130 participants from 24 different countries registered for the meeting and could attend the numerous presentations and discussions that took place in all working groups.
This column provides an overview of this meeting, while all the information, minutes, files (including the presented slides), and video recordings from the meeting are available online on the VQEG meeting website. As highlights of interest for the SIGMM community, apart from several interesting presentations of state-of-the-art works, relevant contributions to ITU recommendations related to multimedia quality assessment were reported by various groups (e.g., on adaptive bitrate streaming services, on subjective quality assessment of 360-degree videos, on statistical analysis of quality assessments, on gaming applications, etc.), the new group on quality assessment for health applications was launched, and an interesting session on 5G use cases took place, as well as a workshop dedicated to user testing during Covid-19. In addition, new efforts have been launched related to research on quality metrics for live media streaming applications and to providing guidelines on implementing objective video quality metrics (beyond PSNR) for the video compression community.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD/P.NATS2 project was a joint collaboration between VQEG and ITU SG12, whose goal was to develop a multitude of objective models, varying in terms of complexity/type of input/use cases, for the assessment of video quality in adaptive bitrate streaming services over reliable transport up to 4K. The report of this project, which finished in January 2020, was approved in this meeting. In summary, it resulted in 10 model categories with models trained and validated on 26 subjective datasets. This activity resulted in four ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during this effort, and a journal publication reporting details on the validation tests [5]. In this sense, one presentation by Alexander Raake (TU Ilmenau) provided details on the P.NATS Phase 2 project and the resulting ITU recommendations, while details of the processing chain used in the project were presented by Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson).
In addition to this activity, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on the research on QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project within AVHD group on this topic.

Psycho-Physiological Quality Assessment (PsyPhyQA)

The chairs of the PsyPhyQA group provided an update on the activities carried out. A test plan for psychophysiological video quality assessment was established, and the group is currently aiming to develop ideas for quality assessment tests with psychophysiological measures in times of a pandemic and to collect and discuss ideas about possible joint works. In addition, the project is trying to learn about physiological correlates of simulator sickness, and in this sense, a presentation was delivered by J.P. Tauscher (Technische Universität Braunschweig) on exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on visual fidelity of tone mapping operators from gaze data using HMM [7].

Quality Assessment for Health applications (QAH)

This was the first meeting for this new QAH group. The chairs reported on the first audio call, which took place in November, to launch the project, learn how many people are interested in this project, what each member has already done on medical images, what each member wants to do in this joint project, etc.
The plenary meeting served to collect ideas about possible joint works and to share experiences on related studies. In this sense, Lucie Lévêque (Université Gustave Eiffel) presented a review on subjective assessment of the perceived quality of medical images and videos, Maria Martini (Kingston University London) talked about the suitability of VMAF for quality assessment of medical videos (ultrasound & wireless capsule endoscopy), and Jorge Caviedes (ASU) delivered a presentation on cognition inspired diagnostic image quality models.

Statistical Analysis Methods (SAM)

The update report from the SAM group presented the ongoing progress on new methods for data analysis, including the discussion with ITU-T (P.913 [9]) and ITU-R (BT.500 [8]) about including a new method in these recommendations.
Several interesting presentations related to the ongoing work within SAM were delivered. For instance, Jakub Nawala (AGH University) presented "su-JSON", a uniform JSON-based subjective data format, as well as his work on describing subjective experiment consistency by p-value p–p plots. An interesting discussion was raised by Lucjan Janowski (AGH University) on how to define the quality of a single sequence, analyzing different perspectives (e.g., crowd, experts, psychology, etc.). Also, Babak Naderi (TU Berlin) presented an analysis of the relation between Mean Opinion Score (MOS) and rank-based statistics. Recent advances on Netflix's quality metric VMAF were presented by Zhi Li (Netflix), especially on the properties of VMAF in the presence of image enhancement. Finally, two more presentations addressed the progress on statistical analyses of quality assessment data, one by Margaret Pinson (NTIA/ITS) on the computation of confidence intervals, and one by Suiyi Ling (Université de Nantes) on a probabilistic model to recover the ground truth and annotators' behavior.
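As a brief reminder of the basic statistics underlying such analyses, the sketch below computes the per-stimulus MOS together with a textbook t-distribution confidence interval; the SAM discussions go well beyond this simple treatment, and the rating values are made up.

```python
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """Per-stimulus MOS and a t-distribution confidence interval, as commonly
    reported in subjective quality experiments."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mos = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)               # standard error of the mean
    half_width = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    return mos, (mos - half_width, mos + half_width)

ratings = [4, 5, 3, 4, 4, 5, 3, 4, 2, 4]                # ACR votes for one stimulus (made up)
print(mos_with_ci(ratings))
```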

Computer Generated Imagery (CGI)

The report from the chairs of the CGI group covered the progress on the research on assessment methodologies for quality assessment of gaming services (e.g., ITU-T P.809 [10]), on crowdsourcing quality assessment for gaming applications (ITU-T P.808 [11]), on quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and on models (signal-, bitstream-, and parametric-based) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.).
In terms of planned activities, the group is targeting the generation of new gaming datasets and tools for metrics to assess gaming QoE, but also the group is aiming at identifying other topics of interest in CGI rather than gaming content.
In addition, there was a presentation on updates on gaming standardization activities and deep learning models for gaming quality prediction by Saman Zadtootaghaj (TU Berlin), another one on subjective assessment of multi-dimensional aesthetic assessment for mobile game images by Suiyi Ling (Université de Nantes), and one addressing quality assessment of gaming videos compressed via AV1 by Maria Martini (Kingston University London), leading to interesting discussions on those topics.

No Reference Metrics (NORM)

The session for NORM group included a presentation on the differences among existing implementations of spatial and temporal perceptual information indices (SI and TI as defined in ITU-T P.910 [13]) by Cosmin Stejerean (Facebook), which led to an open discussion and to the agreement on launching an effort to clarify the ambiguous details that have led to different implementations (and different results), to generate test vectors for reference and validation of the implementations and to address the computation of these indicators for HDR content. In addition, Margaret Pinson (NTIA/ITS) presented the paradigm of no-reference metric research analyzing design problems and presenting a framework for collaborative development of no-reference metrics for image and video quality. Finally, Ioannis Katsavounidis (Facebook) delivered a talk on addressing the addition of video quality metadata in compressed bitstreams. Further discussions on these topics are planned in the next month within the group.
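For readers less familiar with these indicators, the sketch below follows the textbook P.910 [13] definitions of SI and TI; border handling, the Sobel implementation, and bit-depth scaling are left as naive defaults, which is precisely the kind of detail whose ambiguity has led to diverging implementations.

```python
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """SI/TI following ITU-T P.910: SI is the maximum over time of the spatial
    standard deviation of the Sobel-filtered luma frame; TI is the maximum over
    time of the standard deviation of the frame difference."""
    si_values, ti_values, prev = [], [], None
    for frame in frames:
        frame = frame.astype(np.float64)
        sobel = np.hypot(ndimage.sobel(frame, axis=0), ndimage.sobel(frame, axis=1))
        si_values.append(sobel.std())
        if prev is not None:
            ti_values.append((frame - prev).std())
        prev = frame
    si = max(si_values)
    ti = max(ti_values) if ti_values else 0.0
    return si, ti

# Example on random 8-bit frames standing in for decoded luma planes.
frames = [np.random.randint(0, 256, (144, 176), dtype=np.uint8) for _ in range(10)]
print(si_ti(frames))
```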

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working, in collaboration with Sky Group, on determining when video quality metrics are likely to inaccurately predict the MOS and on modelling single observers' quality perception based on artificial intelligence techniques. In this sense, Lohic Fotio (Politecnico di Torino) presented his work on artificial-intelligence-based observers for media quality assessment. Also, together with Florence Agboma (Sky UK), they presented their work on comparing commercial and open-source video quality metrics for HD constant bitrate videos. Finally, Dariusz Grabowski (AGH University) presented his work on comparing full-reference video quality metrics using cluster analysis.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group announced Lu Zhang (INSA Rennes) as new third co-chair, who will also work in the near future in a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently finished project related to objective video quality assessment method for recognition tasks, in collaboration with Huawei through its Innovation Research Programme.

5G Key Performance Indicators (5GKPI)

The 5GKPI session was oriented to identify possible interested partners and joint works (e.g., contribution to ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). In this sense, it included four presentations of use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote controlled use cases by Kjell Brunnström (RISE).

Immersive Media Group (IMG)

A report on the updates within the IMG group was initially presented, especially covering the current joint work investigating the subjective quality assessment of 360-degree video. In particular, a cross-lab test involving 10 different labs was carried out at the beginning of 2020, resulting in relevant outcomes including various contributions to ITU SG12/Q13 and the MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T recommendation P.919 [15], related to subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] or ITU-T P.910 [13]), was approved in mid-October and was supported by the results of these cross-lab tests.
Furthermore, since these tests have already finished, there was a presentation by Pablo Pérez (Nokia Bell-Labs) on possible future joint activities within IMG, which led to an open discussion after it that will continue in future audio calls.
In addition, a total of four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of the TU Ilmenau on immersive media topics, and a presentation of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image by Ali Ak and Patrick Le Callet (Université de Nantes) [16]. Also, there were two presentations related to 3D graphical contents, one addressing the perceptual characterization of 3D graphical contents based on visual attention patterns by Mona Abid (Université de Nantes), and another one comparing subjective methods for quality assessment of 3D graphics in virtual reality by Yana Nehmé (INSA Lyon). 

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview of the progress and recent works within ITU-R WP6C on HDR-related topics and ITU-T SG12 Questions 9, 13, 14, and 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item was announced within ITU-T SG9: End-to-end network characteristics requirements for video services (J.pcnp-char [17]).
From the discussions raised during this session, a new dedicated group was set up to work on introducing, and providing guidelines on implementing, objective video quality metrics beyond PSNR for the video compression community. The group was named "Implementers Guide for Video Quality Metrics (IGVQM)" and will be chaired by Ioannis Katsavounidis (Facebook), counting on the involvement of several people from VQEG.
After the IRG-AVQA session, the Q19 interim meeting took place with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) on an update on improvements on subjective experiment data analysis process.

Other updates

Apart from the aforementioned groups, the Human Factors for Visual Experiences (HFVE) group is still active, coordinating VQEG activities in liaison with the IEEE Standards Association Working Groups on HFVE, especially on perceptual quality assessment of 3D, UHD and HD contents, quality of experience assessment for VR and MR, quality assessment of light-field imaging contents, and deep-learning-based assessment of visual experience based on human factors. In this sense, there are ongoing contributions from VQEG members to IEEE Standards.
In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation on precaution for lab experiments by Kjell Brunnström (RISE), another presentation by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussions on the topic.

Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.

References

[1] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K, 2020.
[2] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, 2020.
[3] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information, 2020.
[4] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information, 2020.
[5] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[6] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A, Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[7] W. Ellahi, T. Vigier and P. Le Callet, “HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures, 2019.
[9] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2016.
[10] ITU-T Rec. P.809. Subjective evaluation methods for gaming quality, 2018.
[11] ITU-T Rec. P.808. Subjective evaluation of speech quality with a crowdsourcing approach, 2018.
[12] ITU-T Rec. G.1072. Opinion model predicting gaming quality of experience for cloud gaming services, 2020.
[13] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[14] ITU-T Rec. G.QoE-5G. QoE factors for new services in 5G networks, 2020 (under study).
[15] ITU-T Rec. P.919. Subjective test methodologies for 360º video on head-mounted displays, 2020.
[16] A. Ak, S. Ling and P. Le Callet, “No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[17] ITU-T Rec. J.pcnp-char. E2E Network Characteristics Requirement for Video Services, 2020 (under study).

JPEG Column: 89th JPEG Meeting

JPEG initiates standardisation of image compression based on AI

The 89th JPEG meeting was held online from 5 to 9 October 2020.

During this meeting, multiple JPEG standardisation activities and explorations were discussed and progressed. Notably, the call for evidence on learning-based image coding was successfully completed, and evidence was found that this technology promises several new functionalities while, at the same time, offering superior compression efficiency beyond the state of the art. A new work item, JPEG AI, which will use learning-based image coding as its core technology, has been proposed, enlarging the already wide family of JPEG standards.

Figure 1. JPEG Families of standards and JPEG AI.

The 89th JPEG meeting had the following highlights:

  • JPEG AI call for evidence report
  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud Coding reviews the status of the call for evidence
  • JPEG Pleno Holography call for proposals timeline
  • JPEG DNA identifies use cases and requirements
  • JPEG XL standard defines the final specification
  • JPEG Systems JLINK reaches committee draft stage
  • JPEG XS 2nd Edition Parts 1, 2 and 3.

JPEG AI

At the 89th meeting, the submissions to the Call for Evidence on learning-based image coding were presented and discussed. Four submissions were received in response to the Call for Evidence. The results of the subjective evaluation of the submissions were reported and discussed in detail by experts. It was agreed that there is strong evidence that learning-based image coding solutions can outperform the already defined anchors in terms of compression efficiency when compared to state-of-the-art conventional image coding architectures. Thus, it was decided to create a new standardisation activity, JPEG AI, on a learning-based image coding system that applies machine learning tools to achieve substantially better compression efficiency compared to current image coding systems, while offering unique features desirable for efficient distribution and consumption of images. This type of approach should allow obtaining an efficient compressed-domain representation not only for visualisation but also for machine-learning-based image processing and computer vision. JPEG AI releases to the public the results of the objective and subjective evaluations as well as the first version of common test conditions for assessing the performance of learning-based image coding systems.

JPEG explores standardization needs to address fake media

Recent advances in media modification, particularly deep learning-based approaches, can produce near-realistic media content that is almost indistinguishable from authentic content. These developments open opportunities for the production of new types of media content that are useful for many creative industries but also increase the risk of spreading maliciously modified content (e.g., 'deepfakes'), leading to social unrest, the spreading of rumours, or the encouragement of hate crimes. The JPEG Committee is interested in exploring whether a JPEG standard can facilitate a secure and reliable annotation of media modifications, both in good faith and in malicious usage scenarios.

The JPEG Committee is currently discussing with stakeholders from academia, industry and other organisations to explore the use cases that will define a roadmap to identify the requirements leading to a potential standard. The Committee has received significant interest and has released a public document outlining the context, use cases and requirements. JPEG invites experts and technology users to actively participate in this activity and attend a workshop, to be held online in December 2020. Details on the activities of JPEG in this area can be found on the JPEG.org website. Interested parties are notably encouraged to register to the mailing list of the ad hoc group that has been set up to facilitate the discussions and coordination on this topic.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 89th JPEG meeting, the JPEG Committee reviewed expressions of interest in the Final Call for Evidence on JPEG Pleno Point Cloud Coding. This Call for Evidence focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between its 89th and 90th meetings, the JPEG Committee will be actively promoting this activity and collecting submissions to participate in the Call for Evidence.

JPEG Pleno Holography

At the 89th meeting, the JPEG Committee released an updated draft of the Call for Proposals for JPEG Pleno Holography. A final Call for Proposals on JPEG Pleno Holography will be released in April 2021. JPEG Pleno Holography is seeking compression solutions for holographic content. The scope of the activity is quite large and addresses diverse use cases such as holographic microscopy and tomography, but also holographic displays and printing. Current activities are centred around refining the objective and subjective quality assessment procedures. Interested parties are already invited at this stage to participate in these activities.

JPEG DNA

JPEG standards are used in storage and archival of digital pictures. This puts the JPEG Committee in a good position to address the challenges of DNA-based storage by proposing an efficient image coding format to create artificial DNA molecules. JPEG DNA has been established as an exploration activity within the JPEG Committee to study use cases, to identify requirements, and to assess the state of the art in DNA storage for the purpose of image archival, in order to launch a standardization activity. To this end, a first workshop was organised on 30 September 2020. Presentations made at the workshop are available from the following URL: http://ds.jpeg.org/proceedings/JPEG_DNA_1st_Workshop_Proceedings.zip.
At its 89th meeting, the JPEG Committee released a second version of a public document that describes its findings regarding the storage of digital images using artificial DNA. In this framework, the mandate of the JPEG DNA ad hoc group was renewed in order to continue its activities, further refine the above-mentioned document, and organise a second workshop. Interested parties are invited to join this activity by participating in the AHG through the following URL: http://listregistration.jpeg.org.

JPEG XL

Final technical comments by national bodies have been addressed and incorporated into the JPEG XL specification (ISO/IEC 18181-1) and the reference implementation. A draft FDIS study text has been prepared and final validation experiments are planned.

JPEG Systems

The JLINK (ISO/IEC 19566-7) standard has reached the committee draft stage and will be made public. The JPEG Committee invites technical feedback on the document, which is available on the JPEG website. Development of the JPEG Snack (ISO/IEC 19566-8) standard has begun to support the defined use cases and requirements. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.

JPEG XS

The JPEG Committee is finalizing its work on the 2nd Editions of JPEG XS Part 1, Part 2 and Part 3. Part 1 defines new coding tools required to efficiently compress raw Bayer images. The observed quality gains of raw Bayer compression over compression in the RGB domain can be as high as 5 dB in PSNR. Moreover, the second edition adds support for mathematically lossless image compression and allows compression of 4:2:0 sub-sampled images. Part 2 defines new profiles for such content. With the support for low-complexity, high-quality compression of raw Bayer (or Color-Filtered Array) data, JPEG XS proves to be an excellent compression scheme in the professional and consumer digital camera market, as well as in the machine vision and automotive industries.

Final Quote

"JPEG AI will be a new work item completing the collection of JPEG standards. JPEG AI relies on artificial intelligence to compress images. This standard will not only offer superior compression efficiency beyond the current state of the art but also open new possibilities for vision tasks by machines and computational imaging for humans," said Prof. Touradj Ebrahimi, Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 90, to be held online from January 18 to 22, 2021.
  • No. 91, to be held online from April 19 to 23, 2021.

MPEG Column: 132nd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.

The new structure is as follows (incl. Convenors and position within the former WG 11 structure):

  • AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
  • WG 2 MPEG Technical Requirements (Convenor Dr. Igor Curcio; former Requirements subgroup)
  • WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
  • WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
  • WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
  • WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
  • WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda, former 3DG subgroup)
  • WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
  • AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
  • AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup).

The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.

The official press release can be found here and comprises the following items:

  • Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
  • MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
  • MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
  • MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
  • MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
  • MPEG Completes Standard on Harmonization of DASH and CMAF
  • MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
  • MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard

In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.

Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone

MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High-Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):

  • Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
  • Using VVenC, additional bit rate savings of more than 10% relative to VTM were observed, which at the same time runs significantly faster than the reference software implementation.

Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open-source implementation will be available soon. Together with AOMedia's AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.

MPEG Completes Geometry-based Point Cloud Compression Standard

MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.

By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.

Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency, but one should not dismiss their delivery aspects, including dynamic, adaptive streaming. A recent paper on this topic has been published in the IEEE Communications Magazine and is entitled "From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom".

MPEG Finalizes the Harmonization of DASH and CMAF

MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with Common Media Application Format (CMAF) featuring a DASH profile for the use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019 4th edition).

CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use Media Presentation Description (MPD) as a manifest format.
Additional tools added to this amendment include

  • DASH events and timed metadata track timing and processing models with in-band event streams,
  • a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
  • an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
  • content protection enhancements for efficient signalling.

It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.

Research aspects: one of the features enabled by CMAF is low-latency streaming, which is actively researched within the multimedia systems community (e.g., here). The main research focus has been on the ABR logic, while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path, including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).
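As an illustration of how simple the classical rate-based part of an ABR logic can be, and why low-latency delivery complicates it, consider the toy rule below; the bitrate ladder, safety margin, and function name are hypothetical and not taken from any specific player.

```python
def select_bitrate(ladder_kbps, throughput_estimate_kbps, safety=0.8):
    """Toy rate-based ABR rule: pick the highest representation whose bitrate
    stays below a safety fraction of the estimated throughput. Real players
    additionally use buffer occupancy, and chunked CMAF low-latency delivery
    makes the throughput estimate itself unreliable, which is part of the open
    research question mentioned above."""
    candidates = [b for b in sorted(ladder_kbps) if b <= safety * throughput_estimate_kbps]
    return candidates[-1] if candidates else min(ladder_kbps)

ladder = [300, 750, 1500, 3000, 6000]      # hypothetical bitrate ladder (kbps)
print(select_bitrate(ladder, 4200))        # -> 3000
```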

MPEG Completes 2nd Edition of the Omnidirectional Media Format

MPEG completed the standardization of the 2nd edition of the Omnidirectional MediA Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status, including the following features:

  • “Late binding” technologies to deliver and present only that part of the content that adapts to the dynamically changing users’ viewpoint. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream is dynamically generated that, by combining the received portions of the bitstream, covers only the users’ viewport on the client.
  • Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
  • Enhanced use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.

Research aspects: standards usually define formats to enable interoperability but various informative aspects are left open for industry competition and subject to research and development. The same holds for OMAF and its 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.

MPEG Completes the Low Complexity Enhancement Video Coding Standard

MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.

  • LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video, with effective compression efficiency and limited complexity, by building on top of existing and future video codecs.
  • LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
  • LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.

Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders, which allows for a certain flexibility as part of regular software life-cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect of future work.

The 133rd MPEG meeting will be again an online meeting in January 2021.

Click here for more information about MPEG meetings and their developments.

VQEG Column: Recent contributions to ITU recommendations

Welcome to the second column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG plays a major role in research and the development of standards on video quality and this column presents examples of recent contributions to International Telecommunication Union (ITU) recommendations, as well as ongoing contributions to recommendations to come in the near future. In addition, the formation of a new group within VQEG addressing Quality Assessment for Health Applications (QAH) has been announced.  

VQEG website: www.vqeg.org
Authors: 
Jesús Gutiérrez (jesus.gutierrez@upm.es), Universidad Politécnica de Madrid (Spain)
Kjell Brunnström (kjell.brunnstrom@ri.se), RISE (Sweden) 
Thanks to Lucjan Janowski (AGH University of Science and Technology), Alexander Raake (TU Ilmenau) and Shahid Satti (Opticom) for their help and contributions.

Introduction

VQEG is an international and independent organisation that provides a forum for technical experts in perceptual video quality assessment from industry, academia, and standardization organisations. Although VQEG does not develop or publish standards, several activities (e.g., validation tests, multi-lab test campaigns, objective quality models developments, etc.) carried out by VQEG groups have been instrumental in the development of international recommendations and standards. VQEG contributions have been mainly submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), but also to other standardization bodies, such as MPEG, ITU-R SG6, ATIS, IEEE P.3333 and P.1858, DVB, and ETSI. 

In our first column on the ACM SIGMM Records we provided a table summarizing the several VQEG studies that have resulted in ITU Recommendations. In this new column, we describe with more detail the last contributions to recent ITU standards, and we provide an insight on the ongoing contributions that may result in ITU recommendations in the near future.

ITU Recommendations with recent inputs from VQEG

ITU-T Rec. P.1204 standard series

A campaign within ITU-T Study Group (SG) 12 (Question 14), in collaboration with the VQEG AVHD group, resulted in the development of three new video quality model standards for the assessment of sequences of up to UHD/4K resolution. This campaign was carried out over more than two years under the project "AVHD-AS / P.NATS Phase 2". While "P.NATS Phase 1" (finalized in 2016 and resulting in the standards series ITU-T Rec. P.1203, P.1203.1, P.1203.2 and P.1203.3) addressed the development of improved bitstream-based models for the prediction of the overall quality of long (1-5 minutes) video streaming sessions, the second phase addressed the development of short-term video quality models covering a wider scope with bitstream-based, pixel-based and hybrid models. The P.NATS Phase 2 project was executed as a competition between nine participating institutions in different tracks, resulting in the aforementioned three types of video quality models.

For the competition, a total of 26 databases were created, 13 used for training and 13 for validation and selection of the winning models. In order to establish the ground truth, subjective video quality tests were performed on four different display devices (PC-monitors, 55-75” TVs, mobile, and tablet) with at least 24 subjects each and using the 5-point Absolute Category Rating (ACR) scale. In total, about 5000 test sequences with a duration of around 8 seconds were evaluated, containing a variety of resolutions, encoding configurations, bitrates, and framerates using the codecs H.264/AVC, H.265/HEVC and VP9.   

More details about the whole workflow and results of the competition can be found in [1]. As a result of this competition, the new standard series ITU-T Rec. P.1204 [2] has been recently published, including a bitstream-based model  (ITU-T Rec. P.1204.3 [3]), a pixel-based model (ITU-T Rec. P.1204.4 [4]) and a hybrid model (ITU-T Rec. P.1204.5 [5]).

ITU-T Rec. P.1401

ITU-T Rec. P.1401 [6] covers statistical analysis, evaluation and reporting guidelines of quality measurements and was recently revised in January 2020. Based on the article by Brunnström and Barkowsky [7], VQEG recognized and pointed out that this Recommendation, which is very useful, lacked a section on the topic of multiple comparisons and their potential impact on the performance evaluations of objective quality methods. In the latest revision, Section 7.6.5 covers this topic.
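To illustrate why such a section matters, the sketch below applies the generic Holm-Bonferroni step-down correction to a set of made-up p-values from pairwise model comparisons; the actual guidance is given in Section 7.6.5 of the revised Recommendation and in [7], and may differ from this textbook procedure.

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction: marks which hypotheses can be
    rejected while controlling the family-wise error rate, i.e., it avoids the
    inflated false-positive rate of many uncorrected pairwise comparisons."""
    p = np.asarray(p_values, dtype=float)
    order = np.argsort(p)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                     # once one test fails, all larger p-values fail too
    return reject

# Ten made-up p-values from pairwise comparisons of objective models.
print(holm_bonferroni([0.001, 0.004, 0.02, 0.03, 0.04, 0.05, 0.2, 0.3, 0.6, 0.9]))
```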

Ongoing VQEG Inputs to ITU Recommendations

ITU-T Rec. P.919

ITU has been working on a recommendation for subjective test methodologies for 360º video on Head-Mounted Displays (HMDs), under SG12 Question 13 (Q13). The Immersive Media Group (IMG) of VQEG has collaborated in this effort through the fulfilment of Phase 1 of the Test Plan for Quality Assessment of 360-degree Video. In particular, Phase 1 of this test plan addresses the assessment of short sequences (less than 30 seconds), in the spirit of ITU-R BT.500 [8] and ITU-T P.910 [9]. In this sense, the evaluation of audiovisual quality and simulator sickness was considered. On the other hand, Phase 2 of the test plan (envisioned for the near future) covers the assessment of other factors that can be more influential with longer sequences (several minutes), such as immersiveness and presence.

Therefore, within Phase 1 the IMG designed and executed a cross-lab test with the participation of ten international laboratories, from AGH University of Science and Technology (Poland), Centrum Wiskunde & Informatica (The Netherlands), Ghent University (Belgium), Nokia Bell-Labs (Spain), Roma TRE University (Italy), RISE Acreo (Sweden), TU Ilmenau (Germany), Universidad Politécnica de Madrid (Spain), University of Surrey (England), Wuhan University (China). 

This test was aimed at assessing and validating subjective evaluation methodologies for 360º video. Thus, the single-stimulus methodology Absolute Category Rating (ACR) and the double-stimulus Degradation Category Rating (DCR) were considered to evaluate the audiovisual quality of 360º videos distorted with uniform and non-uniform degradations. In particular, different configurations of uniform and tile-based coding were applied to eight video sources with different spatial, temporal and exploration properties. Other influence factors were also studied, such as the influence of the sequence duration (from 10 to 30 s) and the test setup (considering different HMDs and methods to collect the observers' ratings, using audio or not, etc.). Finally, in addition to the evaluation of audiovisual quality, the assessment of simulator sickness symptoms was addressed by studying the use of different questionnaires. As a result of this work, the IMG of VQEG presented two contributions to the recommendation ITU-T Rec. P.919 (ex P.360-VR), which was consented at the last SG12 meeting (7-11 September 2020) and is envisioned to be published soon. In addition, the results and the annotated dataset coming from the cross-lab test will be published soon.

ITU-T Rec. P.913

Another upcoming contribution is being prepared by the Statistical Analysis Group (SAM). The main goal of the proposal is to increase the precision of subjective experiment analysis by describing a subjective answer as a random variable. The random variable is described by three key influencing factors: the sequence quality, a subject bias, and a subject precision. It is a further development of the ITU-T P.913 [10] recommendation, in which the subject bias was introduced. Adding subject precision brings two benefits: better handling of unreliable subjects and an easier estimation procedure.

Current standards describe a way to remove an unreliable subject. The problem is that the methods proposed in BT.500 [8] and P.913 [10] are different and point to different subjects. Also, both methods have some arbitrary parameters (e.g., thresholds) deciding when a subject should be removed. This means that two subjects can be similarly imprecise, but one is over the threshold and all of their answers are accepted as correct, while the other is under the threshold and all of their answers are removed. The proposed method instead weights the impact of each subject's answers depending on the subject's precision. As a consequence, each subject's answers are partially kept and partially discounted, and the balance between how much information is kept and how much is removed depends on the subject's precision.

The estimation procedure of the proposed model, as described in the literature, is MLE (Maximum Likelihood Estimation). Such an estimation is computationally costly and needs a careful setup to obtain a reliable solution. Therefore, an Alternating Projection (AP) solver was proposed, which is less general than MLE but works equally well for the subject model estimation. This solver is called "alternating projection" because, in a loop, we alternate between projecting (or averaging) the opinion scores along the subject dimension and the stimulus dimension. It increases the precision of the obtained model parameters step by step, weighting more strongly the information coming from the more precise subjects. More details can be found in the white paper [11].
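The sketch below is a simplified, illustrative alternating-projection fit of the model described above (score = stimulus quality + subject bias + subject inconsistency times noise); it is not a verbatim implementation of the estimator in [11], and the synthetic data, function name, and iteration count are our own assumptions.

```python
import numpy as np

def fit_subject_model(scores, n_iter=50):
    """Alternating-projection estimate of per-stimulus quality (psi), subject
    bias (delta) and subject inconsistency (nu) from a subjects x stimuli
    score matrix; more precise subjects get a larger weight."""
    psi = scores.mean(axis=0)                                  # init: plain MOS per stimulus
    for _ in range(n_iter):
        residual = scores - psi                                # remove current quality estimate
        delta = residual.mean(axis=1)                          # subject bias
        nu = np.maximum((residual - delta[:, None]).std(axis=1, ddof=1), 1e-3)  # inconsistency
        weights = 1.0 / nu ** 2                                # precise subjects weigh more
        psi = ((scores - delta[:, None]) * weights[:, None]).sum(axis=0) / weights.sum()
    return psi, delta, nu

# Synthetic example: 20 subjects rating 12 stimuli on a 5-point scale.
rng = np.random.default_rng(0)
true_psi = rng.uniform(1.5, 4.5, 12)
scores = np.clip(true_psi + rng.normal(0.0, 0.5, (20, 12)), 1, 5)
psi, delta, nu = fit_subject_model(scores)
print(np.round(psi - true_psi, 2))                             # recovered quality vs. ground truth
```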

Other updates 

A new VQEG group has been recently established related to Quality Assessment for Health Applications (QAH), with the motivation to study visual quality requirements for medical imaging and telemedicine. The main goals of this new group are:

  • Assemble all the existing publicly accessible databases on medical quality.
  • Develop databases with new diagnostic tasks and new objective quality assessment models.
  • Provide methodologies, recommendations and guidelines for subjective test of medical image quality assessment.
  • Study the quality requirements and Quality of Experience in the context of telemedicine and other telehealth services.

For any further questions or expressions of interest to join this group, please contact QAH Chair Lu Zhang (lu.ge@insa-rennes.fr), Vice Chair Meriem Outtas (Meriem.Outtas@insa-rennes.fr), and Vice Chair Hantao Liu (hantao.liu@cs.cardiff.ac.uk).

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204” , IEEE Access, 2020 (Available online soon).   
[2] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. Geneva, Switzerland: ITU, 2020.
[3] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. Geneva, Switzerland: ITU, 2020.
[4] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. Geneva, Switzerland: ITU, 2020.
[5] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. Geneva, Switzerland: ITU, 2020.
[6] ITU-T Rec. P.1401. Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. Geneva, Switzerland: ITU, 2020.
[7] K. Brunnström and M. Barkowsky, “Statistical quality of experience analysis: on planning the sample size and statistical significance testing”, Journal of Electronic Imaging, vol. 27, no. 5,  p. 11, Sep. 2018 (DOI: 10.1117/1.JEI.27.5.053013).
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland: ITU, 2019.
[9]  ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. Geneva, Switzerland: ITU, 2008.
[10] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment. Geneva, Switzerland: ITU, 2016.
[11] Z. Li, C. G. Bampis, L. Janowski, I. Katsavounidis, “A simple model for subject behavior in subjective experiments”, arXiv:2004.02067, Apr. 2020.

JPEG Column: 88th JPEG Meeting

The 88th JPEG meeting, initially planned to be held in Geneva, Switzerland, was held online because of the COVID-19 outbreak.

JPEG experts organised a large number of sessions spread over day and night to allow remote participation across multiple time zones. This very intense activity resulted in multiple outputs and initiatives. In particular, two new exploration activities were initiated. The first explores possible standardisation needs to address the growing emergence of fake media by introducing appropriate security features to prevent the misuse of media content. The second considers the use of DNA for media content archival.

Furthermore, JPEG has started work on the new Part 8 of the JPEG Systems standard, called JPEG Snack, for interoperable rich image experiences, and is conducting two Calls for Evidence, on JPEG AI and on JPEG Pleno Point Cloud coding.

Despite travel restrictions, the JPEG Committee has managed to keep up with the majority of its plans defined prior to the COVID-19 outbreak. An overview of the different activities is presented in Fig. 1.

Figure 1 – JPEG Planned Timeline.

The 88th JPEG meeting had the following highlights:

  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud call for evidence
  • JPEG DNA – archival of media content using DNA
  • JPEG AI call for evidence
  • JPEG XL standard evolves to a final specification
  • Progress on JPEG Systems Part 8, named JPEG Snack
  • JPEG XS Part-1 2nd Edition first ballot.

JPEG explores standardization needs to address fake media

Recent advances in media manipulation, particularly deep-learning-based approaches, can produce near-realistic media content that is almost indistinguishable from authentic content to the human eye. These developments open opportunities for producing new types of media content that are useful for the entertainment industry and other businesses, e.g., the creation of special effects or the production of artificial natural scenes with actors in the studio. However, they also lead to issues such as fake media generation undermining the integrity of the media (e.g., deepfakes), copyright infringements and defamation, to mention a few examples. Misuse of manipulated media can cause social unrest, spread rumours for political gain or encourage hate crimes. In this context, the term ‘fake’ is used here to refer to any manipulated media, independently of its ‘good’ or ‘bad’ intention.

In many application domains, fake media producers may want or may be required to declare the type of manipulations performed, as opposed to other situations where the intention is to ‘hide’ the mere existence of such manipulations. This is already leading various governmental organizations to plan new legislation, and companies (especially social media platforms and news outlets) to develop mechanisms that clearly detect and annotate manipulated media content when it is shared. While growing efforts to develop such technologies are noticeable, there is a need for a standard media/metadata format, e.g., a JPEG standard that facilitates secure and reliable annotation of fake media, in both good-faith and malicious usage scenarios. To better understand the fake media ecosystem and the needs in terms of standardization, the JPEG Committee has initiated an in-depth analysis of fake media use cases, naturally independent of the “intentions”.

More information on the initiative is available on the JPEG website. Interested parties are invited to join the corresponding Ad hoc Group (AHG) through the following URL: http://listregistration.jpeg.org.

JPEG Pleno Point Cloud

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 88th JPEG meeting, the JPEG Committee released a Final Call for Evidence on JPEG Pleno Point Cloud Coding that focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between the 88th and 89th meetings, the JPEG Committee will be actively promoting this activity and collecting registrations to participate in the Call for Evidence.

JPEG DNA

In digital media, notably images, the relevant representation symbols, e.g., quantized DCT coefficients, are expressed in bits (i.e., binary units), but they could be expressed in any other units, for example DNA units, which follow a 4-ary representation basis. This means that DNA molecules could be synthesized with a specific configuration of DNA units that stores media representation symbols, e.g., the symbols of a JPEG image, thus leading to DNA-based media storage as a form of molecular data storage. JPEG standards have long been used for the storage and archival of digital pictures as well as moving images. While the legacy JPEG format is widely used for photo storage on SD cards and for archival of pictures by consumers, JPEG 2000, as described in ISO/IEC 15444, is used in many archival applications, notably for the preservation of cultural heritage in the form of pictures and video in digital format. This puts the JPEG Committee in a unique position to address the challenges of DNA-based storage by creating a standard image representation and coding for such applications. To explore this direction, an AHG has been established (a toy illustration of the 4-ary mapping idea is shown below). Interested parties are invited to join the AHG through the following URL: http://listregistration.jpeg.org.
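
As a toy illustration of the 4-ary representation idea (not part of any JPEG specification), the sketch below maps a byte stream to nucleotides at two bits per base and back. Practical DNA storage additionally has to respect biochemical constraints (e.g., homopolymer runs, GC content) and add error correction, all of which are omitted here.

```python
# Hypothetical illustration: two bits per nucleotide, most significant pair first.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {b: v for v, b in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides."""
    return "".join(
        BITS_TO_BASE[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def dna_to_bytes(seq: str) -> bytes:
    """Inverse mapping: four nucleotides back to one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

# Example: the two bytes of the JPEG SOI marker as a DNA string.
print(bytes_to_dna(b"\xff\xd8"))  # -> "TTTTTCGA"
```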

JPEG AI

At the 88th meeting, the submissions to the Call for Evidence were reported and analysed. Six submissions were received in response to the Call for Evidence made in coordination with the IEEE MMSP 2020 Challenge. The submissions along with the anchors were already evaluated using objective quality metrics. Following this initial process, subjective experiments have been designed to compare the performance of all submissions. Thus, during this meeting, the main focus of JPEG AI was on the presentation and discussion of the objective performance evaluation of all submissions as well as the definition of the methodology for the subjective evaluation that will be made next.

JPEG XL

The standardization of the JPEG XL image coding system is nearing completion. Final technical comments by national bodies have been received for the codestream (Part 1); the DIS has been approved and an FDIS text is under preparation. The container file format (Part 2) is progressing to the DIS stage. A white paper summarizing key features of JPEG XL is available at http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf.

JPEG Systems

ISO/IEC has approved the JPEG Snack initiative to deliver interoperable rich image experiences. As a result, JPEG Systems Part 8 (ISO/IEC 19566-8) has been created to define the file format construction and the metadata signalling and descriptions that enable animation with transition effects. A Call for Participation and updated use cases and requirements have been issued. The use cases and requirements document and the CfP are available at http://ds.jpeg.org/documents/wg1n87035-REQ-JPEG_Snack_Use_Cases_and_Requirements_v2_2.pdf and http://ds.jpeg.org/documents/wg1n88032-SI-CfP_JPEG_Snack.pdf, respectively.

An updated working draft for the JLINK initiative was completed. Interested parties are encouraged to review the JLINK Working Draft 3.0, available at http://ds.jpeg.org/documents/wg1n88031-SI-JLINK_WD_3_0.pdf.

JPEG XS

The JPEG committee is pleased to announce a significant step in the standardization of an efficient Bayer image compression scheme, with the first ballot of the 2nd Edition of JPEG XS Part-1.

The new edition of this visually lossless, low-latency and lightweight compression scheme now includes image sensor coding tools allowing efficient compression of Color Filter Array (CFA) data (a small sketch of how such data is organized is shown below). Compressing in the CFA domain enables better quality and lower complexity than the corresponding compression in the RGB domain. It can be used as a mezzanine codec in various markets, such as real-time video storage in and outside of cameras, and data compression onboard autonomous cars.
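
As a minimal sketch of what CFA data looks like, the snippet below splits a raw Bayer mosaic, assuming the common RGGB layout, into its four colour planes. It only illustrates the kind of data the new coding tools operate on, not the JPEG XS coding tools themselves.

```python
import numpy as np

def split_bayer_rggb(mosaic: np.ndarray):
    """Split a raw Bayer mosaic (assumed RGGB layout, a common but not the only
    CFA arrangement) into its four colour planes, prior to any demosaicing."""
    r  = mosaic[0::2, 0::2]   # red samples
    gr = mosaic[0::2, 1::2]   # green samples on red rows
    gb = mosaic[1::2, 0::2]   # green samples on blue rows
    b  = mosaic[1::2, 1::2]   # blue samples
    return r, gr, gb, b

# Example: a small 4x4 sensor readout.
raw = np.arange(16, dtype=np.uint16).reshape(4, 4)
planes = split_bayer_rggb(raw)
print([p.shape for p in planes])  # -> [(2, 2), (2, 2), (2, 2), (2, 2)]
```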

Final Quote

“Fake Media has become a challenge with the wide-spread manipulated contents in the news. JPEG is determined to mitigate this problem by providing standards that can securely identify manipulated contents.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 89 will be held online from October 5 to 9, 2020.