The 143rd MPEG meeting took place in person in Geneva, Switzerland. The official press release can be accessed here and includes the following details:
MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
MPEG reaches the First Milestone for two ISOBMFF Enhancements
MPEG ratifies Third Editions of VVC and VSEI
MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression
We have adjusted the press release to suit the audience of ACM SIGMM and emphasized research on video technologies. This edition of the MPEG column centers around ISOBMFF and video codecs. As always, the column will conclude with an update on MPEG-DASH.
ISOBMFF Enhancements
The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.
ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.
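To make the notion of "files that comply with the ISOBMFF family of standards" a little more concrete, the following is a minimal sketch of how the generic box structure of such a file can be walked. It relies only on the basic ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 announcing a 64-bit size and size 0 meaning "until the end of the data"). The helper names `iter_boxes` and `make_box` and the tiny in-memory sample are assumptions made for illustration; nothing here is specific to ISO/IEC 23001-17.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box: stop rather than guess
        yield box_type.decode("latin-1"), offset + header, size - header
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Tiny synthetic sample: a file-type box followed by a media-data box.
sample = make_box(b"ftyp", b"isom" + b"\x00\x00\x00\x00") + make_box(b"mdat", b"\x10\x20\x30")
boxes = list(iter_boxes(sample))
print(boxes)
```

A real reader would additionally descend into container boxes (e.g., the movie box) to locate sample descriptions and timing information; the sketch stops at the top level on purpose.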
ISO/IEC 14496-15 (based on ISOBMFF) provides the basis for “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.
Research aspects: While the former, the carriage of uncompressed video and images in ISOBMFF, seems an obvious feature for a file format to support, the latter enables the use of neural network-based post-processing filters to enhance video quality after the decoding process, which is an active field of research. The current extensions to the file format provide a baseline for the evaluation (cf. also next section).
Video Codec Enhancements
MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).
The SEI messages added by these editions and amendments include two systems-related SEI messages, (a) one for signaling of green metadata as specified in ISO/IEC 23001-11 and (b) the other for signaling of an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics SEI message and the neural-network post-filter activation SEI message have been added to AVC, HEVC, and VVC.
The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).
Research aspects: SEI messages for neural network post-filters (NNPF) for AVC, HEVC, and VVC, including systems support within the ISOBMFF, are a powerful tool(box) for interoperable visual quality enhancements at the client. This tool(box) will (i) allow for Quality of Experience (QoE) assessments and (ii) enable the analysis thereof across codecs once integrated within the corresponding reference software.
MPEG-DASH Updates
The current status of MPEG-DASH is depicted in the figure below:
The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:
ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point; at this meeting, the Draft Amendment (DAM) was approved. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.
Additionally, MPEG Technologies under Consideration (TuC) comprises a few new work items, such as content selection and adaptation logic based on device orientation and signalling of haptics data within DASH.
Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming meetings.
Research aspects: Random access has been extensively evaluated in the context of video coding but not (low latency) streaming. Additionally, the TuC item related to content selection and adaptation logic based on device orientation raises QoE issues to be further explored.
The 144th MPEG meeting will be held in Hannover from October 16-20, 2023. Click here for more information about MPEG meetings and their developments.
JPEG Trust on a mission to re-establish trust in digital media
The 99th JPEG meeting was held online, from 24th to 28th April 2023.
Providing tools suitable for establishing provenance, authenticity and ownership of multimedia content is one of the most difficult challenges faced nowadays, considering the technological models that allow effective multimedia data manipulation and generation. As in the past, the JPEG Committee is again answering the emerging challenges in multimedia. JPEG Trust is a standard offering solutions to media authenticity, provenance and ownership.
Furthermore, learning-based coding standards, JPEG AI and JPEG Pleno Learning-based Point Cloud Coding, continue their development. New verification models that incorporate the technological developments resulting from verification experiments and contributions have been approved.
Also relevant, the Calls for Contributions on standardization of quality models of JPEG AIC and JPEG Pleno Light Field Quality Assessment received responses, which started a collaborative process to define new standards.
The 99th JPEG meeting had the following highlights:
Trust, Authenticity and Provenance.
New JPEG Trust international standard targets media authenticity
JPEG AI new verification model
JPEG DNA releases its call for proposals
JPEG Pleno Light Field Quality Assessment analyses the response to the call for contributions
JPEG AIC analyses the response to the call for contributions
JPEG XE identifies use cases and requirements for event based vision
JPEG Systems: JUMBF second edition is progressing to publication stage
JPEG NFT prepares a call for proposals
JPEG XS progress for its third edition
The following summarizes the major achievements during the 99th JPEG meeting.
New JPEG Trust international standard targets media authenticity
Drawing reliable conclusions about the authenticity of digital media is complicated, and becoming more so as AI-based synthetic media such as Deep Fakes and Generative Adversarial Networks (GANs) start appearing. Consumers of social media are challenged to assess the trustworthiness of the media they encounter, and agencies that depend on the authenticity of media assets must be concerned with mistaking fake media for real, with risks of real-world consequences.
To address this problem and to provide leadership in global interoperable media asset authenticity, JPEG initiated development of a new international standard: JPEG Trust. JPEG Trust defines a framework for establishing trust in media. This framework addresses aspects of authenticity, provenance and integrity through secure and reliable annotation of media assets throughout their life cycle. The first part, “Core foundation”, defines the JPEG Trust framework and provides building blocks for more elaborate use cases. It is expected that the standard will evolve over time and be extended with additional specifications.
JPEG Trust arises from a four-year exploration of requirements for addressing mis- and dis-information in online media, followed by a 2022 Call for Proposals, conducted by international experts from industry and academia from all over the world.
The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website for the latest information.
JPEG AI
The JPEG AI activity progressed at this meeting with more than 60 technical contributions submitted for improvements and additions to the Verification Model (VM), which, after some discussion and analysis, resulted in several adoptions for integration into the future VM3.0. These adoptions target the speed-up of the decoding process, namely the replacement of the range coder by an asymmetric numeral system, support for multi-threading and/or single instruction multiple data operations, and parallel decoding with sub-streams. The JPEG AI context module was significantly accelerated with a new network architecture along with other synthesis transform and entropy decoding network simplifications. Moreover, a lightweight model was also adopted targeting mobile devices, providing 10%-15% compression efficiency gains over VVC Intra at just 20-30 kMAC/pxl. In this context, JPEG AI will start the development and evaluation of two JPEG AI VM configurations at two different operating points: lightweight and high.
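For readers unfamiliar with asymmetric numeral systems (ANS), the following toy sketch shows the core idea of the range variant (rANS): the whole message is folded into a single integer state, and decoding peels the symbols off again. It uses Python's arbitrary-precision integers and therefore needs no renormalization, which real codecs require. The frequency table, function names, and message are assumptions for illustration only; this is not the entropy coder of the JPEG AI VM.

```python
from typing import Dict, List, Tuple


def build_tables(freqs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Return the cumulative start position of each symbol and the total frequency."""
    starts: Dict[str, int] = {}
    total = 0
    for sym in sorted(freqs):
        starts[sym] = total
        total += freqs[sym]
    return starts, total


def rans_encode(symbols: List[str], freqs: Dict[str, int]) -> int:
    """Fold the symbols into one integer state (toy rANS, no renormalization)."""
    starts, total = build_tables(freqs)
    state = 0
    # Encode in reverse so that decoding yields the symbols in forward order.
    for sym in reversed(symbols):
        f = freqs[sym]
        state = (state // f) * total + starts[sym] + (state % f)
    return state


def rans_decode(state: int, count: int, freqs: Dict[str, int]) -> List[str]:
    """Recover `count` symbols from the integer state."""
    starts, total = build_tables(freqs)
    out: List[str] = []
    for _ in range(count):
        slot = state % total
        sym = next(s for s in freqs if starts[s] <= slot < starts[s] + freqs[s])
        state = freqs[sym] * (state // total) + slot - starts[sym]
        out.append(sym)
    return out


freqs = {"a": 5, "b": 2, "c": 1}  # assumed symbol statistics
message = list("abacabaa")
code = rans_encode(message, freqs)
decoded = rans_decode(code, len(message), freqs)
print(code.bit_length(), "bits ->", "".join(decoded))
```

Because frequent symbols grow the state more slowly than rare ones, the final integer is shorter when the message follows the assumed statistics, and both encoding and decoding reduce to a handful of integer operations per symbol.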
At the 99th meeting, the JPEG AI requirements were reviewed and it was concluded that most of the key requirements will be achieved by the previously anticipated timeline for DIS (scheduled for Oct. 2023) and thus version 1 of the JPEG AI standard will go as planned without changes in its timeline and with a clear focus on image reconstruction. Some core requirements, such as those addressing computer vision and image processing tasks as well as progressive decoding, will be addressed in a version 2 along with other tools that further improve requirements already addressed in version 1, such as better compression efficiency.
JPEG Pleno Learning-based Point Cloud coding
The JPEG Pleno Point Cloud activity progressed at this meeting with a major improvement to its VM providing improved performance and control over the balance between the coding of geometry and colour via a split geometry and colour coding framework. Colour attribute information is encoded using JPEG AI resulting in enhanced performance and compatibility with the ecosystem of emerging high-performance JPEG codecs. Prior to the 100th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the areas of attention models, sparse tensor convolution, and support for residual lossless coding.
JPEG DNA
The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 99th JPEG meeting, a final call for proposals for JPEG DNA was issued and made public, as a first concrete step towards standardization.
The final call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future proposals to be submitted. A set of exploration studies has validated the procedures outlined in the final call for proposals for JPEG DNA. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023, with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.
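As a rough illustration of what a quaternary representation means, the sketch below maps every pair of bits to one of the four nucleotides and back, and measures the longest run of identical nucleotides, a commonly cited example of the kind of biochemical constraint a practical code has to respect. The 2-bit mapping, the function names, and the sample bytes are assumptions; this naive mapping is not a JPEG DNA codec and makes no attempt to satisfy such constraints or to add robustness to noise.

```python
from itertools import groupby

NUCLEOTIDES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(base)
        out.append(value)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical nucleotides."""
    return max((len(list(run)) for _, run in groupby(strand)), default=0)


strand = bytes_to_dna(b"\x1b\xff")
print(strand, longest_homopolymer(strand))
```

The second sample byte produces a run of identical nucleotides, which hints at why a constraint-aware code, rather than a direct bit mapping, is the subject of the call.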
JPEG Pleno Light Field Quality Assessment
At the 99th JPEG meeting two contributions were received in response to the JPEG Pleno Final Call for Contributions (CfC) on Subjective Light Field Quality Assessment.
Contribution 1: presents a 3-step subjective quality assessment framework, with a pre-processing step; a scoring step; and a data processing step. The contribution includes a software implementation of the quality assessment framework.
Contribution 2: presents a multi-view light field dataset, comprising synthetic light fields. It provides RGB + ground-truth depth data, realistic and challenging blender scenes, with various textures, fine structures, rich depth, specularities, non-Lambertian areas, and difficult materials (water, patterns, etc).
The received contributions will be considered in the development of a modular framework based on a collaborative process addressing the use cases and requirements under the JPEG Pleno Quality Assessment of light fields standardization effort.
JPEG AIC
Three contributions in response to the JPEG Call for Contributions (CfC) on Subjective Image Quality Assessment were received at the 99th JPEG meeting. One contribution presented a new subjective quality assessment methodology that combines relative and absolute data. The second contribution reported a new subjective quality assessment methodology based on triplet comparison with boosting techniques. Finally, the last contribution reported a new pairwise sampling methodology.
These contributions will be considered in the development of the standard, following a collaborative process. Several core experiments were designed to assist the creation of a Working Draft (WD) for the future JPEG AIC Part 3 standard.
JPEG XE
The JPEG committee continued with the exploration activity on Event-based Vision, called JPEG XE. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. At this meeting, the scope was defined to be the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The exploration activity is currently working on the definition of the use cases and requirements.
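The definition of an event given above can be pictured as a small record. The sketch below is purely illustrative: the field names, the microsecond time unit, and the polarity convention are assumptions commonly seen in event-camera work, not anything defined by JPEG XE, whose use cases and requirements are still being collected.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """One message signalling a detected change at a precise point in time."""
    timestamp_us: int  # time of the observation in microseconds (assumed unit)
    x: int             # sensor column
    y: int             # sensor row
    polarity: int      # +1 brightness increase, -1 decrease (assumed convention)


def events_in_window(events: Iterable[Event], start_us: int, end_us: int) -> List[Event]:
    """Return the events with start_us <= timestamp < end_us, ordered by time."""
    return sorted(
        (e for e in events if start_us <= e.timestamp_us < end_us),
        key=lambda e: e.timestamp_us,
    )


stream = [Event(30, 5, 7, +1), Event(10, 5, 7, -1), Event(55, 2, 1, +1)]
window = events_in_window(stream, 0, 50)
print(window)
```

Unlike frames, such a stream is sparse and asynchronous, which is why an efficient, interoperable representation is a topic in its own right.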
JPEG XL
The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the DIS stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Experiments are planned to prepare for a second edition of JPEG XL Part 3 (Conformance testing), including conformance testing of the independent implementations J40, jxlatte, and jxl-oxide.
JPEG Systems
The second edition of JUMBF (JPEG Universal Metadata Box Format, ISO/IEC 19566-5) is progressing to the IS publication stage; the second edition brings new capabilities and support for additional types of media.
JPEG NFT
Many Non-Fungible Tokens (NFTs) point to assets represented in JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. However, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee conducted an exploration on NFTs. The scope of JPEG NFT is the creation of effective specifications that support a wide range of applications relying on NFTs applied to media assets. The standard will be secure, trustworthy and eco-friendly, allowing for an interoperable ecosystem relying on NFT within a single application or across applications. As a result of the exploration, at the 99th JPEG Meeting the committee released a “Draft Call for Proposals on JPEG NFT” and associated updated “Use Cases and Requirements for JPEG NFT”. Both documents are made publicly available for review and feedback.
JPEG XS
The JPEG committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. For Part 1 – Core coding tools – the Draft International Standard will proceed to ISO/IEC ballot. This is a significant step in the standardization process with all the core coding technology now final. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. Furthermore, Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – will proceed to Committee Draft consultation. Part 2 is important as it defines the conformance points for JPEG XS compliance. Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.
Final Quote
“The creation of standardized tools to bring assurance of authenticity, provenance and ownership for multimedia content is the most efficient path to suppress the abusive use of fake media. JPEG Trust will be the first international standard that provides such tools.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Future JPEG meetings are planned as follows:
No. 100 will be in Covilhã, Portugal, from 17 to 21 July 2023
No. 101 will be online from 30 October to 3 November 2023
A zip package containing the official JPEG logo and logos of all JPEG standards can be downloaded here.
This column showcases a series of video interviews shot at ACM Multimedia 2022. The social media editors in chief of the Records (i.e., Silvia Rossi and Conor Keighrey) interviewed the authors behind some of the most intriguing and compelling demos and interactive artworks. Silvia and Conor started this initiative and will continue it, when possible, at conferences supported by SIGMM.
ACM Multimedia is the premier international conference in the area of multimedia within the field of computer science. As in every edition of ACM MM, the conference once again played host to riveting demonstrations and interactive showcases of the latest research concepts. These sessions serve a dual purpose: they stand as a testament to the presenters’ invaluable scientific and engineering contributions while also providing a unique opportunity for multimedia researchers and practitioners to delve into real-world applications, prototypes, and proofs-of-concept.
This dynamic setting is where conference attendees come face-to-face with groundbreaking multimedia systems. It’s a chance for them to gain insights into the innovative solutions and ideas that are actively shaping the future of this ever-evolving field. From visionary demonstrations of emerging technologies to interactive showcases that push the boundaries of creativity, these sessions are at the heart of what makes ACM MM a unique event in the world of multimedia.
Below is the list of video interviews with references to the corresponding authors and papers.
Varvara Guljajeva and Mar Canet Sola. 2022. Dream Painter: An Interactive Art Installation Bridging Audience Interaction, Robotics, and Creative AI. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7235–7236. https://doi.org/10.1145/3503161.3549976
Jorge Forero, Gilberto Bernardes, and Mónica Mendes. 2022. Emotional Machines: Toward Affective Virtual Environments. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7237–7238. https://doi.org/10.1145/3503161.3549973
Ignacio Reimat, Yanni Mei, Evangelos Alexiou, Jack Jansen, Jie Li, Shishir Subramanyam, Irene Viola, Johan Oomen, and Pablo Cesar. 2022. Mediascape XR: A Cultural Heritage Experience in Social VR. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6955–6957. https://doi.org/10.1145/3503161.3547732
Manuel Silva, Luana Santos, Luís Teixeira, and José Vasco Carvalho. 2022. All is Noise: In Search of Enlightenment, a VR Experience. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7223–7224. https://doi.org/10.1145/3503161.3549958
Pin-Xuan Liu, Tse-Yu Pan, Hsin-Shih Lin, Hung-Kuo Chu, and Min-Chun Hu. 2022. BetterSight: Immersive Vision Training for Basketball Players. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6979–6981. https://doi.org/10.1145/3503161.3547745
Tiago Fornelos, Pedro Valente, Rafael Ferreira, Diogo Tavares, Diogo Silva, David Semedo, Joao Magalhaes, and Nuno Correia. 2022. A Conversational Shopping Assistant for Online Virtual Stores. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6994–6996. https://doi.org/10.1145/3503161.3547738
Ting-Yang Kao, Tse-Yu Pan, Chen-Ni Chen, Tsung-Hsun Tsai, Hung-Kuo Chu, and Min-Chun Hu. 2022. ScoreActuary: Hoop-Centric Trajectory-Aware Network for Fine-Grained Basketball Shot Analysis. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6991–6993. https://doi.org/10.1145/3503161.3547736
Maria Giovanna Donadio, Filippo Principi, Andrea Ferracani, Marco Bertini, and Alberto Del Bimbo. 2022. Engaging Museum Visitors with Gamification of Body and Facial Expressions. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7000–7002. https://doi.org/10.1145/3503161.3547744
The 14th ACM Multimedia Systems Conference (with the associated workshops: NOSSDAV 2023, MMVE 2023, and the first edition of GMSys 2023) took place from 7th – 10th June 2023 in Vancouver, Canada. The MMSys conference brings together researchers in multimedia systems to showcase and exchange their cutting-edge research findings. Once again, there were technical talks spanning various multimedia domains and inspiring keynote presentations. Participants also had the opportunity to further interact with colleagues while enjoying the sunset with a 360° view of Vancouver from the Lookout tower or during a dinner in the heart of the rainforest. Additionally, this year’s event included a special session dedicated to the memory of Dr. Kuan-Ta Chen, to honor his invaluable contributions to the multimedia community and to inspire the future generation of researchers.
To encourage junior researchers to participate on-site, SIGMM has sponsored a group of students with Student Travel Grant Awards. For many of them, this was their first time presenting at an international conference, and it was a wonderful experience. In this article, the recipients of the travel grants share their experiences at MMSys 2023.
Mike Vandersanden, PhD student from Hasselt University, Belgium
As a new PhD student starting my professional academic career less than a year before MMSys ’23, I focused on finding my place in the academic community. My advisor and colleagues encouraged me to achieve two goals: get feedback on my research and build a network. I submitted my first paper and it was accepted for the Doctoral Symposium at the conference, which was a great opportunity to work towards achieving my goals. Presenting my paper allowed me to receive helpful feedback, have interesting discussions, and gain new perspectives. It was motivating to see people interested in my work. During the rest of the conference, I connected with many attendees from different parts of the world. The social events were a great way to meet others, and we also had enjoyable evenings downtown. Upon returning home, I was happy to report to my advisor that I had accomplished all my goals for the first year. I am grateful for receiving a student travel grant, as it made it easier to travel to another continent. It also gave me the freedom to manage my budget and increased my chances of attending the conference again next year.
May Lim, PhD student from National University of Singapore, Singapore
I was both excited and nervous for MMSys 2023 as it was not only my first in-person conference but also in a country I had never visited before. The conference turned out to be one of the most unforgettable and pleasant experiences I have ever had. It was well-organized with very insightful presentations, many opportunities to interact and exchange contacts with fellow researchers, and, not to forget, the organizers’ thoughtful efforts to ensure the comfort and welfare of the participants. Vancouver’s weather and people were very kind as well.
I am thankful for the travel grant, and the support from ACM SIGMM was truly heartening. I hope to continue to be part of this community and pay it forward in other ways.
Tiago Soares da Costa, PhD student from FEUP, Portugal
MMSys 2023 marked my return to in-person conferences and was one of the most well-organized conferences I have had the pleasure to participate in. After several virtual conferences, being able to actively meet and discuss interesting topics related to multimedia with fellow researchers was a breath of fresh air. The keynote presentation from Klara Nahrstedt was one of my highlights from MMSys 2023, due to its extensive focus on multi-view streaming, one of the main topics of my PhD research. Ihab Amer was another welcome surprise, presenting us with the current trends in AI encoding from one of the leading tech enterprises, AMD. Regarding paper presentations, I have to highlight the following works: 1) “The AD△ER Framework: Tools for Event Video Representations”, for providing us with a new approach to frameless videos; 2) “Remote Expert Assistance System for Mixed-HMD Clients over 5G Infrastructure”, for delivering an impressive tech demonstration; 3) “FleXR: A System Enabling Flexibly Distributed Extended Reality”, for presenting us with a distributed stream processing solution which can be effectively applied to XR-based environments. As for the social events from MMSys 2023, the sights and sounds from Grouse Mountain and the impressive view from the Vancouver Lookout Tower were among my favourite moments in Vancouver, which I will forever cherish. Overall, MMSys 2023 was an amazing conference and I’m particularly grateful to the SIGMM committee for providing me with the travel grant.
Yu-Szu Wei from National Tsing Hua University, Taiwan
It is a great honor for me to receive the student travel grant and I appreciate it so much. ACM MMSys 2023 was my first in-person experience attending an international conference, and it certainly was a fantastic experience for me. I met lots of astonishing researchers and volunteers who solve problems with different, creative, and novel approaches. I exchanged my ideas with them and learned a lot from them. The keynote sessions also gave me brand-new mindsets, showing me that there are lots of issues for us to investigate and deal with. The most impressive thing for me was to stand on the stage and present my work to those experts. I’m so proud of myself for delivering my research ideas in front of the public and gaining abundant feedback from the audience.
Thanks to the committee that organized this awesome event and provided me with the travel grant to attend the conference. I’m looking forward to attending ACM MMSys again in the future.
The 142nd MPEG meeting was held as a face-to-face meeting in Antalya, Türkiye, and the official press release can be found here and comprises the following items:
MPEG issues Call for Proposals for Feature Coding for Machines
MPEG finalizes the 9th Edition of MPEG-2 Systems
MPEG reaches the First Milestone for Storage and Delivery of Haptics Data
MPEG completes 2nd Edition of Neural Network Coding (NNC)
MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video
MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling
The press release text has been modified to match the target audience of ACM SIGMM and highlight research aspects targeting researchers in video technologies. This column focuses on the 9th edition of MPEG-2 Systems, storage and delivery of haptics data, neural network coding (NNC), MPEG immersive video (MIV), and updates on MPEG-DASH.
Feature Coding for Video Coding for Machines (FCVCM)
At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks. This work on “Feature Coding for Video Coding for Machines (FCVCM)” aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the neural networks increase in complexity, use cases such as computational offload become more relevant to facilitate the widespread deployment of applications utilizing such networks. Initially as part of the “Video Coding for Machines” activity, over the last four years, MPEG has investigated potential technologies for efficient compression of feature data encountered within neural networks. This activity has resulted in establishing a set of ‘feature anchors’ that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.
Research aspects: FCVCM is about compression, and the central research aspect here is compression efficiency which can be tested against a commonly agreed dataset (anchors). Additionally, it might be attractive to research which features are relevant for video coding for machines (VCM) and quality metrics in this emerging domain. One might wonder whether, in the future, robots or other AI systems will participate in subjective quality assessments.
9th Edition of MPEG-2 Systems
MPEG-2 Systems was first standardized in 1994, defining two container formats: program stream (e.g., used for DVDs) and transport stream. The latter, also known as MPEG-2 Transport Stream (M2TS), is used for broadcast and internet TV applications and services. MPEG-2 Systems was awarded a Technology and Engineering Emmy® in 2013, and at the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of ISO/IEC 13818-1 MPEG-2 Systems. The new edition includes support for Low Complexity Enhancement Video Coding (LCEVC), the youngest in the MPEG family of video coding standards, on top of more than 50 media stream types, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.
Research aspects: MPEG container formats such as MPEG-2 Systems and ISO Base Media File Format are necessary for storing and delivering multimedia content but are often neglected in research. Thus, I would like to take up the cudgels on behalf of the MPEG Systems working group and argue that researchers should pay more attention to these container formats and conduct research and experiments for their efficient use with respect to multimedia storage and delivery.
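For readers who have never looked inside a transport stream, the following minimal sketch parses the fixed four-byte header of a single 188-byte transport stream packet: the sync byte 0x47, the payload unit start indicator, the 13-bit packet identifier (PID), and the continuity counter. The function name, the dictionary keys, and the synthetic packet are assumptions for illustration; adaptation fields, program tables, and the stream types mentioned above are out of scope.

```python
from typing import Dict

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> Dict[str, int]:
    """Decode the fixed 4-byte header of one MPEG-2 transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet is 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": (packet[1] >> 7) & 0x1,
        "payload_unit_start": (packet[1] >> 6) & 0x1,
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x3,
        "continuity_counter": packet[3] & 0x0F,
    }


# Synthetic packet: PID 0x100, payload only, continuity counter 7, zero payload.
packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(packet)
print(header)
```

Grouping packets by PID and following the continuity counter is the usual first step of any transport stream analysis tool.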
Storage and Delivery of Haptics Data
At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering the nature of haptics data composed of spatial and temporal components, a data unit with various spatial or temporal data packets is used as a basic entity like an access unit of audio-visual media. Additionally, an explicit indication of a silent period considering the sparse nature of haptics data has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.
Research aspects: Coding (ISO/IEC 23090-31) and carriage (ISO/IEC 23090-32) of haptics data go hand in hand and need further investigation concerning compression efficiency and storage/delivery performance with respect to various use cases.
Neural Network Coding (NNC)
Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.
Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes in the neural network parameters but may also involve structural changes in the neural network (e.g. when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, such that much more bandwidth over time is required, e.g., in contrast to the initial deployment of trained neural networks.
The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and extending the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to at least 10-20% and, for several architectures, even below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.
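To illustrate why an incremental update can be much smaller than the base model, the following is a minimal sketch, not the NNC toolchain. It assumes a model is a flat list of weights, that only a few weights change between training iterations, and that the change is quantized with a uniform step size. The function names `quantize_update` and `apply_update`, the weights, and the step size are hypothetical and chosen only for illustration.

```python
def quantize_update(base, updated, step):
    """Return a sparse {index: integer level} map describing the weight delta.

    Only weights whose quantized change is non-zero are stored, which is the
    reason an update can be far smaller than the full model.
    """
    sparse = {}
    for index, (old, new) in enumerate(zip(base, updated)):
        level = round((new - old) / step)
        if level != 0:
            sparse[index] = level
    return sparse


def apply_update(base, sparse, step):
    """Reconstruct the updated weights from the base model and the sparse delta."""
    restored = list(base)
    for index, level in sparse.items():
        restored[index] += level * step
    return restored


# Hypothetical toy model: eight weights, one of which changes after fine-tuning.
base = [0.50, -0.25, 0.10, 0.00, 0.75, -0.60, 0.30, 0.05]
updated = [0.50, -0.25, 0.14, 0.00, 0.75, -0.60, 0.30, 0.05]
step = 0.01

sparse = quantize_update(base, updated, step)
restored = apply_update(base, sparse, step)
fraction_stored = len(sparse) / len(base)
```

In this toy case only one of eight entries has to be transmitted. The actual standard combines parameter reduction, quantization, and entropy coding, so the figures quoted above cannot be derived from this sketch.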
Research aspects: The incremental compression of neural networks enables various new use cases, which provides research opportunities for media coding and communication, including optimization thereof.
MPEG Immersive Video
At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.
MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-planar video (MPI) that leverages strong hardware support for commonly used video formats to compress volumetric video.
ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. This draft includes 23 verified and validated conformance bitstreams spanning all profiles and encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual.
Research aspects: Conformance and reference software are usually provided to facilitate product conformance testing, but they also provide researchers with a common platform and dataset, allowing for the reproducibility of their research efforts. Luckily, conformance and reference software are typically publicly available with an appropriate open-source license.
MPEG-DASH Updates
Finally, I’d like to provide a quick update regarding MPEG-DASH, which has gained a new part, namely redundant encoding and packaging for segmented live media (REAP; ISO/IEC 23009-9). The following figure provides the reference workflow for redundant encoding and packaging of live segmented media.
Reference workflow for redundant encoding and packaging of live segmented media.
The reference workflow comprises (i) Ingest Media Presentation Description (I-MPD), (ii) Distribution Media Presentation Description (D-MPD), and (iii) Storage Media Presentation Description (S-MPD), among others; each defining constraints on the MPD and tracks of ISO base media file format (ISOBMFF).
Additionally, the MPEG-DASH Break-out Group discussed various technologies under consideration, such as (a) combining HTTP GET requests, (b) signaling common media client data (CMCD) and common media server data (CMSD) in an MPEG-DASH MPD, (c) image and video overlays in DASH, and (d) updates on lower latency.
An updated overview of DASH standards/features can be found in the Figure below.
Research aspects: The REAP committee draft (CD) is publicly available, and feedback from academia and industry is appreciated. In particular, first performance evaluations and/or reports from proof-of-concept implementations/deployments would be insightful for the next steps in the standardization of REAP.
The 143rd MPEG meeting will be held in Geneva from July 17-21, 2023. Click here for more information about MPEG meetings and their developments.
The works addressed by this new group can be of interest for the SIGMM community since they are related to AI-based technologies for image and video processing, greening of streaming, blockchain in media and entertainment, and ongoing related standardization activities.
About ETG
The main objective of this group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The group, through its activities, aims to provide a common platform for people to gather together and discuss new emerging topics and ideas, discuss possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc. The topics addressed are not necessarily directly related to “video quality” but rather focus on any ongoing work in the field of multimedia which can indirectly impact the work addressed as part of VQEG.
Scope
During the creation of the group, the following topics were tentatively identified to be of possible interest to the members of this group and VQEG in general:
AI-based technologies:
Super Resolution
Learning-based video compression
Video coding for machines, etc.,
Enhancement, Denoising and other pre- and post-filter techniques
Greening of streaming and related trends
For example, trade-off between HDR and SDR to save energy and its impact on visual quality
Ongoing Standards Activities (which might impact the QoE of end users and hence will be relevant for VQEG)
3GPP, SVTA, CTA WAVE, UHDF, etc.
MPEG/JVET
Blockchain in Media and Entertainment
Since the creation of the group, four talks on various topics have been organized, an overview of which is summarized next.
Overview of the Presentations
We briefly provide a summary of various talks that have been organized by the group since its inception.
On the work by MPEG Systems Smart Contracts for Media Subgroup
The first presentation covered the recent work by MPEG Systems on Smart Contracts for Media [1] and was delivered by Dr Panos Kudumakis, Head of UK Delegation, ISO/IEC JTC1/SC29 & Chair of British Standards Institute (BSI) IST/37. In this talk, Dr Kudumakis highlighted MPEG's efforts over the last few years to develop several standardized ontologies that cater to the needs of the media industry with respect to the codification of Intellectual Property Rights (IPR) information toward the fair trade of media. However, since the inference and reasoning capabilities normally associated with ontology use cannot naturally be performed in distributed ledger technology (DLT) environments, there is a huge potential to unlock the Semantic Web and, in turn, the creative economy by bridging this interoperability gap [2]. In that direction, the ISO/IEC 21000-23 Smart Contracts for Media standard specifies the means (e.g., APIs) for converting MPEG IPR ontologies to smart contracts that can be executed on existing DLT environments [3]. The talk discussed the recent work done as part of this effort, as well as the ongoing efforts towards the design of a full-fledged ISO/IEC 23000-23 Decentralized Media Rights Application Format standard based on MPEG technologies (e.g., audio-visual codecs, file formats, streaming protocols, and smart contracts) and non-MPEG technologies (e.g., DLTs, content, and creator IDs). The recording of the presentation is available here, and the slides can be accessed here.
Introduction to NTIRE Workshop on Quality Assessment for Video Enhancement
The second presentation was given by Xiaohong Liu and Yuxuan Gao from Shanghai Jiao Tong University, China, about one of the CVPR challenge workshops, the NTIRE 2023 Quality Assessment of Video Enhancement Challenge. The presentation described the motivation for starting this challenge and its relevance to the video community in general. The presenters then described the dataset, including its creation process, the subjective tests used to obtain ratings, and the reasoning behind the split of the dataset into training, validation, and test sets. The results of this challenge are scheduled to be presented at the upcoming spring meeting at the end of June 2023. The presentation recording is available here.
Perception: The Next Milestone in Learned Image Compression
Johannes Balle from Google was the third presenter on the topic of “Perception: The Next Milestone in Learned Image Compression.” In the first part, Johannes discussed learned compression and described nonlinear transforms [4] and how they can achieve a higher image compression rate than linear transforms. Next, he emphasized the importance of perceptual metrics in comparison to distortion metrics by introducing the difference between perceptual quality and reconstruction quality [5]. He then presented HiFiC [6], an example of generative image compression in which a distortion metric and a perceptual metric (referred to as the realism criterion) are combined. Finally, the talk concluded with an introduction to perceptual spaces and an example of a perceptual metric, PIM [7]. The presentation slides can be found here.
Compression with Neural Fields
Emilien Dupont (DeepMind) was the fourth presenter. He started the talk with a short introduction on the emergence of neural compression that fits a signal, e.g., an image or video, to a neural network. He then discussed two recent works on neural compression that he was involved in, named COIN [8] and COIN++ [9]. He then gave a short overview of other implicit neural representations in the video domain, such as NeRV [10] and NIRVANA [11]. The slides for the presentation can be found here.
Upcoming Presentations
As part of the ongoing efforts of the group, the following talks/presentations are scheduled in the next two months. For an updated schedule and list of presentations, please check the ETG homepage here.
Sustainable/Green Video Streaming
Given the increasing carbon footprint of streaming services and the climate crisis, many new collaborative efforts have started recently, such as the Greening of Streaming alliance, the Ultra HD Sustainability forum, etc. In addition, recent research has started focusing on how to make video streaming greener and more sustainable. A talk providing an overview of recent work and progress in this direction is tentatively scheduled for around mid-May 2023.
Panel discussion at VQEG Spring Meeting (June 26-30, 2023), Sony Interactive Entertainment HQ, San Mateo, US
During the next face-to-face VQEG meeting in San Mateo there will be an interesting panel discussion on the topic of “Deep Learning in Video Quality and Compression.” The goal is to invite machine learning experts to VQEG and bring the two groups closer. ETG will organize the panel discussion, and the following four panellists are currently invited to join this event: Zhi Li (Netflix), Ioannis Katsavounidis (Meta), Richard Zhang (Adobe), and Mathias Wien (RWTH Aachen). Before this panel discussion, two talks are tentatively scheduled, the first one on video super-resolution and the second one focusing on learned image compression. The meeting will take place in hybrid mode, allowing for participation both in person and online. For further information about the meeting, please check the details here and, if interested, register for the meeting.
Joining and Other Logistics
While participation in the talks is open to everyone, to get notified about upcoming talks and participate in the discussion, please consider subscribing to the etg@vqeg.org email reflector and joining the Slack channel using this link. The meeting minutes are available here. We are always looking for new ideas to improve. If you have suggestions on topics we should focus on or recommendations for presenters, please reach out to the chairs (Nabajeet and Saman).
JPEG explores standardization in event-based imaging
The 98th JPEG meeting was held in Sydney, Australia, from the 16th to 20th January 2023. This was a welcome return to face-to-face meetings after a long period of online meetings due to the Covid-19 pandemic. Interestingly, the previous face-to-face meeting of the JPEG Committee was also held in Sydney, in January 2020. The face-to-face 98th JPEG meeting was complemented with online connections to allow the remote participation of those who could not be present.
The recent calls for proposals, such as JPEG Fake Media, JPEG AI and JPEG Pleno Learning Based Point Cloud Coding, resulted in a very dynamic and participative meeting in Sydney, with multiple technical sessions and decisions. Exploration activities such as JPEG DNA and JPEG NFT also produced drafts of future calls for proposals as a consequence of reaching sufficient maturity.
Furthermore, and considering the current trends in machine-based imaging applications, the JPEG Committee initiated an exploration on standardization in event-based imaging.
98th JPEG Meeting first plenary.
The 98th JPEG meeting had the following highlights:
New JPEG exploration in event-based imaging;
JPEG Fake Media and NFT;
JPEG AI;
JPEG Pleno Learning-based Point Cloud Coding improves its Verification Model;
JPEG AIC prepares the analysis of the responses to the Call for Contribution;
JPEG XL second editions;
JPEG Systems;
JPEG DNA prepares its call for proposals;
JPEG XS 3rd Edition;
JPEG 2000 guidelines.
The following summarizes the major achievements during the 98th JPEG meeting.
New JPEG exploration in event-based imaging
The JPEG Committee has started a new exploration activity on event-based imaging named JPEG XE.
Event-based imaging revolves around a new and emerging image modality created by event-based visual sensors. Event-based sensors are the foundation for a new class of cameras that allow the efficient capture of visual information at high speed while at the same time requiring low computational cost, a requirement that is common in many machine vision applications. Such sensors are modeled on the mechanisms of the human visual system for the detection of scene changes and the asynchronous capture of those changes. This means that every pixel works individually to detect scene changes and creates the associated events. If nothing happens, then no events are generated. This contrasts with conventional image sensors, where pixels are sampled in a continuous and periodic manner, with images generated regardless of any changes in the scene and a risk of reacting with delay and even missing quick changes.
The JPEG Committee recognizes that this new image modality opens doors to a large number of applications where capture and processing of visual information is needed. Currently, there is no standard format to represent event-based information, and therefore existing and emerging applications are fragmented and lack interoperability. The new JPEG XE activity focuses on establishing a scope and relevant definitions, collecting use cases and their associated requirements, and investigating the role that JPEG can play in the definition of timely standards in the near- and long-term. To start, an Ad-hoc Group has been established. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.
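The per-pixel behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions: real event sensors operate asynchronously in hardware, whereas this sketch compares two sampled frames, and the log-intensity threshold, the `(x, y, timestamp, polarity)` tuple layout, and the function name `generate_events` are hypothetical choices, not part of any JPEG XE definition.

```python
import math


def generate_events(prev_frame, next_frame, timestamp, threshold=0.2):
    """Emit (x, y, timestamp, polarity) for each pixel whose brightness changed.

    A pixel fires only if its log intensity changed by at least `threshold`;
    polarity is +1 for brighter and -1 for darker. Unchanged pixels emit nothing.
    """
    events = []
    for y, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for x, (before, after) in enumerate(zip(prev_row, next_row)):
            change = math.log(after + 1.0) - math.log(before + 1.0)
            if abs(change) >= threshold:
                events.append((x, y, timestamp, 1 if change > 0 else -1))
    return events


# Hypothetical 3x2 scene: nothing changes at first, then two pixels change.
static_scene = [[10, 10, 10], [10, 10, 10]]
changed_scene = [[10, 40, 10], [10, 10, 2]]

no_events = generate_events(static_scene, static_scene, timestamp=0)
some_events = generate_events(static_scene, changed_scene, timestamp=1)
```

A static scene produces an empty event list, while the changed scene produces one event per changed pixel, in contrast to a frame-based sensor that would output all six pixel values in both cases.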
JPEG Fake Media and NFT
In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media assets creation and modifications. During the 98th meeting, the JPEG Committee finalised the evaluation of the six submitted proposals and initiated the process for establishing a new standard.
The JPEG Committee also continues to explore use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases for both topics are very different, there is a clear commonality in terms of requirements and relevant solutions. An updated version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.
To stay informed about the activities, please join the mailing list of the Ad-hoc Group and regularly check the JPEG website for the latest information.
JPEG AI
Following the creation of the JPEG AI Verification Model (VM) at the previous 97th JPEG meeting, further discussions took place at the 98th meeting on improving coding efficiency and reducing complexity, especially on the decoder side. The JPEG AI VM has several unique characteristics, such as a parallelizable context model to perform latent prediction, decoupling of prediction and sample reconstruction, and rate adaptation, among others. The JPEG AI VM shows up to 31% compression gain over VVC Intra for natural content. A new JPEG AI test set was released during the 98th meeting. This is a large dataset for the evaluation of the JPEG AI VM containing 50 images, with the objective of tracking the performance improvements at every meeting. The JPEG AI Common Training and Test Conditions were updated to include this new dataset. In this meeting, it was also decided to integrate several changes into the JPEG AI VM, speeding up training, improving performance at high rates and fixing bugs. A set of core experiments was established at this meeting targeting RD performance and complexity improvements. The JPEG AI VM Software Guidelines were approved, describing the initial setup of the JPEG AI VM repository, how to obtain the JPEG AI dataset, and how to run tests and training. A description of the structure of the JPEG AI VM repository was also made available.
JPEG Pleno Learning-based Point Cloud coding
The JPEG Pleno Point Cloud activity progressed at this meeting with a number of technical submissions for improvements to the VM in the area of colour coding, artefact processing and improvements to coding speed. In addition, the JPEG Committee released the “Call for Content for JPEG Pleno Point Cloud Coding” to expand on the current training and test set with new point clouds representing key use cases. Prior to the 99th JPEG Meeting, JPEG experts will promote the Call for Content as well as investigate possible advancements to the VM in the areas of auto-regressive entropy encoding, sparse tensor convolution, meta-data controlled post-filtering of colour and a flexible split geometry and colour coding framework for the VM.
JPEG AIC
During the 98th JPEG meeting in Sydney, Australia, Exploration Study 1 on JPEG AIC was established. This exploration study will collect results from three types of previously standardized subjective evaluation methodologies in order to provide an informative reference for the JPEG AIC submissions to the Call for Contributions that are due by April 1st, 2023. Corrections and additions to the JPEG AIC Common Test Conditions were issued in order to reflect the addition of a new codec for testing content generation and a new anchor subjective quality assessment methodology.
The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will focus on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by any previous AIC standards.
JPEG XL
The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have reached the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Also, an updated version of the JPEG XL White Paper has been published and is freely available through jpeg.org.
JPEG Systems
The JLINK standard (19566-7:2022) is now published by ISO. JLINK specifies an image file format capable of linking multiple media elements, such as image and text in any JPEG file format. It enables enhanced curated experiences of a set of images for education, training, virtual museum tours, travelogs, and similar visually-oriented content.
The JPEG Snack (19566-8) standard is expected to be published in February 2023. JPEG Snack specifies the coding of audio, picture, multimedia and hypermedia information, enabling rich, image-based, short-form animated experiences for social media.
The second edition of JUMBF (JPEG Universal Metadata Box Format, 19566-5) is progressing to IS stage; the second edition brings new capabilities and support for additional types of media.
JPEG DNA
The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 98th JPEG meeting, a draft Call for Proposals for JPEG DNA was issued and made public, as a first concrete step towards standardisation. The draft call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future responses to the Call for Proposals. The final Call for Proposals for JPEG DNA is expected to be released at the conclusion of the 99th JPEG meeting in April 2023, after a set of exploration experiments have validated the procedures outlined in the draft Call for Proposals for JPEG DNA and JPEG DNA Common Test Conditions. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023 with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.
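As a rough illustration of what a quaternary representation means, the sketch below maps each pair of bits to one of the four nucleotides and measures the longest run of identical symbols. It is not the JPEG DNA coding scheme. The bit-to-nucleotide mapping and the function names are hypothetical, and the homopolymer run length is used here only as one commonly cited example of a biochemical constraint in DNA storage.

```python
NUCLEOTIDES = "ACGT"  # hypothetical mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data):
    """Map each byte to four nucleotides, two bits per symbol, high bits first."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand):
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    out = bytearray()
    for start in range(0, len(strand), 4):
        value = 0
        for symbol in strand[start:start + 4]:
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


def longest_homopolymer(strand):
    """Return the length of the longest run of one repeated nucleotide."""
    longest = run = 0
    previous = None
    for symbol in strand:
        run = run + 1 if symbol == previous else 1
        longest = max(longest, run)
        previous = symbol
    return longest


strand = bytes_to_dna(b"JPEG")
round_trip = dna_to_bytes(strand)
zero_byte_run = longest_homopolymer(bytes_to_dna(b"\x00"))
```

The naive mapping turns a single zero byte into a run of four identical nucleotides, which hints at why a practical code has to take such constraints into account rather than map bits directly.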
JPEG XS
The JPEG Committee continued with the definition of JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Committee Draft for Part 1 (Core coding system) will proceed to ISO ballot. This means that the standard is now technically defined, and all the new coding tools are known. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. This new coding tool is of extreme importance for remote desktop applications and screen sharing. In addition, mathematically lossless coding can now support up to 16 bits precision (up from 12 bits). For Part 2 (Profiles and buffer models), the committee created a second Working Draft and issued further core experiments to proceed and support this work. Meanwhile, ISO approved the creation of a new edition of Part 3 (Transport and container formats) that is needed to address the changes of Part 1 and Part 2.
JPEG 2000
The JPEG Committee has published two sets of guidelines for implementers of JPEG 2000, available on jpeg.org.
The first describes an algorithm for controlling JPEG 2000 coding quality using a single number (Qfactor) between 1 (worst quality) and 100 (best quality), as is commonly done with JPEG.
The second explains how to create, parse and use HTJ2K placeholder passes and HT Sets. These features are an integral part of HTJ2K and enable mathematically lossless transcoding between HT- and J2K-based codestreams, among other applications.
Final Quote
“The interest in event-based imaging has been rising with several products designed and offered by the industry. The JPEG Committee believes in interoperable solutions and has initiated an exploration for standardization of event-based imaging in order to accelerate creation of an ecosystem.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Upcoming JPEG meetings are planned as follows:
No 99, will be online from 24-28 April 2023
No 100, will be in Covilhã, Portugal from 17-21 July 2023
ACM SIGMM co-sponsored the Spring School on Social XR, organized by the Distributed and Interactive Systems group (DIS) at CWI in Amsterdam. The event took place on March 13th – 17th 2023 and attracted 33 students from different disciplines (technology, social sciences, and humanities). The program included 18 lectures, 4 of them open, by 20 instructors. The event was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made student grants available, and The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en). The event was part of the recently started research semester programmes of CWI.
Students and organisers of the Spring School on Social XR (March 13th – 17th 2023, Amsterdam)
“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organization committee created a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.
The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques for volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.
Apart from science, there is always time for fun, so a number of social events took place, including a visit to the recently renovated Museum of Sound and Vision!
Museum of Sound and Vision
The spring school is part of the semester program organized by the DIS group of CWI, which was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture, “Human-Centered Multimedia: Making Remote Togetherness Possible” of Prof. Pablo Cesar.
This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries around the world registered for the meeting, which was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.
Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the proposals to update and merge ITU-T recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and greening of streaming and related trends.
We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.
Group picture of the VQEG Meeting 12-16 December 2022 (online).
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).
In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK), presented her work on converting video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting the content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and dealt with video codec profiling with video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for the modelling and the prediction of a Satisfied User Ratio for compressed videos [2].
Quality Assessment for Health applications (QAH)
The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently, there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research was announced, which was proposed by Elizabeth Krupinski and will take place at the European Congress of Radiology (Vienna, Austria, Mar. 2023).
In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with the evaluation of the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France) presented his work on deep-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition-inspired diagnostic image quality models, emphasising the need to distinguish among interpretability (e.g., the medical professional is confident in making a diagnosis), adequacy (e.g., the capture technique shows the right area for assessment), and visual quality (e.g., MOS) in the quality assessment of medical content.
Statistical Analysis Methods (SAM)
The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete and make P.913 the only recommendation from ITU-T on subjective video quality assessments. The group worked on the liaison and the document to be sent to ITU-T SG12, which will be available in the meeting files.
In addition, Mohsen Jenadeleh (University of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with Flicker Test and QUEST+.
Computer Generated Imagery (CGI)
The CGI group is devoted to analysing and evaluating computer-generated content, with a focus on gaming in particular. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, they are working on two new work items: G.OMMOG on Opinion Model for Mobile Online Gaming applications and P.CROWDG on Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying topics and interests in CGI other than gaming content.
No Reference Metrics (NORM)
The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata.
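As a rough illustration of the second topic, the following pure-Python sketch computes SI and TI in the way they are commonly described for P.910: SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI as the maximum over time of the spatial standard deviation of the difference between consecutive frames. The function names, the use of the population standard deviation, and the exclusion of border pixels are assumptions of this sketch, not choices confirmed by the group; details of this kind are what the clarification effort is about.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitudes for the interior pixels of a 2D luma frame."""
    height, width = len(frame), len(frame[0])
    magnitudes = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            magnitudes.append(math.hypot(gx, gy))
    return magnitudes


def spatial_information(frames):
    """SI: maximum over frames of the std. dev. of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitude(frame)) for frame in frames)


def temporal_information(frames):
    """TI: maximum over frame pairs of the std. dev. of the pixel-wise difference.

    Requires at least two frames.
    """
    per_pair = []
    for previous, current in zip(frames, frames[1:]):
        diff = [c - p
                for row_p, row_c in zip(previous, current)
                for p, c in zip(row_p, row_c)]
        per_pair.append(pstdev(diff))
    return max(per_pair)
```

Frames are plain lists of rows of luma values here; a practical implementation would more likely operate on arrays and would need to fix the value range and bit depth as well.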
In relation to the first topic, Margaret Pinson (NTIA/ITS, US), talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3] and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedeman (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.
Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of metadata of video metrics into the encoded video streams. In addition, a liaison with AOMedia will be established that may go beyond this particular topic and include best practices on subjective testing, IMG topics, etc.
Joint Effort Group (JEG) – Hybrid
The JEG group was originally focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. Currently, the group is working on research problems rather than algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources.
5G Key Performance Indicators (5GKPI)
The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.
Immersive Media Group (IMG)
The IMG group focuses on research on the quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed in the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.
Quality Assessment for Computer Vision Applications (QACoViA)
The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work related to the assessment of rail 8K-UHD CCTV facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study on video coding for machines through a large-scale evaluation of the robustness of DNNs to compression artefacts for semantic segmentation [8].
Other updates
In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of the IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video under simulated low temperatures in LCD vehicle displays.
In addition, a new group called the Emerging Technologies Group (ETG) was created in this meeting. Its main objective is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. In particular, two major topics of interest have been identified so far: AI-based technologies and greening of streaming and related trends. Beyond these, the group aims to provide a common platform for people to gather, discuss new emerging topics, and explore possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.
Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest resumes in this effort. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into the JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.
The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.
After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany. The official press release can be found here and comprises the following items:
MPEG evaluates the Call for Proposals on Video Coding for Machines
MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
MPEG reaches the First Milestone for Haptics Coding
MPEG completes a New Standard for Video Decoding Interface for Immersive Media
MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description
Video Coding for Machines
Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive amounts of video data are increasingly analyzed without human intervention, leading to a new paradigm referred to as Video Coding for Machines (VCM), which targets both (i) conventional video coding and (ii) feature coding (see here for further details).
At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.
The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, respectively, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by various proposals.
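To make "bit rate reduction for equivalent task performance" concrete, the sketch below interpolates, for an anchor and a proposal, the bitrate needed to reach the same task accuracy and reports the relative saving. This is a simplified single-point illustration, not the evaluation procedure used by MPEG (which is not described here); the function names and all rate/accuracy values are hypothetical.

```python
def rate_at_accuracy(points, target):
    """Linearly interpolate the bitrate needed to reach `target` task accuracy.

    `points` is a list of (bitrate, accuracy) pairs; accuracy is assumed to be
    non-decreasing with bitrate.
    """
    ordered = sorted(points)
    for (r0, a0), (r1, a1) in zip(ordered, ordered[1:]):
        if a0 <= target <= a1:
            if a1 == a0:
                return float(r0)
            return r0 + (target - a0) * (r1 - r0) / (a1 - a0)
    raise ValueError("target accuracy lies outside the measured range")


def bitrate_reduction_percent(anchor, proposal, target):
    """Bitrate saving of `proposal` over `anchor` at equal task accuracy, in %."""
    anchor_rate = rate_at_accuracy(anchor, target)
    proposal_rate = rate_at_accuracy(proposal, target)
    return 100.0 * (anchor_rate - proposal_rate) / anchor_rate


# Hypothetical (bitrate in kbit/s, accuracy in %) measurements.
anchor_points = [(1000, 60), (2000, 70), (4000, 80)]
proposal_points = [(500, 60), (1000, 70), (2000, 80)]
saving = bitrate_reduction_percent(anchor_points, proposal_points, target=75)
```

A positive result means the proposal needs less bitrate than the anchor to reach the same accuracy on the machine vision task.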
Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:
For the tested video dataset, increases in compression efficiency of up to 87% compared to the video anchor and over 90% compared to the feature anchor were reported.
For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.
Research aspects: the main research area is still the same as described in my last column, i.e., compression efficiency (probably including runtime, sometimes called complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation, object detection, as mentioned above).
Video Decoding Interface for Immersive Media
One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user cannot see the front and back sides of the same point cloud object simultaneously. Thus, for efficiency reasons and depending on the user's viewpoint, only the front or back side needs to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.
At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.
The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).
Research aspect: VDI is also a promising standard to enable the implementation of viewport adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.
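The viewport-dependent selection underlying such scenarios can be sketched as follows: given a viewport centre and horizontal field of view, determine which tile columns of a 360-degree frame actually need to be requested and decoded. This does not use or represent the VDI API; the function name, the equal-width column layout, and the restriction to the horizontal (yaw) axis are assumptions made purely for illustration.

```python
def visible_tile_columns(yaw_deg, fov_deg, num_columns):
    """Indices of tile columns overlapped by a viewport centred at `yaw_deg`.

    The 360-degree frame is assumed to be split into `num_columns` columns of
    equal angular width; column 0 starts at yaw 0. Wrap-around is handled.
    """
    tile_width = 360.0 / num_columns
    viewport_start = (yaw_deg - fov_deg / 2.0) % 360.0
    visible = []
    for column in range(num_columns):
        # Angular distance from the viewport start to the tile start, in [0, 360).
        offset = (column * tile_width - viewport_start) % 360.0
        starts_inside_viewport = offset < fov_deg
        contains_viewport_start = offset > 360.0 - tile_width
        if starts_inside_viewport or contains_viewport_start:
            visible.append(column)
    return visible


# A 90-degree viewport looking at yaw 10 on a frame with 8 tile columns.
needed = visible_tile_columns(yaw_deg=10, fov_deg=90, num_columns=8)
```

In this example only three of the eight columns overlap the viewport, which illustrates why decoding resources can be sized by what is presented rather than by the number of elementary streams.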
MPEG-DASH Updates
Finally, I’d like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, amendments, Defects under Investigation (DuI), and Technologies under Consideration (TuC) are output documents, as well as a new working draft called Redundant encoding and packaging for segmented live media (REAP), which eventually will become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.
Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.
The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.