VQEG Column: VQEG Meeting December 2023

Introduction

The last plenary meeting of the Video Quality Experts Group (VQEG) was hosted online by the University of Konstanz (Germany) from December 18th to 21st, 2023. It offered more than 100 registered participants from 19 different countries worldwide the possibility to attend the numerous presentations and discussions about topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting will soon be available on YouTube.

All the topics mentioned below can be of interest to the SIGMM community working on quality assessment, but special attention should be paid to the current activities on improving the statistical analysis of subjective experiments and objective metrics, and on the development of a test plan to evaluate the QoE of immersive interactive communication systems in collaboration with ITU.

Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to VQEG’s email reflectors to follow the ongoing activities and to get involved with them.

As already announced on the VQEG website, the next VQEG plenary meeting will be hosted by Universität Klagenfurt in Austria from July 1st to 5th, 2024.

Group picture of the online meeting

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. During the meeting, there were various sessions in which presentations related to these topics were discussed.

Firstly, Ali Ak (Nantes Université, France), provided an analysis of the relation between acceptance/annoyance and visual quality in a recently collected dataset of several User Generated Content (UGC) videos. Then, Syed Uddin (AGH University of Krakow, Poland) presented a video quality assessment method based on the quantization parameter of MPEG encoders (MPEG-4, MPEG-AVC, and MPEG-HEVC) leveraging VMAF. In addition, Sang Heon Le (LG Electronics, Korea) presented a technique for pre-enhancement for video compression and applicable subjective quality metrics. Another talk was given by Alexander Raake (TU Ilmenau, Germany), who presented AVQBits, a versatile no-reference bitstream-based video quality model (based on the standardized ITU-T P.1204.3 model) that can be applied in several contexts such as video service monitoring, evaluation of video encoding quality, of gaming video QoE, and even of omnidirectional video quality. Also, Jingwen Zhu (Nantes Université, France) and Hadi Amirpour (University of Klagenfurt, Austria) described a study on the evaluation of the effectiveness of different video quality metrics in predicting the Satisfied User Ratio (SUR) in order to enhance the VMAF proxy to better capture content-specific characteristics. Andreas Pastor (Nantes Université, France) presented a method to predict the distortion perceived locally by human eyes in AV1-encoded videos using deep features, which can be easily integrated into video codecs as a pre-processing step before starting encoding.

In relation to standardization efforts, Mathias Wien (RWTH Aachen University, Germany) gave an overview of recent expert viewing tests that have been conducted within MPEG AG5 at the 143rd and 144th MPEG meetings. Also, Kamil Koniuch (AGH University of Krakow, Poland) presented a proposal to update the Survival Game task defined in ITU-T Recommendation P.1301 on subjective quality evaluation of audio and audiovisual multiparty telemeetings, in order to improve its implementation and application to recent efforts such as the evaluation of immersive communication systems within ITU-T P.IXC (see the paragraph related to the Immersive Media Group).

Quality Assessment for Health applications (QAH)

The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. Recently, the group has been working towards an ITU-T recommendation for the assessment of medical contents. On this topic, Meriem Outtas (INSA Rennes, France) led a discussion dealing with the edition of a draft of this recommendation. In addition, Lumi Xia (INSA Rennes, France) presented a study of task-based medical image quality assessment focusing on a use case of adrenal lesions.

Statistical Analysis Methods (SAM)

The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. It was one of the most active groups in this meeting, with several presentations on related topics.

On this topic, Krzysztof Rusek (AGH University of Krakow, Poland) presented a Python package to estimate Generalized Score Distribution (GSD) parameters and showed how to use it to test the results obtained in subjective experiments. Andreas Pastor (Nantes Université, France) presented a comparison between two subjective studies using Absolute Category Rating with Hidden Reference (ACR-HR) and Degradation Category Rating (DCR), conducted in a controlled laboratory environment on SDR HD, UHD, and HDR UHD contents using naive observers. The goal of these tests is to estimate rate-distortion savings between two modern video codecs and to compare the precision and accuracy of both subjective methods. He also presented another study comparing conditions for omnidirectional video with spatial audio in terms of subjective quality and their impact on the resolving power of objective metrics.

In addition, Lukas Krasula (Netflix, USA) introduced e2nest, a web-based platform to conduct media-centric (video, audio, and images) subjective tests. Also, Dietmar Saupe (University of Konstanz, Germany) and Simon Del Pin (NTNU, Norway) presented the results of a study analyzing national differences in image quality assessment, which revealed significant differences in various areas. Alexander Raake (TU Ilmenau, Germany) presented a study on the remote testing of high-resolution images and videos using AVrate Voyager, a publicly accessible framework for online tests. Finally, Dominik Keller (TU Ilmenau, Germany) presented a recent study exploring the impact of 8K (UHD-2) resolution on HDR video quality, considering different viewing distances. The results showed that the enhanced video quality of 8K HDR over 4K HDR diminishes with increasing viewing distance.

No Reference Metrics (NORM)

The NORM group addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. At this meeting, Ioannis Katsavounidis (Meta, USA) led a discussion on the current efforts to improve image and video complexity metrics. In addition, Krishna Srikar Durbha (University of Texas at Austin, USA) presented a technique to tackle the problem of bitrate ladder construction based on multiple Visual Information Fidelity (VIF) feature sets extracted from different scales and subbands of a video.

Emerging Technologies Group (ETG)

The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc.

In this meeting, Nabajeet Barman and Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) suggested a new topic to be discussed within VQEG: Quality Assessment of AI-Generated/Modified Content. The goal is to hold subsequent discussions on this topic within the group and to write a position paper or whitepaper.

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group addresses several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training VQA models using full-reference metrics instead of subjective scores. In addition, the group includes the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM). At the meeting, Enrico Masala (Politecnico di Torino, Italy) provided updates on the activities of the group and on IGVQM.

Apart from this, three presentations addressing related topics were delivered in this meeting by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The first focused on quality estimation in subjective experiments and the identification of peculiar subject behaviors, introducing a robust approach for estimating subjective quality from noisy ratings and a novel subject scoring model that enables highlighting several peculiar behaviors. He also introduced a non-parametric perspective to address the media quality recovery problem without making any a priori assumption on the subjects’ scoring behavior. Finally, he presented an approach called “human-in-the-loop training process” that uses multiple cycles of human voting, DNN training, and inference.

Immersive Media Group (IMG)

The IMG group is performing research on the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain), Kamil Koniuch (AGH University of Krakow, Poland), Ashutosh Singla (CWI, The Netherlands) and other researchers involved in the test plan provided an update on its status, focusing on the description of the four interactive tasks to be performed in the test, the considered measures, and the 13 different experiments that will be carried out in the participating labs. Also, in relation to this test plan, Felix Immohr (TU Ilmenau, Germany) presented a study on the impact of spatial audio on social presence and user behavior in multi-modal VR communications.

Diagram of the methodology of the joint IMG test plan

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group addresses the study of the visual quality requirements for computer vision methods, where the final user is an algorithm. In this meeting, Mikołaj Leszczuk (AGH University of Krakow, Poland) and Jingwen Zhu (Nantes Université, France) presented a specialized dataset developed for enhancing Automatic License Plate Recognition (ALPR) systems. In addition, Hanene Brachemi (IETR-INSA Rennes, France) presented a study on evaluating the vulnerability of deep learning-based image quality assessment methods to adversarial attacks. Finally, Alban Marie (IETR-INSA Rennes, France) delivered a talk exploring the trade-off in lossy image coding between rate, machine perception, and quality.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services running on top of them. At the meeting, Pablo Pérez (Nokia XR Lab, Spain) led an open discussion on the future activities of the group towards 6G, including a brief presentation of QoS/QoE management in 3GPP and potential opportunities to influence QoE in 6G.

MPEG Column: 146th MPEG Meeting in Rennes, France

The 146th MPEG meeting was held in Rennes, France from 22-26 April 2024, and the official press release can be found here. It comprises the following highlights:

  • AI-based Point Cloud Coding*: Call for proposals focusing on AI-driven point cloud encoding for applications such as immersive experiences and autonomous driving.
  • Object Wave Compression*: Call for interest in object wave compression for enhancing computer holography transmission.
  • Open Font Format: Committee Draft of the fifth edition, overcoming previous limitations like the 64K glyph encoding constraint.
  • Scene Description: Ratified second edition, integrating immersive media objects and extending support for various data types.
  • MPEG Immersive Video (MIV): New features in the second edition, enhancing the compression of immersive video content.
  • Video Coding Standards: New editions of AVC, HEVC, and Video CICP, incorporating additional SEI messages and extended multiview profiles.
  • Machine-Optimized Video Compression*: Advancement in optimizing video encoders for machine analysis.
  • MPEG-I Immersive Audio*: Reached Committee Draft stage, supporting high-quality, real-time interactive audio rendering for VR/AR/MR.
  • Video-based Dynamic Mesh Coding (V-DMC)*: Committee Draft status for efficiently storing and transmitting dynamic 3D content.
  • LiDAR Coding*: Enhanced efficiency and responsiveness in LiDAR data processing with the new standard reaching Committee Draft status.

* … covered in this column.

AI-based Point Cloud Coding

MPEG issued a Call for Proposals (CfP) on AI-based point cloud coding technologies as a result of ongoing explorations regarding use cases, requirements, and the capabilities of AI-driven point cloud encoding, particularly for dynamic point clouds.

With recent significant progress in AI-based point cloud compression technologies, MPEG is keen on studying and adopting AI methodologies. MPEG is specifically looking for learning-based codecs capable of handling a broad spectrum of dynamic point clouds, which are crucial for applications ranging from immersive experiences to autonomous driving and navigation. As the field evolves rapidly, MPEG expects to receive multiple innovative proposals. These may include a unified codec, capable of addressing multiple types of point clouds, or specialized codecs tailored to meet specific requirements, contingent upon demonstrating clear advantages. MPEG has therefore publicly called for submissions of AI-based point cloud codecs, aimed at deepening the understanding of the various options available and their respective impacts. Submissions that meet the requirements outlined in the call will be invited to provide source code for further analysis, potentially laying the groundwork for a new standard in AI-based point cloud coding. MPEG welcomes all relevant contributions and looks forward to evaluating the responses.

Research aspects: In-depth analysis of algorithms, techniques, and methodologies, including a comparative study of various AI-driven point cloud compression techniques to identify the most effective approaches. Other aspects include creating or improving learning-based codecs that can handle dynamic point clouds as well as metrics for evaluating the performance of these codecs in terms of compression efficiency, reconstruction quality, computational complexity, and scalability. Finally, the assessment of how improved point cloud compression can enhance user experiences would be worthwhile to consider here also.

Object Wave Compression

A Call for Interest (CfI) in object wave compression has been issued by MPEG. Computer holography, a 3D display technology, utilizes a digital fringe pattern called a computer-generated hologram (CGH) to reconstruct 3D images from input 3D models. Holographic near-eye displays (HNEDs) reduce the need for extensive pixel counts due to their wearable design, positioning the display near the eye. This positions HNEDs as frontrunners for the early commercialization of computer holography, with significant research underway for product development. Innovative approaches facilitate the transmission of object wave data, crucial for CGH calculations, over networks. Object wave transmission offers several advantages, including independent treatment from playback device optics, lower computational complexity, and compatibility with video coding technology. These advancements open doors for diverse applications, ranging from entertainment experiences to real-time two-way spatial transmissions, revolutionizing fields such as remote surgery and virtual collaboration. As MPEG explores object wave compression for computer holography transmission, a Call for Interest seeks contributions to address market needs in this field.

Research aspects: Apart from compression efficiency, lower computation complexity, and compatibility with video coding technology, there is a range of research aspects, including the design, implementation, and evaluation of coding algorithms within the scope of this CfI. The QoE of computer-generated holograms (CGHs) together with holographic near-eye displays (HNEDs) is yet another dimension to be explored.

Machine-Optimized Video Compression

MPEG started working on a technical report regarding the “Optimization of Encoders and Receiving Systems for Machine Analysis of Coded Video Content”. In recent years, the efficacy of machine learning-based algorithms in video content analysis has steadily improved. However, an encoder designed for human consumption does not always produce compressed video conducive to effective machine analysis. This challenge lies not in the compression standard but in optimizing the encoder or receiving system. The forthcoming technical report addresses this gap by showcasing technologies and methods that optimize encoders or receiving systems to enhance machine analysis performance.

Research aspects: Video (and audio) coding for machines has recently been addressed by the MPEG Video and Audio working groups, respectively. The MPEG Joint Video Experts Team with ITU-T SG16, also known as JVET, has joined this space with a technical report, but the research aspects remain unchanged, i.e., coding efficiency, metrics, and quality aspects for the machine analysis of compressed/coded video content.

MPEG-I Immersive Audio

MPEG Audio Coding is entering the “immersive space” with MPEG-I immersive audio and its corresponding reference software. The MPEG-I immersive audio standard sets a new benchmark for compact and lifelike audio representation in virtual and physical spaces, catering to Virtual, Augmented, and Mixed Reality (VR/AR/MR) applications. By enabling high-quality, real-time interactive rendering of audio content with six degrees of freedom (6DoF), users can experience immersion, freely exploring 3D environments while enjoying dynamic audio. Designed in accordance with MPEG’s rigorous standards, MPEG-I immersive audio ensures efficient distribution across bandwidth-constrained networks without compromising on quality. Unlike proprietary frameworks, this standard prioritizes interoperability, stability, and versatility, supporting both streaming and downloadable content while seamlessly integrating with MPEG-H 3D audio compression. MPEG-I’s comprehensive modeling of real-world acoustic effects, including sound source properties and environmental characteristics, guarantees an authentic auditory experience. Moreover, its efficient rendering algorithms balance computational complexity with accuracy, empowering users to finely tune scene characteristics for desired outcomes.

Research aspects: Evaluating QoE of MPEG-I immersive audio-enabled environments as well as the efficient audio distribution across bandwidth-constrained networks without compromising on audio quality are two important research aspects to be addressed by the research community.

Video-based Dynamic Mesh Coding (V-DMC)

Video-based Dynamic Mesh Compression (V-DMC) represents a significant advancement in 3D content compression, catering to the ever-increasing complexity of dynamic meshes used across various applications, including real-time communications, storage, free-viewpoint video, augmented reality (AR), and virtual reality (VR). The standard addresses the challenges associated with dynamic meshes that exhibit time-varying connectivity and attribute maps, which were not sufficiently supported by previous standards. Video-based Dynamic Mesh Compression promises to revolutionize how dynamic 3D content is stored and transmitted, allowing more efficient and realistic interactions with 3D content globally.

Research aspects: V-DMC aims to allow “more efficient and realistic interactions with 3D content”, which are subject to research, i.e., compression efficiency vs. QoE in constrained networked environments.

Low Latency, Low Complexity LiDAR Coding

Low Latency, Low Complexity LiDAR Coding underscores MPEG’s commitment to advancing coding technologies required by modern LiDAR applications across diverse sectors. The new standard addresses critical needs in the processing and compression of LiDAR-acquired point clouds, which are integral to applications ranging from automated driving to smart city management. It provides an optimized solution for scenarios requiring high efficiency in both compression and real-time delivery, responding to the increasingly complex demands of LiDAR data handling. LiDAR technology has become essential for various applications that require detailed environmental scanning, from autonomous vehicles navigating roads to robots mapping indoor spaces. The Low Latency, Low Complexity LiDAR Coding standard will facilitate a new level of efficiency and responsiveness in LiDAR data processing, which is critical for the real-time decision-making capabilities needed in these applications. This standard builds on comprehensive analysis and industry feedback to address specific challenges such as noise reduction, temporal data redundancy, and the need for region-based quality of compression. The standard also emphasizes the importance of low latency coding to support real-time applications, essential for operational safety and efficiency in dynamic environments.

Research aspects: This standard effectively tackles the challenge of balancing high compression efficiency with real-time capabilities, addressing these often conflicting goals. Researchers may carefully consider these aspects and make meaningful contributions.

The 147th MPEG meeting will be held in Sapporo, Japan, from July 15-19, 2024. Click here for more information about MPEG meetings and their developments.

JPEG Column: 102nd JPEG Meeting in San Francisco, U.S.A.

JPEG Trust reaches Draft International Standard stage

The 102nd JPEG meeting was held in San Francisco, California, USA, from 22 to 26 January 2024. At this meeting, JPEG Trust became a Draft International Standard. Moreover, the responses to the Call for Proposals of JPEG NFT were received and analysed. As a consequence, relevant steps were taken towards the definition of standardized tools for certification of the provenance and authenticity of media content, at a time when tools for effective media manipulation are becoming available to the general public. The 102nd JPEG meeting concluded with the JPEG Emerging Technologies Workshop, held at Tencent, Palo Alto, on 27 January.

JPEG Emerging Technologies Workshop, organised on 27 January at Tencent, Palo Alto

The following sections summarize the main highlights of the 102nd JPEG meeting:

  • JPEG Trust reaches Draft International Standard stage;
  • JPEG AI improves the Verification Model;
  • JPEG Pleno Learning-based Point Cloud coding releases the Committee Draft;
  • JPEG Pleno Light Field continues development of Quality assessment tools;
  • AIC starts working on Objective Quality Assessment models for Near Visually Lossless coding;
  • JPEG XE prepares Common Test Conditions;
  • JPEG DNA evaluates its Verification Model;
  • JPEG XS 3rd edition parts are ready for publication as International standards;
  • JPEG XL investigates HDR compression performance.

JPEG Trust

At its 102nd meeting, the JPEG Committee produced the DIS (Draft International Standard) of JPEG Trust Part 1 “Core Foundation” (21617-1). It is expected that the standard will be published as an International Standard during the summer of 2024. This rapid standardization schedule has been necessary because of the speed at which fake media and misinformation are proliferating, especially with respect to Generative AI.

The JPEG Trust Core Foundation specifies a comprehensive framework for individuals, organizations, and governing institutions interested in establishing an environment of trust for the media that they use, and for supporting trust in the media they share online. This framework addresses aspects of provenance, authenticity, integrity, copyright, and identification of assets and stakeholders. To complement Part 1, a proposed new Part 2 “Trust Profiles Catalogue” has been established. This new Part will specify a catalogue of Trust Profiles, targeting common usage scenarios.

During the meeting, the committee also evaluated responses received to the JPEG NFT Final Call for Proposals (CfP). Certain portions of the submissions will be incorporated in the JPEG Trust suite of standards to improve interoperability with respect to media tokenization. As a first step, the committee will focus on standardization of declarations of authorship and ownership.

Finally, the Use Cases and Requirements document for JPEG Trust was updated to incorporate additional requirements in respect of composited media. This document is publicly available on the JPEG website.

A white paper describing the JPEG Trust framework is also publicly available on the JPEG website.

JPEG AI

At the 102nd JPEG meeting, the JPEG AI Verification Model was improved by integrating nearly all the contributions adopted at the 101st JPEG meeting. The major change is a multi-branch JPEG AI decoding architecture with two encoders and three decoders (6 possible compatible combinations) that have been jointly trained, which allows the coverage of encoder and decoder complexity-efficiency tradeoffs. The entropy decoding and latent prediction portion is common for all possible combinations and thus differences reside at the analysis/synthesis networks. Moreover, the number of models has been reduced to 4, both 4:4:4 and 4:2:0 coding is supported, and JPEG AI can now achieve better rate-distortion performance in some relevant use cases. A new training dataset has also been adopted with difficult/high-contrast/versatile images to reduce the number of artifacts and to achieve better generalization and color reproducibility for a wide range of situations. Other enhancements have also been adopted, namely feature clipping for decoding artifacts reduction, improved variable bit-rate training strategy and post-synthesis transform filtering speedups.

The resulting performance and complexity characterization show compression efficiency (BD-rate) gains of 12.5% to 27.9% over the VVC Intra anchor, for relevant encoder and decoder configurations with a wide range of complexity-efficiency tradeoffs (7 to 216 kMAC/px at the decoder side). For the CPU platform, the decoder complexity is 1.6x/3.1x higher compared to VVC Intra (reference implementation) for the simplest/base operating point. At the 102nd meeting, 12 core experiments were established to continue work on different topics, namely the JPEG AI high-level syntax, progressive decoding, the training dataset, hierarchical dependent tiling, and spatial random access, to mention the most relevant. Finally, two demonstrations were shown in which JPEG AI decoder implementations were run on two smartphone devices, a Huawei Mate50 Pro and an iPhone14 Pro.

JPEG Pleno Learning-based Point Cloud coding

The 102nd JPEG meeting marked an important milestone for JPEG Pleno Point Cloud with the release of its Committee Draft (CD) for ISO/IEC 21794-Part 6 “Learning-based point cloud coding” (21794-6). Part 6 of the JPEG Pleno framework brings an innovative Learning-based Point Cloud Coding technology adding value to existing Parts focused on Light field and Holography coding. It is expected that a Draft International Standard (DIS) of Part 6 will be approved at the 104th JPEG meeting in July 2024 and the International Standard to be published during 2025. The 102nd meeting also marked the release of version 4 of the JPEG Pleno Point Cloud Verification Model updated to be robust to different hardware and software operating environments.

JPEG Pleno Light Field

The JPEG Committee has recently published a light field coding standard, and JPEG Pleno is constantly exploring novel light field coding architectures. The JPEG Committee is also preparing standardization activities – among others – in the domains of objective and subjective quality assessment for light fields, improved light field coding modes, and learning-based light field coding.

As the JPEG Committee seeks continuous improvement of its use case and requirements specifications, it organized a Light Field Industry Workshop. The presentations and video recording of the workshop that took place on November 22nd, 2023 are available on the JPEG website.

JPEG AIC

During the 102nd JPEG meeting, work on Image Quality Assessment continued with a focus on JPEG AIC-3, targeting the standardization of a subjective visual quality assessment methodology for images in the range from high to nearly visually lossless quality. The activity is currently investigating three different subjective image quality assessment methodologies.

The JPEG Committee also launched the activities on Part 4 of the standard (AIC-4), by initiating work on the Draft Call for Proposals on Objective Image Quality Assessment. The Final Call for Proposals on Objective Image Quality Assessment is planned to be released in July 2024, while the submission of the proposals is planned for October 2024.

JPEG XE

The JPEG Committee continued its activity on JPEG XE and event-based vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE concerns the creation and development of a standard to represent events in an efficient way, allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The JPEG Committee is preparing a Common Test Conditions document that provides the means to perform an evaluation of candidate technology for the efficient coding of event sequences. The Common Test Conditions provide a definition of a reference format, a dataset, a set of key performance metrics, and an evaluation methodology. In addition, the committee is preparing a Draft Call for Proposals on lossless coding, with the intent to make it public in April 2024. Standardization will first start with lossless coding of event sequences, as this seems to have the highest application urgency in industry. However, the committee acknowledges that lossy coding of event sequences is also a valuable feature, which will be addressed at a later stage. The public Ad-hoc Group on Event-based Vision was re-established to continue the work towards the 103rd JPEG meeting in April 2024. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.

JPEG DNA

During the 102nd JPEG meeting, the JPEG DNA Verification Model description and software were approved along with continued efforts to evaluate its rate-distortion characteristics. Notably, during the 102nd meeting, a subjective quality assessment was carried out by expert viewing using a new approach under development in the framework of AIC-3. The robustness of the Verification Model to errors generated in a biochemical process was also analysed using a simple noise simulator. After meticulous analysis of the results, it was decided to create a number of core experiments to improve the Verification Model rate-distortion performance and the robustness to the errors by adding an error correction technique to the latter. In parallel, efforts are underway to improve the rate-distortion performance of the JPEG DNA Verification Model by exploring learning-based coding solutions. In addition, further efforts are defined to improve the noise simulator so as to allow assessment of the resilience to noise in the Verification Model in more realistic conditions, laying the groundwork for a JPEG DNA robust to insertion, deletion and substitution errors.

JPEG XS

The JPEG Committee is happy to announce that the core parts of JPEG XS 3rd edition are ready for publication as International standards. The Final Draft International Standard for Part 1 of the standard – Core coding tools – was created at the last meeting in November 2023, and is scheduled for publication. DIS ballot results for Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – of the standard came back, allowing the JPEG Committee to produce and deliver the proposed IS texts to ISO. This means that Part 2 and Part 3 3rd edition are also scheduled for publication.

At this meeting, the JPEG Committee continued the work on Part 4 – Conformance testing, to provide the necessary test streams of the 3rd edition for potential implementors. A Committee Draft for Part 4 was issued. With Parts 1, 2, and 3 now ready, and Part 4 ongoing, the JPEG Committee initiated the 3rd edition of Part 5 – Reference software. A first Working Draft was prepared and work on the reference software will start.

Finally, experimental results were presented on how to use JPEG XS over 5G mobile networks for the transmission of low-latency and high quality 4K/8K 360 degree views with mobile devices. This use case was added at the previous JPEG meeting. It is expected that the new use case can already be covered by the 3rd edition, meaning that no further updates to the standard would be necessary. However, investigations and experimentation on this subject continue.

JPEG XL

The second edition of JPEG XL Part 3 (Conformance testing) has proceeded to the DIS stage. Work on a hardware implementation continues. Experiments are planned to investigate HDR compression performance of JPEG XL.

“In its efforts to provide standardized solutions to ascertain authenticity and provenance of the visual information, the JPEG Committee has released the Draft International Standard of the JPEG Trust. JPEG Trust will bring trustworthiness back to imaging with specifications under the governance of the entire International community and stakeholders as opposed to a small number of companies or countries,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Energy-Efficient Video Streaming: Open-Source Tools, Datasets, and Solutions


Abstract: Energy efficiency has become a crucial aspect of today’s IT infrastructures, and video (streaming) accounts for over half of today’s Internet traffic. This column highlights open-source tools, datasets, and solutions addressing energy efficiency in video streaming presented at ACM Multimedia Systems 2024 and its co-located workshop ACM Green Multimedia Systems.

Introduction

Across various platforms, users seek the highest Quality of Experience (QoE) in video communication and streaming. Whether it’s a crucial business meeting or a relaxing evening of entertainment, individuals desire seamless and high-quality video experiences. However, meeting this demand for high-quality video comes with a cost: increased energy usage [1],[2]. This energy consumption occurs at every stage of the process, including content provision via cloud services and consumption on end users’ devices [3]. Unfortunately, this heightened energy consumption inevitably leads to higher CO2 emissions (unless renewable energy sources are used), posing environmental challenges and emphasizing the need for studies that assess the carbon footprint of video streaming.

Content provision is a critical stage in video streaming, involving encoding videos into various formats, resolutions, and bitrates. Encoding demands computing power and energy, especially in cloud-based systems. Cloud computing has become popular for video encoding due to its scalability [4], adjusting cloud resources to handle changing workloads, and its flexibility [5], allowing providers to scale their operations based on demand. However, this convenience comes at a cost. Data centers, the heart of cloud computing, consume a significant portion of global electricity, around 3% [6]. Video encoding is one of the biggest energy consumers within these data centers. Therefore, optimizing video encoding for lower energy consumption is crucial for reducing the environmental impact of cloud-based video delivery.

Content consumption [7] involves the device using the network interface card to request and download video segments from the server, decompressing them for playback, and finally rendering the decoded frames on the screen, where the energy consumption depends on the screen technology and brightness settings.

The GAIA project showcased its research on the environmental impact of video streaming at the recent 15th ACM Multimedia Systems Conference (April 15-18, Bari, Italy). We presented our findings at relevant conference sessions: Open-Source Software and Dataset and the Green Multimedia Systems (GMSys) workshop.

Open Source Software

GREEM: An Open-Source Benchmark Tool Measuring the Environmental Footprint of Video Streaming [PDF] [Github] [Poster]

GREEM (Gaia Resource Energy and Emission Monitoring) aims to measure energy usage during video encoding and decoding processes. GREEM tracks the effects of video processing on hardware performance and provides a suite of analytical scenarios. This tool offers easy-to-use scenarios covering the most common video streaming situations, such as measuring sequential and parallel video encoding and decoding.

Contributions:

  • Accessible: GREEM is available in a GitHub repository (https://github.com/cd-athena/GREEM) for energy measurement of video processing.
  • Automates experimentation: It allows users to easily configure and run various encoding scenarios with different parameters to compare results.
  • In-depth monitoring: The tool traces numerous hardware parameters, specifically monitoring energy consumption and GPU metrics, including core and memory utilization, temperature, and fan speed, providing a complete picture of video processing resource usage.
  • Visualization: GREEM offers scripts that generate analytic plots, allowing users to visualize and understand their measurement results easily.

  • Verifiable: GREEM empowers researchers with a tool that has earned the ACM Reproducibility Badge, which allows others to reproduce the experiments and results reported in the paper.
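To make the measurement idea concrete, the minimal sketch below samples GPU power draw with NVIDIA’s NVML bindings (pynvml) while an encoding command runs and integrates it into an energy figure. This is a generic illustration of the kind of monitoring GREEM automates, not GREEM’s actual API; the ffmpeg command and sampling interval are assumptions for the example.

```python
# Minimal sketch of GPU-side energy sampling during a video encode.
# Assumes an NVIDIA GPU, the pynvml package, and ffmpeg on PATH;
# this is NOT GREEM's API, only the general measurement idea.
import subprocess
import time

import pynvml


def measure_encode_energy(cmd, interval_s=0.1):
    """Run `cmd` (e.g. an ffmpeg encode) and integrate GPU power over time."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    proc = subprocess.Popen(cmd)
    energy_j = 0.0
    last = time.time()
    while proc.poll() is None:  # encoder still running
        time.sleep(interval_s)
        now = time.time()
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        energy_j += power_w * (now - last)                         # P * dt
        last = now

    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    pynvml.nvmlShutdown()
    return {"energy_wh": energy_j / 3600.0, "gpu_util": util.gpu, "temp_c": temp}


if __name__ == "__main__":
    # Hypothetical NVENC encode of a local test clip.
    stats = measure_encode_energy(
        ["ffmpeg", "-y", "-i", "input.mp4", "-c:v", "h264_nvenc", "out.mp4"]
    )
    print(stats)
```

In practice, a tool like GREEM repeats such measurements across sequential and parallel encoding/decoding scenarios and records many more hardware parameters than this sketch does.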

Open Source Datasets

VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances [PDF] [Github] [Poster]

As video encoding increasingly shifts to cloud-based services, concerns about the environmental impact of massive data centers arise. The Video Encoding Energy and CO2 Emissions Dataset (VEED) provides the energy consumption and CO2 emissions associated with video encoding on Amazon’s Elastic Compute Cloud (EC2) instances. Additionally, VEED goes beyond energy consumption as it also captures encoding duration and CPU utilization.

Contributions:

  • Findability: A comprehensive metadata description file ensures VEED’s discoverability for researchers.
  • Accessibility: VEED is open for download on GitHub (https://github.com/cd-athena/VEEDdataset), removing access barriers for researchers. Core findings in the research that leverages the VEED dataset have been independently verified (ACM Reproducibility Badge).
  • Interoperability: The dataset is provided in a comma-separated value (CSV) format, allowing integration with various analysis applications.
  • Reusability: Description files empower researchers to understand the data structure and context, facilitating its use in diverse analytical projects.
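As a brief illustration of the interoperability point above, the sketch below aggregates per-instance statistics from a CSV file with pandas. The column names (instance_type, energy_kwh, co2_g, duration_s, cpu_util) are hypothetical placeholders, not necessarily VEED’s actual schema; consult the dataset’s description files for the real field names.

```python
# Sketch of consuming a CSV-based dataset like VEED with pandas.
# Column names below are illustrative assumptions, not the real schema.
import pandas as pd

df = pd.read_csv("veed.csv")

# Example analysis: average energy, emissions, duration, and CPU load
# per EC2 instance type, sorted by energy consumption.
summary = (
    df.groupby("instance_type")[["energy_kwh", "co2_g", "duration_s", "cpu_util"]]
    .mean()
    .sort_values("energy_kwh")
)
print(summary.head())
```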

COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming  [PDF] [Github]

COCONUT is a dataset comprising energy consumption measurements of video streaming across various devices and different HAS (HTTP Adaptive Streaming) players. COCONUT captures user data during MPEG-DASH video segment streaming on laptops, smartphones, and other client devices, measuring energy consumption at different stages of streaming, including segment retrieval through the network interface card, video decoding, and rendering on the device. This paper has been designated the ACM Artifacts Available badge, signifying that the COCONUT dataset is publicly accessible. COCONUT can be accessed at https://athena.itec.aau.at/coconut/.

Second International ACM Green Multimedia Systems Workshop — GMSys 2024

VEEP: Video Encoding Energy and CO2 Emission Prediction  [pdf] [slides]

VEEP is a machine learning (ML) scheme that empowers users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding.

Contributions:

  • Content-aware energy prediction:  VEEP analyzes video content to extract features impacting encoding complexity. This understanding feeds an ML model that accurately predicts the energy consumption required for encoding the video on AWS EC2 instances. (High Accuracy: Achieves an R² score of 0.96)
  • Real-time carbon footprint: VEEP goes beyond energy. It also factors in real-time carbon intensity data based on the location of the cloud instance. This allows VEEP to calculate the associated CO2 emissions for your encoding tasks at encoding time.
  • Resulting impact: By carefully selecting the type and location of cloud instances based on VEEP’s predictions, CO2 emissions can be reduced by up to 375 times. This significant reduction signifies VEEP’s potential to contribute to greener video encoding.
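The sketch below illustrates the general two-step idea described above, not VEEP’s actual model: fit a regressor that maps content features to measured encoding energy, then multiply the predicted energy by a region-dependent carbon intensity to estimate CO2 emissions. The features, training data, and intensity values are synthetic placeholders.

```python
# Two-step sketch: (1) predict encoding energy from content features,
# (2) convert to CO2 via regional carbon intensity. All data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder training data: per-video complexity features -> measured kWh.
X = rng.random((500, 3))  # e.g. spatial/temporal complexity, target bitrate
y = 0.2 + 1.5 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.05, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print("R2 on held-out data:", r2_score(y_te, model.predict(X_te)))

# Illustrative carbon intensities (gCO2/kWh) per cloud region (assumed values).
CARBON_INTENSITY = {"eu-north-1": 30, "us-east-1": 380}

energy_kwh = model.predict(X_te[:1])[0]
for region, ci in CARBON_INTENSITY.items():
    print(f"{region}: ~{energy_kwh * ci:.1f} g CO2 for this encode")
```

Note that VEEP’s reported R² score of 0.96 refers to the project’s own model trained on AWS EC2 measurements; the toy regressor above only demonstrates the workflow of combining energy prediction with location-dependent carbon intensity.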

Conclusions

This column provided an overview of the GAIA project’s research on the environmental impact of video streaming, presented at the 15th ACM Multimedia Systems Conference. The GREEM measurement tool empowers developers and researchers to measure the energy consumption and CO2 emissions of video processing. VEED provides valuable insights into energy consumption and CO2 emissions during cloud-based video encoding on AWS EC2 instances. COCONUT sheds light on energy usage during video playback on various devices and with different players, aiding in optimizing client-side video streaming. Furthermore, VEEP, a machine learning framework, takes energy efficiency a step further: it allows users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding and to select cloud instances that minimize environmental impact. These studies can help researchers, developers, and service providers optimize video streaming for a more sustainable future. The focus on encoding and playback highlights the importance of a holistic approach considering the entire video streaming lifecycle. While these papers primarily focus on the environmental impact of video streaming, a strong connection exists between energy efficiency and QoE [8],[9],[10]. Optimizing video processing for lower energy consumption can sometimes lead to trade-offs in video quality. Future research directions could explore techniques for optimizing video processing while ensuring a consistently high QoE for viewers.

References

[1] A. Katsenou, J. Mao, and I. Mavromatis, “Energy-Rate-Quality Tradeoffs of State-of-the-Art Video Codecs.” arXiv, Oct. 02, 2022. Accessed: Oct. 06, 2022. [Online]. Available: http://arxiv.org/abs/2210.00618

[2] H. Amirpour, V. V. Menon, S. Afzal, R. Prodan, and C. Timmerer, “Optimizing video streaming for sustainability and quality: The role of preset selection in per-title encoding,” in 2023 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2023, pp. 1679–1684. Accessed: May 05, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10219577/

[3] S. Afzal, R. Prodan, C. Timmerer, “Green Video Streaming: Challenges and Opportunities – ACM SIGMM Records.” Accessed: May 05, 2024. [Online]. Available: https://records.sigmm.org/2023/01/08/green-video-streaming-challenges-and-opportunities/

[4] A. Atadoga, U.  J. Umoga, O. A. Lottu, and E. O. Sodiya, “Evaluating the impact of cloud computing on accounting firms: A review of efficiency, scalability, and data security,” Glob. J. Eng. Technol. Adv., vol. 18, no. 2, pp. 065–075, Feb. 2024, doi: 10.30574/gjeta.2024.18.2.0027.

[5] B. Zeng, Y. Zhou, X. Xu, and D. Cai, “Bi-level planning approach for incorporating the demand-side flexibility of cloud data centers under electricity-carbon markets,” Appl. Energy, vol. 357, p. 122406, Mar. 2024, doi: 10.1016/j.apenergy.2023.122406.

[6] M. Law, “Energy efficiency predictions for data centers in 2023.” Accessed: May 03, 2024. [Online]. Available: https://datacentremagazine.com/articles/efficiency-to-loom-large-for-data-centre-industry-in-2023

[7] C. Yue, S. Sen, B. Wang, Y. Qin, and F. Qian, “Energy considerations for ABR video streaming to smartphones: Measurements, models and insights,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 153–165, doi: 10.1145/3339825.3391867.

[8] G. Bingöl, A. Floris, S. Porcu, C. Timmerer, and L. Atzori, “Are Quality and Sustainability Reconcilable? A Subjective Study on Video QoE, Luminance and Resolution,” in 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2023, pp. 19–24. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10178513/

[9] G. Bingöl, S. Porcu, A. Floris, and L. Atzori, “An Analysis of the Trade-Off Between Sustainability and Quality of Experience for Video Streaming,” in 2023 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2023, pp. 1600–1605. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10283614/

[10] C. Herglotz, W. Robitza, A. Raake, T. Hossfeld, and A. Kaup, “Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming.” arXiv, May 24, 2023. doi: 10.48550/arXiv.2305.15117.

First edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) School


In February 2024, the first edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School was held. With the support of SIGMM, it attracted more than 50 students and young researchers to learn, discuss, and experiment first-hand with topics related to social robotics. The event’s success calls for further editions in upcoming years.

Rationale for SoRAIM

SPRING, a collaborative research project funded by the European Commission under Horizon 2020, is coming to an end in May 2024. Its scientific and technological objectives, to test a versatile social robotic platform within a hospital and have it perform social activities in a multi-person, dynamic setup, have for the most part been achieved. To empower the next generation of young researchers with concepts and tools to answer tomorrow’s challenges in the field of social robotics, one must tackle the issue of knowledge and know-how transmission. We therefore chose to organise a winter school, free of charge to the participants (thanks to the additional support of SIGMM), so that as many students and young researchers from various horizons (not only technical fields) as possible could attend.

Contents of the Winter School

The Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School took place from 19 to 23 February 2024 in Grenoble, France. On the first day, an introduction to the contents of the school and to the context provided by the SPRING project was given, followed by a demonstration combining social navigation and dialogue interaction. This triggered the participants’ curiosity and led to a spontaneous Q&A session with contributions, questions, and comments from the audience.

The school spanned the entire week, with 17 talks given by 8 speakers from the H2020 SPRING project and 9 invited speakers external to the project. The school also included a panel discussion on the topic “Are social robots already out there? Immediate challenges in real-world deployment”, a poster session with 15 contributions, and two hands-on sessions where the participants could choose among the following topics: robot navigation with Reinforcement Learning; ROS4HRI: how to represent and reason about humans with ROS; building a conversational system with LLMs using prompt engineering; robot self-localisation based on camera images; and speaker extraction from microphone recordings. A social activity (a visit of Grenoble’s downtown and the Bastille) was organised on Thursday afternoon, allowing participants to mingle with speakers and to discover the host town’s history.

One of the highlights of SoRAIM was its panel session on the topic “Are social robots already out there? Immediate challenges in real-world deployment”. Although no definitive answers were found, the session stressed that challenges remain numerous for the deployment of actual social robots in our everyday lives (at work, at home). On the technical side, robotic platforms are subject to hardware and software constraints: on the hardware side, sensors and actuators are restricted in size, power, and performance, since the physical space and the battery capacity are limited; on the software side, large models can only be used if substantial computing resources are permanently available, which is not always the case, since resources need to be shared between the various computing modules. Finally, on the regulatory and legal side, the rise of AI use is fast and needs to be balanced with ethical views that address our society’s needs, but the construction of proper laws and norms, and their acknowledgement and understanding by stakeholders, is slow. In this session, the panellists surveyed all aspects of the problems at hand and provided an overview of the challenges that future scientists will need to solve in order to take social robots out of the labs and into the world.

Attendance & future perspectives

SoRAIM attracted 57 participants over the whole week. The attendees were diverse, as initially aimed for, with a breakdown of 50% PhD students, 20% young researchers (public sector), 10% engineers and young researchers (private sector), and 20% MSc students. Of particular note, the ratio of women attendees was close to 40%, which is double the usual in this field. Finally, in terms of geographic spread, the majority of attendees came from European countries other than France (17 countries in total), with just below 50% coming from France. Following the school, a satisfaction survey was sent to the attendees in order to better grasp which elements were the most appreciated, in view of the longer-term objective of holding this winter school as a recurring event. Given the diverse background of attendees, opinions on contents such as the hands-on sessions varied, but overall satisfaction was very high, which shows the interest of the next generation of researchers in more opportunities to learn in this field. We are currently reviewing options to hold similar events every year or every two years, depending on available funding.

More information about the SoRAIM winter school is available on the webpage: https://spring-h2020.eu

Sponsors

SoRAIM was sponsored by the H2020 SPRING project, Inria, the University Grenoble Alpes, the Multidisciplinary Institute of Artificial Intelligence and by ACM’s Special Interest Group on Multimedia (SIGMM). Through ACM SIGMM, we received significant funding which allowed us to invite 14 students and young researchers, members of SIGMM, from abroad.

Full list of contributions

All the talks are available in replay on our YouTube channel: https://www.youtube.com/watch?v=ckJv0eKOgzY&list=PLwdkYSztYsLfWXWai6mppYBwLVjK0VA6y
The complete list of talks and posters presented at SoRAIM Winter School 2024 can be found here: https://spring-h2020.eu/soraim/
In the following, the list of talks in chronological order: