Support from ACM SIGMM made it possible to bring top researchers in the field of XR to deliver keynotes on topics related to technological factors (e.g., volumetric video, XR for healthcare) and user-centric factors (e.g., user experience evaluation, multisensory experiences), including Pablo César (CWI, Netherlands), Anthony Steed (UCL, UK), Manuela Chessa (University of Genoa, Italy), Qi Sun (New York University, USA), Diego Gutiérrez (Universidad de Zaragoza, Spain), and Marianna Obrist (UCL, UK). An industry session was also included, led by Gloria Touchard (Nokia, Spain).
In addition to these 7 keynotes, the program included a creative design session led by Elena Márquez-Segura (Universidad Carlos III, Spain), a tutorial on Ubiq given by Sebastian Friston (UCL, UK), and 2 practical sessions led by Telmo Zarraonandia (Universidad Carlos III, Spain) and Sandra Malpica (Universidad de Zaragoza, Spain) to get hands-on experience with Unity for VR development.
Moreover, poster and demo sessions were spread across the whole duration of the school, in which participants could showcase their work.
The summer school was also sponsored by Nokia and the Computer Science and Engineering Department of Universidad Carlos III, which made it possible to offer grants covering registration and travel costs for a number of students.
Finally, in addition to science, there was time for fun with social activities, like practicing real and VR archery and visiting an immersive exhibition about Pompeii.
The 21st International Conference on Content-based Multimedia Indexing (CBMI) was hosted by Reykjavik University in cooperation with ACM, SIGMM, VTT and IEEE. The three-day event took place on September 20-22 in Reykjavik, Iceland. Like the year before, it was an exclusively in-person event. Despite the remote location, an active volcano and the in-person attendance requirement, we are pleased to report that we had perfect attendance of presenting authors. CBMI was started in France and still has strong European roots. Looking at the nationalities of the submitting authors, we count 17 unique nationalities: 14 countries in Europe, 2 in Asia and 1 in North America.
Conference highlights
Key elements of a successful conference are the keynote sessions. The first and opening keynote, titled “What does it mean to ‘work as intended’?”, was presented by Dr. Cynthia C. S. Liem on day 1. In this talk Cynthia raised important questions on how complex it can be to define, measure and evaluate human-focused systems. Using real-world examples, she demonstrated how recently developed systems that passed traditional evaluation metrics still failed when deployed in the real world. Her talk was an important reminder that certain weaknesses in human-focused systems are only revealed when exposed to reality.
Traditionally there are only two keynotes at CBMI, the first on day 1 and the second on day 2. However, our planned second keynote speaker could not attend until the last day, and thus a third “surprise” keynote was organized on day 2 with the title “Remote Sensing of Natural Hazards”. The speaker was Dr. Ingibjörg Jónsdóttir, an associate professor of geology at the University of Iceland. She gave a very interesting talk about the unique geology of Iceland, the threats posed by natural hazards, and her work using remote sensing to monitor both sea ice and volcanoes. The talk was well received by attendees, as it gave insight into the host country and the volcanic eruption that had ended just a week before the start of the conference (the 7th in the past 2 years on the Reykjanes Peninsula). The subject is also highly relevant to the community, as the analysis and prediction are based on multimodal data.
The planned second keynote took place in the last session on day 3 and was given by Dr. Hannes Högni Vilhjálmsson, professor at Reykjavik University. The talk, titled “Being Multimodal: What Building Virtual Humans has Taught us about Multimodality”, gave the audience a deep dive into lessons learned from his 20+ years of experience developing intelligent virtual agents with face-to-face communication skills. “I will review our attempts to capture, understand and analyze the multi-modal nature of human communication, and how we have built and evaluated systems that engage in and support such communication.” is a direct quote from the abstract of his talk.
CBMI is a relatively small, but growing, conference that is built on a strong legacy and has a highly motivated community behind it. The special sessions have long played an important role at CBMI and this year there were 8 special sessions accepted.
AIMHDA: Advances in AI-Driven Medical and Health Data Analysis
CB4AMAS: Content-based Indexing for audio and music: from analysis to synthesis
ExMA: Explainability in Multimedia Analysis
IVR4B: Interactive Video Retrieval for Beginners
MAS4DT: Multimedia analysis and simulations for Digital Twins in the construction domain
MmIXR: Multimedia Indexing for XR
MIDRA: Multimodal Insights for Disaster Risk Management and Applications
UHBER: Multimodal Data Analysis for Understanding of Human Behaviour, Emotions and their Reasons
The number of papers per session ranged from 2 to 8. The larger sessions (CB4AMAS, MmIXR and UHBER) used a discussion panel format that created a more inclusive atmosphere and, at times, sparked lively discussions.
Especially popular with attendees was the competition that took place in the Interactive Video Retrieval for Beginners (IVR4B) session. This session was hosted right after the poster session in the wide open space of Reykjavik University’s foyer.
Awards
The selection committee was unanimous in that the contribution of Lorenzo Bianchi, Fabio Carrara, Nicola Messina & Fabrizio Falchi, titled “Is CLIP the main roadblock for fine-grained open-world perception?”, was the best paper award winner. With the generous support of ACM SIGMM, they were awarded 500 Euros. As the best paper was indeed also a student paper, it was decided to also give the runner-up a 300 Euro award. The runner-up was the contribution of Recep Oguz Araz, Dmitry Bogdanov, Pablo Alonso-Jimenez and Frederic Font, titled “Evaluation of Deep Audio Representations for Semantic Sound Similarity”.
The best demonstration was awarded to Joshua David Springer, Gylfi Thor Gudmundsson and Marcel Kyas for “Lowering Barriers to Entry for Fully-Integrated Custom Payloads on a DJI Matrice”.
The top two systems in the IVR4B competition were also recognized: first place went to Nick Pantelidis, Maria Pegia, Damianos Galanopoulos, et al. for “VERGE: Simplifying Video Search for Novice”; and second place went to Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, et al. for “VISIONE 5.0: toward evaluation with novice users”.
Social events
The first day of the conference was quite eventful, as before the poster and IVR4B sessions Francois Pineau-Benois and Raphael Moraly of the Odyssée Quartet performed selected classical works in the “Music-meets-Science” cultural event. The goal of this event is to bring live classical music to the multimedia research community. The musicians played a concert and then held a discussion with researchers, particularly those working on music analysis and retrieval. Such exchanges between content creators and researchers in content analysis, indexing and retrieval have been a distinctive feature of CBMI since 2018. This event would not have been possible without the generous support of ACM SIGMM.
The second day was no less entertaining, as before the banquet attendees took a virtual flight over Iceland’s beautiful landscape via the services of FlyOver Iceland. The next edition, CBMI 2025, will be held in Dublin, organized by DCU.
MediaEval, the Multimedia Evaluation Benchmark, has offered a yearly set of multimedia challenges since 2010. MediaEval supports the development of algorithms and technologies for analyzing, exploring and accessing information in multimedia data. MediaEval aims to help make multimedia technology a force for good in society and for this reason focuses on tasks with a human or social aspect. Benchmarking contributes in two ways to advancing multimedia research. First, by offering standardized definitions of tasks and evaluation data sets, it makes it possible to fairly compare algorithms and, in this way, track progress. If we can understand which types of algorithms perform better, we can more easily find ways (and the motivation) to improve them. Second, benchmarking helps to direct the attention of the research community, for example, towards new tasks that are based on real-world needs, or towards known problems for which more research is necessary to reach a solution that is good enough for a real-world application scenario.
The 2023 MediaEval benchmarking season culminated with the yearly workshop, which was held in conjunction with MMM 2024 (https://www.mmm2024.org) in Amsterdam, Netherlands. It was a hybrid workshop, which also welcomed online participants. The workshop kicked off with a joint keynote with MMM 2024, given by Yiannis Kompatsiaris (Information Technologies Institute, CERTH) on Visual and Multimodal Disinformation Detection. The talk covered the implications of multimodal disinformation online and the challenges that must be faced in order to detect it. The workshop also featured an invited speaker, Adriënne Mendrik, CEO & Co-founder of Eyra, which supports benchmarks with the online Next platform. She talked about benchmark challenge design for science and how the Next platform is currently being used in the Social Sciences.
More information about the workshop can be found at https://multimediaeval.github.io/editions/2023/ and the proceedings were published at https://ceur-ws.org/Vol-3658/. In the rest of this article, we provide an overview of the highlights of the workshop as well as an outlook to the next edition of MediaEval in 2025.
Tasks at MediaEval
The MediaEval Workshop 2023 featured five tasks that focused on human and social aspects of multimedia analysis.
Three of the tasks required participants to combine or cross modalities, or even consider new modalities. The Musti: Multimodal Understanding of Smells in Texts and Images task challenged participants to detect and classify smell references in multilingual texts and images from the 17th to the 20th century. They needed to identify whether a text and an image evoked the same smell source, detect specific smell sources, and apply zero-shot learning for untrained languages. The other two of these tasks emphasized the social aspects of multimedia. In the NewsImages: Connecting Text and Images task, participants worked with a dataset of news articles and images, predicting which image accompanied a news article. This task aimed to explore cases in which the link between a text and an image goes beyond the text being a literal description of what is pictured in the image. The Predicting Video Memorability task required participants to predict how likely videos were to be remembered, both short- and long-term, and to use EEG data to predict whether specific individuals would remember a given video, combining visual features and neurological signals.
Two of the tasks focused on advancing video analysis so that it can support experts in carrying out their jobs. The SportsVideo: Fine-Grained Action Classification and Position Detection task aimed to develop technology that supports coaches. To address this task, participants analyzed videos of table tennis and swimming competitions, detecting athlete positions, identifying strokes, classifying actions, and recognizing game events such as scores and sounds. The Transparent Tracking of Spermatozoa task aimed to develop technology that supports medical professionals. Task participants were asked to track sperm cells in video recordings to evaluate male reproductive health. This involved localizing and tracking individual cells in real time, predicting their motility, and using bounding box data to assess sperm quality. The task emphasized both accuracy and processing efficiency, with subtasks involving graph data structures for motility prediction.
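As an illustration of how submissions to ranking-style tasks such as Predicting Video Memorability are typically scored, the sketch below computes the Spearman rank correlation between predicted and ground-truth memorability scores. The metric choice, the video identifiers and the score values are illustrative assumptions, not the task's official evaluation code or data.

```python
# Minimal sketch: scoring memorability-style predictions with Spearman's rho.
# Assumes scipy is installed; scores and video IDs below are made-up examples.
from scipy.stats import spearmanr

# Hypothetical ground-truth and predicted memorability scores per video.
ground_truth = {"vid_001": 0.91, "vid_002": 0.47, "vid_003": 0.78, "vid_004": 0.62}
predictions  = {"vid_001": 0.85, "vid_002": 0.55, "vid_003": 0.70, "vid_004": 0.60}

# Align the two score lists on a common video ordering before correlating.
ids = sorted(ground_truth)
rho, p_value = spearmanr([ground_truth[i] for i in ids],
                         [predictions[i] for i in ids])
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```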
Impressions of Student Participants
MediaEval is grateful to SIGMM for providing funding for three students who attended the MediaEval Workshop and greatly helped us with the organization of this edition: Iván Martín-Fernández and Sergio Esteban-Romero from Speech Technology and Machine Learning Group (GTHAU) – Universidad Politécnica de Madrid, and Xiaomeng Wang from Radboud University. Below the students provide their comments and impressions of the workshop.
“As a new PhD student, I greatly valued my experience attending MediaEval 2023. I participated as the main author and presented work from our group, GTHAU – Universidad Politécnica de Madrid, on the Predicting Video Memorability Challenge. The opportunity to meet renowned colleagues and senior researchers, and learn from their experiences, provided valuable insight into what academia looks like from the inside.
MediaEval offers a range of multimedia-related tasks, which may sometimes seem under the radar but are crucial in developing real-world applications. Moreover, the conference distinguishes itself by pushing the boundaries, going beyond just presenting results to foster a deeper understanding of the challenges being addressed. This makes it a truly enriching experience for both newcomers and seasoned professionals alike.
Having volunteered and contributed to organizational tasks, I also gained first-hand insight into the inner workings of an academic conference, a facet I found particularly rewarding. Overall, MediaEval 2023 proved to be an exceptional blend of scientific rigor, collaborative spirit, and practical insights, making it an event I would highly recommend for anyone in the multimedia community.”
Iván Martín-Fernández, PhD Student, GTHAU – Universidad Politécnica de Madrid
“Attending MediaEval was an invaluable experience that allowed me to connect with a wide range of researchers and engage in discussions about the latest advancements in Artificial Intelligence. Presenting my work on the Multimedia Understanding of Smells in Text and Images (MUSTI) challenge was particularly insightful, as the feedback I received sparked ideas for future research. Additionally, volunteering and assisting with organizational tasks gave me a behind-the-scenes perspective on the significant effort required to coordinate an event like MediaEval. Overall, this experience was highly enriching, and I look forward to participating and collaborating in future editions of the workshop.”
Sergio Esteban-Romero, PhD Student, GTHAU – Universidad Politécnica de Madrid
“I was glad to be a student volunteer at MediaEval 2024. Collaborating with other volunteers, we organized submission files and prepared the facilities. Everyone was exceptionally kind and supportive. In addition to volunteering, I also participated in the workshop as a paper author. I submitted a paper to the NewsImage task and delivered my first oral presentation. The atmosphere was highly academic, fostering insightful discussions. And I received valuable suggestions to improve my paper. I truly appreciate this experience, both as a volunteer and as a participant.”
Xiaomeng Wang, PhD Student, Data Science – Radboud University
Outlook to MediaEval 2025
We are happy to announce that in 2025 MediaEval will be hosted in Dublin, Ireland, co-located with CBMI 2025. The Call for Task Proposals is now open, and details regarding submitting proposals can be found here: https://multimediaeval.github.io/2024/09/24/call.html. The final deadline for submitting your task proposals is Wed. 22nd January 2025. We will publish the list of tasks offered in March and registration for participation in MediaEval 2025 will open in April 2025.
For this edition of MediaEval we will again emphasize our “Quest for Insight”: we push beyond improving evaluation scores to achieving deeper understanding about the challenges, including data and the strengths and weaknesses of particular types of approaches, with the larger aim of understanding and explaining the concepts that the tasks revolve around, promoting reproducible research, and fostering a positive impact on society. We look forward to welcoming you to participate in the new benchmarking year.
JPEG XE issues Call for Proposals on event-based vision representation
The 104th JPEG meeting was held in Sapporo, Japan, from July 15 to 19, 2024. During this meeting, a Call for Proposals on event-based vision representation was launched. The CfP addresses lossless coding and aims to provide the first standard representation for event-based data, ensuring interoperability between systems and devices.
Furthermore, the JPEG Committee pursued its work in various standardisation activities, particularly the development of new learning-based codecs and JPEG Trust.
The following summarises the main highlights of the 104th JPEG meeting.
JPEG XE
JPEG Trust
JPEG AI
JPEG Pleno Learning-based Point Cloud coding
JPEG Pleno Light Field
JPEG AIC
JPEG Systems
JPEG DNA
JPEG XS
JPEG XL
JPEG XE
The JPEG Committee continued its activity on JPEG XE and event-based vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE is about the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The JPEG Committee completed the Common Test Conditions (CTC) v2.0 document that provides the means to perform an evaluation of candidate technologies for efficient coding of events. The Common Test Conditions document also defines a canonical raw event format, a reference dataset, a set of key performance metrics and an evaluation methodology.
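For readers unfamiliar with event-based data, the sketch below shows one common way such raw events are laid out (a timestamp, pixel coordinates and a polarity bit) and how a simple compression-ratio figure, a typical key performance metric for lossless coding, could be computed. The record layout, field widths and use of zlib are illustrative assumptions, not the canonical format or metrics defined in the CTC document.

```python
# Minimal sketch of an event-camera record and a lossless compression-ratio metric.
# The 13-byte packed layout below is an illustrative assumption, not the JPEG XE format.
import struct
import zlib
from dataclasses import dataclass

@dataclass
class Event:
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # 1 = brightness increase, 0 = decrease

def pack_events(events):
    """Pack events into a naive fixed-size binary layout (8 + 2 + 2 + 1 bytes each)."""
    return b"".join(struct.pack("<QHHB", e.t_us, e.x, e.y, e.polarity) for e in events)

events = [Event(1_000 * i, i % 640, (i * 7) % 480, i % 2) for i in range(10_000)]
raw = pack_events(events)
coded = zlib.compress(raw, level=9)  # stand-in for a candidate lossless codec

# Compression ratio: raw size divided by coded size (higher is better).
print(f"raw: {len(raw)} B, coded: {len(coded)} B, ratio: {len(raw) / len(coded):.2f}")
```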
The JPEG Committee furthermore issued a Final Call for Proposals (CfP) on lossless coding for event-based data. This call marks an important milestone in the standardization process and the JPEG Committee is eager to receive proposals. The deadline for submission of proposals is set to March 31st of 2025. Standardization will start with lossless coding of events as this has the most imminent application urgency in industry. However, the JPEG Committee acknowledges that lossy coding of events is also a valuable feature, which will be addressed at a later stage.
Accompanying these two new public documents, a revised Use Cases and Requirements v2.0 document was also released to provide a formal definition for lossless coding of events that is used in the CTC and the CfP.
All documents are publicly available on jpeg.org. The Ad-hoc Group on event-based vision was re-established to continue work towards the 105th JPEG meeting. To stay informed about this activity please join the event-based vision Ad-hoc Group mailing list.
JPEG Trust
JPEG Trust provides a comprehensive framework for individuals, organizations, and governing institutions interested in establishing an environment of trust for the media that they use, and supports trust in the media they share. At the 104th meeting, the JPEG Committee produced an updated version of the Use Cases and Requirements for JPEG Trust (v3.0). This document integrates additional use cases and requirements related to authorship, ownership, and rights declaration. The JPEG Committee also requested a new Part to JPEG Trust, entitled “Media asset watermarking”. This new Part will define the use of watermarking as one of the available components of the JPEG Trust framework to support usage scenarios for content authenticity, provenance, integrity, labeling, and binding between JPEG Trust metadata and corresponding media assets. This work will focus on various types of watermarking, including explicit or visible watermarking, invisible watermarking, and implicit watermarking of the media assets with relevant metadata.
JPEG AI
At the 104th meeting, the JPEG Committee reviewed recent integration efforts, following the adoption of the changes at the previous meeting and the creation of a new version of the JPEG AI verification model. This version reflects the JPEG AI DIS text and was thoroughly evaluated for performance and functionalities, including bitrate matching, 4:2:0 coding, region-adaptive quantization maps, and other key features. JPEG AI supports a multi-branch coding architecture with two encoders and three decoders, allowing for six compatible combinations that have been jointly trained. The compression efficiency improvements range from 12% to 27% over the VVC Intra coding anchor, with decoding complexities between 8 and 215 kMAC/px.
The meeting also focused on Part 2: Profiles and Levels, which is moving to Committee Draft consultation. Two main concepts have been established: 1) the stream profile, defining a specific subset of the code stream syntax along with permissible parameter values, and 2) the decoder profile, specifying a subset of the full JPEG AI decoder toolset required to obtain the decoded image. Additionally, Part 3: Reference Software and Part 5: File Format will also proceed to Committee Draft consultation. Part 4 is significant as it sets the conformance points for JPEG AI compliance, and some preliminary experiments have been conducted in this area.
JPEG Pleno Learning-based Point Cloud coding
Learning-based solutions are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g., image classification, face recognition and object segmentation, but also 3D processing tasks, e.g., visual enhancement and super-resolution. Learning-based point cloud coding solutions have demonstrated the ability to achieve compression efficiency competitive with available conventional point cloud coding solutions at equivalent subjective quality. At the 104th meeting, the JPEG Committee initiated balloting for the Draft International Standard (DIS) of ISO/IEC 21794 Information technology — Plenoptic image coding system (JPEG Pleno) — Part 6: Learning-based point cloud coding. This activity is on track for the publication of an International Standard in January 2025. The 104th meeting also began an exploration into advanced point cloud coding functionality, in particular the potential for progressive decoding of point clouds.
JPEG Pleno Light Field
The JPEG Pleno Light Field effort has an ongoing standardization activity concerning a novel light field coding architecture that delivers a single coding mode to efficiently code light fields spanning from narrow to wide baselines. This novel coding mode is agnostic to depth information, resulting in a significant improvement in compression efficiency. The first version of the Working Draft of the JPEG Pleno Part 2: Light Field Coding second edition (ISO/IEC 21794-2 2ED), including this novel coding mode, was issued during the 104th JPEG meeting in Sapporo, Japan.
The JPEG Pleno Model (JPLM) provides reference implementations for the standardized technologies within the JPEG Pleno framework, including JPEG Pleno Part 2 (ISO/IEC 21794-2). Improvements to the JPLM have been implemented and tested, including the design of a more user-friendly platform.
The JPEG Pleno Light Field effort is also preparing standardization activities in the domains of objective and subjective quality assessment for light fields, aiming to address other plenoptic modalities in the future. During the 104th JPEG meeting in Sapporo, Japan, collaborative subjective experiments exploring various aspects of subjective light field quality assessment were presented and discussed. The outcomes of these experiments will guide decisions during the subjective quality assessment standardization process, which has issued its third Working Draft. A new version of a specialized tool for subjective quality evaluation, which supports these experiments, has also been released.
JPEG AIC
At its 104th meeting, the JPEG Committee reviewed results from previous Core Experiments that collected subjective data for fine-grained quality assessments of compressed images ranging from high to near-lossless visual quality. These crowdsourcing experiments used triplet comparisons with and without boosted distortions, as well as double stimulus ratings on a visual analog scale. Analysis revealed that boosting increased the precision of reconstructed scale values by nearly a factor of two. Consequently, the JPEG Committee has decided to use triplet comparisons in the upcoming AIC-3.
The JPEG Committee also discussed JPEG AIC Part 4, which focuses on objective image quality assessments for compressed images in the high to near-lossless quality range. This includes developing methods to evaluate the performance of such objective image quality metrics. A draft call for contributions is planned for January 2025.
JPEG Systems
At the 104th meeting Part 10 of JPEG Systems (ISO/IEC 19566-10), the JPEG Systems Reference Software, reached the IS stage. This first version of the reference software provides a reference implementation and reference dataset for the JPEG Universal Metadata Box Format (JUMBF, ISO/IEC 19566-5). Meanwhile, work is in progress to extend the reference software implementations of additional Parts, including JPEG Privacy and Security and JPEG 360.
JPEG DNA
JPEG DNA is an initiative aimed at developing a standard capable of representing bi-level, continuous-tone grey-scale, continuous-tone colour, or multichannel digital samples in a format using nucleotide sequences to support DNA storage. A Call for Proposals was published at the 99th JPEG meeting. Based on the performance assessments and descriptive analyses of the submitted solutions, the JPEG DNA Verification Model was created during the 102nd JPEG meeting. Several core experiments were conducted to validate this Verification Model, leading to the creation of the first Working Draft of JPEG DNA during the 103rd JPEG meeting.
The next phase of this work involves newly defined core experiments to enhance the rate-distortion performance of the Verification Model and its robustness to insertion, deletion, and substitution errors. Additionally, core experiments to test robustness against substitution and indel noise are being conducted. A core experiment was also performed to integrate JPEG AI into the JPEG DNA VM, and quality comparisons have been carried out. A study on the visual quality assessment of JPEG AI as an alternative to JPEG XL in the VM will also be carried out.
In parallel, efforts are underway to improve the noise simulator developed at the 102nd JPEG meeting, enabling a more realistic assessment of the Verification Model’s resilience to noise. There is also ongoing exploration of the performance of different clustering and consensus algorithms to further enhance the VM’s capabilities.
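To give a flavor of what representing digital samples as nucleotide sequences involves, the sketch below uses the simplest possible mapping of two bits per nucleotide. Real DNA-storage codecs, including the JPEG DNA Verification Model, are considerably more sophisticated, for example constraining homopolymer runs and GC content and adding error correction to cope with synthesis and sequencing noise; this toy mapping is only an illustration of the basic idea.

```python
# Toy illustration: mapping bytes to a nucleotide string at 2 bits per base.
# Real DNA-storage codecs add constraints (homopolymers, GC balance) and error correction.
BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T (arbitrary illustrative mapping)

def bytes_to_dna(data: bytes) -> str:
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):          # walk the four 2-bit groups, MSB first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):         # four bases per byte
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

payload = b"JPEG"
strand = bytes_to_dna(payload)
print(strand)                                # 16-base strand encoding the 4 input bytes
assert dna_to_bytes(strand) == payload       # round trip is lossless
```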
JPEG XS
The core parts of JPEG XS 3rd edition were prepared for immediate publication as International Standards. This means that Part 1 of the standard – Core coding tools, Part 2 – Profiles and buffer models, and Part 3 – Transport and container formats, will be available before the end of 2024. Part 4 – Conformance testing is currently still under DIS ballot and it will be finalized in October 2024. At the 104th meeting, the JPEG Committee continued the work on Part 5 – Reference software. This part is currently at Committee Draft stage and the DIS is planned for October 2024. The reference software has a feature-complete decoder that is fully compliant with the 3rd edition. Work on the encoder is ongoing.
Finally, additional experimental results were presented on how JPEG XS can be used over 5G mobile networks for the wireless transmission of low-latency, high-quality 6K/8K 360-degree views with mobile devices and VR headsets. This work will be continued.
JPEG XL
Objective metrics results for HDR images were investigated (using among others the ColorVideoVDP metric), indicating very promising compression performance of JPEG XL compared to other codecs like AVIF and JPEG 2000. Both the libjxl reference software encoder and a simulated candidate hardware encoder were tested. Subjective experiments for HDR images are planned.
The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) are now ready for publication. The second edition of JPEG XL Part 3 (Conformance testing) has moved to the FDIS stage.
Final Quote
“The JPEG Committee has reached a new milestone by releasing a new Call for Proposals to code events. This call is aimed at creating the first International Standard to efficiently represent events, enabling interoperability between devices and systems that rely on event sensing.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
JPEG AI reaches Draft International Standard stage
The 103rd JPEG meeting was held online from April 8 to 12, 2024. During this meeting, the first learning-based standard, JPEG AI, reached the Draft International Standard (DIS) stage and was sent for balloting after a very successful development stage that led to performance improvements above 25% against its best-performing anchor, VVC. This high performance, combined with implementations on current mobile phones and the possibility of using the latent representation in image processing applications, opens new opportunities and will certainly launch a new era of compression technology.
The following are the main highlights of the 103rd JPEG meeting:
JPEG AI reaches Draft International Standard;
JPEG Trust integrates JPEG NFT;
JPEG Pleno Learning-based Point Cloud coding releases a Draft International Standard;
JPEG Pleno Light Field works in a new compression model;
JPEG AIC analyses different subjective evaluation models for near visually lossless quality evaluation;
JPEG XE prepares a Call for Proposals on event-based coding;
JPEG DNA proceeds with the development of a standard for image compression using nucleotide sequences for supporting DNA storage;
JPEG XS 3rd edition;
JPEG XL analyses HDR coding.
The following sections summarise the main highlights of the 103rd JPEG meeting.
JPEG AI reaches Draft International Standard
At its 103rd meeting, the JPEG Committee produced the Draft International Standard (DIS) of JPEG AI Part 1 Core Coding Engine, which is expected to be published as an International Standard in October 2024. JPEG AI offers a coding solution for standard reconstruction with significant improvements in compression efficiency over previous image coding standards at equivalent subjective quality. The JPEG AI coding design allows for efficient hardware and software implementations of encoding and decoding in terms of memory and computational complexity, efficient coding of images with text and graphics, support for 8- and 10-bit depth, region-of-interest coding, and progressive coding. To cover multiple encoder and decoder complexity-efficiency tradeoffs, JPEG AI supports a multi-branch coding architecture with two encoders and three decoders (6 possible compatible combinations) that have been jointly trained. Compression efficiency (BD-rate) gains of 12.5% to 27.9% over the VVC Intra coding anchor, for relevant encoder and decoder configurations, can be achieved with a wide range of complexity tradeoffs (7 to 216 kMAC/px at the decoder side).
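The BD-rate figures quoted above follow the usual Bjøntegaard-delta methodology: fit rate-distortion curves for anchor and test codec, then average the rate difference over the overlapping quality range. The sketch below is a minimal, simplified version of that computation with made-up rate-distortion points; it is not the JPEG AI common-test-conditions tooling.

```python
# Minimal Bjøntegaard-delta rate (BD-rate) sketch with illustrative R-D points.
# Negative BD-rate means the test codec needs fewer bits than the anchor at equal quality.
import numpy as np

def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):
    # Fit cubic polynomials of log-rate as a function of quality (PSNR).
    p_anchor = np.polyfit(anchor_psnr, np.log10(anchor_rate), 3)
    p_test = np.polyfit(test_psnr, np.log10(test_rate), 3)
    lo = max(min(anchor_psnr), min(test_psnr))   # overlapping PSNR interval
    hi = min(max(anchor_psnr), max(test_psnr))
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_anchor) / (hi - lo)
    return (10 ** avg_diff - 1) * 100            # percent rate change at equal quality

# Made-up rate (bpp) / PSNR (dB) points for an anchor and a test codec.
anchor = ([0.25, 0.50, 1.00, 2.00], [32.0, 35.0, 38.0, 41.0])
test   = ([0.20, 0.42, 0.85, 1.70], [32.1, 35.2, 38.1, 41.2])
print(f"BD-rate: {bd_rate(anchor[0], anchor[1], test[0], test[1]):.1f} %")
```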
The work regarding JPEG AI profiles and levels (Part 2), reference software (Part 3) and conformance (Part 4) has started, and a request for sub-division was issued at this meeting to establish a new part on the file format (Part 5). At this meeting, most of the work focused on the JPEG AI high-level syntax and the improvement of several normative and non-normative tools, such as hyper-decoder activations, the training dataset, progressive decoding, the training methodology and enhancement filters. There are now two smartphone implementations of JPEG AI available. At this meeting, a JPEG AI demo was shown running on a Huawei Mate50 Pro with a Qualcomm Snapdragon 8+ Gen1, featuring high-resolution (4K) image decoding, tiling, full base operating point support and arbitrary image resolution decoding.
JPEG Trust
At the 103rd meeting, the JPEG Committee produced an updated version of the Use Cases and Requirements for JPEG Trust (v2.0). This document integrates the use cases and requirements of the JPEG NFT exploration with the use cases and requirements of JPEG Trust. In addition, a new document with Terms and Definitions for JPEG Trust (v1.0) was published which incorporates all terms and concepts as they are used in the context of the JPEG Trust activities. Finally, an updated version of the JPEG Trust White Paper v1.1 has been released. These documents are publicly available on the JPEG Trust/Documentation page.
JPEG Pleno Learning-based Point Cloud coding
The JPEG Committee continued its activity on Learning-based Point Cloud Coding under the JPEG Pleno family of standards. During the 103rd JPEG meeting, comments on the Committee Draft of ISO/IEC 21794 Part 6: “Learning-based point cloud coding” were received, and the activity is on track for the release of a Draft International Standard for balloting at the 104th JPEG meeting in Sapporo, Japan in July 2024. A new version of the Verification Model (Version 4.1) was released during the 103rd JPEG meeting, containing an updated entropy coding module. In addition, version 2.1 of the Common Training and Test Conditions was released as a public document.
JPEG Pleno Light Field
The JPEG Pleno Light Field activity progressed at this meeting with a number of technical submissions for improvements to the JPEG Pleno Model (JPLM). The JPLM provides reference implementations for the standardized technologies within the JPEG Pleno framework. The JPEG Pleno Light Field activity also includes ongoing standardization work concerning a novel light field coding architecture that delivers a single coding mode to efficiently code all types of light fields. This novel coding mode does not need any depth information, resulting in a significant improvement in compression efficiency.
The JPEG Pleno Light Field activity is also preparing standardization activities in the domains of objective and subjective quality assessment for light fields, aiming to address other plenoptic modalities in the future. During the meeting, important decisions were made regarding the execution of multiple collaborative subjective experiments exploring various aspects of subjective light field quality assessment. Additionally, a specialized tool for subjective quality evaluation has been developed to support these experiments. The outcomes of these experiments will guide decisions during the subjective quality assessment standardization process. They will also be used to evaluate proposals in the upcoming objective quality assessment standardization activities.
JPEG AIC
During the 103rd JPEG meeting, the work on visual image quality assessment continued with a focus on JPEG AIC-3, targeting a standard for a subjective quality assessment methodology for images in the range from high to nearly visually lossless quality. The activity is currently investigating three kinds of subjective image quality assessment methodologies, notably the Boosted Triplet Comparison (BTC), the In-place Double Stimulus Quality Scale (IDSQS), and the In-place Plain Triplet Comparison (IPTC), as well as a unified framework capable of merging the results of two among them.
The JPEG Committee also worked on the preparation of Part 4 of the standard (JPEG AIC-4) by initiating work on the Draft Call for Proposals on Objective Image Quality Assessment. The Final Call for Proposals on Objective Image Quality Assessment is planned to be released in January 2025, while the submission of proposals is planned for April 2025.
JPEG XE
The JPEG Committee continued its activity on JPEG XE and event-based vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE is about the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The JPEG Committee finished the Common Test Conditions v1.0 document that provides the means to perform an evaluation of candidate technologies for efficient coding of event sequences. The Common Test Conditions define a canonical raw event format, a reference dataset, a set of key performance metrics and an evaluation methodology. In addition, the JPEG Committee also finalized the Draft Call for Proposals on lossless coding for event-based data. This call will be finalized at the next JPEG meeting in July 2024. Both the Common Test Conditions v1.0 and the Draft Call for Proposals are publicly available on jpeg.org. Standardization will start with lossless coding of event sequences as this has the most imminent application urgency in industry. However, the JPEG Committee acknowledges that lossy coding of event sequences is also a valuable feature, which will be addressed at a later stage. The Ad-hoc Group on Event-based Vision was reestablished to continue the work towards the 104th JPEG meeting. To stay informed about the activities please join the event-based imaging Ad-hoc Group mailing list.
JPEG DNA
JPEG DNA is an exploration aiming at developing a standard that provides technical solutions capable of representing bi-level, continuous-tone grey-scale, continuous-tone colour, or multichannel digital samples in a format representing nucleotide sequences for supporting DNA storage. A Call for Proposals was published at the 99th JPEG meeting and, based on a performance assessment and a descriptive analysis of the solutions that had been submitted, the JPEG DNA Verification Model was created during the 102nd JPEG meeting. A number of core experiments were conducted to validate the Verification Model, and notably, the first Working Draft of JPEG DNA was produced during the 103rd JPEG meeting. Work towards the creation of the specification will start with newly defined core experiments to improve the rate-distortion performance of the Verification Model and its robustness to insertion, deletion, and substitution errors. In parallel, efforts are underway to improve the noise simulator produced at the 102nd JPEG meeting, to allow assessing the resilience of the Verification Model to noise under more realistic conditions, and to explore learning-based coding solutions.
JPEG XS
The JPEG Committee is happy to announce that the core parts of JPEG XS 3rd edition are ready for publication as International Standards. The Final Draft International Standard for Part 1 of the standard – Core coding tools – is ready, and Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – are both being prepared by ISO for immediate publication. At this meeting, the JPEG Committee continued the work on Part 4 – Conformance testing, to provide the necessary test streams and test protocols to implementers of the 3rd edition. Consultation of the Committee Draft for Part 4 took place and a DIS version was issued. The development of the reference software, contained in Part 5, continued and the reference decoder is now feature-complete and fully compliant with the 3rd edition. A Committee Draft for Part 5 was issued at this meeting. Development of a fully compliant reference encoder is scheduled to be completed by July.
Finally, new experimental results were presented on how to use JPEG XS over 5G mobile networks for the wireless transmission of low-latency, high-quality 4K/8K 360-degree views with mobile devices and VR headsets. More experiments will be conducted, but the first results show that JPEG XS is capable of providing an immersive and excellent quality of experience in VR use cases, mainly thanks to its native low-latency and low-complexity properties.
JPEG XL
The performance of JPEG XL on HDR images was investigated and the experiments will continue. Work on a hardware implementation continues, and further improvements are made to the libjxl reference software. The second editions of Parts 1 and 2 are in the final stages of the ISO process and will be published soon.
Final Quote
“The JPEG AI Draft International Standard is yet another important milestone in an age where AI is rapidly replacing previous technologies. With this achievement, the JPEG Committee has demonstrated its ability to reinvent itself and adapt to new technological paradigms, offering standardized solutions based on the latest state-of-the-art technologies.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
The 147th MPEG meeting was held in Sapporo, Japan from 15-19 July 2024, and the official press release can be found here. It comprises the following highlights:
ISO Base Media File Format*: The 8th edition was promoted to Final Draft International Standard, supporting seamless media presentation for DASH and CMAF.
Syntactic Description Language: Finalized as an independent standard for MPEG-4 syntax.
Low-Overhead Image File Format*: First milestone achieved for small image handling improvements.
Neural Network Compression*: Second edition for conformance and reference software promoted.
Internet of Media Things (IoMT): Progress made on reference software for distributed media tasks.
* … covered in this column and expanded with possible research aspects.
8th edition of ISO Base Media File Format
The ever-growing expansion of the application area of the ISO/IEC 14496-12 ISO base media file format (ISOBMFF) has continuously brought new technologies to the standard. During the last couple of years, MPEG Systems (WG 3) has received new technologies on ISOBMFF for more seamless support of ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH) and ISO/IEC 23000-19 Common Media Application Format (CMAF), leading to the development of the 8th edition of ISO/IEC 14496-12.
The new edition of the standard includes new technologies to explicitly indicate the set of tracks representing various versions of the media presentation of the same media, for seamless switching and continuous presentation. Such technologies will enable more efficient processing of ISOBMFF-formatted files for DASH manifests or CMAF Fragments.
Research aspects: The central research aspect of the 8th edition of ISOBMFF, which “will enable more efficient processing,” will undoubtedly be its evaluation compared to the state-of-the-art. Standards typically define a format, but how to use it is left open to implementers. Therefore, the implementation is a crucial aspect and will allow for a comparison of performance. One such implementation of ISOBMFF is GPAC, which most likely will be among the first to implement these new features.
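As a small, concrete starting point for such hands-on evaluation, the following sketch walks the top-level box structure of an ISOBMFF file. It relies only on the generic box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit extended size and size 0 meaning the box runs to the end of the file); it does not implement any of the new 8th-edition track-set signalling, and the file name is a placeholder.

```python
# Minimal sketch: listing top-level ISOBMFF boxes (size + fourcc header per box).
# Handles the 64-bit "largesize" case; does not parse any box payloads.
import struct

def list_top_level_boxes(path):
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:                       # 64-bit extended size follows the type
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            boxes.append((box_type.decode("ascii", "replace"), size))
            if size == 0:                       # box extends to the end of the file
                break
            f.seek(size - header_len, 1)        # skip the payload to the next box
    return boxes

# Example usage: print the top-level box types of a local MP4 file (path is hypothetical).
for fourcc, size in list_top_level_boxes("example.mp4"):
    print(f"{fourcc}: {size} bytes")
```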
Low-Overhead Image File Format
The ISO/IEC 23008-12 image format specification defines generic structures for storing image items and sequences based on the ISO/IEC 14496-12 ISO base media file format (ISOBMFF). As it allows the use of various high-performance video compression standards for a single image or a series of images, it was quickly adopted by the market. However, it was challenging to use for very small images such as icons or emojis. While the initial design of the standard was versatile and useful for a wide range of applications, the size of the headers becomes an overhead for applications with tiny images. Thus, Amendment 3 of ISO/IEC 23008-12, the low-overhead image file format, aims to address this use case by adding a new compact box for storing metadata instead of the ‘Meta’ box, lowering the overhead.
Research aspects: The issue regarding header sizes of ISOBMFF for small files or low bitrate (in the case of video streaming) was known for some time. Therefore, amendments in these directions are appreciated while further performance evaluations are needed to confirm design choices made at this initial step of standardization.
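A back-of-envelope illustration of the problem the amendment addresses: for a tiny icon, even modest container metadata dominates the file size, while for a full photo it is negligible. The byte counts in the sketch below are illustrative assumptions, not measurements of actual HEIF or low-overhead files.

```python
# Back-of-envelope sketch: container overhead as a fraction of total file size.
# All byte counts are illustrative assumptions, not measured HEIF numbers.
def overhead_fraction(metadata_bytes: int, payload_bytes: int) -> float:
    return metadata_bytes / (metadata_bytes + payload_bytes)

icon_payload = 600        # hypothetical compressed size of a 32x32 emoji/icon
photo_payload = 900_000   # hypothetical compressed size of a full photo
meta_box = 1_200          # hypothetical 'meta'-style header structures
compact_box = 150         # hypothetical compact replacement box

for name, payload in [("icon", icon_payload), ("photo", photo_payload)]:
    for box_name, box in [("full metadata", meta_box), ("compact box", compact_box)]:
        frac = overhead_fraction(box, payload)
        print(f"{name:5s} with {box_name:13s}: {frac:6.1%} of the file is overhead")
```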
Neural Network Compression
An increasing number of artificial intelligence applications based on artificial neural networks, such as edge-based multimedia content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). For this purpose, MPEG developed a second edition of the standard for coding of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2024), adding syntax for differential coding of neural network parameters as well as new coding tools. For several architectures, trained models can be compressed to 10-20% of their original size, and even below 3%, without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network.
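The idea behind the differential coding mentioned above can be illustrated compactly: transmit only the (quantized) difference between the updated and the base parameters, and entropy-code that difference, which is typically small and low-entropy after fine-tuning. The sketch below uses uniform quantization and zlib as stand-ins for the NNC coding tools; it illustrates the principle only and does not follow the ISO/IEC 15938-17 syntax.

```python
# Minimal sketch of differential coding of a neural-network update:
# quantize (updated - base) and entropy-code it with a generic compressor.
# zlib and uniform quantization stand in for the actual NNC coding tools.
import zlib
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(1_000_000).astype(np.float32)                   # base model params
update = base + 0.001 * rng.standard_normal(base.size).astype(np.float32)  # fine-tuned params

step = 0.0005                                                  # uniform quantization step
residual_q = np.round((update - base) / step).astype(np.int8)  # quantized parameter delta

coded = zlib.compress(residual_q.tobytes(), level=9)
print(f"base model:   {base.nbytes / 1e6:.1f} MB")
print(f"coded update: {len(coded) / 1e6:.2f} MB "
      f"({100 * len(coded) / base.nbytes:.1f} % of base size)")

# Decoder side: reconstruct the updated parameters from base + dequantized delta.
decoded = np.frombuffer(zlib.decompress(coded), dtype=np.int8).astype(np.float32)
reconstructed = base + decoded * step
assert float(np.max(np.abs(reconstructed - update))) <= step / 2 + 1e-6
```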
In order to facilitate the implementation of the standard, the accompanying standard ISO/IEC 15938-18 has been updated to cover the second edition of ISO/IEC 15938-17. This standard provides a reference software for encoding and decoding NNC bitstreams, as well as a set of conformance guidelines and reference bitstreams for testing of decoder implementations. The software covers the functionalities of both editions of the standard, and can be configured to test different combinations of coding tools specified by the standard.
Research aspects: The reference software for NNC, together with the reference software for audio/video codecs, are vital tools for building complex multimedia systems and for their (baseline) evaluation with respect to compression efficiency only (not speed). This is because reference software is usually designed for functionality (i.e., compression in this case) and not performance.
The 148th MPEG meeting will be held in Kemer, Türkiye, from November 04-08, 2024. Click here for more information about MPEG meetings and their developments.
ACM SIGMM co-sponsored the second edition of the Spring School on Social XR, organized by the Distributed and Interactive Systems group (DIS) at CWI in Amsterdam. The event took place on March 4th – 8th 2024 and attracted 30 students from different disciplines (technology, social sciences, and humanities). The program included 22 lectures, 6 of them open, by 23 instructors. The event was organized by Irene Viola, Silvia Rossi, Thomas Röggla, and Pablo Cesar from CWI, and Omar Niamut from TNO. The event was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made student grants available and supported international speakers from under-represented countries, and by The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en).
“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organizing committee continued for a second edition with a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.
The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques for volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and the evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.
The spring school is part of the semester program organized by the DIS group of CWI. It was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture of Prof. Pablo Cesar, “Human-Centered Multimedia: Making Remote Togetherness Possible”. It was continued in 2023 with the 1st Spring School on Social XR.
On April 10th, 2024, during the SIGMM Advisory Board meeting, the Strike Team Leaders, Touradj Ebrahimi, Arnold Smeulders, Miriam Redi and Xavier Alameda Pineda (represented by Marco Bertini), reported the results of their activity. The results are summarized below in the form of recommendations, which should be read as guidelines and behavioral advice for our ongoing and future activities. SIGMM members in charge of SIGMM activities, SIGMM conference leaders, and particularly the organizers of the next ACMMM editions, are invited to adhere to these recommendations where they apply, implement the items marked as mandatory, and report to the SIGMM Advisory Board after the event.
All the SIGMM Strike Teams will remain in charge for two years starting January 1st, 2024 for reviews and updates.
The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.
SIGMM Strike Team on Industry Engagement
Team members: Touradj Ebrahimi (EPFL), Ali Begen (Ozyegin Univ.), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences). Coordinator: Touradj Ebrahimi.
The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed improving the presence of industry at ACMMM and other SIGMM conferences/workshops, launching new in-cooperation initiatives, and establishing stable bi-directional links.
Organization of industry-focused events
Suggested / Mandatory for ACMMM Organizers and SIGMM AB: Create industry-focused promotional materials like pamphlets/brochures for industry participation (sponsorship, exhibit, etc.) in the style of ICASSP 2024 and ICIP 2024
Suggested for ACMMM Organizers: invite Keynote Speakers from industry, possibly with financial support from SIGMM. Keynote talks should be similar to plenary talks but centered around specific application challenges.
Suggested for ACMMM Organizers: organize Special Sessions and Workshops around specific applications of interest to companies and startups. Sessions should be coordinated by industry, with possible support from an experienced and established scholar.
Suggested for ACMMM Organizers: organize Hands-on Sessions led by industry to receive feedback on future products and services.
Suggested for ACMMM Organizers: organize Panel Sessions led by industry and standardization committees on timely topics relevant to industry, e.g., how companies cope with AI.
Suggested for ACMMM Organizers: organize Tutorial sessions given by qualified people from industry and standardization committees at SIGMM-sponsored conferences/workshops
Suggested for ACMMM Organizers: promote contributions mainly from industry in the form of Industry Sessions to present companies and their products and services.
Suggested for ACMMM Organizers and SIGMM AB: promote Joint SIGMM / Standardization workshop on latest standards e.g. JPEG meets SIGMM, MPEG meets SIGMM, AOM meets SIGMM.
Suggested for ACMMM Organizers: organize Job Fairs like job interview speed dating during ACMMM
Initiatives for linkage
Mandatory for SIGMM Organizers and SIGMM AB: Create and maintain a mailing list of industrial targets, taking care of GDPR (Include a question in the registration form of SIGMM-sponsored conferences)
Suggested for SIGMM AB: organize monthly talks by industry leaders, either from large established companies, SMEs, or startups, sharing the technical/scientific challenges they face and their solutions.
Initiatives around reproducible results and benchmarking
Suggested for ACMMM Organizers and SIGMM AB: support the release of databases and studies on performance assessment procedures and metrics, possibly focused on specific applications.
Suggested for ACMMM Organizers: organize Grand Challenges initiated and sponsored by industry.
Strike Team on ACMMM Format
Team Members: Arnold Smeulders (Univ. of Amsterdam), Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Ralf Steinmetz (Univ. Darmstadt), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam), Jianfei Cai (Monash Univ.), Cathal Gurrin (Dublin City Univ.). Coordinator: Arnold Smeulders
The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to Conference identity, Conference budget and Conference memory.
1. Intended audience. It is generally felt that ACMMM is under pressure from neighboring conferences growing very big. There is consensus that growing big should not be the purpose of ACMMM: a size of 750 – 1500 participants was thought to be ideal, including being attractive to industry. Growth should come naturally.
Suggested for ACMMM Organizers and SIGMM AB: Promote distant travel by lowering fees for those who travel far.
Suggested for ACMMM Organizers: Include (a personalized) visa invitation in the call for papers.
2. Community feel, differentiation and interdisciplinarity. Identity is not an actionable concern, but one of the shared common goods is T-shaped individuals who are interested in neighboring disciplines and make interdisciplinary or multidisciplinary connections. It is desirable to differentiate submitted papers from those of major nearby conferences like CVPR. This point is already implemented in the call for papers of ACMMM 2024.
Mandatory for ACMMM Organizers: Ask in the submission how the paper fits in the multimedia community and its scientific tradition as illustrated by citations. Consider this information in the explicit review criteria.
Recommended for ACMMM Organizers: Support the physical presence of participants by rebalancing fees.
Suggested for ACMMM Organizers and SIGMM AB: Organize a session around the SIGMM test of time award, make selection early, funded by SIGMM.
Suggested for ACMMM Organizers: Organize moderated discussion sessions for papers on the same theme.
3. Brave New Ideas. Brave New Ideas fits very well with the intended audience. It is essential that we are able to draw out brave and new ideas from our community for long-term growth and vibrancy. The emphasis in reviewing Brave New Ideas should be on novelty, even if the work is not perfect. Rotate over a pool of people to prevent lock-in.
Suggested / Mandatory for ACMMM Organizers: Include in the submission a 3-minute pitch video to archive in the ACM digital library.
Suggested / Mandatory for ACMMM Organizers: Select reviewers from a pool of senior people to review novelty.
Suggested for ACMMM Organizers: Start with one session of 4 papers, if successful, add another session later.
4. Application. There should be no exclusive support for one specific application area in the main conference. Instead, application areas should be the focus of special sessions or workshops.
Suggested for ACMMM Organizers: Focus on application-related workshops or special sessions with their own reviewing.
5. Presentation. When the core business of ACM MM is inter- and multi-disciplinarity, it is natural to make the presentation for a broader audience part of the selection. ACM should make the short videos accessible as a service to science or the general public. TED-like videos for a paper fit naturally with ACMMM and with the trend on YouTube of communicating your paper. If this is too much work, the SIGMM AB should financially support the reviewing of the videos.
Mandatory for ACMMM Organizers: Include a TED-like 3-minute pitch video as part of the submission, archived by the ACM Digital Library as part of the conference proceedings; the video is to be submitted a week after the paper deadline for review, so there is time to prepare it after the regular paper submission.
6. Promote open access. For a data-driven and fair comparison, promote open access to data so it can be used for comparison at the next conference.
Suggested for SIGMM AB: Open access for data encouraged.
7. Keynotes. For the intended audience and interdisciplinarity, it is felt essential to have keynotes on the key topics of the moment. Keynotes should not focus on a single topic but maintain the diversity of topics within the conference and over the years, so as to be sure new ideas are inserted into the community.
Suggested for SIGMM AB: Directly fund a big-name, expensive, marquee keynote speaker, sponsored by SIGMM, for one of the societally urgent keynote topics as evident from the news.
8. Diversity over subdisciplines, etc. Make an extra effort for Arts, GenAI use models, security, HCI, and demos. If the submitted papers on a subtopic are of sufficiently high quality, there should be at least one session on that subtopic in the conference. The conference should not be overwhelmed by a popular topic with easy review criteria and generally much higher review scores.
Suggested for ACMMM Organizers: Promote the diversity of all relevant topics in the call for papers and through the action of an ambassador in subcommunities. SIGMM will supervise the diversity.
9. Living report. To enhance institutional memory, maintain a living document with suggestions, passed on from organizer to organizer. The owner of the document is the commissioner for conferences of SIGMM.
Mandatory for ACMMM Organizers and SIGMM AB: A short report to the SIGMM commissioner for conferences from the ACMMM chair, including a few recommendations for the next time; handed over to the next conference after the end of the current conference.
SIGMM Strike Team on Harmonization and Spread
Team members: Miriam Redi (Wikimedia Foundation), Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ.), Pablo Cesar (CWI and TU Delft). Coordinator: Miriam Redi
The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed distinct items aimed at giving the SIGMM Records and Social Media a more central role in SIGMM, and at integrating the SIGMM Records and Social Media in the whole process of the ACMMM organization from its initial planning.
1. SIGMM Website. The SIGMM website is not up to date and needs a serious overhaul.
Mandatory for SIGMM AB: restart the website from scratch, taking inspiration from other SIGs, e.g., reaching out to people at CHI to understand what can be done. The budget should be provided by SIGMM.
2. SIGMM Social Media Channels. SIGMM social media accounts (Twitter and LinkedIn) are managed by the Social Media Team of the SIGMM Records.
Suggested for SIGMM AB: continue this organization, expanding the responsibilities of the team to include conferences and other events.
3. Conference Social Media. The social media presence of conferences is managed by the individual conferences. It is not uniform and is disconnected from the SIGMM social media and the Records. The social media presence of the ACMMM flagship conference is weak and needs help. Creating continuity in terms of strategy and processes across conference editions is key.
Mandatory for ACMMM Organizers and SIGMM AB: create a Handbook of conference communications: a set of guidelines about how to create continuity across conference editions in terms of communications, and how to connect the SIGMM Records to the rest of the community.
Suggested for ACMMM Organizers and SIGMM AB: one member of the Social Media team at the SIGMM Records is systematically invited to join the OC of major conferences as publicity co-chair. The steering committee chair of each conference should commit to keeping the organizers of each conference edition informed about this policy, and monitor its implementation throughout the years.
SIGMM Strike Team on Open Review
Team members: Xavier Alameda Pineda (Univ. Grenoble-Alpes), Marco Bertini (Univ. Firenze). Coordinator: Xavier Alameda Pineda
The team continued its support to the ACMMM conference organizers for the use of Open Review in the ACMMM reviewing process, helping to implement new functions or improve existing ones and supporting a smooth transfer of best practices. The recommendations addressed distinct items to complete the migration and stabilize the use of Open Review in future ACMMM editions.
1. Technical development and support
Mandatory for the Team: update and publish the scripts; complete the Open Review configuration.
Mandatory for SIGMM AB and ACMMM organizers: create a committee led by the TPC chairs of the current ACMMM edition on a rotating basis.
2. Communication
Mandatory for the Team: write a small manual for use and include it in the future ACMMM Handbook.
The last plenary meeting of the Video Quality Experts Group (VQEG) was hosted online by the University of Konstanz (Germany) from December 18th to 21st, 2023. It gave more than 100 registered participants from 19 different countries the opportunity to attend the numerous presentations and discussions about topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting will soon be available on YouTube.
All the topics mentioned below may be of interest to the SIGMM community working on quality assessment, but special attention can be devoted to the current activities on improving the statistical analysis of subjective experiments and objective metrics, and on the development of a test plan to evaluate the QoE of immersive interactive communication systems in collaboration with ITU.
Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to the VQEG email reflectors to follow the ongoing activities and to get involved with them.
As already announced on the VQEG website, the next VQEG plenary meeting will be hosted by Universität Klagenfurt in Austria from July 1st to 5th, 2024.
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. During the meeting, there were various sessions in which presentations related to these topics were discussed.
Firstly, Ali Ak (Nantes Université, France) provided an analysis of the relation between acceptance/annoyance and visual quality in a recently collected dataset of several User Generated Content (UGC) videos. Then, Syed Uddin (AGH University of Krakow, Poland) presented a video quality assessment method based on the quantization parameter of MPEG encoders (MPEG-4, MPEG-AVC, and MPEG-HEVC) leveraging VMAF. In addition, Sang Heon Le (LG Electronics, Korea) presented a technique for pre-enhancement for video compression and applicable subjective quality metrics. Another talk was given by Alexander Raake (TU Ilmenau, Germany), who presented AVQBits, a versatile no-reference bitstream-based video quality model (based on the standardized ITU-T P.1204.3 model) that can be applied in several contexts, such as video service monitoring and the evaluation of video encoding quality, gaming video QoE, and even omnidirectional video quality. Also, Jingwen Zhu (Nantes Université, France) and Hadi Amirpour (University of Klagenfurt, Austria) described a study on the effectiveness of different video quality metrics in predicting the Satisfied User Ratio (SUR), in order to enhance the VMAF proxy to better capture content-specific characteristics. Finally, Andreas Pastor (Nantes Université, France) presented a method to predict the distortion perceived locally by human eyes in AV1-encoded videos using deep features, which can be easily integrated into video codecs as a pre-processing step before starting encoding.
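For readers less familiar with the SUR concept mentioned above, the following is a minimal sketch (not the method presented in the talk) that computes SUR as the fraction of viewers whose opinion score reaches a satisfaction threshold; the ratings and the threshold are purely hypothetical.

```python
import numpy as np

def satisfied_user_ratio(scores, threshold=4.0):
    """Fraction of viewers whose opinion score meets the satisfaction threshold."""
    scores = np.asarray(scores, dtype=float)
    return float((scores >= threshold).mean())

# Hypothetical ratings (1-5 ACR scale) collected for one encoded version of a clip.
ratings = [5, 4, 4, 3, 5, 4, 2, 4, 5, 3]
print(satisfied_user_ratio(ratings))  # 0.7, i.e., 70% of viewers are satisfied
```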
In relation with standardization efforts, Mathias Wien (RWTH Aachen University, Germany) gave an overview on recent expert viewing tests that have been conducted within MPEG AG5 at the 143rd and 144th MPEG meetings. Also, Kamil Koniuch (AGH University of Krakow, Poland) presented a proposal to update the Survival Game task defined in the ITU-T Recommendation P.1301 on subjective quality evaluation of audio and audiovisual multiparty telemeetings, in order to improve its implementation and application to recent efforts such as the evaluation of immersive communication systems within the ITU-T P.IXC (see the paragraph related to the Immersive Media Group).
Quality Assessment for Health applications (QAH)
The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. Recently, the group has been working towards an ITU-T recommendation for the assessment of medical contents. On this topic, Meriem Outtas (INSA Rennes, France) led a discussion dealing with the edition of a draft of this recommendation. In addition, Lumi Xia (INSA Rennes, France) presented a study of task-based medical image quality assessment focusing on a use case of adrenal lesions.
Statistical Analysis Methods (SAM)
The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. It was one of the most active groups at this meeting, with several presentations on related topics.
Lukas Krasula (Netflix, USA) introduced e2nest, a web-based platform to conduct media-centric (video, audio, and images) subjective tests. Also, Dietmar Saupe (University of Konstanz, Germany) and Simon Del Pin (NTNU, Norway) showed the results of a study analyzing national differences in image quality assessment, finding significant differences in various areas. Alexander Raake (TU Ilmenau, Germany) presented a study on the remote testing of high-resolution images and videos, using AVrate Voyager, which is a publicly accessible framework for online tests. Finally, Dominik Keller (TU Ilmenau, Germany) presented a recent study exploring the impact of 8K (UHD-2) resolution on HDR video quality, considering different viewing distances. The results showed that the enhanced video quality of 8K HDR over 4K HDR diminishes with increasing viewing distance.
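As a pointer to the basic statistics underlying such subjective studies, here is a minimal sketch (not tied to any of the presented works) of computing the Mean Opinion Score (MOS) and its 95% confidence interval from hypothetical viewer ratings.

```python
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """Mean opinion score with a t-distribution confidence interval."""
    scores = np.asarray(scores, dtype=float)
    mos = scores.mean()
    half_width = stats.sem(scores) * stats.t.ppf((1 + confidence) / 2, len(scores) - 1)
    return mos, (mos - half_width, mos + half_width)

# Hypothetical ACR ratings (1-5) from 24 viewers for one processed video sequence.
ratings = np.random.default_rng(0).integers(1, 6, size=24)
print(mos_with_ci(ratings))
```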
No Reference Metrics (NORM)
The NORM group addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. At this meeting, Ioannis Katsavounidis (Meta, USA) led a discussion on the current efforts to improve image and video complexity metrics. In addition, Krishna Srikar Durbha (University of Texas at Austin, USA) presented a technique to tackle the problem of bitrate ladder construction based on multiple Visual Information Fidelity (VIF) feature sets extracted from different scales and subbands of a video.
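For context on the bitrate ladder problem mentioned above, the sketch below keeps only the Pareto-optimal encodings (each rung must improve quality as bitrate increases) from a set of candidates; it is a toy illustration with made-up numbers, not the VIF-based technique presented in the talk.

```python
def pareto_ladder(candidates):
    """Keep encodings that are not dominated (each kept rung has strictly higher quality).

    candidates: list of (bitrate_kbps, quality_score, label) tuples.
    """
    ladder = []
    best_quality = float("-inf")
    for bitrate, quality, label in sorted(candidates):  # ascending bitrate
        if quality > best_quality:                      # strictly improves quality
            ladder.append((bitrate, quality, label))
            best_quality = quality
    return ladder

# Hypothetical per-resolution encodings with placeholder quality scores.
candidates = [
    (400, 62.0, "360p"), (800, 74.0, "480p"), (1200, 72.0, "480p"),
    (1800, 85.0, "720p"), (3000, 80.0, "720p"), (3500, 92.0, "1080p"),
]
print(pareto_ladder(candidates))  # keeps 400, 800, 1800, and 3500 kbps rungs
```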
Emerging Technologies Group (ETG)
The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc.
In this meeting, Nabajeet Barman and Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) suggested a topic to start discussing within VQEG: Quality Assessment of AI-Generated/Modified Content. The goal is to have subsequent discussions on this topic within the group and to write a position paper or whitepaper.
Immersive Media Group (IMG)
The IMG group is performing research on the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain), Kamil Koniuch (AGH University of Krakow, Poland), Ashutosh Singla (CWI, The Netherlands), and other researchers involved in the test plan provided an update on its status, focusing on the description of the four interactive tasks to be performed in the test, the considered measures, and the 13 different experiments that will be carried out in the labs involved in the test plan. Also, in relation with this test plan, Felix Immohr (TU Ilmenau, Germany) presented a study on the impact of spatial audio on social presence and user behavior in multi-modal VR communications.
Quality Assessment for Computer Vision Applications (QACoViA)
5G Key Performance Indicators (5GKPI)
The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services on top of them. At the meeting, Pablo Pérez (Nokia XR Lab, Spain) led an open discussion on the future activities of the group towards 6G, including a brief presentation of QoS/QoE management in 3GPP and potential opportunities to influence QoE in 6G.
The 146th MPEG meeting was held in Rennes, France from 22-26 April 2024, and the official press release can be found here. It comprises the following highlights:
AI-based Point Cloud Coding*: Call for proposals focusing on AI-driven point cloud encoding for applications such as immersive experiences and autonomous driving.
Object Wave Compression*: Call for interest in object wave compression for enhancing computer holography transmission.
Open Font Format: Committee Draft of the fifth edition, overcoming previous limitations like the 64K glyph encoding constraint.
Scene Description: Ratified second edition, integrating immersive media objects and extending support for various data types.
MPEG Immersive Video (MIV): New features in the second edition, enhancing the compression of immersive video content.
Video Coding Standards: New editions of AVC, HEVC, and Video CICP, incorporating additional SEI messages and extended multiview profiles.
Machine-Optimized Video Compression*: Advancement in optimizing video encoders for machine analysis.
Video-based Dynamic Mesh Coding (V-DMC)*: Committee Draft status for efficiently storing and transmitting dynamic 3D content.
LiDAR Coding*: Enhanced efficiency and responsiveness in LiDAR data processing with the new standard reaching Committee Draft status.
* … covered in this column.
AI-based Point Cloud Coding
MPEG issued a Call for Proposals (CfP) on AI-based point cloud coding technologies as a result of ongoing explorations regarding use cases, requirements, and the capabilities of AI-driven point cloud encoding, particularly for dynamic point clouds.
With recent significant progress in AI-based point cloud compression technologies, MPEG is keen on studying and adopting AI methodologies. MPEG is specifically looking for learning-based codecs capable of handling a broad spectrum of dynamic point clouds, which are crucial for applications ranging from immersive experiences to autonomous driving and navigation. As the field evolves rapidly, MPEG expects to receive multiple innovative proposals. These may include a unified codec, capable of addressing multiple types of point clouds, or specialized codecs tailored to meet specific requirements, contingent upon demonstrating clear advantages. MPEG has therefore publicly called for submissions of AI-based point cloud codecs, aimed at deepening the understanding of the various options available and their respective impacts. Submissions that meet the requirements outlined in the call will be invited to provide source code for further analysis, potentially laying the groundwork for a new standard in AI-based point cloud coding. MPEG welcomes all relevant contributions and looks forward to evaluating the responses.
Research aspects: In-depth analysis of algorithms, techniques, and methodologies, including a comparative study of various AI-driven point cloud compression techniques to identify the most effective approaches. Other aspects include creating or improving learning-based codecs that can handle dynamic point clouds as well as metrics for evaluating the performance of these codecs in terms of compression efficiency, reconstruction quality, computational complexity, and scalability. Finally, the assessment of how improved point cloud compression can enhance user experiences would be worthwhile to consider here also.
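As one concrete example of the evaluation metrics mentioned above, the following is a minimal sketch of a symmetric point-to-point (D1-style) geometry error between a reference and a reconstructed point cloud; the data, noise model, and scale are purely illustrative and not taken from the CfP.

```python
import numpy as np
from scipy.spatial import cKDTree

def point_to_point_mse(reference, reconstructed):
    """Symmetric point-to-point geometry error between two point clouds (N x 3 arrays)."""
    d_ref_to_rec = cKDTree(reconstructed).query(reference)[0]  # nearest-neighbor distances
    d_rec_to_ref = cKDTree(reference).query(reconstructed)[0]
    return max(np.mean(d_ref_to_rec ** 2), np.mean(d_rec_to_ref ** 2))

# Hypothetical clouds: a reference and a slightly perturbed "reconstruction".
rng = np.random.default_rng(0)
ref = rng.uniform(size=(1000, 3))
rec = ref + rng.normal(scale=0.01, size=ref.shape)
print(point_to_point_mse(ref, rec))
```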
Object Wave Compression
A Call for Interest (CfI) in object wave compression has been issued by MPEG. Computer holography, a 3D display technology, utilizes a digital fringe pattern called a computer-generated hologram (CGH) to reconstruct 3D images from input 3D models. Holographic near-eye displays (HNEDs) reduce the need for extensive pixel counts due to their wearable design, positioning the display near the eye. This positions HNEDs as frontrunners for the early commercialization of computer holography, with significant research underway for product development. Innovative approaches facilitate the transmission of object wave data, crucial for CGH calculations, over networks. Object wave transmission offers several advantages, including independent treatment from playback device optics, lower computational complexity, and compatibility with video coding technology. These advancements open doors for diverse applications, ranging from entertainment experiences to real-time two-way spatial transmissions, revolutionizing fields such as remote surgery and virtual collaboration. As MPEG explores object wave compression for computer holography transmission, a Call for Interest seeks contributions to address market needs in this field.
Research aspects: Apart from compression efficiency, lower computation complexity, and compatibility with video coding technology, there is a range of research aspects, including the design, implementation, and evaluation of coding algorithms within the scope of this CfI. The QoE of computer-generated holograms (CGHs) together with holographic near-eye displays (HNEDs) is yet another dimension to be explored.
Machine-Optimized Video Compression
MPEG started working on a technical report regarding the “Optimization of Encoders and Receiving Systems for Machine Analysis of Coded Video Content”. In recent years, the efficacy of machine learning-based algorithms in video content analysis has steadily improved. However, an encoder designed for human consumption does not always produce compressed video conducive to effective machine analysis. This challenge lies not in the compression standard but in optimizing the encoder or receiving system. The forthcoming technical report addresses this gap by showcasing technologies and methods that optimize encoders or receiving systems to enhance machine analysis performance.
Research aspects: Video (and audio) coding for machines has been recently addressed by MPEG Video and Audio working groups, respectively. MPEG Joint Video Experts Team with ITU-T SG16, also known as JVET, joined this space with a technical report, but research aspects remain unchanged, i.e., coding efficiency, metrics, and quality aspects for machine analysis of compressed/coded video content.
MPEG-I Immersive Audio
MPEG Audio Coding is entering the “immersive space” with MPEG-I immersive audio and its corresponding reference software. The MPEG-I immersive audio standard sets a new benchmark for compact and lifelike audio representation in virtual and physical spaces, catering to Virtual, Augmented, and Mixed Reality (VR/AR/MR) applications. By enabling high-quality, real-time interactive rendering of audio content with six degrees of freedom (6DoF), users can experience immersion, freely exploring 3D environments while enjoying dynamic audio. Designed in accordance with MPEG’s rigorous standards, MPEG-I immersive audio ensures efficient distribution across bandwidth-constrained networks without compromising on quality. Unlike proprietary frameworks, this standard prioritizes interoperability, stability, and versatility, supporting both streaming and downloadable content while seamlessly integrating with MPEG-H 3D audio compression. MPEG-I’s comprehensive modeling of real-world acoustic effects, including sound source properties and environmental characteristics, guarantees an authentic auditory experience. Moreover, its efficient rendering algorithms balance computational complexity with accuracy, empowering users to finely tune scene characteristics for desired outcomes.
Research aspects: Evaluating QoE of MPEG-I immersive audio-enabled environments as well as the efficient audio distribution across bandwidth-constrained networks without compromising on audio quality are two important research aspects to be addressed by the research community.
Video-based Dynamic Mesh Coding (V-DMC)
Video-based Dynamic Mesh Compression (V-DMC) represents a significant advancement in 3D content compression, catering to the ever-increasing complexity of dynamic meshes used across various applications, including real-time communications, storage, free-viewpoint video, augmented reality (AR), and virtual reality (VR). The standard addresses the challenges associated with dynamic meshes that exhibit time-varying connectivity and attribute maps, which were not sufficiently supported by previous standards. Video-based Dynamic Mesh Compression promises to revolutionize how dynamic 3D content is stored and transmitted, allowing more efficient and realistic interactions with 3D content globally.
Research aspects: V-DMC aims to allow “more efficient and realistic interactions with 3D content”, which are subject to research, i.e., compression efficiency vs. QoE in constrained networked environments.
Low Latency, Low Complexity LiDAR Coding
Low Latency, Low Complexity LiDAR Coding underscores MPEG’s commitment to advancing coding technologies required by modern LiDAR applications across diverse sectors. The new standard addresses critical needs in the processing and compression of LiDAR-acquired point clouds, which are integral to applications ranging from automated driving to smart city management. It provides an optimized solution for scenarios requiring high efficiency in both compression and real-time delivery, responding to the increasingly complex demands of LiDAR data handling. LiDAR technology has become essential for various applications that require detailed environmental scanning, from autonomous vehicles navigating roads to robots mapping indoor spaces. The Low Latency, Low Complexity LiDAR Coding standard will facilitate a new level of efficiency and responsiveness in LiDAR data processing, which is critical for the real-time decision-making capabilities needed in these applications. This standard builds on comprehensive analysis and industry feedback to address specific challenges such as noise reduction, temporal data redundancy, and the need for region-based quality of compression. The standard also emphasizes the importance of low latency coding to support real-time applications, essential for operational safety and efficiency in dynamic environments.
Research aspects: This standard effectively tackles the challenge of balancing high compression efficiency with real-time capabilities, addressing these often conflicting goals. Researchers may carefully consider these aspects and make meaningful contributions.
The 147th MPEG meeting will be held in Sapporo, Japan, from July 15-19, 2024. Click here for more information about MPEG meetings and their developments.