First edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) School


The first edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School was held in February 2024. With the support of SIGMM, it attracted more than 50 students and young researchers to learn about, discuss and experiment first-hand with topics related to social robotics. The event’s success calls for further editions in upcoming years.

Rationale for SoRAIM

SPRING, a collaborative research project funded by the European Commission under Horizon 2020, is coming to an end in May 2024. Its scientific and technological objectives, namely to test a versatile social robotic platform within a hospital and have it perform social activities in a multi-person, dynamic setup, have for the most part been achieved. In order to empower the next generation of young researchers with concepts and tools to answer tomorrow’s challenges in the field of social robotics, one must tackle the issue of knowledge and know-how transmission. We therefore chose to provide a winter school, free of charge to the participants (thanks to the additional support of SIGMM), so that as many students and young researchers as possible, from various horizons (not only technical fields), could attend.

Contents of the Winter School

The Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School took place from 19 to 23 February 2024 in Grenoble, France. The first day featured an introduction to the contents of the school and to the context provided by the SPRING project, followed by a demonstration combining social navigation and dialogue interaction. This sparked the curiosity of the participants and led to a spontaneous Q&A session driven by their contributions, questions and comments.

The school spanned the entire week, with 17 talks given by 8 speakers from the H2020 SPRING project and 9 invited speakers external to the project. The school also included a panel discussion on the topic “Are social robots already out there? Immediate challenges in real-world deployment”, a poster session with 15 contributions, and two hands-on sessions where the participants could choose among the following topics: Robot navigation with Reinforcement Learning, ROS4HRI: How to represent and reason about humans with ROS, Building a conversational system with LLMs using prompt engineering, Robot self-localisation based on camera images, and Speaker extraction from microphone recordings. A social activity (a visit to Grenoble’s downtown and the Bastille) was organised on Thursday afternoon, allowing participants to mingle with speakers and to discover the host town’s history.

One of the highlights of SoRAIM was its panel session, whose topic was “Are social robots already out there? Immediate challenges in real-world deployment”. Although no definitive answers were found, the session stressed that many challenges remain for the deployment of actual social robots in our everyday lives (at work, at home). On the technical side, robotic platforms are subject to hardware and software constraints. On the hardware side, sensors and actuators are restricted in size, power and performance, since physical space and battery capacity are limited. On the software side, large models can only be used if substantial computing resources are permanently available, which is not always the case, since these resources need to be shared between the various computing modules. Finally, on the regulatory and legal side, the use of AI is rising fast and needs to be balanced with ethical views that address our society’s needs, whereas the construction of proper laws and norms, and their acknowledgement and understanding by stakeholders, is slow. In this session the panellists surveyed all aspects of the problems at hand and provided an overview of the challenges that future scientists will need to solve in order to take social robots out of the labs and into the world.

Attendance & future perspectives

SoRAIM attracted 57 participants over the whole week. The attendees were diverse, as initially intended: 50% PhD students, 20% young researchers (public sector), 10% engineers and young researchers (private sector), and 20% MSc students. Notably, the proportion of women attendees was close to 40%, about double the usual proportion in this field. Finally, in terms of geographic spread, a majority of attendees came from other European countries (17 countries in total), with just under 50% of attendees coming from France. Following the school, a satisfaction survey was sent to the attendees in order to better grasp which elements were the most appreciated, in view of a longer-term objective to hold this winter school as a recurring event. Given the diverse backgrounds of attendees, opinions on contents such as the hands-on sessions varied, but overall satisfaction was very high, which shows the interest of the next generation of researchers in more opportunities to learn in this field. We are currently reviewing options to hold similar events each year or every two years, depending on available funding.

More information about the SoRAIM winter school is available on the webpage: https://spring-h2020.eu

Sponsors

SoRAIM was sponsored by the H2020 SPRING project, Inria, the University Grenoble Alpes, the Multidisciplinary Institute of Artificial Intelligence and by ACM’s Special Interest Group on Multimedia (SIGMM). Through ACM SIGMM, we received significant funding which allowed us to invite 14 students and young researchers, members of SIGMM, from abroad.

Full list of contributions

All the talks are available in replay on our YouTube channel: https://www.youtube.com/watch?v=ckJv0eKOgzY&list=PLwdkYSztYsLfWXWai6mppYBwLVjK0VA6y
The complete list of talks and posters presented at SoRAIM Winter School 2024 can be found here: https://spring-h2020.eu/soraim/
In the following, the list of talks in chronological order:

JPEG Column: 101st JPEG Meeting

JPEG Trust reaches Committee Draft stage at the 101st JPEG meeting

The 101st JPEG meeting was held online, from the 30th of October to the 3rd of November 2023. At this meeting, JPEG Trust became a Committee Draft. In addition, JPEG analyzed the responses to its Calls for Proposals for JPEG DNA.

The 101st JPEG meeting had the following highlights:

  • JPEG Trust reaches Committee Draft;
  • JPEG AI requests its re-establishment;
  • JPEG Pleno Learning-based Point Cloud coding establishes a new Verification Model;
  • JPEG Pleno organizes a Light Field Industry Workshop;
  • JPEG AIC-3 continues the evaluation of contributions;
  • JPEG XE produces a first draft of the Common Test Conditions;
  • JPEG DNA analyses the responses to the Call for Proposals;
  • JPEG XS proceeds with the development of the 3rd edition;
  • JPEG XL proceeds with the development of the 2nd edition.

The following sections summarize the main highlights of the 101st JPEG meeting.

JPEG Trust

The 101st meeting marked an important milestone for the JPEG Trust project, with the Committee Draft (CD) for Part 1 “Core Foundation” (21617-1) of the standard approved for consultation. It is expected that a Draft International Standard (DIS) of the Core Foundation will be approved at the 102nd JPEG meeting in January 2024, which will be another important milestone. This rapid schedule is necessitated by the speed at which fake media and misinformation are proliferating, especially with respect to generative AI.

Aligned with JPEG Trust, the NFT Call for Proposals (CfP) has yielded two expressions of interest to date, and submission of proposals remains open until the 15th of January 2024.

Additionally, the Use Cases and Requirements document for JPEG Fake Media (the JPEG Fake Media exploration preceded the initiation of the JPEG Trust international standard) was updated to reflect the change to JPEG Trust as well as incorporate additional use cases that have arisen since the previous JPEG meeting, namely in respect of composited images. This document is publicly available on the JPEG website.

JPEG AI

At the 101st meeting, the JPEG Committee issued a request for re-establishing the JPEG AI (6048-1) project, along with a Committee Draft (CD) of its version 1. A new JPEG AI timeline has also been approved and is now publicly available, in which a Draft International Standard (DIS) of the Core Coding Engine of JPEG AI version 1 is foreseen at the 103rd JPEG meeting (April 2024), a rather important milestone for JPEG AI. The JPEG Committee also established that JPEG AI version 2 will address requirements not yet fulfilled (especially regarding machine consumption tasks) as well as significant improvements on requirements already addressed in version 1, e.g. compression efficiency. The final Call for Proposals for JPEG AI version 2 will be issued in January 2025, and the presentation and evaluation of JPEG AI version 2 proposals will occur in July 2025. During 2023, the JPEG AI Verification Model (VM) evolved from a complex system (800 kMAC/pxl) to two acceptable complexity-efficiency operating points, providing 11% compression efficiency gains at 20 kMAC/pxl and 25% compression efficiency gains at 200 kMAC/pxl. The decoder for the lower-end operating point has now been implemented on mobile devices and was demonstrated during the 100th and 101st JPEG meetings. A presentation of the JPEG AI architecture, networks, and tools is now publicly available. To avoid project delays in the future, the promising input contributions from the 101st meeting will be combined in JPEG AI Core Experiment 6.1 (CE6.1) to study their interaction and resolve potential issues during the next meeting cycle. After this integration, a model will be trained and cross-checked so that it can be approved for release (JPEG AI VM5 release candidate) along with the study DIS text.
Among the promising technologies included in CE6.1 are high-quality and variable-rate improvements with a smaller number of models (from 5 to 4); a multi-branch decoder that allows up to three reconstructions with different levels of quality from the same latent representation, using synthesis transform networks of different complexity; and several post-filter and arithmetic coder simplifications.
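To give a sense of scale for the complexity figures quoted above, the following minimal sketch converts a per-pixel complexity (in kMAC/pxl) into the total number of multiply-accumulate operations for a single image. Only the 20, 200 and 800 kMAC/pxl values come from the text; the image resolution (4K UHD) and the function name `total_macs` are assumptions made for illustration and do not come from the JPEG AI documents.

```python
def total_macs(width: int, height: int, kmac_per_pixel: float) -> float:
    """Total multiply-accumulate operations needed for one image."""
    # 1 kMAC = 1,000 MAC, so scale the per-pixel figure accordingly.
    return width * height * kmac_per_pixel * 1_000


# Assumed image size (4K UHD); not taken from the source text.
WIDTH, HEIGHT = 3840, 2160

operating_points = [
    ("earlier VM", 800),            # kMAC/pxl, from the text
    ("higher operating point", 200),
    ("lower operating point", 20),
]

for label, kmac in operating_points:
    gmacs = total_macs(WIDTH, HEIGHT, kmac) / 1e9
    print(f"{label}: {kmac} kMAC/pxl, about {gmacs:,.1f} GMAC per image")
```

Under these assumptions, moving from 800 to 20 kMAC/pxl reduces the work per image by a factor of 40, which helps explain why the lower operating point is the one that could be demonstrated on mobile devices.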

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Learning-based Point Cloud coding activity progressed at the 101st meeting with a major investigation into point cloud quality metrics. The JPEG Committee decided to continue this investigation into point cloud quality metrics as well as explore possible advancements to the VM in the areas of parameter tuning and support for residual lossless coding. The JPEG Committee is targeting a release of the Committee Draft of Part 6 of the JPEG Pleno standard relating to Learning-based point cloud coding at the 102nd JPEG meeting in San Francisco, USA in January 2024.

JPEG Pleno Light Field

The JPEG Committee has been creating several standards to meet the dynamic demands of the market, with royalty-free patent licensing commitments. A light field coding standard has recently been developed, and JPEG Pleno is constantly exploring novel light field coding architectures.

The JPEG Committee is also preparing standardization activities – among others – in the domains of objective and subjective quality assessment for light fields, improved light field coding modes, and learning-based light field coding.

A Light Field Industry Workshop takes place on November 22nd, 2023, aiming at providing a forum for industrial actors to exchange information on their needs and expectations with respect to standardization activities in this domain.

JPEG AIC

During the 101st JPEG meeting, the AIC activity continued its efforts on the evaluation of the contributions received in April 2023 in response to the Call for Contributions on Subjective Image Quality Assessment. Notably, the activity is currently investigating three different subjective image quality assessment methodologies. The results of the newly established Core Experiments will be considered during the design of the AIC-3 standard, which has been carried out in a collaborative way since its beginning.

The AIC activity also initiated the discussion on Part 4 of the standard on Objective Image Quality Metrics (AIC-4) by refining the Use Cases and Requirements document. During the 102nd JPEG meeting in January 2024, the activity is planning to work on the Draft Call for Proposals on Objective Image  

JPEG XE

The JPEG Committee continued its activity on Event-based Vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE aims at the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. For better dissemination and raising external interest, a workshop around Event-based Vision was organized and took place on Oct 24th, 2023. The workshop triggered the attention of various stakeholders in the field of Event-based Vision, who will start contributing to JPEG XE. The workshop proceedings will be made available on jpeg.org. In addition, the JPEG Committee created a minor revision for the Use cases and Requirements as v1.0, adding an extra use case on scientific and engineering measurements. Finally, a first draft of the Common Test Conditions for JPEG XE was produced, along with the first Exploration Experiments to start practical experiments in the coming 3-month period until the next JPEG meeting. The public Ad-hoc Group on Event-based Vision was re-established to continue the work towards the next 102nd JPEG meeting in January of 2024. To stay informed about the activities please join the Event-based Vision Ad-hoc Group mailing list.

JPEG DNA

In response to the Call for Proposals issued by the JPEG Committee for contributions to the JPEG DNA standard, five proposals were submitted under three distinct codecs by three organizations. Two codecs were submitted to both the coding and transcoding categories, and one was submitted to the coding category only. All proposals showed improved compression efficiency when compared to three anchors selected by the JPEG Committee. After a rigorous analysis of the proposals and their cross-checking by independent parties, it was decided to create a first Verification Model (VM) based on V-DNA, the best-performing proposal. In addition, a number of core experiments were designed to improve the JPEG DNA VM with elements from the other submitted proposals, by quantifying their added value when integrated in the VM.

JPEG XS

The JPEG Committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Final Draft International Standard (FDIS) for Part 1 of the standard (Core coding tools) was produced at this meeting. With this FDIS version, all technical features are now fixed and completed. Part 2 (Profiles and buffer models) and Part 3 (Transport and container formats) of the standard are still in DIS ballot, and ballot results will only be known by the end of January 2024. The JPEG Committee is now working on Part 4 (Conformance testing) to provide the necessary test streams of the 3rd edition for potential implementors. A first Working Draft for Part 4 was issued. Completion of Parts 1, 2, and 3 of the JPEG XS 3rd edition is scheduled for April 2024, and Parts 4 and 5 will follow shortly after that. Finally, a new Use cases and Requirements for JPEG XS document was created, containing a new use case on using JPEG XS for the transport of 4K/8K video over 5G mobile networks. It is expected that the new use case can already be covered by the 3rd edition, meaning that no further updates to the standard would be needed. However, more investigations and experiments will be conducted on this subject.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the FDIS stage, and the second edition of JPEG XL Part 3 (Conformance testing) has proceeded to the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. At the same time, the development of hardware implementation solutions continues.

Final Quote

“The release of the first Committee Draft of JPEG Trust is a strong signal that the JPEG Committee is reacting with a timely response to demands for solutions that inform users when digital media assets are created or modified, in particular through Generative AI, hence contributing to bringing back trust into media-centric ecosystems.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

MPEG Column: 145th MPEG Meeting (Virtual/Online)

The 145th MPEG meeting was held online from 22 to 26 January 2024. The official press release comprises the following highlights:

  • Latest Edition of the High Efficiency Image Format Standard Unveils Cutting-Edge Features for Enhanced Image Decoding and Annotation
  • MPEG Systems finalizes Standards supporting Interoperability Testing
  • MPEG finalizes the Third Edition of MPEG-D Dynamic Range Control
  • MPEG finalizes the Second Edition of MPEG-4 Audio Conformance
  • MPEG Genomic Coding extended to support Transport and File Format for Genomic Annotations
  • MPEG White Paper: Neural Network Coding (NNC) – Efficient Storage and Inference of Neural Networks for Multimedia Applications

This column will focus on the High Efficiency Image Format (HEIF) and interoperability testing. As usual, a brief update on MPEG-DASH et al. will be provided.

High Efficiency Image Format (HEIF)

The High Efficiency Image Format (HEIF) is a widely adopted standard in the imaging industry that continues to grow in popularity. At the 145th MPEG meeting, MPEG Systems (WG 3) ratified its third edition, which introduces exciting new features, such as progressive decoding capabilities that enhance image quality through a sequential, single-decoder instance process. With this enhancement, users can decode bitstreams in successive steps, with each phase delivering perceptible improvements in image quality compared to the preceding step. Additionally, the new edition introduces a sophisticated data structure that describes the spatial configuration of the camera and outlines the unique characteristics responsible for generating the image content. The update also includes innovative tools for annotating specific areas in diverse shapes, adding a layer of creativity and customization to image content manipulation. These annotation features cater to the diverse needs of users across various industries.

Research aspects: Progressive coding has been a part of modern image coding formats for some time now. However, the inclusion of supplementary metadata provides an opportunity to explore new use cases that can benefit both user experience (UX) and quality of experience (QoE) in academic settings.

Interoperability Testing

MPEG standards typically comprise format definitions (or specifications) to enable interoperability among products and services from different vendors. Interestingly, MPEG goes beyond these format specifications and provides reference software and conformance bitstreams, allowing conformance testing.

At the 145th MPEG meeting, MPEG Systems (WG 3) finalized two standards comprising conformance and reference software by promoting them to the Final Draft International Standard (FDIS), the final stage of standards development. The finalized standards, ISO/IEC 23090-24 and ISO/IEC 23090-25, provide conformance and reference software for scene description and visual volumetric video-based coding data, respectively.

ISO/IEC 23090-24 focuses on conformance and reference software for scene description, providing a comprehensive reference implementation and bitstream tailored for conformance testing related to ISO/IEC 23090-14, scene description. This standard opens new avenues for advancements in scene depiction technologies, setting a new standard for conformance and software reference in this domain.

Similarly, ISO/IEC 23090-25 targets conformance and reference software for the carriage of visual volumetric video-based coding data. With a dedicated reference implementation and bitstream, this standard is poised to elevate the conformance testing standards for ISO/IEC 23090-10, the carriage of visual volumetric video-based coding data. The introduction of this standard is expected to have a transformative impact on the visualization of volumetric video data.

At the same 145th MPEG meeting, MPEG Audio Coding (WG 6) celebrated the completion of the second edition of ISO/IEC 14496-26, audio conformance, elevating it to the Final Draft International Standard (FDIS) stage. This significant update incorporates seven corrigenda and five amendments into the initial edition, originally published in 2010.

ISO/IEC 14496-26 serves as a pivotal standard, providing a framework for designing tests to ensure the compliance of compressed data and decoders with the requirements outlined in ISO/IEC 14496-3 (MPEG-4 Audio). The second edition reflects an evolution of the original, addressing key updates and enhancements through diligent amendments and corrigenda. This latest edition, now at the FDIS stage, marks a notable stride in MPEG Audio Coding’s commitment to refining audio conformance standards and ensuring the seamless integration of compressed data within the MPEG-4 Audio framework.

These standards will be made freely accessible for download on the official ISO website, ensuring widespread availability for industry professionals, researchers, and enthusiasts alike.

Research aspects: Reference software and conformance bitstreams often serve as the basis for further research (and development) activities and, thus, are highly appreciated. For example, reference software of video coding formats (e.g., HM for HEVC, VTM for VVC) can be used as a baseline when improving coding efficiency or other aspects of the coding format.

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below.

MPEG-DASH Status, January 2024.

The following most notable aspects have been discussed at the 145th MPEG meeting and adopted into ISO/IEC 23009-1, which will eventually become the 6th edition of the MPEG-DASH standard:

  • It is now possible to pass CMCD parameters sid and cid via the MPD URL.
  • Segment duration patterns can be signaled using SegmentTimeline.
  • A background mode of operation has been defined, which allows a DASH player to receive MPD updates and listen to events, possibly without decrypting or rendering any media.
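As an illustration of the first item, the sketch below appends the CMCD keys sid (session ID) and cid (content ID) to an MPD URL using the query-argument encoding described in the CMCD specification (CTA-5004): a single `CMCD` query parameter carrying URL-encoded, comma-separated key-value pairs, with string values in double quotes and keys in alphabetical order. How the 6th edition of MPEG-DASH itself signals these parameters is not described in this column, so the sketch shows only the general CMCD encoding. The helper name `add_cmcd`, the example URL and the identifier values are hypothetical, and the sketch assumes the values contain no quotes or backslashes.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_cmcd(url: str, sid: str, cid: str) -> str:
    """Append CMCD session and content IDs to a URL as a query argument."""
    pairs = {"sid": sid, "cid": cid}
    # Keys in alphabetical order; string values wrapped in double quotes.
    payload = ",".join(f'{key}="{value}"' for key, value in sorted(pairs.items()))
    # The whole payload is percent-encoded and carried in one CMCD parameter.
    cmcd = "CMCD=" + quote(payload, safe="")
    parts = urlsplit(url)
    query = f"{parts.query}&{cmcd}" if parts.query else cmcd
    return urlunsplit(parts._replace(query=query))


# Hypothetical manifest URL and identifiers, for illustration only.
print(add_cmcd("https://example.com/live/manifest.mpd",
               sid="session-1234", cid="content-42"))
```

Carrying the identifiers on the MPD request lets a server or CDN associate manifest requests with the same playback session as the subsequent segment requests.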

Additionally, the technologies under consideration (TuC) document has been updated with means to signal maximum segment rate, extend copyright license signaling, and improve haptics signaling in DASH. Finally, REAP is progressing towards FDIS but is not there yet; most details will be discussed in the upcoming AhG period.

The 146th MPEG meeting will be held in Rennes, France, from April 22 to 26, 2024.

Message from the ACM SIGMM Chair

About our initiatives for the Multimedia community

Dear SIGMM members, colleagues, students,

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

We believed that these changes should be reflected in our SIGMM flagship conference, ACM Multimedia, and in the SIGMM organization and activities overall. This belief led us to organize the SIGMM Retreat alongside ACM MM23 in Ottawa on October 30, 2023. The goal of the meeting was to open a discussion on key strategic issues such as the coverage of ACM Multimedia, its quality and reputation, and how we can grow the SIGMM community. We invited the members of the SIGMM Advisory Committee, the members of the Steering Committees of the SIGMM-sponsored conferences, the ACM TOMM Editor in Chief, the past SIGMM Chairs, and senior personalities and emerging researchers of our community. The Retreat was well attended: twenty people attended in person and ten attended online. Alberto Del Bimbo, SIGMM Chair, chaired the Retreat with the assistance of Phoebe Chen, SIGMM Vice-chair, and Xavier Alameda Pineda.

The discussion was vibrant, and valuable opinions and suggestions emerged. It was widely agreed that the distinctive feature of multimedia research is the combination and integration of various modalities to build end-to-end systems.

People agreed on the need to introduce significant changes in the format of our flagship conference to make it more attractive. There was strong consensus on the ideas of giving more room to Brave New Ideas sections, having TED-like talks, and soliciting workshops on innovative topics and striving for their continuity. There was also consensus on revitalizing the program by including new emerging topics such as Foundation Models, 3D, glass-free interactivity, and new networking platforms. All the attendees recognized the need to balance these with the traditional research areas of the ACM Multimedia program.

There was general agreement on using Open Review as the reviewing system for ACM Multimedia. It was recognized that it improves the quality and transparency of the reviewing process, enhances respectability, empowers the reviewers to conduct serious reviews, and aligns ACM Multimedia with the top-ranked conferences.

Other important topics of discussion included how to incentivize in-person attendance and discourage online participation to maximize the value of conferences, the collaboration and synchronization of SIGMM-sponsored conferences, and the need to make the transition between conference editions more seamless.

The need for greater industry presence, also as a way to offer internship opportunities for students and improve the attendance of younger generations, was identified as a key issue for improvement. All the attendees recognized the importance of exploiting SIGMM Records and social media as a means to improve the sense of community and disseminate information.

Following our commitment to align words with actions, we decided to create Strike Teams focusing on the most strategic themes. These teams are composed of a few experienced colleagues who volunteered to define realistic strategies for the key issues, determine concrete actions, and help to implement them in the near future. Starting in January 2024, four strike teams are in operation, with members appointed for two years:

  • SIGMM Strike Team on Open Review to provide operational support on the implementation of Open Review, smoothly transferring the best practices and helping to provide new functions.
    Team members are: Xavier Alameda Pineda (Univ. Grenoble-Alpes) Coordinator, Marco Bertini (Univ. Firenze).
  • SIGMM Strike Team on Harmonization and Spread to integrate SIGMM Records and Social Media in the whole process of the ACM Multimedia organization, improve synchronization and harmonization between ACM Multimedia and other SIGMM Conferences, and strengthen the sense of community.
    Team members are: Miriam Redi (Wikimedia Foundation) Coordinator, Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ), Pablo Cesar (CWI and TU Delft).
  • SIGMM Strike Team on Industry Engagement to improve the presence of industry at ACM Multimedia, launching new in-cooperation initiatives and establishing stable bi-directional links.
    Team members are: Touradj Ebrahimi (EPFL) Coordinator, Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences).
  • Strike Team on ACMMM Format to innovate the ACM Multimedia program, aligning it with technological advancements and the emergence of new research areas, and igniting fresh and efficient means of disseminating research.
    Team members are: Arnold Smeulders (Univ. of Amsterdam) Coordinator, Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam) and the Chairs of the next two ACMMM Conferences, Jianfei Cai (Monash Univ.) and Cathal Gurrin (Dublin City Univ.).

All the teams report to the SIGMM Chair and the SIGMM Executive Committee and will work in close connection with the General Chairs and Program Chairs of the next ACM Multimedia editions.

I take this opportunity to thank again all those who participated in the SIGMM Retreat, and especially those who are committed to the Strike Teams. I sincerely hope that their work brings new ideas and vitality to our community and strengthens its visibility and reputation in the international scientific arena in the years to come.

Alberto Del Bimbo
SIGMM Chair

Towards Immersive Digiphysical Experiences


Immersive experiences have the potential of redefining traditional forms of media engagement by intricately combining reality with imagination. Motivated by necessities, current developments and emerging technologies, this column sets out to bridge immersive experiences in both digital and physical realities. Fitting under the umbrella term of eXtended Reality (XR), the first section describes various realizations of blending digital and physical elements to design what we refer to as immersive digiphysical experiences. We further highlight industry and research initiatives related to driving the design and development of such experiences, considered to be key building-blocks of the futuristic ‘metaverse’. The second section outlines challenges related to assessing, modeling, and managing the Quality of Experience (QoE) of immersive digiphysical experiences and reflects upon ongoing work in the area. While potential use cases span a wide range of application domains, the third section elaborates on the specific case of conference organization, which has over the past few years spanned from fully physical, to fully virtual, and finally to attempts at hybrid organization. We believe this use case provides valuable insights into needs and promising approaches, to be demonstrated and experienced at the upcoming 16th edition of the International Conference on Quality of Multimedia Experience (QoMEX 2024) in Karlshamn, Sweden in June 2024.

Multiple users engaged in a co-located mixed reality experience

Bridging The Digital And Physical Worlds

According to [IMeX WP, 2020], immersive media have been described as involving “multi-modal human-computer interaction where either a user is immersed inside a digital/virtual space or digital/virtual artifacts become a part of the physical world”. Spanning the so-called virtuality continuum [Milgram, 1995], immersive media experiences may involve various realizations of bridging the digital and physical worlds, such as the seamless integration of digital content with the real world (via Augmented or Mixed Reality, AR/MR), and vice versa by incorporating real objects into a virtual environment (Augmented Virtuality, AV). More recently, the term eXtended Reality (XR) (also sometimes referred to as xReality) has been used as an umbrella term for a wide range of levels of “realities”, with [Rauschnabel, 2022] proposing a distinction between AR/MR and Virtual Reality (VR) based on whether the physical environment is, at least visually, part of the user’s experience.

By seamlessly merging digital and physical elements and supporting real-time user engagement with both digital and physical components, immersive digiphysical (i.e., both digitally and physically accessible [Westerlund, 2020]) experiences have the potential of providing compelling experiences blurring the distinction between the real and virtual worlds. A key aspect is that of digital elements responding to user input or the physical environment, and the physical environment responding to interactions with digital objects. Going beyond only visual or auditory stimuli, the incorporation of additional senses, for example via haptic feedback or olfactory elements, can contribute to multisensory engagement [Gibbs, 2022].

The rapid development of XR technologies has been recognized as a key contributor to realizing a wide range of applications built on the fusion of the digital and physical worlds [NEM WP, 2022]. In its contribution to the European XR Coalition (launched by the European Commission), the New European Media Initiative (NEM), Europe’s Technology Platform of Horizon 2020 dedicated to driving the future of digital experiences, calls for needed actions from both industry and research perspectives addressing challenges related to social and human centered XR as well as XR communication aspects [NEM XR, 2022]. One such initiative is the Horizon 2020 TRANSMIXR project [TRANSMIXR], aimed at developing a distributed XR creation environment that supports remote collaboration practices, as well as an XR media experience environment for the delivery and consumption of social immersive media experiences. The NEM initiative further identifies the need for scalable solutions to obtain plausible and convincing virtual copies of physical objects and environments, as well as solutions supporting seamless and convincing interaction between the physical and the virtual world. Among key technologies and infrastructures needed to overcome outlined challenges, the following are identified [NEM XR, 2022]: high bandwidth and low-latency energy-efficient networks; remote computing for processing and rendering deployed on cloud and edge infrastructures; tools for the creation and updating of digital twins (DT) to strengthen the link between the real and virtual worlds, integrating Internet of Things (IoT) platforms; hardware in the form of advanced displays; and various content creation tools relying on interoperable formats.

Merging the digital and physical worlds

Looking towards the future, immersive digiphysical experiences set the stage for visions of the metaverse [Wang, 2023], described as representing the evolution of the Internet towards a platform enabling immersive, persistent, and interconnected virtual environments blending digital and physical [Lee, 2021]. [Wang, 2022] see the metaverse as “created by the convergence of physically persistent virtual space and virtually enhanced physical reality”. The metaverse is further seen as a platform offering the potential to host real-time multisensory social interactions (e.g., involving sight, hearing, touch) between people communicating with each other via avatars [Hennig-Thurau, 2023]. Since 2022, the Metaverse Standards Forum has been providing a venue for industry coordination fostering the development of interoperability standards for an open and inclusive metaverse [Metaverse, 2023]. Relevant existing standards include: ISO/IEC 23005 (MPEG-V) (standardization of interfaces between the real world and the virtual world, and among virtual worlds) [ISO/IEC 23055], IEEE 2888 (definition of standardized interfaces for synchronization of cyber and physical worlds) [IEEE 2888], and MPEG-I (standards to digitally represent immersive media) [ISO/IEC 23090].

Research Challenges For The QoE Community

Achieving wide-spread adoption of XR-based services providing digiphysical experiences across a broad range of application domains (e.g., education, industry & manufacturing, healthcare, engineering, etc.) inherently requires ensuring intuitive, comfortable, and positive user experiences. While research efforts in meeting such requirements are well under way, a number of open challenges remain.

Quality of Experience (QoE) for immersive media has been defined as [IMeX WP, 2020] “the degree of delight or annoyance of the user of an application or service which involves an immersive media experience. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” Furthermore, a bridge between QoE and UX has been established through the concept of Quality of User Experience (QUX), combining hedonic, eudaimonic and pragmatic aspects of QoE and UX [Egger-Lampl, 2019]. In the context of immersive communication and collaboration services, significant efforts are being invested towards understanding and optimizing the end user experience [Perez, 2022].

The White Paper [IMeX WP, 2020] ties immersion to the digital media world (“The more the system blocks out stimuli from the physical world, the more the system is considered to be immersive.”). Nevertheless, immersion as such exists in physical contexts as well, e.g., when reading a captivating book. MR, XR and AV scenarios are digiphysical in their nature. These considerations pose several challenges:

  1. Achieving intuitive and natural interactive experiences [Hennig-Thurau, 2023] when mixing realities.
  2. Developing a common understanding of MR-, XR- and AV-related challenges in digiphysical multi-modal multi-party settings.
  3. Advancing VR, AR, MR, XR and AV technologies to allow for truly digiphysical experiences.
  4. Measuring and modeling QoE, UX and QUX for immersive digiphysical services, covering overall methodology, measurement instruments, modeling approaches, test environments and application domains.
  5. Management of the networked infrastructure to support immersive digiphysical experiences with appropriate QoE, UX and QUX.
  6. Sustainability considerations in terms of environmental footprint, accessibility, equality of opportunities in various parts of the world, and cost/benefit ratio.

Challenges 1 and 2 call for an experience-based bottom-up approach to focus on the most important aspects. Examples include designing and evaluating different user representations [Aseeri, 2021][Viola, 2023], natural interaction techniques [Spittle, 2023] and use of different environments by participants (AR/MR/VR) [Moslavac, 2023]. The latter has proven beneficial for challenges 3 (cf. the emergence of MR-/XR-/AV-supporting head-mounted devices such as the Microsoft Hololens and recent pass-through versions of the Meta Quest) and 4. Finally, challenges 5 and 6 need to be carefully addressed to allow for long-term adoption and feasibility.

Challenges 1 to 4 have been addressed in standardization. For instance, ITU-T Recommendation P.1320 specifies QoE assessment procedures and metrics for the evaluation of XR telemeetings, outlining various categories of QoE influence factors and use cases [ITU-T Rec. P.1320, 2022] (adopted from the 3GPP technical report TR 26.928 on XR technology in 5G). The corresponding ITU-T Study Group 12 (Question 10) developed a taxonomy of telemeetings [ITU-T Rec. G.1092, 2023], providing a systematic classification of telemeeting systems. Ongoing joint efforts between the VQEG Immersive Media Group and ITU-T Study Group 12 are targeted towards specifying interactive test methods for subjective assessment of XR communications [ITU-T P.IXC, 2022].

The complexity of the aforementioned challenges calls for a combination of fundamental work, use cases, implementations, demonstrations, and testing. One specific use case that has shown its urgency during recent years in combining digital and physical realities is that of hybrid conference organization, touching in particular on the challenge of achieving intuitive and natural interactions between remote and physically present participants. We consider this use case in detail in the following section, referring to the organization of the International Conference on Quality of Multimedia Experience (QoMEX) as an example.

Immersive Communication And Collaboration: The Case Of Conference Organization

What seemed impossible and undesirable in the past became a necessity overnight during the CoVid-19 pandemic: running conferences as fully virtual events. Many research communities succeeded in adapting ongoing conference organizations such that communities could meet, present, demonstrate and socialize online. The conference QoMEX 2020 is one such example, whose organizers introduced a set of innovative instruments to mutually interact and enjoy, such as virtual Mozilla Hubs spaces for poster presentations and a music session with prerecorded contributions mixed to form a joint performance to be enjoyed virtually together. A hitherto unseen inventiveness was observed in making the best of the heavily travel-restricted situation. Furthermore, the technical approaches varied from off-the-shelf systems (such as Zoom or Teams) to custom-built applications. However, the majority of meetings during CoVid times, no matter their scale and nature, were run in unnatural 2D on-screen settings. The frequently reported phenomenon of videoconference (VC) fatigue can be attributed to a set of personal, organizational, technical and environmental factors [Döring, 2022]. Indeed, talking to one’s computer with many faces staring back, limited possibilities to move freely, technostress [Brod, 1984] and organizational mishaps made many people tired of VC technology that was designed for a better purpose, but could not get close enough to a natural real-life experience.

As CoVid was in retreat, conferences again became physical events and communities enjoyed meeting again, e.g., at QoMEX 2022. However, voices were raised that asked for remote participation for various reasons, such as time or budget restrictions, environmental sustainability considerations, or simply the comfort of being able to work from home. With remote participation came the challenge of bridging between in-person and remote participants, i.e., turning conferences into hybrid events [Bajpai, 2022]. However, experiences from hybrid conferences are mixed, both for onsite and online participants: (1) The onsite participants suffer from interruptions of the session flow needed to fix problems with the online participation tool. Their readiness to devote effort, time, and money to participate in a future hybrid event in person might suffer from such issues, which in turn would weaken the corresponding communities; (2) The online participants suffer from similar issues, where sound irregularities (echo, excessive sound volumes, etc.) are felt to be particularly disturbing, along with feelings of not being properly included, e.g., in Q&A sessions and personal interactions. At both ends, clear signs of technostress and “us-and-them” feelings can be observed. Consequently, and despite good intentions and advice [Bajpai, 2022], any hybrid conference might miss its main purpose of bringing researchers together to present, discuss and socialize. To avoid the above-listed issues, the post-CoVid QoMEX conferences (since 2022) avoided hybrid operations, with few exceptions.

A conference is a typical case that reveals difficulties in bringing the physical and digital worlds together [Westerlund, 2020], at least when relying upon state-of-the-art telemeeting approaches that have not explicitly been designed for hybrid and digiphysical operations. At the recent 26th ACM Conference on Computer-Supported Cooperative Work And Social Computing in Minneapolis, USA (CSCW 2023), one of the panel sessions focused on “Realizing Values in Hybrid Environments”. Panelists and audience shared experiences about successes and failures with hybrid events. The main take-aways were as follows: (1) there is a general lack of know-how, no matter how much funding is allocated, and (2) there is a significant demand for research activities in the area.

Yet, there is hope, as increasingly many VR, MR, XR and AV-supporting devices and applications keep emerging, enabling new kinds and representations of immersive experiences. In a conference context, the latter implies the feeling of “being there”, i.e., being integrated in the conference community, no matter where the participant is located. This calls for new ways of interacting amongst others through various realities (VR/MR/XR), which need to be invented, tried and evaluated in order to offer new and meaningful experiences in telemeeting scenarios [Viola, 2023]. Indeed, CSCW 2023 hosted a specific workshop titled “Emerging Telepresence Technologies for Hybrid Meetings: an Interactive Workshop”, during which visions, experiences, and solutions were shared and could be experienced locally and remotely. About half of the participants were online, successfully interacting with participants onsite via various techniques.

With these challenges and opportunities in mind, the motto of QoMEX 2024 has been set as “Towards immersive digiphysical experiences.” While the conference is organized as an in-person event, a set of carefully selected hybrid activities will be offered to interested remote participants, such as (1) 360° stereoscopic streaming of the keynote speeches and demo sessions, and (2) the option to take part in so-called hybrid experience demos. The 360° stereoscopic streaming has so far been tested successfully in local, national and transatlantic sessions (during the above-mentioned CSCW workshop) with various settings, and further fine-tuning will be done and tested before the conference. With respect to the demo session, and in addition to traditional onsite demos, the conference will this year in particular solicit hybrid experience demos that enable both onsite and remote participants to test the demo in an immersive environment. Facilities will also be provided for onsite participants to test demos from the perspective of both a local and a remote user, enabling them to experience different roles. The organizers hope that the hybrid activities of QoMEX 2024 will trigger more research interest in these areas along and beyond the classical lines of QoE research (performing quantitative subjective studies of QoE features and correlating them with QoE factors).

QoMEX 2024: Towards Immersive Digiphysical Experiences

Concluding Remarks

As immersive experiences extend into both digital and physical worlds and realities, there is a great space to conquer for QoE, UX, and QUX-related research. While the recent CoVid pandemic has forced many users to replace physical with digital meetings and sustainability considerations have reduced many people’s and organizations’ readiness to (support) travel, hybrid digiphysical meetings have, due to their shortcomings, so far failed to persuade participants of their superiority over pure online or on-site meetings. Indeed, one promising path towards a successful integration of physical and digital worlds consists of trying out, experiencing, reflecting, and deriving important research questions for and beyond the QoE research community. The upcoming conference QoMEX 2024 will be a stop along this road with carefully selected hybrid experiences aimed at boosting research and best practice in the QoE domain towards immersive digiphysical experiences.

References

  • [Aseeri, 2021] Aseeri, S., & Interrante, V. (2021). The Influence of Avatar Representation on Interpersonal Communication in Virtual Social Environments. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2608-2617.
  • [Bajpai, 2022] Bajpai, V., et al. (2022). Recommendations for designing hybrid conferences. ACM SIGCOMM Computer Communication Review, 52(2), 63-69.
  • [Brod, 1984] Brod, C. (1984). Technostress: The Human Cost of the Computer Revolution. Basic Books; New York, NY, USA: 1984.
  • [Döring, 2022] Döring, N., Moor, K. D., Fiedler, M., Schoenenberg, K., & Raake, A. (2022). Videoconference Fatigue: A Conceptual Analysis. International Journal of Environmental Research and Public Health, 19(4), 2061.
  • [Egger-Lampl, 2019] Egger-Lampl, S., Hammer, F., & Möller, S. (2019). Towards an integrated view on QoE and UX: adding the Eudaimonic Dimension, ACM SIGMultimedia Records, 10(4):5.
  • [Gibbs, 2022] Gibbs, J. K., Gillies, M., & Pan, X. (2022). A comparison of the effects of haptic and visual feedback on presence in virtual reality. International Journal of Human-Computer Studies, 157, 102717.
  • [Hennig-Thurau, 2023] Hennig-Thurau, T., Aliman, D. N., Herting, A. M., Cziehso, G. P., Linder, M., & Kübler, R. V. (2023). Social Interactions in the Metaverse: Framework, Initial Evidence, and Research Roadmap. Journal of the Academy of Marketing Science, 51(4), 889-913.
  • [IMeX WP, 2020] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
  • [ISO/IEC 23055] ISO/IEC 23005 (MPEG-V) standards, Media Context and Control, https://mpeg.chiariglione.org/standards/mpeg-v, accessed January 21, 2024.
  • [ISO/IEC 23090] ISO/IEC 23090 (MPEG-I) standards, Coded representation of Immersive Media, https://mpeg.chiariglione.org/standards/mpeg-i, accessed January 21, 2024.
  • [IEEE 2888] IEEE 2888 standards, https://sagroups.ieee.org/2888/, accessed January 21, 2024.
  • [ITU-T Rec. G.1092, 2023] ITU-T Recommendation G.1092 – Taxonomy of telemeetings from a quality of experience perspective, Oct. 2023.
  • [ITU-T Rec. P.1320, 2022] ITU-T Recommendation P.1320 – QoE assessment of extended reality (XR) meetings, 2022.
  • [ITU-T P.IXC, 2022] ITU-T Work Item: Interactive test methods for subjective assessment of extended reality communications, under study, 2022.
  • [Lee, 2021] Lee, L. H. et al. (2021). All One Needs to Know about Metaverse: A Complete Survey on Technological Singularity, Virtual Ecosystem, and Research Agenda. arXiv preprint arXiv:2110.05352.
  • [Metaverse, 2023] Metaverse Standards Forum, https://metaverse-standards.org/
  • [Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
  • [Moslavac, 2023] Moslavac, M., Brzica, L., Drozd, L., Kušurin, N., Vlahović, S., & Skorin-Kapov, L. (2023, July). Assessment of Varied User Representations and XR Environments in Consumer-Grade XR Telemeetings. In 2023 17th International Conference on Telecommunications (ConTEL) (pp. 1-8). IEEE.
  • [Rauschnabel, 2022] Rauschnabel, P. A., Felix, R., Hinsch, C., Shahab, H., & Alt, F. (2022). What is XR? Towards a Framework for Augmented and Virtual Reality. Computers in human behavior, 133, 107289.
  • [NEM WP, 2022] New European Media (NEM), NEM: List of topics for the Work Program 2023-2024.
  • [NEM XR, 2022] New European Media (NEM), NEM contribution to the XR coalition, June 2022.
  • [Perez, 2022] Pérez, P., Gonzalez-Sosa, E., Gutiérrez, J., & García, N. (2022). Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment. Frontiers in Signal Processing, 2, 917684.
  • [Spittle, 2023] Spittle, B., Frutos-Pascual, M., Creed, C., & Williams, I. (2023). A Review of Interaction Techniques for Immersive Environments. IEEE Transactions on Visualization and Computer Graphics, 29(9), Sept. 2023.
  • [TRANSMIXR] EU HORIZON 2020 TRANSMIXR project, Ignite the Immersive Media Sector by Enabling New Narrative Visions, https://transmixr.eu/
  • [Viola, 2023] Viola, I., Jansen, J., Subramanyam, S., Reimat, I., & Cesar, P. (2023). VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real-Time Communication. IEEE MultiMedia, 30(2).
  • [Wang, 2023] Wang, H. et al. (2023). A Survey on the Metaverse: The State-of-the-Art, Technologies, Applications, and Challenges. IEEE Internet of Things Journal, 10(16).
  • [Wang, 2022] Wang, Y. et al. (2022). A Survey on Metaverse: Fundamentals, Security, and Privacy. IEEE Communications Surveys & Tutorials, 25(1).
  • [Westerlund, 2020] Westerlund, T. & Marklund, B. (2020). Community pharmacy and primary health care in Sweden – at a crossroads. Pharm Pract (Granada), 18(2): 1927.

MPEG Column: 144th MPEG Meeting in Hannover, Germany

The 144th MPEG meeting was held in Hannover, Germany! For those interested, the press release is available with all the details. It’s great to see progress being made in person (cf. also the group pictures below). The main outcome of this meeting is as follows:

  • MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
  • MPEG evaluates Call for Proposals on Feature Compression for Video Coding for Machines
  • MPEG progresses ISOBMFF-related Standards for the Carriage of Network Abstraction Layer Video Data
  • MPEG enhances the Support of Energy-Efficient Media Consumption
  • MPEG ratifies the Support of Temporal Scalability for Geometry-based Point Cloud Compression
  • MPEG reaches the First Milestone for the Interchange of 3D Graphics Formats
  • MPEG announces Completion of Coding of Genomic Annotations

We have modified the press release to cater to the readers of ACM SIGMM Records and highlighted research on video technologies. This edition of the MPEG column focuses on MPEG Systems-related standards and visual quality assessment. As usual, the column will end with an update on MPEG-DASH.

Attendees of the 144th MPEG meeting in Hannover, Germany.

Visual Quality Assessment

MPEG does not create standards in the visual quality assessment domain. However, it conducts visual quality assessments for its standards during various stages of the standardization process. For instance, it evaluates responses to call for proposals, conducts verification tests of its final standards, and so on. MPEG Visual Quality Assessment (AG 5) issued an open call to study quality assessment for learning-based video codecs. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies have focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. To facilitate the study of visual quality, MPEG maintains the Compressed Video for the study of Quality Metrics (CVQM) dataset.

With the recent advancements in learning-based video compression algorithms, MPEG is now studying compression using these codecs. It is expected that reconstructed videos compressed using learning-based codecs will have different types of distortion compared to those induced by traditional block-based motion-compensated video coding designs. To gain a deeper understanding of these distortions and their impact on visual quality, MPEG has issued a public call related to learning-based video codecs. MPEG is open to inputs in response to the call and will invite responses that meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.

Considering the rapid advancements in the development of learning-based video compression algorithms, MPEG will keep this call open and anticipates future updates to the call.

Interested parties are kindly requested to contact the MPEG AG 5 Convenor Mathias Wien (wien@lfb.rwth-aachen.de) and submit responses for review at the 145th MPEG meeting in January 2024. Further details are given in the call, issued as AG 5 document N 104 and available from the mpeg.org website.

Research aspects: Learning-based data compression (e.g., for image, audio, video content) is a hot research topic. Research on this topic relies on datasets offering a set of common test sequences, sometimes also common test conditions, that are publicly available and allow for comparison across different schemes. MPEG’s Compressed Video for the study of Quality Metrics (CVQM) dataset is such a dataset, available here, and ready to be used also by researchers and scientists outside of MPEG. The call mentioned above is open for everyone inside/outside of MPEG and allows researchers to participate in international standards efforts (note: to attend meetings, one must become a delegate of a national body).
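As a rough illustration of what studying the correlation between subjective scores and objective quality metrics can look like in practice, the following minimal sketch computes the Pearson (linear) and Spearman (rank-order) correlation coefficients between two lists of scores. The numbers are made-up placeholders, not values from the CVQM dataset or from any MPEG study, and the function names are chosen for this example only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical example data: mean opinion scores (1-5 scale) and an
# objective metric (here labelled PSNR in dB) for five coded sequences.
mos = [1.8, 2.6, 3.4, 4.1, 4.6]
psnr = [28.0, 31.5, 34.0, 38.2, 37.5]

plcc = pearson(mos, psnr)
srocc = spearman(mos, psnr)
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```

A metric that ranks the sequences in the same order as the viewers yields a Spearman coefficient close to 1, even if its relation to the subjective scores is not linear, which is why both coefficients are commonly reported together.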

MPEG Systems-related Standards

At the 144th MPEG meeting, MPEG Systems (WG 3) produced three news-worthy items as follows:

  • Progression of ISOBMFF-related standards for the carriage of Network Abstraction Layer (NAL) video data.
  • Enhancement of the support of energy-efficient media consumption.
  • Support of temporal scalability for geometry-based Point Cloud Compression (G-PCC).

ISO/IEC 14496-15, a part of the family of ISOBMFF-related standards, defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). This standard has been further improved with the approval of the Final Draft Amendment (FDAM), which adds support for enhanced features such as Picture-in-Picture (PiP) use cases enabled by VVC.

In addition to the improvements made to ISO/IEC 14496-15, separately developed amendments have been consolidated in the 7th edition of the standard. This edition has been promoted to Final Draft International Standard (FDIS), marking the final milestone of the formal standard development.

Another important standard in development is the 2nd edition of ISO/IEC 14496-32 (file format reference software and conformance). This standard, currently at the Committee Draft (CD) stage of development, is planned to be completed and reach the status of Final Draft International Standard (FDIS) by the beginning of 2025. This standard will be essential for industry professionals who require a reliable and standardized method of verifying the conformance of their implementation.

MPEG Systems (WG 3) also promoted ISO/IEC 23001-11 (energy-efficient media consumption (green metadata)) Amendment 1 to Final Draft Amendment (FDAM). This amendment introduces energy-efficient media consumption (green metadata) for Essential Video Coding (EVC) and defines metadata that enables a reduction in decoder power consumption. At the same time, ISO/IEC 23001-11 Amendment 2 has been promoted to the Committee Draft Amendment (CDAM) stage of development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025.

Finally, MPEG Systems (WG 3) promoted ISO/IEC 23090-18 (carriage of geometry-based point cloud compression data) Amendment 1 to Final Draft Amendment (FDAM). This amendment enables a single elementary stream of point cloud data, compressed using ISO/IEC 23090-9 (geometry-based point cloud compression), to be stored in more than one track of ISO Base Media File Format (ISOBMFF)-based files. This enables support for applications that require multiple frame rates within a single file and introduces a track grouping mechanism to indicate multiple tracks, each carrying a specific temporal layer of a single elementary stream separately.
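The general idea of temporal scalability described above can be sketched in a few lines. The example below is purely conceptual: it assumes a dyadic layer assignment and uses hypothetical names (`Frame`, `temporal_id`, `split_into_tracks`, `frames_for_playback`); it does not reproduce the file format syntax or the track grouping mechanism defined in ISO/IEC 23090-18.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int     # position of the frame in the single elementary stream
    layer: int     # temporal layer the frame is assigned to (assumption)


def temporal_id(index, num_layers):
    """Dyadic assignment (an assumption of this sketch): with 3 layers,
    every 4th frame is layer 0, remaining even frames are layer 1, and
    odd frames are layer 2."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    for layer in range(num_layers):
        if index % (1 << (num_layers - 1 - layer)) == 0:
            return layer
    raise AssertionError("unreachable: the last layer accepts every index")


def split_into_tracks(num_frames, num_layers):
    """Store one stream as one list of frames ('track') per temporal layer."""
    tracks = {layer: [] for layer in range(num_layers)}
    for index in range(num_frames):
        layer = temporal_id(index, num_layers)
        tracks[layer].append(Frame(index, layer))
    return tracks


def frames_for_playback(tracks, max_layer):
    """Merge the tracks up to max_layer back into presentation order."""
    selected = [f for layer, frames in tracks.items()
                if layer <= max_layer for f in frames]
    return sorted(selected, key=lambda f: f.index)


tracks = split_into_tracks(num_frames=8, num_layers=3)
half_rate = [f.index for f in frames_for_playback(tracks, max_layer=1)]
full_rate = [f.index for f in frames_for_playback(tracks, max_layer=2)]
print(half_rate)
print(full_rate)
```

A player reading only the lower layers obtains a reduced frame rate from the same file, while a player reading all tracks reconstructs the complete stream.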

Research aspects: MPEG Systems usually provides standards on top of existing compression standards, enabling efficient storage and delivery of media data (among others). Researchers may use these standards (including reference software and conformance bitstreams) to conduct research in the general area of multimedia systems (cf. ACM MMSys) or, specifically, on green multimedia systems (cf. ACM GMSys).

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below with only minor updates compared to the last meeting.

MPEG-DASH Status, October 2023.

In particular, the 6th edition of MPEG-DASH is scheduled for 2024 but may not include all amendments under development. An overview of existing amendments can be found in the column from the last meeting. Current amendments have been (slightly) updated and progressed toward completion in the upcoming meetings. The signaling of haptics in DASH has been discussed and accepted for inclusion in the Technologies under Consideration (TuC) document. The TuC document comprises candidate technologies for possible future amendments to the MPEG-DASH standard and is publicly available here.

Research aspects: MPEG-DASH has been heavily researched in the multimedia systems, quality, and communications research communities. Adding haptics to MPEG-DASH would provide another dimension worth considering within research, including, but not limited to, performance aspects and Quality of Experience (QoE).

The 145th MPEG meeting will be online from January 22-26, 2024. Click here for more information about MPEG meetings and their developments.

JPEG Column: 100th meeting in Covilhã, Portugal

JPEG AI reaches Committee Draft stage at the 100th JPEG meeting

The 100th JPEG meeting was held in Covilhã, Portugal, from July 17th to 21st, 2023. At this meeting, in addition to its usual standardization activities, the JPEG Committee organized a celebration on the occasion of its 100th meeting. This face-to-face meeting, the second after the pandemic, had a record level of in-person participation, with more than 70 experts attending the meeting in person.

Several activities reached important milestones. JPEG AI became a committee draft after intensive meeting sessions with detailed analysis of the core experiment results and multiple evaluations of the considered technologies. JPEG NFT issued a call for proposals, and the first JPEG XE use cases and requirements document was also issued publicly. Furthermore, JPEG Trust has made major steps towards its standardization.

The 100th JPEG meeting had the following highlights:

  • JPEG Celebrates its 100th meeting;
  • JPEG AI reaches Committee Draft;
  • JPEG Pleno Learning-based Point Cloud coding improves its Verification Model;
  • JPEG Trust develops its first part, the “Core Foundation”;
  • JPEG NFT releases the Final Call for Proposals;
  • JPEG AIC-3 initiates the definition of a Working Draft;
  • JPEG XE releases the Use Cases and Requirements for Event-based Vision;
  • JPEG DNA defines the evaluation of the responses to the Call for Proposals;
  • JPEG XS proceeds with the development of the 3rd edition;
  • JPEG Systems releases a Reference Software.

The following sections summarize the main highlights of the 100th JPEG meeting.

JPEG Celebrates its 100th meeting

The JPEG Committee organized a celebration of its 100th meeting. A ceremony took place on July 19, 2023 to mark this important milestone. The JPEG Convenor initiated the ceremony, followed by a speech from Prof. Carlos Salema, founder and former chair of the Instituto de Telecomunicações and current vice president of the Lisbon Academy of Sciences, and a welcome note from Prof. Silvia Socorro, vice-rector for research at the University of Beira Interior. Personalities from standardization organizations ISO, IEC and ITU, as well as the Portuguese government, sent welcome addresses in the form of recorded videos. Furthermore, a collection of short video addresses from past and current JPEG experts was compiled and presented during the ceremony. The celebration was preceded by a workshop on “Media Authenticity in the Age of Artificial Intelligence”. Further information on the workshop and its proceedings is accessible on jpeg.org. A social event followed the celebration ceremony.

The 100th meeting celebration and cake.

100th meeting Social Event.

JPEG AI

The JPEG AI (ISO/IEC 6048) learning-based image coding system has completed the Committee Draft of the standard. The current JPEG AI Verification Model (VM) has two operation points, called base and high, which include several tools that can be enabled or disabled without re-training the neural network models. The base operation point is a subset of the design elements of the high operation point. The lowest configuration (base operation point without tools) provides 8% rate savings over the VVC Intra anchor, with decoding that is twice as fast and an encoder run time that is 250 times faster on CPU. In the most powerful configuration, the current VM achieves a 29% compression gain over the VVC Intra anchor.

The performance of the JPEG AI VM 3 was presented and discussed during the 100th JPEG meeting. The findings of the 15 core experiments created during the previous 99th JPEG meeting, as well as other input contributions, were discussed and investigated. This effort resulted in the reorganization of many syntactic parts with the goal of their simplification, as well as the use of several neural networks and tools, namely some design simplifications and post filtering improvements. Furthermore, coding efficiency was increased at high quality up to visually lossless, and region-of-interest quality enhancement functionality, as well as bit-exact repeatability, were added among other enhancements. The attention mechanism for the high operation point is the most significant change, as it considerably decreases decoder complexity. The entropy decoding neural network structure is now identical for the high and base operation points. The defined analysis and synthesis transforms enable efficient coding from high quality to near visually lossless and the chroma quality has been improved with the use of novel enhancement filtering technologies.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at the 100th meeting with a major improvement to its Verification Model (VM) incorporating a sparse convolutional framework providing improved quality with a more efficient computational model. In addition, an exciting new application was demonstrated showing the ability of the JPEG VM to support point cloud classification. The 100th JPEG Meeting also saw the release of a new point cloud test set to better support this activity. Prior to the 101st JPEG meeting in October 2023, JPEG experts will investigate possible advancements to the VM in the areas of attention models, voxel pruning within sparse tensor convolution, and support for residual lossless coding. In addition, a major Exploration Study will be conducted to explore the latest point cloud quality metrics.

JPEG Trust

The JPEG Committee is expediting the development of the first part, the “Core Foundation”, of its new international standard: JPEG Trust. This standard defines a framework for establishing trust in media, and addresses aspects of authenticity and provenance through secure and reliable annotation of media assets throughout their life cycle. JPEG Trust is being built on its 2022 Call for Proposals, whose responses form the basis of the framework under development.

The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website at jpeg.org for the latest information and reach out to the contacts listed below to subscribe to the JPEG Trust mailing list.

JPEG NFT

Non-Fungible Tokens (NFTs) are an exciting new way to create and trade media assets, and have seen increasing interest from global markets. NFTs promise to impact the trading of artworks, collectible media assets, micro-licensing, gaming, ticketing and more. At the same time, concerns about interoperability between platforms, intellectual property rights, and fair dealing must be addressed.

JPEG is pleased to announce a Final Call for Proposals on JPEG NFT to address these challenges. The Final Call for Proposals on JPEG NFT and the associated Use Cases and Requirements for JPEG NFT document can be downloaded from the jpeg.org website. JPEG invites interested parties to register their proposals by 2023-10-23. The final deadline for submission of full proposals is 2024-01-15.

JPEG AIC

During the 100th JPEG meeting, the AIC activity continued its efforts on the Core Experiments, which aim at collecting fundamental information on the performance of the contributions received in April 2023 in response to a Call for Contributions on Subjective Image Quality Assessment. These results will be considered during the design of the AIC-3 standard, which has been carried out in a collaborative way since its beginning. The activity also initiated the definition of a Working Draft for AIC-3.

Other activities are also planned to initiate the work on a Draft Call for Proposals on Objective Image Quality Metrics (AIC-4) during the 101st JPEG meeting in October 2023. The JPEG Committee invites interested parties to take part in the discussions and drafting of the Call.

JPEG XE

For the Event-based Vision exploration, called JPEG XE, the JPEG Committee finalized a first version of a Use Cases and Requirements for Event-based Vision v0.5 document. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE concerns the creation and development of a standard to represent events in an efficient way, allowing interoperability between sensing, storage, and processing, and targeting machine vision and other relevant applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The new Use Cases and Requirements document is the first version to become publicly available and serves mainly to attract interest from external experts and other standardization organizations. Although the document is still in a preliminary version, the JPEG Committee continues to invest effort in refining it, so that it can serve as a solid basis for further standardization. An Ad-Hoc Group has been re-established to work on this topic until the 101st JPEG meeting in October 2023. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.
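To make the notion of an event more concrete, the following Python sketch shows one possible in-memory representation and a toy way of generating events. It is an illustration only and does not reflect the JPEG XE format, which is still under exploration. The `Event` fields (pixel position, timestamp, polarity) follow the output commonly produced by event-based sensors, while the function name, the frame-differencing approach and the threshold value are assumptions made for this example. Real sensors emit events asynchronously per pixel rather than by comparing frames.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One event: where and when a change was observed, and in which direction."""
    x: int
    y: int
    timestamp_us: int
    polarity: int  # +1 if brightness increased, -1 if it decreased


def events_from_frames(previous, current, timestamp_us, threshold=10) -> List[Event]:
    """Emit an event for every pixel whose brightness changed by at least `threshold`.

    `previous` and `current` are lists of rows of integer brightness values.
    Frame differencing is a simplification used here for illustration only.
    """
    events = []
    for y, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for x, (prev_value, cur_value) in enumerate(zip(prev_row, cur_row)):
            delta = cur_value - prev_value
            if abs(delta) >= threshold:
                events.append(Event(x, y, timestamp_us, 1 if delta > 0 else -1))
    return events
```

Only pixels that change produce data, which is why a static scene yields an empty event list in this sketch.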

JPEG DNA

The JPEG Committee has been exploring coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is to create a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers.

At the 100th JPEG meeting, the document “Additions to the JPEG DNA Common Test Conditions version 2.0” was produced. It supplements the “JPEG DNA Common Test Conditions” by specifying a new constraint to be taken into account when coding images in quaternary representation. In addition, the detailed procedures for evaluation of the pre-registered responses to the JPEG DNA Call for Proposals were defined.
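As a rough illustration of what a quaternary representation means, the Python sketch below maps each byte to four base-4 symbols written as nucleotides. This is not the JPEG DNA codec or any of the proposals under evaluation. The digit-to-nucleotide mapping and all function names are assumptions made for this example. The run-length check illustrates one constraint often discussed for DNA storage (avoiding long repeats of the same nucleotide); the specific constraints in the Common Test Conditions are not reproduced here.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping of base-4 digits: 0->A, 1->C, 2->G, 3->T


def bytes_to_quaternary(data: bytes) -> str:
    """Write each byte as four nucleotides (2 bits per nucleotide, high bits first)."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)


def quaternary_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_quaternary; the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(base)
        out.append(value)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest
```

A naive mapping like this one turns a zero byte into `AAAA`, which shows why a practical codec has to take such constraints into account during encoding rather than check them afterwards.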

Furthermore, the next steps towards a deployed high-performance standard were discussed and defined. In particular, it was decided to request new work item approval once the Committee Draft stage has been reached.

The JPEG-DNA AHG has been re-established to work on the preparation of assessment and crosschecking of responses to the JPEG DNA Call for Proposals until the 101st JPEG meeting in October 2023.

JPEG XS

The JPEG Committee continued its work on the JPEG XS 3rd edition. The main goal of the 3rd edition is to reduce the bitrate for on-screen content by half while maintaining the same image quality.

Part 1 of the standard – Core coding tools – is still under Draft International Standard (DIS) ballot. For Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – the Committee Draft (CD) circulation results were processed and the DIS ballot document was created. In Part 2, three new profiles have been added to better adapt to the needs of the market. In particular, two profiles are based on the High 444.12 profile, but introduce some useful constraints on the wavelet decomposition structure and disable the column modes entirely. This makes the profiles easier to implement (with lower resource usage and fewer options to support) while remaining consistent with the way JPEG XS is already being deployed in the market today. Additionally, the two new High profiles are further constrained by explicit conformance points (like the new TDC profile) to better support market interoperability. The third new profile is called TDC MLS 444.12 and enables mathematically lossless quality. It is intended, for example, for medical applications, where a truly lossless reconstruction might be required.

Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.

JPEG Systems

At the 100th meeting, the JPEG Committee produced the CD text of 19566-10, the JPEG Systems Reference Software. In addition, a JPEG white paper was released that provides an overview of the entire JPEG Systems standard. The white paper can be downloaded from the jpeg.org website.

Final Quote

“The JPEG Committee celebrated its 100th meeting, an important milestone considering the current success of JPEG standards. This celebration was enriched with significant achievements at the meeting, notably the release of the Committee Draft of JPEG AI,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Overview of Benchmarking Platforms and Software for Multimedia Applications

At a time when Artificial Intelligence (AI) continues to push the boundaries of what was previously thought possible, the demand for benchmarking platforms that allow AI models to be assessed and evaluated fairly has become paramount. These platforms serve as connecting hubs between data scientists, machine learning specialists, industry partners, and other interested parties. They mostly function under the Evaluation-as-a-Service (EaaS) paradigm [1], the idea that participants in a given benchmarking task should be able to test the output of their systems under similar conditions, by being provided with a common definition of the targeted concepts, datasets and data splits, metrics, and evaluation tools. These common elements are provided through online platforms that can even offer Application Programming Interfaces (APIs) or container-level integration of the participants’ AI models. This column provides an insight into these platforms, looking at their main characteristics, use cases, and particularities. In the second part of the column we will also look into some of the main benchmarking platforms that are geared towards handling multimedia-centric benchmarks and datasets relevant to SIGMM.

Defining Characteristics of EaaS platforms

Benchmarking competitions and initiatives, and EaaS platforms, attempt to tackle a number of key points in the development of AI algorithms and models, namely:

  • Creating a fair and impartial evaluation environment, by standardizing the datasets and evaluation metrics used by all participants in an evaluation competition. In doing so, EaaS platforms play a pivotal role in promoting transparency and comparability in AI models and approaches.
  • Enhancing reproducibility by giving the option to run the AI models on dedicated servers provided and managed by competition organizers. This increases the trust and bolsters the integrity of the results produced by competition participants, as the organizers are able to closely monitor the testing process for each individual AI model. 
  • Fostering, as a natural consequence, a higher degree of data privacy, as participants could be given access only to training data, while testing data is kept private and is only accessed via APIs on the dedicated servers, reducing the risk of data exposure.
  • Creating a common repository for sharing the data and details of a benchmarking task, building a history not only of the results of the benchmarking tasks throughout the years, but also of the evolution of the types of approaches and models used by participants. Other interesting features, like forums and discussion threads on competitions, allow new participants to quickly search for problems they encounter and hopefully resolve their issues faster.

Given these common goals, benchmarking platforms usually integrate a set of common features and user-level functionalities that are summed up in this section and grouped into three categories: task organization and scheduling, scoring and reproducibility, and communication and dissemination.

Task organization and scheduling. The platforms allow the creation, modification and maintenance of benchmarking tasks, either through a graphical user interface (GUI) or by using task bundles (most commonly using JSON, XML, Python or custom scripting languages). Competition organizers can define their task, and define sub-tasks that may explore different facets of the targeted data. Scheduling is another important feature in benchmarking competition creation: some parts of the data may be kept private until a certain moment in time, and competition organizers can hide the results of other teams until a certain point in time. We consider the last point an important one, as participants may feel discouraged from continuing their participation if their initial results are not high enough compared with those of other participants. Another noteworthy feature is run quantity management, which allows organizers to specify a maximum number of allowed runs per participant during the benchmarking task. This limitation discourages participants from attempting to solve the given tasks with brute-force approaches, where they implement a large number of models and model variations. As a result, participants are incentivized to delve deeper into the data, critically analyzing why certain methods succeed and others fall short.
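The scheduling and run quantity features described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the configuration format or API of any particular platform: the class name, field names and dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TaskSchedule:
    """Hypothetical task settings; all names are illustrative only."""
    test_data_release: datetime    # test data stays private before this time
    leaderboard_release: datetime  # other teams' results stay hidden before this time
    max_runs_per_team: int         # run quantity limit set by the organizers
    runs_submitted: dict = field(default_factory=dict)

    def test_data_visible(self, now: datetime) -> bool:
        return now >= self.test_data_release

    def leaderboard_visible(self, now: datetime) -> bool:
        return now >= self.leaderboard_release

    def register_run(self, team: str) -> bool:
        """Accept a run only while the team is still under its run limit."""
        used = self.runs_submitted.get(team, 0)
        if used >= self.max_runs_per_team:
            return False
        self.runs_submitted[team] = used + 1
        return True


schedule = TaskSchedule(
    test_data_release=datetime(2024, 1, 10),
    leaderboard_release=datetime(2024, 2, 1),
    max_runs_per_team=2,
)
accepted = [schedule.register_run("team_a") for _ in range(3)]
print(accepted)  # [True, True, False]
```

Keeping the limit per team rather than per user is a design choice of this sketch; a platform could equally count runs per account or per sub-task.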

Scoring and reproducibility. EaaS platforms generally deploy two paradigms, sometimes side by side, with regard to AI model testing and results generation [1, 2]: the Data-to-Algorithm (D2A) approach, and the Algorithm-to-Data (A2D) approach. The former refers to competitions where participants must download the testing set, run the prediction systems on their own machines, and provide the predictions to the organizers, usually in CSV format for the multimedia domain. In this setup, the ground truth data for the testing set is kept private, and after the organizers receive the prediction result files, they communicate the performance to the participants, or the results are automatically computed by the platform using organizer-provided scripts once the files are uploaded. The A2D approach, on the other hand, is more complex, may incur additional financial costs, and may be more time-consuming for both organizers and task participants, but increases the trustworthiness and reproducibility of the task and of the AI models themselves. In this setup, organizers provide cloud-based computing resources via Virtual Machines (VMs) and containers, and a common processing pipeline or API that competitors must integrate in their source code. The participants develop the wrappers that integrate their AI models accordingly, and upload the model to the EaaS platform directly. The AI models are then executed according to the common pipeline and results are automatically provided to the participants, while also allowing the testing data to be kept completely private. Traditionally, in order to achieve this, EaaS platforms offer the possibility of integration with cloud computing platforms like Amazon AWS, Microsoft Azure, or Google Cloud, and offer Docker integration for the creation of containers where the code can be hosted.
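To illustrate the D2A flow, the following Python sketch shows what a simple organizer-side scoring script might look like. It is an assumption-laden example rather than the script of any specific platform: the CSV layout (a header line with `id` and `label` columns), the choice of accuracy as the metric, and the function names are all hypothetical.

```python
import csv
import io


def read_labels(csv_text: str) -> dict:
    """Parse 'id,label' rows (with a header line) into an id -> label mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row["label"] for row in reader}


def accuracy(predictions_csv: str, ground_truth_csv: str) -> float:
    """Fraction of ground-truth items whose predicted label is correct.

    Items missing from the submission count as errors, so a participant
    cannot raise the score by leaving out difficult samples.
    """
    truth = read_labels(ground_truth_csv)
    predictions = read_labels(predictions_csv)
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1 for item_id, label in truth.items() if predictions.get(item_id) == label
    )
    return correct / len(truth)


# The ground truth stays with the organizers; participants upload only predictions.
ground_truth = "id,label\nimg1,cat\nimg2,dog\nimg3,cat\nimg4,bird\n"
submission = "id,label\nimg1,cat\nimg2,cat\nimg3,cat\n"
print(accuracy(submission, ground_truth))  # 0.5
```

In an A2D setting the same scoring function could run on the organizers' infrastructure, with the predictions produced there by the participant's containerized model instead of being uploaded as a file.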

Communication and dissemination. EaaS platforms allow interaction between competition organizers and participants, either through emails, automatic notifications, or forums where interested parties can exchange ideas, ask questions, offer help, or signal potential problems in the data or scripts associated with the tasks.

Popular multimedia EaaS platforms

This section presents some of the most popular benchmarking platforms aimed at the multimedia domain. We will present some key features and associated popular multimedia datasets for the following platforms: Kaggle, AIcrowd, Codabench, DrivenData, and EvalAI.

Kaggle is perhaps the most popular benchmarking platform at the moment, and goes beyond providing datasets and benchmarking competitions by also hosting AI models, courses, and source code repositories. Competition organizers can design their tasks under either the D2A or the A2D paradigm, giving participants the possibility of integrating their AI models in Jupyter Notebooks for reproducibility. The platform also gives the option of allotting CPU and GPU cloud-based resources for A2D competitions. The Kaggle repository offers code for a large number of additional competition management tools and communication APIs. Among an impressive number of datasets and competitions, Kaggle currently hosts competitions that use the original MNIST data [3] and MNIST-like datasets such as Fashion-MNIST [4], as well as datasets on varied subjects ranging from sentiment analysis in social media [5] to medical image processing [6].

AIcrowd is an open source EaaS platform for open benchmarking challenges that emphasizes connections and collaborative work between data science and machine learning experts. This platform offers the source code for command line interface (CLI) and API clients that can interact with AIcrowd servers. ImageCLEF, hosted on AIcrowd between 2018 and 2022 [7–11], is one of the most popular multimedia benchmarking initiatives on the platform, featuring diverse multimedia topics such as lifelogging, medical image processing, image processing for environment health prediction, the analysis of social media dangers with regard to image sharing, and ensemble learning for multimedia data.

Codabench, launched in August 2023, and its precursor CodaLab are two open source benchmarking platforms that provide a large number of options, including A2D and D2A approaches, as well as “inverted benchmarks”, where organizers provide the reference algorithms and participants contribute the datasets. Standouts among the challenges currently running on this platform are the two Quality-of-Service-oriented challenges on audio-video synchronization error detection and error measurement, which are part of the 3rd Workshop on Image/Video/Audio Quality in Computer Vision and Generative AI at the Winter Conference on Applications of Computer Vision – WACV2024.

DrivenData targets the intersection of data science and social impact. This platform hosts competitions that integrate the social aspect of their domain of interest directly in their mission and definition, while also hosting a number of open-source projects and competition-winning AI models. Given its emphasis on social impact, this platform hosts a number of benchmarking challenges that target social issues like the detection of hateful memes [12] and image-based nature conservation efforts.

EvalAI is another open source platform that is able to create A2D and D2A competition environments, while also integrating optimization steps that allow for evaluation code to run faster on multi-core cloud infrastructure. The EvalAI platform holds many diverse multimedia-centric competitions, including image segmentation tasks based on LVIS [13] and a wide range of sport tasks [14].

Future directions, developments and other tools

While the tools and platforms described in the previous section represent just a portion of the EaaS platforms currently online in the research community, we would also like to mention some projects that are currently in the development stage or that can be considered additional tools for benchmarking initiatives:

  • The AI4Media benchmarking platform is currently in the prototype and development stage. Among the most interesting features and ideas promoted by its developers is the creation of complexity metrics that would help competition organizers understand the computational efficiency and resource requirements of the submitted systems.
  • BenchmarkSTT started as a specialized benchmarking platform for speech-to-text, but is now evolving in different directions, including facial recognition in videos.
  • The PapersWithCode platform, while not a benchmarking platform per se, is useful as a repository that collects the results of AI models on datasets throughout the years and groups different datasets studying the same concepts under the same umbrella (e.g., Image Classification, Object Detection, Medical Image Segmentation), while also providing links to scientific papers, GitHub implementations of the models, and the datasets themselves. This may represent a good starting point for young researchers who are trying to understand the history and state of the art of certain domains and applications.

Conclusions

Benchmarking platforms represent a key component of benchmarking, pushing for fairness and trustworthiness in AI model comparison, while also providing tools that may foster reproducibility in AI. We are happy to see that many of the platforms discussed in this article are open source, or have open source components, thus allowing interested scientists to create their own custom implementations of these platforms, and to adapt them when necessary to their particular fields.

Acknowledgements

The work presented in this column is supported under the H2020 AI4Media “A European Excellence Centre for Media, Society and Democracy” project, contract #951911.

References

[1] Hanbury, A., Müller, H., Balog, K., Brodt, T., Cormack, G. V., Eggel, I., Gollub, T., Hopfgartner, F., Kalpathy-Cramer, J., Kando, N., Krithara, A., Lin, J., Mercer, S. & Potthast, M. (2015). Evaluation-as-a-service: Overview and outlook. arXiv preprint arXiv:1512.07454.
[2] Hanbury, A., Müller, H., Langs, G., Weber, M. A., Menze, B. H., & Fernandez, T. S. (2012). Bringing the algorithms to the data: cloud–based benchmarking for medical image analysis. In Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics: Third International Conference of the CLEF Initiative, CLEF 2012, Rome, Italy, September 17-20, 2012. Proceedings 3 (pp. 24-29). Springer Berlin Heidelberg.
[3] LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
[4] Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
[5] Niu, T., Zhu, S., Pang, L., & El Saddik, A. (2016). Sentiment analysis on multi-view social data. In MultiMedia Modeling: 22nd International Conference, MMM 2016, Miami, FL, USA, January 4-6, 2016, Proceedings, Part II 22 (pp. 15-27). Springer International Publishing.
[6] Thambawita, V., Hicks, S. A., Storås, A. M., Nguyen, T., Andersen, J. M., Witczak, O., … & Riegler, M. A. (2023). VISEM-Tracking, a human spermatozoa tracking dataset. Scientific Data, 10(1), 1-8.
[7] Ionescu, B., Müller, H., Villegas, M., García Seco de Herrera, A., Eickhoff, C., Andrearczyk, V., … & Gurrin, C. (2018). Overview of ImageCLEF 2018: Challenges, datasets and evaluation. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: 9th International Conference of the CLEF Association, CLEF 2018, Avignon, France, September 10-14, 2018, Proceedings 9 (pp. 309-334). Springer International Publishing.
[8] Ionescu, B., Müller, H., Péteri, R., Dang-Nguyen, D. T., Piras, L., Riegler, M., … & Karampidis, K. (2019). ImageCLEF 2019: Multimedia retrieval in lifelogging, medical, nature, and security applications. In Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part II 41 (pp. 301-308). Springer International Publishing.
[9] Ionescu, B., Müller, H., Péteri, R., Dang-Nguyen, D. T., Zhou, L., Piras, L., … & Constantin, M. G. (2020). ImageCLEF 2020: Multimedia retrieval in lifelogging, medical, nature, and internet applications. In Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II 42 (pp. 533-541). Springer International Publishing.
[10] Ionescu, B., Müller, H., Péteri, R., Abacha, A. B., Demner-Fushman, D., Hasan, S. A., … & Popescu, A. (2021). The 2021 ImageCLEF Benchmark: Multimedia retrieval in medical, nature, internet and social media applications. In Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28–April 1, 2021, Proceedings, Part II 43 (pp. 616-623). Springer International Publishing.
[11] de Herrera, A. G. S., Ionescu, B., Müller, H., Péteri, R., Abacha, A. B., Friedrich, C. M., … & Dogariu, M. (2022, April). Imageclef 2022: multimedia retrieval in medical, nature, fusion, and internet applications. In European Conference on Information Retrieval (pp. 382-389). Cham: Springer International Publishing.
[12] Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Fitzpatrick, C. A., … & Parikh, D. (2021, August). The hateful memes challenge: Competition report. In NeurIPS 2020 Competition and Demonstration Track (pp. 344-360). PMLR.
[13] Gupta, A., Dollar, P., & Girshick, R. (2019). Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364).
[14] Giancola, S., Cioppa, A., Deliège, A., Magera, F., Somers, V., Kang, L., … & Li, Z. (2022, October). SoccerNet 2022 challenges results. In Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports (pp. 75-86).

Report from CBMI 2023


The 20th International Conference on Content-based Multimedia Indexing (CBMI) was held exclusively as an in-person event in Orleans, France, on September 20-22, 2023. The conference was organized by the University of Orleans and received support from SIGMM. This edition marked a significant milestone as it was the first fully physical conference following the pandemic, providing a welcome opportunity for face-to-face interactions. The event drew a diverse and international audience, with between 70 and 80 attendees representing 18 countries (12 European, 4 Asian, 1 American and 1 African). Additionally, the conference included a European meeting (CHIST-ERA XAIface project) associated with the main event, which brought together approximately 15 individuals. Furthermore, several engineering students from the University of Orleans were invited to participate, allowing them to gain insights into cutting-edge multimedia research and exchange knowledge and ideas.

Program highlights

The conference was structured around two keynote presentations. The first keynote was presented by Prof. Alberto del Bimbo from the University of Florence, who spoke on the topic of “AI-Powered Personal Fashion Advising.” During his talk, Prof. del Bimbo discussed the key tasks and challenges related to using artificial intelligence in the fashion advisory field.

The closing keynote was delivered by Prof. Nicolas Hervé from the Institut National de l’Audiovisuel (French National Audiovisual Archive). Prof. Hervé highlighted the research activities conducted at Ina and how they could be integrated into information systems and enhance the value of their collections. His presentation provided insights into the practical applications of their work.

Presentation of our keynote speakers.

In conjunction with the presentation of 18 papers across four regular paper sessions, the 2023 conference adhered to the established tradition of previous editions by incorporating special sessions. These special sessions were designed to delve into the practical applications of multimedia indexing within specific domains or distinctive settings. This approach allowed for a more focused and in-depth exploration of several topics, offering valuable insights and discussions beyond the regular paper sessions.

This year, we received a substantial volume of submissions, culminating in the approval of six special sessions. Together, these special sessions comprised a total of 25 accepted papers.

  • Cultural Heritage and Multimedia Content
  • Interactive Video Retrieval for Beginners (IVR4B)
  • Physical Models and AI in Image and in Multi-modality 
  • Computational Memorability of Imagery
  • Cross-modal multimedia analysis and retrieval for well-being insights
  • Explainability in Multimedia Analysis (ExMA)

The coordination of these special sessions involved the collaborative efforts of multiple countries, including France, Austria, Ireland, Iceland, the UK, Romania, Japan, Norway, and Vietnam.

The special sessions encompassed a diverse range of multimedia topics, spanning from applications such as cultural heritage preservation and retrieval to machine learning, with a particular focus on facets like explainability and the utilization of physical models.

The conference program was complemented by a poster session composed of fourteen posters. The poster session was followed by a demo session that comprised the IVR4B video retrieval competition.

Participants at the poster session.
Participants at the demo session.

The best paper of the conference was awarded EUR 500, generously sponsored by ACM SIGMM. The selection committee quickly reached a consensus to give the best paper award to Romain Xu-Darme, Jenny Benois-Pineau, Romain Giot, Georges Quénot, Zakaria Chihani, Marie-Christine Rousset and Alexey Zhukov for their paper “On the stability, correctness, and plausibility of visual explanation methods based on feature importance”.

Social events

In addition to the two conference dinners organized by the conference committee, the participants had the opportunity to enjoy a guided tour through Orleans on their way to the first restaurant.

Participants enjoyed the first dinner after the guided tour

Among the social events organized during CBMI 2023 was the Music meets Science concert, held with the support of ACM SIGMM. After a series of scientific presentations, participants were able to appreciate the works of Beethoven, Murphy and Lizee. We thank ACM SIGMM for their support, which made this cultural event possible.

The Odyssée Quartet, composed of François Pineau-Benois (violinist), Raphael Moraly (cellist), Olivier Marin (violist) and Audrey Sproule (violinist).

Outlook

The next edition of CBMI will be organized in Iceland. After several hybrid editions, we have returned to a fully on-site format, moving back towards pre-pandemic levels.

Equity, Diversity and Inclusion at ACM MMSys 2023


The 14th ACM Multimedia Systems Conference (MMSys 2023) took place on June 7-10, 2023 in Vancouver, Canada. To continue the significant efforts of recent years, and building on the strong commitment of the MMSys community to create a diverse, inclusive and accessible forum to discuss advancements in the area of multimedia systems and the technology experiences they enable, several EDI measures were adopted. The main goals were (1) to raise awareness around the importance of diversity and inclusion for both the MMSys community and the research fields represented at MMSys and (2) to enable diverse participation and inclusion of underrepresented groups. In this column, we provide a brief overview of the main EDI activities and some key figures, as well as short testimonials from two participants.

Support and activities

Associate Professor Yvette Wohn giving her EDI keynote on “Moderating the Metaverse”

With funding for special initiatives from the ACM Special Interest Group on Multimedia (SIGMM) and ACM, the support provided at MMSys 2023 included the following:

1. EDI Keynote Speech
We invited Dr. Yvette Wohn to give a keynote speech on “Moderating the Metaverse”. Dr. Wohn (she/her) is an associate professor of Informatics at New Jersey Institute of Technology and director of the Social Interaction Lab. Her research is in the area of Human-Computer Interaction (HCI), where she studies the characteristics and consequences of social interactions in online environments such as virtual worlds and social media. Yvette’s keynote speech was very well received and ignited conversations during the conference.
Abstract of the talk: Online harassment is a problem that we still have been unable to solve in the social media age of Web 2.0. As we move deeper into Web 3.0, which includes 3D virtual worlds, moderation moves beyond content to include behavioral components such as embodied interactions. How do we design these systems to be creative and generative while maintaining safety and equity? This talk will discuss the challenges and opportunities, both social and technical, in creating the next wave of networked multimedia systems.

2. EDI Luncheon & Challenge
Our goal for the luncheon and challenge was to pick a topic that would spark conversations during lunch: one that is engaging enough for the whole audience, that everyone can have an opinion on (opinions that can then be challenged in conversation), and whose answers can give us some insight into our audience and their take on EDIJ issues.
The questions were: 

  • What is the biggest diversity issue that you think can affect YOU in the metaverse?
  • What is the simplest, yet most practical solution you can think for this problem?

After the initial announcement and presentation, example scenarios and conversation icebreakers were printed and placed on the break and lunch tables, and volunteers encouraged conversations so that attendees would discuss the questions over lunch and submit their solutions. The rubric used for selecting the winner of this challenge was:

  • Problem (15 pts): Explorative Value, Importance, Scale of effect
  • Solution Quality (15 pts): Feasibility, Simplicity, Effectiveness
  • Each item was rated on a scale of 0 to 5: not meeting requirements: 0, minimal: 1, acceptable: 2, good: 3, very good: 4, excellent: 5.

We received 14 entries by the given deadline. Of the two entries that scored 28 points, the one by Dr. Sylvie Dijkstra-Soudarissanane was selected as the winner of the EDI Challenge; her response discussed the inaccurate representation of dark skin tones caused by the inherent design of 3D capture devices such as LIDARs. Sylvie wrote a short testimonial (see below).

3. Additional EDI Activities
EDI Considerations in Conference Name Tags
Preferred pronouns were shown on the name tags to foster a healthier and more inclusive space that is safe and respectful for all attendees. In addition, the following were explicitly indicated on the name tags:

  1. Diversity Advocate: To show we are proud of diversity and inclusion efforts, and we acknowledge and foster the enthusiasm for this important work.
  2. First-Timer: To easily find people who might not be familiar with the community to provide them further help and support, if needed.

Childcare Support
Due to financial uncertainty, we were not able to announce the availability of childcare support funds before the conference. An earlier announcement would have helped people with children to plan better and would have ensured that we support everyone with such needs equally. We were nevertheless able to support a presenter who had planned childcare during the conference. For next year’s edition of MMSys, we strongly encourage making dedicated funds available well ahead of the conference, so that equal opportunities to attend can be offered to caregivers.

EDI Volunteer Support
While most of our student volunteers were Vancouver-based and were supported with a free registration to the conference, one additional student volunteer, who travelled to Vancouver and would otherwise not have been able to attend, was supported by the EDI chairs. His testimonial can be read below.

Key numbers

  • Two out of four keynote speakers for MMSys 2023 were women (50%).
  • One out of three Technical Program Chairs was a woman (33%).
  • Nine out of 25 organizing committee members were women (36%).
  • Four out of fourteen sessions of the main conference (seventeen including parallel sessions) were chaired by women (24%), and three out of four workshop chairs were women (75%).

Jinwei Zhao’s badge, illustrating several measures to make attendees feel welcome and included (e.g., self-selected preferred pronouns, a diversity advocate indication, and a first-time-at-MMSys indication, so that other attendees can make sure that people new to the conference are warmly welcomed and included).

Testimonials

Testimonial by Jinwei Zhao, Student Volunteer supported by the MMSys 2023 EDI chairs

“I was honored to be able to attend ACM MMSys 2023 in Vancouver as a student volunteer, an experience that afforded me a breadth of professional engagements. My responsibilities as a student volunteer encompassed assisting with the registration process and the assistance of technical sessions and workshops, thereby ensuring a seamless execution of the conference. It also gave me the invaluable opportunity to engage with distinguished researchers and talented PhD students in the multimedia community, facilitating a rich exchange of brilliant and novel ideas. The keynotes and technical sessions at the conference shed light on cutting-edge developments and emerging trends in the field of multimedia systems. These included advanced adaptive video bitrate algorithms, the integration of multimedia systems with next-generation networks like Starlink, the development of new protocols such as multipath QUIC and Media-Over-QUIC, and the future of immersive technologies in AR, VR, and XR domains. Additionally, I was deeply appreciative of receiving the ACM SIGMM MMSys Volunteer Honorarium after the conference. Although I did not have the occasion to present my research at MMSys 2023, the passion and dedication of my peers served as a catalyst for my further contributions to the field. This engagement was evidently fruitful and advantageous, as it led to the acceptance of my paper for presentation at MMSys 2024 next year. This experience also encouraged me to make further contributions more actively to the multimedia community, aligning with my decision to embark on a PhD program starting in 2024.”

Testimonial by Sylvie Dijkstra-Soudarissanane, MMSys 2023 attendee and winner of the MMSys 2023 EDI Challenge

EDI Co-chair Dr. Ouldooz Baghban Karimi hands over the EDI Challenge Award to the EDI Challenge winner Sylvie Dijkstra-Soudarissanane.

“I had the privilege of attending the ACM Multimedia Systems Conference (MMSys) in June 2023, an experience that left an incredible mark on my perspective as a scientist in the field of Social XR. The conference, held in the city of Vancouver, Canada, provided a unique platform for professionals from diverse backgrounds to converge and share cutting-edge insights in multimedia systems research and development.
The MMSys conference proved to be an invaluable forum for hosting discussions on the latest advancements in multimedia technology. Keynotes and regular sessions covered a myriad of topics, ranging from advanced videos with 3D point clouds rendering, to multi-modal experiences and open software. This year, the rich program also included technical demo sessions, allowing participants to witness real-time systems in action, presented by leaders from organizations such as Xiaomi, Fraunhofer FOKUS, and my company TNO. Beyond the academic world, the conference facilitated networking and social interactions, providing a platform to connect with like-minded researchers. Engaging in discussions about user-interactive VR experiences, real-time holographic representations, and mobile-based deep learning video codecs … all happening in a breathtaking skyride above the Grouse Mountain added an extra layer of depth to the overall experience.
One of the highlights of my participation was the opportunity to pitch my idea on building socially responsible systems that prioritize inclusivity. The focus of my proposal revolved around designing systems that are inherently inclusive, considering factors such as skin tones, hair types, and ethnicities. The aim was to bridge the accessibility gap and ensure that these systems reach and cater to minority populations. It is a very personal endeavor, as a person of color. To my delight, this endeavor earned me recognition with a prestigious award in Diversity, Equity, and Inclusion. I am immensely proud to have received the DEI award offered by Dr Ouldooz Baghban Karimi for my commitment to inclusive research and innovation. This recognition reinforces the importance of pushing boundaries in technology to create solutions that resonate with diverse communities. The conference not only expanded my knowledge but also allowed me to forge meaningful connections with fellow researchers who share a passion for advancing the frontiers of multimedia systems.”