SIGMM Workshop on Multimodal AI Agents

The SIGMM Workshop on Multimodal AI Agents was held on October 28th, 2024, at ACMMM24 in Melbourne as an invitation-only event. The initiative was launched by Alberto Del Bimbo, Ramesh Jain, and Alan Smeaton following a vision of the future where multimedia expertise converges with the power of large language models and the belief that there is a great opportunity to position the Multimedia research community at the center of this transformation. The event was structured as three roundtables, inviting some of the most influential figures in the multimedia field to brainstorm on key issues. The goal was to design the future, identifying the multimodal opportunity in the days of powerful large-model systems and preparing an agenda for the coming years for the SIGMM community. We did not want to overlap with the current thinking of how multimodality will be included in the emerging large models. Instead, the focus was on how deep multimodality is essential in building the next stages of AI agents for real-world applications and how fundamental it is in understanding real-time contexts and for actions by agents. The event received a great response, with over 30 attendees from both Academia and Industry, representing 13 different countries.

The three roundtables focused on Tech Challenges, Applications, and Industry-University collaboration. The participants were divided into three groups and assigned to the three roundtables according to their profiles and preferences. For the roundtables, we did not prepare specific questions but rather outlined key areas of focus for discussion. A brief document was prepared and given to the discussants a few days before the meeting; it provided a short introduction for each roundtable, summarizing the topic of the debate and highlighting three major subjects to guide the discussion.

In the following we report a brief synthesis of the discussions at the roundtables, highlighting the principal arguments of discussion and proposals. 

Tech challenges Roundtable

Motivations for the discussion: As large pre-trained models become more prevalent and move towards multimodality, looking at the future, a key issue for their usage arises around the impact of their updating and fine-tuning, understanding how to ensure that improvements in one area don’t come at the cost of degradation in others. It is also fundamentally important to understand how deep multimodality is essential for building next stages of AI agents for real world applications, as well as for comprehending real-time contexts and guiding actions by agents towards Artificial General Intelligence. 

Some salient sentences, open questions, proposals from the discussion:

  • The interplay between human intelligence and machine intelligence is a fundamental aspect of what should be multi-modal. There are not yet deep enough multimodal models, that is, models for information that truly span all, or even a subset of, modalities. We need metrics for this interaction between human intelligence and machine intelligence. We should come up with and define a task around how people collaborate productively. We should look at something like dynamic difficulty adjustment, which requires continuous, real-time development or training.
  • Benchmarks are of crucial importance, not just to evaluate one thing against another thing, but to stretch the capabilities. It is not just about passing the benchmark; it is about setting the targets. We should envision a SIGMM-endorsed or sponsored multimodal benchmark by approaching some big tech companies to benchmark some multimodal activity within and across companies.

Applications Roundtable

Motivations for the discussion:   Multimodality is a cornerstone of emerging real-world applications, providing context and situational awareness to systems. Large Multimodal Models are credited for transforming various industries and enabling new applications. Key challenges lie in developing computational approaches for media fusion to construct context and situational understanding, addressing real-time computing costs, and refining model building. It is therefore essential for the SIGMM community to reason on how to build a vibrant community around one or a few key applications.

Some salient sentences, open questions, proposals from the discussion:

There are many areas for application where the SIGMM community can provide vital and innovative contributions and should concentrate its applicative research. Example application areas  and examples of research are: 

  • Health: there is an absence of open-ended sensory data representing long-term complex information in the health area. We can think of integrated, federated machine learning, i.e. an integrated, federated data space for data control.
  • Education: we can think of some futuristic learning approach, like completely autonomous learning. Namely, AI agents that will be supportive through observation models, able to adjust the learning level so that some can finish faster than others and learn depending on the modalities they like to receive. It is also of key importance to consider what the roles of the teacher and of AI are.
  • Productivity: we can think of tools for immersive multi-modal experiences, to generate cross-modal content including 3D and podcasting in immersive environments.
  • Entertainment: we should think of how we can improve entertainment through immersive story driven experiences. 

Industry and University Roundtable

Motivations for the discussion: Research on large AI models is by far dominated by private companies, thanks in part to their access to data and their ability to bear the cost of building and training such models. As a result, academic institutions are being left behind in the AI race. It is therefore urgent to reason about which research directions are viable for universities and think of new Industry-University collaboration models for multimodal AI research. It is also important to capitalize on the unique advantages of academia: its neutrality and its ability to address long-term social and ethical issues related to technology.

Some salient sentences, open questions, proposals from the discussion:

  • Small and medium enterprises feel that they are left out. These are the ones who came to talk to universities. This is an opportunity for the SIGMM community to see how we can help. SIGMM could sponsor joint PhD programs, for example addressing small-size multimodal foundation models or intelligent agents, where a company sponsors part of the grant project.
  • SIGMM should promote high-visibility events at ACM Multimedia like Grand Challenges and Hackathons. As a community we could sponsor a company-wise Grand Challenge on multimodal AI and intelligent agents, leveraging industry to contribute more data sets. We could promote a regional-global Hackathon where Hackathons are held and overseen in different regions of the world, and the top teams are then invited to come to ACM Multimedia and compete.

Based on the discussions at the roundtables, we have identified several concrete actions that could help position the SIGMM research community at the forefront of the multimodal AI transformation:

At the next ACM Multimedia Conference

  • Explicit inclusion of multimodality as a key topic in the next ACM Multimedia call.
  • Multimodal Hackathon on Intelligent Agents (regional-global hackathon).
  • Multimodal Benchmarks (collaborations within and across major tech companies).
  • Multimodal Grand Challenges (in partnership with industry leaders).

At the next ACM SIGMM call for Special projects

  • Special Projects focused on Multimodal AI.

SIGMM is committed to pursuing these initiatives.

Diversity and Inclusion in focus at ACM IMX 2024

Summary: ACM IMX 2024 took place in Stockholm, Sweden, from June 12 to 14, continuing its dedication to promoting diversity within the community. Recognising the importance of amplifying varied voices and experiences to advance the field, the conference built on prior achievements in diversity and inclusion of IMX through a series of initiatives to promote diversity and inclusion (D&I).  This column provides a concise overview of the main D&I initiatives, including childcare support, early-career researcher grants, and manuscript accessibility support.  It includes participant feedback and short testimonials shared during and after the conference to highlight the value of these initiatives. 

To encourage a broad and inclusive pool of organisers, the general chairs of ACM IMX’24 prioritised diversity and inclusion by teaming seasoned committee members with new members within the organising committee. This was done to actively foster mentoring opportunities that support continuity and the development of future conference leadership. In addition, IMX’24 invited community members to self-nominate for various chair and organisational roles, to make it clear that chair roles were open and available to all who were interested in being part of organising the conference. This call for applications was announced during the closing session of ACM IMX’23 in Nantes, France and, over a two-month period, the committee received 12 applications, from which 5 candidates were selected to serve as chairs in various capacities. This inclusive approach allowed ACM IMX to engage with junior members and volunteers who might not have been reached through traditional recruitment methods, pairing them with experienced team members to ensure that they were able to build their network within the community and their skills in conference organisation and management.

SIGMM support was used to enable the chairs of IMX’24 to introduce several initiatives to ensure that all individuals, regardless of personal circumstances, could participate fully in the conference. These initiatives had openly announced calls to all eligible community members who wished to attend the conference in person in Stockholm but required financial assistance. To ensure a fair and thorough selection, the IMX’24 Diversity and Inclusion Chairs, in collaboration with the General Chairs, reviewed each of the applications to ensure that the widest range of support could be offered with the available funds. Applications were evaluated on a rolling basis to ensure that participants were able to organise their travel and visa arrangements without the added challenges of time pressure.

With this support from SIGMM, Diversity and Inclusion grants for IMX were made available for participants, covering:

  • Travel Support for Non-Students from Marginalised and Underrepresented Groups: This grant provided travel support for researchers who self-identified as marginalised or underrepresented within the ACM IMX community, particularly those from non-WEIRD (Western, Educated, Industrialised, Rich, Democratic) countries who lacked other funding opportunities. Priority was given to early-career researchers (such as post-docs), and those needing financial assistance, to complement existing SIGCHI and SIGMM student-targeted travel grants.
  • Childcare and Parental Support: This grant offered financial assistance to parents attending ACM IMX’24, subsidising childcare costs to enable broader participation and to cover expenses related to children’s travel, travel for a childcare companion, and on-site or arranged babysitting during the conference.
  • Disability and Carer Support: This grant aimed to support attendees on extended leave from work due to disability, parental responsibilities, or other personal circumstances. Recipients of this award also received a complimentary conference registration.
  • Student Travel Awards: SIGMM also provided awards directly to students to support travel expenses, enabling a broader range of participation and complementing the free registration offered to students volunteering at the conference.

SIGMM’s special initiatives for diversity and inclusion enabled IMX’24 to secure a keynote designed to foster a more inclusive dialogue. Delivered by artist Jake Elwes (a self-described hacker, radical faerie, and researcher), the keynote focused on “queer artificial intelligence” and featured deepfake drag performers. Elwes’ work invited the attendees to reflect on who builds these systems, the intentions behind them, and how they can be reclaimed to envision and create different visions of a technology-enhanced future.

In combination with support from SIGMM, a special workshop focused on engaging with research and researchers from Latin America as a region of interest was made possible through the generous backing of the SIGCHI Development Fund (SDF). This enabled researchers and workshop keynote speakers to participate in the “IMX in Latin America – 2nd International Workshop” and to attend the conference. A core objective was to increase diversity by broadening the IMX community through actively encouraging colleagues from Latin America to attend and contribute. This workshop also published its submissions as part of the ACM IMX’24 workshop proceedings in ICPS.

For the first time at ACM IMX, an external provider (TAPS) was hired to ensure accessibility of papers prior to publication. Finally, the conference offered a range of venue-focused diversity and inclusion initiatives, including the provision of all-gender bathrooms, pronoun badges, and approachable senior community members to support engagement. A care corner and tables were thoughtfully set up throughout the conference to provide attendees with free hygiene essentials such as masks, refreshers, hand sanitisers, sanitary pads and tampons. These measures highlighted ACM IMX’24’s commitment to fostering a welcoming and accessible environment for all participants.

Figure 1: Participants’ responses on their perception of diversity and inclusion at IMX, highlighting that it encompasses representation, welcoming environments, active engagement, research focus, and shaping future media experiences.

“During the closing event of IMX2024, we asked our attendees to answer a few questions that could help plan future IMX conferences. We asked everyone to share what future research directions could be included to address D&I at IMX. Some of the suggestions were to include the field of Humanities, to study usability among different demographics, and to understand how people who might not have economic access to technology could benefit from such technology. We also asked everyone to select what, according to them, is D&I at IMX. The options “Everyone feels welcomed”, “Diverse individuals are able to engage and contribute” and “People from diverse backgrounds get represented and have a voice” received a majority of the votes when compared to “Shape the future of interactive media experiences” and “Research that focuses on diversity and inclusion in media experiences”. When asked to share how included they felt at IMX2024, 92% of the participants shared that they felt either included or very much included [with some leaving the question unanswered]. They also shared how different aspects made them feel included. Some of the highlights were the care corner that was arranged to support the basic needs of the attendees, the social events, interactions at the conference, and the community.” – Sujithra Raviselvam, IMX’24 Diversity and Inclusion Co-Chair.

Figure 2: Participants’ feedback on factors contributing to feelings of inclusion and exclusion at IMX, along with suggestions for future research directions aimed at improving diversity and inclusion. The feedback highlights personal interactions, event organization, and amenities as key to feeling included, while future research suggestions focus on enhancing accessibility, providing economic support, and integrating more diverse perspectives in HCI research.

The best way to understand the impact of these supports is through the words of those who were able to join the conference thanks to them.

“The grant received for IMX2024 allowed me to attend the conference. Having a young child is challenging as an early researcher, as you must, sometimes, sacrifice your career or family. This grant allowed me to travel without any of these. I could attend the conference without stress or second thoughts, and support my family during the few days of the conference. Thanks to this, I received valuable feedback on my work, followed interesting presentations, and did not miss my family.” – Romain Herault, childcare award recipient.

“I had the opportunity to present our qualitative study focused on understanding the sensitive values of women entrepreneurs in Brazil to support designing multi-model conversational AI financial systems at IMX, followed by interesting discussions about it in the workshop organized by Debora Christina Muchaluat Saade, Mylene Farias and Jesus Favela. The conference was focused on the future of multimodal technologies, with many exciting demos to investigate, to make more accessible, and to challenge assumptions of real life through a multimedia lens. We also had a conference dinner with the theme of the midsummer celebration. I was amazed by its meaning; as far as I understood, the purpose is to celebrate the light, sun, and summer season with family and friends! I loved it! It was also an opportunity to explore the beautiful Stockholm city with new colleagues and meet current collaborators in research.”– Heloisa Caroline de Souza Pereira Candello.  

A total of 21 applicants received support through diversity and inclusion grants provided by both SIGMM and the SIGCHI Development Fund (SDF). This assistance enabled full participation in ACM IMX’24 and supported a diverse group, including students, non-students from marginalised backgrounds, early-career researchers, and Latin American researchers, all of whom benefitted from these grants and made up more than 10% of the total conference attendees – truly changing and undoubtedly enhancing the experience of all attendees at the conference. 

Figure 3: The word clouds present two data sets from an IMX survey: the countries respondents identify as home, and the locations they would like IMX to feature in the future. They highlight a diverse range of home countries, including Brazil, Germany, and India, and suggest future IMX locations such as Japan, Brazil, and various cities in the USA, indicating a global interest and the geographical diversity of the IMX community.

One benchmarking cycle wraps up, and the next ramps up: News from the MediaEval Multimedia Benchmark

Introduction

MediaEval, the Multimedia Evaluation Benchmark, has offered a yearly set of multimedia challenges since 2010. MediaEval supports the development of algorithms and technologies for analyzing, exploring and accessing information in multimedia data. MediaEval aims to help make multimedia technology a force for good in society and for this reason focuses on tasks with a human or social aspect. Benchmarking contributes in two ways to advancing multimedia research. First, by offering standardized definitions of tasks and evaluation data sets, it makes it possible to fairly compare algorithms and, in this way, track progress. If we can understand which types of algorithms perform better, we can more easily find ways (and the motivation) to improve them. Second, benchmarking helps to direct the attention of the research community, for example, towards new tasks that are based on real-world needs, or towards known problems for which more research is necessary to have a solution that is good enough for a real world application scenario.

The 2023 MediaEval benchmarking season culminated with the yearly workshop, which was held in conjunction with MMM 2024 (https://www.mmm2024.org) in Amsterdam, Netherlands. It was a hybrid workshop, which also welcomed online participants. The workshop kicked off with a joint keynote with MMM 2024, given by Yiannis Kompatsiaris, Information Technologies Institute, CERTH, on Visual and Multimodal Disinformation Detection. The talk covered the implications of multimodal disinformation online and the challenges that must be faced in order to detect it. The workshop also featured an invited speaker, Adriënne Mendrik, CEO & Co-founder of Eyra, which supports benchmarks with the online Next platform. She talked about benchmark challenge design for science and how the Next platform is currently being used in the Social Sciences.

More information about the workshop can be found at https://multimediaeval.github.io/editions/2023/ and the proceedings were published at https://ceur-ws.org/Vol-3658/. In the rest of this article, we provide an overview of the highlights of the workshop as well as an outlook to the next edition of MediaEval in 2025.

Tasks at MediaEval

The MediaEval Workshop 2023 featured five tasks that focused on human and social aspects of multimedia analysis.

Three of the tasks required participants to combine or cross modalities or even consider new modalities. The Musti: Multimodal Understanding of Smells in Texts and Images task challenged participants to detect and classify smell references in multilingual texts and images from the 17th to the 20th century. They needed to identify whether a text and image evoked the same smell source, detect specific smell sources, and apply zero-shot learning for untrained languages. The other two of these three tasks emphasized the social aspects of multimedia. In the NewsImages: Connecting Text and Images task, participants worked with a dataset of news articles and images, predicting which image accompanied a news article. This task aimed to explore cases in which there is a link between a text and an image that goes beyond the text being a literal description of what was pictured in the image. The Predicting Video Memorability task required participants to predict how likely videos were to be remembered, both short- and long-term, and to use EEG data to predict whether specific individuals would remember a given video, combining visual features and neurological signals.

Two of the tasks focused on pushing forward video analysis so that it can support experts in carrying out their jobs. The SportsVideo: Fine-Grained Action Classification and Position Detection task strove to develop technology that will support coaches. To address this task, participants analyzed videos of table tennis and swimming competitions, detecting athlete positions, identifying strokes, classifying actions, and recognizing game events such as scores and sounds. The Transparent Tracking of Spermatozoa task strove to develop technology that will support medical professionals. Task participants were asked to track sperm cells in video recordings to evaluate male reproductive health. This involved localizing and tracking individual cells in real time, predicting their motility, and using bounding box data to assess sperm quality. The task emphasized both accuracy and processing efficiency, with subtasks involving graph data structures for motility prediction.

Impressions of Student Participants

MediaEval is grateful to SIGMM for providing funding for three students who attended the MediaEval Workshop and greatly helped us with the organization of this edition: Iván Martín-Fernández and Sergio Esteban-Romero from Speech Technology and Machine Learning Group (GTHAU) – Universidad Politécnica de Madrid, and Xiaomeng Wang from Radboud University. Below the students provide their comments and impressions of the workshop.

“As a novel PhD student, I greatly valued my experience attending MediaEval 2023. I participated as the main author and presented work from our group, GTHAU – Universidad Politécnica de Madrid, on the Predicting Video Memorability Challenge. The opportunity to meet renowned colleagues and senior researchers, and learn from their experiences, provided valuable insight into what academia looks like from the inside. 

MediaEval offers a range of multimedia-related tasks, which may sometimes seem under the radar but are crucial in developing real-world applications. Moreover, the conference distinguishes itself by pushing the boundaries, going beyond just presenting results to foster a deeper understanding of the challenges being addressed. This makes it a truly enriching experience for both newcomers and seasoned professionals alike. 

Having volunteered and contributed to organizational tasks, I also gained first-hand insight into the inner workings of an academic conference, a facet I found particularly rewarding. Overall, MediaEval 2023 proved to be an exceptional blend of scientific rigor, collaborative spirit, and practical insights, making it an event I would highly recommend for anyone in the multimedia community.”

Iván Martín-Fernández, PhD Student, GTHAU – Universidad Politécnica de Madrid

“Attending MediaEval was an invaluable experience that allowed me to connect with a wide range of researchers and engage in discussions about the latest advancements in Artificial Intelligence. Presenting my work on the Multimedia Understanding of Smells in Text and Images (MUSTI) challenge was particularly insightful, as the feedback I received sparked ideas for future research. Additionally, volunteering and assisting with organizational tasks gave me a behind-the-scenes perspective on the significant effort required to coordinate an event like MediaEval. Overall, this experience was highly enriching, and I look forward to participating and collaborating in future editions of the workshop.”

Sergio Esteban-Romero, PhD Student, GTHAU – Universidad Politécnica de Madrid

“I was glad to be a student volunteer at MediaEval 2024. Collaborating with other volunteers, we organized submission files and prepared the facilities. Everyone was exceptionally kind and supportive.
In addition to volunteering, I also participated in the workshop as a paper author. I submitted a paper to the NewsImage task and delivered my first oral presentation. The atmosphere was highly academic, fostering insightful discussions. And I received valuable suggestions to improve my paper.  I truly appreciate this experience, both as a volunteer and as a participant.”

Xiaomeng Wang, PhD Student, Data Science – Radboud University

Outlook to MediaEval 2025 

We are happy to announce that in 2025 MediaEval will be hosted in Dublin, Ireland, co-located with CBMI 2025. The Call for Task Proposals is now open, and details regarding submitting proposals can be found here: https://multimediaeval.github.io/2024/09/24/call.html. The final deadline for submitting your task proposals is Wed. 22nd January 2025. We will publish the list of tasks offered in March and registration for participation in MediaEval 2025 will open in April 2025.

For this edition of MediaEval we will again emphasize our “Quest for Insight”: we push beyond improving evaluation scores to achieving deeper understanding about the challenges, including data and the strengths and weaknesses of particular types of approaches, with the larger aim of understanding and explaining the concepts that the tasks revolve around, promoting reproducible research, and fostering a positive impact on society. We look forward to welcoming you to participate in the new benchmarking year.

Report from CBMI 2024

The 21st International Conference on Content-based Multimedia Indexing (CBMI) was hosted by Reykjavik University in cooperation with ACM, SIGMM, VTT and IEEE. The three-day event took place on September 20-22 in Reykjavik, Iceland. Like the year before, it was an exclusively in-person event. Despite the remote location, an active volcano and the in-person attendance requirement, we are pleased to report that we had a perfect attendance of presenting authors. CBMI was started in France and still has strong European roots. Looking at the nationality of the submitting authors, we can see 17 unique nationalities: 14 countries in Europe, 2 in Asia and 1 in North America.

Conference highlights

Figure 1: First keynote speaker being introduced.

Key elements of a successful conference are the keynote sessions. The first and opening keynote, titled “What does it mean to ‘work as intended’?”, was presented by Dr. Cynthia C. S. Liem on day 1. In this talk Cynthia raised important questions on how complex it can be to define, measure and evaluate human-focused systems. Using real-world examples, she demonstrated how recently developed systems that passed traditional evaluation metrics still failed when deployed in the real world. Her talk was an important reminder that certain weaknesses in human-focused systems are only revealed when exposed to reality.

Figure 2: Keynote speaker Ingibjörg Jónsdóttir (left) and closing keynote speaker Hannes Högni Vilhjálmsson (right).

Traditionally there are only two keynotes at CBMI, the first on day 1 and the second on day 2. However, our planned second keynote speaker could not attend until the last day, and thus a third “surprise” keynote was organized on day 2 with the title “Remote Sensing of Natural Hazards”. The speaker was Dr. Ingibjörg Jónsdóttir, an associate professor of geology at the University of Iceland. She gave a very interesting talk about the unique geology of Iceland, the threats posed by natural hazards and her work using remote sensing to monitor both sea ice and volcanoes. This talk was well received by attendees as it gave insight into the host country and into the volcanic eruption that ended just a week before the start of the conference (the 7th in the past 2 years on the Reykjanes Peninsula). This subject is highly relevant to the community, as the analysis and prediction are based on multimodal data.

The planned second keynote took place in the last session on day 3 and was given by Dr. Hannes Högni Vilhjálmsson, professor at Reykjavik University. The talk, titled “Being Multimodal: What Building Virtual Humans has Taught us about Multimodality”, gave the audience a deep dive into lessons learnt from his 20+ years of experience of developing intelligent virtual agents with face-to-face communication skills. To quote directly from the abstract of his talk: “I will review our attempts to capture, understand and analyze the multi-modal nature of human communication, and how we have built and evaluated systems that engage in and support such communication.”

CBMI is a relatively small, but growing, conference that is built on a strong legacy and has a highly motivated community behind it. The special sessions have long played an important role at CBMI and this year there were 8 special sessions accepted.

  • AIMHDA: Advances in AI-Driven Medical and Health Data Analysis
  • CB4AMAS: Content-based Indexing for audio and music: from analysis to synthesis
  • ExMA: Explainability in Multimedia Analysis
  • IVR4B: Interactive Video Retrieval for Beginners
  • MAS4DT: Multimedia analysis and simulations for Digital Twins in the construction domain
  • MmIXR: Multimedia Indexing for XR
  • MIDRA: Multimodal Insights for Disaster Risk Management and Applications
  • UHBER: Multimodal Data Analysis for Understanding of Human Behaviour, Emotions and their Reasons

Figure 3: SS UHBER chair Dr. E. Vildjunaite with a conference participant.

The number of papers per session ranged from 2 to 8. The larger sessions (CB4AMAS, MmIXR and UHBER) used a discussion panel format that created a more inclusive atmosphere and, at times, sparked lively discussions. 

Figure 4: Images from the poster session and the IVR4B competition.

Especially popular with attendees was the competition that took place in the Interactive Video Retrieval for Beginners (IVR4B) session. This session was hosted right after the poster session in the wide open space of Reykjavik University’s foyer. 

Awards

The selection committee was unanimous in choosing the contribution of Lorenzo Bianchi, Fabio Carrara, Nicola Messina & Fabrizio Falchi, titled “Is CLIP the main roadblock for fine-grained open-world perception?”, as the best paper award winner. With the generous support of ACM SIGMM, they were awarded 500 Euros. As the best paper was also a student paper, it was decided to give the runner-up a 300 Euro award as well. The runner-up was the contribution of Recep Oguz Araz, Dmitry Bogdanov, Pablo Alonso-Jimenez and Frederic Font, titled “Evaluation of Deep Audio Representations for Semantic Sound Similarity”.

The best demonstration was awarded to Joshua David Springer, Gylfi Thor Gudmundsson and Marcel Kyas for “Lowering Barriers to Entry for Fully-Integrated Custom Payloads on a DJI Matrice”. 

The top two systems in the IVR4B competition were also recognized: first place went to Nick Pantelidis, Maria Pegia, Damianos Galanopoulos, et al. for “VERGE: Simplifying Video Search for Novice”; and second place went to Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, et al. for “VISIONE 5.0: toward evaluation with novice users”.

Social events

The first day of the conference was quite eventful: before the poster and IVR4B sessions, Francois Pineau-Benois and Raphael Moraly of the Odyssée Quartet performed selected classical works in the “Music-meets-Science” cultural event. The goal of this event is to bring live classical music to the multimedia research community. The musicians played a concert and then held a discussion with researchers, particularly those involved in music analysis and retrieval. Such exchanges between content creators and researchers in content analysis, indexing and retrieval have been a distinctive feature of CBMI since 2018.
This event would not have been possible without the generous support of ACM SIGMM.

The second day was no less entertaining as before the banquet attendees took a virtual flight over Iceland’s beautiful landscape via the services of FlyOver Iceland. 
The next edition, CBMI 2025, will be held in Dublin, organized by DCU.

The 2nd Edition of the Spring School on Social XR organized by CWI

ACM SIGMM co-sponsored the second edition of the Spring School on Social XR, organized by the Distributed and Interactive Systems group (DIS) at CWI in Amsterdam. The event took place on March 4th – 8th 2024 and attracted 30 students from different disciplines (technology, social sciences, and humanities). The program included 22 lectures, 6 of them open, by 23 instructors. The event was organized by Irene Viola, Silvia Rossi, Thomas Röggla, and Pablo Cesar from CWI; and Omar Niamut from TNO. The event was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made student grants available and supported international speakers from under-represented countries, and by The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en).

Students and organisers of the Spring School on Social XR (March 4th – 8th 2024, Amsterdam)

“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organization committee continued for a second edition with a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.

A poster session at the CWI DIS Spring School 2024.

The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques of volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.

The spring school is part of the semester program organized by the DIS group of CWI. It was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture, “Human-Centered Multimedia: Making Remote Togetherness Possible” of Prof. Pablo Cesar. Then, it was continued in 2023 with the 1st Spring School on Social XR.

The talks were:

  • “Volumetric Content Creation for Immersive XR Experiences” by Aljosa Smolic
  • “Social Signal Processing as a Method for Modelling Behaviour in SocialXR” by Julie Williamson
  • “Towards a Virtual Reality” by Elmar Eisemann
  • “Meeting Yourself and Others in Virtual Reality” by Mel Slater
  • “Social Presence in VR – A Media Psychology Perspective” by Tilo Hartmann
  • “Ubiquitous Mixed Reality: Designing Mixed Reality Technology to Fit into the Fabric of our Daily Lives” by Jan Gugenheimer
  • “Building Military Family Cohesion through Social XR: A 8-Week Field Study” by Sun Joo (Grace) Ahn
  • “Navigating the Ethical Landscape of XR: Building a Necessary Framework” by Eleni Mangina
  • “360° Multi-Sensory Experience Authoring” by Debora Christina Muchaluat Saade
  • “QoE Assessment of XR” by Patrick le Callet
  • “Bringing Soul to Digital” by Natasja Paulssen
  • “Evaluating QoE for Social XR – Audio, Visual, Audiovisual and Communication Aspects” by Alexander Raake
  • “Immersive Technologies Through the Lens of Public Values” by Mariëtte van Huijstee
  • “Designing Innovative Future XR Meeting Spaces” by Katherine Isbister
  • “Evaluation Methods for Social XR Experiences” by Mark Billinghurst
  • “Recent Advances in 3D Videocommunication” by Oliver Schreer
  • “Virtual Humans in Social XR” by Zerrin Yumak
  • “The Power of Graphs Learning in Immersive Communications” by Laura Toni
  • “Boundless Creativity: Bridging Sectors for Social Impact” by Benjamin de Wit
  • “Social XR in 5G and Beyond: Use Cases, Requirements, and Standardization Activities” by Lea Skorin-Kapov
  • “An Overview on Standardization for Social XR” by Pablo Perez and Jesús Gutiérrez
  • “Funding: The Path to Research Independence” by Sergio Cabrero

SIGMM Strike Teams Activity Report (April, 2024)

On April 10th, 2024, during the SIGMM Advisory Board meeting, the Strike Team Leaders, Touradj Ebrahimi, Arnold Smeulders, Miriam Redi and Xavier Alameda Pineda (represented by Marco Bertini) reported the results of their activity. They are summarized in the following in the form of recommendations that should be understood as guidelines and behavioral advice for our ongoing and future activity. SIGMM members in charge of SIGMM activities, SIGMM Conference leaders and particularly the organizers of the next ACMMM editions are invited to adhere to these recommendations as far as they are concerned, implement the items marked as mandatory and report to the SIGMM Advisory Board after the event.

All the SIGMM Strike Teams will remain in charge for two years starting January 1st, 2024 for reviews and updates.

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

SIGMM Strike Team on Industry Engagement

Team members: Touradj Ebrahimi (EPFL), Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences)
Coordinator: Touradj Ebrahimi

The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed improving the presence of industry at ACMMM and other SIGMM Conferences/Workshops, launching new in-cooperation initiatives and establishing stable bi-directional links.

  1. Organization of industry-focused events
    • Suggested / Mandatory for ACMMM Organizers and SIGMM AB: Create industry-focused promotional materials like pamphlets/brochures for industry participation (sponsorship, exhibit, etc.) in the style of ICASSP 2024 and ICIP 2024
    • Suggested for ACMMM Organizers: invite Keynote Speakers from industry, possibly with financial support of SIGMM. Keynote talks should be similar to plenary talks but around specific application challenges.
    • Suggested for ACMMM Organizers: organize Special Sessions and Workshops around specific applications of interest to companies and startups. Sessions should be coordinated by industry, possibly with support from an experienced and confirmed scholar.
    • Suggested for ACMMM Organizers: organize Hands-on Sessions led by industry to receive feedback on future products and services.
    • Suggested for ACMMM Organizers: organize Panel Sessions led by industry and standardization committees on timely topics relevant to industry e.g. How companies cope with AI.
    • Suggested for ACMMM Organizers: organize Tutorial sessions given by qualified people from industry and standardization committees at SIGMM-sponsored conferences/workshops
    • Suggested for ACMMM Organizers: promote contributions mainly from the industry in the form of Industry Sessions to present companies and their products and services.
    • Suggested for ACMMM Organizers and SIGMM AB: promote Joint SIGMM / Standardization workshop on latest standards e.g. JPEG meets SIGMM, MPEG meets SIGMM, AOM meets SIGMM.
    • Suggested for ACMMM Organizers: organize Job Fairs like job interview speed dating during ACMMM
  2. Initiatives for linkage
    • Mandatory for SIGMM Organizers and SIGMM AB: Create and maintain a mailing list of industrial targets, taking care of GDPR (Include a question in the registration form of SIGMM-sponsored conferences)
    • Suggested for SIGMM AB: organize monthly talks by industry leaders either from large established or SMEs or startups sharing technical/scientific challenges they face and solutions
  3. Initiatives around reproducible results and benchmarking
    • Suggested for ACMMM Organizers and SIGMM AB: support release of databases, studies on performance assessment procedures and metrics, possibly focused on specific applications.
    • Suggested for ACMMM Organizers: organize Grand Challenges initiated and sponsored by industry.

Strike Team on ACMMM Format

Team Members: Arnold Smeulders (Univ. of Amsterdam), Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Ralf Steinmetz (Univ. Darmstadt), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam), Jianfei Cai (Monash Univ.), Cathal Gurrin (Dublin City Univ.).
Coordinator: Arnold Smeulders

The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to Conference identity, Conference budget and Conference memory.

1. Intended audience. It is generally felt that ACMMM is under pressure from neighboring conferences growing very big. There is consensus that growing big should not be the purpose of ACMMM: a 750 – 1500 size was thought to be ideal including being attractive to industry. Growth should come naturally.

  • Suggested for ACMMM Organizers and SIGMM AB: Promote distant travel by lowering fees for those who travel far.
  • Suggested for ACMMM Organizers: Include (a personalized) visa invitation in the call for papers.

2. Community feel, differentiation and interdisciplinarity. Identity is not an actionable concern, but one of the shared common goods is T-shaped individuals interested in neighboring disciplines making an interdisciplinary or multidisciplinary connection. It is desirable to differentiate submitted papers from major close conferences like CVPR. This point is already implemented in the call for papers of ACMMM 2024.

  • Mandatory for ACMMM Organizers: Ask in the submission how the paper fits in the multimedia community and its scientific tradition as illustrated by citations. Consider this information in the explicit review criteria.
  • Recommended for ACMMM Organizers: Support the physical presence of participants by rebalancing fees.
  • Suggested for ACMMM Organizers and SIGMM AB: Organize a session around the SIGMM test of time award, make selection early, funded by SIGMM.
  • Suggested for ACMMM Organizers: Organize moderated discussion sessions for papers on the same theme.

3. Brave New Ideas. Brave New is very well fitting with the intended audience. It is essential that we are able to draw out brave and new ideas from our community for long term growth and vibrancy. The emphasis in reviewing Brave New Ideas should be on the novelty even if it is not perfect. Rotate over a pool of people to prevent lock-in.

  • Suggested / Mandatory for ACMMM Organizers: Include in the submission a 3-minute pitch video to archive in the ACM digital library.
  • Suggested / Mandatory for ACMMM Organizers: Select reviewers from a pool of senior people to review novelty.
  • Suggested for ACMMM Organizers: Start with one session of 4 papers, if successful, add another session later.

4. Application. There should be no support for one specific application area exclusively in the main conference. Yet, applications areas should be focused in special sessions or workshops.

  • Suggested for ACMMM Organizers: Focus on application-related workshops or special sessions with own reviewing.

5. Presentation. When the core business of ACM MM is inter- and multi-disciplinarity it is natural to make the presentation for a broader audience part of the selection. ACM should make the short videos accessible as a service to the science or general public. TED-like videos for a paper fit naturally with ACMMM and fit with the trend in YouTube to communicate your paper. If too much to do, SIGMM AB should support reviewing the videos financially.

  • Mandatory for ACMMM Organizers: Include a TED-like 3-minute pitch video as part of the submission, to be archived by the ACM Digital Library as part of the conference proceedings. The video is to be submitted a week after the paper deadline for review, so there is time to prepare it after the regular paper submission.

6. Promote open access. For a data-driven and fair comparison, promote open access of data to be used in the next conference to compare to.

  • Suggested for SIGMM AB: Open access for data encouraged.

7. Keynotes. For the intended audience and interdisciplinarity, it is felt essential to have keynotes on the key topics of the moment. Keynotes should not focus on one topic but should maintain the diversity of topics in the conference and over the years, to be sure new ideas are inserted in the community.

  • Suggested for SIGMM AB: directly fund a big-name, expensive, marquee keynote speaker, sponsored by SIGMM, for a keynote on one of the societally urgent topics as evident from the news.

8. Diversity over subdisciplines, etc. Make an extra effort for Arts, GenAI use models, security, HCI and demos. We need to ensure that if the submitted papers are of sufficiently high quality, there should be at least a session on that sub-topic in the conference. We need to ensure that the conference is not overwhelmed by a popular topic with easy review criteria and generally much higher review scores.

  •  Suggested for ACMMM Organizers: Promote diversity of all relevant topics in the call for papers and by action in subcommunities by an ambassador. SIGMM will supervise the diversity.

9. Living report. To enhance the institutional memory, maintain a living document passed on from organizer to organizer, with suggestions. The owner of the document is the commissioner for conferences of SIGMM.

  • Mandatory for ACMMM Organizers and SIGMM AB: A short report to the SIGMM commissioner for conferences from the ACMMM chair, including a few recommendations for the next time; handed over to the next conference after the end of the current conference.

SIGMM Strike Team on Harmonization and Spread

Team members: Miriam Redi (Wikimedia Foundation), Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ), Pablo Cesar (CWI and TU Delft).
Coordinator: Miriam Redi

The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed distinct items related to giving SIGMM Records and Social Media a more central role in SIGMM and integrating SIGMM Records and Social Media in the whole process of the ACMMM organization from its initial planning.

1. SIGMM Website: The SIGMM Website is not updated and needs a serious overhaul.

  • Mandatory for SIGMM AB: restart the website from scratch, taking inspiration from other SIGs, e.g. by reaching out to people at CHI to understand what can be done. Budget should be provided by SIGMM.

2. SIGMM Social Media Channels: SIGMM social media accounts (Twitter and LinkedIn) are managed by the Social Media Team at the SIGMM Records.

  • Suggested for SIGMM AB: continue this organization, expanding the responsibilities of the team to include conferences and other events.

3. Conference Social Media: Social media presence of conferences is managed by the individual conferences. It is not uniform and disconnected from SIGMM social media and the Records. The social media presence of ACMMM flagship conference is weak and needs help. Creating continuity in terms of strategy and processes across conference editions is key.

  • Mandatory for ACMMM Organizers and SIGMM AB: create a Handbook of conference communications: a set of guidelines about how to create continuity across conference editions in terms of communications, and how to connect the SIGMM Records to the rest of the community.
  • Suggested for ACMMM Organizers and SIGMM AB: one member of the Social Media team at the SIGMM Records is systematically invited to join the OC of major conferences as publicity co-chair. The steering committee chair of each conference should commit to keeping the organizers of each conference edition informed about this policy, and monitor its implementation throughout the years.

SIGMM Strike Team on Open Review

Team members: Xavier Alameda Pineda (Univ. Grenoble-Alpes), Marco Bertini (Univ. Firenze). Coordinator: Xavier Alameda Pineda

The team continued the support to ACMMM Conference organizers for the use of Open Review in the ACMMM reviewing process, helping to implement new functions or improve the existing ones and supporting smooth transfer of the best practices. The recommendations addressed distinct items to complete the migration and stabilize use of Open Review in the future ACMMM editions.

1. Technical development and support

  • Mandatory for the Team: update and publish the scripts; complete the Open Review configuration.
  • Mandatory for SIGMM AB and ACMMM organizers: create a Committee led by the TPC chairs of the current ACMMM edition on a rotating basis.

2. Communication

  • Mandatory for the Team: write a small manual for use and include it in the future ACMMM Handbook.

Alberto Del Bimbo                                                                                        
SIGMM Chair

First edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) School


The first edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School was held in February 2024. With the support of SIGMM, it attracted more than 50 students and young researchers to learn about, discuss and experiment first-hand with topics related to social robotics. The event’s success calls for further editions in upcoming years.

Rationale for SoRAIM

SPRING, a collaborative research project funded by the European Commission under Horizon 2020, is coming to an end in May 2024. Its scientific and technological objectives, to test a versatile social robotic platform within a hospital and have it perform social activities in a multi-person, dynamic setup, have for the most part been achieved. In order to empower the next generation of young researchers with concepts and tools to answer tomorrow’s challenges in the field of social robotics, one must tackle the issue of knowledge and know-how transmission. We therefore chose to provide a winter school, free of charge to the participants (thanks to the additional support of SIGMM), so that as many students and young researchers as possible, from various horizons (not only technical fields), could attend.

Contents of the Winter School

The Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School took place from 19 to 23 February 2024 in Grenoble, France. An introduction to the contents of the school and the context provided by the SPRING project was provided, and a demonstration combining social navigation and dialogue interaction was given on the first day. This triggered the curiosity of the participants, and a spontaneous Q&A session with the contributions, questions and comments from the participants to the school was held. 

The school spanned the entire week, with 17 talks given by 8 speakers from the H2020 SPRING project and 9 invited speakers external to the project. The school also included a panel discussion on the topic “Are social robots already out there? Immediate challenges in real-world deployment”, a poster session with 15 contributions, and two hands-on sessions where the participants could choose among the following topics: Robot navigation with Reinforcement Learning, ROS4HRI: How to represent and reason about humans with ROS, Building a conversational system with LLMs using prompt engineering, Robot self-localisation based on camera images, and Speaker extraction from microphone recordings. A social activity (a visit to Grenoble’s downtown and Bastille) was organised on Thursday afternoon, allowing participants to mingle with speakers and to discover the host town’s history.

One of the highlights of SoRAIM was its Panel Session, whose topic was “Are social robots already out there? Immediate challenges in real-world deployment”. Although no definitive answers were found, the session stressed that numerous challenges remain for the deployment of actual social robots in our everyday lives (at work, at home). On the technical side, robotic platforms are subject to certain hardware and software constraints. On the hardware side, sensors and actuators are restricted in size, power and performance, since the physical space and the battery capacity are also limited. On the software side, large models can be used only if lots of computing resources are permanently available, which is not always the case, since these resources need to be shared between the various computing modules. Finally, on the regulatory and legal side, the rise of AI use is fast and needs to be balanced with ethical views that address our society’s needs, but the construction of proper laws and norms, and their acknowledgement and understanding by stakeholders, is slow. In this session the panellists surveyed all aspects of the problems at hand and provided an overview of the challenges that future scientists will need to solve in order to take social robots out of the labs and into the world.

Attendance & future perspectives

SoRAIM attracted 57 participants over the whole week. The attendees were diverse, as was initially intended, with a breakdown of 50% PhD students, 20% young researchers (public sector), 10% engineers and young researchers (private sector), and 20% MSc students. Notably, the ratio of women attendees was close to 40%, which is double the usual ratio in this field. Finally, in terms of geographic spread, the majority of attendees came from other European countries (17 countries in total), with just below 50% of attendees coming from France. Following the school, a satisfaction survey was sent to the attendees in order to better grasp which elements were the most appreciated, in view of a longer-term objective to hold this winter school as a serial event. Given the diverse background of attendees, opinions on contents such as the hands-on session varied, but overall satisfaction was very high, which shows the interest of the next generation of researchers in more opportunities to learn in this field. We are currently reviewing options to hold similar events each year or every two years, depending on available funding.

More information about the SoRAIM winter school is available on the webpage: https://spring-h2020.eu

Sponsors

SoRAIM was sponsored by the H2020 SPRING project, Inria, the University Grenoble Alpes, the Multidisciplinary Institute of Artificial Intelligence and by ACM’s Special Interest Group on Multimedia (SIGMM). Through ACM SIGMM, we received significant funding which allowed us to invite 14 students and young researchers, members of SIGMM, from abroad.

Full list of contributions

All the talks are available in replay on our YouTube channel: https://www.youtube.com/watch?v=ckJv0eKOgzY&list=PLwdkYSztYsLfWXWai6mppYBwLVjK0VA6y
The complete list of talks and posters presented at SoRAIM Winter School 2024 can be found here: https://spring-h2020.eu/soraim/
In the following, the list of talks in chronological order:

Message from the ACM SIGMM Chair

About our initiatives for the Multimedia community

Dear SIGMM members, colleagues, students,

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

We believed that these changes should be reflected in our SIGMM flagship conference, ACM Multimedia, and in the SIGMM organization and activities overall. This belief led us to organize the SIGMM Retreat in conjunction with ACM MM23 in Ottawa on October 30, 2023. The goal of the meeting was to open a discussion on key strategic issues such as the coverage of ACM Multimedia, its quality and reputation and how we can grow the SIGMM community. We invited the members of the SIGMM Advisory Committee, the members of the Steering Committees of the SIGMM-sponsored conferences, the ACM TOMM Editor in Chief, the past SIGMM Chairs, and senior personalities and emerging researchers of our community. The Retreat was well attended: twenty people attended in person and ten attended online. Alberto Del Bimbo, SIGMM Chair, chaired the Retreat with the assistance of Phoebe Chen, SIGMM Vice-chair, and Xavier Alameda Pineda.

The discussion was vibrant, and valuable opinions and suggestions emerged. It was widely agreed that the distinctive feature of multimedia research is the combination and integration of various modalities to build end-to-end systems.

People agreed on the need to introduce significant changes in the format of our flagship conference to make it newly attractive. There was strong consensus on the ideas of giving more room to Brave New Ideas sections, having TED-like talks, and soliciting workshops on innovative topics and striving for their continuity. There was also consensus on revitalizing the program by including new emerging topics like Foundation Models, 3D, glass free interactivity, new networking platforms. All the attendees recognized the need to balance the traditional research areas of the ACM Multimedia program.

There was general agreement on using Open Review as the reviewing system for ACM Multimedia. It was recognized that it improves the quality and transparency of the reviewing process, enhances respectability, empowers the reviewers to conduct serious reviews, and aligns ACM Multimedia with the top-ranked conferences.

Other important topics of discussion included how to incentivize in-person attendance and discourage online participation to maximize the value of conferences, the collaboration and synchronization of SIGMM-sponsored conferences, and the need to make the transition between conference editions more seamless.

The need for greater industry presence, also to offer internship opportunities for students and improve the attendance of younger generations, was identified as a key issue for improvement. All the attendees recognized the importance of exploiting SIGMM Records and Social Media as a means to improve the sense of community and disseminate information.

Following our commitment to align words with actions, we decided to create Strike Teams focusing on the most strategic themes. These teams are composed of a few experienced colleagues who volunteered to define realistic strategies for the key issues, determine concrete actions, and help to implement them in the near future. Starting in January 2024, four strike teams are in operation, with members appointed for two years:

  • SIGMM Strike Team on Open Review to provide operational support on the implementation of Open Review, smoothly transferring the best practices and helping to provide new functions.
    Team members are: Xavier Alameda Pineda (Univ. Grenoble-Alpes) Coordinator, Marco Bertini (Univ. Firenze).
  • SIGMM Strike Team on Harmonization and Spread to integrate SIGMM Records and Social Media in the whole process of the ACM Multimedia organization, improve synchronization and harmonization between ACM Multimedia and other SIGMM Conferences, and strengthen the sense of community.
    Team members are: Miriam Redi (Wikimedia Foundation) Coordinator, Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ), Pablo Cesar (CWI and TU Delft).
  • SIGMM Strike Team on Industry Engagement to improve the presence of industry at ACM Multimedia, launching new in-cooperation initiatives and establishing stable bi-directional links.
    Team members are:  Touradj Ebrahimi (EPFL) Coordinator, Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences)
  • Strike Team on ACMMM Format to innovate the ACM Multimedia program, aligning it with technological advancements and the emergence of new research areas, and igniting fresh and efficient means of disseminating research.
    Team members are: Arnold Smeulders (Univ. of Amsterdam) Coordinator, Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Changwen Chen (Hong Kong Polytechnic Univ.),  Nicu Sebe (Univ. of Trento), Marcel Worring  (Univ. of Amsterdam) and the Chairs of the next two ACMMM Conferences, Jianfei Cai (Monash Univ.) and Cathal Gurrin (Dublin City Univ.).

All the teams report to the SIGMM Chair and the SIGMM Executive Committee and will work in close connection with the General Chairs and Program Chairs of the next ACM Multimedia editions.

I take this opportunity to thank again all those who participated in the SIGMM Retreat, and especially those who are committed to the Strike Teams. I sincerely hope that their work brings new ideas and vitality to our community and strengthens its visibility and reputation in the international scientific arena in the years to come.

Alberto Del Bimbo                                                                                        
SIGMM Chair

Report from CBMI 2023


The 20th International Conference on Content-based Multimedia Indexing (CBMI) was held exclusively as an in-person event in Orleans, France, on September 20-22, 2023. The conference was organized by the University of Orleans and received support from SIGMM. This edition marked a significant milestone as it was the first fully physical conference following the pandemic, providing a welcome opportunity for face-to-face interactions. The event drew a diverse and international audience of between 70 and 80 attendees representing 18 countries (12 European, 4 Asian, 1 American and 1 African). Additionally, the conference included a European meeting (CHIST-ERA XAIface project) associated with the main event, which brought together approximately 15 individuals. Furthermore, several engineering students from the University of Orleans were invited to participate, allowing them to gain insights into cutting-edge multimedia research and exchange knowledge and ideas.

Program highlights

The conference was structured around two keynote presentations. The first keynote was presented by Prof. Alberto Del Bimbo from the University of Florence, who spoke on “AI-Powered Personal Fashion Advising.” During his talk, Prof. Del Bimbo discussed the key tasks and challenges of using artificial intelligence in fashion advising.

The closing keynote was delivered by Prof. Nicolas Hervé from the Institut National de l’Audiovisuel (INA, the French National Audiovisual Archive). Prof. Hervé highlighted the research activities conducted at INA and how they could be integrated into information systems to enhance the value of the institute’s collections. His presentation provided insights into the practical applications of this work.

Presentation of our keynote speakers.

In conjunction with the presentation of 18 papers across four regular paper sessions, the 2023 conference adhered to the established tradition of previous editions by incorporating special sessions. These special sessions were designed to delve into the practical applications of multimedia indexing within specific domains or distinctive settings. This approach allowed for a more focused and in-depth exploration of several topics, offering valuable insights and discussions beyond the regular paper sessions.

This year, we received a substantial number of submissions, resulting in the approval of six special sessions. Together, these special sessions included 25 accepted papers.

  • Cultural Heritage and Multimedia Content
  • Interactive Video Retrieval for Beginners (IVR4B)
  • Physical Models and AI in Image and in Multi-modality 
  • Computational Memorability of Imagery
  • Cross-modal multimedia analysis and retrieval for well-being insights
  • Explainability in Multimedia Analysis (ExMA)

The special sessions were coordinated by organizers from multiple countries, including France, Austria, Ireland, Iceland, the UK, Romania, Japan, Norway, and Vietnam.

The special sessions encompassed a diverse range of multimedia topics, spanning from applications such as cultural heritage preservation and retrieval to machine learning, with a particular focus on facets like explainability and the utilization of physical models.

The conference program was complemented by a poster session of fourteen posters. It was followed by a demo session, which comprised the IVR4B video retrieval competition.

Participants at the poster session.
Participants at the demo session.

The best paper of the conference was awarded EUR 500, generously sponsored by ACM SIGMM. The selection committee quickly reached consensus to give the best paper award to Romain Xu-Darme, Jenny Benois-Pineau, Romain Giot, Georges Quénot, Zakaria Chihani, Marie-Christine Rousset and Alexey Zhukov for their paper “On the stability, correctness, and plausibility of visual explanation methods based on feature importance”.

Social events

In addition to the two conference dinners organized by the conference committee, the participants had the opportunity to enjoy a guided tour through Orleans on their way to the first restaurant.

Participants enjoyed the first dinner after the guided tour.

Among the social events organized during CBMI 2023 was the Music meets Science concert, held with the support of ACM SIGMM. After a series of scientific presentations, participants were able to appreciate the works of Beethoven, Murphy and Lizee. We thank ACM SIGMM for their support, which made this cultural event possible.

The Odyssée Quartet, composed of François Pineau-Benois (violinist), Raphael Moraly (cellist), Olivier Marin (violist) and Audrey Sproule (violinist).

Outlook

The next edition of CBMI will be organized in Iceland. After several hybrid editions, the conference has moved back on site, approaching the pre-pandemic level of participation.

Equity, Diversity and Inclusion at ACM MMSys 2023


The 14th ACM Multimedia Systems Conference (MMSys 2023) took place on June 7-10, 2023 in Vancouver, Canada. To continue the significant efforts of recent years, and building on the strong commitment of the MMSys community to create a diverse, inclusive and accessible forum for discussing advancements in multimedia systems and the technology experiences they enable, several EDI measures were adopted. The main goals were (1) to raise awareness of the importance of diversity and inclusion for both the MMSys community and the research fields represented at MMSys and (2) to enable diverse participation and inclusion of underrepresented groups. In this column, we provide a brief overview of the main EDI activities and some key numbers, as well as short testimonials from two participants.

Support and activities

Associate Professor Yvette Wohn giving her EDI keynote on “Moderating the Metaverse”

With backing from the ACM Special Interest Group on Multimedia (SIGMM) and ACM through funding for special initiatives, the support provided at MMSys 2023 included the following:

1. EDI Keynote Speech
We invited Dr. Yvette Wohn to give a keynote speech on “Moderating the Metaverse”. Dr. Wohn (she/her) is an associate professor of Informatics at the New Jersey Institute of Technology and director of the Social Interaction Lab. Her research is in the area of Human-Computer Interaction (HCI), where she studies the characteristics and consequences of social interactions in online environments such as virtual worlds and social media. Yvette’s keynote speech was very well received and ignited conversations during the conference.
Abstract of the talk: Online harassment is a problem that we still have been unable to solve in the social media age of Web 2.0. As we move deeper into Web 3.0, which includes 3D virtual worlds, moderation moves beyond content to include behavioral components such as embodied interactions. How do we design these systems to be creative and generative while maintaining safety and equity? This talk will discuss the challenges and opportunities, both social and technical, in creating the next wave of networked multimedia systems.

2. EDI Luncheon & Challenge
Our goal for the luncheon and challenge was to pick a topic that would spark conversations during lunch: one that is engaging enough for the whole audience, that everyone can have an opinion on (opinions that can then be challenged in conversation), and whose answers can give us some insight into our audience and their take on EDIJ issues.
The questions were:

  • What is the biggest diversity issue that you think can affect YOU in the metaverse?
  • What is the simplest, yet most practical solution you can think of for this problem?

After the initial announcement and presentation, example scenarios and conversation icebreakers were printed and placed on the break and lunch tables, and volunteers encouraged conversations, so that attendees would discuss the questions over lunch and submit their solutions. The rubric used for selecting the winner of this challenge was:

  • Problem (15 pts): Explorative Value, Importance, Scale of effect
  • Solution Quality (15 pts): Feasibility, Simplicity, Effectiveness
  • Each item was rated on a scale of 0 to 5: 0 = not meeting requirements, 1 = minimal, 2 = acceptable, 3 = good, 4 = very good, 5 = excellent.

We received 14 entries by the given deadline. Of the two entries that scored 28 points, the one by Dr. Sylvie Dijkstra-Soudarissanane was selected as the winner of the EDI Challenge; her response discussed the inaccurate representation of dark skin tones due to the inherent design of 3D capture devices such as LIDARs. Sylvie wrote a short testimonial (see below).

3. Additional EDI Activities
EDI Considerations in Conference Name Tags
Preferred pronouns were used to foster a healthier and more inclusive space, safe and respectful for all attendees. In addition, the following was explicitly mentioned on the name tags:

  1. Diversity Advocate: To show we are proud of diversity and inclusion efforts, and we acknowledge and foster the enthusiasm for this important work.
  2. First-Timer: To easily find people who might not be familiar with the community to provide them further help and support, if needed.

Childcare Support
Due to financial uncertainty, we were not able to announce the availability of childcare support funds before the conference. An earlier announcement would have helped people with children to plan better and would have ensured that we support everyone with such needs equally. Nevertheless, we were able to support a presenter who had planned childcare during the conference. For next year’s edition of MMSys, we strongly encourage that dedicated funds be made available well ahead of the conference, so that equal opportunities to attend can be offered to caregivers.

EDI Volunteer Support
While most of our student volunteers were Vancouver-based and were supported with a free registration to the conference, one additional student volunteer, who travelled to Vancouver and would otherwise not have been able to attend, was supported by the EDI chairs. His testimonial can be read below.

Key numbers

  • Two out of four keynote speakers for MMSys 2023 were women (50%)
  • One out of three Technical Program Chairs was a woman (33%)
  • Nine out of 25 organizing committee members were women (36%)
  • Four out of fourteen sessions of the main conference (seventeen including parallel sessions) were chaired by women (24%), and three out of four workshop chairs were women (75%).

Jinwei Zhao’s badge, illustrating several measures to make attendees feel welcome and included (e.g., showing self-selected preferred pronouns, a diversity advocate label, and a first-time-at-MMSys indication, so that other attendees can make sure that newcomers to the conference are warmly welcomed and included).

Testimonials

Testimonial by Jinwei Zhao, Student Volunteer supported by the MMSys 2023 EDI chairs

“I was honored to be able to attend ACM MMSys 2023 in Vancouver as a student volunteer, an experience that afforded me a breadth of professional engagements. My responsibilities as a student volunteer encompassed assisting with the registration process and the assistance of technical sessions and workshops, thereby ensuring a seamless execution of the conference. It also gave me the invaluable opportunity to engage with distinguished researchers and talented PhD students in the multimedia community, facilitating a rich exchange of brilliant and novel ideas. The keynotes and technical sessions at the conference shed light on cutting-edge developments and emerging trends in the field of multimedia systems. These included advanced adaptive video bitrate algorithms, the integration of multimedia systems with next-generation networks like Starlink, the development of new protocols such as multipath QUIC and Media-Over-QUIC, and the future of immersive technologies in AR, VR, and XR domains. Additionally, I was deeply appreciative of receiving the ACM SIGMM MMSys Volunteer Honorarium after the conference. Although I did not have the occasion to present my research at MMSys 2023, the passion and dedication of my peers served as a catalyst for my further contributions to the field. This engagement was evidently fruitful and advantageous, as it led to the acceptance of my paper for presentation at MMSys 2024 next year. This experience also encouraged me to make further contributions more actively to the multimedia community, aligning with my decision to embark on a PhD program starting in 2024.”

Testimonial by Sylvie Dijkstra-Soudarissanane, MMSys 2023 attendee and winner of the MMSys 2023 EDI Challenge

EDI Co-chair Dr. Ouldooz Baghban Karimi hands over the EDI Challenge Award to the EDI Challenge winner Sylvie Dijkstra-Soudarissanane.

“I had the privilege of attending the ACM Multimedia Systems Conference (MMSys) in June 2023, an experience that left an incredible mark on my perspective as a scientist in the field of Social XR. The conference, held in the city of Vancouver, Canada, provided a unique platform for professionals from diverse backgrounds to converge and share cutting-edge insights in multimedia systems research and development.
The MMSys conference proved to be an invaluable forum for hosting discussions on the latest advancements in multimedia technology. Keynotes and regular sessions covered a myriad of topics, ranging from advanced videos with 3D point clouds rendering, to multi-modal experiences and open software. This year, the rich program also included technical demo sessions, allowing participants to witness real-time systems in action, presented by leaders from organizations such as Xiaomi, Fraunhofer FOKUS, and my company TNO. Beyond the academic world, the conference facilitated networking and social interactions, providing a platform to connect with like-minded researchers. Engaging in discussions about user-interactive VR experiences, real-time holographic representations, and mobile-based deep learning video codecs … all happening in a breathtaking skyride above the Grouse Mountain added an extra layer of depth to the overall experience.
One of the highlights of my participation was the opportunity to pitch my idea on building socially responsible systems that prioritize inclusivity. The focus of my proposal revolved around designing systems that are inherently inclusive, considering factors such as skin tones, hair types, and ethnicities. The aim was to bridge the accessibility gap and ensure that these systems reach and cater to minority populations. It is a very personal endeavor, as a person of color. To my delight, this endeavor earned me recognition with a prestigious award in Diversity, Equity, and Inclusion. I am immensely proud to have received the DEI award offered by Dr Ouldooz Baghban Karimi for my commitment to inclusive research and innovation. This recognition reinforces the importance of pushing boundaries in technology to create solutions that resonate with diverse communities. The conference not only expanded my knowledge but also allowed me to forge meaningful connections with fellow researchers who share a passion for advancing the frontiers of multimedia systems.”