Students Report from ACM MMSys 2025

The 16th ACM Multimedia Systems Conference (with the associated workshops NOSSDAV 2025 and MMVE 2025) was held from March 31st to April 4th, 2025, in Stellenbosch, South Africa. By choosing this location, the steering committee marked a milestone for SIGMM: MMSys became the very first SIGMM conference to take place on the African continent. This aligns perfectly with SIGMM’s ongoing mission to build an inclusive and globally representative multimedia systems community.

The MMSys conference brings together researchers in multimedia systems to showcase and exchange their cutting-edge research findings. Once again, there were technical talks spanning various multimedia domains and inspiring keynote presentations.

Recognising the importance of in‑person exchange—especially for early‑career researchers—SIGMM once again funded Student Travel Grants. This support enabled a group of doctoral students to attend the conference, present their work and start building their international peer networks.
In this column, the recipients of the travel grants share their experiences at MMSys 2025.

Guodong Chen – PhD student, Northeastern University, USA 

What an incredible experience attending ACM MMSys 2025 in South Africa! Huge thanks to SIGMM for the travel grant that made this possible. 

It was an honour to present our paper, “TVMC: Time-Varying Mesh Compression Using Volume-Tracked Reference Meshes”, and I’m so happy that it received the Best Reproducible Paper Award! 

MMSys is not that huge, but it’s truly great. It’s exceptionally well-organized, and what impressed me the most was the openness and enthusiasm of the community. Everyone is eager to communicate, exchange ideas, and dive deep into cutting-edge multimedia systems research. I made many new friends and discovered exciting overlaps between my research and the work of other groups. I believe many collaborations are on the way and that, to me, is the true mark of a successful conference. 

Besides the conference, South Africa was amazing. Don’t miss the wonderful wines of Stellenbosch and the unforgettable experience of a safari tour.

Lea Brzica – PhD student, University of Zagreb, Croatia

Attending MMSys’25 in Stellenbosch, South Africa was an unforgettable and inspiring experience. As a new PhD student and early-career researcher, this was not only my first in-person conference but also my first time presenting. I was honoured to share my work, “Analysis of User Experience and Task Performance in a Multi-User Cross-Reality Virtual Object Manipulation Task,” and excited to see genuine interest from other attendees.
Beyond the workshop and technical sessions, I thoroughly enjoyed the keynotes and panel discussions. The poster sessions and demos were great opportunities to explore new ideas and engage with people from all over the world.
One of the most meaningful aspects of the conference was the opportunity to meet fellow PhD students and researchers face-to-face. The coffee breaks and social activities created a welcoming atmosphere that made it easy to form new connections.

I am truly grateful to SIGMM for supporting my participation. The travel grant helped alleviate the financial burden of international travel and made this experience possible. I’m already hoping for the chance to come back and be part of it all over again!

Jérémy Ouellette – PhD student, Concordia University, Canada

My time at MMSys 2025 was an incredibly rewarding experience. It was great meeting so many interesting and passionate people in the field, and the reception was both enthusiastic and exceptionally well organized. I want to sincerely thank SIGMM for the travel grant, as their support made it possible for me to attend and present my work. South Africa was an amazing destination, and the entire experience was both professionally and personally unforgettable. MMSys was also the perfect environment for networking, offering countless opportunities to connect with researchers and industry experts. It was truly exciting to see so much interest in my work and to engage in meaningful conversations with others in the multimedia systems community.

The 3rd Edition of Spring School on Social XR organised by CWI

The 3rd edition of the Spring School on Social XR, organised by the Distributed and Interactive Systems (DIS) group at CWI in Amsterdam, took place from 7 to 10 April 2025. The event attracted 30 students from different disciplines (technology, social sciences, and humanities) and from countries around the world (Europe, but also Canada and the USA). The event was organised by Silvia Rossi, Irene Viola, Thomas Röggla, and Pablo Cesar from CWI, and Omar Niamut from TNO. Again this year, it was co-sponsored by ACM SIGMM through its funding for special initiatives, and for the first time it was recognised as an ACM Europe Council Seasonal School.

Students and organisers of the 3rd Spring School on Social XR

Across nine lectures (four of them open to the public) and three hands-on workshops led by 14 international instructors, participants engaged in cross-domain interactions on Social XR. Sessions ranged from photorealistic avatar capture and behaviour modelling, through AI-driven volumetric-video production, low-latency streaming and novel rendering techniques, to rigorous QoE evaluation frameworks and open immersive-media datasets. A new thematic topic this year tackled the privacy, security and UX challenges that arise when immersive systems move from lab prototypes to real-world communication platforms. Together, the sessions provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems. A unique feature of the school is its Open Days, where selected keynotes are made publicly accessible both in person and via live streaming, ensuring broader engagement with the XR research community. In addition to theoretical and hands-on sessions, the school supports networking and discussions through dedicated events, including a poster presentation where participants can receive feedback from peers and experts in the field of Social XR.

Students during a boat trip.

The talks were:

  • “The Multiple Dimensions of Social in Social XR” by Sun Joo (Grace) Ahn (University of Georgia, USA) 
  • “Shaping VR Experiences: Designing Applications and Experiences for Quality of Experience Assessment” by Marco Carli (Università degli Studi Roma Tre, Italy)
  • “Making a Virtual Reality” by Elmar Eisemann (TU Delft, The Netherlands) 
  • “Robotic Avatar Mediated Social Interaction” by Jan van Erp (TNO & University of Twente, The Netherlands) 
  • “Novel Opportunities and Emerging Risks of Social Virtual Reality Spaces for Online Interactions” by Guo Freeman (Clemson University, USA) 
  • “Privacy, Security and UX Challenges in (Social) XR: an Overview” by Katrien de Moor (NTNU, Norway)
  • “AI-based Volumetric Content Creation for Immersive XR Experiences and Production Workflows” by Aljosa Smolic (Hochschule Luzern, Switzerland)
  • “Changing Habits, One Experience at a Time” by Funda Yildirim (University of Twente, The Netherlands) 
  • “Challenge-Driven Quality Evaluation and Dataset Development for Immersive Visual Experiences” by Emin Zerman (Mid Sweden University, Sweden) 

The workshops were:

  • “Cooperative Development of Social XR Evaluation Methods” by Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) and Pablo Pérez (Nokia XR Labs, Spain)
  • “From Principle to Practice: Public Values in Action” by Mariëtte van Huijstee (Rathenau Institute, The Netherlands) and Paulien Dresscher (PublicSpaces, The Netherlands) 
  • “Interoperability: What is a Visual Positioning System and Why an Open Source One and Interoperability Between These Systems Need to be Established” by Alina Kadlubsky (Open AR Cloud Europe, Germany)

CASTLE 2024: A Collaborative Effort to Create a Large Multimodal Multi-perspective Daily Activity Dataset

This report describes the CASTLE 2024 event, a collaborative effort to create a PoV 4K video dataset recorded by a dozen people in parallel over several days. The participating content creators wore a GoPro and a Fitbit for approximately 12 hours each day while engaging in typical daily activities. The event took place in Ballyconneely, Ireland, and lasted for four days. The resulting data is publicly available and can be used for papers, studies, and challenges in the multimedia domain in the coming years. A preprint of the paper presenting the resulting dataset is available on arXiv (https://arxiv.org/abs/2503.17116). 

Introduction

Motivated by the need for a real-world PoV video dataset, a group of co-organizers of the annual VBS and LSC challenges came together to hold an invitation-only workshop and generate a novel PoV video dataset. In the first week of December 2024, twelve researchers from the multimedia community gathered in a remote house in Ballyconneely, Ireland, with the goal of creating a large multi-view and multimodal lifelogging video dataset. Wearing a Fitbit on their wrists and a GoPro Hero 13 on their heads for about 12 hours a day, and with five fixed cameras capturing the environment, they began a journey of 4K lifelogging. They lived together for four full days and performed typical everyday tasks, such as cooking, eating, washing dishes, talking, discussing, reading, and watching TV, as well as playing games (ranging from paper plane folding and darts to quizzes). While this sounds very enjoyable, the whole event required a lot of effort, discipline, and meticulous planning. This concerned the food and, more importantly, data acquisition, data storage, and data synchronization, as well as avoiding any copyrighted material (books, movies, songs, etc.), limiting the use of smartphones and laptops for privacy reasons, and making the content as diverse as possible. Figure 1 gives an impression of the event and shows different activities by the participants.

Figure 1: Participants at CASTLE 2024, having a light dinner and playing cards.

Organisational Procedure

Months before the event, we began planning the recording equipment, the participants, the activities, and the food.

The first challenge was figuring out a way to make wearing a GoPro camera all day as simple and enjoyable as possible. This was achieved by using the camera with an elastic strap for a firm hold, a specially adapted rubber pad on the back of the camera, and a USB-C cable to a large 20,000 mAh power bank that every participant carried in their pocket. At the end of each day, the Fitbits, the battery packs, and the SD cards of every participant were collected, approximately 4 TB of data was copied to an on-site NAS system, the SD cards were cleared, and the batteries were fully charged, so that everything was ready for use again the next morning.

We ended up with six people from Dublin City University and six international researchers, but only 10 of them wore recording equipment. Every participant was asked to prepare at least one breakfast, lunch, or dinner, and all the food and drinks were purchased a few days before the event.

After arriving at the house, every participant had to sign an agreement that all collected data could be publicly released and used for scientific purposes in the future.

CASTLE 2024 Multimodal Dataset

The dataset (https://castle-dataset.github.io/) that emerged from this collaborative effort contains heart rate and step logs of 10 people, 4K@50fps video streams from five fixed-mounted cameras, as well as 4K video streams from 10 head-mounted cameras. The recording time per day is 7-12 hours per device, resulting in over 600 hours of video that totals about 8.5 TB of data after processing and more efficient re-encoding. The videos were split into one-hour parts that all start on the hour. This was achieved in a multi-stage process: a machine-readable QR-code-based clock provided an initial rough alignment, and a subsequent correlation analysis of the audio signals provided the fine alignment.
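The fine-alignment step can be illustrated with a minimal sketch. It assumes two mono audio tracks that have already been roughly aligned and resampled to a common rate, and it estimates the remaining lag from the peak of their cross-correlation. The function name `estimate_lag` and its parameters are hypothetical and are not taken from the CASTLE processing pipeline.

```python
import numpy as np


def estimate_lag(ref, other, sample_rate, max_lag_s):
    """Estimate how many seconds `other` is delayed relative to `ref`.

    Both inputs are 1-D arrays of mono audio samples at `sample_rate` Hz.
    Only lags within +/- `max_lag_s` seconds are considered, which reflects
    the assumption that a rough alignment has already been applied.
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    # Remove the DC offset so the correlation peak reflects signal shape.
    ref = ref - ref.mean()
    other = other - other.mean()

    # Full cross-correlation: entry k holds sum_n other[n + k] * ref[n].
    corr = np.correlate(other, ref, mode="full")
    lags = np.arange(-(len(ref) - 1), len(other))

    # Restrict the search to the plausible window around zero lag.
    window = np.abs(lags) <= int(max_lag_s * sample_rate)
    best_lag = lags[window][np.argmax(corr[window])]
    return best_lag / sample_rate
```

A positive result means the same sound occurs later in `other` than in `ref`, so `other` would be shifted earlier by that amount. For long recordings, an FFT-based correlation on short excerpts would be far cheaper than this direct computation.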

The language spoken in the videos is mainly English, with some passages in (Swiss) German and Vietnamese. The activities by the participants include:

  • preparing food and drinks
  • eating
  • washing dishes
  • cleaning up
  • discussing
  • hiding items
  • presenting and listening
  • drawing and painting
  • playing games and music (e.g., chess, darts, various card games, guitar)
  • reading (out loud)
  • watching TV (open-source videos)
  • going for a walk
  • going for a car ride

Use Scenarios of the Dataset

The dataset can be used for content retrieval contests, such as the Lifelog Search Challenge (LSC) and the Video Browser Showdown (VBS), but also for automatic content recognition and annotation challenges, such as the CASTLE Challenge that will happen at ACM Multimedia 2025 (https://castle-dataset.github.io/).  

Further application scenarios include complex scene understanding, 3D reconstruction and localization, audio event prediction, source separation, human-human/machine interaction, and many more.

Challenges of Organizing the Event

As this was the first collaborative event to collect such a multi-view multimodal dataset, there were also some challenges that are worth mentioning and that may help others who want to organize a similar event in the future.

First of all, the event turned out to be much more costly than originally planned. Reasons for this were the increased living and rental costs, the travel costs for international participants, and expenses for technical equipment such as batteries, which we originally did not intend to use. We initially wanted to organize the event in a real castle, but this turned out to be far too expensive, without a significant gain.

For the participants, it was also hard to maintain privacy over all four days, since even quickly responding to emails was not possible. When going for a walk or a car ride, we needed to make sure that other people and licence plates were not recorded.

In terms of the data, the different recording devices needed to be synchronized. This was achieved by regularly filming dynamic QR codes showing the master time (or wall-clock time) and using these positions in all videos as temporal anchors during post-processing.
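As a rough illustration of how such temporal anchors can be used, the sketch below assumes that each QR sighting has already been decoded into a pair of (position in the video in seconds, wall-clock time in seconds). It takes the median difference as the offset of that video, which tolerates an occasional misread code. The function names and the anchor format are hypothetical, and clock drift within a recording is ignored.

```python
from statistics import median


def clock_offset(anchors):
    """Return the offset that maps video time to wall-clock time.

    `anchors` is a list of (video_time_s, wall_clock_s) pairs, one per
    decoded QR sighting (hypothetical format).
    """
    if not anchors:
        raise ValueError("at least one QR anchor is required")
    # The median is robust against a single badly decoded sighting.
    return median(wall - video for video, wall in anchors)


def to_wall_clock(video_time_s, offset_s):
    """Convert a position in the video to wall-clock seconds."""
    return video_time_s + offset_s
```

With one offset per device, every frame of every video can be placed on the common master timeline before any finer alignment is attempted.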

The data volume, together with the available transfer speed, was also an issue: copying all the data from all SD cards took many hours each night.
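To give a feeling for the scale, the following back-of-the-envelope sketch computes the sustained throughput needed to copy the roughly 4 TB recorded per day within a given time window. The eight-hour window in the usage note is purely an illustrative assumption, not a figure from the event, and the function name is hypothetical.

```python
def required_throughput_mb_s(volume_tb, hours):
    """Sustained throughput in MB/s needed to copy `volume_tb` terabytes
    within `hours` hours (decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_bytes = volume_tb * 1e12
    return total_bytes / (hours * 3600) / 1e6
```

For example, `required_throughput_mb_s(4, 8)` comes to about 139 MB/s, which has to be sustained for the whole window across all card readers and the NAS combined.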

Summary

The CASTLE 2024 event brought together twelve multimedia researchers in a remote house in Ireland for an intensive four-day data collection retreat, resulting in a rich multimodal 4K video dataset designed for lifelogging research. Equipped with head-mounted GoPro cameras and Fitbits, ten participants captured synchronized, real-world point-of-view footage while engaging in everyday activities like cooking, playing games, and discussing, with additional environmental video captured from fixed cameras. The team faced significant logistical challenges, including power management, synchronization, privacy concerns, and data storage, but ultimately produced over 600 hours of aligned video content. The dataset – freely available for scientific use – is intended to support future research and competitions focused on content-based video analysis, lifelogging, and human activity understanding.

SIGMM Workshop on Multimodal AI Agents

The SIGMM Workshop on Multimodal AI Agents was held on October 28th, 2024, at ACM Multimedia 2024 (ACMMM24) in Melbourne as an invitation-only event. The initiative was launched by Alberto Del Bimbo, Ramesh Jain, and Alan Smeaton, following a vision of a future in which multimedia expertise converges with the power of large language models, and the belief that there is a great opportunity to position the multimedia research community at the center of this transformation. The event was structured as three roundtables, inviting some of the most influential figures in the multimedia field to brainstorm on key issues. The goal was to design the future, identifying the multimodal opportunity in the days of powerful large-model systems and preparing an agenda for the coming years for the SIGMM community. We did not want to overlap with current thinking on how multimodality will be included in emerging large models. Instead, the focus was on how deep multimodality is essential in building the next stages of AI agents for real-world applications and how fundamental it is for understanding real-time contexts and for actions by agents. The event received a great response, with over 30 attendees from both academia and industry, representing 13 different countries.

The three roundtables focused on Tech Challenges, Applications, and Industry-University collaboration. The participants were divided into three groups and assigned to the three roundtables according to their profiles and preferences. For the roundtables, we did not prepare specific questions but rather outlined key areas of focus for discussion. A brief document providing a short introduction to each roundtable, summarizing the topic of the debate and highlighting three major subjects to guide the discussion, was prepared and given to the discussants a few days before the meeting.

In the following, we report a brief synthesis of the discussions at the roundtables, highlighting the main points of discussion and the proposals.

Tech Challenges Roundtable

Motivations for the discussion: As large pre-trained models become more prevalent and move towards multimodality, a key issue for their future use is the impact of updating and fine-tuning them, in particular how to ensure that improvements in one area do not come at the cost of degradation in others. It is also fundamentally important to understand how deep multimodality is essential for building the next stages of AI agents for real-world applications, as well as for comprehending real-time contexts and guiding actions by agents towards Artificial General Intelligence.

Some salient sentences, open questions, proposals from the discussion:

  • The interplay between human intelligence and machine intelligence is a fundamental aspect of what should be multimodal. There are not yet deep enough multimodal models, that is, models for information that truly span all, or even a subset of, modalities. We need metrics for this human-machine (human-intelligence and machine-intelligence) action. We should come up with and define a task around how people collaborate productively. We should look at something like dynamic difficulty adjustment, which requires continuous, real-time development or training.
  • Benchmarks are of crucial importance, not just to evaluate one thing against another thing, but to stretch the capabilities. It is not just about passing the benchmark; it is about setting the targets. We should envision a SIGMM-endorsed or sponsored multimodal benchmark by approaching some big tech companies to benchmark some multimodal activity within and across companies.

Applications Roundtable

Motivations for the discussion: Multimodality is a cornerstone of emerging real-world applications, providing context and situational awareness to systems. Large Multimodal Models are credited with transforming various industries and enabling new applications. Key challenges lie in developing computational approaches for media fusion to construct context and situational understanding, addressing real-time computing costs, and refining model building. It is therefore essential for the SIGMM community to reason about how to build a vibrant community around one or a few key applications.

Some salient sentences, open questions, proposals from the discussion:

There are many application areas where the SIGMM community can provide vital and innovative contributions and where it should concentrate its applied research. Example application areas and examples of research are:

  • Health: there is an absence of open-ended sensory data representing long-term complex information in the health area. We can think of integrated, federated machine learning, i.e. an integrated, federated data space for data control.
  • Education: we can think of some futuristic learning approach, like completely autonomous learning, namely AI agents that are supportive through observation models and able to adjust the learning level, so that some learners can finish faster than others and learn through the modalities they prefer. It is also of key importance to consider what the roles of the teacher and of AI are.
  • Productivity: we can think of tools for immersive multimodal experiences, to generate cross-modal content including 3D and podcasting in immersive environments.
  • Entertainment: we should think of how we can improve entertainment through immersive, story-driven experiences.

Industry and University Roundtable

Motivations for the discussion: Research on large AI models is by far dominated by private companies, owing in part to their access to data and to the cost of building and training such models. As a result, academic institutions are being left behind in the AI race. It is therefore urgent to reason about which research directions are viable for universities and to think of new Industry-University collaboration models for multimodal AI research. It is also important to capitalize on the unique advantage of academia, namely its neutrality and its ability to address long-term social and ethical issues related to technology.

Some salient sentences, open questions, proposals from the discussion:

  • Small and medium enterprises feel that they are left out. These are the ones who came to talk to universities. This is an opportunity for the SIGMM community to see how we can help. SIGMM could sponsor joint PhD programs, for example addressing small-size multimodal foundation models or intelligent agents, where a company sponsors part of the grant project.
  • SIGMM should promote high-visibility events at ACM Multimedia, such as Grand Challenges and Hackathons. As a community we could sponsor a company-wise Grand Challenge on multimodal AI and intelligent agents, leveraging industry to contribute more datasets. We could promote a regional-global Hackathon, where hackathons are held and overseen in different regions of the world and the top teams are then invited to come to ACM Multimedia and compete.

Based on the discussions at the roundtables, we have identified several concrete actions that could help position the SIGMM research community at the forefront of the multimodal AI transformation:

At the next ACM Multimedia Conference

  • Explicit inclusion of multimodality as a key topic in the next ACM Multimedia call.
  • Multimodal Hackathon on Intelligent Agents (regional-global hackathon).
  • Multimodal Benchmarks (collaborations within and across major tech companies).
  • Multimodal Grand Challenges (in partnership with industry leaders).

At the next ACM SIGMM call for Special Projects

  • Special Projects focused on Multimodal AI.

SIGMM is committed to pursuing these initiatives.

Diversity and Inclusion in focus at ACM IMX 2024

Summary: ACM IMX 2024 took place in Stockholm, Sweden, from June 12 to 14, continuing its dedication to promoting diversity within the community. Recognising the importance of amplifying varied voices and experiences to advance the field, the conference built on IMX’s prior achievements through a series of initiatives to promote diversity and inclusion (D&I). This column provides a concise overview of the main D&I initiatives, including childcare support, early-career researcher grants, and manuscript accessibility support. It includes participant feedback and short testimonials shared during and after the conference to highlight the value of these initiatives.

To encourage a broad and inclusive pool of organisers, the general chairs of ACM IMX’24 paired seasoned committee members with new members within the organising committee. This actively fostered mentoring opportunities that support continuity and the development of future conference leadership. In addition, IMX’24 invited community members to self-nominate for various chair and organisational roles, to make it clear that chair roles were open to all who were interested in being part of organising the conference. This call for applications was announced during the closing session of ACM IMX’23 in Nantes, France, and, over a two-month period, the committee received 12 applications, from which 5 candidates were selected to serve as chairs in various capacities. This inclusive approach allowed ACM IMX to engage with junior members and volunteers who might not have been reached through traditional recruitment methods, pairing them with experienced team members so that they could build their network within the community and their skills in conference organisation and management.

SIGMM support was used to enable the chairs of IMX’24 to introduce several initiatives to ensure that all individuals, regardless of personal circumstances, could participate fully in the conference. These initiatives had openly announced calls to all eligible community members who wished to attend the conference in person in Stockholm but required financial assistance. To ensure a fair and thorough selection, the IMX’24 Diversity and Inclusion Chairs, in collaboration with the General Chairs, reviewed each of the applications to ensure that the widest range of support could be offered with the available funds. Applications were evaluated on a rolling basis to ensure that participants were able to organise their travel and visa arrangements without the added challenges of time pressure.

With this support from SIGMM, Diversity and Inclusion grants for IMX were made available for participants, covering:

  • Travel Support for Non-Students from Marginalised and Underrepresented Groups: This grant provided travel support for researchers who self-identified as marginalised or underrepresented within the ACM IMX community, particularly those from non-WEIRD (Western, Educated, Industrialised, Rich, Democratic) countries who lacked other funding opportunities. Priority was given to early-career researchers (such as post-docs) and those needing financial assistance, to complement existing SIGCHI and SIGMM student-targeted travel grants.
  • Childcare and Parental Support: This grant offered financial assistance to parents attending ACM IMX’24, subsidising childcare costs to enable broader participation and to cover expenses related to children’s travel, travel for a childcare companion, and on-site or arranged babysitting during the conference.
  • Disability and Carer Support: This grant aimed to support attendees on extended leave from work due to disability, parental responsibilities, or other personal circumstances. Recipients of this award also received a complimentary conference registration.
  • Student Travel Awards: SIGMM also provided awards directly to students to support travel expenses, enabling a broader range of participation and complementing the free registration offered to students volunteering at the conference.

SIGMM’s special initiatives for diversity and inclusion enabled IMX’24 to secure a keynote designed to foster a more inclusive dialogue. Delivered by artist Jake Elwes, a self-described hacker, radical faerie, and researcher, the keynote focused on “queer artificial intelligence” and featured deepfake drag performers. Elwes’ work invited the attendees to reflect on who builds these systems, the intentions behind them, and how they can be reclaimed to envision and create different visions of a technology-enhanced future.

In combination with support from SIGMM, a special workshop focused on engaging with research and researchers from Latin America as a region of interest was made possible through the generous backing of the SIGCHI Development Fund (SDF). This enabled researchers and workshop keynote speakers to participate in the “IMX in Latin America – 2nd International Workshop” and to attend the conference. A core objective was to increase diversity by broadening the IMX community through actively encouraging colleagues from Latin America to attend and contribute. This workshop also published its submissions as part of the ACM IMX’24 workshop proceedings in ICPS.

For the first time at ACM IMX, an external provider (TAPS) was hired to ensure accessibility of papers prior to publication. Finally, the conference offered a range of venue-focused diversity and inclusion initiatives, including the provision of all-gender bathrooms, pronoun badges, and approachable senior community members to support engagement. A care corner and tables were thoughtfully set up throughout the conference to provide attendees with free hygiene essentials such as masks, refreshers, hand sanitisers, sanitary pads and tampons. These measures highlighted ACM IMX’24’s commitment to fostering a welcoming and accessible environment for all participants.

Figure 1: Participants’ responses on their perception of diversity and inclusion at IMX, highlighting that it encompasses representation, welcoming environments, active engagement, research focus, and shaping future media experiences.

“During the closing event of IMX2024, we asked our attendees to answer a few questions that could help plan future IMX conferences. We asked everyone to share what future research directions could be included to address D&I at IMX. Some of the suggestions were to include the field of Humanities, to study usability among different demographics, and to understand how people who might not have economic access to technology could benefit from such technology. We also asked everyone to select what, according to them, is D&I at IMX. The options “Everyone feels welcomed”, “Diverse individuals are able to engage and contribute” and “People from diverse backgrounds get represented and have a voice” received a majority of the votes when compared to “Shape the future of interactive media experiences” and “Research that focuses on diversity and inclusion in media experiences”. When asked to share how included they felt at IMX2024, 92% of the participants shared that they felt either included or very much included [with some leaving the question unanswered]. They also shared how different aspects made them feel included. Some of the highlights were the care corner that was arranged to support the basic needs of the attendees, the social events, interactions at the conference, and the community.” – Sujithra Raviselvam, IMX’24 Diversity and Inclusion Co-Chair.

Figure 2: Participants’ feedback on factors contributing to feelings of inclusion and exclusion at IMX, along with suggestions for future research directions aimed at improving diversity and inclusion. The feedback highlights personal interactions, event organization, and amenities as key to feeling included, while future research suggestions focus on enhancing accessibility, providing economic support, and integrating more diverse perspectives in HCI research.

The best way to understand the impact of these supports is through the words of those who were able to join the conference by receiving them.

“The grant received for IMX2024 allowed me to attend the conference. Having a young child is challenging as an early researcher, as you must, sometimes, sacrifice your career or family. This grant allowed me to travel without any of these. I could attend the conference without stress or second thoughts, and support my family during the few days of the conference. Thanks to this, I received valuable feedback on my work, followed interesting presentations, and did not miss my family.” – Romain Herault, childcare award recipient.

“I had the opportunity to present our qualitative study focused on understanding the sensitive values of women entrepreneurs in Brazil to support designing multi-model conversational AI financial systems at IMX, followed by interesting discussions about it in the workshop organized by Debora Christina Muchaluat Saade, Mylene Farias and Jesus Favela. The conference was focused on the future of multimodal technologies, with many exciting demos to investigate, to make more accessible, and to challenge assumptions of real life through a multimedia lens. We also had a conference dinner with the theme of the midsummer celebration. I was amazed by its meaning; as far as I understood, the purpose is to celebrate the light, sun, and summer season with family and friends! I loved it! It was also an opportunity to explore the beautiful Stockholm city with new colleagues and meet current collaborators in research.” – Heloisa Caroline de Souza Pereira Candello.

A total of 21 applicants received support through diversity and inclusion grants provided by both SIGMM and the SIGCHI Development Fund (SDF). This assistance enabled full participation in ACM IMX’24 and supported a diverse group, including students, non-students from marginalised backgrounds, early-career researchers, and Latin American researchers, all of whom benefitted from these grants and made up more than 10% of the total conference attendees – truly changing and undoubtedly enhancing the experience of all attendees at the conference. 

Figure 3: The word clouds present two data sets from an IMX survey: the countries respondents identify as home, and the locations they would like IMX to feature in the future. They highlight a diverse range of home countries, including Brazil, Germany, and India, and suggest future IMX locations such as Japan, Brazil, and various cities in the USA, indicating a global interest and the geographical diversity of the IMX community.

One benchmarking cycle wraps up, and the next ramps up: News from the MediaEval Multimedia Benchmark

Introduction

MediaEval, the Multimedia Evaluation Benchmark, has offered a yearly set of multimedia challenges since 2010. MediaEval supports the development of algorithms and technologies for analyzing, exploring and accessing information in multimedia data. MediaEval aims to help make multimedia technology a force for good in society and for this reason focuses on tasks with a human or social aspect. Benchmarking contributes in two ways to advancing multimedia research. First, by offering standardized definitions of tasks and evaluation data sets, it makes it possible to fairly compare algorithms and, in this way, track progress. If we can understand which types of algorithms perform better, we can more easily find ways (and the motivation) to improve them. Second, benchmarking helps to direct the attention of the research community, for example, towards new tasks that are based on real-world needs, or towards known problems for which more research is necessary to have a solution that is good enough for a real world application scenario.

The 2023 MediaEval benchmarking season culminated with the yearly workshop, which was held in conjunction with MMM 2024 (https://www.mmm2024.org) in Amsterdam, Netherlands. It was a hybrid workshop, which also welcomed online participants. The workshop kicked off with a keynote held jointly with MMM 2024, given by Yiannis Kompatsiaris (Information Technologies Institute, CERTH) on Visual and Multimodal Disinformation Detection. The talk covered the implications of multimodal disinformation online and the challenges that must be faced in order to detect it. The workshop also featured an invited speaker, Adriënne Mendrik, CEO & Co-founder of Eyra, which supports benchmarks with the online Next platform. She talked about benchmark challenge design for science and how the Next platform is currently being used in the Social Sciences.

More information about the workshop can be found at https://multimediaeval.github.io/editions/2023/ and the proceedings were published at https://ceur-ws.org/Vol-3658/. In the rest of this article, we provide an overview of the highlights of the workshop as well as an outlook to the next edition of MediaEval in 2025.

Tasks at MediaEval

The MediaEval Workshop 2023 featured five tasks that focused on human and social aspects of multimedia analysis.

Three of the tasks required participants to combine or cross modalities or even consider new modalities. The Musti: Multimodal Understanding of Smells in Texts and Images task challenged participants to detect and classify smell references in multilingual texts and images from the 17th to the 20th century. They needed to identify whether a text and image evoked the same smell source, detect specific smell sources, and apply zero-shot learning for untrained languages. The other two of these three tasks emphasized the social aspects of multimedia. In the NewsImages: Connecting Text and Images task, participants worked with a dataset of news articles and images, predicting which image accompanied a news article. This task aimed to explore cases in which there is a link between a text and an image that goes beyond the text being a literal description of what was pictured in the image. The Predicting Video Memorability task required participants to predict how likely videos were to be remembered, both short- and long-term, and to use EEG data to predict whether specific individuals would remember a given video, combining visual features and neurological signals.

Two of the tasks focused on pushing forward video analysis so that it can support experts in carrying out their jobs. The SportsVideo: Fine-Grained Action Classification and Position Detection task strove to develop technology that will support coaches. To address this task, participants analyzed videos of table tennis and swimming competitions, detecting athlete positions, identifying strokes, classifying actions, and recognizing game events such as scores and sounds. The Transparent Tracking of Spermatozoa task strove to develop technology that will support medical professionals. Task participants were asked to track sperm cells in video recordings to evaluate male reproductive health. This involved localizing and tracking individual cells in real time, predicting their motility, and using bounding box data to assess sperm quality. The task emphasized both accuracy and processing efficiency, with subtasks involving graph data structures for motility prediction.

Impressions of Student Participants

MediaEval is grateful to SIGMM for providing funding for three students who attended the MediaEval Workshop and greatly helped us with the organization of this edition: Iván Martín-Fernández and Sergio Esteban-Romero from the Speech Technology and Machine Learning Group (GTHAU) – Universidad Politécnica de Madrid, and Xiaomeng Wang from Radboud University. Below, the students share their comments and impressions of the workshop.

“As a novel PhD student, I greatly valued my experience attending MediaEval 2023. I participated as the main author and presented work from our group, GTHAU – Universidad Politécnica de Madrid, on the Predicting Video Memorability Challenge. The opportunity to meet renowned colleagues and senior researchers, and learn from their experiences, provided valuable insight into what academia looks like from the inside. 

MediaEval offers a range of multimedia-related tasks, which may sometimes seem under the radar but are crucial in developing real-world applications. Moreover, the conference distinguishes itself by pushing the boundaries, going beyond just presenting results to foster a deeper understanding of the challenges being addressed. This makes it a truly enriching experience for both newcomers and seasoned professionals alike. 

Having volunteered and contributed to organizational tasks, I also gained first-hand insight into the inner workings of an academic conference, a facet I found particularly rewarding. Overall, MediaEval 2023 proved to be an exceptional blend of scientific rigor, collaborative spirit, and practical insights, making it an event I would highly recommend for anyone in the multimedia community.”

Iván Martín-Fernández, PhD Student, GTHAU – Universidad Politécnica de Madrid

“Attending MediaEval was an invaluable experience that allowed me to connect with a wide range of researchers and engage in discussions about the latest advancements in Artificial Intelligence. Presenting my work on the Multimedia Understanding of Smells in Text and Images (MUSTI) challenge was particularly insightful, as the feedback I received sparked ideas for future research. Additionally, volunteering and assisting with organizational tasks gave me a behind-the-scenes perspective on the significant effort required to coordinate an event like MediaEval. Overall, this experience was highly enriching, and I look forward to participating and collaborating in future editions of the workshop.”

Sergio Esteban-Romero, PhD Student, GTHAU – Universidad Politécnica de Madrid

“I was glad to be a student volunteer at MediaEval 2024. Collaborating with other volunteers, we organized submission files and prepared the facilities. Everyone was exceptionally kind and supportive.
In addition to volunteering, I also participated in the workshop as a paper author. I submitted a paper to the NewsImage task and delivered my first oral presentation. The atmosphere was highly academic, fostering insightful discussions. And I received valuable suggestions to improve my paper.  I truly appreciate this experience, both as a volunteer and as a participant.”

Xiaomeng Wang, PhD Student, Data Science – Radboud University

Outlook to MediaEval 2025 

We are happy to announce that in 2025 MediaEval will be hosted in Dublin, Ireland, co-located with CBMI 2025. The Call for Task Proposals is now open, and details regarding submitting proposals can be found here: https://multimediaeval.github.io/2024/09/24/call.html. The final deadline for submitting your task proposals is Wed. 22nd January 2025. We will publish the list of tasks offered in March and registration for participation in MediaEval 2025 will open in April 2025.

For this edition of MediaEval we will again emphasize our “Quest for Insight”: we push beyond improving evaluation scores to achieving deeper understanding about the challenges, including data and the strengths and weaknesses of particular types of approaches, with the larger aim of understanding and explaining the concepts that the tasks revolve around, promoting reproducible research, and fostering a positive impact on society. We look forward to welcoming you to participate in the new benchmarking year.

Report from CBMI 2024

The 21st International Conference on Content-based Multimedia Indexing (CBMI) was hosted by Reykjavik University in cooperation with ACM, SIGMM, VTT and IEEE. The three-day event took place on September 20-22 in Reykjavik, Iceland. Like the year before, it was an exclusively in-person event. Despite the remote location, an active volcano and the in-person attendance requirement, we are pleased to report that we had perfect attendance of presenting authors. CBMI was started in France and still has strong European roots. Looking at the nationalities of the submitting authors, we can see 17 unique nationalities: 14 from Europe, 2 from Asia and 1 from North America.

Conference highlights

Figure 1: First keynote speaker being introduced.

Key elements of a successful conference are the keynote sessions. The first and opening keynote, titled “What does it mean to ‘work as intended’?”, was presented by Dr. Cynthia C. S. Liem on day 1. In this talk Cynthia raised important questions on how complex it can be to define, measure and evaluate human-focused systems. Using real-world examples, she demonstrated how recently developed systems that passed the traditional evaluation metrics still failed when deployed in the real world. Her talk was an important reminder that certain weaknesses in human-focused systems are only revealed when exposed to reality.

Figure 2: Keynote speaker Ingibjörg Jónsdóttir (left) and closing keynote speaker Hannes Högni Vilhjálmsson (right).

Traditionally there are only two keynotes at CBMI, the first on day 1 and the second on day 2. However, our planned second keynote speaker could not attend until the last day, and thus a third “surprise” keynote was organized on day 2 with the title “Remote Sensing of Natural Hazards”. The speaker was Dr. Ingibjörg Jónsdóttir, an associate professor of geology at the University of Iceland. She gave a very interesting talk about the unique geology of Iceland, the threats posed by natural hazards and her work using remote sensing to monitor both sea ice and volcanoes. This talk was well received by attendees as it gave insight into the host country and the volcanic eruption that ended just a week before the start of the conference (the 7th in the past 2 years on the Reykjanes Peninsula). The subject is highly relevant to the community, as the analysis and prediction are based on multimodal data.

The planned second keynote took place in the last session on day 3 and was given by Dr. Hannes Högni Vilhjálmsson, professor at Reykjavik University. The talk, titled “Being Multimodal: What Building Virtual Humans has Taught us about Multimodality”, gave the audience a deep dive into lessons learnt from his 20+ years of experience developing intelligent virtual agents with face-to-face communication skills. In the words of his abstract: “I will review our attempts to capture, understand and analyze the multi-modal nature of human communication, and how we have built and evaluated systems that engage in and support such communication.”

CBMI is a relatively small, but growing, conference that is built on a strong legacy and has a highly motivated community behind it. The special sessions have long played an important role at CBMI and this year there were 8 special sessions accepted.

  • AIMHDA: Advances in AI-Driven Medical and Health Data Analysis
  • CB4AMAS: Content-based Indexing for audio and music: from analysis to synthesis
  • ExMA: Explainability in Multimedia Analysis
  • IVR4B: Interactive Video Retrieval for Beginners
  • MAS4DT: Multimedia analysis and simulations for Digital Twins in the construction domain
  • MmIXR: Multimedia Indexing for XR
  • MIDRA: Multimodal Insights for Disaster Risk Management and Applications
  • UHBER: Multimodal Data Analysis for Understanding of Human Behaviour, Emotions and their Reasons

Figure 3: SS UHBER chair Dr. E. Vildjunaite with a conference participant.

The number of papers per session ranged from 2 to 8. The larger sessions (CB4AMAS, MmIXR and UHBER) used a discussion panel format that created a more inclusive atmosphere and, at times, sparked lively discussions. 

Figure 4: Images from the poster session and the IVR4B competition.

Especially popular with attendees was the competition that took place in the Interactive Video Retrieval for Beginners (IVR4B) session. This session was hosted right after the poster session in the wide open space of Reykjavik University’s foyer. 

Awards

The selection committee unanimously chose the contribution of Lorenzo Bianchi, Fabio Carrara, Nicola Messina & Fabrizio Falchi, titled “Is CLIP the main roadblock for fine-grained open-world perception?”, as the best paper award winner. With the generous support of ACM SIGMM, they were awarded 500 Euros. As the best paper was also a student paper, it was decided to also give the runner-up a 300 Euro award. The runner-up was the contribution of Recep Oguz Araz, Dmitry Bogdanov, Pablo Alonso-Jimenez and Frederic Font, titled “Evaluation of Deep Audio Representations for Semantic Sound Similarity”.

The best demonstration was awarded to Joshua David Springer, Gylfi Thor Gudmundsson and Marcel Kyas for “Lowering Barriers to Entry for Fully-Integrated Custom Payloads on a DJI Matrice”. 

The top two systems in the IVR4B competition were also recognized: first place went to Nick Pantelidis, Maria Pegia, Damianos Galanopoulos, et al. for “VERGE: Simplifying Video Search for Novice”; and second place went to Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, et al. for “VISIONE 5.0: toward evaluation with novice users”.

Social events

The first day of the conference was quite eventful: before the poster and IVR4B sessions, Francois Pineau-Benois and Raphael Moraly of the Odyssée Quartet performed selected classical works in the “Music-meets-Science” cultural event. The goal of this event is to bring live classical music to the multimedia research community. The musicians played a concert and then held a discussion with researchers, particularly those involved in music analysis and retrieval. Such exchanges between content creators and researchers in content analysis, indexing and retrieval have been a distinctive feature of CBMI since 2018.
This event would not have been possible without the generous support of ACM SIGMM.

The second day was no less entertaining as before the banquet attendees took a virtual flight over Iceland’s beautiful landscape via the services of FlyOver Iceland. 

The next edition, CBMI 2025, will be held in Dublin, organized by DCU.

The 2nd Edition of the Spring School on Social XR organized by CWI

ACM SIGMM co-sponsored the second edition of the Spring School on Social XR, organized by the Distributed and Interactive Systems group (DIS) at CWI in Amsterdam. The event took place on March 4th – 8th 2024 and attracted 30 students from different disciplines (technology, social sciences, and humanities). The program included 22 lectures, 6 of them open, by 23 instructors. The event was organized by Irene Viola, Silvia Rossi, Thomas Röggla, and Pablo Cesar from CWI; and Omar Niamut from TNO. It was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made student grants available and supported international speakers from under-represented countries, and by The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en).

Students and organisers of the Spring School on Social XR (March 4th – 8th 2024, Amsterdam)

“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organization committee continued for a second edition with a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.

A poster session at the CWI DIS Spring School 2024.

The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques for volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.

The spring school is part of the semester program organized by the DIS group of CWI. It was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture, “Human-Centered Multimedia: Making Remote Togetherness Possible” of Prof. Pablo Cesar. Then, it was continued in 2023 with the 1st Spring School on Social XR.

The talks were:

  • “Volumetric Content Creation for Immersive XR Experiences” by Aljosa Smolic
  • “Social Signal Processing as a Method for Modelling Behaviour in SocialXR” by Julie Williamson
  • “Towards a Virtual Reality” by Elmar Eisemann
  • “Meeting Yourself and Others in Virtual Reality” by Mel Slater
  • “Social Presence in VR – A Media Psychology Perspective” by Tilo Hartmann
  • “Ubiquitous Mixed Reality: Designing Mixed Reality Technology to Fit into the Fabric of our Daily Lives” by Jan Gugenheimer
  • “Building Military Family Cohesion through Social XR: A 8-Week Field Study” by Sun Joo (Grace) Ahn
  • “Navigating the Ethical Landscape of XR: Building a Necessary Framework” by Eleni Mangina
  • “360° Multi-Sensory Experience Authoring” by Debora Christina Muchaluat Saade
  • “QoE Assessment of XR” by Patrick le Callet
  • “Bringing Soul to Digital” by Natasja Paulssen
  • “Evaluating QoE for Social XR – Audio, Visual, Audiovisual and Communication Aspects” by Alexander Raake
  • “Immersive Technologies Through the Lens of Public Values” by Mariëtte van Huijstee
  • “Designing Innovative Future XR Meeting Spaces” by Katherine Isbister
  • “Evaluation Methods for Social XR Experiences” by Mark Billinghurst
  • “Recent Advances in 3D Videocommunication” by Oliver Schreer
  • “Virtual Humans in Social XR” by Zerrin Yumak
  • “The Power of Graphs Learning in Immersive Communications” by Laura Toni
  • “Boundless Creativity: Bridging Sectors for Social Impact” by Benjamin de Wit
  • “Social XR in 5G and Beyond: Use Cases, Requirements, and Standardization Activities” by Lea Skorin-Kapov
  • “An Overview on Standardization for Social XR” by Pablo Perez and Jesús Gutiérrez
  • “Funding: The Path to Research Independence” by Sergio Cabrero

SIGMM Strike Teams Activity Report (April, 2024)

On April 10th, 2024, during the SIGMM Advisory Board meeting, the Strike Team Leaders, Touradj Ebrahimi, Arnold Smeulders, Miriam Redi and Xavier Alameda Pineda (represented by Marco Bertini) reported the results of their activity. The results are summarized below as recommendations, which should be understood as guidelines and behavioral advice for our ongoing and future activity. SIGMM members in charge of SIGMM activities, SIGMM Conference leaders and particularly the organizers of the next ACMMM editions are invited to adhere to the recommendations that concern them, implement the items marked as mandatory and report to the SIGMM Advisory Board after the event.

All the SIGMM Strike Teams will remain in charge for two years starting January 1st, 2024 for reviews and updates.

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

SIGMM Strike Team on Industry Engagement

Team members: Touradj Ebrahimi (EPFL), Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences)
Coordinator: Touradj Ebrahimi

The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed improving the presence of industry at ACMMM and other SIGMM Conferences/Workshops, launching new in-cooperation initiatives and establishing stable bidirectional links.

  1. Organization of industry-focused events
    • Suggested / Mandatory for ACMMM Organizers and SIGMM AB: Create industry-focused promotional materials like pamphlets/brochures for industry participation (sponsorship, exhibit, etc.) in the style of ICASSP 2024 and ICIP 2024
    • Suggested for ACMMM Organizers: invite Keynote Speakers from industry, possibly with financial support from SIGMM. Keynote talks should be similar to plenary talks but around specific application challenges.
    • Suggested for ACMMM Organizers: organize Special Sessions and Workshops around specific applications of interest to companies and startups. Sessions should be coordinated by industry, possibly with support from an experienced and established scholar.
    • Suggested for ACMMM Organizers: organize Hands-on Sessions led by industry to receive feedback on future products and services.
    • Suggested for ACMMM Organizers: organize Panel Sessions led by industry and standardization committees on timely topics relevant to industry, e.g. how companies cope with AI.
    • Suggested for ACMMM Organizers: organize Tutorial sessions given by qualified people from industry and standardization committees at SIGMM-sponsored conferences/workshops
    • Suggested for ACMMM Organizers: promote contributions mainly from the industry in the form of Industry Sessions to present companies and their products and services.
    • Suggested for ACMMM Organizers and SIGMM AB: promote Joint SIGMM / Standardization workshop on latest standards e.g. JPEG meets SIGMM, MPEG meets SIGMM, AOM meets SIGMM.
    • Suggested for ACMMM Organizers: organize Job Fairs like job interview speed dating during ACMMM
  2. Initiatives for linkage
    • Mandatory for SIGMM Organizers and SIGMM AB: Create and maintain a mailing list of industrial targets, taking care of GDPR (Include a question in the registration form of SIGMM-sponsored conferences)
    • Suggested for SIGMM AB: organize monthly talks by industry leaders from large established companies, SMEs or startups, sharing the technical/scientific challenges they face and their solutions
  3. Initiatives around reproducible results and benchmarking
    • Suggested for ACMMM Organizers and SIGMM AB: support the release of databases and studies on performance assessment procedures and metrics, possibly focused on specific applications.
    • Suggested for ACMMM Organizers: organize Grand Challenges initiated and sponsored by industry.

Strike Team on ACMMM Format

Team Members: Arnold Smeulders (Univ. of Amsterdam), Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Ralf Steinmetz (Univ. Darmstadt), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam), Jianfei Cai (Monash Univ.), Cathal Gurrin (Dublin City Univ.).
Coordinator: Arnold Smeulders

The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to Conference identity, Conference budget and Conference memory.

1. Intended audience. It is generally felt that ACMMM is under pressure from neighboring conferences growing very big. There is consensus that growing big should not be the purpose of ACMMM: a size of 750 to 1500 attendees was thought to be ideal, including for being attractive to industry. Growth should come naturally.

  • Suggested for ACMMM Organizers and SIGMM AB: Promote distant travel by lowering fees for those who travel far
  • Suggested for ACMMM Organizers: Include (a personalized) visa invitation in the call for papers.

2. Community feel, differentiation and interdisciplinarity. Identity is not an actionable concern, but one of the shared common goods is T-shaped individuals interested in neighboring disciplines making an interdisciplinary or multidisciplinary connection. It is desirable to differentiate submitted papers from major close conferences like CVPR. This point is already implemented in the call for papers of ACMMM 2024.

  • Mandatory for ACMMM Organizers: Ask in the submission how the paper fits in the multimedia community and its scientific tradition as illustrated by citations. Consider this information in the explicit review criteria.
  • Recommended for ACMMM Organizers: Support the physical presence of participants by rebalancing fees.
  • Suggested for ACMMM Organizers and SIGMM AB: Organize a session around the SIGMM test of time award, make selection early, funded by SIGMM.
  • Suggested for ACMMM Organizers: Organize moderated discussion sessions for papers on the same theme.

3. Brave New Ideas. Brave New Ideas fits the intended audience very well. It is essential that we are able to draw out brave and new ideas from our community for long-term growth and vibrancy. The emphasis in reviewing Brave New Ideas should be on the novelty even if it is not perfect. Rotate over a pool of people to prevent lock-in.

  • Suggested / Mandatory for ACMMM Organizers: Include in the submission a 3-minute pitch video to archive in the ACM digital library.
  • Suggested / Mandatory for ACMMM Organizers: Select reviewers from a pool of senior people to review novelty.
  • Suggested for ACMMM Organizers: Start with one session of 4 papers, if successful, add another session later.

4. Application. There should be no support for one specific application area exclusively in the main conference. Yet, application areas should be the focus of special sessions or workshops.

  • Suggested for ACMMM Organizers: Focus on application-related workshops or special sessions with own reviewing.

5. Presentation. Since the core business of ACM MM is inter- and multi-disciplinarity, it is natural to make the presentation for a broader audience part of the selection. ACM should make the short videos accessible as a service to the scientific community or the general public. TED-like videos for a paper fit naturally with ACMMM and fit with the trend on YouTube to communicate your paper. If reviewing the videos is too much work, SIGMM AB should support it financially.

  • Mandatory for ACMMM Organizers: Include a TED-like 3-minute pitch video as part of the submission, to be archived by the ACM Digital Library as part of the conference proceedings. The video is to be submitted for review a week after the paper deadline, so there is time to prepare it after the regular paper submission.

6. Promote open access. For a data-driven and fair comparison, promote open access to data, so that it can be used as a basis for comparison at the next conference.

  • Suggested for SIGMM AB: Open access for data encouraged.

7. Keynotes. For the intended audience and for interdisciplinarity, it is felt essential to have keynotes on the key topics of the moment. Keynotes should not focus on one topic but should maintain the diversity of topics within the conference and over the years, so as to ensure that new ideas are brought into the community.

  • Suggested for SIGMM AB: directly fund a big-name, marquee keynote speaker, sponsored by SIGMM, to speak on one of the societally urgent key topics evident from the news.

8. Diversity over subdisciplines, etc. Make an extra effort for Arts, GenAI use models, security, HCI and demos. We need to ensure that if the submitted papers are of sufficiently high quality, there is at least one session on that subtopic in the conference. We need to ensure that the conference is not overwhelmed by a popular topic with easy review criteria and generally much higher review scores.

  • Suggested for ACMMM Organizers: Promote diversity of all relevant topics in the call for papers and by action in subcommunities by an ambassador. SIGMM will supervise the diversity.

9. Living report. To enhance the institutional memory, maintain a living document passed on from organizer to organizer, with suggestions. The owner of the document is the commissioner for conferences of SIGMM.

  • Mandatory for ACMMM Organizers and SIGMM AB: A short report to the SIGMM commissioner for conferences from the ACMMM chair, including a few recommendations for the next time; handed over to the next conference after the end of the current conference.

SIGMM Strike Team on Harmonization and Spread

Team members: Miriam Redi (Wikimedia Foundation), Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ), Pablo Cesar (CWI and TU Delft).
Coordinator: Miriam Redi

The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to giving SIGMM Records and Social Media a more central role in SIGMM and integrating them into the whole process of the ACMMM organization from its initial planning.

1. SIGMM Website. The SIGMM website is out of date and needs a serious overhaul.

  • Mandatory for SIGMM AB: restart the website from scratch, taking inspiration from other SIGs, for example by reaching out to people at CHI to understand what can be done. Budget should be provided by SIGMM.

2. SIGMM Social Media Channels. SIGMM social media accounts (Twitter and LinkedIn) are managed by the Social Media Team at the SIGMM Records.

  • Suggested for SIGMM AB: continue this organization, expanding the responsibilities of the team to include conferences and other events.

3. Conference Social Media: Social media presence of conferences is managed by the individual conferences. It is not uniform and is disconnected from SIGMM social media and the Records. The social media presence of the ACMMM flagship conference is weak and needs help. Creating continuity in terms of strategy and processes across conference editions is key.

  • Mandatory for ACMMM Organizers and SIGMM AB: create a Handbook of conference communications: a set of guidelines about how to create continuity across conference editions in terms of communications, and how to connect the SIGMM Records to the rest of the community.
  • Suggested for ACMMM Organizers and SIGMM AB: one member of the Social Media team at the SIGMM Records is systematically invited to join the OC of major conferences as publicity co-chair. The steering committee chair of each conference should commit to keeping the organizers of each conference edition informed about this policy, and to monitoring its implementation throughout the years.

SIGMM Strike Team on Open Review

Team members: Xavier Alameda Pineda (Univ. Grenoble-Alpes), Marco Bertini (Univ. Firenze). Coordinator: Xavier Alameda Pineda

The team continued to support ACMMM Conference organizers in the use of Open Review in the ACMMM reviewing process, helping to implement new functions or improve existing ones and supporting the smooth transfer of best practices. The recommendations addressed distinct items to complete the migration and stabilize the use of Open Review in future ACMMM editions.

1. Technical development and support

  • Mandatory for the Team: update and publish the scripts; complete the Open Review configuration.
  • Mandatory for SIGMM AB and ACMMM organizers: create a Committee led by the TPC chairs of the current ACMMM edition on a rotating basis.

2. Communication

  • Mandatory for the Team: write a small manual for use and include it in the future ACMMM Handbook.

Alberto Del Bimbo
SIGMM Chair

First edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) School


In February 2024, the first edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School was held. With the support of SIGMM, it attracted more than 50 students and young researchers to learn about, discuss and experiment first-hand with topics related to social robotics. The event’s success calls for further editions in upcoming years.

Rationale for SoRAIM

SPRING, a collaborative research project funded by the European Commission under Horizon 2020, is coming to an end in May 2024. Its scientific and technological objectives, namely to test a versatile social robotic platform within a hospital and have it perform social activities in a multi-person, dynamic setup, have for the most part been achieved. In order to empower the next generation of young researchers with the concepts and tools to answer tomorrow’s challenges in the field of social robotics, one must tackle the issue of transmitting knowledge and know-how. We therefore chose to organise a winter school, free of charge to the participants (thanks to the additional support of SIGMM), so that as many students and young researchers as possible, from various horizons (not only technical fields), could attend.

Contents of the Winter School

The Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School took place from 19 to 23 February 2024 in Grenoble, France. The first day featured an introduction to the contents of the school and to the context provided by the SPRING project, followed by a demonstration combining social navigation and dialogue interaction. This sparked the curiosity of the participants and led to a spontaneous Q&A session driven by their contributions, questions and comments.

The school spanned the entire week, with 17 talks given by 8 speakers from the H2020 SPRING project and 9 invited speakers external to the project. The school also included a panel discussion on the topic “Are social robots already out there? Immediate challenges in real-world deployment”, a poster session with 15 contributions, and two hands-on sessions where the participants could choose among the following topics: Robot navigation with Reinforcement Learning, ROS4HRI: How to represent and reason about humans with ROS, Building a conversational system with LLMs using prompt engineering, Robot self-localisation based on camera images, and Speaker extraction from microphone recordings. A social activity (a visit to Grenoble’s downtown and the Bastille) was organised on Thursday afternoon, allowing participants to mingle with speakers and to discover the host town’s history.

One of the highlights of SoRAIM was its Panel Session, whose topic was “Are social robots already out there? Immediate challenges in real-world deployment”. Although no definitive answers were found, the session stressed that numerous challenges remain for the deployment of actual social robots in our everyday lives (at work, at home). On the technical side, robotic platforms are subject to hardware and software constraints. On the hardware side, sensors and actuators are restricted in size, power and performance, since physical space and battery capacity are limited. On the software side, large models can only be used if substantial computing resources are permanently available, which is not always the case, since these resources need to be shared between the various computing modules. Finally, on the regulatory and legal side, the rise of AI use is fast and needs to be balanced with ethical views that address our society’s needs, whereas the construction of proper laws and norms, and their acknowledgement and understanding by stakeholders, is slow. In this session the panellists surveyed all aspects of the problems at hand and provided an overview of the challenges that future scientists will need to solve in order to take social robots out of the labs and into the world.

Attendance & future perspectives

SoRAIM attracted 57 participants throughout the week. The attendees were diverse, as initially intended, with a breakdown of 50% PhD students, 20% young researchers (public sector), 10% engineers and young researchers (private sector), and 20% MSc students. Notably, the ratio of women attendees was close to 40%, which is double the usual figure in this field. Finally, in terms of geographic spread, the majority of attendees came from other European countries (17 countries in total), with just below 50% of attendees coming from France. Following the school, a satisfaction survey was sent to the attendees in order to better grasp which elements were the most appreciated, in view of the longer-term objective of holding this winter school as a recurring event. Given the diverse backgrounds of attendees, opinions on contents such as the hands-on sessions varied, but overall satisfaction was very high, which shows the interest of the next generation of researchers in more opportunities to learn in this field. We are currently reviewing options to hold similar events each year or every two years, depending on available funding.

More information about the SoRAIM winter school is available on the webpage: https://spring-h2020.eu

Sponsors

SoRAIM was sponsored by the H2020 SPRING project, Inria, the University Grenoble Alpes, the Multidisciplinary Institute of Artificial Intelligence and by ACM’s Special Interest Group on Multimedia (SIGMM). Through ACM SIGMM, we received significant funding which allowed us to invite 14 students and young researchers, members of SIGMM, from abroad.

Full list of contributions

All the talks are available in replay on our YouTube channel: https://www.youtube.com/watch?v=ckJv0eKOgzY&list=PLwdkYSztYsLfWXWai6mppYBwLVjK0VA6y
The complete list of talks and posters presented at SoRAIM Winter School 2024 can be found here: https://spring-h2020.eu/soraim/
The talks are listed below in chronological order: