Music Meets Science at ACM Multimedia 2025

Multimedia research is often framed through algorithms, datasets, and systems, but at its heart lies content that is deeply human. Few forms of content illustrate this better than classical music. Long before music becomes data to be recorded, generated, searched, or retrieved, it is imagined by composers and brought to life by performers. At ACM Multimedia 2025 in Dublin, this human origin of multimedia took centre stage in a unique social event that bridged classical music and multimedia content analysis. This event was the fourth supported by ACM SIGMM in the framework of the Music Meets Science program, following editions at CBMI’2022, CBMI’2023, and CBMI’2024.

Music Meets Science explores musical spaces across centuries and styles, from the dynamic Folia of Vivaldi and Handel’s Passacaglia to works by Schubert and contemporary composers from around the world. The goal is to bring a wide range of music, performed by some of the original content creators, our classical musicians, to the multimedia research community, who explore and mine this content. It also brings fundamental cultural values to young multimedia researchers, opening their minds to classical and contemporary music, which oscillates with the rhythm of centuries.

The concert took place on 29 October, starting at 8:00 PM, during the Welcome Reception of ACM Multimedia 2025. It was attended by over 1,000 delegates of all ages, from doctoral students to senior researchers. The programme featured music by Irish composer Garth Knox and a new composition by Finnish composer Jarno Vanhanen, written especially for ACM Multimedia. The performance was delivered by internationally acclaimed French musicians of the new generation: François Pineau-Benois (violin) and Olivier Marin (viola) (see Figure 1). Together, they invited the audience to experience music not only as sound, but as rich multimedia content shaped by structure, expression, interpretation, and context.

Figure 1. François Pineau-Benois (violin) and Olivier Marin (viola) performing Jarno Vanhanen’s “Aurora Borealis” duet.

By embedding live performance within a major multimedia conference, Music Meets Science highlights the importance of integrating creative arts into the research ecosystem. As multimedia research continues to advance, from content understanding to generation, events like this remind us that artistic practice is not just an application domain, but a source of inspiration. Strengthening the dialogue between creative arts and multimedia research can deepen our understanding of content, context, and meaning, and enrich the future directions of the field.

Welcome message from the SIGMM Executives

Dear colleagues and friends,

We would like to begin by sincerely thanking the SIGMM community for the trust you have placed in us. We are honored to serve as Chairs alongside a talented and dedicated team. A special thanks goes to the previous Executive Committee for their outstanding work during challenging times and for laying down a solid foundation for the future.

We are at an exciting juncture. Multimedia is no longer just a field; it is the connective tissue of modern life. From intelligent communication and immersive experiences to AI-generated content and digital twins, multimedia systems are shaping how we learn, work, and connect. Our community is uniquely positioned to lead in this space.

Over the next two years, we want to focus on presence — not only in terms of emerging technologies, but also in how SIGMM can be present for researchers around the world. Together, we will:

  • Champion young researchers and amplify their voices.
  • Promote open science by supporting the sharing of code, data, and reproducible research.
  • Increase industry engagement by creating meaningful bridges between academia and application.
  • Strengthen our global presence through active local chapters and outreach.
  • Ensure SIGMM remains a space that is inclusive, diverse, and equitable, a community where everyone feels welcome and empowered.
  • Position SIGMM conferences and journals as the leading venues for applied AI in multimodal systems, showcasing how sensing, understanding, generation, and interaction converge to solve real-world challenges.

Let us continue to work together and to make an impact, inspired by the richness of multimedia research and united by a shared commitment to excellence and openness.

Abdulmotaleb, Elisa and Silvia


Abdulmotaleb El Saddik is an award-winning technologist and Distinguished Professor whose leadership in Embodied AI, Digital Twins, and Mixed Reality bridges innovation, mentorship, and human impact.

Elisa Ricci is a Professor at the University of Trento and a Senior Researcher at Fondazione Bruno Kessler in Italy. Her research interests include computer vision and multimedia analysis.

Silvia Rossi is a senior scientist at Centrum Wiskunde & Informatica (CWI) in The Netherlands. Her research interests are at the intersection of multimedia systems, artificial intelligence, and user behaviour modelling for immersive and interactive systems.

The MediaEval Benchmark Looks Back at a Successful Fifteenth Edition, and Forward to its Sweet Sixteen

Introduction

The Benchmarking Initiative for Multimedia Evaluation (MediaEval) organizes interesting and engaging tasks related to multimedia data. MediaEval is proud to be supported by SIGMM. Tasks involve analyzing and exploring multimedia collections, as well as accessing the information that they contain. MediaEval emphasizes challenges that have a human or social aspect in order to support our goal of making multimedia a positive force in society. Participants in MediaEval are encouraged to submit solutions that are not only effective but also creative. We carry out quantitative evaluation of the submissions, but also go beyond the scores in order to obtain insight into the tasks, data, and metrics.

Participation in MediaEval is open to any team that wishes to sign up. Registration has just opened and information is available on the MediaEval 2026 website (https://multimediaeval.github.io/editions/2026). The workshop will take place in Amsterdam, Netherlands, and online, coordinated with ACM ICMR (https://icmr2026.org).

In this column, we present a short report on MediaEval 2025, which culminated in the annual workshop, held in Dublin, Ireland, between CBMI (https://www.cbmi2025.org) and ACM Multimedia (https://acmmm2025.org). Then, we provide an outlook to MediaEval 2026, which will be the sixteenth edition of MediaEval.

A Keynote on Metascience

The workshop kicked off with a keynote on metascience for machine learning. The metascience initiative (https://metascienceforml.github.io) strives to promote discussion and development of the scientific underpinnings of machine learning. It looks at the way in which machine learning is done and examines the full range of relevant aspects, from methodologies to mindsets. The keynote was delivered by Jan van Gemert, head of the Computer Vision Lab (https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/intelligent-systems/pattern-recognition-bioinformatics/computer-vision-lab) at Delft University of Technology. He discussed the “growing pains” of the field of deep learning and the importance of the scientific method for keeping the field on course. He invited the audience to consider the power of benchmarks for hypothesis-driven science in machine learning and deep learning.

Tasks at MediaEval 2025

The MediaEval 2025 tasks reflect the benchmark’s continued emphasis on human-centered and socially relevant multimedia challenges, spanning healthcare, media, memory, and responsible use of generative AI.

Several tasks this year focused on the human aspects of multimodal analysis, combining visual, textual, and physiological signals. The Medico Task challenges participants to build visual question answering models for the interpretation of gastrointestinal images, aiming to support clinical decision-making through interpretable multimodal explanations. The Memorability Task focuses on modeling long-term memory for short movie excerpts and commercial videos, requiring participants to predict how memorable a video is, whether viewers are familiar with it, and, in some cases, to leverage EEG signals alongside visual features. Multimodal understanding is further explored in the MultiSumm Task, where participants are provided with collections of multimodal web content describing food sharing initiatives in different cities and are asked to generate summaries that satisfy specific informational criteria, with evaluation exploring both traditional and emerging LLM-based assessment approaches.

The remaining two tasks emphasize the societal impact of multimedia technology in real-world settings. In the NewsImages Task, participants worked with large collections of international news articles and images, either retrieving suitable thumbnail images or generating thumbnails for articles. The Synthetic Images Task addressed the growing prevalence of AI-generated content online, asking participants to detect synthetic or manipulated images and localize manipulated areas. The task used data created by state-of-the-art generative models as well as images collected from real-world online settings. We gratefully acknowledge the support of AI-CODE (https://aicode-project.eu), a European project focused on topics related to these two tasks.

MediaEval in Motion

MediaEval is especially proud of participants who return over the years, improving their approaches and contributing insights. We would like to highlight two previous participants who became so interested and involved in MediaEval tasks that they decided to join the task organization teams. Iván Martín-Fernández, PhD student at Universidad Politécnica de Madrid, became a task organizer for the Memorability task, and Lucien Heitz, PhD student at the University of Zurich, became a task organizer for NewsImages.

One aspect of the MediaEval Benchmark I value most is its effort to go beyond metric-chasing and embark on a “quest for insights,” as the organizers put it, to help us better understand the tasks and encourage creative, innovative solutions. This spirit motivated me to participate in the 2023 Memorability Task in Amsterdam. The experience was so enriching that I wanted to become more involved in the community. In 2025, I was invited to join the Memorability Task organizing team, which gave me the chance to contribute to and help foster this innovative research effort. Thanks to SIGMM’s sponsorship, I was able to attend the event in Dublin, which further enhanced the experience. Working alongside Martha and Gabi as a student volunteer is always a pleasure. As my PhD studies come to an end, I’m proud to say that MediaEval has been a core part of my research, and I’m sure it will remain so in the immediate future. See you in Amsterdam in June!

Iván Martín-Fernández, PhD Student, GTHAU – Universidad Politécnica de Madrid

I ‘graduated’ from being a participant in the previous NewsImages challenge to now taking over the organization duties of the 2025 iteration of the task. It was an incredible journey and learning experience. Big thank you to the main MediaEval organizers for their tireless support and input for shaping this new task that combines image retrieval and generation. The recent benchmark event presented an amazing platform to share and discuss our research. We got so many great submissions from teams around the globe. I was truly overwhelmed by the feedback. Getting involved with the organization of a challenge task is something I can highly recommend to all participants. It allows you to take on an active role and bring new ideas to the table on what problems to tackle next.

Lucien Heitz, PhD Student, University of Zurich

MediaEval continues its tradition of awarding a “MediaEval Distinctive Mention” to teams that dive deeply into the data, the algorithms, and the evaluation procedure. Going above and beyond in this way makes important contributions to our understanding of the task and how to make meaningful progress. Moving the state of the art forward requires improving the scores on a pre-defined benchmark task such as the tasks offered by MediaEval. However, MediaEval Distinctive Mentions underline the importance of research that does not necessarily improve scores on a given task, but rather makes an overall contribution to knowledge.

We were happy to serve as student volunteers at MediaEval 2025. In addition, we participated as a team in the NewsImages task, contributing to two subtasks, and were honored to receive a Distinctive Mention.

Xiaomeng had previously participated in the same task at MediaEval 2023. Compared to the 2023 edition, she observed notable evolution in both the data and task design. These changes reflect the organizers’ careful consideration of recent advances in modeling techniques as well as the practical applicability of the datasets, which proved to be highly inspiring. 

Bram participated in MediaEval for the first time and found the discussions with colleagues about the challenges particularly rewarding. The NewsImages retrieval subtask also taught him how to deal with larger datasets.

We tried to incorporate deeper reflections on our results into our presentation. Specifically, we showed how certain types of articles are particularly suited for image generation and identified the news categories where retrieval was most effective. 

Xiaomeng Wang and Bram Bakker, PhD Students, Data Science – Radboud University

The people whose work is highlighted in this section are grateful to have received support from SIGMM in order to be able to attend the MediaEval workshop in person. 

Outlook to MediaEval 2026

The 2025 workshop concluded with participants collaborating with the task organizers to start to develop “benchmark biographies”, which are living documents that describe benchmarking tasks. Combining elements from data sheets and model cards, benchmark biographies document motivation, history, datasets, evaluation protocols, and baseline results to support transparency, reproducibility, and reuse by the broader research community. We plan to continue work on these benchmark biographies as we move toward MediaEval 2026. 

Further, in the 2026 edition, we will again offer the tasks that were held in 2025 to provide an opportunity for teams who were not able to participate in 2025. We especially encourage “Quest for Insight” papers that examine characteristics of the data and the task definitions, the strengths and weaknesses of particular types of approaches, observations about the evaluation procedure, and the implications of the task.

We look forward to seeing you in Amsterdam for MediaEval and also ACM ICMR. Don’t forget to check out the MediaEval website (https://multimediaeval.github.io) and register your team if you are interested in participating in 2026.

ACM SIGMM Multimodal Reasoning Workshop

The ACM SIGMM Multimodal Reasoning Workshop was held on 8–9 November 2025 at the Indian Institute of Technology Patna (IIT Patna) in hybrid mode. Organised by Dr. Sriparna Saha, faculty member of the Department of Computer Science and Engineering, IIT Patna, and supported by ACM SIGMM, the two-day event brought together researchers, students, and practitioners to discuss the foundations, methods, and applications of multimodal reasoning and generative intelligence. The workshop registered 108 participants and featured invited talks, tutorials, and hands-on sessions by national and international experts. Sessions covered topics ranging from trustworthy AI, LLM fine-tuning, and temporal and multimodal reasoning to knowledge-grounded visual question answering and healthcare applications.

Inauguration:

 During the inauguration, Dr. Sriparna Saha welcomed participants and acknowledged the presence and support of Prof. Jimson Mathew (Dean of Student Affairs, IIT Patna), Prof. Rajiv Ratn Shah (IIIT Delhi) and all speakers. The organising committee expressed gratitude to ACM SIGMM for its financial support, which made the workshop possible. Felicitations were exchanged, and the inauguration concluded with words of encouragement for active participation and interdisciplinary collaboration.

Session summaries and highlights:

Day 1:

 The first day began with an inaugural session followed by a series of engaging talks and tutorials. Prof. Rajiv Ratn Shah (Associate Professor, IIIT Delhi, India) delivered the opening talk on “Tackling Multimodal Challenges with AI: From User Behavior to Content Generation,” highlighting the role of multimodal data in understanding user behaviour, enabling AI-driven content generation, and building region-specific applications such as voice conversion and video dubbing for Indic languages. Prof. Sriparna Saha (Associate Professor, Department of Computer Science and Engineering, IIT Patna, India) then presented “Harnessing Generative Intelligence for Healthcare: Models, Methods, and Evaluations,” discussing safe, domain-specific AI systems for healthcare, multimodal summarisation for low-resource languages, and evaluation frameworks like M3Retrieve and multilingual trust benchmarks. The afternoon sessions featured hands-on tutorials and technical talks. Ms. Swagata Mukherjee (Research Scholar, IIT Patna, India) conducted a tutorial on “Advanced Prompting Techniques for Large Language Models,” covering zero/few-shot prompting, chain-of-thought reasoning, and iterative refinement strategies. Mr. Rohan Kirti (Research Scholar, IIT Patna, India) led a tutorial on “Exploring Multimodal Reasoning: Text & Image Embeddings, Augmentation, and VQA,” demonstrating text–image fusion using models such as CLIP, ViLT, and PaliGemma. Prof. José G. Moreno (Associate Professor, IRIT, France) presented “Visual Question Answering about Named Entities with Knowledge-Based Explanation (ViQuAE),” introducing a benchmark for explainable, knowledge-grounded VQA. The day concluded with Prof. Chirag Agarwal (Assistant Professor, University of Virginia, USA) delivering a talk on “Trustworthy AI in the Era of Frontier Models,” which emphasised fairness, safety, and alignment in multimodal and LLM systems.

Day 2:

 The second day continued with high-level technical sessions and tutorials. Prof. Ranjeet Ranjan Jha (Assistant Professor, Department of Mathematics, IIT Patna, India) opened with a talk on “Bridging Deep Learning and Multimodal Reasoning: Generative AI in Real-World Contexts,” tracing the evolution of deep learning into multimodal generative models and discussing ethical and computational challenges in deployment. Prof. Adam Jatowt (Professor, Department of Computer Science, University of Innsbruck, Austria) followed with a presentation on “Analyzing and Improving Temporal Reasoning Capabilities of Large Language Models,” showcasing models and benchmarks such as BiTimeBERT, TempRetriever, and ComplexTempQA, while proposing methods to enhance time-sensitive reasoning. In the final technical session, Mr. Syed Ibrahim Ahmad (Research Scholar, IIT Patna, India) conducted a tutorial on “LLM Fine-Tuning,” which covered PEFT approaches, QLoRA, quantization, and optimization techniques to fine-tune large models efficiently.

Valedictory session:

 The valedictory session marked the formal close of the workshop. Dr. Sriparna Saha thanked speakers, participants and the organising team for active engagement across technical talks and tutorials. Participants shared positive feedback on the depth and practicality of sessions. Certificates were distributed to attendees. Final remarks encouraged continued research, collaboration and dissemination of resources. Dr. Saha reiterated gratitude to ACM SIGMM for financial support.

Outcomes, observations and suggested actions:

  • Multimodal reasoning remains an interdisciplinary challenge that benefits from close collaboration between multimedia, NLP, and application domain experts.
  • Trustworthiness, safety, and evaluation (benchmarks and metrics) are critical for moving multimodal models from demonstration to practice, especially in healthcare and other high-stakes domains.
  • Practical methods for model adaptation (PEFT, quantization) make large models accessible for research groups with limited compute.
  • Datasets and retrieval resources that combine multimodal inputs with external knowledge (as in ViQuAE) are valuable for advancing explainable VQA and grounded reasoning.
  • The community should prioritise regional and language-diverse resources (Indic languages, code-mixed data) to ensure equitable benefits from multimodal AI.
  • SIGMM and ACM venues can play a role in fostering collaborations via special projects, regional hackathons, grand challenges, and multimodal benchmark initiatives.

Outreach & social media:

 The workshop generated significant visibility on LinkedIn and other professional networks. Photos and session highlights were widely shared by participants and organisers, acknowledging ACM SIGMM support and the quality of the technical programme.

Acknowledgements: The organising committee thanks all speakers, attendees, student volunteers, and ACM SIGMM for the financial and logistical support that enabled the workshop.

Reports from ACM Multimedia 2025

URL: https://acmmm2025.org

Date: Oct 27 – Oct 31, 2025
Place: Dublin, Ireland
General Chairs: Cathal Gurrin (Adapt Centre & DCU), Klaus Schoeffmann (Klagenfurt University), Min Zhang (Tsinghua University)

Introduction

The ACM Multimedia Conference 2025, held in Dublin, Ireland from October 27 to October 31, 2025, continued its tradition as a premier international forum for researchers, practitioners, and industry experts in the field of multimedia. This year’s conference marked an exciting return to Europe, bringing the community together in a city renowned for its rich cultural heritage, innovation-driven ecosystem, and welcoming atmosphere. ACM MM 2025 provided a dynamic platform for presenting state-of-the-art research, discussing emerging trends, and fostering collaboration across diverse areas of multimedia computing.

Hosted in Dublin—a vibrant hub for both technology and academia—the conference delivered a seamless and engaging experience for all attendees. As part of its ongoing mission to support and encourage the next generation of multimedia researchers, SIGMM awarded Student Travel Grants to assist students facing financial constraints. Each recipient received up to 1,000 USD to help offset travel and accommodation expenses. Applicants completed an online form, and the selection committee evaluated candidates based on academic excellence, research potential, and demonstrated financial need.

To shed light on the experiences of these outstanding young scholars, we interviewed several travel grant recipients about their participation in ACM MM 2025 and the conference’s influence on their academic and professional development. Their reflections are shared below.

Wang Zihao – Zhejiang University

This was not my first time attending ACM Multimedia—I also participated in ACMMM 2022—but coming back in 2025 has been just as fantastic. There were many memorable moments, but two of them stood out the most for me. The first was the beautiful violin performance during the Volunteer Dinner, which created such a warm and elegant atmosphere. The second was the Irish drumming performance at the conference banquet on October 30th. It was incredibly energetic and truly unforgettable. These moments reminded me how special it is to be part of this community, where academic exchange and cultural experiences blend together so naturally.

I am truly grateful for the SIGMM Student Travel Grant. The financial support made it possible for me to attend the conference in person, and I really appreciate the effort that SIGMM puts into supporting students. One of the most valuable aspects of this trip was meeting researchers from all over the world who work in areas similar to mine—especially those focusing on music, audio, and multimodality. Having deep, face-to-face conversations with them was inspiring and has given me many new ideas to explore in my future research.

As for suggestions, I honestly think the SIGMM Student Travel Grant program is already doing an amazing job in supporting young scholars like us. My only small hope is for a smooth reimbursement process.

Overall, I feel incredibly fortunate to be here again, reconnecting with the ACM MM community and learning so much from everyone. I’m thankful for this opportunity and excited to continue growing in this field.

Huang Feng-Kai – National Taiwan University

Attending ACM Multimedia 2025 in Dublin was my first time joining the conference, and it has been an unforgettable experience. Everything was so well-organized, and I truly enjoyed every moment. The welcome reception was especially memorable—the food was delicious, the atmosphere was lively, and it was inspiring to see so many renowned researchers and professors chatting enthusiastically. It really felt like the perfect start to my ACM MM journey.

I am deeply grateful to SIGMM for the Student Travel Grant. As a student traveling all the way from Taiwan, attending a conference in Europe is a major financial challenge. The grant covered my accommodation, meals, and flights, which made it possible for me to participate without worrying too much about the cost. Being here has really broadened my horizons. I was able to learn about so many fascinating research topics and meet many kind, talented researchers who generously shared their thoughts with me. These conversations gave me a lot of inspiration for my own work.

I also had the chance to serve as a volunteer, which became my first experience working with an international team. Collaborating with people from different cultural and academic backgrounds helped me improve my communication skills and made the conference even more meaningful.

I truly believe the SIGMM Student Travel Grant is an amazing program that enables students from all over the world to join this vibrant community, exchange ideas, and form new collaborations. My only wish is that SIGMM will continue offering this opportunity in the future. This grant brings so much energy to young researchers like me and plays an important role in supporting the next generation of the multimedia community. I am sincerely thankful for everything this experience has given me, and I look forward to returning to ACM Multimedia in the coming years.

Wang Hao – Peking University

Attending ACM Multimedia 2025 was my very first time participating in the conference, and the experience was truly amazing. The moment that impressed me the most was having the chance to present my own paper. As a non-native English speaker, giving an academic talk on an international stage was both challenging and rewarding. I felt nervous at first, but I’m really proud of how I managed to deliver my presentation. It was a big milestone for me.

I’m incredibly grateful to SIGMM for the Student Travel Grant, which made it possible for me to attend an international conference for the first time. Without this support, I wouldn’t have been able to experience such a meaningful academic event. Throughout the conference, I met so many new friends, attended inspiring talks, and gained fresh perspectives on multimedia research. These experiences have broadened my view of the field and will definitely influence the direction of my future work.

I’m thankful for this opportunity and truly appreciate how welcoming and encouraging the ACM MM community is. This conference has given me motivation and confidence to continue growing as a researcher.

Yu Liu – University of Electronic Science and Technology of China

This is my first time attending ACM Multimedia. I am a PhD student at the University of Electronic Science and Technology of China (UESTC), currently spending a year at the University of Auckland, New Zealand, as part of a joint PhD program. It took 25 hours to travel from Auckland to Dublin, but the journey was completely worth it. The conference has been vibrant and intellectually engaging. I had the honor of being the first speaker in my session, and it was incredibly fulfilling to see the audience show genuine interest and appreciation for our work. Outside the sessions, I thoroughly enjoyed immersing myself in Irish culture—tasting the smooth, rich Guinness, watching lively tap dancing, and listening to traditional Irish music. Overall, it has been an inspiring and truly memorable experience.

The SIGMM Student Travel Grant played a vital role in making my attendance possible. In recent years, UESTC has discontinued funding for PhD students’ conference travel, transferring the financial responsibility entirely to individual research groups. Receiving this grant was crucial, allowing me to attend ACM MM 2025 without placing additional strain on my research team’s limited budget. Attending the conference in person provided an invaluable opportunity to present my research, exchange ideas face-to-face with international scholars, and receive constructive feedback from leading experts. These experiences fostered meaningful academic connections and opened doors for potential long-term collaborations that online participation simply cannot replace.

My biggest takeaway from ACM MM 2025 is the inspiration I gained from being part of such a diverse and passionate research community, which has motivated me to continue advancing in the field of responsible AI. I also really enjoyed the volunteer “Thank You” dinner—it was a wonderful experience. At the same time, I noticed that it is not always easy for students to approach professors they do not know personally. In the future, including short icebreaker or networking activities could help start conversations more naturally, making the conference experience even more valuable for students like me.

Li Deng – LUT University

This is my first time attending ACM Multimedia, and my experience has been exceptionally positive. I was particularly impressed by the workshops relevant to my research area, as the discussions provided valuable insights that are already influencing my ongoing work. I was also struck by the abundance and quality of the social events and networking opportunities, which made it easy to connect with senior researchers and fellow students from diverse backgrounds.

Receiving the SIGMM Student Travel Grant significantly reduced the financial burden of travel and accommodation, allowing me to attend the conference in person without major financial stress. The opportunity to present my work and engage in discussions with leading researchers has greatly supported my academic development. I received direct feedback and established connections that may lead to future collaborations. My biggest takeaway from ACM MM 2025 is a deeper understanding of the rapid development and impact of multimodal large language models.

Looking ahead, I suggest that the SIGMM Student Travel Grant program collaborate with the main conference to organize sessions such as a “Career Forum” for grant recipients and other student volunteers, providing additional guidance and support for early-career researchers.

Summer School on Multimodal Foundation Models and Generative AI: Second Edition

Organizer: Prof. Mohamed Daoudi, Institut Mines-Télécom Nord Europe (IMT Nord Europe), France.

Co-Organizers: Prof. Ahmed Tamtaoui, Institut National des Postes et Télécommunications (INPT), Morocco; Prof. Mohamed Khalil (MoroccoAI), Morocco; Jamal Benhamou (Soft Center), Morocco.

The second edition of the Summer School on Multimodal Foundation Models and Generative AI was held in September 2025. With the support of SIGMM, it attracted more than 60 students and young researchers to learn about, discuss, and experiment first-hand with topics related to Generative AI. The event’s success calls for further editions in upcoming years.

The second edition of the Summer School dedicated to Generative AI and Multimodal Foundation Models was held from September 8 to 12, 2025, in Rabat, Morocco. Over five days, 60 students, researchers, and professionals (selected from more than 1,300 applications) took part in an intensive program combining theoretical courses, hands-on workshops, keynote lectures, evening mentorship sessions, and a hackathon. Following the first edition in 2024, led by INPT, IMT Nord Europe, and the Soft Center, this new edition was organized in partnership with MoroccoAI, an initiative led by AI experts in Morocco and abroad to promote the growth of AI across the country. This Summer School welcomed students and early-career researchers from Morocco, Germany, France, Italy, Tunisia, and other African countries, further strengthening its international reach. More than 50% of the participants were women.

We chose to offer a summer school with low registration fees (thanks to the additional support of SIGMM), so that as many students and young researchers from diverse backgrounds as possible could attend.

Invited speakers

The AI Summer School presents a distinguished lineup of speakers who bridge the gap between academic research and industry innovation. Our carefully selected experts combine theoretical expertise with practical insights, offering participants a comprehensive understanding of AI’s current landscape and future directions.

  • Pioneering the Future of AI in E-Commerce: Foundation Models and Generative AI at Amazon, Dr. Amin Mantrach, Applied Science Manager, Amazon, Luxembourg
  • Geometric Deep Learning for Non-Rigid Shapes: From Theory to Practice, Dr. Emery Pierson, Researcher LIX, Ecole Polytechnique, France
  • Towards Detailed Understanding of the Visual World in Generative AI Era, Dr. Fahad Shahbaz Khan, MBZUAI, Abu Dhabi, United Arab Emirates
  • Design Thinking for Human-Centred AI Development, Dr. Houda Chakiri, Al Akhawayn University, Morocco
  • Leveraging AI for Sustainable Marine Ecosystems, Dr. Jihad Zahir, Cadi Ayyad University, Morocco
  • 4D Human Generation: past, present and the future, Prof. Mohamed Daoudi, IMT Nord Europe, France
  • Training LLMs: Optimize and Scale Your Training, Nouamane Tazi, ML Research Engineer, Hugging Face, France
  • Generating Synthetic Face and Body Models, Prof. Stefano Berretti, Department of Information Engineering of University of Firenze, Italy
  • From Generative to Agentic: The Next Era of Computing, Dr. Kaoutar El Maghraoui, Principal Research Scientist and Manager, IBM T.J. Watson Research Center, USA
  • From documents to structure: Hands-on exploration of agentic document extraction, Prof. Omar Souissi, INPT, Morocco
  • Efficient Speech Generative Modeling with Little Tokenization, Dr. Tatiana Likhomanenko, Staff Research Scientist, Apple, USA
  • From VAE to Diffusion: probabilistic learning with audio-visual data, Dr. Xavier Alameda-Pineda, Research Director, INRIA, France

Program

The program explored the theoretical and practical aspects of Multimodal Foundation Models and Generative AI, including large-scale pre-trained models, multimodality (text, image, audio, etc.), and their applications across sectors. Evening sessions were dedicated to intensive mentorship, where participants worked on real-world projects under expert guidance.

On Wednesday, participants visited the Technopark in Casablanca as part of the Morocco Accelerator Program, where they had the opportunity to engage with innovative AI startups and learn about their projects. More information about the summer school on Multimodal Foundation Models and Generative AI is available on the webpage https://ai-summer-school.inpt.ac.ma/.

The final day showcased the teams’ talent with 15 project presentations from the hackathon, followed by the closing ceremony and award announcements:

  • First Prize – MyIris: A real-time multimodal navigation system for visually impaired individuals, integrating voice control and video analysis for safe guidance.
  • Second Prize – LALLACare: An innovative platform offering a low-cost alternative to mammography for early breast cancer screening, combining thermal imaging with Google’s MedGemma model for fast, accurate, and explainable assessments.
  • Special Jury Prize – Moun9idoun (AiDex): A multimodal AI solution dedicated to emergency management and community safety.

Acknowledgments: The organizers extend their sincere thanks to the INPT staff (reception, cafeteria, and accommodation), in particular Madame Leila Karakchou, for their invaluable support in facilitating the successful organization of this Summer School, to all the dedicated volunteers from the MoroccoAI association, and to Fatih Hamza from the Soft Center for his work in developing the event website.

Diversity and Inclusion at ACM MMSys 2025

The 16th ACM Multimedia Systems Conference and its associated workshops (MMVE 2025 and NOSSDAV’25) were held from March 31st to April 4th, 2025, in Stellenbosch, South Africa. Several activities were carried out with the aim of creating a diverse and inclusive multimedia systems community. In this column, we provide a brief overview of the Diversity and Inclusion activities undertaken before and during ACM MMSys’25.

Activities Before the Conference

Grants

Thanks to the generous support of the ACM Special Interest Group on Multimedia (SIGMM), we were able to offer the following grants:

  • Student Travel Grant: ACM SIGMM offered travel grants for students in order to promote participation and diversity of students in the conference. ACM SIGMM has centralised support for standard student travel for in-person participation, and any student member of SIGMM, and those who were the first author of an accepted paper, were eligible and encouraged to apply. Female and minority students’ applications were also encouraged.
  • Young African Researcher Travel Awards: These travel grants were aimed specifically at supporting young African researchers in attending the ACM MMSys’25 Conference and its co-located workshops. The awards sought to foster diversity, promote knowledge exchange, and strengthen the multimedia systems research community across Africa. One of the eligibility criteria was to be affiliated with an African institution or to be an enrolled PhD student at an African higher learning institution.

Diversity in Papers

Prior to the conference, a brief analysis was carried out to understand how diverse and inclusive the submitted papers were. During the review process, reviewers indicated whether a paper tackled any aspect of diversity and inclusion by considering the following diversity criteria:

  1. Scope
  2. Approach
  3. Evaluation procedure
  4. Results
  5. Other
  6. This paper does not address any topics of diversity

It was found that the majority of papers did not address any topic of diversity, as shown in the diagram below. With these results in mind, we decided to organise a panel on how to increase diversity and inclusion in future submissions to the conference.

Activities at the Conference: The Diversity Panel

Prompted by the results of the study on diversity in MMSys’25 papers, the conference featured a panel discussion aimed at understanding how diverse and inclusive the topics, methodologies, and evaluations in the submitted papers are. In particular, the topics of discussion were (i) Implementing Diversity and Inclusion in research; (ii) Challenges in implementing Diversity and Inclusion; (iii) Inclusive and Diverse Practices; and (iv) Monitoring implementation progress.

In this context, the Diversity and Inclusion panel discussion set out to explore how researchers and academics accommodate or work together with their relevant stakeholders or communities during their research activities and when disseminating results, for example at conferences.

To enable the discussion, we invited four panellists with different expertise from both academia and industry:


Professor Vali Lalioti
University of the Arts London (United Kingdom)
Vali Lalioti is a pioneering designer, computer scientist, and innovator. She is Professor of Creative XR and Robotics and Director of Programmes at the Creative Computing Institute (CCI), University of the Arts London (UAL). She played a key role in developing the world’s first Virtual Reality (VR) systems in Germany. Her research focuses on human-robot interaction, robotic movement design, and XR for societal impact, spanning well-being, healthy aging, performance art, and the future of work. She pioneered the BBC’s first Augmented Reality production (2003). As Founder-Director at CCI, she founded the Creative XR and Robotics Research Hub, which led the Institute’s expansion.

Associate Professor Ketan Mayer-Patel
University of North Carolina at Chapel Hill
Ketan Mayer-Patel is an associate professor in the Department of Computer Science at the University of North Carolina. His research generally focuses on multimedia systems, networking, and multicast applications. Currently, he is investigating model-based video coding, dynamic media coding models, and networking problems associated with multiple independent, but semantically related, media streams.

Dr. Marta Orduna
Nokia XR Lab, Madrid, Spain
Marta Orduna is a Telecommunication Engineer. She received a Bachelor of Engineering in Telecommunication Technologies and Services in 2016 and a Master in Telecommunication Engineering in 2018, both from Universidad Politécnica de Madrid (UPM). In 2023, she received her PhD from UPM, Cum Laude, with a thesis entitled “Understanding and Assessing Quality of Experience in Immersive Communications”. In the same year, she joined the Nokia Extended Reality Lab team in Spain, where she continues her PhD line of research on quality of experience in extended reality.

Professor Gregor Schiele
University of Duisburg-Essen, Germany
Gregor Schiele leads the research lab on Intelligent Embedded Systems at the University of Duisburg-Essen in Germany. His goal is to make deep learning algorithms so efficient that they can be executed on any computing device, including tiny embedded sensors and wearable XR devices. He is a big fan of the MMSys community and its constructive discussion culture.

Below, we provide a summary of the main findings on the four presented topics:

(1) Implementing Diversity and Inclusion in research

The panel discussion revealed that all panellists have worked or collaborated successfully with stakeholders outside their workplaces. Diversity and inclusion were mainly implemented through data collection for research work, co-creation, stakeholder workshops or seminars, and research methodologies such as working with communities through participatory action. The discussion highlighted the panellists’ experience with diversity measures and helped raise awareness in the audience of the measures they could apply to their own work.

(2) Challenges in implementing Diversity and Inclusion

The following were mentioned as challenges in implementing diversity and inclusion in research and research dissemination activities:

  1. Financial and time constraints,
  2. Different organizational culture,
  3. Difficulty finding a common time for collaboration due to different priorities,
  4. Differences in language, organizational priorities and objectives.

(3) Inclusive and Diverse Practices

The panel discussed how to build a diverse and inclusive conference in terms of topics and methodology, covering a variety of approaches before, during, and after the conference. The following are some of the proposed practices:

  1. In a conference, invite at least three best papers and three best demos from other related conferences to present their work and showcase their demos, respectively.
  2. Co-locate at least two conferences or workshops with related or complementary themes.
  3. Identify relevant related conferences with which to run a joint workshop; this builds relationships that can lead to conference co-location, and hence to greater diversity and inclusion.
  4. Invite employers of university graduates and equipment vendors or manufacturers to participate and exhibit their products at conferences.
  5. Provide avenues at conferences for stakeholders to interact with academia, such as roundtable discussions, debates between academia and industry, and keynote presentations from industry or other stakeholders.
  6. Run flagship workshops or conferences with alternating roles: for example, one year the conference is led by academia, with industry and stakeholders invited in minor roles; the next year it is led by industry and stakeholders, with academia invited in minor roles.
  7. Run a conference with tracks on diverse and inclusive themes.
  8. To accommodate policy makers at conferences, the suggestions were as follows:
    1. Invite high-profile government officials, such as ministers or presidents, to open or close a conference, where they spend a few hours listening to a policy brief aligned with the conference theme (at the opening) or to the major conference resolutions (at the closing).
    2. Seek an audience with the officials to briefly discuss conference resolutions or issues raised during the conference that are relevant to their offices.

(4) Monitoring implementation progress

Panellists were asked to discuss how to track and measure progress in implementing diversity and inclusion in future ACM MMSys conferences. This point appeared to be difficult for the panellists, or not well understood, and it received few and short responses. Most of the responses were recommendations to:

  1. First set performance criteria which will be used as benchmarks for tracking and measuring implementation progress on diversity and inclusion.
  2. Define stages of diversity and inclusion maturity, such as an early/infant stage, a medium/growing stage, and a premium/mature stage, to guide the monitoring process, performance parameters, and monitoring tools, both for the paper evaluation process and before, during, and after the conference.

Concluding Remarks

The Diversity and Inclusion activities carried out at ACM MMSys 2025 served as important steps in nurturing a diverse and inclusive multimedia systems community. The activities comprised travel grants supporting underrepresented and young African researchers, together with a panel discussion at the conference. Although the paper review analysis found that diversity topics remain underrepresented in paper submissions, this finding served as a catalyst for a rigorous panel discussion that led to concrete recommendations. Going forward, the multimedia systems community is encouraged to adopt a smart framework with progress stages and performance parameters to monitor and track the progress of diversity and inclusion in the ACM MMSys conference series.