ACM SIGMM Multimodal Reasoning Workshop

Author: Dr. Sriparna Saha (Organiser)
Affiliation: Department of Computer Science & Engineering, Indian Institute of Technology Patna (IIT Patna), India

The ACM SIGMM Multimodal Reasoning Workshop was held on 8–9 November 2025 at the Indian Institute of Technology Patna (IIT Patna) in hybrid mode. Organised by Dr. Sriparna Saha, a faculty member of the Department of Computer Science and Engineering, IIT Patna, and supported by ACM SIGMM, the two-day event brought together researchers, students, and practitioners to discuss the foundations, methods, and applications of multimodal reasoning and generative intelligence. The workshop registered 108 participants and featured invited talks, tutorials, and hands-on sessions by national and international experts. Sessions covered topics ranging from trustworthy AI, LLM fine-tuning, and temporal and multimodal reasoning to knowledge-grounded visual question answering and healthcare applications.

Inauguration:

During the inauguration, Dr. Sriparna Saha welcomed participants and acknowledged the presence and support of Prof. Jimson Mathew (Dean of Student Affairs, IIT Patna), Prof. Rajiv Ratn Shah (IIIT Delhi) and all speakers. The organising committee expressed gratitude to ACM SIGMM for its financial support, which made the workshop possible. Felicitations were exchanged, and the inauguration concluded with words of encouragement for active participation and interdisciplinary collaboration.

Session summaries and highlights:

Day 1:

The first day began with an inaugural session followed by a series of talks and tutorials. Prof. Rajiv Ratn Shah (Associate Professor, IIIT Delhi, India) delivered the opening talk on “Tackling Multimodal Challenges with AI: From User Behavior to Content Generation,” highlighting the role of multimodal data in understanding user behaviour, enabling AI-driven content generation, and building region-specific applications such as voice conversion and video dubbing for Indic languages. Prof. Sriparna Saha (Associate Professor, Department of Computer Science and Engineering, IIT Patna, India) then presented “Harnessing Generative Intelligence for Healthcare: Models, Methods, and Evaluations,” discussing safe, domain-specific AI systems for healthcare, multimodal summarisation for low-resource languages, and evaluation frameworks such as M3Retrieve and multilingual trust benchmarks.

The afternoon sessions featured hands-on tutorials and technical talks. Ms. Swagata Mukherjee (Research Scholar, IIT Patna, India) conducted a tutorial on “Advanced Prompting Techniques for Large Language Models,” covering zero-shot and few-shot prompting, chain-of-thought reasoning, and iterative refinement strategies. Mr. Rohan Kirti (Research Scholar, IIT Patna, India) led a tutorial on “Exploring Multimodal Reasoning: Text & Image Embeddings, Augmentation, and VQA,” demonstrating text–image fusion using models such as CLIP, ViLT, and PaliGemma. Prof. José G. Moreno (Associate Professor, IRIT, France) presented “Visual Question Answering about Named Entities with Knowledge-Based Explanation (ViQuAE),” introducing a benchmark for explainable, knowledge-grounded VQA. The day concluded with Prof. Chirag Agarwal (Assistant Professor, University of Virginia, USA) delivering a talk on “Trustworthy AI in the Era of Frontier Models,” which emphasised fairness, safety, and alignment in multimodal and LLM systems.

Day 2:

The second day continued with technical sessions and tutorials. Prof. Ranjeet Ranjan Jha (Assistant Professor, Department of Mathematics, IIT Patna, India) opened with a talk on “Bridging Deep Learning and Multimodal Reasoning: Generative AI in Real-World Contexts,” tracing the evolution of deep learning into multimodal generative models and discussing ethical and computational challenges in deployment. Prof. Adam Jatowt (Professor, Department of Computer Science, University of Innsbruck, Austria) followed with a presentation on “Analyzing and Improving Temporal Reasoning Capabilities of Large Language Models,” showcasing models and benchmarks such as BiTimeBERT, TempRetriever, and ComplexTempQA, and proposing methods to enhance time-sensitive reasoning. In the final technical session, Mr. Syed Ibrahim Ahmad (Research Scholar, IIT Patna, India) conducted a tutorial on “LLM Fine-Tuning,” which covered PEFT approaches, QLoRA, quantization, and optimization techniques for fine-tuning large models efficiently.

Valedictory session:

The valedictory session marked the formal close of the workshop. Dr. Sriparna Saha thanked the speakers, participants and organising team for their active engagement across the technical talks and tutorials. Participants shared positive feedback on the depth and practicality of the sessions. Certificates were distributed to attendees. The final remarks encouraged continued research, collaboration and dissemination of resources. Dr. Saha reiterated her gratitude to ACM SIGMM for its financial support.

Outcomes, observations and suggested actions:

Multimodal reasoning remains an interdisciplinary challenge that benefits from close collaboration between multimedia, NLP, and application domain experts.
Trustworthiness, safety, and evaluation (benchmarks and metrics) are critical for moving multimodal models from demonstration to practice, especially in healthcare and other high-stakes domains.
Practical methods for model adaptation (PEFT, quantization) make large models accessible for research groups with limited compute.
Datasets and retrieval resources that combine multimodal inputs with external knowledge (as in ViQuAE) are valuable for advancing explainable VQA and grounded reasoning.
The community should prioritise regional and language-diverse resources (Indic languages, code-mixed data) to ensure equitable benefits from multimodal AI.
SIGMM and ACM venues can play a role in fostering collaborations via special projects, regional hackathons, grand challenges, and multimodal benchmark initiatives.

Outreach & social media:

The workshop generated significant visibility on LinkedIn and other professional networks. Photos and session highlights were widely shared by participants and organisers, acknowledging ACM SIGMM support and the quality of the technical programme.

Acknowledgements: The organising committee thanks all speakers, attendees, and student volunteers, and thanks ACM SIGMM for the financial and logistical support that enabled the workshop.