MPEG Column: 153rd MPEG Meeting

The 153rd MPEG meeting took place online from January 19-23, 2026. The official MPEG press release can be found here. This report highlights key outcomes from the meeting, with a focus on research directions relevant to the ACM SIGMM community:

  • MPEG Roadmap
  • Exploration on MPEG Gaussian Splat Coding (GSC)
  • MPEG Immersive Video 2nd edition (new white paper)

MPEG Roadmap

MPEG released an updated roadmap showing continued convergence of immersive and “beyond video” media with deployment-ready systems work. Near-term priorities include 6DoF experiences (MPEG Immersive Video v2 and 6DoF audio), volumetric representations (dynamic meshes, solid point clouds, LiDAR, and emerging Gaussian splat coding), and “coding for machines,” which treats visual and audio signals as inputs to downstream analytics rather than only for human consumption.

Research aspects: The most promising research opportunities sit at the intersections: renderer and device-aware rate-distortion-complexity optimization for volumetric content; adaptive streaming and packaging evolution (e.g., MPEG-DASH / CMAF) for interactive 6DoF services under tight latency constraints; and cross-cutting themes such as media authenticity and provenance, green and energy metadata, and exploration threads on neural-network-based compression and compression of neural networks that foreshadow AI-native multimedia pipelines.

MPEG Gaussian Splat Coding (GSC)

Gaussian Splat Coding (GSC) is MPEG’s effort to standardize how 3D Gaussian Splatting (3DGS) content is encoded, decoded, and evaluated so that it can be exchanged and rendered consistently across platforms. In this representation, scenes are described by sparse “Gaussian splats” carrying geometry plus rich attributes: scale and rotation, opacity, and spherical-harmonics appearance for view-dependent rendering. The main motivation is interoperability for immersive media pipelines: enabling reproducible results, shared benchmarks, and comparable rate-distortion-complexity trade-offs for use cases spanning telepresence and immersive replay to mobile XR and digital twins, while retaining the visual strengths that made 3DGS attractive compared to heavier neural scene representations.
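To make the representation concrete, the sketch below shows the kind of per-splat payload a GSC codec has to carry and its raw footprint. The field layout, data types, and spherical-harmonics degree are illustrative assumptions, not the syntax under standardization.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianSplat:
    """Illustrative per-splat payload (not the MPEG GSC syntax)."""
    position: np.ndarray   # (3,) centre of the Gaussian in scene coordinates
    scale: np.ndarray      # (3,) per-axis extent (often stored in log space)
    rotation: np.ndarray   # (4,) unit quaternion orienting the Gaussian
    opacity: float         # scalar alpha used when compositing splats
    sh_coeffs: np.ndarray  # (K, 3) spherical-harmonics coefficients giving
                           # view-dependent RGB appearance


def raw_size_mib(num_splats: int, sh_degree: int = 3) -> float:
    """Rough uncompressed footprint at 32-bit floats, to motivate coding."""
    k = (sh_degree + 1) ** 2                   # SH coefficients per colour channel
    floats_per_splat = 3 + 3 + 4 + 1 + 3 * k   # position, scale, rotation, opacity, SH
    return num_splats * floats_per_splat * 4 / 2**20


# One million splats at degree-3 SH already exceed 200 MiB uncompressed.
print(round(raw_size_mib(1_000_000), 1), "MiB")
```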

The work remains in an exploration phase, coordinated across ISO/IEC JTC 1/SC 29 groups WG 4 (MPEG Video Coding) and WG 7 (MPEG Coding for 3D Graphics and Haptics) through Joint Exploration Experiments covering datasets and anchors, new coding tools, software (renderer and metrics), and Common Test Conditions (CTC). A notable systems thread is “lightweight GSC” for resource-constrained devices (single-frame, low-latency tracks using geometry-based and video-based pipelines with explicit time and memory targets), alongside an “early deployment” path via amendments to existing MPEG point-cloud codecs to more natively carry Gaussian-splat parameters. In parallel, MPEG is testing whether splat-specific tools can outperform straightforward mappings in quality, bitrate, and compute for real-time and streaming-centric scenarios.

Research aspects: Relevant SIGMM directions include splat-aware compression tools and rate-distortion-complexity optimization (including tracked vs. non-tracked temporal prediction); QoE evaluation for 6DoF navigation (metrics for view and temporal consistency and splat-specific artifacts); decoder and renderer co-design for real-time and mobile lightweight profiles (progressive and LOD-friendly layouts, GPU-friendly decode); and networked delivery problems such as adaptive streaming, ROI and view-dependent transmission, and loss resilience for splat parameters. Additional opportunities include interoperability work on reproducible benchmarking, conformance testing, and practical packaging and signaling for deployment.

MPEG Immersive Video 2nd edition (white paper)

The second edition of MPEG Immersive Video defines an interoperable bitstream and decoding process for efficient 6DoF immersive scene playback, supporting translational and rotational movement with motion parallax to reduce discomfort often associated with pure 3DoF viewing. The second edition primarily extends functionality (without changing the high-level bitstream structure), adding capabilities such as capture-device information, additional projection types, and support for Simple Multi-Plane Image (MPI), alongside tools that better support geometry and attribute handling and depth-related processing.

Architecturally, MIV ingests multiple (unordered) camera views with geometry (depth and occupancy) and attributes (e.g., texture), then reduces inter-view redundancy by extracting patches and packing them into 2D “atlases” that are compressed using conventional video codecs. MIV-specific metadata signals how to reconstruct views from the atlases. The standard is built as an extension of the common Visual Volumetric Video-based Coding (V3C) bitstream framework shared with V-PCC, with profiles that preserve backward compatibility while introducing a new profile for added second-edition functionality and a tailored profile for full-plane MPI delivery.
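As a rough illustration of the atlas mechanism, the following sketch maps decoded patches back into their source views from minimal patch metadata. The field names are hypothetical, and real MIV patches additionally carry orientation, occupancy, and geometry information that is omitted here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PatchParams:
    """Hypothetical patch metadata (not the actual MIV syntax elements)."""
    view_id: int                 # index of the source view the patch came from
    atlas_xy: tuple[int, int]    # top-left corner of the patch in the atlas
    view_xy: tuple[int, int]     # top-left corner of the patch in the source view
    size: tuple[int, int]        # (width, height) of the patch


def reconstruct_views(atlas: np.ndarray, patches: list[PatchParams],
                      num_views: int, view_hw: tuple[int, int]) -> np.ndarray:
    """Copy each decoded patch from the packed atlas back into its source view."""
    views = np.zeros((num_views, *view_hw, atlas.shape[-1]), dtype=atlas.dtype)
    for p in patches:
        ax, ay = p.atlas_xy
        vx, vy = p.view_xy
        w, h = p.size
        views[p.view_id, vy:vy + h, vx:vx + w] = atlas[ay:ay + h, ax:ax + w]
    return views


# Toy example: a 64x128 texture atlas carrying two patches for two 64x64 views.
atlas = np.random.randint(0, 255, (64, 128, 3), dtype=np.uint8)
patches = [PatchParams(0, (0, 0), (10, 20), (32, 32)),
           PatchParams(1, (64, 0), (0, 0), (64, 64))]
views = reconstruct_views(atlas, patches, num_views=2, view_hw=(64, 64))
```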

Research aspects: Key SIGMM topics include systems-efficient 6DoF delivery (better view and patch selection and atlas packing under latency and bandwidth constraints); rate-distortion-complexity-QoE optimization that accounts for decode and render cost (especially on HMD and mobile) and motion-parallax comfort; adaptive delivery strategies (representation ladders, viewport and pose-driven bit allocation, robust packetization and error resilience for atlas video plus metadata); renderer-aware metrics and subjective protocols for multi-view temporal consistency; and deployment-oriented work such as profile and level tuning, codec-group choices (HEVC / VVC), conformance testing, and exploiting second-edition features (capture device info, depth tools, Simple MPI) for more reliable reconstruction and improved user experience.

Concluding Remarks

The meeting outcomes highlight a clear shift toward immersive and AI-enabled media systems where compression, rendering, delivery, and evaluation must be co-designed. These developments offer timely opportunities for the ACM SIGMM community to contribute reproducible benchmarks, perceptual metrics, and end-to-end streaming and systems research that can directly influence emerging standards and deployments.

The 154th MPEG meeting will be held in Santa Eulària, Spain, from April 27 to May 1, 2026. Click here for more information about MPEG meetings and ongoing developments.

Quality of Multimedia Experience Meets Machine Intelligence

1. Why QoE meets Machine Intelligence Now

Multimedia systems are evolving towards AI-driven, adaptive services, leading to a natural convergence of QoE and machine intelligence. In this context, machine intelligence can empower QoE through learning-based, context-aware, and semantic-driven modelling and optimization. At the same time, QoE can guide machine intelligence by providing a human-centred objective for AI system design and evaluation; see also [11]. Looking beyond human perception, toward agent-centric and hybrid QoE, future multimedia systems increasingly require unified experience objectives that support human-AI co-experience. QoMEX’26 in Cardiff stands as a major milestone highlighting the convergence of Quality of Multimedia Experience with Machine Intelligence. This column reflects on this evolution and outlines the key challenges ahead.

Multimedia systems have shifted from “best-effort delivery” toward intelligent, adaptive services that operate under highly diverse network conditions, device capabilities, and user contexts. In this landscape, Quality of Experience (QoE) has become a central concept, focusing on user satisfaction rather than purely signal-level fidelity [1, 2, 3].

QoE has traditionally been human-centric, reflecting perceived quality, enjoyment, comfort, and acceptance of multimedia services [2]. Meanwhile, machine intelligence, from deep learning and reinforcement learning to multimodal foundation models, has rapidly become the dominant paradigm for perception, generation, and decision-making. The intersection of these trends is timely and inevitable: QoE provides the human-centred goal, while machine intelligence provides scalable tools to model and optimize experience in complex real-world environments. Figure 1 summarizes this bidirectional relationship between QoE and machine intelligence, from multimodal inputs to human-centric, agent-centric, and hybrid QoE objectives.

Figure 1. A conceptual framework where machine intelligence enables QoE prediction and QoE-aware optimization, while QoE evolves from a human-centric notion toward agent-centric and hybrid objectives in intelligent multimedia systems.

2. How machine intelligence can empower QoE

(1) Learning QoE models beyond handcrafted rules

Classic QoE models often rely on handcrafted features and simplified assumptions linking system parameters (bitrate, delay, resolution) to perceived quality. Machine learning offers a flexible alternative: it can learn complex nonlinear mappings from content, network conditions, and user interaction signals to QoE outcomes. Deep models further enable learning from high-dimensional inputs such as raw video frames, audio signals, and multimodal logs, supporting richer QoE prediction in streaming, immersive media, short-form video, gaming, and interactive communication. In this context, advances in perceptual quality assessment (e.g., full-reference and no-reference IQA/VQA) also provide useful foundations for QoE-related modelling [5, 8, 9].
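As a minimal illustration of such a learned mapping, the sketch below fits a regressor from a few session-level features to MOS-like scores. The features, synthetic data, and model choice are assumptions for illustration only, not a validated QoE model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Toy session-level features: [mean bitrate (Mbps), rebuffering ratio,
# startup delay (s), number of quality switches]; targets are MOS-like
# scores on a 1-5 scale. In practice, features come from instrumented
# players and targets from subjective studies [2, 3].
rng = np.random.default_rng(0)
X = rng.uniform([0.5, 0.0, 0.0, 0.0], [8.0, 0.2, 5.0, 20.0], size=(500, 4))
y = np.clip(1.0 + 0.5 * X[:, 0] - 10.0 * X[:, 1] - 0.2 * X[:, 2] - 0.05 * X[:, 3]
            + rng.normal(0.0, 0.3, 500), 1.0, 5.0)  # synthetic labels for the demo

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))
```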

(2) QoE-aware control and optimization

Machine intelligence is not only about prediction; it can also enable QoE-driven decision-making. Instead of optimizing network metrics alone, systems can adapt encoding, bitrate selection, buffering strategies, or rendering policies to maximize predicted QoE. This direction has been extensively studied in adaptive streaming, where QoE-driven strategies are used to balance bitrate quality and playback stability [4]. Reinforcement learning is particularly promising here: QoE can serve as a reward signal, and agents can learn robust policies under uncertainty (e.g., bandwidth fluctuations, user engagement changes) [6, 7].
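For instance, adaptive-streaming studies such as [4, 6, 7] commonly score a session as per-chunk quality minus a rebuffering penalty minus a smoothness penalty; the sketch below shows this form with illustrative weights, and the same quantity can serve directly as a reinforcement-learning reward.

```python
def session_qoe(bitrates_mbps, rebuffer_s, lambda_rebuf=4.3, mu_switch=1.0):
    """QoE of the linear form common in ABR studies [4, 6, 7]: per-chunk quality,
    minus a rebuffering penalty, minus a penalty on quality switches between
    consecutive chunks. The weights here are illustrative, not prescriptive."""
    quality = sum(bitrates_mbps)
    rebuffering = lambda_rebuf * sum(rebuffer_s)
    smoothness = mu_switch * sum(abs(b2 - b1)
                                 for b1, b2 in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffering - smoothness


# Example: four chunks, one 1.2 s stall, and one downward/upward quality switch.
print(session_qoe([3.0, 3.0, 1.5, 3.0], [0.0, 0.0, 1.2, 0.0]))
```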

(3) Personalization and context-awareness

QoE is inherently subjective and context-dependent. Machine intelligence can support personalization by incorporating user preferences and context signals such as device type, mobility, ambient environment, and usage patterns. For example, some users are more sensitive to rebuffering events, while others prioritize sharpness and resolution. Context-aware learning enables systems to move beyond “one-size-fits-all” adaptation.

(4) Semantic Intelligence

Machine intelligence can empower QoE by shifting quality assessment from perceptual fidelity toward semantic quality, that is, how well the meaning and task-relevant information of multimodal content is preserved for both machines and humans. As multimedia data is increasingly consumed by AI systems, for example in autonomous systems and AI-generated content pipelines, traditional perceptual metrics fail to reflect performance and experience because they ignore semantic consistency. Semantic-aware evaluation may enable both task-oriented and task-agnostic assessment. By integrating semantic quality assessment, AI can guide compression, transmission, and system design in ways that better align technical performance with downstream task success and user experience.
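One simple way to operationalize this idea, sketched below under obvious simplifying assumptions, is to measure how often a downstream model's predictions agree on original versus degraded content; the stand-in model and data are purely illustrative, not a standardized semantic quality metric.

```python
import numpy as np


def semantic_consistency(predict, originals, degradeds):
    """Illustrative proxy for semantic quality: the fraction of items on which a
    downstream model predicts the same label for the degraded content as for
    the original. `predict` is any callable mapping a media item to a label."""
    same = sum(predict(o) == predict(d) for o, d in zip(originals, degradeds))
    return same / len(originals)


# Toy usage with a stand-in "model": threshold on the mean pixel value.
predict = lambda img: int(img.mean() > 0.5)
originals = [np.random.rand(8, 8) for _ in range(100)]
degradeds = [np.clip(o + np.random.normal(0.0, 0.05, o.shape), 0.0, 1.0)
             for o in originals]
print(semantic_consistency(predict, originals, degradeds))
```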

3. How QoE can guide machine intelligence

The relationship between QoE and machine intelligence is bidirectional: QoE can also shape how multimedia AI systems are designed, trained, and evaluated.

(1) QoE as a human-centric objective function

Many multimedia AI pipelines optimize proxy metrics such as accuracy, PSNR/SSIM, or task performance. However, these do not always align with perceived quality or user satisfaction. QoE provides a principled framework to define what “better” means from the user’s perspective and encourages evaluation beyond technical fidelity [2, 10].

(2) Aligning generative intelligence with user satisfaction

With the rise of generative AI for multimedia enhancement and creation, QoE becomes even more critical. High-quality generation is not only about realism but also about temporal consistency, comfort, trust, and acceptance in real usage conditions. Integrating QoE considerations can help steer generative models toward outcomes that users actually prefer.

(3) Emerging challenge: QoE of interactive AI systems

AI evaluation is shifting from pure model accuracy toward experience-based assessment of how humans interact with AI, aligned with frameworks such as the EU AI Act. Quality of Experience (QoE) and UX research provide established methods to measure subjective aspects such as trust, transparency, human oversight of AI systems, robustness, and satisfaction. Applying QoE methodologies can translate high-level AI principles into measurable experiential dimensions that reflect real-world user understanding and use. This requires new metrics that reflect how users actually understand, trust, and operate AI systems in practice. For more details, see [11].

4. Beyond human-centric QoE: toward agent-centric and hybrid QoE

While QoE has historically focused on human perception, emerging multimedia systems increasingly serve autonomous agents such as robots, drones, and intelligent vehicles. In these scenarios, multimedia is not only consumed by humans but also by machines. This motivates an extended view of QoE, agent-centric QoE, where “experience” can be interpreted as the utility of multimedia inputs for decision-making and task execution.

Agent-centric QoE can be characterized through indicators such as perception reliability, uncertainty reduction, latency sensitivity, safety margins, energy efficiency, and task success rate. Importantly, many future applications involve human–AI co-experience, for example, in teleoperation, remote driving, robot-assisted inspection, and collaborative XR. In such systems, overall quality depends on both human satisfaction and machine performance, motivating unified QoE objectives that jointly optimize human-centric and agent-centric requirements. As shown in Figure 1, future multimedia systems may require unified QoE objectives that jointly optimize human satisfaction and agent utility in human–AI co-experience scenarios.
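One way to make such a unified objective concrete is a weighted combination of normalized human-centric and agent-centric scores, as in the purely illustrative sketch below; the chosen indicators, normalizations, and weights are assumptions rather than an agreed definition.

```python
def hybrid_qoe(human_mos, task_success_rate, latency_ms,
               latency_budget_ms=100.0, w_human=0.5, w_agent=0.5):
    """Illustrative hybrid objective for human-AI co-experience: a weighted mix
    of a normalized human score and an agent utility that combines task
    success with a latency term. All weights and normalizations are assumptions."""
    human_score = (human_mos - 1.0) / 4.0                        # MOS 1-5 -> [0, 1]
    latency_score = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    agent_utility = 0.7 * task_success_rate + 0.3 * latency_score
    return w_human * human_score + w_agent * agent_utility


# Example: a teleoperation session with MOS 4.2, 95% task success, 60 ms latency.
print(round(hybrid_qoe(4.2, 0.95, 60.0), 3))
```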

5. Key challenges

Despite its promise, QoE-meets-AI research faces several open challenges:

  • Subjective data cost and scarcity: QoE ground truth often requires user studies and careful experimental design [2, 3].
  • Generalization: QoE models may struggle across unseen content types, devices, or cultural contexts.
  • Bias and fairness: QoE datasets may underrepresent certain user groups or contexts, leading to skewed optimization.
  • Explainability and trust: Black-box QoE predictions can be difficult to interpret and validate in engineering pipelines.
  • Privacy: Personalization requires user data, raising responsible data usage concerns.
  • Ethical aspects: Beyond established research ethics procedures, QoE research must increasingly address the broader ethical implications of AI-driven experience optimization, such as fairness, transparency, wellbeing, privacy, and environmental impact, which are essential for truly human-centred technology.

6. Outlook and takeaways

The convergence of Quality of Experience and machine intelligence represents a major opportunity for the multimedia community. Machine intelligence offers scalable tools to predict and optimize QoE in complex environments, while QoE provides a human-centred lens to guide AI system design toward real user value. Looking forward, QoE may evolve from a purely human-centric notion to a hybrid experience shared by humans and intelligent agents, enabling multimedia systems that are not only technically advanced, but also aligned with what humans and autonomous agents truly need.

Looking ahead to the continued evolution of the QoMEX conference series, QoMEX’26 in Cardiff represents a key milestone where Quality of Multimedia Experience directly converges with Machine Intelligence. As AI increasingly shapes how multimedia is created, transmitted, and consumed, the conference invites the community to rethink both the goals and methods of QoE research – using AI to enhance user experience, while drawing on QoE insights to build more human-aware, trustworthy, and adaptive intelligent systems. This vision is reflected in special sessions:
SS1: “Semantic Quality Assessment for Multi-Modal Intelligent Systems” extends quality evaluation beyond perceptual fidelity toward meaning and task relevance. The session aims to lay the foundations of multimodal semantic quality assessment, enable semantic-driven compression and transmission, and connect semantic quality evaluation with AI understanding.
SS2: “Beyond Quality: Integrating Ethical Dimensions in QoE Research” emphasizes fairness, transparency, wellbeing, privacy, and environmental impact as essential dimensions of truly human-centred technology. The session calls for ethically reflexive, value-sensitive QoE frameworks that incorporate social impact, collective QoE, and inclusive research practices alongside traditional UX measures.

Together, these themes signal a continued broadening of the QoE scope, reaffirming QoMEX as a forum that evolves with emerging technologies while advancing inclusive, responsible, and future-oriented quality research. The 18th International Conference on Quality of Multimedia Experience (QoMEX’26) will take place in Cardiff, United Kingdom, from June 29 to July 3, 2026. Please find more information on the website of QoMEX’26: https://qomex2026.itec.aau.at/


References

[1] ITU-T Rec. P.10/G.100 (2006), Vocabulary for performance and quality of service.

[2] Möller, S., & Raake, A. (2014), Quality of Experience: Advanced Concepts, Applications and Methods. Springer.

[3] De Moor, K., et al. (2010), Proposed framework for evaluating quality of experience in a mobile, testbed-oriented living lab setting. Mobile Networks and Applications.

[4] Seufert, M., Egger, S., Slanina, M., Zinner, T., Hoßfeld, T., & Tran-Gia, P. (2015), A survey on quality of experience of HTTP adaptive streaming. IEEE Communications Surveys & Tutorials.

[5] Bampis, C. G., Li, Z., Moorthy, A. K., Katsavounidis, I., Aaron, A., & Bovik, A. C. (2018), Study of temporal effects on subjective video quality of experience. IEEE Transactions on Image Processing.

[6] Yin, X., Jindal, A., Sekar, V., & Sinopoli, B. (2015), A control-theoretic approach for dynamic adaptive video streaming over HTTP. ACM SIGCOMM.

[7] Mao, H., Netravali, R., & Alizadeh, M. (2017), Neural adaptive video streaming with Pensieve. ACM SIGCOMM.

[8] Wang, Z., & Bovik, A. C. (2006), Modern Image Quality Assessment. Springer.

[9] Mittal, A., Moorthy, A. K., & Bovik, A. C. (2013), No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing.

[10] Hoßfeld, T., Schatz, R., & Egger, S. (2011), SOS: The MOS is not enough! QoMEX.

[11] Hupont, I., De Moor, K., Skorin-Kapov, L., Varela, M., & Hoßfeld, T. (2025), Rethinking QoE in the Age of AI: From Algorithms to Experience-Based Evaluation. ACM SIGMultimedia Records.

ACM SIGMM Multimodal Reasoning Workshop

The ACM SIGMM Multimodal Reasoning Workshop was held on 8–9 November 2025 at the Indian Institute of Technology Patna (IIT Patna) in hybrid mode. Organised by Dr. Sriparna Saha, a faculty member of the Department of Computer Science and Engineering at IIT Patna, and supported by ACM SIGMM, the two-day event brought together researchers, students, and practitioners to discuss the foundations, methods, and applications of multimodal reasoning and generative intelligence. The workshop registered 108 participants and featured invited talks, tutorials, and hands-on sessions by national and international experts. Sessions covered topics including trustworthy AI, LLM fine-tuning, temporal and multimodal reasoning, knowledge-grounded visual question answering, and healthcare applications.

Inauguration:

 During the inauguration, Dr. Sriparna Saha welcomed participants and acknowledged the presence and support of Prof. Jimson Mathew (Dean of Student Affairs, IIT Patna), Prof. Rajiv Ratn Shah (IIIT Delhi) and all speakers. The organising committee expressed gratitude to ACM SIGMM for its financial support, which made the workshop possible. Felicitations were exchanged, and the inauguration concluded with words of encouragement for active participation and interdisciplinary collaboration.

Session summaries and highlights:

Day 1:

 The first day began with an inaugural session followed by a series of engaging talks and tutorials. Prof. Rajiv Ratn Shah (Associate Professor, IIIT Delhi, India) delivered the opening talk on “Tackling Multimodal Challenges with AI: From User Behavior to Content Generation,” highlighting the role of multimodal data in understanding user behaviour, enabling AI-driven content generation, and building region-specific applications such as voice conversion and video dubbing for Indic languages. Prof. Sriparna Saha (Associate Professor, Department of Computer Science and Engineering, IIT Patna, India) then presented “Harnessing Generative Intelligence for Healthcare: Models, Methods, and Evaluations,” discussing safe, domain-specific AI systems for healthcare, multimodal summarisation for low-resource languages, and evaluation frameworks like M3Retrieve and multilingual trust benchmarks. The afternoon sessions featured hands-on tutorials and technical talks. Ms. Swagata Mukherjee (Research Scholar, IIT Patna, India) conducted a tutorial on “Advanced Prompting Techniques for Large Language Models,” covering zero/few-shot prompting, chain-of-thought reasoning, and iterative refinement strategies. Mr. Rohan Kirti (Research Scholar, IIT Patna, India) led a tutorial on “Exploring Multimodal Reasoning: Text & Image Embeddings, Augmentation, and VQA,” demonstrating text–image fusion using models such as CLIP, VILT, and PaliGemma. Prof. José G. Moreno (Associate Professor, IRIT, France) presented “Visual Question Answering about Named Entities with Knowledge-Based Explanation (ViQuAE),” introducing a benchmark for explainable, knowledge-grounded VQA. The day concluded with Prof. Chirag Agarwal (Assistant Professor, University of Virginia, USA) delivering a talk on “Trustworthy AI in the Era of Frontier Models,” which emphasised fairness, safety, and alignment in multimodal and LLM systems.

Day 2:

 The second day continued with high-level technical sessions and tutorials. Prof. Ranjeet Ranjan Jha (Assistant Professor, Department of Mathematics, IIT Patna, India) opened with a talk on “Bridging Deep Learning and Multimodal Reasoning: Generative AI in Real-World Contexts,” tracing the evolution of deep learning into multimodal generative models and discussing ethical and computational challenges in deployment. Prof. Adam Jatowt (Professor, Department of Computer Science, University of Innsbruck, Austria) followed with a presentation on “Analyzing and Improving Temporal Reasoning Capabilities of Large Language Models,” showcasing benchmarks such as BiTimeBERT, TempRetriever, and ComplexTempQA, while proposing methods to enhance time-sensitive reasoning. The final technical session featured a tutorial by Mr. Syed Ibrahim Ahmad (Research Scholar, IIT Patna, India) on “LLM Fine-Tuning,” which covered PEFT approaches, QLoRA, quantization, and optimization techniques to fine-tune large models efficiently.

Valedictory session:

 The valedictory session marked the formal close of the workshop. Dr. Sriparna Saha thanked speakers, participants and the organising team for active engagement across technical talks and tutorials. Participants shared positive feedback on the depth and practicality of sessions. Certificates were distributed to attendees. Final remarks encouraged continued research, collaboration and dissemination of resources. Dr. Saha reiterated gratitude to ACM SIGMM for financial support.

Outcomes, observations and suggested actions:

  • Multimodal reasoning remains an interdisciplinary challenge that benefits from close collaboration between multimedia, NLP, and application domain experts.
  • Trustworthiness, safety, and evaluation (benchmarks and metrics) are critical for moving multimodal models from demonstration to practice, especially in healthcare and other high-stakes domains.
  • Practical methods for model adaptation (PEFT, quantization) make large models accessible for research groups with limited compute.
  • Datasets and retrieval resources that combine multimodal inputs with external knowledge (as in ViQuAE) are valuable for advancing explainable VQA and grounded reasoning.
  • The community should prioritise regional and language-diverse resources (Indic languages, code-mixed data) to ensure equitable benefits from multimodal AI.
  • SIGMM and ACM venues can play a role in fostering collaborations via special projects, regional hackathons, grand challenges, and multimodal benchmark initiatives.

Outreach & social media:

 The workshop generated significant visibility on LinkedIn and other professional networks. Photos and session highlights were widely shared by participants and organisers, acknowledging ACM SIGMM support and the quality of the technical programme.

Acknowledgements:  The organising committee thanks all speakers, attendees, student volunteers, and ACM SIGMM for the financial and logistical support that enabled the workshop.

Reports from ACM Multimedia 2025

URL:  https://acmmm2025.org

Date: Oct 27 – Oct 31, 2025
Place:  Dublin, Ireland
General Chairs: Cathal Gurrin (Adapt Centre & DCU), Klaus Schoeffmann (Klagenfurt University), Min Zhang (Tsinghua University)

Introduction

The ACM Multimedia Conference 2025, held in Dublin, Ireland from October 27 to October 31, 2025, continued its tradition as a premier international forum for researchers, practitioners, and industry experts in the field of multimedia. This year’s conference marked an exciting return to Europe, bringing the community together in a city renowned for its rich cultural heritage, innovation-driven ecosystem, and welcoming atmosphere. ACM MM 2025 provided a dynamic platform for presenting state-of-the-art research, discussing emerging trends, and fostering collaboration across diverse areas of multimedia computing.

Hosted in Dublin—a vibrant hub for both technology and academia—the conference delivered a seamless and engaging experience for all attendees. As part of its ongoing mission to support and encourage the next generation of multimedia researchers, SIGMM awarded Student Travel Grants to assist students facing financial constraints. Each recipient received up to 1,000 USD to help offset travel and accommodation expenses. Applicants completed an online form, and the selection committee evaluated candidates based on academic excellence, research potential, and demonstrated financial need.

To shed light on the experiences of these outstanding young scholars, we interviewed several travel grant recipients about their participation in ACM MM 2025 and the conference’s influence on their academic and professional development. Their reflections are shared below.

Wang Zihao (Zhejiang University)

This was not my first time attending ACM Multimedia—I also participated in ACMMM 2022—but coming back in 2025 has been just as fantastic. There were many memorable moments, but two of them stood out the most for me. The first was the beautiful violin performance during the Volunteer Dinner, which created such a warm and elegant atmosphere. The second was the Irish drumming performance at the conference banquet on October 30th. It was incredibly energetic and truly unforgettable. These moments reminded me how special it is to be part of this community, where academic exchange and cultural experiences blend together so naturally.

I am truly grateful for the SIGMM Student Travel Grant. The financial support made it possible for me to attend the conference in person, and I really appreciate the effort that SIGMM puts into supporting students. One of the most valuable aspects of this trip was meeting researchers from all over the world who work in areas similar to mine—especially those focusing on music, audio, and multimodality. Having deep, face-to-face conversations with them was inspiring and has given me many new ideas to explore in my future research.

As for suggestions, I honestly think the SIGMM Student Travel Grant program is already doing an amazing job in supporting young scholars like us. My only small hope is for a smooth reimbursement process.

Overall, I feel incredibly fortunate to be here again, reconnecting with the ACM MM community and learning so much from everyone. I’m thankful for this opportunity and excited to continue growing in this field.

Huang Feng-Kai (National Taiwan University)

Attending ACM Multimedia 2025 in Dublin was my first time joining the conference, and it has been an unforgettable experience. Everything was so well-organized, and I truly enjoyed every moment. The welcome reception was especially memorable—the food was delicious, the atmosphere was lively, and it was inspiring to see so many renowned researchers and professors chatting enthusiastically. It really felt like the perfect start to my ACM MM journey.

I am deeply grateful to SIGMM for the Student Travel Grant. As a student traveling all the way from Taiwan, attending a conference in Europe is a major financial challenge. The grant covered my accommodation, meals, and flights, which made it possible for me to participate without worrying too much about the cost. Being here has really broadened my horizons. I was able to learn about so many fascinating research topics and meet many kind, talented researchers who generously shared their thoughts with me. These conversations gave me a lot of inspiration for my own work.

I also had the chance to serve as a volunteer, which became my first experience working with an international team. Collaborating with people from different cultural and academic backgrounds helped me improve my communication skills and made the conference even more meaningful.

I truly believe the SIGMM Student Travel Grant is an amazing program that enables students from all over the world to join this vibrant community, exchange ideas, and form new collaborations. My only wish is that SIGMM will continue offering this opportunity in the future. This grant brings so much energy to young researchers like me and plays an important role in supporting the next generation of the multimedia community. I am sincerely thankful for everything this experience has given me, and I look forward to returning to ACM Multimedia in the coming years.

Wang Hao (Peking University)

Attending ACM Multimedia 2025 was my very first time participating in the conference, and the experience was truly amazing. The moment that impressed me the most was having the chance to present my own paper. As a non-native English speaker, giving an academic talk on an international stage was both challenging and rewarding. I felt nervous at first, but I’m really proud of how I managed to deliver my presentation. It was a big milestone for me.

I’m incredibly grateful to SIGMM for the Student Travel Grant, which made it possible for me to attend an international conference for the first time. Without this support, I wouldn’t have been able to experience such a meaningful academic event. Throughout the conference, I met so many new friends, attended inspiring talks, and gained fresh perspectives on multimedia research. These experiences have broadened my view of the field and will definitely influence the direction of my future work.

I’m thankful for this opportunity and truly appreciate how welcoming and encouraging the ACM MM community is. This conference has given me motivation and confidence to continue growing as a researcher.

Yu Liu (University of Electronic Science and Technology of China)

This is my first time attending ACM Multimedia. I am a PhD student at the University of Electronic Science and Technology of China (UESTC), currently spending a year at the University of Auckland, New Zealand, as part of a joint PhD program. It took 25 hours to travel from Auckland to Dublin, but the journey was completely worth it. The conference has been vibrant and intellectually engaging. I had the honor of being the first speaker in my session, and it was incredibly fulfilling to see the audience show genuine interest and appreciation for our work. Outside the sessions, I thoroughly enjoyed immersing myself in Irish culture—tasting the smooth, rich Guinness, watching lively tap dancing, and listening to traditional Irish music. Overall, it has been an inspiring and truly memorable experience.

The SIGMM Student Travel Grant played a vital role in making my attendance possible. In recent years, UESTC has discontinued funding for PhD students’ conference travel, transferring the financial responsibility entirely to individual research groups. Receiving this grant was crucial, allowing me to attend ACM MM 2025 without placing additional strain on my research team’s limited budget. Attending the conference in person provided an invaluable opportunity to present my research, exchange ideas face-to-face with international scholars, and receive constructive feedback from leading experts. These experiences fostered meaningful academic connections and opened doors for potential long-term collaborations that online participation simply cannot replace.

My biggest takeaway from ACM MM 2025 is the inspiration I gained from being part of such a diverse and passionate research community, which has motivated me to continue advancing in the field of responsible AI. I also really enjoyed the volunteer “Thank You” dinner—it was a wonderful experience. At the same time, I noticed that it is not always easy for students to approach professors they do not know personally. In the future, including short icebreaker or networking activities could help start conversations more naturally, making the conference experience even more valuable for students like me.

Li Deng (LUT University)

This is my first time attending ACM Multimedia, and my experience has been exceptionally positive. I was particularly impressed by the workshops relevant to my research area, as the discussions provided valuable insights that are already influencing my ongoing work. I was also struck by the abundance and quality of the social events and networking opportunities, which made it easy to connect with senior researchers and fellow students from diverse backgrounds.

Receiving the SIGMM Student Travel Grant significantly reduced the financial burden of travel and accommodation, allowing me to attend the conference in person without major financial stress. The opportunity to present my work and engage in discussions with leading researchers has greatly supported my academic development. I received direct feedback and established connections that may lead to future collaborations. My biggest takeaway from ACM MM 2025 is a deeper understanding of the rapid development and impact of multimodal large language models.

Looking ahead, I suggest that the SIGMM Student Travel Grant program collaborate with the main conference to organize sessions such as a “Career Forum” for grant recipients and other student volunteers, providing additional guidance and support for early-career researchers.

JPEG Column: 109th JPEG Meeting in Nuremberg, Germany

JPEG XS developers awarded the Engineering, Science and Technology Emmy®.

The 109th JPEG meeting was held in Nuremberg, Germany, from 12 to 17 October 2025.

This JPEG meeting began with the excellent news that JPEG XS developers Fraunhofer IIS and intoPIX were awarded the Engineering, Science and Technology Emmy® for their contributions to the development of the JPEG XS standard.

Furthermore, the 109th JPEG meeting was marked by several major achievements: JPEG Trust Part 2 on Trust Profiles and Reports, complementing Part 1 with several profiles for various usage scenarios, reached Committee Draft stage; JPEG AIC Part 3 was finalised for publication as an International Standard; JPEG XE Part 1 reached DIS stage; and the Calls for Proposals on JPEG AIC-4 objective image quality assessment and on JPEG Pleno light field quality assessment each received several responses.

The following sections summarise the main highlights of the 109th JPEG meeting:

Fraunhofer IIS and intoPIX representatives with the awarded Engineering, Science and Technology Emmy®.
  • JPEG Trust Part 2 on Trust Profiles and Reports reaches Committee Draft stage.
  • JPEG AIC-4 receives responses to the Call for Proposals on Objective Image Quality Assessment.
  • JPEG XE Part 1, the core coding system, reaches DIS stage.
  • JPEG XS Part 1 AMD 1 reaches DIS stage.
  • JPEG AI Part 2 (Profiling), Part 3 (Reference Software), and Part 5 (File Format) approved as International Standards.
  • JPEG DNA designs the wet-lab experiments, including DNA synthesis and sequencing.
  • JPEG Pleno receives responses to the Call for Proposals on Objective Metrics for Light Field Quality Assessment.
  • JPEG RF establishes frameworks for coding and quality assessment of radiance fields.
  • JPEG XL initiates embedding of JPEG XL in ISOBMFF/HEIF.

JPEG Trust

At the 109th JPEG Meeting, the JPEG Committee reached a key milestone with the completion of the Committee Draft (CD) for JPEG Trust Part 2 – Trust Profiles and Reports (ISO/IEC 21617-2). Building on the framework established in Part 1 (Core Foundation), this new specification further refines Trust Profiles and Trust Reports and provides several example profiles and reusable profile snippets for adoption in diverse usage scenarios.

Compared to earlier drafts, the new Trust Profiles specification introduces templates and dynamic metadata blocks, offering enhanced flexibility while maintaining full backwards compatibility for existing profiles. This flexibility is also reflected in the updated Trust Reports, which can now be more easily tailored to specific usage scenarios. This new specification sets the stage for user communities to build their own Trust Profiles and customise them to their specific needs.

In addition to the CD on Part 2, the committee also produced a CD of Part 4 – Reference Software. This specification provides a reference implementation and reference dataset of the Core Foundation. The reference software will be extended with additional implementations in the future.

Finally, the committee also advanced Part 3 – Media Asset Watermarking. The Terms and Definitions and Use Cases and Requirements documents are now publicly available on the JPEG website. The development of Part 3 is progressing on schedule, with the Committee Draft stage targeted for January 2026.

JPEG AIC

The JPEG AIC-3 standard, which specifies a methodology for fine-grained subjective image quality assessment in the range from good quality up to mathematically lossless, was finalised at the 109th JPEG meeting and will be published as International Standard ISO/IEC 29170-3.

In response to the JPEG AIC-4 Call for Proposals on Objective Image Quality Assessment, four proposals were received and presented. A large-scale subjective experiment has been prepared in order to evaluate the proposals.

JPEG XE

JPEG XE is a joint effort between ITU-T SG21 and ISO/IEC JTC1/SC29/WG1 and will become the first specification for the coding of events to be jointly endorsed by the major standardization bodies ITU-T, ISO, and IEC. It aims to establish a robust and interoperable format for the efficient representation and coding of events in the context of machine vision and related applications. To expand the reach of JPEG XE, the JPEG Committee has closely coordinated its activities with the MIPI Alliance with the intention of developing a cross-compatible coding mode, allowing MIPI ESP signals to be decoded effectively by JPEG XE decoders.
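For context, event sensors typically emit sparse address-events of the form (x, y, timestamp, polarity) rather than frames. The sketch below shows a generic record of that kind and its naive raw bit rate, which is the sort of payload an event coding standard targets; the layout is an illustrative assumption, not JPEG XE or MIPI syntax.

```python
import numpy as np

# Generic address-event representation used by event sensors:
# one record per brightness-change event, rather than per frame.
event_dtype = np.dtype([
    ("x", np.uint16),   # pixel column
    ("y", np.uint16),   # pixel row
    ("t", np.uint64),   # timestamp in microseconds
    ("p", np.uint8),    # polarity: 1 = brightness increase, 0 = decrease
])


def raw_rate_mbit_per_s(events_per_second: float) -> float:
    """Uncompressed bit rate of the naive record above, to motivate coding."""
    return events_per_second * event_dtype.itemsize * 8 / 1e6


# A busy scene at 50 million events per second already needs several Gbit/s raw.
print(raw_rate_mbit_per_s(50e6), "Mbit/s")
```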

At the 109th JPEG Meeting, the DIS of JPEG XE Part 1, the core coding system, was prepared. This part specifies the low-complexity and low-latency lossless coding technology that will be the foundation of JPEG XE. Reaching DIS stage is a major milestone and freezes the core coding technology for the first edition of JPEG XE. The JPEG Committee plans to further improve the coding performance and to provide additional lossless and lossy coding modes, scheduled to be developed in 2026. While the DIS of Part 1 is under ballot for approval as an International Standard, the JPEG Committee initiated the work on Part 2 of JPEG XE to define the profiles and levels. A DIS of Part 2 is planned to be ready for ballot in January 2026.

With JPEG XE Part 1 under ballot and Part 2 in the pipeline, the JPEG Committee remains committed to the development of a comprehensive and industry-aligned standard that meets the growing demand for event-based vision technologies. The collaborative approach between multiple standardisation organisations underscores a shared vision for a unified, international standard to accelerate innovation and interoperability in this emerging field.

JPEG XS

The JPEG Committee is extremely proud to announce that the two companies behind the development of JPEG XS, intoPIX and Fraunhofer IIS, were awarded an Emmy® for Engineering, Science, and Technology for their role in the development of the JPEG XS standard. The awards ceremony was held on October 14th, 2025, at the Television Academy’s Saban Media Center in North Hollywood, California. The award recognizes JPEG XS as a state-of-the-art image compression format that transmits images with visually near-lossless quality, minimal latency, and low resource consumption. It affirms that JPEG XS is a fundamental game changer for real-time video transmission in live, professional video, and broadcast applications, and that it is being widely adopted by the industry.

Nevertheless, the work to further improve JPEG XS continues. In this context, the DIS of AMD 1 of JPEG XS Part 1 is currently under ballot at ISO and is expected to be ready by January 2026. This amendment enables the embedding of sub-frame metadata into JPEG XS, as required by augmented and virtual reality applications currently discussed within VESA. The JPEG Committee also initiated steps toward an amendment for Part 2 (Profiles and buffer models) that will define additional sublevels needed to support on-the-fly proxy-level extraction (i.e., lower-resolution streams extracted from a master stream) without recompression. The amendment is planned to go to DIS ballot at the next 110th JPEG meeting in Sydney, Australia.

JPEG AI

During the 109th JPEG meeting, the JPEG AI project achieved major milestones, with Part 2 (Profiling), Part 3 (Reference Software), and Part 5 (File Format) approved as International Standards. Meanwhile, Part 4 (Conformance) is proceeding to publication after a positive ballot. The Core Experiments confirmed that JPEG AI outperforms state-of-the-art codecs in compression efficiency and demonstrated a decoder implementation based on the SADL library.

JPEG DNA

During the 109th JPEG meeting, the JPEG Committee designed the wet-lab experiments, including DNA synthesis and sequencing, with results expected by January 2026. The primary objective of the wet-lab experiments is to validate the technical specifications outlined in the current DIS study text of ISO/IEC 25508-1 under realistic procedures for DNA media storage. Additional efforts are underway as a new Core Experiment to study the performance of the codec-dependent unequal error correction technique, which is expected to result in the future publication of JPEG DNA Part 2 – Profiles and levels.

JPEG Pleno

JPEG Pleno marked a pivotal step toward the forthcoming ISO/IEC 21794-7 standard, Light Field Quality Assessment. The new Part 7 was officially approved for inclusion in the ISO/IEC work programme, confirming international support for standardizing light field quality assessment methodologies. Moreover, in response to the Call for Proposals on Objective Metrics for Light Field Quality Assessment, three proposals were received and presented. In preparation for the evaluation of the proposals submitted in response to the CfP, an evaluation dataset was released and discussed during the meeting. The next milestone is the execution of a Subjective Quality Assessment on the evaluation dataset to evaluate the proposed objective metrics by the 110th JPEG meeting in Sydney. To this end, the methodological design and preparation of the subjective test were discussed and finalized, marking an important step toward developing the standardization framework for objective light field quality assessment.

The JPEG Pleno Workshop on Emerging Coding Technologies for Plenoptic Modalities was conducted at the 109th meeting with presentations from Touradj Ebrahimi (JPEG Convenor), Peter Schelkens (JPEG Plenoptic Coding and Quality Sub-Group Chair), Aljosa Smolic (Hochschule Luzern), Søren Otto Forchhammer (Danmarks Tekniske Universitet), Giuseppe Valenzise (Université Paris-Saclay), Amr Rizk (Leibniz Universität Hannover), Michael Rudolph (Leibniz Universität Hannover), and Irene Viola (Centrum Wiskunde & Informatica).

JPEG RF

At the 109th JPEG Meeting, the exploration activity on JPEG Radiance Fields (JPEG RF) continued its progress toward establishing frameworks for coding and quality assessment of radiance fields. The group updated the drafts of the Use Cases and Requirements and the Common Test Conditions, and reviewed the outcomes of an Exploration Study that examined the impact of camera trajectory design on human perception during subjective quality assessment. These discussions refined methodological guidelines for trajectory generation and the subjective assessment procedures. Building on this progress, Exploration Study 6 was launched to benchmark the complete assessment framework through a subjective experiment using the developed protocols. Outreach activities were also planned to engage additional stakeholders and support further development ahead of the next 110th JPEG Meeting in Sydney, Australia.

JPEG XL

At the 109th JPEG meeting, work started on embedding JPEG XL in ISOBMFF/HEIF. The embedding will be described in a new edition of ISO/IEC 18181-2, which has been initiated.

Final Quote

“During the 109th JPEG Meeting, the JPEG Committee reached several important milestones. In particular, JPEG Trust continues its development with the addition of new Parts towards the creation of a reliable and effective standard that restores authenticity and provenance of the multimedia information.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.