Message from the ACM SIGMM Chair

About our initiatives for the Multimedia community

Dear SIGMM members, colleagues, students,

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

We believed that these changes should be reflected in our SIGMM flagship conference, ACM Multimedia, and in the SIGMM organization and activities overall. This belief led us to organize the SIGMM Retreat in conjunction with ACM MM23 in Ottawa on October 30, 2023. The goal of the meeting was to open a discussion on key strategic issues such as the coverage of ACM Multimedia, its quality and reputation, and how we can grow the SIGMM community. We invited the members of the SIGMM Advisory Committee, the members of the Steering Committees of the SIGMM-sponsored conferences, the ACM TOMM Editor in Chief, the past SIGMM Chairs, and senior personalities and emerging researchers of our community. The Retreat was well attended: twenty people attended in person and ten online. Alberto Del Bimbo, SIGMM Chair, chaired the Retreat with the assistance of Phoebe Chen, SIGMM Vice-chair, and Xavier Alameda Pineda.

The discussion was vibrant, and valuable opinions and suggestions emerged. It was widely agreed that the distinctive feature of multimedia research is the combination and integration of various modalities to build end-to-end systems.

People agreed on the need to introduce significant changes in the format of our flagship conference to make it more attractive. There was strong consensus on giving more room to Brave New Ideas sections, having TED-like talks, and soliciting workshops on innovative topics while striving for their continuity. There was also consensus on revitalizing the program by including new emerging topics such as Foundation Models, 3D, glass-free interactivity, and new networking platforms. All the attendees recognized the need to balance the traditional research areas of the ACM Multimedia program.

There was general agreement on using Open Review as the reviewing system for ACM Multimedia. It was recognized that it improves the quality and transparency of the reviewing process, enhances respectability, empowers the reviewers to conduct serious reviews, and aligns ACM Multimedia with the top-ranked conferences.

Other important topics of discussion included how to incentivize in-person attendance and discourage online participation to maximize the value of conferences, the collaboration and synchronization of SIGMM-sponsored conferences, and the need to make the transition between conference editions more seamless.

The need for greater industry presence, also as a way to offer internship opportunities for students and improve the attendance of younger generations, was identified as a key issue for improvement. All the attendees recognized the importance of using SIGMM Records and Social Media as a means to improve the sense of community and disseminate information.

Following our commitment to align words with actions, we decided to create Strike Teams focusing on the most strategic themes. These teams are composed of a few experienced colleagues who volunteered to define realistic strategies for the key issues, determine concrete actions, and help to implement them in the near future. Starting in January 2024, four strike teams are in operation, with members appointed for two years:

  • SIGMM Strike Team on Open Review to provide operational support on the implementation of Open Review, smoothly transferring the best practices and helping to provide new functions.
    Team members are: Xavier Alameda Pineda (Univ. Grenoble-Alpes) Coordinator, Marco Bertini (Univ. Firenze).
  • SIGMM Strike Team on Harmonization and Spread to integrate SIGMM Records and Social Media in the whole process of the ACM Multimedia organization, improve synchronization and harmonization between ACM Multimedia and other SIGMM Conferences, and strengthen the sense of community.
    Team members are: Miriam Redi (Wikimedia Foundation) Coordinator, Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ.), Pablo Cesar (CWI and TU Delft).
  • SIGMM Strike Team on Industry Engagement to improve the presence of industry at ACM Multimedia, launching new in-cooperation initiatives and establishing stable bi-directional links.
    Team members are: Touradj Ebrahimi (EPFL) Coordinator, Ali Begen (Ozyegin Univ.), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences).
  • Strike Team on ACMMM Format to innovate the ACM Multimedia program, aligning it with technological advancements and the emergence of new research areas, and igniting fresh and efficient means of disseminating research.
    Team members are: Arnold Smeulders (Univ. of Amsterdam) Coordinator, Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam) and the Chairs of the next two ACMMM Conferences, Jianfei Cai (Monash Univ.) and Cathal Gurrin (Dublin City Univ.).

All the teams report to the SIGMM Chair and the SIGMM Executive Committee and will work in close connection with the General Chairs and Program Chairs of the next ACM Multimedia editions.

I take this opportunity to thank again all those who participated in the SIGMM Retreat, and especially those who are committed to the Strike Teams. I sincerely hope that their work brings new ideas and vitality to our community and strengthens its visibility and reputation in the international scientific arena in the years to come.

Alberto Del Bimbo
SIGMM Chair

Towards Immersive Digiphysical Experiences


Immersive experiences have the potential to redefine traditional forms of media engagement by intricately combining reality with imagination. Motivated by necessities, current developments and emerging technologies, this column sets out to bridge immersive experiences in both digital and physical realities. The first section describes various realizations of blending digital and physical elements, all fitting under the umbrella term of eXtended Reality (XR), to design what we refer to as immersive digiphysical experiences. We further highlight industry and research initiatives related to driving the design and development of such experiences, considered to be key building-blocks of the futuristic ‘metaverse’. The second section outlines challenges related to assessing, modeling, and managing the Quality of Experience (QoE) of immersive digiphysical experiences and reflects upon ongoing work in the area. While potential use cases span a wide range of application domains, the third section elaborates on the specific case of conference organization, which has over the past few years ranged from fully physical, to fully virtual, and finally to attempts at hybrid organization. We believe this use case provides valuable insights into needs and promising approaches, to be demonstrated and experienced at the upcoming 16th edition of the International Conference on Quality of Multimedia Experience (QoMEX 2024) in Karlshamn, Sweden in June 2024.

Multiple users engaged in a co-located mixed reality experience

Bridging The Digital And Physical Worlds

According to [IMeX WP, 2020], immersive media have been described as involving “multi-modal human-computer interaction where either a user is immersed inside a digital/virtual space or digital/virtual artifacts become a part of the physical world”. Spanning the so-called virtuality continuum [Milgram, 1995], immersive media experiences may involve various realizations of bridging the digital and physical worlds, such as the seamless integration of digital content with the real world (via Augmented or Mixed Reality, AR/MR), and vice versa by incorporating real objects into a virtual environment (Augmented Virtuality, AV). More recently, the term eXtended Reality (XR) (also sometimes referred to as xReality) has been used as an umbrella term for a wide range of levels of “realities”, with [Rauschnabel, 2022] proposing a distinction between AR/MR and Virtual Reality (VR) based on whether the physical environment is, at least visually, part of the user’s experience.

By seamlessly merging digital and physical elements and supporting real-time user engagement with both digital and physical components, immersive digiphysical (i.e., both digitally and physically accessible [Westerlund, 2020]) experiences have the potential of providing compelling experiences blurring the distinction between the real and virtual worlds. A key aspect is that of digital elements responding to user input or the physical environment, and the physical environment responding to interactions with digital objects. Going beyond only visual or auditory stimuli, the incorporation of additional senses, for example via haptic feedback or olfactory elements, can contribute to multisensory engagement [Gibbs, 2022].

The rapid development of XR technologies has been recognized as a key contributor to realizing a wide range of applications built on the fusion of the digital and physical worlds [NEM WP, 2022]. In its contribution to the European XR Coalition (launched by the European Commission), the New European Media Initiative (NEM), Europe’s Technology Platform of Horizon 2020 dedicated to driving the future of digital experiences, calls for needed actions from both industry and research perspectives addressing challenges related to social and human centered XR as well as XR communication aspects [NEM XR, 2022]. One such initiative is the Horizon 2020 TRANSMIXR project [TRANSMIXR], aimed at developing a distributed XR creation environment that supports remote collaboration practices, as well as an XR media experience environment for the delivery and consumption of social immersive media experiences. The NEM initiative further identifies the need for scalable solutions to obtain plausible and convincing virtual copies of physical objects and environments, as well as solutions supporting seamless and convincing interaction between the physical and the virtual world. Among key technologies and infrastructures needed to overcome outlined challenges, the following are identified [NEM XR, 2022]: high bandwidth and low-latency energy-efficient networks; remote computing for processing and rendering deployed on cloud and edge infrastructures; tools for the creation and updating of digital twins (DT) to strengthen the link between the real and virtual worlds, integrating Internet of Things (IoT) platforms; hardware in the form of advanced displays; and various content creation tools relying on interoperable formats.

Merging the digital and physical worlds

Looking towards the future, immersive digiphysical experiences set the stage for visions of the metaverse [Wang, 2023], described as representing the evolution of the Internet towards a platform enabling immersive, persistent, and interconnected virtual environments blending digital and physical [Lee, 2021]. [Wang, 2022] see the metaverse as “created by the convergence of physically persistent virtual space and virtually enhanced physical reality”. The metaverse is further seen as a platform offering the potential to host real-time multisensory social interactions (e.g., involving sight, hearing, touch) between people communicating with each other in real-time via avatars [Hennig-Thurau, 2023]. As of 2022, the Metaverse Standards Forum is providing a venue for industry coordination fostering the development of interoperability standards for an open and inclusive metaverse [Metaverse, 2023]. Relevant existing standards include: ISO/IEC 23005 (MPEG-V) (standardization of interfaces between the real world and the virtual world, and among virtual worlds) [ISO/IEC 23055], IEEE 2888 (definition of standardized interfaces for synchronization of cyber and physical worlds) [IEEE 2888], and MPEG-I (standards to digitally represent immersive media) [ISO/IEC 23090].

Research Challenges For The QoE Community

Achieving wide-spread adoption of XR-based services providing digiphysical experiences across a broad range of application domains (e.g., education, industry & manufacturing, healthcare, engineering, etc.) inherently requires ensuring intuitive, comfortable, and positive user experiences. While research efforts in meeting such requirements are well under way, a number of open challenges remain.

Quality of Experience (QoE) for immersive media has been defined as [IMeX WP, 2020] “the degree of delight or annoyance of the user of an application or service which involves an immersive media experience. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” Furthermore, a bridge between QoE and UX has been established through the concept of Quality of User Experience (QUX), combining hedonic, eudaimonic and pragmatic aspects of QoE and UX [Egger-Lampl, 2019]. In the context of immersive communication and collaboration services, significant efforts are being invested towards understanding and optimizing the end user experience [Perez, 2022].

The White Paper [IMeX WP, 2020] ties immersion to the digital media world (“The more the system blocks out stimuli from the physical world, the more the system is considered to be immersive.”). Nevertheless, immersion as such exists in physical contexts as well, e.g., when reading a captivating book. MR, XR and AV scenarios are digiphysical in their nature. These considerations pose several challenges:

  1. Achieving intuitive and natural interactive experiences [Hennig-Thurau, 2023] when mixing realities.
  2. Developing a common understanding of MR-, XR- and AV-related challenges in digiphysical multi-modal multi-party settings.
  3. Advancing VR, AR, MR, XR and AV technologies to allow for truly digiphysical experiences.
  4. Measuring and modeling QoE, UX and QUX for immersive digiphysical services, covering overall methodology, measurement instruments, modeling approaches, test environments and application domains.
  5. Management of the networked infrastructure to support immersive digiphysical experiences with appropriate QoE, UX and QUX.
  6. Sustainability considerations in terms of environmental footprint, accessibility, equality of opportunities in various parts of the world, and cost/benefit ratio.

Challenges 1 and 2 call for an experience-based bottom-up approach to focus on the most important aspects. Examples include designing and evaluating different user representations [Aseeri, 2021][Viola, 2023], natural interaction techniques [Spittle, 2023] and use of different environments by participants (AR/MR/VR) [Moslavac, 2023]. The latter has proven beneficial for challenges 3 (cf. the emergence of MR-/XR-/AV-supporting head-mounted devices such as the Microsoft Hololens and recent pass-through versions of the Meta Quest) and 4. Finally, challenges 5 and 6 need to be carefully addressed to allow for long-term adoption and feasibility.

Challenges 1 to 4 have been addressed in standardization. For instance, ITU-T Recommendation P.1320 specifies QoE assessment procedures and metrics for the evaluation of XR telemeetings, outlining various categories of QoE influence factors and use cases [ITU-T Rec. P.1320, 2022] (adopted from the 3GPP technical report TR 26.928 on XR technology in 5G). The corresponding ITU-T Study Group 12 (Question 10) developed a taxonomy of telemeetings [ITU-T Rec. G.1092, 2023], providing a systematic classification of telemeeting systems. Ongoing joint efforts between the VQEG Immersive Media Group and ITU-T Study Group 12 are targeted towards specifying interactive test methods for subjective assessment of XR communications [ITU-T P.IXC, 2022].

The complexity of the aforementioned challenges calls for a combination of fundamental work, use cases, implementations, demonstrations, and testing. One specific use case that has shown its urgency in recent years in combining digital and physical realities is that of hybrid conference organization, touching in particular on the challenge of achieving intuitive and natural interactions between remote and physically present participants. We consider this use case in detail in the following section, referring to the organization of the International Conference on Quality of Multimedia Experience (QoMEX) as an example.

Immersive Communication And Collaboration: The Case Of Conference Organization

What seemed to be impossible and was undesirable in the past became a necessity overnight during the CoVid-19 pandemic: running conferences as fully virtual events. Many research communities succeeded in adapting ongoing conference organizations such that communities could meet, present, demonstrate and socialize online. The conference QoMEX 2020 is one such example, whose organizers introduced a set of innovative instruments for mutual interaction and enjoyment, such as virtual Mozilla Hubs spaces for poster presentations and a music session with prerecorded contributions mixed to form a joint performance to be enjoyed virtually together. A previously unseen inventiveness was observed in making the best of the heavily travel-restricted situation. Furthermore, the technical approaches varied from off-the-shelf systems (such as Zoom or Teams) to custom-built applications. However, the majority of meetings during CoVid times, no matter their scale and nature, were run in unnatural 2D on-screen settings. The frequently reported phenomenon of videoconference (VC) fatigue can be attributed to a set of personal, organizational, technical and environmental factors [Döring, 2022]. Indeed, talking to one’s computer with many faces staring back, limited possibilities to move freely, technostress [Brod, 1984] and organizational mishaps made many people tired of VC technology that was designed for a better purpose, but could not get close enough to a natural real-life experience.

As CoVid was in retreat, conferences again became physical events and communities enjoyed meeting again, e.g., at QoMEX 2022. However, voices were raised that asked for remote participation for various reasons, such as time or budget restrictions, environmental sustainability considerations, or simply the comfort of being able to work from home. With remote participation came the challenge of bridging between in-person and remote participants, i.e., turning conferences into hybrid events [Bajpai, 2022]. However, there are many mixed experiences from hybrid conferences, both with onsite and online participants: (1) The onsite participants suffer from interruptions of the session flow needed to fix problems with the online participation tool. Their readiness to devote effort, time, and money to participate in a future hybrid event in person might suffer from such issues, which in turn would weaken the corresponding communities; (2) The online participants suffer from similar issues, where sound irregularities (echo, excessive sound volumes, etc.) are felt to be particularly disturbing, along with feelings of not being properly included, e.g., in Q&A sessions and personal interactions. At both ends, clear signs of technostress and “us-and-them” feelings can be observed. Consequently, and despite good intentions and advice [Bajpai, 2022], any hybrid conference might miss its main purpose, which is to bring researchers together to present, discuss and socialize. To sidestep the above-listed issues, the post-CoVid QoMEX conferences (since 2022) avoided hybrid operations, with few exceptions.

A conference is a typical case that reveals difficulties in bringing the physical and digital worlds together [Westerlund, 2020], at least when relying upon state-of-the-art telemeeting approaches that have not explicitly been designed for hybrid and digiphysical operations. At the recent 26th ACM Conference on Computer-Supported Cooperative Work And Social Computing in Minneapolis, USA (CSCW 2023), one of the panel sessions focused on “Realizing Values in Hybrid Environments”. Panelists and audience shared experiences about successes and failures with hybrid events. The main take-aways were as follows: (1) there is a general lack of know-how, no matter how much funding is allocated, and (2) there is a significant demand for research activities in the area.

Yet, there is hope, as increasingly many VR, MR, XR and AV-supporting devices and applications keep emerging, enabling new kinds and representations of immersive experiences. In a conference context, the latter implies the feeling of “being there”, i.e., being integrated in the conference community, no matter where the participant is located. This calls for new ways of interacting amongst others through various realities (VR/MR/XR), which need to be invented, tried and evaluated in order to offer new and meaningful experiences in telemeeting scenarios [Viola, 2023]. Indeed, CSCW 2023 hosted a specific workshop titled “Emerging Telepresence Technologies for Hybrid Meetings: an Interactive Workshop”, during which visions, experiences, and solutions were shared and could be experienced locally and remotely. About half of the participants were online, successfully interacting with participants onsite via various techniques.

With these challenges and opportunities in mind, the motto of QoMEX 2024 has been set as “Towards immersive digiphysical experiences.” While the conference is organized as an in-person event, a set of carefully selected hybrid activities will be offered to interested remote participants, such as (1) 360° stereoscopic streaming of the keynote speeches and demo sessions, and (2) the option to take part in so-called hybrid experience demos. The 360° stereoscopic streaming has so far been tested successfully in local, national and transatlantic sessions (during the above-mentioned CSCW workshop) with various settings, and further fine-tuning will be done and tested before the conference. With respect to the demo session, and in addition to traditional onsite demos, the conference will this year in particular solicit hybrid experience demos that enable both onsite and remote participants to test the demo in an immersive environment. Facilities will also be provided for onsite participants to test demos from both the perspective of a local and remote user, enabling them to experience different roles. The organizers of QoMEX 2024 hope that these hybrid activities will trigger more research interest in these areas along and beyond the classical lines of QoE research (performing quantitative subjective studies of QoE features and correlating them with QoE factors).

QoMEX 2024: Towards Immersive Digiphysical Experiences

Concluding Remarks

As immersive experiences extend into both digital and physical worlds and realities, there is a great space to conquer for QoE, UX, and QUX-related research. While the recent CoVid pandemic has forced many users to replace physical with digital meetings and sustainability considerations have reduced many people’s and organizations’ readiness to (support) travel, hybrid digiphysical meetings have, owing to their shortcomings, so far failed to persuade their participants of their superiority over pure online or on-site meetings. Indeed, one promising path towards a successful integration of physical and digital worlds consists of trying out, experiencing, reflecting, and deriving important research questions for and beyond the QoE research community. The upcoming conference QoMEX 2024 will be a stop along this road, with carefully selected hybrid experiences aimed at boosting research and best practice in the QoE domain towards immersive digiphysical experiences.

References

  • [Aseeri, 2021] Aseeri, S., & Interrante, V. (2021). The Influence of Avatar Representation on Interpersonal Communication in Virtual Social Environments. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2608-2617.
  • [Bajpai, 2022] Bajpai, V., et al. (2022). Recommendations for designing hybrid conferences. ACM SIGCOMM Computer Communication Review, 52(2), 63-69.
  • [Brod, 1984] Brod, C. (1984). Technostress: The Human Cost of the Computer Revolution. Basic Books; New York, NY, USA: 1984.
  • [Döring, 2022] Döring, N., Moor, K. D., Fiedler, M., Schoenenberg, K., & Raake, A. (2022). Videoconference Fatigue: A Conceptual Analysis. International Journal of Environmental Research and Public Health, 19(4), 2061.
  • [Egger-Lampl, 2019] Egger-Lampl, S., Hammer, F., & Möller, S. (2019). Towards an integrated view on QoE and UX: adding the Eudaimonic Dimension, ACM SIGMultimedia Records, 10(4):5.
  • [Gibbs, 2022] Gibbs, J. K., Gillies, M., & Pan, X. (2022). A comparison of the effects of haptic and visual feedback on presence in virtual reality. International Journal of Human-Computer Studies, 157, 102717.
  • [Hennig-Thurau, 2023] Hennig-Thurau, T., Aliman, D. N., Herting, A. M., Cziehso, G. P., Linder, M., & Kübler, R. V. (2023). Social Interactions in the Metaverse: Framework, Initial Evidence, and Research Roadmap. Journal of the Academy of Marketing Science, 51(4), 889-913.
  • [IMeX WP, 2020] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
  • [ISO/IEC 23055] ISO/IEC 23005 (MPEG-V) standards, Media Context and Control, https://mpeg.chiariglione.org/standards/mpeg-v, accessed January 21, 2024.
  • [ISO/IEC 23090] ISO/IEC 23090 (MPEG-I) standards, Coded representation of Immersive Media, https://mpeg.chiariglione.org/standards/mpeg-i, accessed January 21, 2024.
  • [IEEE 2888] IEEE 2888 standards, https://sagroups.ieee.org/2888/, accessed January 21, 2024.
  • [ITU-T Rec. G.1092, 2023] ITU-T Recommendation G.1092 – Taxonomy of telemeetings from a quality of experience perspective, Oct. 2023.
  • [ITU-T Rec. P.1320, 2022] ITU-T Recommendation P.1320 – QoE assessment of extended reality (XR) meetings, 2022.
  • [ITU-T P.IXC, 2022] ITU-T Work Item: Interactive test methods for subjective assessment of extended reality communications, under study, 2022.
  • [Lee, 2021] Lee, L. H. et al. (2021). All One Needs to Know about Metaverse: A Complete Survey on Technological Singularity, Virtual Ecosystem, and Research Agenda. arXiv preprint arXiv:2110.05352.
  • [Metaverse, 2023] Metaverse Standards Forum, https://metaverse-standards.org/
  • [Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
  • [Moslavac, 2023] Moslavac, M., Brzica, L., Drozd, L., Kušurin, N., Vlahović, S., & Skorin-Kapov, L. (2023, July). Assessment of Varied User Representations and XR Environments in Consumer-Grade XR Telemeetings. In 2023 17th International Conference on Telecommunications (ConTEL) (pp. 1-8). IEEE.
  • [Rauschnabel, 2022] Rauschnabel, P. A., Felix, R., Hinsch, C., Shahab, H., & Alt, F. (2022). What is XR? Towards a Framework for Augmented and Virtual Reality. Computers in human behavior, 133, 107289.
  • [NEM WP, 2022] New European Media (NEM), NEM: List of topics for the Work Program 2023-2024.
  • [NEM XR, 2022] New European Media (NEM), NEM contribution to the XR coalition, June 2022.
  • [Perez, 2022] Pérez, P., Gonzalez-Sosa, E., Gutiérrez, J., & García, N. (2022). Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment. Frontiers in Signal Processing, 2, 917684.
  • [Spittle, 2023] Spittle, B., Frutos-Pascual, M., Creed, C., & Williams, I. (2023). A Review of Interaction Techniques for Immersive Environments. IEEE Transactions on Visualization and Computer Graphics, 29(9), Sept. 2023.
  • [TRANSMIXR] EU HORIZON 2020 TRANSMIXR project, Ignite the Immersive Media Sector by Enabling New Narrative Visions, https://transmixr.eu/
  • [Viola, 2023] Viola, I., Jansen, J., Subramanyam, S., Reimat, I., & Cesar, P. (2023). VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real-Time Communication. IEEE MultiMedia, 30(2).
  • [Wang, 2023] Wang, H. et al. (2023). A Survey on the Metaverse: The State-of-the-Art, Technologies, Applications, and Challenges. IEEE Internet of Things Journal, 10(16).
  • [Wang, 2022] Wang, Y. et al. (2022). A Survey on Metaverse: Fundamentals, Security, and Privacy. IEEE Communications Surveys & Tutorials, 25(1).
  • [Westerlund, 2020] Westerlund, T. & Marklund, B. (2020). Community pharmacy and primary health care in Sweden – at a crossroads. Pharm Pract (Granada), 18(2): 1927.

MPEG Column: 144th MPEG Meeting in Hannover, Germany

The 144th MPEG meeting was held in Hannover, Germany! For those interested, the press release is available with all the details. It’s great to see progress being made in person (cf. also the group pictures below). The main outcomes of this meeting are as follows:

  • MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
  • MPEG evaluates Call for Proposals on Feature Compression for Video Coding for Machines
  • MPEG progresses ISOBMFF-related Standards for the Carriage of Network Abstraction Layer Video Data
  • MPEG enhances the Support of Energy-Efficient Media Consumption
  • MPEG ratifies the Support of Temporal Scalability for Geometry-based Point Cloud Compression
  • MPEG reaches the First Milestone for the Interchange of 3D Graphics Formats
  • MPEG announces Completion of Coding of Genomic Annotations

We have modified the press release to cater to the readers of ACM SIGMM Records and highlighted research on video technologies. This edition of the MPEG column focuses on MPEG Systems-related standards and visual quality assessment. As usual, the column will end with an update on MPEG-DASH.

Attendees of the 144th MPEG meeting in Hannover, Germany.

Visual Quality Assessment

MPEG does not create standards in the visual quality assessment domain. However, it conducts visual quality assessments for its standards during various stages of the standardization process. For instance, it evaluates responses to calls for proposals, conducts verification tests of its final standards, and so on. MPEG Visual Quality Assessment (AG 5) issued an open call to study quality assessment for learning-based video codecs. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies have focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. To facilitate the study of visual quality, MPEG maintains the Compressed Video for the study of Quality Metrics (CVQM) dataset.
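As a rough illustration of what "studying the correlation" between subjective and objective quality can involve, the sketch below computes the Pearson (linear) and Spearman (rank-order) correlation between a set of mean opinion scores and an objective metric. It is a minimal sketch and not AG 5's actual procedure: the function names, the choice of these two coefficients, and the sample values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five coded sequences (not taken from CVQM).
subjective_mos = [1.8, 2.6, 3.4, 4.1, 4.6]
objective_score = [28.0, 31.5, 34.0, 37.2, 41.0]

print("Pearson: ", round(pearson(subjective_mos, objective_score), 3))
print("Spearman:", round(spearman(subjective_mos, objective_score), 3))
```

A high rank-order correlation indicates that the metric orders the sequences the way viewers do, while the linear coefficient additionally reflects how closely the metric tracks the scores on a straight line.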

With the recent advancements in learning-based video compression algorithms, MPEG is now studying compression using these codecs. It is expected that reconstructed videos compressed using learning-based codecs will have different types of distortion compared to those induced by traditional block-based motion-compensated video coding designs. To gain a deeper understanding of these distortions and their impact on visual quality, MPEG has issued a public call related to learning-based video codecs. MPEG is open to inputs in response to the call and will invite responses that meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.

Considering the rapid advancements in the development of learning-based video compression algorithms, MPEG will keep this call open and anticipates future updates to the call.

Interested parties are kindly requested to contact the MPEG AG 5 Convenor Mathias Wien (wien@lfb.rwth-aachen.de) and submit responses for review at the 145th MPEG meeting in January 2024. Further details are given in the call, issued as AG 5 document N 104 and available from the mpeg.org website.

Research aspects: Learning-based data compression (e.g., for image, audio, video content) is a hot research topic. Research on this topic relies on datasets offering a set of common test sequences, sometimes also common test conditions, that are publicly available and allow for comparison across different schemes. MPEG’s Compressed Video for the study of Quality Metrics (CVQM) dataset is such a dataset, available here, and ready to be used also by researchers and scientists outside of MPEG. The call mentioned above is open for everyone inside/outside of MPEG and allows researchers to participate in international standards efforts (note: to attend meetings, one must become a delegate of a national body).

MPEG Systems-related Standards

At the 144th MPEG meeting, MPEG Systems (WG 3) produced three newsworthy items as follows:

  • Progression of ISOBMFF-related standards for the carriage of Network Abstraction Layer (NAL) video data.
  • Enhancement of the support of energy-efficient media consumption.
  • Support of temporal scalability for geometry-based Point Cloud Compression (G-PCC).

ISO/IEC 14496-15, a part of the family of ISOBMFF-related standards, defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). This standard has been further improved with the approval of the Final Draft Amendment (FDAM), which adds support for enhanced features such as Picture-in-Picture (PiP) use cases enabled by VVC.

In addition to the improvements made to ISO/IEC 14496-15, separately developed amendments have been consolidated in the 7th edition of the standard. This edition has been promoted to Final Draft International Standard (FDIS), marking the final milestone of the formal standard development.

Another important standard in development is the 2nd edition of ISO/IEC 14496-32 (file format reference software and conformance). This standard, currently at the Committee Draft (CD) stage of development, is planned to be completed and reach the status of Final Draft International Standard (FDIS) by the beginning of 2025. This standard will be essential for industry professionals who require a reliable and standardized method of verifying the conformance of their implementation.

MPEG Systems (WG 3) also promoted ISO/IEC 23001-11 (energy-efficient media consumption (green metadata)) Amendment 1 to Final Draft Amendment (FDAM). This amendment introduces energy-efficient media consumption (green metadata) for Essential Video Coding (EVC) and defines metadata that enables a reduction in decoder power consumption. At the same time, ISO/IEC 23001-11 Amendment 2 has been promoted to the Committee Draft Amendment (CDAM) stage of development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025.

Finally, MPEG Systems (WG 3) promoted ISO/IEC 23090-18 (carriage of geometry-based point cloud compression data) Amendment 1 to Final Draft Amendment (FDAM). This amendment enables the compression of a single elementary stream of point cloud data using ISO/IEC 23090-9 (geometry-based point cloud compression) and storing it in more than one track of ISO Base Media File Format (ISOBMFF)-based files. This enables support for applications that require multiple frame rates within a single file and introduces a track grouping mechanism to indicate multiple tracks carrying a specific temporal layer of a single elementary stream separately.
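The idea of storing the temporal layers of one elementary stream in separate tracks can be pictured with a small toy model in Python. This is a sketch under stated assumptions, not the ISOBMFF box syntax or the G-PCC bitstream format: the dyadic layer assignment, the function names, and the 30 fps full frame rate are all hypothetical and serve only to show how reading more tracks yields a higher frame rate from the same file.

```python
from collections import defaultdict


def temporal_layer(frame_index, num_layers):
    """Assign a frame to a temporal layer using a dyadic pattern (an assumption).

    Layer 0 holds every 2^(num_layers-1)-th frame; each further layer
    doubles the frame rate when added to the layers below it.
    """
    for layer in range(num_layers):
        step = 2 ** (num_layers - 1 - layer)
        if frame_index % step == 0:
            return layer
    raise ValueError("unreachable: the last layer has step 1")


def split_into_tracks(num_frames, num_layers):
    """Split the frames of one stream into one 'track' per temporal layer."""
    tracks = defaultdict(list)
    for index in range(num_frames):
        tracks[temporal_layer(index, num_layers)].append(index)
    return dict(tracks)


def select_frames(tracks, max_layer):
    """Frames a reader obtains when it reads the tracks for layers 0..max_layer."""
    return sorted(
        frame
        for layer, frames in tracks.items()
        if layer <= max_layer
        for frame in frames
    )


FULL_FRAME_RATE = 30.0  # frames per second, hypothetical
NUM_LAYERS = 3

tracks = split_into_tracks(num_frames=8, num_layers=NUM_LAYERS)
for max_layer in range(NUM_LAYERS):
    frames = select_frames(tracks, max_layer)
    rate = FULL_FRAME_RATE / 2 ** (NUM_LAYERS - 1 - max_layer)
    print(f"layers 0..{max_layer}: frames {frames} at {rate} fps")
```

In this toy model, a reader that only needs a low frame rate reads the track for layer 0, while a reader that needs the full frame rate reads all tracks of the group and merges their samples in order.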

Research aspects: MPEG Systems usually provides standards on top of existing compression standards, enabling efficient storage and delivery of media data (among others). Researchers may use these standards (including reference software and conformance bitstreams) to conduct research in the general area of multimedia systems (cf. ACM MMSys) or, specifically, on green multimedia systems (cf. ACM GMSys).

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below with only minor updates compared to the last meeting.

MPEG-DASH Status, October 2023.

In particular, the 6th edition of MPEG-DASH is scheduled for 2024 but may not include all amendments under development. An overview of existing amendments can be found in the column from the last meeting. Current amendments have been (slightly) updated and progressed toward completion in the upcoming meetings. The signaling of haptics in DASH has been discussed and accepted for inclusion in the Technologies under Consideration (TuC) document. The TuC document comprises candidate technologies for possible future amendments to the MPEG-DASH standard and is publicly available here.

Research aspects: MPEG-DASH has been heavily researched in the multimedia systems, quality, and communications research communities. Adding haptics to MPEG-DASH would provide another dimension worth considering within research, including, but not limited to, performance aspects and Quality of Experience (QoE).

The 145th MPEG meeting will be online from January 22-26, 2024. Click here for more information about MPEG meetings and their developments.

JPEG Column: 100th meeting in Covilhã, Portugal

JPEG AI reaches Committee Draft stage at the 100th JPEG meeting

The 100th JPEG meeting was held in Covilhã, Portugal, from July 17th to 21st, 2023. At this meeting, in addition to its usual standardization activities, the JPEG Committee organized a celebration on the occasion of its 100th meeting. This face-to-face meeting, the second after the pandemic, had a record level of in-person participation, with more than 70 experts attending the meeting in person.

Several activities reached important milestones. JPEG AI became a committee draft after intensive meeting sessions with detailed analysis of the core experiment results and multiple evaluations of the considered technologies. JPEG NFT issued a call for proposals, and the first JPEG XE use cases and requirements document was also issued publicly. Furthermore, JPEG Trust has made major steps towards its standardization.

The 100th JPEG meeting had the following highlights:

  • JPEG Celebrates its 100th meeting;
  • JPEG AI reaches Committee Draft;
  • JPEG Pleno Learning-based Point Cloud coding improves its Verification Model;
  • JPEG Trust develops its first part, the “Core Foundation”;
  • JPEG NFT releases the Final Call for Proposals;
  • JPEG AIC-3 initiates the definition of a Working Draft;
  • JPEG XE releases the Use Cases and Requirements for Event-based Vision;
  • JPEG DNA defines the evaluation of the responses to the Call for Proposals;
  • JPEG XS proceeds with the development of the 3rd edition;
  • JPEG Systems releases a Reference Software.

The following sections summarize the main highlights of the 100th JPEG meeting.

JPEG Celebrates its 100th meeting

The JPEG Committee organized a celebration of its 100th meeting. A ceremony took place on July 19, 2023 to mark this important milestone. The JPEG Convenor initiated the ceremony, followed by a speech from Prof. Carlos Salema, founder and former chair of the Instituto de Telecomunicações and current vice president of the Lisbon Academy of Sciences, and a welcome note from Prof. Silvia Socorro, vice-rector for research at the University of Beira Interior. Personalities from standardization organizations ISO, IEC and ITU, as well as the Portuguese government, sent welcome addresses in the form of recorded videos. Furthermore, a collection of short video addresses from past and current JPEG experts was compiled and presented during the ceremony. The celebration was preceded by a workshop on “Media Authenticity in the Age of Artificial Intelligence”. Further information on the workshop and its proceedings is accessible on jpeg.org. A social event followed the celebration ceremony.

The 100th meeting celebration and cake.

100th meeting Social Event.

JPEG AI

The JPEG AI (ISO/IEC 6048) learning-based image coding system has completed the Committee Draft of the standard. The current JPEG AI Verification Model (VM) has two operation points, called base and high, which include several tools that can be enabled or disabled without re-training the neural network models. The base operation point is a subset of design elements of the high operation point. The lowest configuration (base operation point without tools) provides 8% rate savings over the VVC Intra anchor, with decoding that is twice as fast and an encoder run time that is 250 times faster on CPU. In the most powerful configuration, the current VM achieves a 29% compression gain over the VVC Intra anchor.

The performance of the JPEG AI VM 3 was presented and discussed during the 100th JPEG meeting. The findings of the 15 core experiments created during the previous 99th JPEG meeting, as well as other input contributions, were discussed and investigated. This effort resulted in the reorganization of many syntactic parts with the goal of their simplification, as well as the use of several neural networks and tools, namely some design simplifications and post filtering improvements. Furthermore, coding efficiency was increased at high quality up to visually lossless, and region-of-interest quality enhancement functionality, as well as bit-exact repeatability, were added among other enhancements. The attention mechanism for the high operation point is the most significant change, as it considerably decreases decoder complexity. The entropy decoding neural network structure is now identical for the high and base operation points. The defined analysis and synthesis transforms enable efficient coding from high quality to near visually lossless and the chroma quality has been improved with the use of novel enhancement filtering technologies.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at the 100th meeting with a major improvement to its Verification Model (VM) incorporating a sparse convolutional framework providing improved quality with a more efficient computational model. In addition, an exciting new application was demonstrated showing the ability of the JPEG VM to support point cloud classification. The 100th JPEG Meeting also saw the release of a new point cloud test set to better support this activity. Prior to the 101st JPEG meeting in October 2023, JPEG experts will investigate possible advancements to the VM in the areas of attention models, voxel pruning within sparse tensor convolution, and support for residual lossless coding. In addition, a major Exploration Study will be conducted to explore the latest point cloud quality metrics.

JPEG Trust

The JPEG Committee is expediting the development of the first part, the “Core Foundation”, of its new international standard: JPEG Trust. This standard defines a framework for establishing trust in media, and addresses aspects of authenticity and provenance through secure and reliable annotation of media assets throughout their life cycle. JPEG Trust is being built on its 2022 Call for Proposals, whose responses form the basis of the framework under development.

The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website at jpeg.org for the latest information and reach out to the contacts listed below to subscribe to the JPEG Trust mailing list.

JPEG NFT

Non-Fungible Tokens (NFTs) are an exciting new way to create and trade media assets, and have seen an increasing interest from global markets. NFTs promise to impact the trading of artworks, collectible media assets, micro-licensing, gaming, ticketing and more. At the same time, concerns about interoperability between platforms, intellectual property rights, and fair dealing must be addressed.

JPEG is pleased to announce a Final Call for Proposals on JPEG NFT to address these challenges. The Final Call for Proposals on JPEG NFT and the associated Use Cases and Requirements for JPEG NFT document can be downloaded from the jpeg.org website. JPEG invites interested parties to register their proposals by 2023-10-23. The final deadline for submission of full proposals is 2024-01-15.

JPEG AIC

During the 100th JPEG meeting, the AIC activity continued its efforts on the Core Experiments, which aim at collecting fundamental information on the performance of the contributions received in April 2023 in response to a Call for Contributions on Subjective Image Quality Assessment. These results will be considered during the design of the AIC-3 standard, which has been carried out in a collaborative way since its beginning. The activity also initiated the definition of a Working Draft for AIC-3.

Other activities are also planned to initiate the work on a Draft Call for Proposals on Objective Image Quality Metrics (AIC-4) during the 101st JPEG meeting, October 2023. The JPEG Committee invites interested parties to take part in the discussions and drafting of the Call.

JPEG XE

For the Event-based Vision exploration, called JPEG XE, the JPEG Committee finalized a first version (v0.5) of the Use Cases and Requirements for Event-based Vision document. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE is about the creation and development of a standard to represent events in an efficient way, allowing interoperability between sensing, storage, and processing, and targeting machine vision and other relevant applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The new Use Cases and Requirements document is the first version to become publicly available and serves mainly to attract interest from external experts and other standardization organizations. Although the document is still preliminary, the JPEG Committee continues to invest efforts in refining it, so that it can serve as a solid basis for further standardization. An Ad-Hoc Group has been re-established to work on this topic until the 101st JPEG meeting in October 2023. To stay informed about the activities please join the event-based imaging Ad-hoc Group mailing list.

JPEG DNA

The JPEG Committee has been exploring coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is to create a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers.

At the 100th JPEG meeting, the document “Additions to the JPEG DNA Common Test Conditions version 2.0” was produced. It supplements the “JPEG DNA Common Test Conditions” by specifying a new constraint to be taken into account when coding images in quaternary representation. In addition, the detailed procedures for evaluation of the pre-registered responses to the JPEG DNA Call for Proposals were defined.

Furthermore, the next steps towards a deployed high-performance standard were discussed and defined. In particular, it was decided to request new work item approval once the Committee Draft stage has been reached.

The JPEG-DNA AHG has been re-established to work on the preparation of assessment and crosschecking of responses to the JPEG DNA Call for Proposals until the 101st JPEG meeting in October 2023.

JPEG XS

The JPEG Committee continued its work on the JPEG XS 3rd edition. The main goal of the 3rd edition is to reduce the bitrate for on-screen content by half while maintaining the same image quality.

Part 1 of the standard – Core coding tools – is still under Draft International Standard (DIS) ballot. For Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – the Committee Draft (CD) circulation results were processed and the DIS ballot document was created. In Part 2, three new profiles have been added to better adapt to the needs of the market. In particular, two profiles are based on the High 444.12 profile, but introduce some useful constraints on the wavelet decomposition structure and disable the column modes entirely. This makes the profiles easier to implement (with lower resource usage and fewer options to support) while remaining consistent with the way JPEG XS is already being deployed in the market today. Additionally, the two new High profiles are further constrained by explicit conformance points (like the new TDC profile) to better support market interoperability. The third new profile is called TDC MLS 444.12, and allows the achievement of mathematically lossless quality. For example, it is intended for medical applications, where a truly lossless reconstruction might be required.

Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.

JPEG Systems

At the 100th meeting, the JPEG Committee produced the CD text of 19566-10, the JPEG Systems Reference Software. In addition, a JPEG white paper was released that provides an overview of the entire JPEG Systems standard. The white paper can be downloaded from the jpeg.org website.

Final Quote

“The JPEG Committee celebrated its 100th meeting, an important milestone considering the current success of JPEG standards. This celebration was enriched with significant achievements at the meeting, notably the release of the Committee Draft of JPEG AI.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Report from CBMI 2023


The 20th International Conference on Content-based Multimedia Indexing (CBMI) was held exclusively as an in-person event in Orleans, France, on September 20-22, 2023. The conference was organized by the University of Orleans and received support from SIGMM. This edition marked a significant milestone as it was the first fully physical conference following the pandemic, providing a welcome opportunity for face-to-face interactions. The event drew a diverse and international audience of between 70 and 80 attendees representing 18 countries (12 European, 4 Asian, 1 American and 1 African). Additionally, the conference included a European meeting (CHIST-ERA XAIface project) associated with the main event, which brought together approximately 15 individuals. Furthermore, several engineering students from the University of Orleans were invited to participate, allowing them to gain insights into cutting-edge multimedia research and exchange knowledge and ideas.

Program highlights

The conference was structured around two keynote presentations. The first keynote was presented by Prof. Alberto Del Bimbo from the University of Florence, who spoke on the topic of “AI-Powered Personal Fashion Advising.” During his talk, Prof. Del Bimbo discussed the key tasks and challenges related to using artificial intelligence in the fashion advisory field.

The closing keynote was delivered by Prof. Nicolas Hervé from the Institut National de l’Audiovisuel (French National Audiovisual Archive). Prof. Hervé highlighted the research activities conducted at Ina and how they could be integrated into information systems and enhance the value of their collections. His presentation provided insights into the practical applications of their work.

Presentation of our keynote speakers.

In conjunction with the presentation of 18 papers across four regular paper sessions, the 2023 conference adhered to the established tradition of previous editions by incorporating special sessions. These special sessions were designed to delve into the practical applications of multimedia indexing within specific domains or distinctive settings. This approach allowed for a more focused and in-depth exploration of several topics, offering valuable insights and discussions beyond the regular paper sessions.

This year, we received a substantial volume of submissions, culminating in the approval of six special sessions. Together, these special sessions comprised a total of 25 accepted papers.

  • Cultural Heritage and Multimedia Content
  • Interactive Video Retrieval for Beginners (IVR4B)
  • Physical Models and AI in Image and in Multi-modality 
  • Computational Memorability of Imagery
  • Cross-modal multimedia analysis and retrieval for well-being insights
  • Explainability in Multimedia Analysis (ExMA)

The coordination of these special sessions involved the collaborative efforts of multiple countries, including France, Austria, Ireland, Iceland, the UK, Romania, Japan, Norway, and Vietnam.

The special sessions encompassed a diverse range of multimedia topics, spanning from applications such as cultural heritage preservation and retrieval to machine learning, with a particular focus on facets like explainability and the utilization of physical models.

The conference program was complemented by a poster session composed of fourteen posters. The latter was followed by a demo session which comprised the IVR4B video retrieval competition.

Participants at the poster session.
Participants at the demo session.

The best paper of the conference was awarded EUR 500, generously sponsored by ACM SIGMM. The selection committee quickly found consensus to award the best paper award to Romain XU-DARME, Jenny Benois-Pineau, Romain Giot, Georges Quénot, Zakaria Chihani, Marie-Christine Rousset and Alexey Zhukov for their paper “On the stability, correctness, and plausibility of visual explanation methods based on feature importance”.

Social events

In addition to the two conference dinners organized by the conference committee, the participants had the opportunity to enjoy a guided tour through Orleans on their way to the first restaurant.

Participants enjoyed the first dinner after the guided tour

Among the social events organized during CBMI 2023 was the Music meets Science concert, held with the support of ACM SIGMM. After a series of scientific presentations, participants were able to appreciate the works of Beethoven, Murphy and Lizee. We thank ACM SIGMM for their support which made this cultural event possible.

The Odyssée Quartet composed of François Pineau-Benois (violinist), Raphael Moraly (cellist), Olivier Marin (violist) and Audrey Sproule (violinist).

Outlook

The next edition of CBMI will be organized in Iceland. After several hybrid editions, we have returned to an on-site format, moving back towards pre-pandemic levels of participation.

VQEG Column: VQEG Meeting June 2023

Introduction

This column provides a report on the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 26 to 30 June 2023 in San Mateo (USA), hosted by Sony Interactive Entertainment. More than 90 participants worldwide registered for the hybrid meeting, with more than 40 people attending in person. This meeting was co-located with the ITU-T SG12 meeting, which took place in the first two days of the week. In addition, more than 50 presentations related to the ongoing projects within VQEG were provided, leading to interesting discussions among the researchers attending the meeting. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

In this meeting, there were several aspects that can be relevant for the SIGMM community working on quality assessment. For instance, there are interesting new work items and efforts on updating existing recommendations discussed in the ITU-T SG12 co-located meeting (see the section about the Intersector Rapporteur Group on Audiovisual Quality Assessment). In addition, there was an interesting panel related to deep learning for video coding and video quality with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) (see the Emerging Technologies Group section). Also, a special session on Quality of Experience (QoE) for gaming was organized, involving researchers from several international institutions. Apart from this, readers may be interested in the presentation about MPEG activities on quality assessment and the different developments from industry and academia on tools, algorithms and methods for video quality assessment.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 26-30 June 2023 hosted by Sony Interactive Entertainment (San Mateo, USA).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this meeting, there were several presentations related to topics covered by this group, which were distributed in different sessions during the meeting.

Nabajeet Barman (Kingston University, UK) presented a datasheet for subjective and objective quality assessment datasets. Ali Ak (Nantes Université, France) delivered a presentation on the acceptability and annoyance of video quality in context. Mikołaj Leszczuk (AGH University, Poland) presented a crowdsourcing pixel quality study using non-neutral photos. Kamil Koniuch (AGH University, Poland) discussed the role of theoretical models in ecologically valid studies, covering the example of a video quality of experience model. Jingwen Zhu (Nantes Université, France) presented her work on evaluating the streaming experience of the viewers with Just Noticeable Difference (JND)-based Encoding. Also, Lucjan Janowski (AGH University, Poland) talked about proposing a more ecologically-valid experiment protocol using the YouTube platform.

In addition, there were four presentations by researchers from the industry sector. Hojat Yeganeh (SSIMWAVE/IMAX, USA) talked about how more accurate video quality assessment metrics would lead to more savings. Lukas Krasula (Netflix, USA) delivered a presentation on subjective video quality for 4K HDR-WCG content using a browser-based approach for at-home testing. Also, Christos Bampis (Netflix, USA) presented the work done by Netflix on improving video quality with neural networks. Finally, Pranav Sodhani (Apple, USA) talked about how to evaluate videos with the Advanced Video Quality Tool (AVQT).

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. The group is currently working towards an ITU-T recommendation for the assessment of medical contents. In this sense, Meriem Outtas (INSA Rennes, France) led an editing session of a draft of this recommendation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910.

Apart from this, several researchers presented their works on related topics. For instance, Pablo Pérez (Nokia XR Lab, Spain) presented (not so) new findings about transmission rating scale and subjective scores. Also, Jingwen Zhu (Nantes Université, France) presented ZREC, an approach for mean and percentile opinion scores recovery. In addition, Andreas Pastor (Nantes Université, France) presented three works: 1) on the accuracy of open video quality metrics for local decision in AV1 video codec, 2) on recovering quality scores in noisy pairwise subjective experiments using negative log-likelihood, and 3) on guidelines for subjective haptic quality assessment, considering a case study on quality assessment of compressed haptic signals. Lucjan Janowski (AGH University, Poland) discussed experiment precision, proposing experiment precision measures and methods for comparing experiments. Finally, there were three presentations from members of the University of Konstanz (Germany). Dietmar Saupe presented the JPEG AIC-3 activity on fine-grained assessment of subjective quality of compressed images, Mohsen Jenadeleh talked about how relaxed forced choice improves performance of visual quality assessment methods, and Mirko Dulfer presented his work on quantization for Mean Opinion Score (MOS) recovery in Absolute Category Rating (ACR) experiments.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a focus on gaming in particular. In this meeting, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) and Nabajeet Barman (Kingston University, UK) organized a special gaming session, in which researchers from several international institutions presented their work on this topic. Among them, Yu-Chih Chen (UT Austin LIVE Lab, USA) presented GAMIVAL, a video quality prediction model for mobile cloud gaming content. Also, Urvashi Pal (Akamai, USA) delivered a presentation on web streaming quality assessment via computer vision applications over cloud. Mathias Wien (RWTH Aachen University, Germany) provided updates on ITU-T P.BBQCG work item, dataset and model development. Avinab Saha (UT Austin LIVE Lab, USA) presented a study of subjective and objective quality assessment of mobile cloud gaming videos. Finally, Irina Cotanis (Infovista, Sweden) and Karan Mitra (Luleå University of Technology, Sweden) presented their work towards QoE models for mobile cloud and virtual reality games.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. In this meeting, Margaret Pinson (NTIA, USA) and Ioannis Katsavounidis (Meta, USA), two of the chairs of the group, provided a summary of NORM successes and a discussion of current efforts towards an improved complexity metric. In addition, there were six presentations dealing with related topics. C.-C. Jay Kuo (University of Southern California, USA) talked about blind visual quality assessment for mobile/edge computing. Vignesh V. Menon (University of Klagenfurt, Austria) presented the updates of the Video Quality Analyzer (VQA). Yilin Wang (Google/YouTube, USA) gave a talk on the recent updates on the Universal Video Quality (UVQ). Farhad Pakdaman (Tampere University, Finland) and Li Yu (Nanjing University, China) presented a low complexity no-reference image quality assessment based on multi-scale attention mechanism with natural scene statistics. Finally, Mikołaj Leszczuk (AGH University, Poland) presented his work on visual quality indicators adapted to resolution changes and on considering in-the-wild video content as a special case of user generated content and a system for its recognition.

Emerging Technologies Group (ETG)

The main objective of the ETG group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. This group aims to provide a common platform for people to gather together and discuss new emerging topics, discuss possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

One of the topics addressed by this group is the use of artificial-intelligence technologies in different domains, such as compression, super-resolution, and video quality assessment. On this topic, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) organized a panel session with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) on deep learning in the video coding and video quality domains. In addition, Marcos Conde (Sony Interactive Entertainment, Germany) and David Minnen (Google, USA) gave a talk on generative compression and the challenges for quality assessment.

Another topic covered by this group is the greening of streaming and related trends. On this topic, Vignesh V. Menon and Samira Afzal (University of Klagenfurt, Austria) presented their work on green variable framerate encoding for adaptive live streaming. Also, Prajit T. Rajendran (Université Paris Saclay, France) and Vignesh V. Menon (University of Klagenfurt, Austria) delivered a presentation on energy-efficient live per-title encoding for adaptive streaming. Finally, Berivan Isik (Stanford University, USA) talked about sandwiched video compression, which efficiently extends the reach of standard codecs with neural wrappers.

Joint Effort Group (JEG) – Hybrid

The JEG group originally focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. In addition, the group will include under its activities the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM).

Apart from this, there were three presentations addressing related topics in this meeting. Nabajeet Barman (Kingston University, UK) presented a subjective dataset for multi-screen video streaming applications. Also, Lohic Fotio (Politecnico di Torino, Italy) presented his work on a “human-in-the-loop” training procedure for the artificial-intelligence-based observer (AIO) of a real subject, as well as advances on the “template” for reporting DNN-based video quality metrics.

The website of the group includes a list of activities of interest, freely available publications, and other resources.

Immersive Media Group (IMG)

The IMG group is focused on the research on quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) provided a report on the status of the test plan, including the test proposals from 13 different groups that have joined the activity, which will be launched in September.

In addition to this, Shirin Rafiei (RISE, Sweden) delivered a presentation on her work on human interaction in industrial tele-operated driving through a laboratory investigation.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. In this meeting, Avrajyoti Dutta (AGH University, Poland) delivered a presentation dealing with the subjective quality assessment of video summarization algorithms through a crowdsourcing approach.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

This VQEG meeting was co-located with the rapporteur group meeting of ITU-T Study Group 12 – Question 19, coordinated by Chulhee Lee (Yonsei University, Korea). During the first two days of the week, the experts from ITU-T and VQEG worked together on various topics. For instance, there was an editing session to work together on the VQEG proposal to merge the ITU-T Recommendations P.910, P.911, and P.913, including updates with new methods. Another topic addressed during this meeting was the work item “P.obj-recog”, related to the development of an object-recognition-rate-estimation model in surveillance video of autonomous driving. On this matter, a liaison statement was also discussed with the VQEG AVHD group. Also in relation to this group, another liaison statement was discussed on the new work item “P.SMAR” on subjective tests for evaluating the user experience of mobile Augmented Reality (AR) applications.

Other updates

One interesting presentation was given by Mathias Wien (RWTH Aachen University, Germany) on the quality evaluation activities carried out within the MPEG Visual Quality Assessment group, including the expert viewing tests. This presentation and the follow-up discussions will help to strengthen the collaboration between VQEG and MPEG on video quality evaluation activities.

The next VQEG plenary meeting will take place in autumn 2023 and will be announced soon on the VQEG website.

MPEG Column: 143rd MPEG Meeting in Geneva, Switzerland

The 143rd MPEG meeting took place in person in Geneva, Switzerland. The official press release can be accessed here and includes the following details:

  • MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
  • MPEG reaches the First Milestone for two ISOBMFF Enhancements
  • MPEG ratifies Third Editions of VVC and VSEI
  • MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
  • MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
  • MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression

We have adjusted the press release to suit the audience of ACM SIGMM and emphasized research on video technologies. This edition of the MPEG column centers around ISOBMFF and video codecs. As always, the column will conclude with an update on MPEG-DASH.

ISOBMFF Enhancements

The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.

ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.
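For readers less familiar with the file format, the following minimal Python sketch illustrates the general ISOBMFF container layout: a file is a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. It only walks top-level boxes and does not parse any of the boxes defined by ISO/IEC 23001-17; the helper names (`iter_top_level_boxes`, `make_box`) and the synthetic sample bytes are illustrative assumptions, not part of any standard or library.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 signals that a 64-bit "largesize" follows the type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, size - header
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (hypothetical test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic two-box example; not a playable file.
sample = make_box(b"ftyp", b"isom" + b"\x00" * 4) + make_box(b"mdat", b"\x01\x02\x03")
boxes = list(iter_top_level_boxes(sample))
```

On real files, the same loop can be applied to the bytes read from disk; nested boxes would need a recursive walk over the payload ranges.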

ISO/IEC 14496-15 (based on ISOBMFF) provides the basis for “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.

Research aspects: While the former, the carriage of uncompressed video and images in ISOBMFF, seems an obvious feature for a file format to support, the latter enables the use of neural network-based post-processing filters to enhance video quality after the decoding process, which is an active field of research. The current extensions to the file format provide a baseline for the evaluation (cf. also next section).

Video Codec Enhancements

MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).

The SEI messages specified in these editions include two systems-related SEI messages, (a) one for signaling of green metadata as specified in ISO/IEC 23001-11 and (b) the other for signaling of an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics SEI message and the neural-network post-filter activation SEI message have been added to AVC, HEVC, and VVC.

The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).

Research aspects: SEI messages for neural network post-filters (NNPF) for AVC, HEVC, and VVC, including systems support within ISOBMFF, are a powerful tool(box) for interoperable visual quality enhancements at the client. This tool(box) will (i) allow for Quality of Experience (QoE) assessments and (ii) enable the analysis thereof across codecs once integrated within the corresponding reference software.

MPEG-DASH Updates

The current status of MPEG-DASH is depicted in the figure below:

The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:

  • ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
  • ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point and at this meeting the Draft Amendment (DAM) has been approved. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
  • ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.

Additionally, MPEG Technologies under Consideration (TuC) comprises a few new work items, such as content selection and adaptation logic based on device orientation and signalling of haptics data within DASH.

Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming meetings.

Research aspects: Random access has been extensively evaluated in the context of video coding but not (low latency) streaming. Additionally, the TuC item related to content selection and adaptation logic based on device orientation raises QoE issues to be further explored.

The 144th MPEG meeting will be held in Hannover from October 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 99th JPEG Meeting

JPEG Trust on a mission to re-establish trust in digital media

The 99th JPEG meeting was held online, from 24th to 28th April 2023.

Providing tools suitable for establishing provenance, authenticity and ownership of multimedia content is one of the most difficult challenges faced nowadays, considering the technological models that allow effective multimedia data manipulation and generation. As in the past, the JPEG Committee is again answering the emerging challenges in multimedia. JPEG Trust is a standard offering solutions to media authenticity, provenance and ownership.

Furthermore, learning-based coding standards, JPEG AI and JPEG Pleno Learning-based Point Cloud Coding, continue their development. New verification models that incorporate the technological developments resulting from verification experiments and contributions have been approved.

Also relevant, the Calls for Contributions on standardization of quality models of JPEG AIC and JPEG Pleno Light Field Quality Assessment received responses, which started a collaborative process to define new standards.

The 99th JPEG meeting had the following highlights:

Trust, Authenticity and Provenance.
  • New JPEG Trust international standard targets media authenticity
  • JPEG AI new verification model
  • JPEG DNA releases its call for proposals
  • JPEG Pleno Light Field Quality Assessment analyses the response to the call for contributions
  • JPEG AIC analyses the response to the call for contributions
  • JPEG XE identifies use cases and requirements for event-based vision
  • JPEG Systems: JUMBF second edition is progressing to publication stage
  • JPEG NFT prepares a call for proposals
  • JPEG XS progress for its third edition

The following summarizes the major achievements during the 99th JPEG meeting.

New JPEG Trust international standard targets media authenticity

Drawing reliable conclusions about the authenticity of digital media is complicated, and becoming more so as AI-based synthetic media such as Deep Fakes and Generative Adversarial Networks (GANs) start appearing. Consumers of social media are challenged to assess the trustworthiness of the media they encounter, and agencies that depend on the authenticity of media assets must be concerned with mistaking fake media for real, with risks of real-world consequences.

To address this problem and to provide leadership in global interoperable media asset authenticity, JPEG initiated development of a new international standard: JPEG Trust. JPEG Trust defines a framework for establishing trust in media. This framework addresses aspects of authenticity, provenance and integrity through secure and reliable annotation of media assets throughout their life cycle. The first part, “Core foundation”, defines the JPEG Trust framework and provides building blocks for more elaborate use cases. It is expected that the standard will evolve over time and be extended with additional specifications.

JPEG Trust arises from a four-year exploration of requirements for addressing mis- and dis-information in online media, followed by a 2022 Call for Proposals, conducted by international experts from industry and academia from all over the world.

The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website for the latest information.

JPEG AI

The JPEG AI activity progressed at this meeting with more than 60 technical contributions submitted for improvements and additions to the Verification Model (VM), which, after some discussion and analysis, resulted in several adoptions for integration into the future VM3.0. These adoptions target the speed-up of the decoding process, namely the replacement of the range coder by an asymmetric numeral system, support for multi-threading and/or single instruction multiple data operations, and parallel decoding with sub-streams. The JPEG AI context module was significantly accelerated with a new network architecture along with other synthesis transform and entropy decoding network simplifications. Moreover, a lightweight model targeting mobile devices was also adopted, providing 10%-15% compression efficiency gains over VVC Intra at just 20-30 kMAC/pxl. In this context, JPEG AI will start the development and evaluation of two JPEG AI VM configurations at two different operating points: lightweight and high.
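To put the 20-30 kMAC/pxl figure in perspective, the short Python calculation below converts a per-pixel complexity into a total number of multiply-accumulate operations for one image. The 1920×1080 resolution and the function name `total_macs` are assumptions chosen for illustration; the column does not state at which resolutions the lightweight model was evaluated.

```python
def total_macs(width: int, height: int, kmac_per_pixel: int) -> int:
    """Total multiply-accumulate operations for one image.

    kmac_per_pixel is expressed in thousands of MACs per pixel.
    """
    return width * height * kmac_per_pixel * 1_000


# Assumed example resolution (Full HD); not taken from the JPEG AI documents.
low = total_macs(1920, 1080, 20)
high = total_macs(1920, 1080, 30)
print(f"{low / 1e9:.1f} to {high / 1e9:.1f} GMAC per image")
```

Under these assumptions, decoding one Full HD image would cost roughly 41 to 62 billion multiply-accumulate operations.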

At the 99th meeting, the JPEG AI requirements were reviewed and it was concluded that most of the key requirements will be achieved by the previously anticipated timeline for DIS (scheduled for Oct. 2023) and thus version 1 of the JPEG AI standard will go as planned without changes in its timeline and with a clear focus on image reconstruction. Some core requirements, such as those addressing computer vision and image processing tasks as well as progressive decoding, will be addressed in a version 2 along with other tools that further improve requirements already addressed in version 1, such as better compression efficiency.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a major improvement to its VM providing improved performance and control over the balance between the coding of geometry and colour via a split geometry and colour coding framework. Colour attribute information is encoded using JPEG AI resulting in enhanced performance and compatibility with the ecosystem of emerging high-performance JPEG codecs. Prior to the 100th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the areas of attention models, sparse tensor convolution, and support for residual lossless coding.

JPEG DNA

The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 99th JPEG meeting, a final call for proposals for JPEG DNA was issued and made public, as a first concrete step towards standardization.
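As a rough illustration of what a quaternary representation means, the Python sketch below maps each byte to four nucleotide symbols (two bits per symbol) and measures the longest run of identical symbols. This naive mapping is not the JPEG DNA codec and ignores robustness to noise; the bit-to-nucleotide assignment is arbitrary, and the homopolymer-run check is included only as an assumed example of the kind of biochemical constraint a practical codec has to respect.

```python
NUCLEOTIDES = "ACGT"  # arbitrary assignment: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Naively encode each byte as four nucleotides, most significant bits first."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    longest = run = 0
    previous = None
    for symbol in strand:
        run = run + 1 if symbol == previous else 1
        previous = symbol
        longest = max(longest, run)
    return longest
```

For example, the byte `0x1B` maps to `ACGT`, whereas `0x00` maps to `AAAA`, a run of four identical symbols that a constrained code would typically try to avoid.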

The final call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future proposals to be submitted. A set of exploration studies has validated the procedures outlined in the final call for proposals for JPEG DNA. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023, with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.

JPEG Pleno Light Field Quality Assessment

At the 99th JPEG meeting two contributions were received in response to the JPEG Pleno Final Call for Contributions (CfC) on Subjective Light Field Quality Assessment.

  • Contribution 1: presents a 3-step subjective quality assessment framework, with a pre-processing step; a scoring step; and a data processing step. The contribution includes a software implementation of the quality assessment framework.
  • Contribution 2: presents a multi-view light field dataset, comprising synthetic light fields. It provides RGB + ground-truth depth data, realistic and challenging Blender scenes, with various textures, fine structures, rich depth, specularities, non-Lambertian areas, and difficult materials (water, patterns, etc.).

The received contributions will be considered in the development of a modular framework based on a collaborative process addressing the use cases and requirements under the JPEG Pleno Quality Assessment of light fields standardization effort.

JPEG AIC

Three contributions in response to the JPEG Call for Contributions (CfC) on Subjective Image Quality Assessment were received at the 99th JPEG meeting. One contribution presented a new subjective quality assessment methodology that combines relative and absolute data. The second contribution reported a new subjective quality assessment methodology based on triplet comparison with boosting techniques. Finally, the last contribution reported a new pairwise sampling methodology.

These contributions will be considered in the development of the standard, following a collaborative process. Several core experiments were designed to assist the creation of a Working Draft (WD) for the future JPEG AIC Part 3 standard.

JPEG XE

The JPEG committee continued with the exploration activity on Event-based Vision, called JPEG XE. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. At this meeting, the scope was defined to be the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The exploration activity is currently working on the definition of the use cases and requirements.
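The notion of an event can be made concrete with a small Python sketch. The record layout used here (pixel coordinates, a microsecond timestamp, and a polarity) is an assumption borrowed from common event-camera practice, not a format defined by JPEG XE. Real event sensors report changes asynchronously per pixel; the frame-differencing function below merely simulates that behaviour, and the threshold value is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    x: int
    y: int
    timestamp_us: int
    polarity: int  # +1 if the pixel became brighter, -1 if darker


def detect_events(
    previous: Sequence[Sequence[int]],
    current: Sequence[Sequence[int]],
    timestamp_us: int,
    threshold: int = 10,
) -> List[Event]:
    """Emit one event per pixel whose intensity changed by at least threshold."""
    events = []
    for y, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for x, (prev_value, cur_value) in enumerate(zip(prev_row, cur_row)):
            delta = cur_value - prev_value
            if abs(delta) >= threshold:
                events.append(Event(x, y, timestamp_us, 1 if delta > 0 else -1))
    return events


# Two tiny synthetic 2x2 intensity frames.
frame_a = [[100, 100], [100, 100]]
frame_b = [[100, 130], [80, 105]]
events = detect_events(frame_a, frame_b, timestamp_us=1_000)
```

Only the two pixels whose change exceeds the threshold produce events, which illustrates why event data is sparse compared with conventional frames.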

An Ad-hoc Group has been established. To stay informed about the activities please join the event based imaging Ad-hoc Group mailing list.

JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the DIS stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Experiments are planned to prepare for a second edition of JPEG XL Part 3 (Conformance testing), including conformance testing of the independent implementations J40, jxlatte, and jxl-oxide.

JPEG Systems

The second edition of JUMBF (JPEG Universal Metadata Box Format, ISO/IEC 19566-5) is progressing to the IS publication stage; the second edition brings new capabilities and support for additional types of media.

JPEG NFT

Many Non-Fungible Tokens (NFTs) point to assets represented in JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. However, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee conducted an exploration on NFTs. The scope of JPEG NFT is the creation of effective specifications that support a wide range of applications relying on NFTs applied to media assets. The standard will be secure, trustworthy and eco-friendly, allowing for an interoperable ecosystem relying on NFT within a single application or across applications. As a result of the exploration, at the 99th JPEG Meeting the committee released a “Draft Call for Proposals on JPEG NFT” and associated updated “Use Cases and Requirements for JPEG NFT”. Both documents are made publicly available for review and feedback.

JPEG XS

The JPEG committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. For Part 1 – Core coding tools – the Draft International Standard will proceed to ISO/IEC ballot. This is a significant step in the standardization process with all the core coding technology now final. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. Furthermore, Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – will proceed to Committee Draft consultation. Part 2 is important as it defines the conformance points for JPEG XS compliance. Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.

Final Quote

“The creation of standardized tools to bring assurance of authenticity, provenance and ownership for multimedia content is the most efficient path to suppress the abusive use of fake media. JPEG Trust will be the first international standard that provides such tools,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 100 will be held in Covilhã, Portugal, from 17-21 July 2023
  • No. 101 will be held online from 30 October – 3 November 2023

A zip package containing the official JPEG logo and logos of all JPEG standards can be downloaded here.

Video Interviews at ACM Multimedia 2022

This column showcases a series of video interviews shot at ACM Multimedia 2022.
The social media editors-in-chief of the Records, Silvia Rossi and Conor Keighrey, interviewed the authors behind some of the most intriguing and compelling demos and interactive artworks. Silvia and Conor started this initiative and will continue it, when possible, at conferences supported by SIGMM.

ACM Multimedia is the premier international conference in the area of multimedia within the field of computer science.
As in every edition of ACM MM, the conference once again played host to riveting demonstrations and interactive showcases of the latest research concepts. These sessions serve a dual purpose: they stand as a testament to the presenters’ invaluable scientific and engineering contributions while also providing a unique opportunity for multimedia researchers and practitioners to delve into real-world applications, prototypes, and proofs-of-concept.

This dynamic setting is where conference attendees come face-to-face with groundbreaking multimedia systems. It’s a chance for them to gain insights into the innovative solutions and ideas that are actively shaping the future of this ever-evolving field. From visionary demonstrations of emerging technologies to interactive showcases that push the boundaries of creativity, these sessions are at the heart of what makes ACM MM a unique event in the world of multimedia.

Below is the list of video interviews with references to the corresponding authors and papers.

  • Varvara Guljajeva and Mar Canet Sola. 2022. Dream Painter: An Interactive Art Installation Bridging Audience Interaction, Robotics, and Creative AI. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7235–7236. https://doi.org/10.1145/3503161.3549976
  • Jorge Forero, Gilberto Bernardes, and Mónica Mendes. 2022. Emotional Machines: Toward Affective Virtual Environments. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7237–7238. https://doi.org/10.1145/3503161.3549973
  • Ignacio Reimat, Yanni Mei, Evangelos Alexiou, Jack Jansen, Jie Li, Shishir Subramanyam, Irene Viola, Johan Oomen, and Pablo Cesar. 2022. Mediascape XR: A Cultural Heritage Experience in Social VR. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6955–6957. https://doi.org/10.1145/3503161.3547732
  • Manuel Silva, Luana Santos, Luís Teixeira, and José Vasco Carvalho. 2022. All is Noise: In Search of Enlightenment, a VR Experience. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7223–7224. https://doi.org/10.1145/3503161.3549958
  • Pin-Xuan Liu, Tse-Yu Pan, Hsin-Shih Lin, Hung-Kuo Chu, and Min-Chun Hu. 2022. BetterSight: Immersive Vision Training for Basketball Players. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6979–6981. https://doi.org/10.1145/3503161.3547745
  • Tiago Fornelos, Pedro Valente, Rafael Ferreira, Diogo Tavares, Diogo Silva, David Semedo, Joao Magalhaes, and Nuno Correia. 2022. A Conversational Shopping Assistant for Online Virtual Stores. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6994–6996. https://doi.org/10.1145/3503161.3547738
  • Ting-Yang Kao, Tse-Yu Pan, Chen-Ni Chen, Tsung-Hsun Tsai, Hung-Kuo Chu, and Min-Chun Hu. 2022. ScoreActuary: Hoop-Centric Trajectory-Aware Network for Fine-Grained Basketball Shot Analysis. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6991–6993. https://doi.org/10.1145/3503161.3547736
  • Maria Giovanna Donadio, Filippo Principi, Andrea Ferracani, Marco Bertini, and Alberto Del Bimbo. 2022. Engaging Museum Visitors with Gamification of Body and Facial Expressions. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7000–7002. https://doi.org/10.1145/3503161.3547744

Students Report from ACM MMSys 2023

The 14th ACM Multimedia Systems Conference (with the associated workshops: NOSSDAV 2023, MMVE 2023, and the first edition of GMSys 2023) took place from 7th – 10th June 2023 in Vancouver, Canada. The MMSys conference brings together researchers in multimedia systems to showcase and exchange their cutting-edge research findings. Once again, there were technical talks spanning various multimedia domains and inspiring keynote presentations. Participants also had the opportunity to further interact with colleagues while enjoying the sunset with a 360° view of Vancouver on the Lookout tower or during a dinner in the core of the rainforest. Additionally, this year’s event included a special session dedicated to the memory of Dr. Kuan-Ta Chen, to honor his invaluable contributions to the multimedia community and to inspire the future generation of researchers.

To encourage junior researchers to participate on-site, SIGMM has sponsored a group of students with Student Travel Grant Awards. For many of them, this was their first time presenting at an international conference, and it was a wonderful experience. In this article, the recipients of the travel grants share their experiences at MMSys 2023.

Mike Vandersanden, PhD student from Hasselt University, Belgium

As a new PhD student starting my professional academic career less than a year before MMSys ’23, I focused on finding my place in the academic community. My advisor and colleagues encouraged me to achieve two goals: get feedback on my research and build a network. I submitted my first paper and it was accepted for the Doctoral Symposium at the conference, which was a great opportunity to work towards achieving my goals. Presenting my paper allowed me to receive helpful feedback, have interesting discussions, and gain new perspectives. It was motivating to see people interested in my work. During the rest of the conference, I connected with many attendees from different parts of the world. The social events were a great way to meet others, and we also had enjoyable evenings downtown. Upon returning home, I was happy to report to my advisor that I accomplished all my goals for the first year. I am grateful for receiving a student travel grant, as it made it easier to travel to another continent. It also gave me the freedom to manage my budget and increased my chances of attending the conference again next year.


May Lim, PhD student from National University of Singapore, Singapore

I was both excited and nervous for MMSys 2023 as it was not only my first in-person conference but also in a country I had never visited before. The conference turned out to be one of the most unforgettable and pleasant experiences I have ever had. It was well-organized, with very insightful presentations, many opportunities to interact and exchange contacts with fellow researchers, and the organizers’ thoughtful efforts to ensure the comfort and welfare of the participants. Vancouver’s weather and people were very kind as well.

I am thankful for the travel grant, and the support from ACM SIGMM was truly heartening. I hope to continue to be part of this community and pay it forward in other ways.


Tiago Soares da Costa, PhD student from FEUP, Portugal

MMSys 2023 marked my return to in-person conferences and was one of the best-organized conferences I have had the pleasure to participate in. After several virtual conferences, being able to meet fellow researchers and discuss interesting multimedia topics with them was a breath of fresh air. The keynote presentation from Klara Nahrstedt was one of my highlights from MMSys 2023, due to its extensive focus on multi-view streaming, one of the main topics of my PhD research. Ihab Amer's talk was another welcome surprise, presenting the current trends in AI encoding from one of the leading tech enterprises, AMD. Regarding paper presentations, I have to highlight the following works: 1) “The AD△ER Framework: Tools for Event Video Representations”, for providing us with a new approach to frameless videos; 2) “Remote Expert Assistance System for Mixed-HMD Clients over 5G Infrastructure”, for delivering an impressive tech demonstration; 3) “FleXR: A System Enabling Flexibly Distributed Extended Reality”, for presenting us with a distributed stream processing solution which can be effectively applied to XR-based environments. As for the social events at MMSys 2023, the sights and sounds of Grouse Mountain and the impressive view from the Vancouver Lookout Tower were among my favourite moments in Vancouver, which I will forever cherish. Overall, MMSys 2023 was an amazing conference and I’m particularly grateful to the SIGMM committee for providing me with the travel grant.


Yu-Szu Wei from National Tsing Hua University, Taiwan

It is a great honor for me to receive the student travel grant and I appreciate it so much. ACM MMSys 2023 was my first in-person experience attending an international conference, and it certainly was a fantastic one. I met lots of astonishing researchers and volunteers who solve problems with different, creative, and novel approaches. I exchanged ideas with them and learned a lot from them. The keynote sessions also gave me brand-new perspectives, showing me that there are lots of issues for us to investigate and deal with. The most impressive thing for me was standing on the stage and presenting my work to those experts. I’m so proud of myself for delivering my research ideas in front of the public and gaining abundant feedback from the audience.

Thanks to the committee that organized this awesome event and provided me with the travel grant to attend the conference. I’m looking forward to attending ACM MMSys again in the future.