JPEG Trust reaches Committee Draft stage at the 101st JPEG meeting
The 101st JPEG meeting was held online, from the 30th of October to the 3rd of November 2023. At this meeting, JPEG Trust became a Committee Draft. In addition, JPEG analyzed the responses to its Calls for Proposals for JPEG DNA.
The 101st JPEG meeting had the following highlights:
JPEG Trust reaches Committee Draft;
JPEG AI requests its re-establishment;
JPEG Pleno Learning-based Point Cloud coding establishes a new Verification Model;
JPEG Pleno organizes a Light Field Industry Workshop;
JPEG AIC-3 continues the evaluation of contributions;
JPEG XE produces a first draft of the Common Test Conditions;
JPEG DNA analyses the responses to the Call for Proposals;
JPEG XS proceeds with the development of the 3rd edition;
JPEG XL proceeds with the development of the 2nd edition.
The following sections summarize the main highlights of the 101st JPEG meeting.
JPEG Trust
The 101st meeting marked an important milestone for the JPEG Trust project, with its Committee Draft (CD) for Part 1 “Core Foundation” (21617-1) of the standard approved for consultation. A Draft International Standard (DIS) of the Core Foundation is expected to be approved at the 102nd JPEG meeting in January 2024, another important milestone. This rapid schedule is necessitated by the speed at which fake media and misinformation are proliferating, especially with respect to generative AI.
Aligned with JPEG Trust, the NFT Call for Proposals (CfP) has yielded two expressions of interest to date, and submission of proposals is still open until the 15th of January 2024.
Additionally, the Use Cases and Requirements document for JPEG Fake Media (the JPEG Fake Media exploration preceded the initiation of the JPEG Trust international standard) was updated to reflect the change to JPEG Trust and to incorporate additional use cases that have arisen since the previous JPEG meeting, namely with respect to composited images. This document is publicly available on the JPEG website.
JPEG AI
At the 101st meeting, the JPEG Committee issued a request for re-establishing the JPEG AI (6048-1) project, along with a Committee Draft (CD) of its version 1. A new JPEG AI timeline has also been approved and is now publicly available, in which a Draft International Standard (DIS) of the Core Coding Engine of JPEG AI version 1 is foreseen at the 103rd JPEG meeting (April 2024), a rather important milestone for JPEG AI. The JPEG Committee also established that JPEG AI version 2 will address requirements not yet fulfilled (especially regarding machine consumption tasks) as well as significant improvements on requirements already addressed in version 1, e.g. compression efficiency. The final Call for Proposals for JPEG AI version 2 will be issued in January 2025, and the presentation and evaluation of JPEG AI version 2 proposals will occur in July 2025. During 2023, the JPEG AI Verification Model (VM) has evolved from a complex system (800 kMAC/pxl) to two acceptable complexity-efficiency operating points, providing 11% compression efficiency gains at 20 kMAC/pxl and 25% compression efficiency gains at 200 kMAC/pxl. The decoder for the lower-end operating point has now been implemented on mobile devices and was demonstrated during the 100th and 101st JPEG meetings. A presentation covering the JPEG AI architecture, networks, and tools is now publicly available. To avoid project delays in the future, the promising input contributions from the 101st meeting will be combined in JPEG AI Core Experiment 6.1 (CE6.1) to study their interaction and resolve potential issues during the next meeting cycle. After this integration, a model will be trained and cross-checked for approval for release (JPEG AI VM5 release candidate) along with the study DIS text. Among the promising technologies included in CE6.1 are high-quality and variable-rate improvements with a smaller number of models (from 5 to 4), and a multi-branch decoder that allows up to three reconstructions with different levels of quality from the same latent representation, using synthesis transform networks of different complexity, along with several post-filter and arithmetic coder simplifications.
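To make the multi-branch idea more concrete, the following is a minimal, illustrative PyTorch sketch. It is not the JPEG AI VM code; the class names, layer counts, and channel widths are assumptions for illustration only. The point it shows is that several synthesis transforms of different complexity can reconstruct an image from the same decoded latent.

```python
# Conceptual sketch of a multi-branch decoder: three synthesis transforms of
# increasing complexity reconstruct an image from the SAME latent tensor.
# NOT the JPEG AI VM; layer counts and channel widths are illustrative.
import torch
import torch.nn as nn

def synthesis_branch(latent_ch: int, hidden_ch: int, num_blocks: int) -> nn.Sequential:
    """Build a toy synthesis transform; deeper/wider branches cost more MACs."""
    layers = [nn.ConvTranspose2d(latent_ch, hidden_ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU()]
    for _ in range(num_blocks):
        layers += [nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1), nn.ReLU()]
    layers += [nn.ConvTranspose2d(hidden_ch, 3, 5, stride=2, padding=2, output_padding=1)]
    return nn.Sequential(*layers)

class MultiBranchDecoder(nn.Module):
    def __init__(self, latent_ch: int = 192):
        super().__init__()
        # Low-, medium- and high-complexity branches share the entropy-decoded latent.
        self.branches = nn.ModuleList([
            synthesis_branch(latent_ch, 64, 1),   # fastest, lowest quality
            synthesis_branch(latent_ch, 128, 3),
            synthesis_branch(latent_ch, 192, 6),  # slowest, highest quality
        ])

    def forward(self, latent: torch.Tensor, branch: int = 0) -> torch.Tensor:
        # One bitstream, up to three reconstructions at different quality/complexity.
        return self.branches[branch](latent)

latent = torch.randn(1, 192, 32, 32)   # stand-in for an entropy-decoded latent
decoder = MultiBranchDecoder()
print([tuple(decoder(latent, b).shape) for b in range(3)])  # same output size from every branch
```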
JPEG Pleno Learning-based Point Cloud coding
The JPEG Pleno Learning-based Point Cloud coding activity progressed at the 101st meeting with a major investigation into point cloud quality metrics. The JPEG Committee decided to continue this investigation into point cloud quality metrics as well as explore possible advancements to the VM in the areas of parameter tuning and support for residual lossless coding. The JPEG Committee is targeting a release of the Committee Draft of Part 6 of the JPEG Pleno standard, relating to learning-based point cloud coding, at the 102nd JPEG meeting in San Francisco, USA in January 2024.
JPEG Pleno Light Field
The JPEG Committee has been creating several standards to meet the dynamic demands of the market, with its royalty-free patent licensing commitments. A light field coding standard has recently been developed, and JPEG Pleno is constantly exploring novel light field coding architectures.
The JPEG Committee is also preparing standardization activities – among others – in the domains of objective and subjective quality assessment for light fields, improved light field coding modes, and learning-based light field coding.
A Light Field Industry Workshop will take place on November 22nd, 2023, aiming to provide a forum for industrial actors to exchange information on their needs and expectations with respect to standardization activities in this domain.
JPEG AIC
During the 101st JPEG meeting, the AIC activity continued its efforts on the evaluation of the contributions received in April 2023 in response to the Call for Contributions on Subjective Image Quality Assessment. Notably, the activity is currently investigating three different subjective image quality assessment methodologies. The results of the newly established Core Experiments will be considered during the design of the AIC-3 standard, which has been carried out in a collaborative way since its beginning.
The AIC activity also initiated the discussion on Part 4 of the standard, on Objective Image Quality Metrics (AIC-4), by refining the Use Cases and Requirements document. During the 102nd JPEG meeting in January 2024, the activity plans to work on the Draft Call for Proposals on Objective Image Quality Metrics.
JPEG XE
The JPEG Committee continued its activity on Event-based Vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE aims at the creation and development of a standard to represent events in an efficient way, allowing interoperability between sensing, storage, and processing, and targeting machine vision and other relevant applications. To improve dissemination and raise external interest, a workshop on Event-based Vision was organized and took place on October 24th, 2023. The workshop attracted the attention of various stakeholders in the field of Event-based Vision, who will start contributing to JPEG XE. The workshop proceedings will be made available on jpeg.org. In addition, the JPEG Committee issued a minor revision of the Use Cases and Requirements document as v1.0, adding an extra use case on scientific and engineering measurements. Finally, a first draft of the Common Test Conditions for JPEG XE was produced, along with the first Exploration Experiments, to start practical experiments in the coming three-month period until the next JPEG meeting. The public Ad-hoc Group on Event-based Vision was re-established to continue the work towards the 102nd JPEG meeting in January 2024. To stay informed about the activities, please join the Event-based Vision Ad-hoc Group mailing list.
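As generic background on the modality itself (and not the JPEG XE representation under development), an event camera typically emits a sparse stream of (x, y, timestamp, polarity) tuples rather than full frames. The small Python sketch below, with an assumed Event dataclass and accumulate helper, only illustrates that idea by integrating events into a frame-like view.

```python
# Illustrative sketch of the event-based modality: sparse (x, y, t, polarity) events
# instead of frames. Generic background only; not the JPEG XE representation.
from dataclasses import dataclass

@dataclass
class Event:
    x: int          # pixel column
    y: int          # pixel row
    t_us: int       # timestamp in microseconds
    polarity: int   # +1 brightness increase, -1 brightness decrease

def accumulate(events: list[Event], width: int, height: int) -> list[list[int]]:
    """Integrate events over a time window into a simple 2D polarity histogram."""
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        frame[ev.y][ev.x] += ev.polarity
    return frame

# Usage: three events on a 4x3 sensor, accumulated into a frame-like view.
stream = [Event(0, 0, 10, +1), Event(1, 2, 42, -1), Event(0, 0, 55, +1)]
print(accumulate(stream, width=4, height=3))
```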
JPEG DNA
As a result of the Call for Proposals issued by the JPEG Committee for contributions to the JPEG DNA standard, five proposals, based on three distinct codecs, were submitted by three organizations. Two codecs were submitted to both the coding and transcoding categories, and one was submitted to the coding category only. All proposals showed improved compression efficiency when compared to the three anchors selected by the JPEG Committee. After a rigorous analysis of the proposals and their cross-checking by independent parties, it was decided to create a first Verification Model (VM) based on V-DNA, the best-performing proposal. In addition, a number of core experiments were designed to improve the JPEG DNA VM with elements from the other submitted proposals, by quantifying their added value when integrated into the VM.
JPEG XS
The JPEG Committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Final Draft International Standard (FDIS) for Part 1 of the standard — Core coding tools — was produced at this meeting. With this FDIS version, all technical features are now fixed and complete. Part 2 — Profiles and buffer models — and Part 3 — Transport and container formats — are still in DIS ballot, and the ballot results will only be known by the end of January 2024. The JPEG Committee is now working on Part 4 — Conformance testing — to provide the necessary test streams of the 3rd edition for potential implementers. A first Working Draft for Part 4 was issued. Completion of the JPEG XS 3rd edition is scheduled for April 2024 (Parts 1, 2, and 3), with Parts 4 and 5 following shortly after. Finally, a new Use Cases and Requirements for JPEG XS document was created, containing a new use case on using JPEG XS for the transport of 4K/8K video over 5G mobile networks. It is expected that the new use case can already be covered by the 3rd edition, meaning that no further updates to the standard would be needed; however, more investigations and experiments will be conducted on this subject.
JPEG XL
The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the FDIS stage, and the second edition of JPEG XL Part 3 (Conformance testing) has proceeded to the CD stage. These second editions provide clarifications, corrections, and editorial improvements that will facilitate independent implementations. At the same time, the development of hardware implementation solutions continues.
Final Quote
“The release of the first Committee Draft of JPEG Trust is a strong signal that the JPEG Committee is reacting with a timely response to demands for solutions that inform users when digital media assets are created or modified, in particular through Generative AI, hence contributing to bringing back trust into media-centric ecosystems.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
The 145th MPEG meeting was held online from 22-26 January 2024, and the official press release can be found here. It comprises the following highlights:
Latest Edition of the High Efficiency Image Format Standard Unveils Cutting-Edge Features for Enhanced Image Decoding and Annotation
MPEG Systems finalizes Standards supporting Interoperability Testing
MPEG finalizes the Third Edition of MPEG-D Dynamic Range Control
MPEG finalizes the Second Edition of MPEG-4 Audio Conformance
MPEG Genomic Coding extended to support Transport and File Format for Genomic Annotations
MPEG White Paper: Neural Network Coding (NNC) – Efficient Storage and Inference of Neural Networks for Multimedia Applications
This column will focus on the High Efficiency Image Format (HEIF) and interoperability testing. As usual, a brief update on MPEG-DASH et al. will be provided.
High Efficiency Image Format (HEIF)
The High Efficiency Image Format (HEIF) is a widely adopted standard in the imaging industry that continues to grow in popularity. At the 145th MPEG meeting, MPEG Systems (WG 3) ratified its third edition, which introduces exciting new features, such as progressive decoding capabilities that enhance image quality through a sequential, single-decoder instance process. With this enhancement, users can decode bitstreams in successive steps, with each phase delivering perceptible improvements in image quality compared to the preceding step. Additionally, the new edition introduces a sophisticated data structure that describes the spatial configuration of the camera and outlines the unique characteristics responsible for generating the image content. The update also includes innovative tools for annotating specific areas in diverse shapes, adding a layer of creativity and customization to image content manipulation. These annotation features cater to the diverse needs of users across various industries.
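As a rough illustration of the progressive-decoding idea described above, the sketch below shows how an application might keep a single decoder instance alive and run successive decode steps as more of the bitstream arrives. The ProgressiveDecoder class is a hypothetical stand-in for illustration only, not the HEIF third-edition API or any existing library.

```python
# Conceptual sketch of progressive decoding with a single decoder instance.
# ProgressiveDecoder is a hypothetical stand-in, not a real HEIF library API;
# it only illustrates "decode in successive steps, each improving quality".
from dataclasses import dataclass, field

@dataclass
class ProgressiveDecoder:
    """Keeps decoder state across refinement steps instead of restarting per step."""
    received: bytearray = field(default_factory=bytearray)
    steps_done: int = 0

    def feed(self, chunk: bytes) -> None:
        self.received.extend(chunk)          # accumulate more of the bitstream

    def decode_step(self) -> str:
        self.steps_done += 1
        # A real decoder would return pixels; here we just report the refinement level.
        return f"reconstruction after step {self.steps_done} ({len(self.received)} bytes so far)"

def render_progressively(chunks: list[bytes]) -> None:
    decoder = ProgressiveDecoder()           # one decoder instance for all steps
    for chunk in chunks:
        decoder.feed(chunk)
        print(decoder.decode_step())         # each step yields a perceptibly better image

# Usage: three portions of a bitstream arriving over the network.
render_progressively([b"\x00" * 1024, b"\x00" * 2048, b"\x00" * 4096])
```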
Research aspects: Progressive coding has been a part of modern image coding formats for some time now. However, the inclusion of supplementary metadata provides an opportunity to explore new use cases that can benefit both user experience (UX) and quality of experience (QoE) in academic settings.
Interoperability Testing
MPEG standards typically comprise format definitions (or specifications) to enable interoperability among products and services from different vendors. Interestingly, MPEG goes beyond these format specifications and provides reference software and conformance bitstreams, allowing conformance testing.
At the 145th MPEG meeting, MPEG Systems (WG 3) finalized two standards comprising conformance and reference software by promoting them to the Final Draft International Standard (FDIS), the final stage of standards development. The finalized standards, ISO/IEC 23090-24 and ISO/IEC 23090-25, showcase the pinnacle of conformance and reference software for scene description and visual volumetric video-based coding data, respectively.
ISO/IEC 23090-24 focuses on conformance and reference software for scene description, providing a comprehensive reference implementation and bitstream tailored for conformance testing related to ISO/IEC 23090-14, scene description. This standard opens new avenues for advancements in scene depiction technologies, setting a new standard for conformance and software reference in this domain.
Similarly, ISO/IEC 23090-25 targets conformance and reference software for the carriage of visual volumetric video-based coding data. With a dedicated reference implementation and bitstream, this standard is poised to elevate the conformance testing standards for ISO/IEC 23090-10, the carriage of visual volumetric video-based coding data. The introduction of this standard is expected to have a transformative impact on the visualization of volumetric video data.
At the same 145th MPEG meeting, MPEG Audio Coding (WG6) celebrated the completion of the second edition of ISO/IEC 14496-26, audio conformance, elevating it to the Final Draft International Standard (FDIS) stage. This significant update incorporates seven corrigenda and five amendments into the initial edition, originally published in 2010.
ISO/IEC 14496-26 serves as a pivotal standard, providing a framework for designing tests to ensure the compliance of compressed data and decoders with the requirements outlined in ISO/IEC 14496-3 (MPEG-4 Audio). The second edition reflects an evolution of the original, addressing key updates and enhancements through diligent amendments and corrigenda. This latest edition, now at the FDIS stage, marks a notable stride in MPEG Audio Coding’s commitment to refining audio conformance standards and ensuring the seamless integration of compressed data within the MPEG-4 Audio framework.
These standards will be made freely accessible for download on the official ISO website, ensuring widespread availability for industry professionals, researchers, and enthusiasts alike.
Research aspects: Reference software and conformance bitstreams often serve as the basis for further research (and development) activities and, thus, are highly appreciated. For example, reference software of video coding formats (e.g., HM for HEVC, VM for VVC) can be used as a baseline when improving coding efficiency or other aspects of the coding format.
MPEG-DASH Updates
The current status of MPEG-DASH is shown in the figure below.
The following most notable aspects have been discussed at the 145th MPEG meeting and adopted into ISO/IEC 23009-1, which will eventually become the 6th edition of the MPEG-DASH standard:
It is now possible to pass CMCD parameters sid and cid via the MPD URL.
Segment duration patterns can be signaled using SegmentTimeline (see the sketch after this list).
Definition of a background mode of operation, which allows a DASH player to receive MPD updates and listen to events without necessarily decrypting or rendering any media.
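For readers less familiar with SegmentTimeline, the following sketch illustrates how its long-standing S@t/@d/@r semantics expand into concrete segment timings; the duration-pattern signaling adopted at this meeting builds on this mechanism, and its exact new syntax is not reproduced here. The S dataclass and expand_timeline function are illustrative assumptions, not part of any DASH library.

```python
# Minimal illustrative sketch (not normative DASH text): expanding SegmentTimeline
# S entries (t = start time, d = duration, r = repeat count) into segment timings.
from dataclasses import dataclass

@dataclass
class S:
    d: int                 # segment duration in timescale units
    t: int | None = None   # explicit start time; defaults to end of previous segment
    r: int = 0             # number of additional repetitions of this duration

def expand_timeline(entries: list[S], timescale: int) -> list[tuple[float, float]]:
    """Return (start_seconds, duration_seconds) for every media segment."""
    segments: list[tuple[float, float]] = []
    current = 0
    for s in entries:
        if s.t is not None:
            current = s.t
        for _ in range(s.r + 1):
            segments.append((current / timescale, s.d / timescale))
            current += s.d
    return segments

# Usage: four 2-second segments followed by one 1.5-second segment (timescale 1000).
for start, dur in expand_timeline([S(d=2000, t=0, r=3), S(d=1500)], timescale=1000):
    print(f"segment @ {start:.1f}s, {dur:.1f}s")
```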
Additionally, the technologies under consideration (TuC) document has been updated with means to signal maximum segment rate, extend copyright license signaling, and improve haptics signaling in DASH. Finally, REAP is progressing towards FDIS but is not there yet; most details will be discussed in the upcoming AhG period.
The 146th MPEG meeting will be held in Rennes, France, from April 22-26, 2024. More information about MPEG meetings and their developments can be found here.
About our initiatives for the Multimedia community
Dear SIGMM members, colleagues, students,
The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.
We believed that these changes should be reflected in our SIGMM flagship conference, ACM Multimedia, and in the SIGMM organization and activities overall. This belief led us to organize the SIGMM Retreat in conjunction with ACM MM23 in Ottawa on October 30, 2023. The goal of the meeting was to open a discussion on key strategic issues, such as the coverage of ACM Multimedia, its quality and reputation, and how we can grow the SIGMM community. We invited the members of the SIGMM Advisory Committee, the members of the Steering Committees of the SIGMM-sponsored conferences, the ACM TOMM Editor-in-Chief, the past SIGMM Chairs, and senior personalities and emerging researchers of our community. The Retreat was well attended: twenty people attended in person and ten attended online. Alberto Del Bimbo, SIGMM Chair, chaired the Retreat with the assistance of Phoebe Chen, SIGMM Vice-Chair, and Xavier Alameda Pineda.
The discussion was vibrant, and valuable opinions and suggestions emerged. It was widely agreed that the distinctive feature of multimedia research is the combination and integration of various modalities to build end-to-end systems.
People agreed on the need to introduce significant changes in the format of our flagship conference to make it more attractive. The ideas of giving more room to Brave New Ideas sessions, having TED-like talks, and soliciting workshops on innovative topics while striving for their continuity received high consensus. There was also consensus on revitalizing the program by including new emerging topics such as Foundation Models, 3D, glasses-free interactivity, and new networking platforms. All the attendees recognized the need to balance the traditional research areas of the ACM Multimedia program.
There was general agreement on using OpenReview as the reviewing system for ACM Multimedia. It was recognized that it improves the quality and transparency of the reviewing process, enhances respectability, empowers the reviewers to conduct serious reviews, and aligns ACM Multimedia with the top-ranked conferences.
Other important topics of discussion included how to incentivize in-person attendance and discourage online participation to maximize the value of conferences, the collaboration and synchronization of SIGMM-sponsored conferences, and the need to make the transition between conference editions more seamless.
The need for a greater industry presence, which would also offer internship opportunities for students and improve the attendance of younger generations, was identified as a key issue for improvement. All the attendees recognized the importance of exploiting SIGMM Records and Social Media as a means to improve the sense of community and disseminate information.
Following our commitment to align words with actions, we decided to create Strike Teams focusing on the most strategic themes. These teams are composed of a few experienced colleagues who volunteered to define realistic strategies for the key issues, determine concrete actions, and help to implement them in the near future. Starting in January 2024, four strike teams are in operation, with members appointed for two years:
SIGMM Strike Team on Open Review to provide operational support on the implementation of Open Review, smoothly transferring the best practices and helping to provide new functions. Team members are: Xavier Alameda Pineda (Univ. Grenoble-Alpes) Coordinator, Marco Bertini (Univ. Firenze).
SIGMM Strike Team on Harmonization and Spread to integrate SIGMM Records and Social Media in the whole process of the ACM Multimedia organization, improve synchronization and harmonization between ACM Multimedia and other SIGMM Conferences, and strengthen the sense of community. Team members are: Miriam Redi (Wikimedia Foundation) Coordinator, Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ), Pablo Cesar (CWI and TU Delft).
SIGMM Strike Team on Industry Engagement to improve the presence of industry at ACM Multimedia, launching new in-cooperation initiatives and establishing stable bi-directional links. Team members are: Touradj Ebrahimi (EPFL) Coordinator, Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences)
SIGMM Strike Team on ACMMM Format to innovate the ACM Multimedia program, aligning it with technological advancements and the emergence of new research areas, and igniting fresh and efficient means of disseminating research. Team members are: Arnold Smeulders (Univ. of Amsterdam) Coordinator, Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam) and the Chairs of the next two ACMMM Conferences, Jianfei Cai (Monash Univ.) and Cathal Gurrin (Dublin City Univ.).
All the teams report to SIGMM Chair and the SIGMM Executive Committee and will work in close connection with the General Chairs and Program Chairs of the next ACM Multimedia editions.
I take this opportunity to thank again all those who participated in the SIGMM Retreat, and especially those who are committed to the Strike Teams. I sincerely hope that their work brings new ideas and vitality to our community and strengthens its visibility and reputation in the international scientific arena in the years to come.
Immersive experiences have the potential of redefining traditional forms of media engagement by intricately combining reality with imagination. Motivated by necessities, current developments and emerging technologies, this column sets out to bridge immersive experiences in both digital and physical realities. Fitting under the umbrella term of eXtended Reality (XR), the first section describes various realizations of blending digital and physical elements to design what we refer to as immersive digiphysical experiences. We further highlight industry and research initiatives related to driving the design and development of such experiences, considered to be key building-blocks of the futuristic ‘metaverse’. The second section outlines challenges related to assessing, modeling, and managing the Quality of Experience (QoE) of immersive digiphysical experiences and reflects upon ongoing work in the area. While potential use cases span a wide range of application domains, the third section elaborates on the specific case of conference organization, which has over the past few years spanned from fully physical, to fully virtual, and finally to attempts at hybrid organization. We believe this use case provides valuable insights into needs and promising approaches, to be demonstrated and experienced at the upcoming 16th edition of the International Conference on Quality of Multimedia Experience (QoMEX 2024) in Karlshamn, Sweden in June 2024.
Bridging The Digital And Physical Worlds
According to [IMeX WP, 2020], immersive media have been described as involving “multi-modal human-computer interaction where either a user is immersed inside a digital/virtual space or digital/virtual artifacts become a part of the physical world”. Spanning the so-called virtuality continuum [Milgram, 1995], immersive media experiences may involve various realizations of bridging the digital and physical worlds, such as the seamless integration of digital content with the real world (via Augmented or Mixed Reality, AR/MR), and vice versa by incorporating real objects into a virtual environment (Augmented Virtuality, AV). More recently, the term eXtended Reality (XR) (also sometimes referred to as xReality) has been used as an umbrella term for a wide range of levels of “realities”, with [Rauschnabel, 2022] proposing a distinction between AR/MR and Virtual Reality (VR) based on whether the physical environment is, at least visually, part of the user’s experience.
By seamlessly merging digital and physical elements and supporting real-time user engagement with both digital and physical components, immersive digiphysical (i.e., both digitally and physically accessible [Westerlund, 2020]) experiences have the potential of providing compelling experiences blurring the distinction between the real and virtual worlds. A key aspect is that of digital elements responding to user input or the physical environment, and the physical environment responding to interactions with digital objects. Going beyond only visual or auditory stimuli, the incorporation of additional senses, for example via haptic feedback or olfactory elements, can contribute to multisensory engagement [Gibbs, 2022].
The rapid development of XR technologies has been recognized as a key contributor to realizing a wide range of applications built on the fusion of the digital and physical worlds [NEM WP, 2022]. In its contribution to the European XR Coalition (launched by the European Commission), the New European Media Initiative (NEM), Europe’s Technology Platform of Horizon 2020 dedicated to driving the future of digital experiences, calls for needed actions from both industry and research perspectives addressing challenges related to social and human centered XR as well as XR communication aspects [NEM XR, 2022]. One such initiative is the Horizon 2020 TRANSMIXR project [TRANSMIXR], aimed at developing a distributed XR creation environment that supports remote collaboration practices, as well as an XR media experience environment for the delivery and consumption of social immersive media experiences. The NEM initiative further identifies the need for scalable solutions to obtain plausible and convincing virtual copies of physical objects and environments, as well as solutions supporting seamless and convincing interaction between the physical and the virtual world. Among key technologies and infrastructures needed to overcome outlined challenges, the following are identified [NEM XR, 2022]: high bandwidth and low-latency energy-efficient networks; remote computing for processing and rendering deployed on cloud and edge infrastructures; tools for the creation and updating of digital twins (DT) to strengthen the link between the real and virtual worlds, integrating Internet of Things (IoT) platforms; hardware in the form of advanced displays; and various content creation tools relying on interoperable formats.
Looking towards the future, immersive digiphysical experiences set the stage for visions of the metaverse [Wang, 2023], described as representing the evolution of the Internet towards a platform enabling immersive, persistent, and interconnected virtual environments blending digital and physical [Lee, 2021]. [Wang, 2022] see the metaverse as ‘created by the convergence of physically persistent virtual space and virtually enhanced physical reality’. The metaverse is further seen as a platform offering the potential to host real-time multisensory social interactions (e.g., involving sight, hearing, and touch) between people communicating with each other in real time via avatars [Hennig-Thurau, 2023]. As of 2022, the Metaverse Standards Forum is providing a venue for industry coordination, fostering the development of interoperability standards for an open and inclusive metaverse [Metaverse, 2023]. Relevant existing standards include ISO/IEC 23005 (MPEG-V) (standardization of interfaces between the real world and the virtual world, and among virtual worlds) [ISO/IEC 23005], IEEE 2888 (definition of standardized interfaces for synchronization of cyber and physical worlds) [IEEE 2888], and MPEG-I (standards to digitally represent immersive media) [ISO/IEC 23090].
Research Challenges for the QoE Community
Achieving widespread adoption of XR-based services providing digiphysical experiences across a broad range of application domains (e.g., education, industry and manufacturing, healthcare, engineering) inherently requires ensuring intuitive, comfortable, and positive user experiences. While research efforts towards meeting such requirements are well under way, a number of open challenges remain.
Quality of Experience (QoE) for immersive media has been defined as [IMeX WP, 2020] “the degree of delight or annoyance of the user of an application or service which involves an immersive media experience. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” Furthermore, a bridge between QoE and UX has been established through the concept of Quality of User Experience (QUX), combining hedonic, eudaimonic and pragmatic aspects of QoE and UX [Egger-Lampl, 2019]. In the context of immersive communication and collaboration services, significant efforts are being invested towards understanding and optimizing the end user experience [Perez, 2022].
The White Paper [IMeX WP, 2020] ties immersion to the digital media world (“The more the system blocks out stimuli from the physical world, the more the system is considered to be immersive.”). Nevertheless, immersion as such exists in physical contexts as well, e.g., when reading a captivating book. MR, XR and AV scenarios are digiphysical in their nature. These considerations pose several challenges:
Achieving intuitive and natural interactive experiences [Hennig-Thurau, 2023] when mixing realities.
Developing a common understanding of MR-, XR- and AV-related challenges in digiphysical multi-modal multi-party settings.
Advancing VR, AR, MR, XR and AV technologies to allow for truly digiphysical experiences.
Measuring and modeling QoE, UX and QUX for immersive digiphysical services, covering overall methodology, measurement instruments, modeling approaches, test environments and application domains.
Management of the networked infrastructure to support immersive digiphysical experiences with appropriate QoE, UX and QUX.
Sustainability considerations in terms of environmental footprint, accessibility, equality of opportunities in various parts of the world, and cost/benefit ratio.
Challenges 1 and 2 call for an experience-based, bottom-up approach to focus on the most important aspects. Examples include designing and evaluating different user representations [Aseeri, 2021][Viola, 2023], natural interaction techniques [Spittle, 2023], and the use of different environments by participants (AR/MR/VR) [Moslavac, 2023]. The latter has proven beneficial for challenges 3 (cf. the emergence of MR-/XR-/AV-supporting head-mounted devices such as the Microsoft HoloLens and recent pass-through versions of the Meta Quest) and 4. Finally, challenges 5 and 6 need to be carefully addressed to allow for long-term adoption and feasibility.
Challenges 1 to 4 have been addressed in standardization. For instance, ITU-T Recommendation P.1320 specifies QoE assessment procedures and metrics for the evaluation of XR telemeetings, outlining various categories of QoE influence factors and use cases [ITU-T Rec. P.1320, 2022] (adopted from the 3GPP technical report TR 26.928 on XR technology in 5G). The corresponding ITU-T Study Group 12 (Question 10) developed a taxonomy of telemeetings [ITU-T Rec. G.1092, 2023], providing a systematic classification of telemeeting systems. Ongoing joint efforts between the VQEG Immersive Media Group and ITU-T Study Group 12 are targeted towards specifying interactive test methods for subjective assessment of XR communications [ITU-T P.IXC, 2022].
The complexity of the aforementioned challenges demands a combination of fundamental work, use cases, implementations, demonstrations, and testing. One specific use case in which the need to combine digital and physical realities has become pressing in recent years is hybrid conference organization, which touches in particular on the challenge of achieving intuitive and natural interactions between remote and physically present participants. We consider this use case in detail in the following section, referring to the organization of the International Conference on Quality of Multimedia Experience (QoMEX) as an example.
Immersive Communication And Collaboration: The Case Of Conference Organization
What seemed impossible and undesirable in the past became a necessity overnight during the CoVid-19 pandemic: running conferences as fully virtual events. Many research communities succeeded in adapting ongoing conference organizations so that communities could meet, present, demonstrate, and socialize online. The conference QoMEX 2020 is one such example: its organizers introduced a set of innovative instruments for mutual interaction and enjoyment, such as virtual Mozilla Hubs spaces for poster presentations and a music session with prerecorded contributions mixed to form a joint performance to be enjoyed virtually together. A previously unseen inventiveness was observed to make the best out of the heavily travel-restricted situation. Furthermore, the technical approaches varied from off-the-shelf systems (such as Zoom or Teams) to custom-built applications. However, the majority of meetings during CoVid times, no matter their scale and nature, were run in unnatural 2D on-screen settings. The frequently reported phenomenon of videoconference (VC) fatigue can be attributed to a set of personal, organizational, technical, and environmental factors [Döring, 2022]. Indeed, talking to one’s computer with many faces staring back, limited possibilities to move freely, technostress [Brod, 1984], and organizational mishaps made many people tired of VC technology that was designed for a better purpose but could not get close enough to a natural real-life experience.
As CoVid retreated, conferences again became physical events and communities enjoyed meeting again, e.g., at QoMEX 2022. However, voices were raised asking for remote participation for various reasons, such as time or budget restrictions, environmental sustainability considerations, or simply the comfort of being able to work from home. With remote participation came the challenge of bridging between in-person and remote participants, i.e., turning conferences into hybrid events [Bajpai, 2022]. However, experiences with hybrid conferences have been mixed, for both onsite and online participants: (1) onsite participants suffer from interruptions of the session flow needed to fix problems with the online participation tool; their readiness to devote effort, time, and money to participate in a future hybrid event in person might suffer from such issues, which in turn would weaken the corresponding communities; (2) online participants suffer from similar issues, where sound irregularities (echo, excessive sound volumes, etc.) are felt to be particularly disturbing, along with feelings of not being properly included, e.g., in Q&A sessions and personal interactions. At both ends, clear signs of technostress and “us-and-them” feelings can be observed. Consequently, and despite good intentions and advice [Bajpai, 2022], any hybrid conference might miss its main purpose of bringing researchers together to present, discuss, and socialize. To avoid the above-listed issues, the post-CoVid QoMEX conferences (since 2022) have avoided hybrid operation, with few exceptions.
A conference is a typical case that reveals difficulties in bringing the physical and digital worlds together [Westerlund, 2020], at least when relying upon state-of-the-art telemeeting approaches that have not explicitly been designed for hybrid and digiphysical operations. At the recent 26th ACM Conference on Computer-Supported Cooperative Work And Social Computing in Minneapolis, USA (CSCW 2023), one of the panel sessions focused on “Realizing Values in Hybrid Environments”. Panelists and audience shared experiences about successes and failures with hybrid events. The main take-aways were as follows: (1) there is a general lack of know-how, no matter how much funds are allocated, and (2) there is a significant demand for research activities in the area.
Yet, there is hope, as increasingly many VR, MR, XR and AV-supporting devices and applications keep emerging, enabling new kinds and representations of immersive experiences. In a conference context, the latter implies the feeling of “being there”, i.e., being integrated in the conference community, no matter where the participant is located. This calls for new ways of interacting amongst others through various realities (VR/MR/XR), which need to be invented, tried and evaluated in order to offer new and meaningful experiences in telemeeting scenarios [Viola, 2023]. Indeed, CSCW 2023 hosted a specific workshop titled “Emerging Telepresence Technologies for Hybrid Meetings: an Interactive Workshop”, during which visions, experiences, and solutions were shared and could be experienced locally and remotely. About half of the participants were online, successfully interacting with participants onsite via various techniques.
With these challenges and opportunities in mind, the motto of QoMEX 2024 has been set as “Towards immersive digiphysical experiences”. While the conference is organized as an in-person event, a set of carefully selected hybrid activities will be offered to interested remote participants, such as (1) 360° stereoscopic streaming of the keynote speeches and demo sessions, and (2) the option to take part in so-called hybrid experience demos. The 360° stereoscopic streaming has so far been tested successfully in local, national, and transatlantic sessions (during the above-mentioned CSCW workshop) with various settings, and further fine-tuning will be done and tested before the conference. With respect to the demo session – and in addition to traditional onsite demos – the conference will this year particularly solicit hybrid experience demos that enable both onsite and remote participants to test the demo in an immersive environment. Facilities will also be provided for onsite participants to test demos from the perspectives of both a local and a remote user, enabling them to experience different roles. The organizers hope that the hybrid activities of QoMEX 2024 will trigger more research interest in these areas, along and beyond the classical lines of QoE research (performing quantitative subjective studies of QoE features and correlating them with QoE factors).
Concluding Remarks
As immersive experiences extend into both digital and physical worlds and realities, there is a great space to conquer for QoE-, UX-, and QUX-related research. While the recent CoVid pandemic forced many users to replace physical meetings with digital ones, and sustainability considerations have reduced many people’s and organizations’ readiness to (support) travel, the shortcomings of hybrid digiphysical meetings have failed to persuade participants of their superiority over purely online or on-site meetings. Indeed, one promising path towards a successful integration of the physical and digital worlds consists of trying out, experiencing, reflecting, and deriving important research questions for and beyond the QoE research community. The upcoming conference QoMEX 2024 will be a stop along this road, with carefully selected hybrid experiences aimed at boosting research and best practice in the QoE domain towards immersive digiphysical experiences.
References
[Aseeri, 2021] Aseeri, S., & Interrante, V. (2021). The Influence of Avatar Representation on Interpersonal Communication in Virtual Social Environments. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2608-2617.
[Bajpai, 2022] Bajpai, V., et al. (2022). Recommendations for designing hybrid conferences. ACM SIGCOMM Computer Communication Review, 52(2), 63-69.
[Brod, 1984] Brod, C. (1984). Technostress: The Human Cost of the Computer Revolution. Basic Books, New York, NY, USA.
[Döring, 2022] Döring, N., Moor, K. D., Fiedler, M., Schoenenberg, K., & Raake, A. (2022). Videoconference Fatigue: A Conceptual Analysis. International Journal of Environmental Research and Public Health, 19(4), 2061.
[Egger-Lampl, 2019] Egger-Lampl, S., Hammer, F., & Möller, S. (2019). Towards an integrated view on QoE and UX: adding the Eudaimonic Dimension, ACM SIGMultimedia Records, 10(4):5.
[Gibbs, 2022] Gibbs, J. K., Gillies, M., & Pan, X. (2022). A comparison of the effects of haptic and visual feedback on presence in virtual reality. International Journal of Human-Computer Studies, 157, 102717.
[Hennig-Thurau, 2023] Hennig-Thurau, T., Aliman, D. N., Herting, A. M., Cziehso, G. P., Linder, M., & Kübler, R. V. (2023). Social Interactions in the Metaverse: Framework, Initial Evidence, and Research Roadmap. Journal of the Academy of Marketing Science, 51(4), 889-913.
[IMeX WP, 2020] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[ISO/IEC 23005] ISO/IEC 23005 (MPEG-V) standards, Media Context and Control, https://mpeg.chiariglione.org/standards/mpeg-v, accessed January 21, 2024.
[ISO/IEC 23090] ISO/IEC 23090 (MPEG-I) standards, Coded representation of Immersive Media, https://mpeg.chiariglione.org/standards/mpeg-i, accessed January 21, 2024.
[IEEE 2888] IEEE 2888 standards, https://sagroups.ieee.org/2888/, accessed January 21, 2024.
[ITU-T Rec. G.1092, 2023] ITU-T Recommendation G.1092 – Taxonomy of telemeetings from a quality of experience perspective, Oct. 2023.
[ITU-T P.IXC, 2022] ITU-T Work Item: Interactive test methods for subjective assessment of extended reality communications, under study, 2022.
[Lee, 2021] Lee, L. H. et al. (2021). All One Needs to Know about Metaverse: A Complete Survey on Technological Singularity, Virtual Ecosystem, and Research Agenda. arXiv preprint arXiv:2110.05352.
[Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
[Moslavac, 2023] Moslavac, M., Brzica, L., Drozd, L., Kušurin, N., Vlahović, S., & Skorin-Kapov, L. (2023, July). Assessment of Varied User Representations and XR Environments in Consumer-Grade XR Telemeetings. In 2023 17th International Conference on Telecommunications (ConTEL) (pp. 1-8). IEEE.
[Rauschnabel, 2022] Rauschnabel, P. A., Felix, R., Hinsch, C., Shahab, H., & Alt, F. (2022). What is XR? Towards a Framework for Augmented and Virtual Reality. Computers in human behavior, 133, 107289.
[NEM WP, 2022] New European Media (NEM), NEM: List of topics for the Work Program 2023-2024.
[NEM XR, 2022] New European Media (NEM), NEM contribution to the XR coalition, June 2022.
[Perez, 2022] Pérez, P., Gonzalez-Sosa, E., Gutiérrez, J., & García, N. (2022). Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment. Frontiers in Signal Processing, 2, 917684.
[Spittle, 2023] Spittle, B., Frutos-Pascual, M., Creed, C., & Williams, I. (2023). A Review of Interaction Techniques for Immersive Environments. IEEE Transactions on Visualization and Computer Graphics, 29(9), Sept. 2023.
[TRANSMIXR] EU HORIZON 2020 TRANSMIXR project, Ignite the Immersive Media Sector by Enabling New Narrative Visions, https://transmixr.eu/
[Viola, 2023] Viola, I., Jansen, J., Subramanyam, S., Reimat, I., & Cesar, P. (2023). VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real-Time Communication. IEEE MultiMedia, 30(2).
[Wang, 2023] Wang, H. et al. (2023). A Survey on the Metaverse: The State-of-the-Art, Technologies, Applications, and Challenges. IEEE Internet of Things Journal, 10(16).
[Wang, 2022] Wang, Y. et al. (2022). A Survey on Metaverse: Fundamentals, Security, and Privacy. IEEE Communications Surveys & Tutorials, 25(1).
[Westerlund, 2020] Westerlund, T. & Marklund, B. (2020). Community pharmacy and primary health care in Sweden – at a crossroads. Pharm Pract (Granada), 18(2): 1927.