JPEG Column: 103rd JPEG Meeting

JPEG AI reaches Draft International Standard stage

The 103rd JPEG meeting was held online from April 8 to 12, 2024. During the 103rd JPEG meeting, the first learning-based standard, JPEG AI, reached the Draft International Standard (DIS) and was sent for balloting after a very successful development stage that led to performance improvements above 25% against its best-performing anchor, VVC. This high performance, combined with implementation in current mobile phones or the possibilities given by the latent representation to be used in image processing applications, leads to new opportunities and will certainly launch a new era of compression technology.

The following are the main highlights of the 103rd JPEG meeting:

  • JPEG AI reaches Draft International Standard;
  • JPEG Trust integrates JPEG NFT;
  • JPEG Pleno Learning-based Point Cloud coding releases a Draft International Standard;
  • JPEG Pleno Light Field works on a new compression model;
  • JPEG AIC analyses different subjective evaluation models for near visually lossless quality evaluation;
  • JPEG XE prepares a call for proposal on event-based coding;
  • JPEG DNA proceeds with the development of a standard for image compression using nucleotide sequences for supporting DNA storage;
  • JPEG XS 3rd edition;
  • JPEG XL analyses HDR coding.

The following sections summarise the main highlights of the 103rd JPEG meeting.

JPEG AI reaches Draft International Standard

At its 103rd meeting, the JPEG Committee produced the Draft International Standard (DIS) of the JPEG AI Part 1 Core Coding Engine, which is expected to be published as an International Standard in October 2024. JPEG AI offers a coding solution for standard reconstruction with significant improvements in compression efficiency over previous image coding standards at equivalent subjective quality. The JPEG AI coding design allows for hardware/software implementation-friendly encoding and decoding in terms of memory and computational complexity, efficient coding of images with text and graphics, support for 8- and 10-bit depth, region of interest coding, and progressive coding. To cover multiple encoder and decoder complexity-efficiency tradeoffs, JPEG AI supports a multi-branch coding architecture with two encoders and three decoders (6 possible compatible combinations) that have been jointly trained. Compression efficiency (BD-rate) gains of 12.5% to 27.9% over the VVC Intra coding anchor, for relevant encoder and decoder configurations, can be achieved with a wide range of complexity tradeoffs (7 to 216 kMAC/px at the decoder side).
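
For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of how such a number can be obtained from two rate-quality curves. It is a simplified variant that interpolates log-rate piecewise-linearly over quality, whereas the commonly used Bjøntegaard procedure fits a cubic polynomial; it is not the evaluation code used by the JPEG Committee. The function names and the sample points are hypothetical.

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")

def _mean_log_rate(quality, log_rate, lo, hi, steps=1000):
    """Average log-rate over the quality interval [lo, hi] (trapezoidal rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        q = min(lo + k * h, hi)  # guard against rounding past the last sample
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * _interp(q, quality, log_rate)
    return total * h / (hi - lo)

def bd_rate(anchor, test):
    """Average bitrate difference in percent of `test` relative to `anchor`.

    Both arguments are lists of (rate, quality) pairs sorted by strictly
    increasing quality. Negative values mean the test codec needs less rate.
    """
    qa = [q for _, q in anchor]
    la = [math.log(r) for r, _ in anchor]
    qt = [q for _, q in test]
    lt = [math.log(r) for r, _ in test]
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_log_rate(qt, lt, lo, hi) - _mean_log_rate(qa, la, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0

# Hypothetical sample points: (bitrate, PSNR in dB).
anchor_curve = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0), (800.0, 39.0)]
test_curve = [(0.8 * r, q) for r, q in anchor_curve]
print(round(bd_rate(anchor_curve, test_curve), 3))
```

In this toy setup the test codec spends 20% less rate at every quality level, so the result is a BD-rate of about -20%. A reported "gain" of 12.5% to 27.9% corresponds to negative BD-rate values of that magnitude.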

The work regarding JPEG AI profiles and levels (part 2), reference software (part 3) and conformance (part 4) has started and a request for sub-division has been issued in this meeting to establish a new part on the file format (part 5). At this meeting, most of the work focused on the JPEG AI high-level syntax and improvement of several normative and non-normative tools, such as hyper-decoder activations, training dataset, progressive decoding, training methodology and enhancement filters. There are now two smartphone implementations of JPEG AI available. In this meeting, a JPEG AI demo was shown running on a Huawei Mate50 Pro with a Qualcomm Snapdragon 8+ Gen1 with high resolution (4K) image decoding, tiling, full base operating point support and arbitrary image resolution decoding.

JPEG Trust

At the 103rd meeting, the JPEG Committee produced an updated version of the Use Cases and Requirements for JPEG Trust (v2.0). This document integrates the use cases and requirements of the JPEG NFT exploration with the use cases and requirements of JPEG Trust. In addition, a new document with Terms and Definitions for JPEG Trust (v1.0) was published which incorporates all terms and concepts as they are used in the context of the JPEG Trust activities. Finally, an updated version of the JPEG Trust White Paper v1.1 has been released. These documents are publicly available on the JPEG Trust/Documentation page.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Committee continued its activity on Learning-based Point Cloud Coding under the JPEG Pleno family of standards. During the 103rd JPEG meeting, comments on the Committee Draft of ISO/IEC 21794 Part 6: “Learning-based point cloud coding” were received and the activity is on track for the release of a Draft International Standard for balloting at the 104th JPEG meeting in Sapporo, Japan in July 2024. A new version of the Verification Model (Version 4.1) was released during the 103rd JPEG meeting containing an updated entropy coding module. In addition, version 2.1 of the Common Training and Test Conditions was released as a public document.

JPEG Pleno Light Field

The JPEG Pleno Light Field activity progressed at this meeting with a number of technical submissions for improvements to the JPEG Pleno Model (JPLM). The JPLM provides reference implementations for the standardized technologies within the JPEG Pleno framework. The JPEG Pleno Light Field activity has an ongoing standardization activity concerning a novel light field coding architecture that delivers a single coding mode to efficiently code all types of light fields. This novel coding mode does not need any depth information, resulting in significant improvement in compression efficiency.

The JPEG Pleno Light Field is also preparing standardization activities in the domains of objective and subjective quality assessment for light fields, aiming to address other plenoptic modalities in the future. During the meeting, important decisions were made regarding the execution of multiple collaborative subjective experiments aiming at exploring various aspects of subjective light field quality assessments. Additionally, a specialized tool for subjective quality evaluation has been developed to support these experiments. The outcomes of these experiments will guide the decisions during the subjective quality assessment standardization process. They will also be utilized in evaluating proposals for the upcoming objective quality assessment standardization activities.

JPEG AIC

During the 103rd JPEG meeting, the work on visual image quality assessment continued with a focus on JPEG AIC-3, targeting a standard for a subjective quality assessment methodology for images in the range from high to nearly visually lossless quality. The activity is currently investigating three kinds of subjective image quality assessment methodologies, notably the Boosted Triplet Comparison (BTC), the In-place Double Stimulus Quality Scale (IDSQS), and the In-place Plain Triplet Comparison (IPTC), as well as a unified framework capable of merging the results of two among them.

The JPEG Committee has also worked on the preparation of the Part 4 of the standard (JPEG AIC-4) by initiating work on the Draft Call for Proposals on Objective Image Quality Assessment. The Final Call for Proposals on Objective Image Quality Assessment is planned to be released in January 2025, while the submission of the proposals is planned for April 2025.

JPEG XE

The JPEG Committee continued its activity on JPEG XE and event-based vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE is about the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The JPEG Committee finished the Common Test Conditions v1.0 document that provides the means to perform an evaluation of candidate technologies for efficient coding of event sequences. The Common Test Conditions define a canonical raw event format, a reference dataset, a set of key performance metrics and an evaluation methodology. In addition, the JPEG Committee also finalized the Draft Call for Proposals on lossless coding for event-based data. This call will be finalized at the next JPEG meeting in July 2024. Both the Common Test Conditions v1.0 and the Draft Call for Proposals are publicly available on jpeg.org. Standardization will start with lossless coding of event sequences as this has the most imminent application urgency in industry. However, the JPEG Committee acknowledges that lossy coding of event sequences is also a valuable feature, which will be addressed at a later stage. The Ad-hoc Group on Event-based Vision was reestablished to continue the work towards the 104th JPEG meeting. To stay informed about the activities please join the event-based imaging Ad-hoc Group mailing list.

JPEG DNA

JPEG DNA is an exploration aiming at developing a standard that provides technical solutions that are capable of representing bi-level, continuous-tone grey-scale, continuous-tone colour, or multichannel digital samples in a format representing nucleotide sequences for supporting DNA storage. A Call for Proposals was published at the 99th JPEG meeting and based on performance assessment and a descriptive analysis of the solutions that had been submitted, the JPEG DNA Verification Model was created during the 102nd JPEG meeting. A number of core experiments were conducted to validate the Verification Model, and notably, the first Working Draft of JPEG DNA was produced during the 103rd JPEG meeting. Work towards the creation of the specification will start with newly defined core experiments to improve the rate-distortion performance of Verification Model and the robustness to insertion, deletion, and substitution errors. In parallel, efforts are underway to improve the noise simulator produced at the 102nd JPEG meeting to allow the assessment of the resilience to noise in the Verification Model in more realistic conditions and to explore learning-based coding solutions.

JPEG XS

The JPEG Committee is happy to announce that the core parts of JPEG XS 3rd edition are ready for publication as International Standards. The Final Draft International Standard for Part 1 of the standard – Core coding tools – is ready, and Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – are both being prepared by ISO for immediate publication. At this meeting, the JPEG Committee continued the work on Part 4 – Conformance testing, to provide the necessary test streams and test protocols to implementers of the 3rd edition. Consultation of the Committee Draft for Part 4 took place and a DIS version was issued. The development of the reference software, contained in Part 5, continued and the reference decoder is now feature-complete and fully compliant with the 3rd edition. A Committee Draft for Part 5 was issued at this meeting. Development of a fully compliant reference encoder is scheduled to be completed by July.

Finally, new experimental results were presented on how to use JPEG XS over 5G mobile networks for the wireless transmission of low-latency and high quality 4K/8K 360 degree views with mobile devices and VR headsets. More experiments will be conducted, but first results show that JPEG XS is capable of providing immersive and excellent quality of experience in VR use cases, mainly thanks to its native low-latency and low-complexity properties.

JPEG XL

The performance of JPEG XL on HDR images was investigated and the experiments will continue. Work on a hardware implementation continues, and further improvements are made to the libjxl reference software. The second editions of Parts 1 and 2 are in the final stages of the ISO process and will be published soon.

Final Quote

“The JPEG AI Draft International Standard is yet another important milestone in an age where AI is rapidly replacing previous technologies. With this achievement, the JPEG Committee has demonstrated its ability to reinvent itself and adapt to new technological paradigms, offering standardized solutions based on latest state-of-the-art technologies,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

MPEG Column: 147th MPEG Meeting in Sapporo, Japan


The 147th MPEG meeting was held in Sapporo, Japan from 15-19 July 2024, and the official press release can be found here. It comprises the following highlights:

  • ISO Base Media File Format*: The 8th edition was promoted to Final Draft International Standard, supporting seamless media presentation for DASH and CMAF.
  • Syntactic Description Language: Finalized as an independent standard for MPEG-4 syntax.
  • Low-Overhead Image File Format*: First milestone achieved for small image handling improvements.
  • Neural Network Compression*: Second edition for conformance and reference software promoted.
  • Internet of Media Things (IoMT): Progress made on reference software for distributed media tasks.

* … covered in this column and expanded with possible research aspects.

8th edition of ISO Base Media File Format

The ever-growing expansion of the ISO/IEC 14496-12 ISO base media file format (ISOBMFF) application area has continuously brought new technologies to the standards. During the last couple of years, MPEG Systems (WG 3) has received new technologies on ISOBMFF for more seamless support of ISO/IEC 23009 Dynamic Adaptive Streaming over HTTP (DASH) and ISO/IEC 23000-19 Common Media Application Format (CMAF) leading to the development of the 8th edition of ISO/IEC 14496-12.

The new edition of the standard includes new technologies to explicitly indicate the set of tracks representing various versions of the media presentation of a single media for seamless switching and continuous presentation. Such technologies will enable more efficient processing of the ISOBMFF formatted files for DASH manifest or CMAF Fragments.

Research aspects: The central research aspect of the 8th edition of ISOBMFF, which “will enable more efficient processing,” will undoubtedly be its evaluation compared to the state-of-the-art. Standards typically define a format, but how to use it is left open to implementers. Therefore, the implementation is a crucial aspect and will allow for a comparison of performance. One such implementation of ISOBMFF is GPAC, which most likely will be among the first to implement these new features.

Low-Overhead Image File Format

ISO/IEC 23008-12 image format specification defines generic structures for storing image items and sequences based on ISO/IEC 14496-12 ISO base media file format (ISOBMFF). As it allows the use of various high-performance video compression standards for a single image or a series of images, it has been adopted by the market quickly. However, it was challenging to use it for very small-sized images such as icons or emojis. While the initial design of the standard was versatile and useful for a wide range of applications, the size of headers becomes an overhead for applications with tiny images. Thus, Amendment 3 of ISO/IEC 23008-12 low-overhead image file format aims to address this use case by adding a new compact box for storing metadata instead of the ‘Meta’ box to lower the size of the overhead.

Research aspects: The issue regarding header sizes of ISOBMFF for small files or low bitrate (in the case of video streaming) was known for some time. Therefore, amendments in these directions are appreciated while further performance evaluations are needed to confirm design choices made at this initial step of standardization.

Neural Network Compression

An increasing number of artificial intelligence applications based on artificial neural networks, such as edge-based multimedia content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). For this purpose, MPEG developed a second edition of the standard for coding of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2024), adding syntax for differential coding of neural network parameters as well as new coding tools. Trained models can be compressed to at least 10-20% and, for several architectures, even below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network.

In order to facilitate the implementation of the standard, the accompanying standard ISO/IEC 15938-18 has been updated to cover the second edition of ISO/IEC 15938-17. This standard provides a reference software for encoding and decoding NNC bitstreams, as well as a set of conformance guidelines and reference bitstreams for testing of decoder implementations. The software covers the functionalities of both editions of the standard, and can be configured to test different combinations of coding tools specified by the standard.

Research aspects: The reference software for NNC, together with the reference software for audio/video codecs, is a vital tool for building complex multimedia systems and for their (baseline) evaluation with respect to compression efficiency only (not speed). This is because reference software is usually designed for functionality (i.e., compression in this case) and not performance.

The 148th MPEG meeting will be held in Kemer, Türkiye, from November 04-08, 2024. Click here for more information about MPEG meetings and their developments.

Overview of Open Dataset Sessions and Benchmarking Competitions in 2023-2024 – Part 1 (QoMEX 2023 and QoMEX 2024)

In this and the following Dataset Columns, we present a review of some of the notable events related to open datasets and benchmarking competitions in the field of multimedia in the years 2023 and 2024. This selection highlights the wide range of topics and datasets currently of interest to the community. Some of the events covered in this review include special sessions on open datasets and competitions featuring multimedia data. This year’s review follows similar efforts from the previous year (https://records.sigmm.org/records-issues/acm-sigmm-records-issue-1-2023/), highlighting the ongoing importance of open datasets and benchmarking competitions in advancing research and development in multimedia. This first column focuses on the last two editions of QoMEX, i.e., 2023 and 2024:

QoMEX 2023

Four dataset full papers were presented at the 15th International Conference on Quality of Multimedia Experience (QoMEX 2023), organized in Ghent, Belgium, June 19 – 21, 2023 (https://qomex2023.itec.aau.at/). The complete QoMEX ’23 Proceedings is available in the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/xpl/conhome/10178424/proceeding).

These datasets were presented within the Datasets session, chaired by Professor Lea Skorin-Kapov. Given the scope of the conference (i.e., Quality of Multimedia Experience), these four papers present contributions focused on the impact on user perception of adaptive 2D video streaming, holographic video codecs, omnidirectional video/audio environments and multi-screen video.

PNATS-UHD-1-Long: An Open Video Quality Dataset for Long Sequences for HTTP-based Adaptive Streaming QoE Assessment
Ramachandra Rao, R. R., Borer, S., Lindero, D., Göring, S. and Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10178493   
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/PNATS-UHD-1-Long 

A collaboration work of Technische Universität Ilmenau (Germany), Ericsson Research (Sweden) and Rohde&Schwarz (Switzerland) 

The presented dataset consists of 3 subjective databases targeting overall quality assessment of a typical HTTP-based Adaptive Streaming session consisting of degradations such as quality switching, initial loading delay, and stalling events using audiovisual content ranging between 2 and 5 minutes. In addition to this, subject bias and consistency in quality assessment of such longer-duration audiovisual contents with multiple degradations are investigated using a subject behaviour model. As part of this paper, the overall test design, subjective test results, sources, encoded audiovisual contents, and a set of analysis plots are made publicly available for further research.

Open access dataset of holographic videos for codec analysis and machine learning applications
Gilles, A., Gioia, P., Madali, N., El Rhammad, A., Morin, L.

Paper available at: https://ieeexplore.ieee.org/document/10178637 
Dataset available at: https://hologram-repository.labs.b-com.com/#/holographic-videos 

A collaboration work between IRT and INSA, Rennes, France

This is reported as the first large-scale dataset containing 18 holographic videos computed with three different resolutions and pixel pitches. By providing the color and depth images corresponding to each hologram frame, our dataset can be used in additional applications such as the validation of 3D scene geometry retrieval or deep learning-based hologram synthesis methods. Altogether, our dataset comprises 5400 pairs of RGB-D images and holograms, totaling more than 550 GB of data.

Saliency of Omnidirectional Videos with Different Audio Presentations: Analyses and Dataset
Singla, A., Robotham, T., Bhattacharya, A., Menz, W., Habets, E. and Raake, A.

Paper available at: https://ieeexplore.ieee.org/abstract/document/10178588 
Dataset available at: https://qoevave.github.io/database/docs/Saliency

A collaboration between the Technische Universität Ilmenau and the International Audio Laboratories of Erlangen, both in Germany.

This dataset uses a between-subjects test design to collect users’ exploration data of 360-degree videos in a free-form viewing scenario using the Varjo XR-3 Head Mounted Display, in the presence of no, mono, and 4th-order Ambisonics audio. Saliency information was captured as head-saliency in terms of the center of a viewport at 50 Hz. For each item, subjects were asked to describe the scene with a short free-verbalization task. Moreover, cybersickness was assessed using the simulator sickness questionnaire at the beginning and at the end of the test. The data is intended to enable training of visual and audiovisual saliency prediction models for interactive experiences.

A Subjective Dataset for Multi-Screen Video Streaming Applications
Barman, N., Reznik Y. and Martini, M. G.

Paper available at: https://ieeexplore.ieee.org/document/10178645 
Dataset available at: https://github.com/NabajeetBarman/Multiscreen-Dataset 

A collaboration between Brightcove (London, UK and Seattle, USA) and Kingston University London, UK.

This paper presents a new, open-source dataset consisting of subjective ratings for various encoded video sequences of different resolutions and bitrates (quality) when viewed on three devices of varying screen sizes: TV, Tablet, and Mobile. Along with the subjective scores, an evaluation of some of the most famous and commonly used open-source objective quality metrics is also presented. It is observed that the performance of the metrics varies a lot across different device types, with the recently standardized ITU-T P.1204.3 Model, on average, outperforming its full-reference counterparts.

QoMEX 2024

Five dataset full papers were presented at the 16th International Conference on Quality of Multimedia Experience (QoMEX 2024), organized in Karlshamn, Sweden, June 18 – 20, 2024 (https://qomex2024.itec.aau.at/). The complete QoMEX ’24 Proceedings is available in the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/xpl/conhome/10597667/proceeding).

These datasets were presented within the Datasets session, chaired by Dr. Mohsen Jenadeleh. Given the scope of the conference (i.e., Quality of Multimedia Experience), these five papers present contributions focused on the impact on user perception of HDR videos (UHD-1, 8K, and AV1), immersive 360° video and light fields. This last contribution was awarded the best paper award of the conference.

AVT-VQDB-UHD-1-HDR: An Open Video Quality Dataset for Quality Assessment of UHD-1 HDR Videos
Ramachandra Rao, R. R., Herb, B., Helmi-Aurora, T., Ahmed, M. T, Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598284 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-HDR 

A work from Technische Universität Ilmenau, Germany.

This dataset deals with the assessment of the perceived quality of HDR videos. Firstly, a subjective test with 4K/UHD1 HDR videos using the ACR-HR (Absolute Category Rating – Hidden Reference) method was conducted. The tests consisted of a total of 195 encoded videos from 5 source videos which all had a framerate of 60 fps. In this test, the 4K/UHD-1 HDR stimuli were encoded at four different resolutions, namely, 720p, 1080p, 1440p, and 2160p using bitrates ranging between 0.5 Mbps and 40 Mbps. The results of the subjective test have been analyzed to assess the impact of factors such as resolution, bitrate, video codec, and content on the perceived video quality. 

AVT-VQDB-UHD-2-HDR: An open 8K HDR source dataset for video quality research
Keller, D., Goebel, T., Sievenkees, V., Prenzel, J., Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598268 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-2-HDR 

A work from Technische Universität Ilmenau, Germany.

The AVT-VQDB-UHD-2-HDR dataset consists of 31 8K HDR video sources of 15s created with the goal of accurately representing real-life footage, while taking into account video coding and video quality testing challenges. 

The effect of viewing distance and display peak luminance – HDR AV1 video streaming quality dataset
Hammou, D., Krasula, L., Bampis, C., Li, Z., Mantiuk, R.

Paper available at: https://ieeexplore.ieee.org/document/10598289 
Dataset available at: https://doi.org/10.17863/CAM.107964 

A collaboration between University of Cambridge (UK) and Netflix Inc. (USA).

The HDR-VDC dataset captures the quality degradation of HDR content due to AV1 coding artifacts and the resolution reduction. The quality drop was measured at two viewing distances, corresponding to 60 and 120 pixels per visual degree, and two display mean luminance levels, 51 and 5.6 nits. It employs a highly sensitive pairwise comparison protocol with active sampling and comparisons across viewing distances to make the quality measurements as accurate as possible. It also provides the first publicly available dataset that measures the effect of display peak luminance and includes HDR videos encoded with AV1.

A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications (Best Paper Award)
Zerman, E., Gond, M., Takhtardeshir, S., Olsson, R., Sjöström, M.

Paper available at: https://ieeexplore.ieee.org/document/10598264 
Dataset available at: https://zenodo.org/records/13342006 

A work presented from Mid Sweden University, Sundsvall, Sweden.

The Spherical Light Field Database (SLFDB) consists of a light field of 60 views captured with an omnidirectional camera in 20 scenes. To show the usefulness of the proposed database, we provide two use cases: compression and viewpoint estimation. The initial results validate that the publicly available SLFDB will benefit the scientific community.

AVT-ECoClass-VR: An open-source audiovisual 360° video and immersive CGI multi-talker dataset to evaluate cognitive performance
Fremerey, S., Breuer, C., Leist, L., Klatte, M., Fels, J., Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598262 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-ECoClass-VR

A collaboration work between Technische Universität Ilmenau, RWTH Aachen University and RPTU Kaiserslautern (Germany).

This dataset includes two audiovisual scenarios (360° video and computer-generated imagery) and two implementations for dataset playback. The 360° video part of the dataset features 200 video and single-channel audio recordings of 20 speakers reading ten stories, and 20 videos of speakers in silence, resulting in a total of 220 video and 200 audio recordings. The dataset also includes one 360° background image of a real primary school classroom scene, targeting young school children for subsequent subjective tests. The second part of the dataset comprises 20 different 3D models of the speakers and a computer-generated classroom scene, along with an immersive audiovisual virtual environment implementation that can be interacted with using an HTC Vive controller.

From Theory to Practice: System QoE Assessment by Providers


Service and network providers actively evaluate and derive Quality of Experience (QoE) metrics within their systems, which necessitates suitable monitoring strategies. Objective QoE monitoring involves mapping Quality of Service (QoS) parameters into QoE scores, such as calculating Mean Opinion Scores (MOS) or Good-or-Better (GoB) ratios, by using appropriate mapping functions. Alternatively, individual QoE monitoring directly assesses user experience based on self-reported feedback. We discuss the strengths, weaknesses, opportunities, and threats of both approaches. Based on the collected data from individual or objective QoE monitoring, providers can calculate the QoE metrics across all users in the system, who are subjected to a range of varying QoS conditions. The aggregated QoE across all users in the system for a dedicated time frame is referred to as system QoE. Based on a comprehensive simulation study, the expected system QoE, the system GoB ratio, as well as QoE fairness across all users are computed. Our numerical results explore whether objective and individual QoE monitoring lead to similar conclusions. In our previous work [Hoss2024], we provided a theoretical framework and the mathematical derivation of the corresponding relationships between QoS and system QoE for both monitoring approaches. Here, the focus is on illustrating the key differences of individual and objective QoE monitoring and the consequences in practice.

System QoE: Assessment of QoE of Users in a System

The term “System QoE” refers to the assessment of user experience from a provider’s perspective, focusing on the perceived quality of the users of a particular service. Thereby, providers may be different stakeholders along the service delivery chain, for example, network service provider and, in particular, Internet service provider, or application service provider. QoE monitoring delivers the necessary information to evaluate the system QoE, which is the basis for appropriate actions to ensure high-quality services and high QoE, e.g., through resource and network management.

Typically, QoE monitoring and management involves evaluating how well the network and services perform by analyzing objective metrics like Quality of Service (QoS) parameters (e.g., latency, jitter, packet loss) and mapping them to QoE metrics, such as Mean Opinion Scores (MOS). However, QoE monitoring involves a series of steps that providers need to follow: 1) identify relevant QoE metrics of interest, like MOS or GoB ratio; 2) deploy a monitoring framework to collect and analyze data. We will discuss this in the following.

The scope of system QoE metrics is to quantify the QoE across all users consuming the service for a dedicated time frame, e.g., one day, one week, or one month. Thereby, the expected QoE of an arbitrary user in the system, the ratio of all users experiencing Good-or-Better (GoB) quality or Poor-or-Worse (PoW) quality, as well as the QoE fairness across all users are of interest. The users in the system may achieve different QoS on network level, e.g., different latency, jitter, throughput, since resources are shared among the users. The same is also true on application level with varying application-specific QoS parameters, for instance, video resolution, buffering time, or startup delays for video streaming. The varying QoS conditions manifest then in the system QoE. Fundamental relationships between the system QoE and QoS metrics were derived in [Hoss2020].

Expected system QoE: The expected system QoE is the average QoE rating of an arbitrary user in the system. The fundamental relationship in [Hoss2020] shows that the expected system QoE may be derived by mapping the QoS as experienced by a user to the corresponding MOS value and computing the average MOS over the varying QoS conditions. Thus, a MOS mapping function is required to map the QoS parameters to MOS values.

System GoB and System PoW: The Mean Opinion Score provides an average score but fails to account for the variability among users and the user rating diversity. Users experiencing the same QoS conditions may still rate them differently. Metrics like the percentage of users rating the experience as Good or Better or as Poor or Worse provide more granular insights. Such metrics help service providers understand not just the average quality, but how quality is distributed across the user base. The fundamental relationship in [Hoss2020] shows that the system GoB and PoW may be derived by mapping the QoS as experienced by a user to the corresponding GoB or PoW value and computing the average over the varying QoS conditions, respectively. Thus, a GoB or PoW mapping function is required.
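The averaging relationships described above can be sketched in a few lines of Python. This is a minimal illustration and not the derivation in [Hoss2020]: the functions `mos_map` and `gob_map`, their coefficients, and the sample page load times are hypothetical placeholders for a web service whose QoS parameter is the page load time in seconds.

```python
import math
from typing import Callable, Sequence


def mos_map(plt_s: float) -> float:
    """Hypothetical MOS mapping: decays from 5 towards 1 as page load time grows."""
    return 1.0 + 4.0 * math.exp(-0.5 * plt_s)


def gob_map(plt_s: float) -> float:
    """Hypothetical GoB mapping: share of users rating 'good or better' at this QoS."""
    return math.exp(-0.8 * plt_s)


def system_metric(qos_values: Sequence[float],
                  mapping: Callable[[float], float]) -> float:
    """Average a per-user mapping (MOS, GoB, or PoW) over the QoS seen by all users."""
    if not qos_values:
        raise ValueError("need at least one user")
    return sum(mapping(q) for q in qos_values) / len(qos_values)


# Page load times (seconds) experienced by five users; illustrative values only.
plts = [0.5, 1.0, 2.0, 4.0, 6.0]
expected_system_qoe = system_metric(plts, mos_map)
system_gob = system_metric(plts, gob_map)
print(f"expected system QoE: {expected_system_qoe:.2f}, system GoB: {system_gob:.2f}")
```

The same `system_metric` helper would yield the system PoW if it were given a PoW mapping function instead.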

QoE Fairness: Operators must not only ensure that users are sufficiently satisfied, but also that this is done in a fair manner. However, what is considered fair in the QoS domain does not necessarily translate to fairness in the QoE domain, which makes it necessary to apply a QoE fairness index. [Hoss2018] defines the QoE fairness index as a linear transformation of the standard deviation of MOS values to the range [0;1]. The observed standard deviation is normalized with the maximal standard deviation that is theoretically possible for MOS values in a finite range, typically between 1 (poor quality) and 5 (excellent quality). The difference between 1 (indicating perfect fairness) and the normalized standard deviation of MOS values (indicating the degree of unfairness) yields the fairness index.
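As a minimal sketch of this definition, the index can be computed as one minus the standard deviation divided by its maximum, where the maximum for values bounded in [low, high] is (high - low) / 2. The function name `qoe_fairness` is an assumption of this sketch, as is the use of the population standard deviation; [Hoss2018] should be consulted for the exact formulation.

```python
from statistics import pstdev
from typing import Sequence


def qoe_fairness(mos_values: Sequence[float],
                 low: float = 1.0, high: float = 5.0) -> float:
    """Fairness index F = 1 - sigma / sigma_max for MOS values in [low, high]."""
    # sigma_max is reached when half of the users sit at each end of the scale.
    sigma_max = (high - low) / 2.0
    return 1.0 - pstdev(mos_values) / sigma_max


print(qoe_fairness([4.2, 4.2, 4.2]))  # identical MOS for all users: perfectly fair
print(qoe_fairness([1.0, 5.0]))       # users split between both extremes: most unfair
print(qoe_fairness([3.0, 3.5, 4.0, 4.5]))
```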

The fundamental relationships allow different implementations of QoE monitoring in practice, which are visualized in Figure 1 and discussed in the following. We differentiate between individual QoE monitoring and objective QoE monitoring and provide a qualitative strengths-weaknesses-opportunities-threats (SWOT) analysis.

Figure 1. QoE monitoring approaches to assess system QoE: individual and objective QoE monitoring.

Individual QoE Monitoring

Individual QoE monitoring refers to the assessment of system QoE by collecting individual ratings, e.g., on a 5-point rating scale, from users through their personal feedback. This approach captures the unique and individual nature of user experiences, accounting for factors like personal preferences and context. It allows optimizing services in a personalized manner, which is regarded as a challenging future research objective, see [Schmitt2017, Zhu2018, Gao2020, Yamazaki2021, Skorin-Kapov2018].

The term “individual QoE” was nicely described in [Zhu2018]: “QoE, by definition, is supposed to be subjective and individual. However, we use the term ‘individual QoE’, since the majority of the literature on QoE has not treated it as such. […] The challenge is that the set of individual factors upon which an individual’s QoE depends is not fixed; rather this (sub)set varies from one context to another, and it is this what justifies even more emphatically the individuality and uniqueness of a user’s experience – hence the term ‘individual QoE’.”

Strengths: Individual QoE monitoring provides valuable insights into how users personally experience a service, capturing the variability and uniqueness of individual perceptions that objective metrics often miss. A key strength is that it gathers direct feedback from a provider’s own users, ensuring a representative sample rather than relying on external or unrepresentative populations. Additionally, it does not require a predefined QoE model, allowing for flexibility in assessing user satisfaction. This approach enables service providers to directly derive various system QoE metrics.

Weaknesses: Individual QoE monitoring is mainly feasible for application service providers and requires additional monitoring efforts beyond the typical QoS tools already in place. Privacy concerns are significant, as collecting sensitive user data can raise issues with data protection and regulatory compliance, such as with GDPR. Additionally, users may use the system primarily as a complaint tool, focusing on reporting negative experiences, which could skew results. Feedback fatigue is another challenge, where users may become less willing to provide ongoing input over time, limiting the validity and reliability of the data collected.

Opportunities: Data from individual QoE monitoring can be utilized to enhance individual user QoE through better resource and service management. From a business perspective, offering a personalized QoE can set providers apart in competitive markets and the data collected has monetization potential, supporting personalized marketing. Data from individual QoE monitoring enables deriving objective metrics like MOS or GoB, to update existing QoE models or to develop new QoE models for novel services by correlating it with QoS parameters. Those insights can drive innovation, leading to new features or services that meet evolving customer needs.

Threats: Individual QoE monitoring accounts for factors outside the provider’s control, such as environmental context (e.g., noisy surroundings [Reichl2015, Jiménez2020]), which may affect user feedback but not reflect actual service performance. Additionally, as mentioned, it may be used as a complaint tool, with users disproportionately reporting negative experiences. There is also the risk of over-engineering solutions by focusing too much on minor individual issues, potentially diverting resources from addressing more significant, system-wide challenges that could have a broader impact on overall service quality.

Objective QoE Monitoring

Objective QoE monitoring involves assessing user experience by translating measurable QoS parameters on network level, such as latency, jitter, and packet loss, and on application level, such as video resolution or stalling duration for video streaming, into QoE metrics using predefined models and mapping functions. Unlike individual QoE monitoring, it does not require direct user feedback and instead relies on technically measurable parameters to estimate user satisfaction and various QoE metrics [Hoss2016]. Thereby, the fundamental relationships between system QoE and QoS [Hoss2020] are utilized. For computing the expected system QoE, a MOS mapping function is required, which maps a dedicated QoS value to a MOS value. For computing the system GoB, a GoB mapping function between QoS and GoB is required. Note that the QoS may be a vector of various QoS parameters, which are the input values for the mapping function.

Recent work [Hoss2022] indicates that industrial user experience index values, as obtained by the Threshold-Based Quality (TBQ) model for QoE monitoring, may be accurate enough to derive system QoE metrics. The TBQ model is a framework that defines application-specific thresholds for QoS parameters to assess and classify the user experience. These thresholds may be derived with simple and interpretable machine learning models like decision trees.
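To illustrate the general idea of threshold-based classification, the following sketch assigns each user to an experience class based on a single QoS parameter. It is not the TBQ model from [Hoss2022]: the thresholds (2 s and 4 s), the class labels, and the function names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Sequence

# Hypothetical page-load-time thresholds in seconds:
# up to 2 s is "good", up to 4 s is "fair", anything slower is "poor".
PLT_THRESHOLDS_S = [2.0, 4.0]
CLASSES = ["good", "fair", "poor"]


def classify_plt(plt_s: float) -> str:
    """Map a page load time to an experience class using fixed thresholds."""
    return CLASSES[bisect_left(PLT_THRESHOLDS_S, plt_s)]


def class_shares(plts: Sequence[float]) -> Dict[str, float]:
    """Share of users falling into each experience class."""
    counts = Counter(classify_plt(p) for p in plts)
    return {c: counts[c] / len(plts) for c in CLASSES}


print(class_shares([1.0, 1.5, 3.0, 5.0]))
```

In practice the thresholds would be application-specific and could span several QoS parameters at once.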

Strengths: Objective QoE monitoring relies solely on QoS monitoring, making it applicable for network providers, even for encrypted data streams, as long as appropriate QoE models are available, see for example [Juluri2015, Orsolic2020, Casas2022]. It can be easily integrated into existing QoS monitoring tools already deployed, reducing the need for additional resources or infrastructure. Moreover, it offers an objective assessment of user experience, ensuring that the same QoS conditions for different users are consistently mapped to the same QoE scores, as required for QoE fairness.

Weaknesses: Objective QoE monitoring requires specific QoE models and mapping functions for each desired QoE metric, which can be complex and resource-intensive to develop. Additionally, it has limited visibility into the full user experience, as it primarily relies on network-level metrics like bandwidth, latency, and jitter, which may not capture all factors influencing user satisfaction. Its effectiveness is also dependent on the accuracy of the monitored QoS metrics; inaccurate or incomplete data, such as from encrypted packets, can lead to misguided decisions and misrepresentation of the actual user experience.

Opportunities: Objective QoE monitoring enables user-centric resource and network management for application and network service providers by tracking QoS metrics, allowing for dynamic adjustments to optimize resource utilization and improve service delivery. The integration of AI and automation with QoS monitoring can increase the efficiency and accuracy of network management from a user-centric perspective. The objective QoE monitoring data can also enhance Service Level Agreements (SLAs) towards Experience Level Agreements (ELAs) as discussed in [Varela2015].

Threats: One risk of objective QoE monitoring is the potential for incorrect traffic flow characterization, where data flows may be misattributed to the wrong applications, leading to inaccurate QoE assessments. Additionally, rapid technological changes can quickly make existing QoS monitoring tools and QoE models outdated, necessitating constant upgrades and investment to keep pace with new technologies. These challenges can undermine the accuracy and effectiveness of objective QoE monitoring, potentially leading to misinformed decisions and increased operational costs.

Numerical Results: Visualizing the Differences

In this section, we explore and visualize the obtained system QoE metrics, which are based on collected data either through i) individual QoE monitoring or ii) objective QoE monitoring. The question arises if the two monitoring approaches lead to the same results and conclusions for the provider. The obvious approach for computing the system QoE metrics is to use i) the individual ratings collected directly from the users and ii) the MOS scores obtained through mapping the objectively collected QoS parameters. While the discrepancies are derived mathematically in [Hoss2024], this article presents a visual representation of the differences between individual and objective QoE monitoring through a comprehensive simulation study. This simulation approach allows us to quantify the expected system QoE, the system GoB ratio, and the QoE fairness for a multitude of potential system configurations, which we manipulate in the simulation with varying QoS distributions. Furthermore, we demonstrate methods for utilizing data obtained through either individual QoE monitoring or objective QoE monitoring to accurately calculate the system QoE metrics as intended for a provider.

For the numerical results, the web QoE use case in [Hoss2024] is employed. We conduct a comprehensive simulation study, in which the QoS settings are varied. To be more precise, the page load times (PLTs) are varied, such that the users in the system experience a range of different loading times. For each simulation run, the average PLT and the standard deviation of the PLT across all users in the system are fixed. Each user is then assigned a random PLT, sampled from a beta distribution in the range between 0s and 8s that is parameterized with the specified average and standard deviation.
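One way to realize this sampling step is sketched below. The article does not state how the beta distribution is parameterized, so the method-of-moments fit used here, as well as the function names `beta_params` and `sample_plts`, are assumptions of this sketch.

```python
import random
from statistics import mean, pstdev
from typing import List, Tuple

PLT_MAX_S = 8.0  # upper end of the PLT range (0 s to 8 s)


def beta_params(mean_s: float, std_s: float,
                upper: float = PLT_MAX_S) -> Tuple[float, float]:
    """Method-of-moments shape parameters for a beta distribution scaled to [0, upper]."""
    m = mean_s / upper            # mean on the unit interval
    v = (std_s / upper) ** 2      # variance on the unit interval
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("mean/std combination is not reachable by a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def sample_plts(n_users: int, mean_s: float, std_s: float,
                rng: random.Random) -> List[float]:
    """Draw one PLT per user from the scaled beta distribution."""
    a, b = beta_params(mean_s, std_s)
    return [PLT_MAX_S * rng.betavariate(a, b) for _ in range(n_users)]


rng = random.Random(1)
plts = sample_plts(20000, mean_s=3.0, std_s=1.0, rng=rng)
print(f"mean PLT: {mean(plts):.2f} s, std: {pstdev(plts):.2f} s")
```

Not every combination of average and standard deviation is feasible on a bounded range, which is why `beta_params` rejects variances that are too large for the requested mean.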

For a concrete PLT, the corresponding user rating distribution is available and follows in our case a shifted binomial distribution, where the mean of the binomial distribution reflects the MOS value for that condition. To state this clearly, this binomial distribution is a conditional random variable with discrete values on a 5-point scale: the user ratings are conditioned on the actual QoS value. For the individual QoE monitoring, the user ratings are sampled from that conditional random variable, while the QoS values are sampled from the beta distribution. For objective QoE monitoring, only the QoS values are used, but in addition, the MOS mapping function provided in [Hoss2024] is used. Thus, each QoS value is mapped to a continuous MOS value within the range of 1 to 5.
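The conditional rating model can be sketched as follows, assuming that "shifted binomial" means a rating of 1 plus a binomial variable with four trials, so that the ratings fall on the scale 1 to 5. The success probability is chosen such that the mean rating equals the MOS. The function name `sample_rating` is hypothetical, and the MOS value of 3.4 is an arbitrary example rather than a value from [Hoss2024].

```python
import random


def sample_rating(mos: float, rng: random.Random) -> int:
    """Draw one rating on a 5-point scale from a shifted binomial with mean `mos`."""
    p = (mos - 1.0) / 4.0  # success probability so that 1 + 4p equals the MOS
    return 1 + sum(rng.random() < p for _ in range(4))


rng = random.Random(42)
ratings = [sample_rating(3.4, rng) for _ in range(20000)]
print(f"mean rating: {sum(ratings) / len(ratings):.2f}")  # close to the MOS of 3.4
```

Individual QoE monitoring would observe such integer ratings, whereas objective QoE monitoring only sees the continuous MOS value produced by the mapping function.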

Figure 2 shows the expected system QoE using individual QoE monitoring as well as objective QoE monitoring depending on the average QoS as well as the standard deviation of the QoS, which is indicated by the color. Each point in the figure represents a single simulation run with a fixed average QoS and fixed standard deviation. It can be seen that both QoE monitoring approaches lead to the same results, which was also formally proven in [Hoss2024]. Note that higher QoS variances also result in a higher expected system QoE, since for the same average QoS, there may be some users with larger QoS values, but also some users with lower QoS values. Due to the non-linear mapping between QoS and QoE, this results in higher QoE scores.

Figure 3 shows the system GoB ratio, which can be simply computed with individual QoE monitoring. However, in the case of objective QoE monitoring, we assume that only a MOS mapping function is available. It is tempting to derive the GoB ratio by deriving the ratio of MOS values which are good or better. However, this leads to wrong results, see [Hoss2020]. Nevertheless, the GoB mapping function can be approximated from an existing MOS mapping function, see [Hoss2022, Hoss2017, Perez2023]. Then, the same conclusions are derived through objective QoE monitoring as for individual QoE monitoring.
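The pitfall can be made concrete with a small sketch. It contrasts the naive share of MOS values at or above 4 with a GoB value derived from an assumed rating distribution. The shifted binomial used here follows the simulation setup described earlier and is only a stand-in; the approximations in [Hoss2022, Hoss2017, Perez2023] may differ. All function names are hypothetical.

```python
from typing import Sequence


def gob_from_mos(mos: float) -> float:
    """P(rating >= 4), assuming ratings follow a shifted binomial with mean `mos`."""
    p = (mos - 1.0) / 4.0
    return 4.0 * p**3 * (1.0 - p) + p**4  # P(Bin(4, p) >= 3)


def naive_gob(mos_values: Sequence[float]) -> float:
    """Misleading shortcut: share of users whose MOS is 4 or higher."""
    return sum(m >= 4.0 for m in mos_values) / len(mos_values)


def approximated_gob(mos_values: Sequence[float]) -> float:
    """Average the per-user GoB values obtained from the MOS mapping."""
    return sum(gob_from_mos(m) for m in mos_values) / len(mos_values)


# All users experience a QoS that maps to a MOS just below 4.
mos_values = [3.9] * 100
print(f"naive GoB:        {naive_gob(mos_values):.2f}")
print(f"approximated GoB: {approximated_gob(mos_values):.2f}")
```

In this example the naive shortcut reports that no user is satisfied, although under the assumed rating distribution a clear majority of users would rate the service as good or better.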

Figure 4 now considers QoE fairness for both monitoring approaches. It is tempting to use the user rating values from individual QoE monitoring and apply the QoE fairness index. However, in that case, the fairness index reflects the variance of the system QoS and, in addition, the variance due to user rating diversity, as shown in [Hoss2024]. This is not the intended application of the QoE fairness index, which aims to evaluate fairness objectively from a user-centric perspective, such that resource management can be adjusted to provide users with high and fairly distributed quality. Therefore, the QoE fairness index uses MOS values, such that users with the same QoS are assigned the same MOS value. In a system with deterministic QoS conditions, i.e., when the standard deviation of the QoS vanishes, the QoE fairness index is 100%, see the results for objective QoE monitoring. Nevertheless, individual QoE monitoring also allows computing the MOS values for similar QoS values and then applying the QoE fairness index. Then, results comparable to those of objective QoE monitoring are obtained.
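A minimal sketch of this adjustment is given below, under stated assumptions: ratings are grouped into QoS bins of 0.5 s (a hypothetical bin width), each rating is replaced by the mean rating of its bin, and the fairness index is then applied to these per-bin MOS values. The rating model and its success probability of 0.6 are illustrative only.

```python
import random
from collections import defaultdict
from statistics import pstdev
from typing import List, Sequence


def fairness(values: Sequence[float], low: float = 1.0, high: float = 5.0) -> float:
    """QoE fairness index: 1 minus the normalized standard deviation."""
    return 1.0 - pstdev(values) / ((high - low) / 2.0)


def per_qos_mos(qos_values: Sequence[float], ratings: Sequence[int],
                bin_width_s: float = 0.5) -> List[float]:
    """Replace each rating by the mean rating of all users in the same QoS bin."""
    bins = defaultdict(list)
    for q, r in zip(qos_values, ratings):
        bins[int(q // bin_width_s)].append(r)
    bin_mos = {k: sum(rs) / len(rs) for k, rs in bins.items()}
    return [bin_mos[int(q // bin_width_s)] for q in qos_values]


rng = random.Random(7)
qos = [2.0] * 1000  # deterministic QoS: every user sees the same page load time
ratings = [1 + sum(rng.random() < 0.6 for _ in range(4)) for _ in qos]

raw_fairness = fairness(ratings)                    # includes user rating diversity
mos_fairness = fairness(per_qos_mos(qos, ratings))  # rating diversity averaged out
print(f"fairness on raw ratings: {raw_fairness:.2f}")
print(f"fairness on per-QoS MOS: {mos_fairness:.2f}")
```

With identical QoS for all users, the raw ratings still yield a fairness index well below 1, while the per-QoS MOS values yield perfect fairness, which mirrors the effect described for Figure 4.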

Figure 2. Expected system QoE when using individual and objective QoE monitoring. Both approaches lead to the same expected system QoE.
Figure 3. System GoB ratio: Deriving the ratio of MOS values which are good or better does not work for objective QoE monitoring. But an adjusted GoB computation, by approximating GoB through MOS, leads to the same conclusions as individual QoE monitoring, which simply measures the system GoB.
Figure 4. QoE Fairness: Using the user rating values obtained through individual QoE monitoring additionally includes the user rating diversity, which is not desired in network or resource management. However, individual QoE monitoring also allows computing the MOS values for similar QoS values and then applying the QoE fairness index, which leads to insights comparable to those of objective QoE monitoring.

Conclusions

Individual QoE monitoring and objective QoE monitoring are fundamentally distinct approaches for assessing system QoE from a provider’s perspective. Individual QoE monitoring relies on direct user feedback to capture personalized experiences, while objective QoE monitoring uses QoS metrics and QoE models to estimate QoE metrics. Both methods have strengths and weaknesses, offering opportunities for service optimization and innovation while facing challenges such as over-engineering and the risk of models becoming outdated due to technological advancements, as summarized in our SWOT analysis. However, as the numerical results have shown, both approaches can be used with appropriate modifications and adjustments to derive various system QoE metrics like expected system QoE, system GoB and PoW ratio, as well as QoE fairness. A promising direction for future research is the development of hybrid approaches that combine both methods, allowing providers to benefit from objective monitoring while integrating the personalization of individual feedback. This could also be interesting to integrate in existing approaches like the QoS/QoE Monitoring Engine proposal [Siokis2023] or for upcoming 6G networks, which may allow the radio access network (RAN) to autonomously adjust QoS metrics in collaboration with the application to enhance the overall QoE [Bertenyi2024].

References

[Bertenyi2024] Bertenyi, B., Kunzmann, G., Nielsen, S., Pedersen, K., & Andres, P. (2024). Transforming the 6G vision to action. Nokia Whitepaper, 28 June 2024. URL: https://www.bell-labs.com/institute/white-papers/transforming-the-6g-vision-to-action/.

[Casas2022] Casas, P., Seufert, M., Wassermann, S., Gardlo, B., Wehner, N., & Schatz, R. (2022). DeepCrypt: Deep learning for QoE monitoring and fingerprinting of user actions in adaptive video streaming. In 2022 IEEE 8th International Conference on Network Softwarization (NetSoft) (pp. TBD). IEEE.

[Gao2020] Gao, Y., Wei, X., & Zhou, L. (2020). Personalized QoE improvement for networking video service. IEEE Journal on Selected Areas in Communications, 38(10), 2311-2323.

[Hoss2016] Hoßfeld, T., Schatz, R., Egger, S., & Fiedler, M. (2016). QoE beyond the MOS: An in-depth look at QoE via better metrics and their relation to MOS. Quality and User Experience, 1, 1-23.

[Hoss2017] Hoßfeld, T., Fiedler, M., & Gustafsson, J. (2017, May). Betas: Deriving quantiles from MOS-QoS relations of IQX models for QoE management. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) (pp. 1011-1016). IEEE.

[Hoss2018] Hoßfeld, T., Skorin-Kapov, L., Heegaard, P. E., & Varela, M. (2018). A new QoE fairness index for QoE management. Quality and User Experience, 3, 1-23.

[Hoss2020] Hoßfeld, T., Heegaard, P. E., Skorin-Kapov, L., & Varela, M. (2020). Deriving QoE in systems: from fundamental relationships to a QoE-based Service-level Quality Index. Quality and User Experience, 5(1), 7.

[Hoss2022] Hoßfeld, T., Schatz, R., Egger, S., & Fiedler, M. (2022). Industrial user experience index vs. quality of experience models. IEEE Communications Magazine, 61(1), 98-104.

[Hoss2024] Hoßfeld, T., & Pérez, P. (2024). A theoretical framework for provider’s QoE assessment using individual and objective QoE monitoring. In 2024 16th International Conference on Quality of Multimedia Experience (QoMEX) (pp. TBD). IEEE.

[Jiménez2020] Jiménez, R. Z., Naderi, B., & Möller, S. (2020, May). Effect of environmental noise in speech quality assessment studies using crowdsourcing. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX) (pp. 1-6). IEEE.

[Juluri2015] Juluri, P., Tamarapalli, V., & Medhi, D. (2015). Measurement of quality of experience of video-on-demand services: A survey. IEEE Communications Surveys & Tutorials, 18(1), 401-418.

[Orsolic2020] Orsolic, I., & Skorin-Kapov, L. (2020). A framework for in-network QoE monitoring of encrypted video streaming. IEEE Access, 8, 74691-74706.

[Perez2023] Pérez, P. (2023). The Transmission Rating Scale and its Relation to Subjective Scores. In 2023 15th International Conference on Quality of Multimedia Experience (QoMEX) (pp. 31-36). IEEE.

[Reichl2015] Reichl, P., et al. (2015, May). Towards a comprehensive framework for QoE and user behavior modelling. In 2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX) (pp. 1-6). IEEE.

[Schmitt2017] Schmitt, M., Redi, J., Bulterman, D., & César, P. (2017). Towards individual QoE for multiparty videoconferencing. IEEE Transactions on Multimedia, 20(7), 1781-1795.

[Siokis2023] Siokis, A., Ramantas, K., Margetis, G., Stamou, S., McCloskey, R., Tolan, M., & Verikoukis, C. V. (2023). 5GMediaHUB QoS/QoE monitoring engine. In 2023 IEEE 28th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) (pp. TBD). IEEE.

[Skorin-Kapov2018] Skorin-Kapov, L., Varela, M., Hoßfeld, T., & Chen, K. T. (2018). A survey of emerging concepts and challenges for QoE management of multimedia services. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(2s), 1-29.

[Varela2015] Varela, M., Zwickl, P., Reichl, P., Xie, M., & Schulzrinne, H. (2015, June). From service level agreements (SLA) to experience level agreements (ELA): The challenges of selling QoE to the user. In 2015 IEEE International Conference on Communication Workshop (ICCW) (pp. 1741-1746). IEEE.

[Yamazaki2021] Yamazaki, T. (2021). Quality of experience (QoE) studies: Present state and future prospect. IEICE Transactions on Communications, 104(7), 716-724.

[Zhu2018] Zhu, Y., Guntuku, S. C., Lin, W., Ghinea, G., & Redi, J. A. (2018). Measuring individual video QoE: A survey, and proposal for future directions using social media. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(2s), 1-24.

The 2nd Edition of the Spring School on Social XR organized by CWI

ACM SIGMM co-sponsored the second edition of the Spring School on Social XR, organized by the Distributed and Interactive Systems group (DIS) at CWI in Amsterdam. The event took place on March 4th – 8th 2024 and attracted 30 students from different disciplines (technology, social sciences, and humanities). The program included 22 lectures, 6 of them open, by 23 instructors. The event was organized by Irene Viola, Silvia Rossi, Thomas Röggla, and Pablo Cesar from CWI; and Omar Niamut from TNO. The event was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made student grants available and supported international speakers from under-represented countries, and by The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en).

Students and organisers of the Spring School on Social XR (March 4th – 8th 2024, Amsterdam)

“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organization committee continued for a second edition with a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.

A poster session at the CWI DIS Spring School 2024.

The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques of volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.

The spring school is part of the semester program organized by the DIS group of CWI. It was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture, “Human-Centered Multimedia: Making Remote Togetherness Possible” of Prof. Pablo Cesar. Then, it was continued in 2023 with the 1st Spring School on Social XR.

The list of talks was as follows:

  • “Volumetric Content Creation for Immersive XR Experiences” by Aljosa Smolic
  • “Social Signal Processing as a Method for Modelling Behaviour in SocialXR” by Julie Williamson
  • “Towards a Virtual Reality” by Elmar Eisemann
  • “Meeting Yourself and Others in Virtual Reality” by Mel Slater
  • “Social Presence in VR – A Media Psychology Perspective” by Tilo Hartmann
  • “Ubiquitous Mixed Reality: Designing Mixed Reality Technology to Fit into the Fabric of our Daily Lives” by Jan Gugenheimer
  • “Building Military Family Cohesion through Social XR: A 8-Week Field Study” by Sun Joo (Grace) Ahn
  • “Navigating the Ethical Landscape of XR: Building a Necessary Framework” by Eleni Mangina
  • “360° Multi-Sensory Experience Authoring” by Debora Christina Muchaluat Saade
  • “QoE Assessment of XR” by Patrick le Callet
  • “Bringing Soul to Digital” by Natasja Paulssen
  • “Evaluating QoE for Social XR – Audio, Visual, Audiovisual and Communication Aspects” by Alexander Raake
  • “Immersive Technologies Through the Lens of Public Values” by Mariëtte van Huijstee
  • “Designing Innovative Future XR Meeting Spaces” by Katherine Isbister
  • “Evaluation Methods for Social XR Experiences” by Mark Billinghurst
  • “Recent Advances in 3D Videocommunication” by Oliver Schreer
  • “Virtual Humans in Social XR” by Zerrin Yumak
  • “The Power of Graphs Learning in Immersive Communications” by Laura Toni
  • “Boundless Creativity: Bridging Sectors for Social Impact” by Benjamin de Wit
  • “Social XR in 5G and Beyond: Use Cases, Requirements, and Standardization Activities” by Lea Skorin-Kapov
  • “An Overview on Standardization for Social XR” by Pablo Perez and Jesús Gutiérrez
  • “Funding: The Path to Research Independence” by Sergio Cabrero

SIGMM Strike Teams Activity Report (April, 2024)

On April 10th, 2024, during the SIGMM Advisory Board meeting, the Strike Team Leaders, Touradj Ebrahimi, Arnold Smeulders, Miriam Redi and Xavier Alameda Pineda (represented by Marco Bertini) reported the results of their activity. They are summarized in the following in the form of recommendations, which are intended as guidelines and behavioral advice for our ongoing and future activity. SIGMM members in charge of SIGMM activities, SIGMM Conference leaders and particularly the organizers of the next ACMMM editions are invited to adhere to the recommendations that concern them, implement the items marked as mandatory, and report to the SIGMM Advisory Board after the event.

All the SIGMM Strike Teams will remain in charge for two years starting January 1st, 2024 for reviews and updates.

The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.

SIGMM Strike Team on Industry Engagement

Team members: Touradj Ebrahimi (EPFL), Ali Begen (Ozyegin Univ), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences)
Coordinator: Touradj Ebrahimi

The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed improving the presence of industry at ACMMM and other SIGMM Conferences/Workshops, launching new in-cooperation initiatives, and establishing stable bidirectional links.

  1. Organization of industry-focused events
    • Suggested / Mandatory for ACMMM Organizers and SIGMM AB: Create industry-focused promotional materials like pamphlets/brochures for industry participation (sponsorship, exhibit, etc.) in the style of ICASSP 2024 and ICIP 2024
    • Suggested for ACMMM Organizers: invite Keynote Speakers from industry, possibly with financial support of SIGMM. Keynote talks should be similar to plenary talks but around specific application challenges.
    • Suggested for ACMMM Organizers: organize Special Sessions and Workshops around specific applications of interest to companies and startups. Sessions should be coordinated by industry, possibly with support from an experienced and confirmed scholar.
    • Suggested for ACMMM Organizers: organize Hands-on Sessions led by industry to receive feedback on future products and services.
    • Suggested for ACMMM Organizers: organize Panel Sessions led by industry and standardization committees on timely topics relevant to industry, e.g., how companies cope with AI.
    • Suggested for ACMMM Organizers: organize Tutorial sessions given by qualified people from industry and standardization committees at SIGMM-sponsored conferences/workshops
    • Suggested for ACMMM Organizers: promote contributions mainly from the industry in the form of Industry Sessions to present companies and their products and services.
    • Suggested for ACMMM Organizers and SIGMM AB: promote Joint SIGMM / Standardization workshop on latest standards e.g. JPEG meets SIGMM, MPEG meets SIGMM, AOM meets SIGMM.
    • Suggested for ACMMM Organizers: organize Job Fairs like job interview speed dating during ACMMM
  2. Initiatives for linkage
    • Mandatory for SIGMM Organizers and SIGMM AB: Create and maintain a mailing list of industrial targets, taking care of GDPR (Include a question in the registration form of SIGMM-sponsored conferences)
    • Suggested for SIGMM AB: organize monthly talks by industry leaders from large established companies, SMEs, or startups, sharing the technical/scientific challenges they face and their solutions
  3. Initiatives around reproducible results and benchmarking
    • Suggested for ACMMM Organizers and SIGMM AB: support the release of databases and studies on performance assessment procedures and metrics, possibly focused on specific applications.
    • Suggested for ACMMM Organizers: organize Grand Challenges initiated and sponsored by industry.

Strike Team on ACMMM Format

Team Members: Arnold Smeulders (Univ. of Amsterdam), Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Ralf Steinmetz (Univ. Darmstadt), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam), Jianfei Cai (Monash Univ.), Cathal Gurrin (Dublin City Univ.).
Coordinator: Arnold Smeulders

The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to Conference identity, Conference budget and Conference memory.

1. Intended audience. It is generally felt that ACMMM is under pressure from neighboring conferences growing very big. There is consensus that growing big should not be the purpose of ACMMM: a 750 – 1500 size was thought to be ideal including being attractive to industry. Growth should come naturally.

  • Suggested for ACMMM Organizers and SIGMM AB: Promote distant travel by lowering fees for those who travel far
  • Suggested for ACMMM Organizers: Include (a personalized) visa invitation in the call for papers.

2. Community feel, differentiation and interdisciplinarity. Identity is not an actionable concern, but one of the shared common goods is T-shaped individuals interested in neighboring disciplines making an interdisciplinary or multidisciplinary connection. It is desirable to differentiate submitted papers from major close conferences like CVPR. This point is already implemented in the call for papers of ACMMM 2024.

  • Mandatory for ACMMM Organizers: Ask in the submission how the paper fits in the multimedia community and its scientific tradition as illustrated by citations. Consider this information in the explicit review criteria.
  • Recommended for ACMMM Organizers: Support the physical presence of participants by rebalancing fees.
  • Suggested for ACMMM Organizers and SIGMM AB: Organize a session around the SIGMM test of time award, make selection early, funded by SIGMM.
  • Suggested for ACMMM Organizers: Organize moderated discussion sessions for papers on the same theme.

3. Brave New Ideas. The Brave New Ideas track fits the intended audience very well. It is essential that we are able to draw out brave and new ideas from our community for long-term growth and vibrancy. The emphasis in reviewing Brave New Ideas should be on novelty, even if the work is not perfect. Rotate over a pool of people to prevent lock-in.

  • Suggested / Mandatory for ACMMM Organizers: Include in the submission a 3-minute pitch video to archive in the ACM digital library.
  • Suggested / Mandatory for ACMMM Organizers: Select reviewers from a pool of senior people to review novelty.
  • Suggested for ACMMM Organizers: Start with one session of 4 papers, if successful, add another session later.

4. Application. There should be no support for one specific application area exclusively in the main conference. Instead, application areas should be the focus of special sessions or workshops.

  • Suggested for ACMMM Organizers: Focus on application-related workshops or special sessions with own reviewing.

5. Presentation. Since the core business of ACMMM is inter- and multi-disciplinarity, it is natural to make presentation for a broader audience part of the selection. ACM should make the short videos accessible as a service to the scientific community and the general public. TED-like videos for a paper fit naturally with ACMMM and with the trend of communicating papers on YouTube. If reviewing the videos is too much work, SIGMM AB should support it financially.

  • Mandatory for ACMMM Organizers: Include a TED-like 3-minute pitch video as part of the submission, to be archived by the ACM Digital Library as part of the conference proceedings. The video is to be submitted for review a week after the paper deadline, so there is time to prepare it after the regular paper submission.

6. Promote open access. For a data-driven and fair comparison, promote open access to data so that it can be used for comparison at the next conference.

  • Suggested for SIGMM AB: Open access for data encouraged.

7. Keynotes. For the intended audience and for interdisciplinarity, it is felt essential to have keynotes on the key topics of the moment. Keynotes should not focus on one topic but should maintain the diversity of topics in the conference and over the years, to be sure new ideas are brought into the community.

  • Suggested for SIGMM AB: directly fund a big-name, expensive, marquee keynote speaker, sponsored by SIGMM, for a keynote on one of the societally urgent topics evident from the news.

8. Diversity over subdisciplines, etc. Make an extra effort for Arts, GenAI use models, security, HCI and demos. We need to ensure that if the submitted papers are of sufficiently high quality, there is at least one session on that sub-topic in the conference. We need to ensure that the conference is not overwhelmed by a popular topic with easy review criteria and generally much higher review scores.

  • Suggested for ACMMM Organizers: Promote diversity of all relevant topics in the call for papers and by action in subcommunities by an ambassador. SIGMM will supervise the diversity.

9. Living report. To enhance the institutional memory, maintain a living document passed on from organizer to organizer, with suggestions. The owner of the document is the commissioner for conferences of SIGMM.

  • Mandatory for ACMMM Organizers and SIGMM AB: A short report to the SIGMM commissioner for conferences from the ACMMM chair, including a few recommendations for the next time; handed over to the next conference after the end of the current conference.

SIGMM Strike Team on Harmonization and Spread

Team members: Miriam Redi (Wikimedia Foundation), Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ.), Pablo Cesar (CWI and TU Delft).
Coordinator: Miriam Redi

The team provided recommendations for both ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed distinct items related to giving SIGMM Records and Social Media a more central role in SIGMM, and to integrating SIGMM Records and Social Media in the whole process of the ACMMM organization from its initial planning.

1. SIGMM Website. The SIGMM website is out of date and needs a serious overhaul.

  • Mandatory for SIGMM AB: restart the website from scratch, taking inspiration from other SIGs, e.g. by reaching out to people at CHI to understand what can be done. Budget should be provided by SIGMM.

2. SIGMM Social Media Channels. SIGMM social media accounts (Twitter and LinkedIn) are managed by the Social Media Team at the SIGMM Records.

  • Suggested for SIGMM AB: continue this organization, expanding the responsibilities of the team to include conferences and other events.

3. Conference Social Media. The social media presence of conferences is managed by the individual conferences. It is not uniform and is disconnected from SIGMM social media and the Records. The social media presence of the ACMMM flagship conference is weak and needs help. Creating continuity in terms of strategy and processes across conference editions is key.

  • Mandatory for ACMMM Organizers and SIGMM AB: create a Handbook of conference communications: a set of guidelines about how to create continuity across conference editions in terms of communications, and how to connect the SIGMM Records to the rest of the community.
  • Suggested for ACMMM Organizers and SIGMM AB: one member of the Social Media team at the SIGMM Records is systematically invited to join the OC of major conferences as publicity co-chair. The steering committee chair of each conference should commit to keeping the organizers of each conference edition informed about this policy, and monitor its implementation throughout the years.

SIGMM Strike Team on Open Review

Team members: Xavier Alameda Pineda (Univ. Grenoble-Alpes), Marco Bertini (Univ. Firenze).
Coordinator: Xavier Alameda Pineda

The team continued to support ACMMM Conference organizers in the use of Open Review in the ACMMM reviewing process, helping to implement new functions or improve the existing ones and supporting a smooth transfer of best practices. The recommendations addressed distinct items to complete the migration and stabilize the use of Open Review in future ACMMM editions.

1. Technical development and support

  • Mandatory for the Team: update and publish the scripts; complete the Open Review configuration.
  • Mandatory for SIGMM AB and ACMMM organizers: create a Committee led by the TPC chairs of the current ACMMM edition on a rotating basis.

2. Communication

  • Mandatory for the Team: write a small manual for use and include it in the future ACMMM Handbook.

Alberto Del Bimbo
SIGMM Chair