In this and the following Dataset Columns, we present a review of some of the notable events related to open datasets and benchmarking competitions in the field of multimedia in the years 2023 and 2024. This selection highlights the wide range of topics and datasets currently of interest to the community. Some of the events covered in this review include special sessions on open datasets and competitions featuring multimedia data. This year’s review follows similar efforts from the previous year (https://records.sigmm.org/records-issues/acm-sigmm-records-issue-1-2023/), highlighting the ongoing importance of open datasets and benchmarking competitions in advancing research and development in multimedia. This first column focuses on the last two editions of QoMEX, i.e., 2023 and 2024:
At the 2023 edition, these datasets were presented within the Datasets session, chaired by Professor Lea Skorin-Kapov. Given the scope of the conference (i.e., Quality of Multimedia Experience), these four papers present contributions on how adaptive 2D video streaming, holographic video codecs, omnidirectional video/audio environments, and multi-screen video affect user perception.
PNATS-UHD-1-Long: An Open Video Quality Dataset for Long Sequences for HTTP-based Adaptive Streaming QoE Assessment Ramachandra Rao, R. R., Borer, S., Lindero, D., Göring, S. and Raake, A.
A collaboration between Technische Universität Ilmenau (Germany), Ericsson Research (Sweden) and Rohde & Schwarz (Switzerland)
The presented dataset consists of three subjective databases targeting overall quality assessment of typical HTTP-based Adaptive Streaming sessions with degradations such as quality switching, initial loading delay, and stalling events, using audiovisual content ranging between 2 and 5 minutes. In addition, subject bias and consistency in quality assessment of such longer-duration audiovisual contents with multiple degradations are investigated using a subject behaviour model. As part of this paper, the overall test design, subjective test results, sources, encoded audiovisual contents, and a set of analysis plots are made publicly available for further research.
Open access dataset of holographic videos for codec analysis and machine learning applications Gilles, A., Gioia, P., Madali, N., El Rhammad, A., Morin, L.
A collaboration between IRT and INSA Rennes, France
This is reported as the first large-scale dataset containing 18 holographic videos computed with three different resolutions and pixel pitches. By providing the color and depth images corresponding to each hologram frame, the dataset can also be used in additional applications such as the validation of 3D scene geometry retrieval or deep learning-based hologram synthesis methods. Altogether, the dataset comprises 5400 pairs of RGB-D images and holograms, totaling more than 550 GB of data.
Saliency of Omnidirectional Videos with Different Audio Presentations: Analyses and Dataset Singla, A., Robotham, T., Bhattacharya, A., Menz, W., Habets, E. and Raake, A.
A collaboration between the Technische Universität Ilmenau and the International Audio Laboratories of Erlangen, both in Germany.
This dataset uses a between-subjects test design to collect users’ exploration data of 360-degree videos in a free-form viewing scenario using the Varjo XR-3 Head Mounted Display, in the presence of no audio, mono audio, and 4th-order Ambisonics audio. Saliency information was captured as head-saliency in terms of the center of a viewport at 50 Hz. For each item, subjects were asked to describe the scene in a short free-verbalization task. Moreover, cybersickness was assessed using the Simulator Sickness Questionnaire at the beginning and at the end of the test. The data is intended to enable training of visual and audiovisual saliency prediction models for interactive experiences.
A Subjective Dataset for Multi-Screen Video Streaming Applications Barman, N., Reznik Y. and Martini, M. G.
A collaboration between Brightcove (London, UK and Seattle, USA) and Kingston University London, UK.
This paper presents a new, open-source dataset consisting of subjective ratings for various encoded video sequences of different resolutions and bitrates (quality) when viewed on three devices of varying screen sizes: TV, tablet, and mobile. Along with the subjective scores, an evaluation of some of the most well-known and commonly used open-source objective quality metrics is also presented. It is observed that the performance of the metrics varies considerably across device types, with the recently standardized ITU-T P.1204.3 model, on average, outperforming its full-reference counterparts.
At the 2024 edition, these datasets were presented within the Datasets session, chaired by Dr. Mohsen Jenadeleh. Given the scope of the conference (i.e., Quality of Multimedia Experience), these five papers present contributions focused on the impact on user perception of HDR videos (UHD-1, 8K, and AV1), immersive 360° video, and light fields; the light field contribution received the conference’s Best Paper Award.
AVT-VQDB-UHD-1-HDR: An Open Video Quality Dataset for Quality Assessment of UHD-1 HDR Videos Ramachandra Rao, R. R., Herb, B., Helmi-Aurora, T., Ahmed, M. T, Raake, A.
A work from Technische Universität Ilmenau, Germany.
This dataset deals with the assessment of the perceived quality of HDR videos. Firstly, a subjective test with 4K/UHD-1 HDR videos using the ACR-HR (Absolute Category Rating – Hidden Reference) method was conducted. The test consisted of a total of 195 encoded videos from 5 source videos, all with a framerate of 60 fps. In this test, the 4K/UHD-1 HDR stimuli were encoded at four different resolutions, namely 720p, 1080p, 1440p, and 2160p, using bitrates ranging between 0.5 Mbps and 40 Mbps. The results of the subjective test have been analyzed to assess the impact of factors such as resolution, bitrate, video codec, and content on the perceived video quality.
AVT-VQDB-UHD-2-HDR: An open 8K HDR source dataset for video quality research Keller, D., Goebel, T., Sievenkees, V., Prenzel, J., Raake, A.
A work from Technische Universität Ilmenau, Germany.
The AVT-VQDB-UHD-2-HDR dataset consists of 31 8K HDR video sources of 15 s each, created with the goal of accurately representing real-life footage, while taking into account video coding and video quality testing challenges.
The effect of viewing distance and display peak luminance – HDR AV1 video streaming quality dataset Hammou, D., Krasula, L., Bampis, C., Li, Z., Mantiuk, R.
A collaboration between University of Cambridge (UK) and Netflix Inc. (USA).
The HDR-VDC dataset captures the quality degradation of HDR content due to AV1 coding artifacts and resolution reduction. The quality drop was measured at two viewing distances, corresponding to 60 and 120 pixels per visual degree, and two display mean luminance levels, 51 and 5.6 nits. It employs a highly sensitive pairwise comparison protocol with active sampling and comparisons across viewing distances to ensure the most accurate quality measurements possible. It is also the first publicly available dataset that measures the effect of display peak luminance and includes HDR videos encoded with AV1.
A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications (Best Paper Award) Zerman, E., Gond, M., Takhtardeshir, S., Olsson, R., Sjöström, M.
A work presented from Mid Sweden University, Sundsvall, Sweden.
The Spherical Light Field Database (SLFDB) consists of light fields of 60 views each, captured with an omnidirectional camera in 20 scenes. To show the usefulness of the proposed database, the authors provide two use cases: compression and viewpoint estimation. The initial results validate that the publicly available SLFDB will benefit the scientific community.
AVT-ECoClass-VR: An open-source audiovisual 360° video and immersive CGI multi-talker dataset to evaluate cognitive performance Fremerey, S., Breuer, C., Leist, L., Klatte, M., Fels, J., Raake, A.
A collaboration between Technische Universität Ilmenau, RWTH Aachen University and RPTU Kaiserslautern (Germany).
This dataset includes two audiovisual scenarios (360° video and computer-generated imagery) and two implementations for dataset playback. The 360° video part of the dataset features 200 video and single-channel audio recordings of 20 speakers reading ten stories, and 20 videos of speakers in silence, resulting in a total of 220 video and 200 audio recordings. The dataset also includes one 360° background image of a real primary school classroom scene, targeting young school children for subsequent subjective tests. The second part of the dataset comprises 20 different 3D models of the speakers and a computer-generated classroom scene, along with an immersive audiovisual virtual environment implementation that can be interacted with using an HTC Vive controller.
Service and network providers actively evaluate and derive Quality of Experience (QoE) metrics within their systems, which necessitates suitable monitoring strategies. Objective QoE monitoring involves mapping Quality of Service (QoS) parameters into QoE scores, such as calculating Mean Opinion Scores (MOS) or Good-or-Better (GoB) ratios, by using appropriate mapping functions. Alternatively, individual QoE monitoring directly assesses user experience based on self-reported feedback. We discuss the strengths, weaknesses, opportunities, and threats of both approaches. Based on the collected data from individual or objective QoE monitoring, providers can calculate the QoE metrics across all users in the system, who are subjected to a range of varying QoS conditions. The aggregated QoE across all users in the system for a dedicated time frame is referred to as system QoE. Based on a comprehensive simulation study, the expected system QoE, the system GoB ratio, as well as QoE fairness across all users are computed. Our numerical results explore whether objective and individual QoE monitoring lead to similar conclusions. In our previous work [Hoss2024], we provided a theoretical framework and the mathematical derivation of the corresponding relationships between QoS and system QoE for both monitoring approaches. Here, the focus is on illustrating the key differences of individual and objective QoE monitoring and the consequences in practice.
System QoE: Assessment of QoE of Users in a System
The term “System QoE” refers to the assessment of user experience from a provider’s perspective, focusing on the perceived quality of the users of a particular service. Providers may be different stakeholders along the service delivery chain, for example, network service providers (in particular, Internet service providers) or application service providers. QoE monitoring delivers the necessary information to evaluate the system QoE, which is the basis for appropriate actions to ensure high-quality services and high QoE, e.g., through resource and network management.
Typically, QoE monitoring and management involves evaluating how well the network and services perform by analyzing objective metrics like Quality of Service (QoS) parameters (e.g., latency, jitter, packet loss) and mapping them to QoE metrics, such as Mean Opinion Scores (MOS). In practice, QoE monitoring involves a series of steps that providers need to follow: 1) identify relevant QoE metrics of interest, like MOS or GoB ratio; 2) deploy a monitoring framework to collect and analyze data. We discuss these steps in the following.
The scope of system QoE metrics is to quantify the QoE across all users consuming the service over a dedicated time frame, e.g., one day, one week, or one month. Of interest are the expected QoE of an arbitrary user in the system, the ratio of all users experiencing Good-or-Better (GoB) quality or Poor-or-Worse (PoW) quality, as well as the QoE fairness across all users. The users in the system may achieve different QoS at the network level, e.g., different latency, jitter, or throughput, since resources are shared among the users. The same is also true at the application level with varying application-specific QoS parameters, for instance, video resolution, buffering time, or startup delays for video streaming. The varying QoS conditions then manifest in the system QoE. Fundamental relationships between the system QoE and QoS metrics were derived in [Hoss2020].
Expected system QoE: The expected system QoE is the average QoE rating of an arbitrary user in the system. The fundamental relationship in [Hoss2020] shows that the expected system QoE may be derived by mapping the QoS as experienced by a user to the corresponding MOS value and computing the average MOS over the varying QoS conditions. Thus, a MOS mapping function is required to map the QoS parameters to MOS values.
System GoB and System PoW: The Mean Opinion Score provides an average score but fails to account for the variability among users and the diversity of user ratings: users experiencing the same QoS conditions may rate them differently. Metrics like the percentage of users rating the experience as Good-or-Better or as Poor-or-Worse provide more granular insights. Such metrics help service providers understand not just the average quality, but how quality is distributed across the user base. The fundamental relationship in [Hoss2020] shows that the system GoB and PoW may be derived by mapping the QoS as experienced by a user to the corresponding GoB or PoW value and computing the average over the varying QoS conditions, respectively. Thus, a GoB or PoW mapping function is required.
QoE Fairness: Operators must not only ensure that users are sufficiently satisfied, but also that this is done in a fair manner. However, what is considered fair in the QoS domain may not necessarily translate to fairness in the QoE domain, motivating the use of a dedicated QoE fairness index. [Hoss2018] defines the QoE fairness index as a linear transformation of the standard deviation of MOS values to the range [0;1]. The observed standard deviation is normalized by the maximum standard deviation that is theoretically possible for MOS values in a finite range, typically between 1 (poor quality) and 5 (excellent quality). The difference between 1 (indicating perfect fairness) and the normalized standard deviation of MOS values (indicating the degree of unfairness) yields the fairness index.
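For orientation, the relationships sketched above can be written compactly as follows. This is only a shorthand restatement of the verbal definitions, with Q denoting the (random) QoS condition experienced by an arbitrary user, f the MOS mapping function, g the GoB mapping function, and MOS values on the scale [L, H] = [1, 5]; the exact formulations are given in [Hoss2020] and [Hoss2018].

```latex
E[\text{system QoE}] = \mathbb{E}_{Q}\big[f(Q)\big], \qquad
\text{GoB} = \mathbb{E}_{Q}\big[g(Q)\big], \qquad
F = 1 - \frac{\sigma\big(f(Q)\big)}{\sigma_{\max}}
\quad\text{with}\quad \sigma_{\max} = \frac{H - L}{2} = 2 .
```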
The fundamental relationships allow different implementations of QoE monitoring in practice, which are visualized in Figure 1 and discussed in the following. We differentiate between individual QoE monitoring and objective QoE monitoring and provide a qualitative strengths-weaknesses-opportunities-threats (SWOT) analysis.
Figure 1. QoE monitoring approaches to assess system QoE: individual and objective QoE monitoring.
Individual QoE Monitoring
Individual QoE monitoring refers to the assessment of system QoE by collecting individual ratings, e.g., on a 5-point rating scale, from users through their personal feedback. This approach captures the unique and individual nature of user experiences, accounting for factors like personal preferences and context. It allows optimizing services in a personalized manner, which is regarded as a challenging future research objective, see [Schmitt2017, Zhu2018, Gao2020, Yamazaki2021, Skorin-Kapov2018].
The term “individual QoE” was nicely described in [Zhu2018]: “QoE, by definition, is supposed to be subjective and individual. However, we use the term ‘individual QoE’, since the majority of the literature on QoE has not treated it as such. […] The challenge is that the set of individual factors upon which an individual’s QoE depends is not fixed; rather this (sub)set varies from one context to another, and it is this what justifies even more emphatically the individuality and uniqueness of a user’s experience – hence the term ‘individual QoE’.”
Strengths: Individual QoE monitoring provides valuable insights into how users personally experience a service, capturing the variability and uniqueness of individual perceptions that objective metrics often miss. A key strength is that it gathers direct feedback from a provider’s own users, ensuring a representative sample rather than relying on external or unrepresentative populations. Additionally, it does not require a predefined QoE model, allowing for flexibility in assessing user satisfaction. This approach enables service providers to directly derive various system QoE metrics.
Weaknesses: Individual QoE monitoring is mainly feasible for application service providers and requires additional monitoring efforts beyond the typical QoS tools already in place. Privacy concerns are significant, as collecting sensitive user data can raise issues with data protection and regulatory compliance, such as with GDPR. Additionally, users may use the system primarily as a complaint tool, focusing on reporting negative experiences, which could skew results. Feedback fatigue is another challenge, where users may become less willing to provide ongoing input over time, limiting the validity and reliability of the data collected.
Opportunities: Data from individual QoE monitoring can be utilized to enhance individual user QoE through better resource and service management. From a business perspective, offering a personalized QoE can set providers apart in competitive markets, and the collected data has monetization potential, supporting personalized marketing. Data from individual QoE monitoring also enables deriving objective metrics like MOS or GoB by correlating it with QoS parameters, which can be used to update existing QoE models or to develop new QoE models for novel services. Those insights can drive innovation, leading to new features or services that meet evolving customer needs.
Threats: Individual QoE monitoring accounts for factors outside the provider’s control, such as environmental context (e.g., noisy surroundings [Reichl2015, Jiménez2020]), which may affect user feedback but not reflect actual service performance. Additionally, as mentioned, it may be used as a complaint tool, with users disproportionately reporting negative experiences. There is also the risk of over-engineering solutions by focusing too much on minor individual issues, potentially diverting resources from addressing more significant, system-wide challenges that could have a broader impact on overall service quality.
Objective QoE Monitoring
Objective QoE monitoring involves assessing user experience by translating measurable QoS parameters at the network level, such as latency, jitter, and packet loss, and at the application level, such as video resolution or stalling duration for video streaming, into QoE metrics using predefined models and mapping functions. Unlike individual QoE monitoring, it does not require direct user feedback and instead relies on technically measurable parameters to estimate user satisfaction and various QoE metrics [Hoss2016]. Here, the fundamental relationships between system QoE and QoS [Hoss2020] are utilized. For computing the expected system QoE, a MOS mapping function is required, which maps a dedicated QoS value to a MOS value. For computing the system GoB, a GoB mapping function between QoS and GoB is required. Note that the QoS may be a vector of various QoS parameters, which are the input values for the mapping function.
Recent works [Hoss2022] indicated that industrial user experience index values, as obtained by the Threshold-Based Quality (TBQ) model for QoE monitoring, may be accurate enough to derive system QoE metrics. The TBQ model is a framework that defines application-specific thresholds for QoS parameters to assess and classify the user experience, which may be derived with simple and interpretable machine learning models like decision trees.
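As a rough illustration of the threshold-based idea, the following sketch classifies a web session into coarse experience classes from QoS values; the feature names and threshold values are hypothetical and are not taken from the TBQ model in [Hoss2022].

```python
def tbq_class(page_load_time_s: float, error_rate: float) -> str:
    """Toy threshold-based classifier mapping QoS values to an experience class.
    Thresholds are illustrative only; a real TBQ model derives them per
    application, e.g., from an interpretable decision tree."""
    if page_load_time_s <= 2.0 and error_rate <= 0.01:
        return "good"
    if page_load_time_s <= 5.0 and error_rate <= 0.05:
        return "fair"
    return "poor"

print(tbq_class(1.2, 0.0))   # -> "good"
print(tbq_class(6.5, 0.02))  # -> "poor"
```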
Strengths: Objective QoE monitoring relies solely on QoS monitoring, making it applicable for network providers, even for encrypted data streams, as long as appropriate QoE models are available, see for example [Juluri2015, Orsolic2020, Casas2022]. It can be easily integrated into existing QoS monitoring tools already deployed, reducing the need for additional resources or infrastructure. Moreover, it offers an objective assessment of user experience, ensuring that the same QoS conditions for different users are consistently mapped to the same QoE scores, as required for QoE fairness.
Weaknesses: Objective QoE monitoring requires specific QoE models and mapping functions for each desired QoE metric, which can be complex and resource-intensive to develop. Additionally, it has limited visibility into the full user experience, as it primarily relies on network-level metrics like bandwidth, latency, and jitter, which may not capture all factors influencing user satisfaction. Its effectiveness is also dependent on the accuracy of the monitored QoS metrics; inaccurate or incomplete data, such as from encrypted packets, can lead to misguided decisions and misrepresentation of the actual user experience.
Opportunities: Objective QoE monitoring enables user-centric resource and network management for application and network service providers by tracking QoS metrics, allowing for dynamic adjustments to optimize resource utilization and improve service delivery. The integration of AI and automation with QoS monitoring can increase the efficiency and accuracy of network management from a user-centric perspective. The objective QoE monitoring data can also enhance Service Level Agreements (SLAs) towards Experience Level Agreements (ELAs) as discussed in [Varela2015].
Threats: One risk of objective QoE monitoring is the potential for incorrect traffic flow characterization, where data flows may be misattributed to the wrong applications, leading to inaccurate QoE assessments. Additionally, rapid technological changes can quickly make existing QoS monitoring tools and QoE models outdated, necessitating constant upgrades and investment to keep pace with new technologies. These challenges can undermine the accuracy and effectiveness of objective QoE monitoring, potentially leading to misinformed decisions and increased operational costs.
Numerical Results: Visualizing the Differences
In this section, we explore and visualize the obtained system QoE metrics, which are based on collected data either through i) individual QoE monitoring or ii) objective QoE monitoring. The question arises whether the two monitoring approaches lead to the same results and conclusions for the provider. The obvious approach for computing the system QoE metrics is to use i) the individual ratings collected directly from the users and ii) the MOS scores obtained through mapping the objectively collected QoS parameters. While the discrepancies are derived mathematically in [Hoss2024], this article presents a visual representation of the differences between individual and objective QoE monitoring through a comprehensive simulation study. This simulation approach allows us to quantify the expected system QoE, the system GoB ratio, and the QoE fairness for a multitude of potential system configurations, which we manipulate in the simulation with varying QoS distributions. Furthermore, we demonstrate methods for utilizing data obtained through either individual QoE monitoring or objective QoE monitoring to accurately calculate the system QoE metrics as intended for a provider.
For the numerical results, the web QoE use case in [Hoss2024] is employed. We conduct a comprehensive simulation study in which the QoS settings are varied. To be more precise, the page load times (PLTs) are varied such that the users in the system experience a range of different loading times. For each simulation run, the average PLT and the standard deviation of the PLT across all users in the system are fixed. Each user is then assigned a random PLT sampled from a beta distribution in the range between 0 s and 8 s, parameterized with the specified average and standard deviation.
For a concrete PLT, the corresponding user rating distribution is available and, in our case, follows a shifted binomial distribution whose mean reflects the MOS value for that condition. To be precise, this binomial distribution is a conditional random variable with discrete values on a 5-point scale: the user ratings are conditioned on the actual QoS value. For individual QoE monitoring, the user ratings are sampled from that conditional random variable, while the QoS values are sampled from the beta distribution. For objective QoE monitoring, only the QoS values are used, but in addition, the MOS mapping function provided in [Hoss2024] is used. Thus, each QoS value is mapped to a continuous MOS value within the range of 1 to 5.
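The following sketch illustrates this simulation setup for a single run. It is a minimal reimplementation, not the code behind the figures, and the exponential MOS mapping used here is a hypothetical stand-in for the web QoE mapping function from [Hoss2024], which is not reproduced in this column.

```python
import numpy as np

RATE_MIN, RATE_MAX = 1, 5  # 5-point ACR scale

def mos_mapping(plt_s):
    """Hypothetical MOS mapping for page load time (seconds); illustrative only."""
    return RATE_MIN + (RATE_MAX - RATE_MIN) * np.exp(-0.6 * plt_s)

def sample_plt(n_users, mean, std, lo=0.0, hi=8.0, rng=None):
    """Sample per-user page load times from a beta distribution rescaled to
    [lo, hi] with the requested mean and standard deviation."""
    rng = rng or np.random.default_rng()
    m = (mean - lo) / (hi - lo)          # mean on [0, 1]
    v = (std / (hi - lo)) ** 2           # variance on [0, 1]
    common = m * (1 - m) / v - 1
    a, b = m * common, (1 - m) * common
    return lo + (hi - lo) * rng.beta(a, b, size=n_users)

def sample_ratings(plt, rng=None):
    """Individual ratings: shifted binomial on {1..5} whose mean equals the MOS
    of the user's QoS condition (conditional rating model)."""
    rng = rng or np.random.default_rng()
    p = (mos_mapping(plt) - RATE_MIN) / (RATE_MAX - RATE_MIN)
    return RATE_MIN + rng.binomial(RATE_MAX - RATE_MIN, p)

rng = np.random.default_rng(42)
plt = sample_plt(10_000, mean=2.0, std=1.0, rng=rng)   # one simulation run

# Individual QoE monitoring: average the sampled user ratings.
ratings = sample_ratings(plt, rng=rng)
print("expected system QoE (individual):", ratings.mean())

# Objective QoE monitoring: map each measured QoS value to MOS and average.
print("expected system QoE (objective):", mos_mapping(plt).mean())
```

Averaging the sampled ratings corresponds to individual QoE monitoring, while averaging the mapped MOS values corresponds to objective QoE monitoring; with a sufficiently large user population both averages coincide, which is the effect shown in Figure 2.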
Figure 2 shows the expected system QoE using individual QoE monitoring as well as objective QoE monitoring, depending on the average QoS and on the standard deviation of the QoS, which is indicated by the color. Each point in the figure represents a single simulation run with a fixed average QoS and a fixed standard deviation. It can be seen that both QoE monitoring approaches lead to the same results, which was also formally proven in [Hoss2024]. Note that higher QoS variances also result in a higher expected system QoE, since for the same average QoS there may be some users with larger QoS values, but also some users with lower QoS values. Due to the non-linear mapping between QoS and QoE, this results in higher QoE scores.
Figure 3 shows the system GoB ratio, which can be directly computed with individual QoE monitoring. However, in the case of objective QoE monitoring, we assume that only a MOS mapping function is available. It is tempting to derive the GoB ratio as the ratio of MOS values which are good or better; however, this leads to wrong results, see [Hoss2020]. Nevertheless, the GoB mapping function can be approximated from an existing MOS mapping function, see [Hoss2022, Hoss2017, Perez2023]. Then, the same conclusions are derived through objective QoE monitoring as for individual QoE monitoring.
Figure 4 now considers QoE fairness for both monitoring approaches. It is tempting to use the user rating values from individual QoE monitoring and apply the QoE fairness index. In that case, however, the fairness index captures the variance of the system QoS and additionally the variance due to user rating diversity, as shown in [Hoss2024]. This is not the intended application of the QoE fairness index, which aims to evaluate fairness objectively from a user-centric perspective, such that resource management can be adjusted to provide users with high and fairly distributed quality. Therefore, the QoE fairness index uses MOS values, such that users with the same QoS are assigned the same MOS value. In a system with deterministic QoS conditions, i.e., when the standard deviation of the QoS vanishes, the QoE fairness index is 100%, see the results for objective QoE monitoring. Nevertheless, individual QoE monitoring also allows computing MOS values for similar QoS values and then applying the QoE fairness index. Then, comparable results are obtained as for objective QoE monitoring.
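Continuing the simulation sketch above, the following lines show one possible way to obtain such per-condition MOS values from individual ratings, namely by binning users with similar QoS values; the bin width is an arbitrary choice for illustration, not a prescription from [Hoss2024].

```python
# Estimate a MOS per QoS condition by averaging the individual ratings of users
# that fall into the same (narrow) QoS bin, then apply the fairness index
# F = 1 - sigma / sigma_max, with sigma_max = 2 for a 1..5 scale.
bin_edges = np.linspace(0.0, 8.0, 33)              # 0.25 s wide PLT bins
bin_idx = np.digitize(plt, bin_edges)
bin_mos = {i: ratings[bin_idx == i].mean() for i in np.unique(bin_idx)}
mos_per_user = np.array([bin_mos[i] for i in bin_idx])

fairness_individual = 1.0 - mos_per_user.std() / 2.0
fairness_objective = 1.0 - mos_mapping(plt).std() / 2.0
print("QoE fairness (individual, binned MOS):", fairness_individual)
print("QoE fairness (objective, mapped MOS): ", fairness_objective)
```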
Figure 2. Expected system QoE when using individual and objective QoE monitoring. Both approaches lead to the same expected system QoE.
Figure 3. System GoB ratio: Deriving the ratio of MOS values which are good or better does not work for objective QoE monitoring. But an adjusted GoB computation, by approximating GoB through MOS, leads to the same conclusions as individual QoE monitoring, which simply measures the system GoB.
Figure 4. QoE Fairness: Using the user rating values obtained through individual QoE monitoring additionally includes the user rating diversity, which is not desired in network or resource management. However, individual QoE monitoring also allows computing MOS values for similar QoS values and then applying the QoE fairness index, which leads to comparable insights as objective QoE monitoring.
Conclusions
Individual QoE monitoring and objective QoE monitoring are fundamentally distinct approaches for assessing system QoE from a provider’s perspective. Individual QoE monitoring relies on direct user feedback to capture personalized experiences, while objective QoE monitoring uses QoS metrics and QoE models to estimate QoE metrics. Both methods have strengths and weaknesses, offering opportunities for service optimization and innovation while facing challenges such as over-engineering and the risk of models becoming outdated due to technological advancements, as summarized in our SWOT analysis. However, as the numerical results have shown, both approaches can be used with appropriate modifications and adjustments to derive various system QoE metrics like expected system QoE, system GoB and PoW ratio, as well as QoE fairness. A promising direction for future research is the development of hybrid approaches that combine both methods, allowing providers to benefit from objective monitoring while integrating the personalization of individual feedback. Such hybrid approaches could also be integrated into existing frameworks like the QoS/QoE Monitoring Engine proposal [Siokis2023] or into upcoming 6G networks, which may allow the radio access network (RAN) to autonomously adjust QoS metrics in collaboration with the application to enhance the overall QoE [Bertenyi2024].
[Siokis2023] Siokis, A., Ramantas, K., Margetis, G., Stamou, S., McCloskey, R., Tolan, M., & Verikoukis, C. V. (2023). 5GMediaHUB QoS/QoE monitoring engine. In 2023 IEEE 28th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) (pp. TBD). IEEE.
ACM SIGMM co-sponsored the second edition of the Spring School on Social XR, organized by the Distributed and Interactive Systems (DIS) group at CWI in Amsterdam. The event took place on March 4th – 8th 2024 and attracted 30 students from different disciplines (technology, social sciences, and humanities). The program included 22 lectures, 6 of them open, by 23 instructors. The event was organized by Irene Viola, Silvia Rossi, Thomas Röggla, and Pablo Cesar from CWI, and Omar Niamut from TNO. It was co-sponsored by the ACM Special Interest Group on Multimedia (ACM SIGMM), which made available student grants and supported international speakers from under-represented countries, and by The Netherlands Institute for Sound and Vision (https://www.beeldengeluid.nl/en).
Students and organisers of the Spring School on Social XR (March 4th – 8th 2024, Amsterdam)
“The future of media communication is immersive, and will empower sectors such as cultural heritage, education, manufacturing, and provide a climate-neutral alternative to travelling in the European Green Deal”. With such a vision in mind, the organizing committee continued for a second edition with a holistic program around the research topic of Social XR. The program included keynotes and workshops, where prominent scientists in the field shared their knowledge with students and triggered meaningful conversations and exchanges.
A poster session at the CWI DIS Spring School 2024.
The program included topics such as the capturing and modelling of realistic avatars and their behavior, coding and transmission techniques for volumetric video content, ethics for the design and development of responsible social XR experiences, novel rendering and interaction paradigms, and human factors and evaluation of experiences. Together, they provided a holistic perspective, helping participants to better understand the area and to initiate a network of collaboration to overcome the limitations of current real-time conferencing systems.
The spring school is part of the semester program organized by the DIS group of CWI. It was initiated in May 2022 with the Symposium on human-centered multimedia systems: a workshop and seminar to celebrate the inaugural lecture, “Human-Centered Multimedia: Making Remote Togetherness Possible” of Prof. Pablo Cesar. Then, it was continued in 2023 with the 1st Spring School on Social XR.
On April 10th, 2024, during the SIGMM Advisory Board meeting, the Strike Team Leaders, Touradj Ebrahimi, Arnold Smeulders, Miriam Redi and Xavier Alameda Pineda (represented by Marco Bertini) reported the results of their activity. They are summarized in the following in the form of recommendations that should be intended as guidelines and behavioral advice for our ongoing and future activity. SIGMM members in charge of SIGMM activities, SIGMM Conference leaders and particularly the organizers of the next ACMMM editions, are invited to adhere to these recommendations for their concerns, implement the items marked as mandatory and report to the SIGMM Advisory Board after the event.
All the SIGMM Strike Teams will remain in charge for two years starting January 1st, 2024 for reviews and updates.
The world is changing rapidly, and technology is driving these changes at an unprecedented pace. In this scenario, multimedia has become ubiquitous, providing new services to users, advanced modalities for information transmission, processing, and management, as well as innovative solutions for digital content understanding and production. The progress of Artificial Intelligence has fueled new opportunities and vitality in the field. New media formats, such as 3D, event data, and other sensory inputs, have become popular. Cutting-edge applications are constantly being developed and introduced.
SIGMM Strike Team on Industry Engagement
Team members: Touradj Ebrahimi (EPFL), Ali Begen (Ozyegin Univ.), Balu Adsumilli (Google), Yong Rui (Lenovo) and ChangSheng Xu (Chinese Academy of Sciences). Coordinator: Touradj Ebrahimi
The team provided recommendations for both the ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed improving the presence of industry at ACMMM and other SIGMM conferences/workshops, launching new in-cooperation initiatives, and establishing stable bi-directional links.
Organization of industry-focused events
Suggested / Mandatory for ACMMM Organizers and SIGMM AB: Create industry-focused promotional materials like pamphlets/brochures for industry participation (sponsorship, exhibit, etc.) in the style of ICASSP 2024 and ICIP 2024
Suggested for ACMMM Organizers: invite keynote speakers from industry, possibly with financial support from SIGMM. Keynote talks should be similar to plenary talks but centered around specific application challenges.
Suggested for ACMMM Organizers: organize Special Sessions and Workshops around specific applications of interest to companies and startups. Sessions should be coordinated by industry with possible support from an experienced and established scholar.
Suggested for ACMMM Organizers: organize Hands-on Sessions led by industry to receive feedback on future products and services.
Suggested for ACMMM Organizers: organize Panel Sessions led by industry and standardization committees on timely topics relevant to industry, e.g., how companies cope with AI.
Suggested for ACMMM Organizers: organize Tutorial sessions given by qualified people from industry and standardization committees at SIGMM-sponsored conferences/workshops
Suggested for ACMMM Organizers: promote contributions mainly from industry in the form of Industry Sessions to present companies and their products and services.
Suggested for ACMMM Organizers and SIGMM AB: promote joint SIGMM / standardization workshops on the latest standards, e.g., JPEG meets SIGMM, MPEG meets SIGMM, AOM meets SIGMM.
Suggested for ACMMM Organizers: organize Job Fairs like job interview speed dating during ACMMM
Initiatives for linkage
Mandatory for SIGMM Organizers and SIGMM AB: Create and maintain a mailing list of industrial targets, taking care of GDPR (Include a question in the registration form of SIGMM-sponsored conferences)
Suggested for SIGMM AB: organize monthly talks by industry leaders, either from large established companies, SMEs, or startups, sharing the technical/scientific challenges they face and their solutions.
Initiatives around reproducible results and benchmarking
Suggested for ACMMM Organizers and SIGMM AB: support the release of databases and studies on performance assessment procedures and metrics, possibly focused on specific applications.
Suggested for ACMMM Organizers: organize Grand Challenges initiated and sponsored by industry.
Strike Team on ACMMM Format
Team Members: Arnold Smeulders (Univ. of Amsterdam), Alan Smeaton (Dublin City University), Tat Seng Chua (National University of Singapore), Ralf Steinmetz (Univ. Darmstadt), Changwen Chen (Hong Kong Polytechnic Univ.), Nicu Sebe (Univ. of Trento), Marcel Worring (Univ. of Amsterdam), Jianfei Cai (Monash Univ.), Cathal Gurrin (Dublin City Univ.). Coordinator: Arnold Smeulders
The team provided recommendations for both ACMMM organizers and SIGMM Advisory Board. The recommendations addressed distinct items related to Conference identity, Conference budget and Conference memory.
1. Intended audience. It is generally felt that ACMMM is under pressure from neighboring conferences growing very big. There is consensus that growing big should not be the purpose of ACMMM: an attendance of 750 – 1,500 was thought to be ideal, including being attractive to industry. Growth should come naturally.
Suggested for ACMMM Organizers and SIGMM AB: Promote distant travel by lowering fees for those who travel far.
Suggested for ACMMM Organizers: Include (a personalized) visa invitation in the call for papers.
2. Community feel, differentiation and interdisciplinarity. Identity is not an actionable concern, but one of the shared common goods is T-shaped individuals interested in neighboring disciplines making an interdisciplinary or multidisciplinary connection. It is desirable to differentiate submitted papers from major close conferences like CVPR. This point is already implemented in the call for papers of ACMMM 2024.
Mandatory for ACMMM Organizers: Ask in the submission how the paper fits in the multimedia community and its scientific tradition as illustrated by citations. Consider this information in the explicit review criteria.
Recommended for ACMMM Organizers: Support the physical presence of participants by rebalancing fees.
Suggested for ACMMM Organizers and SIGMM AB: Organize a session around the SIGMM test of time award, make selection early, funded by SIGMM.
Suggested for ACMMM Organizers: Organize moderated discussion sessions for papers on the same theme.
3. Brave New Ideas. Brave New Ideas fits very well with the intended audience. It is essential that we are able to draw out brave and new ideas from our community for long-term growth and vibrancy. The emphasis in reviewing Brave New Ideas should be on novelty, even if the work is not perfect. Rotate reviewers over a pool of people to prevent lock-in.
Suggested / Mandatory for ACMMM Organizers: Include in the submission a 3-minute pitch video to archive in the ACM digital library.
Suggested / Mandatory for ACMMM Organizers: Select reviewers from a pool of senior people to review novelty.
Suggested for ACMMM Organizers: Start with one session of 4 papers, if successful, add another session later.
4. Application. There should be no exclusive support for one specific application area in the main conference. Instead, application areas should be the focus of special sessions or workshops.
Suggested for ACMMM Organizers: Focus on application-related workshops or special sessions with own reviewing.
5. Presentation. As the core business of ACMMM is inter- and multi-disciplinarity, it is natural to make the presentation for a broader audience part of the selection. ACM should make the short videos accessible as a service to the scientific community or the general public. TED-like videos for a paper fit naturally with ACMMM and with the trend on YouTube of communicating one’s paper. If reviewing the videos is too much work for the organizers, the SIGMM AB should support it financially.
Mandatory for ACMMM Organizers: Include a TED-like 3-minute pitch video as part of the submission, to be archived by the ACM Digital Library as part of the conference proceedings; the video is to be submitted a week after the paper deadline, so there is time to prepare it after the regular paper submission.
6. Promote open-access. For a data-driven and fair comparison promote open access of data to be used in the next conference to compare to.
Suggested for SIGMM AB: Open access for data encouraged.
7. Keynotes. For the intended audience and interdisciplinarity, it is felt essential to have keynotes on the key topics of the moment. Keynotes should not focus on one topic but maintain the diversity of topics within the conference and over the years, to make sure new ideas are inserted into the community.
Suggested for SIGMM AB: directly fund a big-name, marquee keynote speaker, sponsored by SIGMM, for one of the societally urgent topics as evident from the news.
8. Diversity over subdisciplines. Make an extra effort for Arts, GenAI use models, security, HCI, and demos. We need to ensure that, if the submitted papers are of sufficiently high quality, there is at least a session on that sub-topic in the conference. We also need to ensure that the conference is not overwhelmed by a popular topic with easy review criteria and generally much higher review scores.
Suggested for ACMMM Organizers: Promote diversity of all relevant topics in the call for papers and by action in subcommunities by an ambassador. SIGMM will supervise the diversity.
9. Living report. To enhance the institutional memory, maintain a living document with suggestions, passed on from organizer to organizer. The owner of the document is the SIGMM commissioner for conferences.
Mandatory for ACMMM Organizers and SIGMM AB: A short report to the SIGMM commissioner for conferences from the ACMMM chair, including a few recommendations for the next time; handed over to the next conference after the end of the current conference.
SIGMM Strike Team on Harmonization and Spread
Team members: Miriam Redi (Wikimedia Foundation), Silvia Rossi (CWI), Irene Viola (CWI), Mylene Farias (Texas State Univ. and Univ. Brasilia), Ichiro Ide (Nagoya Univ.), Pablo Cesar (CWI and TU Delft). Coordinator: Miriam Redi
The team provided recommendations for both the ACMMM organizers and the SIGMM Advisory Board. The recommendations addressed giving the SIGMM Records and social media a more central role in SIGMM, and integrating the SIGMM Records and social media into the whole process of the ACMMM organization from its initial planning.
1. SIGMM Website. The SIGMM website is outdated and needs a serious overhaul.
Mandatory for SIGMM AB: rebuild the website from scratch, taking inspiration from other SIGs, e.g., by reaching out to people at CHI to understand what can be done. The budget should be provided by SIGMM.
2. SIGMM Social Media Channels. The SIGMM social media accounts (Twitter and LinkedIn) are managed by the Social Media Team at the SIGMM Records.
Suggested for SIGMM AB: continue this organization, expanding the responsibilities of the team to include conferences and other events.
3. Conference Social Media. The social media presence of conferences is managed by the individual conferences. It is not uniform and is disconnected from the SIGMM social media and the Records. The social media presence of the ACMMM flagship conference is weak and needs help. Creating continuity in terms of strategy and processes across conference editions is key.
Mandatory for ACMMM Organizers and SIGMM AB: create a Handbook of conference communications: a set of guidelines about how to create continuity across conference editions in terms of communications, and how to connect the SIGMM Records to the rest of the community.
Suggested for ACMMM Organizers and SIGMM AB: one member of the Social Media team at the SIGMM Records is systematically invited to join the OC of major conferences as publicity co-chair. The steering committee chair of each conference should commit to keeping the organizers of each conference edition informed about this policy, and monitor its implementation throughout the years.
SIGMM Strike Team on Open Review
Team members: Xavier Alameda Pineda (Univ. Grenoble-Alpes), Marco Bertini (Univ. Firenze). Coordinator: Xavier Alameda Pineda
The team continued to support the ACMMM conference organizers in the use of Open Review in the ACMMM reviewing process, helping to implement new functions or improve existing ones and supporting a smooth transfer of best practices. The recommendations addressed distinct items to complete the migration and stabilize the use of Open Review in future ACMMM editions.
1. Technical development and support
Mandatory for the Team: update and publish the scripts; complete the Open Review configuration.
Mandatory for SIGMM AB and ACMMM organizers: create a Committee led by the TPC chairs of the current ACMMM edition on a rotating basis.
2. Communication
Mandatory for the Team: write a small manual for use and include it in the future ACMMM Handbook.
The last plenary meeting of the Video Quality Experts Group (VQEG) was held online, hosted by the University of Konstanz (Germany), from December 18th to 21st, 2023. It offered more than 100 registered participants from 19 different countries the possibility to attend the numerous presentations and discussions about topics related to the ongoing projects within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting will soon be available on YouTube.
All the topics mentioned below can be of interest for the SIGMM community working on quality assessment, but special attention can be devoted to the current activities on improvements of the statistical analysis of subjective experiments and objective metrics and on the development of a test plan to evaluate the QoE of immersive interactive communication systems in collaboration with ITU.
Readers of these columns interested in the ongoing projects of VQEG are encouraged to subscribe to VQEG’s email reflectors to follow the ongoing activities and to get involved with them.
As already announced on the VQEG website, the next VQEG plenary meeting will be hosted by Universität Klagenfurt in Austria from July 1st to 5th, 2024.
Group picture of the online meeting
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD group works on developing and validating subjective and objective methods to analyze commonly available video systems. During the meeting, there were various sessions in which presentations related to these topics were discussed.
Firstly, Ali Ak (Nantes Université, France), provided an analysis of the relation between acceptance/annoyance and visual quality in a recently collected dataset of several User Generated Content (UGC) videos. Then, Syed Uddin (AGH University of Krakow, Poland) presented a video quality assessment method based on the quantization parameter of MPEG encoders (MPEG-4, MPEG-AVC, and MPEG-HEVC) leveraging VMAF. In addition, Sang Heon Le (LG Electronics, Korea) presented a technique for pre-enhancement for video compression and applicable subjective quality metrics. Another talk was given by Alexander Raake (TU Ilmenau, Germany), who presented AVQBits, a versatile no-reference bitstream-based video quality model (based on the standardized ITU-T P.1204.3 model) that can be applied in several contexts such as video service monitoring, evaluation of video encoding quality, of gaming video QoE, and even of omnidirectional video quality. Also, Jingwen Zhu (Nantes Université, France) and Hadi Amirpour (University of Klagenfurt, Austria) described a study on the evaluation of the effectiveness of different video quality metrics in predicting the Satisfied User Ratio (SUR) in order to enhance the VMAF proxy to better capture content-specific characteristics. Andreas Pastor (Nantes Université, France) presented a method to predict the distortion perceived locally by human eyes in AV1-encoded videos using deep features, which can be easily integrated into video codecs as a pre-processing step before starting encoding.
In relation to standardization efforts, Mathias Wien (RWTH Aachen University, Germany) gave an overview of recent expert viewing tests that have been conducted within MPEG AG5 at the 143rd and 144th MPEG meetings. Also, Kamil Koniuch (AGH University of Krakow, Poland) presented a proposal to update the Survival Game task defined in the ITU-T Recommendation P.1301 on subjective quality evaluation of audio and audiovisual multiparty telemeetings, in order to improve its implementation and its application to recent efforts such as the evaluation of immersive communication systems within the ITU-T P.IXC work item (see the paragraph related to the Immersive Media Group).
Quality Assessment for Health applications (QAH)
The QAH group is focused on the quality assessment of health applications. It addresses subjective evaluation, generation of datasets, development of objective metrics, and task-based approaches. Recently, the group has been working towards an ITU-T recommendation for the assessment of medical contents. On this topic, Meriem Outtas (INSA Rennes, France) led a discussion dealing with the editing of a draft of this recommendation. In addition, Lumi Xia (INSA Rennes, France) presented a study of task-based medical image quality assessment focusing on a use case of adrenal lesions.
Statistical Analysis Methods (SAM)
The SAM group investigates analysis methods both for the results of subjective experiments and for objective quality models and metrics. This was one of the most active groups in this meeting, with several presentations on related topics.
Lukas Krasula (Netflix, USA) introduced e2nest, a web-based platform to conduct media-centric (video, audio, and images) subjective tests. Also, Dietmar Saupe (University of Konstanz, Germany) and Simon Del Pin (NTNU, Norway) showed the results of a study analyzing national differences in image quality assessment, showing significant differences in various areas. Alexander Raake (TU Ilmenau, Germany) presented a study on the remote testing of high-resolution images and videos, using AVrate Voyager, which is a publicly accessible framework for online tests. Finally, Dominik Keller (TU Ilmenau, Germany) presented a recent study exploring the impact of 8K (UHD-2) resolution on HDR video quality, considering different viewing distances. The results showed that the enhanced video quality of 8K HDR over 4K HDR diminishes with increasing viewing distance.
No Reference Metrics (NORM)
The NORM group addresses a collaborative effort to develop no-reference metrics for monitoring visual service quality. At this meeting, Ioannis Katsavounidis (Meta, USA) led a discussion on the current efforts to improve image and video complexity metrics. In addition, Krishna Srikar Durbha (University of Texas at Austin, USA) presented a technique to tackle the problem of bitrate ladder construction based on multiple Visual Information Fidelity (VIF) feature sets extracted from different scales and subbands of a video.
Emerging Technologies Group (ETG)
The ETG group focuses on various aspects of multimedia that, although they are not necessarily directly related to “video quality”, can indirectly impact the work carried out within VQEG and are not addressed by any of the existing VQEG groups. In particular, this group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers, funding proposals, etc.
In this meeting, Nabajeet Barman and Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) suggested a topic to start discussing within VQEG: Quality Assessment of AI-Generated/Modified Content. The goal is to have subsequent discussions on this topic within the group and to write a position paper or whitepaper.
Immersive Media Group (IMG)
The IMG group is performing research on the quality assessment of immersive media technologies. Currently, the main joint activity of the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain), Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain), Kamil Koniuch (AGH University of Krakow, Poland), Ashutosh Singla (CWI, The Netherlands) and other researchers involved in the test plan provided an update on its status, focusing on the description of the four interactive tasks to be performed in the test, the considered measures, and the 13 different experiments that will be carried out in the labs involved in the test plan. Also, in relation to this test plan, Felix Immohr (TU Ilmenau, Germany) presented a study on the impact of spatial audio on social presence and user behavior in multi-modal VR communications.
Diagram of the methodology of the joint IMG test plan
Quality Assessment for Computer Vision Applications (QACoViA)
5G Key Performance Indicators (5GKPI)
The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services on top of them. At the meeting, Pablo Pérez (Nokia XR Lab, Spain) led an open discussion on the future activities of the group towards 6G, including a brief presentation of QoS/QoE management in 3GPP and potential opportunities to influence QoE in 6G.
The 146th MPEG meeting was held in Rennes, France from 22-26 April 2024, and the official press release can be found here. It comprises the following highlights:
AI-based Point Cloud Coding*: Call for proposals focusing on AI-driven point cloud encoding for applications such as immersive experiences and autonomous driving.
Object Wave Compression*: Call for interest in object wave compression for enhancing computer holography transmission.
Open Font Format: Committee Draft of the fifth edition, overcoming previous limitations like the 64K glyph encoding constraint.
Scene Description: Ratified second edition, integrating immersive media objects and extending support for various data types.
MPEG Immersive Video (MIV): New features in the second edition, enhancing the compression of immersive video content.
Video Coding Standards: New editions of AVC, HEVC, and Video CICP, incorporating additional SEI messages and extended multiview profiles.
Machine-Optimized Video Compression*: Advancement in optimizing video encoders for machine analysis.
Video-based Dynamic Mesh Coding (V-DMC)*: Committee Draft status for efficiently storing and transmitting dynamic 3D content.
LiDAR Coding*: Enhanced efficiency and responsiveness in LiDAR data processing with the new standard reaching Committee Draft status.
* … covered in this column.
AI-based Point Cloud Coding
MPEG issued a Call for Proposals (CfP) on AI-based point cloud coding technologies as a result of ongoing explorations regarding use cases, requirements, and the capabilities of AI-driven point cloud encoding, particularly for dynamic point clouds.
With recent significant progress in AI-based point cloud compression technologies, MPEG is keen on studying and adopting AI methodologies. MPEG is specifically looking for learning-based codecs capable of handling a broad spectrum of dynamic point clouds, which are crucial for applications ranging from immersive experiences to autonomous driving and navigation. As the field evolves rapidly, MPEG expects to receive multiple innovative proposals. These may include a unified codec, capable of addressing multiple types of point clouds, or specialized codecs tailored to meet specific requirements, contingent upon demonstrating clear advantages. MPEG has therefore publicly called for submissions of AI-based point cloud codecs, aimed at deepening the understanding of the various options available and their respective impacts. Submissions that meet the requirements outlined in the call will be invited to provide source code for further analysis, potentially laying the groundwork for a new standard in AI-based point cloud coding. MPEG welcomes all relevant contributions and looks forward to evaluating the responses.
Research aspects: In-depth analysis of algorithms, techniques, and methodologies, including a comparative study of various AI-driven point cloud compression techniques to identify the most effective approaches. Other aspects include creating or improving learning-based codecs that can handle dynamic point clouds as well as metrics for evaluating the performance of these codecs in terms of compression efficiency, reconstruction quality, computational complexity, and scalability. Finally, the assessment of how improved point cloud compression can enhance user experiences would be worthwhile to consider here also.
Object Wave Compression
A Call for Interest (CfI) in object wave compression has been issued by MPEG. Computer holography, a 3D display technology, utilizes a digital fringe pattern called a computer-generated hologram (CGH) to reconstruct 3D images from input 3D models. Holographic near-eye displays (HNEDs) reduce the need for extensive pixel counts due to their wearable design, positioning the display near the eye. This positions HNEDs as frontrunners for the early commercialization of computer holography, with significant research underway for product development. Innovative approaches facilitate the transmission of object wave data, crucial for CGH calculations, over networks. Object wave transmission offers several advantages, including independent treatment from playback device optics, lower computational complexity, and compatibility with video coding technology. These advancements open doors for diverse applications, ranging from entertainment experiences to real-time two-way spatial transmissions, revolutionizing fields such as remote surgery and virtual collaboration. As MPEG explores object wave compression for computer holography transmission, the Call for Interest seeks contributions to address market needs in this field.
Research aspects: Apart from compression efficiency, lower computation complexity, and compatibility with video coding technology, there is a range of research aspects, including the design, implementation, and evaluation of coding algorithms within the scope of this CfI. The QoE of computer-generated holograms (CGHs) together with holographic near-eye displays (HNEDs) is yet another dimension to be explored.
Machine-Optimized Video Compression
MPEG started working on a technical report regarding the “Optimization of Encoders and Receiving Systems for Machine Analysis of Coded Video Content”. In recent years, the efficacy of machine learning-based algorithms in video content analysis has steadily improved. However, an encoder designed for human consumption does not always produce compressed video conducive to effective machine analysis. This challenge lies not in the compression standard but in optimizing the encoder or receiving system. The forthcoming technical report addresses this gap by showcasing technologies and methods that optimize encoders or receiving systems to enhance machine analysis performance.
Research aspects: Video (and audio) coding for machines has recently been addressed by the MPEG Video and Audio working groups, respectively. The Joint Video Experts Team (JVET) of MPEG and ITU-T SG16 now joins this space with a technical report, but the research aspects remain unchanged, i.e., coding efficiency, metrics, and quality aspects for machine analysis of compressed/coded video content.
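The following sketch illustrates the kind of experiment such work targets: sweep encoder settings (here, generic x264 CRF values via ffmpeg) and score each decoded bitstream with a downstream machine task. The command line is a generic encode, and run_detector() is a placeholder for whatever analysis model is being optimized for; none of this is taken from the MPEG report itself.

```python
import subprocess

def encode(src: str, crf: int) -> str:
    """Encode the source clip at a given CRF and return the output path."""
    out = f"encoded_crf{crf}.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), out],
        check=True)
    return out

def run_detector(video_path: str) -> float:
    """Placeholder: return e.g. the mAP of an object detector on decoded frames."""
    return 0.0

if __name__ == "__main__":
    # Trade bitrate against machine-analysis accuracy across encoder settings
    for crf in (23, 28, 33, 38):
        clip = encode("source.mp4", crf)
        print(crf, run_detector(clip))
```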
MPEG-I Immersive Audio
MPEG Audio Coding is entering the “immersive space” with MPEG-I immersive audio and its corresponding reference software. The MPEG-I immersive audio standard sets a new benchmark for compact and lifelike audio representation in virtual and physical spaces, catering to Virtual, Augmented, and Mixed Reality (VR/AR/MR) applications. By enabling high-quality, real-time interactive rendering of audio content with six degrees of freedom (6DoF), users can experience immersion, freely exploring 3D environments while enjoying dynamic audio. Designed in accordance with MPEG’s rigorous standards, MPEG-I immersive audio ensures efficient distribution across bandwidth-constrained networks without compromising on quality. Unlike proprietary frameworks, this standard prioritizes interoperability, stability, and versatility, supporting both streaming and downloadable content while seamlessly integrating with MPEG-H 3D audio compression. MPEG-I’s comprehensive modeling of real-world acoustic effects, including sound source properties and environmental characteristics, guarantees an authentic auditory experience. Moreover, its efficient rendering algorithms balance computational complexity with accuracy, empowering users to finely tune scene characteristics for desired outcomes.
Research aspects: Evaluating QoE of MPEG-I immersive audio-enabled environments as well as the efficient audio distribution across bandwidth-constrained networks without compromising on audio quality are two important research aspects to be addressed by the research community.
Video-based Dynamic Mesh Coding (V-DMC)
Video-based Dynamic Mesh Compression (V-DMC) represents a significant advancement in 3D content compression, catering to the ever-increasing complexity of dynamic meshes used across various applications, including real-time communications, storage, free-viewpoint video, augmented reality (AR), and virtual reality (VR). The standard addresses the challenges associated with dynamic meshes that exhibit time-varying connectivity and attribute maps, which were not sufficiently supported by previous standards. Video-based Dynamic Mesh Compression promises to revolutionize how dynamic 3D content is stored and transmitted, allowing more efficient and realistic interactions with 3D content globally.
Research aspects: V-DMC aims to allow “more efficient and realistic interactions with 3D content”, which are subject to research, i.e., compression efficiency vs. QoE in constrained networked environments.
Low Latency, Low Complexity LiDAR Coding
Low Latency, Low Complexity LiDAR Coding underscores MPEG’s commitment to advancing coding technologies required by modern LiDAR applications across diverse sectors. The new standard addresses critical needs in the processing and compression of LiDAR-acquired point clouds, which are integral to applications ranging from automated driving to smart city management. It provides an optimized solution for scenarios requiring high efficiency in both compression and real-time delivery, responding to the increasingly complex demands of LiDAR data handling. LiDAR technology has become essential for various applications that require detailed environmental scanning, from autonomous vehicles navigating roads to robots mapping indoor spaces. The Low Latency, Low Complexity LiDAR Coding standard will facilitate a new level of efficiency and responsiveness in LiDAR data processing, which is critical for the real-time decision-making capabilities needed in these applications. This standard builds on comprehensive analysis and industry feedback to address specific challenges such as noise reduction, temporal data redundancy, and the need for region-based quality of compression. The standard also emphasizes the importance of low latency coding to support real-time applications, essential for operational safety and efficiency in dynamic environments.
Research aspects: This standard effectively tackles the challenge of balancing high compression efficiency with real-time capabilities, addressing these often conflicting goals. Researchers may carefully consider these aspects and make meaningful contributions.
The 147th MPEG meeting will be held in Sapporo, Japan, from July 15-19, 2024. Click here for more information about MPEG meetings and their developments.
JPEG Trust reaches Draft International Standard stage
The 102nd JPEG meeting was held in San Francisco, California, USA, from 22 to 26 January 2024. At this meeting, JPEG Trust became a Draft International Standard. Moreover, the responses to the Call for Proposals of JPEG NFT were received and analysed. As a consequence, relevant steps were taken towards the definition of standardized tools for certification of the provenance and authenticity of media content, at a time when tools for effective media manipulation are becoming widely available to the general public. The 102nd JPEG meeting concluded with the JPEG Emerging Technologies Workshop, held at Tencent, Palo Alto, on 27 January.
JPEG Emerging Technologies Workshop, organised on 27 January at Tencent, Palo Alto
The following sections summarize the main highlights of the 102nd JPEG meeting:
JPEG Trust reaches Draft International Standard stage;
JPEG AI improves the Verification Model;
JPEG Pleno Learning-based Point Cloud coding releases the Committee Draft;
JPEG Pleno Light Field continues development of Quality assessment tools;
AIC starts working on Objective Quality Assessment models for Near Visually Lossless coding;
JPEG XE prepares Common Test Conditions;
JPEG DNA evaluates its Verification Model;
JPEG XS 3rd edition parts are ready for publication as International standards;
JPEG XL investigates HDR compression performance.
JPEG Trust
At its 102nd meeting the JPEG Committee produced the DIS (Draft International Standard) of JPEG Trust Part 1 “Core Foundation” (21617-1). It is expected that the standard will be published as an International Standard during the summer of 2024. This rapid standardization schedule has been necessary because of the speed at which fake media and misinformation are proliferating, especially with respect to generative AI.
The JPEG Trust Core Foundation specifies a comprehensive framework for individuals, organizations, and governing institutions interested in establishing an environment of trust for the media that they use, and for supporting trust in the media they share online. This framework addresses aspects of provenance, authenticity, integrity, copyright, and identification of assets and stakeholders. To complement Part 1, a proposed new Part 2 “Trust Profiles Catalogue” has been established. This new Part will specify a catalogue of Trust Profiles, targeting common usage scenarios.
During the meeting, the committee also evaluated responses received to the JPEG NFT Final Call for Proposals (CfP). Certain portions of the submissions will be incorporated in the JPEG Trust suite of standards to improve interoperability with respect to media tokenization. As a first step, the committee will focus on standardization of declarations of authorship and ownership.
Finally, the Use Cases and Requirements document for JPEG Trust was updated to incorporate additional requirements in respect of composited media. This document is publicly available on the JPEG website.
A white paper describing the JPEG Trust framework is also available publicly on the JPEG website.
JPEG AI
At the 102nd JPEG meeting, the JPEG AI Verification Model was improved by integrating nearly all the contributions adopted at the 101st JPEG meeting. The major change is a multi-branch JPEG AI decoding architecture with two encoders and three decoders (6 possible compatible combinations) that have been jointly trained, which allows covering a range of encoder and decoder complexity-efficiency tradeoffs. The entropy decoding and latent prediction portion is common to all possible combinations; the differences thus reside in the analysis/synthesis networks. Moreover, the number of models has been reduced to 4, both 4:4:4 and 4:2:0 coding are supported, and JPEG AI can now achieve better rate-distortion performance in some relevant use cases. A new training dataset has also been adopted with difficult/high-contrast/versatile images to reduce the number of artifacts and to achieve better generalization and color reproducibility for a wide range of situations. Other enhancements have also been adopted, namely feature clipping for decoding artifact reduction, an improved variable bit-rate training strategy, and post-synthesis transform filtering speedups.
The resulting performance and complexity characterization shows compression efficiency (BD-rate) gains of 12.5% to 27.9% over the VVC Intra anchor, for relevant encoder and decoder configurations with a wide range of complexity-efficiency tradeoffs (7 to 216 kMAC/px at the decoder side). On the CPU platform, the decoder complexity is 1.6x/3.1x higher compared to VVC Intra (reference implementation) for the simplest/base operating point. At the 102nd meeting, 12 core experiments were established to continue work on different topics, namely the JPEG AI high-level syntax, progressive decoding, training dataset, hierarchical dependent tiling, and spatial random access, to mention the most relevant. Finally, two demonstrations were shown where JPEG AI decoder implementations were run on two smartphone devices, a Huawei Mate 50 Pro and an iPhone 14 Pro.
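For readers unfamiliar with the BD-rate figures quoted above, the sketch below shows the usual Bjøntegaard delta-rate calculation: fit log-rate as a cubic polynomial of quality for the anchor and the test codec, integrate both over the overlapping quality range, and convert the average log-rate difference into a percentage. The rate-distortion points in the usage example are invented values, not JPEG AI results.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%): average bitrate change of the test codec
    relative to the anchor at equal quality; negative means bitrate savings."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # cubic fit of log-rate as a function of quality for each codec
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100

# Invented rate (bpp) and quality (dB) points, four per codec
print(bd_rate([0.25, 0.50, 1.00, 2.00], [32.0, 35.0, 38.0, 41.0],
              [0.20, 0.42, 0.85, 1.70], [32.2, 35.3, 38.2, 41.1]))
```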
JPEG Pleno Learning-based Point Cloud coding
The 102nd JPEG meeting marked an important milestone for JPEG Pleno Point Cloud with the release of its Committee Draft (CD) for ISO/IEC 21794-Part 6 “Learning-based point cloud coding” (21794-6). Part 6 of the JPEG Pleno framework brings an innovative Learning-based Point Cloud Coding technology adding value to existing Parts focused on Light field and Holography coding. It is expected that a Draft International Standard (DIS) of Part 6 will be approved at the 104th JPEG meeting in July 2024 and the International Standard to be published during 2025. The 102nd meeting also marked the release of version 4 of the JPEG Pleno Point Cloud Verification Model updated to be robust to different hardware and software operating environments.
JPEG Pleno Light Field
The JPEG Committee has recently published a light field coding standard, and JPEG Pleno is constantly exploring novel light field coding architectures. The JPEG Committee is also preparing standardization activities – among others – in the domains of objective and subjective quality assessment for light fields, improved light field coding modes, and learning-based light field coding.
As the JPEG Committee seeks continuous improvement of its use case and requirements specifications, it organized a Light Field Industry Workshop. The presentations and video recording of the workshop that took place on November 22nd, 2023 are available on the JPEG website.
JPEG AIC
During the 102nd JPEG meeting, work on Image Quality Assessment continued with a focus on JPEG AIC-3, targeting the standardization of a subjective visual quality assessment methodology for images in the range from high to nearly visually lossless qualities. The activity is currently investigating three different subjective image quality assessment methodologies.
The JPEG Committee also launched the activities on Part 4 of the standard (AIC-4), by initiating work on the Draft Call for Proposals on Objective Image Quality Assessment. The Final Call for Proposals on Objective Image Quality Assessment is planned to be released in July 2024, while the submission of the proposals is planned for October 2024.
JPEG XE
The JPEG Committee continued its activity on JPEG XE and event-based vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE concerns the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The JPEG Committee is preparing a Common Test Conditions document that provides the means to perform an evaluation of candidate technology for the efficient coding of event sequences. The Common Test Conditions provide a definition of a reference format, a dataset, a set of key performance metrics, and an evaluation methodology. In addition, the committee is preparing a Draft Call for Proposals on lossless coding, with the intent to make it public in April 2024. Standardization will first start with lossless coding of event sequences, as this seems to have the highest application urgency in industry. However, the committee acknowledges that lossy coding of event sequences is also a valuable feature, which will be addressed at a later stage. The public Ad-hoc Group on Event-based Vision was re-established to continue the work towards the 103rd JPEG meeting in April 2024. To stay informed about the activities, please join the event-based imaging Ad-hoc Group mailing list.
JPEG DNA
During the 102nd JPEG meeting, the JPEG DNA Verification Model description and software were approved, along with continued efforts to evaluate its rate-distortion characteristics. Notably, during the 102nd meeting, a subjective quality assessment was carried out by expert viewing using a new approach under development in the framework of AIC-3. The robustness of the Verification Model to errors generated in a biochemical process was also analysed using a simple noise simulator. After meticulous analysis of the results, it was decided to create a number of core experiments to improve the Verification Model's rate-distortion performance and its robustness to errors by adding an error correction technique. In parallel, efforts are underway to improve the rate-distortion performance of the JPEG DNA Verification Model by exploring learning-based coding solutions. In addition, further efforts are defined to improve the noise simulator so as to allow assessment of the Verification Model's resilience to noise under more realistic conditions, laying the groundwork for a JPEG DNA standard that is robust to insertion, deletion, and substitution errors.
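As a rough illustration of what such a noise simulator does (this is not the committee's simulator, and the error probabilities are invented), the following snippet applies independent substitution, insertion, and deletion errors to a nucleotide string:

```python
import random

def noisy_channel(seq, p_sub=0.01, p_ins=0.005, p_del=0.005):
    """Apply i.i.d. substitution, insertion and deletion errors to a nucleotide
    string, mimicking (in a very simplified way) a DNA synthesis/sequencing
    channel. Probabilities are per input symbol and purely illustrative."""
    bases = "ACGT"
    out = []
    for b in seq:
        if random.random() < p_del:          # symbol lost
            continue
        if random.random() < p_sub:          # symbol replaced by another base
            b = random.choice([x for x in bases if x != b])
        out.append(b)
        if random.random() < p_ins:          # spurious symbol inserted
            out.append(random.choice(bases))
    return "".join(out)

print(noisy_channel("ACGTACGTACGTACGT"))
```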
JPEG XS
The JPEG Committee is happy to announce that the core parts of JPEG XS 3rd edition are ready for publication as International standards. The Final Draft International Standard for Part 1 of the standard – Core coding tools – was created at the last meeting in November 2023, and is scheduled for publication. DIS ballot results for Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – of the standard came back, allowing the JPEG Committee to produce and deliver the proposed IS texts to ISO. This means that Part 2 and Part 3 3rd edition are also scheduled for publication.
At this meeting, the JPEG Committee continued the work on Part 4 – Conformance testing, to provide the necessary test streams of the 3rd edition for potential implementors. A Committee Draft for Part 4 was issued. With Parts 1, 2, and 3 now ready, and Part 4 ongoing, the JPEG Committee initiated the 3rd edition of Part 5 – Reference software. A first Working Draft was prepared and work on the reference software will start.
Finally, experimental results were presented on how to use JPEG XS over 5G mobile networks for the transmission of low-latency, high-quality 4K/8K 360-degree views to mobile devices. This use case was added at the previous JPEG meeting. It is expected that the new use case can already be covered by the 3rd edition, meaning that no further updates to the standard would be necessary. However, investigations and experimentation on this subject continue.
JPEG XL
The second edition of JPEG XL Part 3 (Conformance testing) has proceeded to the DIS stage. Work on a hardware implementation continues. Experiments are planned to investigate HDR compression performance of JPEG XL.
“In its efforts to provide standardized solutions to ascertain authenticity and provenance of visual information, the JPEG Committee has released the Draft International Standard of JPEG Trust. JPEG Trust will bring trustworthiness back to imaging with specifications under the governance of the entire international community and stakeholders, as opposed to a small number of companies or countries.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Abstract: Energy efficiency has become a crucial aspect of today’s IT infrastructures, and video (streaming) accounts for over half of today’s Internet traffic. This column highlights open-source tools, datasets, and solutions addressing energy efficiency in video streaming presented at ACM Multimedia Systems 2024 and its co-located workshop ACM Green Multimedia Systems.
Introduction
Across various platforms, users seek the highest Quality of Experience (QoE) in video communication and streaming. Whether it’s a crucial business meeting or a relaxing evening of entertainment, individuals expect seamless, high-quality video experiences. However, meeting this demand comes at a cost: increased energy usage [1],[2]. This energy consumption occurs at every stage of the process, from content provision via cloud services to consumption on end users’ devices [3]. Unless renewable energy sources are used, this heightened energy consumption leads to higher CO2 emissions, posing environmental challenges and emphasizing the need for studies that assess the carbon footprint of video streaming.
Content provision is a critical stage in video streaming, involving encoding videos into various formats, resolutions, and bitrates. Encoding demands computing power and energy, especially in cloud-based systems. Cloud computing has become popular for video encoding due to its scalability [4], adjusting cloud resources to handle changing workloads, and its flexibility [5], allowing operations to be scaled based on demand. However, this convenience comes at a cost. Data centers, the heart of cloud computing, consume a significant portion of global electricity, around 3% [6], and video encoding is one of the biggest energy consumers within these data centers. Therefore, optimizing video encoding for lower energy consumption is crucial for reducing the environmental impact of cloud-based video delivery.
Content consumption [7] involves the device using the network interface card to request and download video segments from the server, decompressing them for playback, and finally rendering the decoded frames on the screen, where the energy consumption depends on the screen technology and brightness settings.
The GAIA project showcased its research on the environmental impact of video streaming at the recent 15th ACM Multimedia Systems Conference (April 15-18, Bari, Italy). We presented our findings in the relevant conference sessions: the Open-Source Software and Datasets session and the Green Multimedia Systems (GMSys) workshop.
Open Source Software
GREEM: An Open-Source Benchmark Tool Measuring the Environmental Footprint of Video Streaming [PDF] [Github] [Poster]
GREEM (Gaia Resource Energy and Emission Monitoring) measures energy usage during video encoding and decoding. GREEM tracks the effects of video processing on hardware performance and provides a suite of analytical scenarios covering the most common video streaming situations, such as sequential and parallel video encoding and decoding. Its main features are listed below; a minimal sketch of such a measurement loop follows the list.
Automates experimentation: It allows users to easily configure and run various encoding scenarios with different parameters to compare results.
In-depth monitoring: The tool traces numerous hardware parameters, specifically monitoring energy consumption and GPU metrics, including core and memory utilization, temperature, and fan speed, providing a complete picture of video processing resource usage.
Visualization: GREEM offers scripts that generate analytic plots, allowing users to visualize and understand their measurement results easily.
Verifiable: GREEM empowers researchers with a tool that has earned the ACM Reproducibility Badge, which allows others to reproduce the experiments and results reported in the paper.
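The sketch below is not GREEM itself, but it conveys the core idea of this kind of tool: sample the GPU power draw (here via NVIDIA's NVML bindings) while an encoding command runs and integrate it into an energy figure. The ffmpeg NVENC command is only an example workload.

```python
import subprocess
import time
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

def measure_gpu_energy(cmd, interval=0.1):
    """Run `cmd` (e.g. an encode job) and integrate GPU power draw over time.
    Returns (energy in joules, wall-clock duration in seconds)."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    proc = subprocess.Popen(cmd)
    energy, start = 0.0, time.time()
    while proc.poll() is None:
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        energy += power_w * interval      # rectangle-rule integration
        time.sleep(interval)
    pynvml.nvmlShutdown()
    return energy, time.time() - start

# Example workload: a hypothetical NVENC encode of input.mp4
joules, seconds = measure_gpu_energy(
    ["ffmpeg", "-y", "-i", "input.mp4", "-c:v", "h264_nvenc", "out.mp4"])
print(f"{joules:.1f} J over {seconds:.1f} s ({joules / 3.6e6:.6f} kWh)")
```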
Open Source Datasets
VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances [PDF] [Github] [Poster]
As video encoding increasingly shifts to cloud-based services, concerns about the environmental impact of massive data centers arise. The Video Encoding Energy and CO2 Emissions Dataset (VEED) provides the energy consumption and CO2 emissions associated with video encoding on Amazon’s Elastic Compute Cloud (EC2) instances. Additionally, VEED goes beyond energy consumption, also capturing encoding duration and CPU utilization. A short usage sketch follows the list of contributions below.
Contributions:
Findability: A comprehensive metadata description file ensures VEED’s discoverability for researchers.
Accessibility: VEED is open for download on GitHub (https://github.com/cd-athena/VEEDdataset), removing access barriers for researchers. Core findings in the research that leverages the VEED dataset have been independently verified (ACM Reproducibility Badge).
Interoperability: The dataset is provided in a comma-separated value (CSV) format, allowing integration with various analysis applications.
Reusability: Description files empower researchers to understand the data structure and context, facilitating its use in diverse analytical projects.
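Because VEED ships as CSV, exploring it takes only a few lines of pandas. The column names in the sketch below are hypothetical placeholders; the actual schema is documented in the dataset's description files on GitHub.

```python
import pandas as pd

# Load the dataset; file name and column names are illustrative placeholders
df = pd.read_csv("veed.csv")

# Energy and emissions per EC2 instance type (hypothetical columns)
summary = (df.groupby("instance_type")[["energy_kwh", "co2_g"]]
             .agg(["mean", "sum"]))
print(summary)

# Which instance type has the lowest average CO2 per encoded video?
best = df.groupby("instance_type")["co2_g"].mean().idxmin()
print("Lowest average emissions:", best)
```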
COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming [PDF] [Github]
COCONUT is a dataset comprising the energy consumption of video streaming across various devices and different HAS (HTTP Adaptive Streaming) players. COCONUT captures user data during MPEG-DASH video segment streaming on laptops, smartphones, and other client devices, measuring energy consumption at different stages of streaming, including segment retrieval through the network interface card, video decoding, and rendering on the device. This paper has been designated the ACM Artifacts Available badge, signifying that the COCONUT dataset is publicly accessible. COCONUT can be accessed at https://athena.itec.aau.at/coconut/.
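One hypothetical way to slice such a dataset is by streaming stage and device class; as with the VEED sketch above, the file and column names below are placeholders rather than the published schema.

```python
import pandas as pd

df = pd.read_csv("coconut.csv")   # placeholder file and column names

# Share of energy spent in each stage (network / decode / render) per device
stage_cols = ["network_j", "decode_j", "render_j"]
per_device = df.groupby("device")[stage_cols].sum()
print(per_device.div(per_device.sum(axis=1), axis=0).round(2))
```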
Second International ACM Green Multimedia Systems Workshop — GMSys 2024
VEEP: Video Encoding Energy and CO2 Emission Prediction [pdf] [slides]
VEEP is a machine learning (ML) scheme that empowers users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding. Its contributions are listed below; a minimal prediction sketch follows the list.
Contributions:
Content-aware energy prediction: VEEP analyzes video content to extract features impacting encoding complexity. These features feed an ML model that accurately predicts the energy consumption required for encoding the video on AWS EC2 instances (high accuracy: an R² score of 0.96).
Real-time carbon footprint: VEEP goes beyond energy; it also factors in real-time carbon intensity data based on the location of the cloud instance. This allows VEEP to calculate the associated CO2 emissions for an encoding task at encoding time.
Resulting impact: by carefully selecting the type and location of cloud instances based on VEEP’s predictions, CO2 emissions can be reduced by a factor of up to 375. This significant reduction demonstrates VEEP’s potential to contribute to greener video encoding.
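The sketch below mimics the VEEP pipeline at toy scale: a regression model maps content features to encoding energy, and the prediction is multiplied by a region-specific carbon intensity to obtain CO2. The features, labels, and model choice are placeholders, not VEEP's actual design.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder per-video complexity features and measured encoding energy (Wh);
# in VEEP these would come from content analysis and real measurements.
rng = np.random.default_rng(0)
X_train = rng.random((500, 4))
y_train = 20 + 80 * X_train[:, 0] + rng.normal(scale=2.0, size=500)

model = GradientBoostingRegressor().fit(X_train, y_train)

def predict_co2(features, carbon_intensity_g_per_kwh):
    """Predicted energy (Wh) times region- and time-specific carbon intensity."""
    energy_wh = float(model.predict(features.reshape(1, -1))[0])
    return energy_wh / 1000.0 * carbon_intensity_g_per_kwh   # grams of CO2

# Example: encoding in a region currently at 300 gCO2/kWh
print(predict_co2(np.array([0.4, 0.7, 0.1, 0.9]), 300.0))
```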
Conclusions
This column provided an overview of the GAIA project’s research on the environmental impact of video streaming, presented at the 15th ACM Multimedia Systems Conference. The GREEM measurement tool empowers developers and researchers to measure the energy consumption and CO2 emissions of video processing. VEED provides valuable insights into energy consumption and CO2 emissions during cloud-based video encoding on AWS EC2 instances. COCONUT sheds light on energy usage during video playback on various devices and with different players, aiding in optimizing client-side video streaming. Furthermore, VEEP, a machine learning framework, takes energy efficiency a step further: it predicts the energy consumption and CO2 emissions associated with cloud-based video encoding, allowing users to select cloud instances that minimize environmental impact. These studies can help researchers, developers, and service providers optimize video streaming for a more sustainable future, and their combined focus on encoding and playback highlights the importance of a holistic approach that considers the entire video streaming lifecycle. While these papers primarily address the environmental impact of video streaming, a strong connection exists between energy efficiency and QoE [8],[9],[10]. Optimizing video processing for lower energy consumption can lead to trade-offs in video quality. Future research could explore techniques for optimizing video processing while ensuring a consistently high QoE for viewers.
References
[1] A. Katsenou, J. Mao, and I. Mavromatis, “Energy-Rate-Quality Tradeoffs of State-of-the-Art Video Codecs.” arXiv, Oct. 02, 2022. Accessed: Oct. 06, 2022. [Online]. Available: http://arxiv.org/abs/2210.00618
[2] H. Amirpour, V. V. Menon, S. Afzal, R. Prodan, and C. Timmerer, “Optimizing video streaming for sustainability and quality: The role of preset selection in per-title encoding,” in 2023 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2023, pp. 1679–1684. Accessed: May 05, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10219577/
[4] A. Atadoga, U. J. Umoga, O. A. Lottu, and E. O. Sodiya, “Evaluating the impact of cloud computing on accounting firms: A review of efficiency, scalability, and data security,” Glob. J. Eng. Technol. Adv., vol. 18, no. 2, pp. 065–075, Feb. 2024, doi: 10.30574/gjeta.2024.18.2.0027.
[5] B. Zeng, Y. Zhou, X. Xu, and D. Cai, “Bi-level planning approach for incorporating the demand-side flexibility of cloud data centers under electricity-carbon markets,” Appl. Energy, vol. 357, p. 122406, Mar. 2024, doi: 10.1016/j.apenergy.2023.122406.
[7] C. Yue, S. Sen, B. Wang, Y. Qin, and F. Qian, “Energy considerations for ABR video streaming to smartphones: Measurements, models and insights,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 153–165, doi: 10.1145/3339825.3391867.
[8] G. Bingöl, A. Floris, S. Porcu, C. Timmerer, and L. Atzori, “Are Quality and Sustainability Reconcilable? A Subjective Study on Video QoE, Luminance and Resolution,” in 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2023, pp. 19–24. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10178513/
[9] G. Bingöl, S. Porcu, A. Floris, and L. Atzori, “An Analysis of the Trade-Off Between Sustainability and Quality of Experience for Video Streaming,” in 2023 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2023, pp. 1600–1605. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10283614/
[10] C. Herglotz, W. Robitza, A. Raake, T. Hossfeld, and A. Kaup, “Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming.” arXiv, May 24, 2023. doi: 10.48550/arXiv.2305.15117.
In February 2024, the first edition of the Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School was held; with the support of SIGMM, it attracted more than 50 students and young researchers to learn about, discuss, and experiment first-hand with topics related to social robotics. The event’s success calls for further editions in upcoming years.
Rationale for SoRAIM
SPRING, a collaborative research project funded by the European Commission under Horizon 2020, is coming to an end in May 2024. Its scientific and technological objectives, namely to test a versatile social robotic platform within a hospital and have it perform social activities in a multi-person, dynamic setup, have for the most part been achieved. To empower the next generation of young researchers with the concepts and tools needed to answer tomorrow’s challenges in social robotics, the issue of knowledge and know-how transmission must be tackled. We therefore chose to organise a winter school, free of charge to the participants (thanks to the additional support of SIGMM), so that as many students and young researchers as possible, from various horizons and not only technical fields, could attend.
Contents of the Winter School
The Social Robotics, Artificial Intelligence and Multimedia (SoRAIM) Winter School took place from 19 to 23 February 2024 in Grenoble, France. The first day opened with an introduction to the contents of the school and to the context provided by the SPRING project, followed by a demonstration combining social navigation and dialogue interaction. This sparked the participants’ curiosity and led to a spontaneous Q&A session fed by their contributions, questions and comments.
The school spanned the entire week, with 17 talks given by 8 speakers from the H2020 SPRING project and 9 invited speakers external to the project. The school also included a panel discussion on the topic “Are social robots already out there? Immediate challenges in real-world deployment”, a poster session with 15 contributions, and two hands-on sessions where participants could choose among the following topics: Robot navigation with Reinforcement Learning, ROS4HRI: How to represent and reason about humans with ROS, Building a conversational system with LLMs using prompt engineering, Robot self-localisation based on camera images, and Speaker extraction from microphone recordings. A social activity (a visit of Grenoble’s downtown and the Bastille) was organised on Thursday afternoon, allowing participants to mingle with the speakers and to discover the host town’s history.
One of the highlights of SoRAIM was its panel session, whose topic was “Are social robots already out there? Immediate challenges in real-world deployment”. Although no definitive answers were found, the session stressed that the challenges to deploying actual social robots in our everyday lives (at work, at home) remain numerous. On the technical side, robotic platforms are subject to hardware and software constraints: sensors and actuators are restricted in size, power and performance, since physical space and battery capacity are limited, and large models can only be used if substantial computing resources are permanently available, which is not always the case, since resources must be shared among the various computing modules. On the regulatory and legal side, the rise of AI use is fast and needs to be balanced with ethical views that address our society’s needs, yet the construction of proper laws and norms, and their acknowledgement and understanding by stakeholders, is slow. In this session the panellists surveyed all aspects of the problems at hand and provided an overview of the challenges that future scientists will need to solve in order to take social robots out of the labs and into the world.
Attendance & future perspectives
SoRAIM attracted 57 participants over the whole week. The attendees were diverse, as initially aimed for, with a breakdown of 50% PhD students, 20% young researchers (public sector), 10% engineers and young researchers (private sector), and 20% MSc students. Of particular note, the ratio of women attendees was close to 40%, roughly double the usual in this field. Finally, in terms of geographic spread, the majority of attendees came from other European countries (17 countries in total), with just under 50% coming from France. Following the school, a satisfaction survey was sent to the attendees to better grasp which elements were the most appreciated, in view of the longer-term objective of holding this winter school as a serial event. Given the diverse backgrounds of attendees, opinions on contents such as the hands-on sessions varied, but overall satisfaction was very high, which shows the interest of the next generation of researchers in more opportunities to learn in this field. We are currently reviewing options to hold similar events every year or every two years, depending on available funding.
More information about the SoRAIM winter school is available on the webpage: https://spring-h2020.eu
Sponsors
SoRAIM was sponsored by the H2020 SPRING project, Inria, the University Grenoble Alpes, the Multidisciplinary Institute of Artificial Intelligence and by ACM’s Special Interest Group on Multimedia (SIGMM). Through ACM SIGMM, we received significant funding which allowed us to invite 14 students and young researchers, members of SIGMM, from abroad.
Overall description of the software architecture used in the SPRING project, by Dr. Séverin Lemaignan (https://academia.skadge.org/, PAL Robotics).
Autonomous Robots in the Wild – Adapting From and for Interaction, by Prof. Marc Hanheide (https://www.hanheide.net/, University of Lincoln).
AI and Children’s Rights: Lessons Learnt from the Implementation of the UNICEF Policy Guidance to Social Robots for Children, by Dr. Vasiliky Charisi (https://vickycharisi.wordpress.com/about/, University College London).
Opportunities and Challenges in Putting AI Ethics in Practice: the Role of the EU, by Dr. Mihalis Kritikos (https://www.linkedin.com/in/mihalis-kritikos-43243087/, Ethics and Integrity Sector of the European Commission).
Robust Audio-Visual Perception of Humans, by Prof. Sharon Gannot (https://sharongannot.group/, Bar-Ilan University).
Predictive modelling of turn-taking in human-robot interaction, by Prof. Gabriel Skantze (https://www.kth.se/profile/skantze, KTH, Kungliga Tekniska Högskolan).
Multi-User Spoken Conversations with Robots, by Dr. Daniel Hernandez Garcia (https://dhgarcia.github.io/, Heriot Watt University).
Learning Robot Behaviour, by Dr. Chris Reinke (https://www.scirei.net/, Inria at Univ. Grenoble Alpes).
Human-Interactive Mobile Robots: from Learning to Deployment, by Prof. Xuesu Xiao (https://cs.gmu.edu/~xiao/, George Mason University).
Human-Presence Modeling and Social Navigation of an Assistive Robot Solution for Detection of Falls and Elderly’s Support, by Prof. Antonios Gasteratos (https://robotics.pme.duth.gr/antonis/, Democritus University of Thrace).
Robotic Coaches for Mental Wellbeing: From the Lab to the Real World, by Prof. Hatice Gunes (https://www.cl.cam.ac.uk/~hg410/, University of Cambridge)
JPEG Trust reaches Committee Draft stage at the 101st JPEG meeting
The 101st JPEG meeting was held online, from the 30th of October to the 3rd of November 2023. At this meeting, JPEG Trust became a Committee Draft. In addition, JPEG analyzed the responses to its Call for Proposals for JPEG DNA.
The 101st JPEG meeting had the following highlights:
JPEG Trust reaches Committee Draft;
JPEG AI requests its re-establishment;
JPEG Pleno Learning-based Point Cloud coding establishes a new Verification Model;
JPEG Pleno organizes a Light Field Industry Workshop;
JPEG AIC-3 continues the evaluation of contributions;
JPEG XE produces a first draft of the Common Test Conditions;
JPEG DNA analyses the responses to the Call for Proposals;
JPEG XS proceeds with the development of the 3rd edition;
JPEG XL proceeds with the development of the 2nd edition.
The following sections summarize the main highlights of the 101st JPEG meeting.
JPEG Trust
The 101st meeting marked an important milestone for the JPEG Trust project, with its Committee Draft (CD) for Part 1 “Core Foundation” (21617-1) of the standard approved for consultation. It is expected that a Draft International Standard (DIS) of the Core Foundation will be approved at the 102nd JPEG meeting in January 2024, which will be another important milestone. This rapid schedule is necessitated by the speed at which fake media and misinformation are proliferating, especially in respect of generative AI.
Aligned with JPEG Trust, the NFT Call for Proposals (CfP) has yielded two expressions of interest to date, and submission of proposals is still open till the 15th of January 2024.
Additionally, the Use Cases and Requirements document for JPEG Fake Media (the JPEG Fake Media exploration preceded the initiation of the JPEG Trust international standard) was updated to reflect the change to JPEG Trust as well as incorporate additional use cases that have arisen since the previous JPEG meeting, namely in respect of composited images. This document is publicly available on the JPEG website.
JPEG AI
At the 101st meeting, the JPEG Committee issued a request for re-establishing the JPEG AI (6048-1) project, along with a Committee Draft (CD) of its version 1. A new JPEG AI timeline has also been approved and is now publicly available, where a Draft International Standard (DIS) of the Core Coding Engine of JPEG AI version 1 is foreseen at the 103rd JPEG meeting (April 2024), a rather important milestone for JPEG AI. The JPEG Committee also established that JPEG AI version 2 will address requirements not yet fulfilled (especially regarding machine consumption tasks) as well as significant improvements on requirements already addressed in version 1, e.g. compression efficiency. JPEG AI version 2 will issue the final Call for Proposals in January 2025, and the presentation and evaluation of JPEG AI version 2 proposals will occur in July 2025. During 2023, the JPEG AI Verification Model (VM) has evolved from a complex system (800 kMAC/pxl) to two acceptable complexity-efficiency operation points, providing 11% compression efficiency gains at 20 kMAC/pxl and 25% compression efficiency gains at 200 kMAC/pxl. The decoder for the lower-end operating point has now been implemented on mobile devices and demonstrated during the 100th and 101st JPEG meetings. A presentation with the JPEG AI architecture, networks, and tools is now publicly available. To avoid project delays in the future, the promising input contributions from the 101st meeting will be combined in JPEG AI Core Experiment 6.1 (CE6.1) to study interactions and resolve potential issues during the next meeting cycle. After this integration, a model will be trained and cross-checked to be approved for release (JPEG AI VM5 release candidate) along with the study DIS text. Among the promising technologies included in CE6.1 are high-quality and variable-rate improvements with a smaller number of models (from 5 to 4), and a multi-branch decoder that allows up to three reconstructions with different levels of quality from the same latent representation, using synthesis transform networks of different complexity, along with several post-filter and arithmetic coder simplifications.
JPEG Pleno Learning-based Point Cloud coding
The JPEG Pleno Learning-based Point Cloud coding activity progressed at the 101st meeting with a major investigation into point cloud quality metrics. The JPEG Committee decided to continue this investigation into point cloud quality metrics as well as explore possible advancements to the VM in the areas of parameter tuning and support for residual lossless coding. The JPEG Committee is targeting a release of the Committee Draft of Part 6 of the JPEG Pleno standard relating to Learning-based point cloud coding at the 102nd JPEG meeting in San Francisco, USA in January 2024.
JPEG Pleno Light Field
The JPEG Committee has been creating several standards to provision the dynamic demands of the market, with its royalty-free patent licensing commitments. A light field coding standard has recently been developed, and JPEG Pleno is constantly exploring novel light field coding architectures.

The JPEG Committee is also preparing standardization activities – among others – in the domains of objective and subjective quality assessment for light fields, improved light field coding modes, and learning-based light field coding.

A Light Field Industry Workshop takes place on November 22nd, 2023, aiming at providing a forum for industrial actors to exchange information on their needs and expectations with respect to standardization activities in this domain.
JPEG AIC
During the 101st JPEG meeting, the AIC activity continued its efforts on the evaluation of the contributions received in April 2023 in response to the Call for Contributions on Subjective Image Quality Assessment. Notably, the activity is currently investigating three different subjective image quality assessment methodologies. The results of the newly established Core Experiments will be considered during the design of the AIC-3 standard, which has been carried out in a collaborative way since its beginning.

The AIC activity also initiated the discussion on Part 4 of the standard on Objective Image Quality Metrics (AIC-4) by refining the Use Cases and Requirements document. During the 102nd JPEG meeting in January 2024, the activity is planning to work on the Draft Call for Proposals on Objective Image Quality Metrics.
JPEG XE
The JPEG Committee continued its activity on Event-based Vision. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE aims at the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. For better dissemination and raising external interest, a workshop around Event-based Vision was organized and took place on Oct 24th, 2023. The workshop triggered the attention of various stakeholders in the field of Event-based Vision, who will start contributing to JPEG XE. The workshop proceedings will be made available on jpeg.org. In addition, the JPEG Committee created a minor revision for the Use cases and Requirements as v1.0, adding an extra use case on scientific and engineering measurements. Finally, a first draft of the Common Test Conditions for JPEG XE was produced, along with the first Exploration Experiments to start practical experiments in the coming 3-month period until the next JPEG meeting. The public Ad-hoc Group on Event-based Vision was re-established to continue the work towards the next 102nd JPEG meeting in January of 2024. To stay informed about the activities please join the Event-based Vision Ad-hoc Group mailing list.
JPEG DNA
As a result of the Call for Proposals issued by the JPEG Committee for contributions to the JPEG DNA standard, 5 proposals were submitted under three distinct codecs by three organizations. Two codecs were submitted to both the coding and transcoding categories, and one was submitted to the coding category only. All proposals showed improved compression efficiency when compared to three anchors selected by the JPEG Committee. After a rigorous analysis of the proposals and their cross-checking by independent parties, it was decided to create a first Verification Model (VM) based on V-DNA, the best performing proposal. In addition, a number of core experiments were designed to improve the JPEG DNA VM with elements from the other submitted proposals, by quantifying their added value when integrated in the VM.
JPEG XS
The JPEG Committee continued its work on JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. The Final Draft International Standard for Part 1 of the standard – Core coding tools – was produced at this meeting. With this FDIS version, all technical features are now fixed and completed. Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – of the standard are still in DIS ballot, and ballot results will only be known by the end of January 2024. The JPEG Committee is now working on Part 4 – Conformance testing – to provide the necessary test streams of the 3rd edition for potential implementors. A first Working Draft for Part 4 was issued. Completion of the JPEG XS 3rd edition is scheduled for April 2024 (Parts 1, 2, and 3), and Parts 4 and 5 will follow shortly after that. Finally, a new Use Cases and Requirements for JPEG XS document was created, containing a new use case on using JPEG XS for the transport of 4K/8K video over 5G mobile networks. It is expected that the new use case can already be covered by the 3rd edition, meaning that no further updates to the standard would be needed. However, more investigations and experimentation will be conducted on this subject.
JPEG XL
The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the FDIS stage, and the second edition of JPEG XL Part 3 (Conformance testing) has proceeded to the CD stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. At the same time, the development of hardware implementation solutions continues.
Final Quote
“The release of the first Committee Draft of JPEG Trust is a strong signal that the JPEG Committee is reacting with a timely response to demands for solutions that inform users when digital media assets are created or modified, in particular through Generative AI, hence contributing to bringing back trust into media-centric ecosystems.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.