VQEG Column: VQEG Meeting December 2022

Introduction

This column provides an overview of the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 12 to 16 December 2022. Around 100 participants from 21 different countries around the world registered for the meeting, which was organized online by Brightcove (United Kingdom). During the five days, there were more than 40 presentations and discussions among researchers working on topics related to the projects ongoing within VQEG. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Of particular interest are the proposals to update and merge ITU-T recommendations P.913, P.911, and P.910, the kick-off of the test plan to evaluate the QoE of immersive interactive communication systems, and the creation of a new group on emerging technologies that will start working on AI-based technologies and greening of streaming and related trends.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 12-16 December 2022 (online).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analysing commonly available video systems. Currently, there are two projects ongoing under this group: Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

In this meeting, there were three presentations related to topics covered by this group. In the first one, Maria Martini (Kingston University, UK) presented her work on converting video quality assessment metrics. In particular, the work addressed the relationship between SSIM and PSNR for DCT-based compressed images and video, exploiting the content-related factor [1]. The second presentation was given by Urvashi Pal (Akamai, Australia) and dealt with video codec profiling with video quality assessment complexities and resolutions. Finally, Jingwen Zhu (Nantes Université, France) presented her work on the benefit of parameter-driven approaches for the modelling and the prediction of a Satisfied User Ratio for compressed videos [2].
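To make the notion of a Satisfied User Ratio (SUR) concrete, the following is a minimal sketch of its usual empirical form: the fraction of subjects whose individual just noticeable difference (JND) point lies above a given distortion level. It is not the parameter-driven model presented in [2]. The function names, the sample JND values, and the 75% target are assumptions chosen for illustration only.

```python
def satisfied_user_ratio(jnd_points, level):
    """Fraction of subjects whose JND lies above the given distortion level.

    jnd_points: one JND value per subject (hypothetical units, e.g. QP).
    A subject counts as satisfied while the level stays below their JND.
    """
    if not jnd_points:
        raise ValueError("need at least one JND sample")
    satisfied = sum(1 for jnd in jnd_points if jnd > level)
    return satisfied / len(jnd_points)


def highest_level_meeting(jnd_points, levels, target=0.75):
    """Largest candidate level whose SUR is still at least `target`, or None."""
    ok = [lv for lv in levels if satisfied_user_ratio(jnd_points, lv) >= target]
    return max(ok) if ok else None


# Hypothetical per-subject JND points, not taken from any real dataset.
jnds = [30, 32, 35, 38, 40, 41, 44, 47]
print(satisfied_user_ratio(jnds, 35))
print(highest_level_meeting(jnds, range(25, 50)))
```

Whether the comparison should be strict and which target ratio is appropriate depend on the study design, so both are left as easy-to-change choices here.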

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. Currently, there is an open discussion on new topics to address within the group, such as the application of visual attention models and studies to health applications. Also, an opportunity to conduct medical perception research was announced, which was proposed by Elizabeth Krupinski and will take place at the European Congress of Radiology (Vienna, Austria, Mar. 2023).

In addition, four research works were presented at the meeting. Firstly, Julie Fournier (INSA Rennes, France) presented new insights on affinity therapy for people with ASD, based on an eye-tracking study on images. The second presentation was delivered by Lumi Xia (INSA Rennes, France) and dealt with the evaluation of the usability of deep learning-based denoising models for low-dose CT simulation. Also, Mohamed Amine Kerkouri (University of Orleans, France) presented his work on deep-based quality assessment of medical images through domain adaptation. Finally, Jorge Caviedes (ASU, USA) delivered a talk on cognition-inspired diagnostic image quality models, emphasising the need to distinguish among interpretability (e.g., medical professional is confident in making a diagnosis), adequacy (e.g., capture technique shows the right area for assessment), and visual quality (e.g., MOS) in quality assessment of medical contents.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910. The suggestion is to make P.910 and P.911 obsolete and make P.913 the only recommendation from ITU-T on subjective video quality assessments. The group worked on the liaison and document to be sent to ITU-T SG12, which will be available in the meeting files.

In addition, Mohsen Jenadeleh (University of Konstanz, Germany) presented his work on collective just noticeable difference assessment for compressed video with Flicker Test and QUEST+.

Computer Generated Imagery (CGI)

The CGI group is devoted to analysing and evaluating computer-generated content, with a focus on gaming in particular. The group is currently working in collaboration with ITU-T SG12 on the work item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) provided an update on the ongoing activities. In addition, they are working on two new work items: G.OMMOG on Opinion Model for Mobile Online Gaming applications and P.CROWDG on Subjective Evaluation of Gaming Quality with a Crowdsourcing Approach. Also, the group is working on identifying other topics and interests in CGI other than gaming content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata. 
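As a rough illustration of what SI and TI measure, the sketch below follows the commonly cited P.910 formulation: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. Border handling (interior pixels only), the use of the population standard deviation, and the luma value range are assumptions here, and are exactly the kind of detail the group is clarifying. Frames are plain nested lists of luma values, and the function names are hypothetical.

```python
import math
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitude at every interior pixel of a 2-D luma frame."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: maximum over frames of the std. dev. of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over frame pairs of the std. dev. of the pixel-wise difference.

    Requires at least two frames of identical size.
    """
    per_pair = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur)
                for p, c in zip(row_p, row_c)]
        per_pair.append(pstdev(diff))
    return max(per_pair)


# Two tiny synthetic frames: a flat one and one with a vertical edge.
flat = [[10] * 6 for _ in range(6)]
edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
print(spatial_information([flat, edge]), temporal_information([flat, edge]))
```

A pure-Python loop like this is only practical for small frames; a real implementation would typically operate on arrays.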

In relation to the first topic, Margaret Pinson (NTIA/ITS, US) talked about why no-reference metrics for image and video quality lack accuracy and reproducibility [3] and presented new datasets containing camera noise and compression artifacts for the development of no-reference metrics by the group. In addition, Oliver Wiedeman (University of Konstanz, Germany) presented his work on cross-resolution image quality assessment.

Regarding the computation of complexity indices, Maria Martini (Kingston University, UK) presented a study comparing 12 metrics (and possible combinations) for assessing video content complexity. Vignesh V. Menon (University of Klagenfurt, Austria) presented a summary of live per-title encoding approaches using video complexity features. Ioannis Katsavounidis and Cosmin Stejerean (Meta, US) presented their work on using motion search to order videos by coding complexity, also making available the software in open source. In addition, they led a discussion on supplementing classic SI and TI with improved complexity metrics (VCA, motion search, etc.).

Finally, related to the third topic, Ioannis Katsavounidis (Meta, US) provided an update on the status of the project. Given that the idea is already mature enough, a contribution will be made to MPEG to consider the insertion of metadata of video metrics into the encoded video streams. In addition, a liaison with AOMedia will be established that may go beyond this particular topic and include best practices on subjective testing, IMG topics, etc.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on a joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. Currently, the group is working on research problems rather than algorithms and models with immediate applicability. In addition, the group has launched a new website, which includes a list of activities of interest, freely available publications, and other resources. 

Two examples of research problems addressed by the group were shown by the two presentations given by Lohic Fotio Tiotsop (Politecnico di Torino, Italy). The topic of the first presentation was related to the training of artificial intelligence observers for a wide range of applications, while the second presentation provided guidelines to train, validate, and publish DNN-based objective measures.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) presented an overview of activities related to QoE and XR within 3GPP.

Immersive Media Group (IMG)

The IMG group is focused on the research on quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems. After the discussions that took place in previous meetings and audio calls, a tentative schedule has been proposed to start the execution of the test plan in the following months. In this sense, a new work item will be proposed in the next ITU-T SG12 meeting to establish a collaboration between VQEG-IMG and ITU on this topic.

In addition to this, a variety of different topics related to immersive media technologies were covered in the works presented during the meeting. For example, Yaosi Hu (Wuhan University, China) presented her work on video quality assessment based on quality aggregation networks. In relation to light field imaging, Maria Martini (Kingston University, UK) outlined the main problems that current light field quality assessment datasets face and presented a new dataset. Also, there were three talks by researchers from CWI (Netherlands) dealing with point cloud QoE assessment: Silvia Rossi presented a behavioral analysis in a 6-DoF VR system, taking into account the influence of content, quality and user disposition [4]; Shishir Subramanyam presented his work related to the subjective QoE evaluation of user-centered adaptive streaming of dynamic point clouds [5]; and Irene Viola presented a point cloud objective quality assessment using PCA-based descriptors (PointPCA). Another presentation related to point cloud quality assessment was delivered by Marouane Tliba (Université d’Orleans, France), who presented an efficient deep-based graph objective metric.

In addition, Shirin Rafiei (RISE, Sweden) gave a talk on UX and QoE aspects of remote control operations using a laboratory platform, Marta Orduna (Universidad Politécnica de Madrid, Spain) presented her work on comparing ACR, SSDQE, and SSCQE in long duration 360-degree videos, whose results will be used to submit a proposal to extend ITU-T Rec. P.919 for long sequences, and Ali Ak (Nantes Université, France) presented his work on just noticeable differences for HDR/SDR image/video quality.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. Four presentations were delivered in this meeting addressing diverse related topics. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented a method for assessing objective video quality for automatic license plate recognition tasks [6]. Also, Femi Adeyemi-Ejeye (University of Surrey, UK) presented his work related to the assessment of rail 8K-UHD CCTV facing video for the investigation of collisions. The third presentation dealt with the application of facial expression recognition and was delivered by Lucie Lévêque (Nantes Université, France), who compared the robustness of humans and deep neural networks on this task [7]. Finally, Alban Marie (INSA Rennes, France) presented a study on video coding for machines through a large-scale evaluation of DNN robustness to compression artefacts for semantic segmentation [8].

Other updates

In relation to the Human Factors for Visual Experiences (HFVE) group, Maria Martini (Kingston University, UK) provided a summary of the status of IEEE recommended practice for the quality assessment of light field imaging. Also, Kjell Brunnström (RISE, Sweden) presented a study related to the perceptual quality of video on simulated low temperatures in LCD vehicle displays.

In addition, a new group was created in this meeting called Emerging Technologies Group (ETG), whose main objective is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. In particular, two major topics of interest have been identified so far: AI-based technologies and greening of streaming and related trends. Nevertheless, the group aims to provide a common platform for people to gather together and discuss new emerging topics, possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

Moreover, it was agreed during the meeting to make the Psycho-Physiological Quality Assessment (PsyPhyQA) group dormant until interest resumes in this effort. Also, it was proposed to move the Implementer’s Guide for Video Quality Metrics (IGVQM) project into the JEG-Hybrid, since their activities are currently closely related. This will be discussed in future group meetings and the final decisions will be announced. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place in May 2023 and the location will be announced soon on the VQEG website.

References

[1] Maria G. Martini, “On the relationship between SSIM and PSNR for DCT-based compressed images and video: SSIM as content-aware PSNR”, TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.21725390.v1, 2022.
[2] J. Zhu, P. Le Callet, A. Perrin, S. Sethuraman, K. Rahul, “On The Benefit of Parameter-Driven Approaches for the Modeling and the Prediction of Satisfied User Ratio for Compressed Video”, IEEE International Conference on Image Processing (ICIP), Oct. 2022.
[3] Margaret H. Pinson, “Why No Reference Metrics for Image and Video Quality Lack Accuracy and Reproducibility”, Frontiers in Signal Processing, Jul. 2022.
[4] S. Rossi, I. Viola, P. Cesar, “Behavioural Analysis in a 6-DoF VR System: Influence of Content, Quality and User Disposition”, Proceedings of the 1st Workshop on Interactive eXtended Reality, Oct. 2022.
[5] S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, P. Cesar, “Subjective QoE Evaluation of User-Centered Adaptive Streaming of Dynamic Point Clouds”, International Conference on Quality of Multimedia Experience (QoMEX), Sep. 2022.
[6] M. Leszczuk, L. Janowski, J. Nawała, and A. Boev, “Method for Assessing Objective Video Quality for Automatic License Plate Recognition Tasks”, Communications in Computer and Information Science, Oct. 2022.
[7] L. Lévêque, F. Villoteau, E. V. B. Sampaio, M. Perreira Da Silva, and P. Le Callet, “Comparing the Robustness of Humans and Deep Neural Networks on Facial Expression Recognition”, Electronics, 11(23), Dec. 2022.
[8] A. Marie, K. Desnos, L. Morin, and Lu Zhang, “Video Coding for Machines: Large-Scale Evaluation of Deep Neural Networks Robustness to Compression Artifacts for Semantic Segmentation”, IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2022.

MPEG Column: 140th MPEG Meeting in Mainz, Germany

After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany. The official press release can be found here and comprises the following items:

  • MPEG evaluates the Call for Proposals on Video Coding for Machines
  • MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
  • MPEG reaches the First Milestone for Haptics Coding
  • MPEG completes a New Standard for Video Decoding Interface for Immersive Media
  • MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
  • MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive video data is increasingly analyzed without human intervention, leading to a new paradigm referred to as Video Coding for Machines (VCM), which targets both (i) conventional video coding and (ii) feature coding (see here for further details).

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by various proposals.

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:

  • For the tested video dataset, increases in compression efficiency of up to 87% compared to the video anchor and over 90% compared to the feature anchor were reported.
  • For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.

Research aspects: the main research area is still the same as described in my last column, i.e., compression efficiency (incl. probably runtime, sometimes called complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation, object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).

Research aspect: VDI is also a promising standard to enable the implementation of viewport adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.
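To illustrate the idea that only the tiles overlapping the current viewport need to be requested and decoded, here is a minimal sketch for an equirectangular 360-degree frame split into equal-width tile columns. It does not use or model the VDI API itself; the function name, the column-only tiling, and the horizontal-only field of view are simplifying assumptions.

```python
def visible_tile_columns(yaw_deg, fov_deg, n_cols):
    """Indices of tile columns that overlap a viewport centred at yaw_deg.

    The 360-degree frame is assumed to be split into n_cols equal columns,
    column 0 starting at 0 degrees. Only horizontal overlap is considered.
    """
    tile_width = 360.0 / n_cols
    visible = []
    for col in range(n_cols):
        center = (col + 0.5) * tile_width
        # Signed angular offset from the viewport centre, wrapped to [-180, 180).
        offset = (center - yaw_deg + 180.0) % 360.0 - 180.0
        # Two arcs overlap when their centres are closer than the sum of half-widths.
        if abs(offset) < (fov_deg + tile_width) / 2.0:
            visible.append(col)
    return visible


# Hypothetical setup: 8 tile columns, 90-degree horizontal field of view.
for yaw in (0, 90, 180):
    cols = visible_tile_columns(yaw, 90, 8)
    print(yaw, cols, f"{len(cols)} of 8 streams need a decoder")
```

In this toy setup the number of tiles to decode is well below the number of elementary streams, which is the situation in which sharing a small number of decoder instances becomes attractive.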

MPEG-DASH Updates

Finally, I’d like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, amendments, Defects under Investigation (DuI), and Technologies under Consideration (TuC) are output documents, as well as a new working draft called Redundant encoding and packaging for segmented live media (REAP), which eventually will become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 97th JPEG Meeting

JPEG initiates specification on fake media based on responses to its call for proposals

The 97th JPEG meeting was held online from 24 to 28 October 2022. JPEG received responses to the Call for Proposals (CfP) on JPEG Fake Media, the first multimedia international standard designed to facilitate the secure and reliable annotation of media assets creation and modifications. In total, six responses were received addressing different requirements in the scope of this standardization initiative. Moreover, relevant advances were made on the standardization of learning-based coding, notably the learning-based coding of images, JPEG AI, and JPEG Pleno point cloud coding. Furthermore, the explorations on quality assessment of images, JPEG AIC, and of JPEG Pleno light field had relevant advances with the definition of their Calls for Contributions and Common Test Conditions.

Also relevant, the 98th JPEG meeting will be held in Sydney, Australia, representing a return to physical meetings after the long COVID pandemic. The last physical meeting was held in January 2020 at the same location.

The 97th JPEG meeting had the following highlights:

  • JPEG Fake Media responses to the Call for Proposals analysed,
  • JPEG AI Verification Model,
  • JPEG Pleno Learning-based Point Cloud coding Verification Model,
  • JPEG Pleno Light Field issues a Call for Contributions on Subjective Light Field Quality Assessment,
  • JPEG AIC issues a Call for Contributions on Subjective Image Quality Assessment,
  • JPEG DNA releases a draft of Common Test Conditions,
  • JPEG XS prepares third edition of core coding system, and profiles and buffer models,
  • JPEG 2000 conformance is under development.

Fig. 1: Fake Media application scenarios: Good faith vs Malicious intent.

The following summarises the major achievements of the 97th JPEG meeting.

JPEG Fake Media

In April 2022, the JPEG Committee released a Final Call for Proposals on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media assets creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. During the 97th meeting in October 2022, the following six responses to the call were presented:

  1. Adobe/C2PA: C2PA Specification
  2. Huawei: Provenance and Right Management for Digital Contents in JPEG Fake Media
  3. Sony Group Corporation: Methods to keep track provenance of media asset and signing data
  4. Vrije Universiteit Brussel/imec: Media revision history tracking via asset decomposition and serialization
  5. UPC: MIPAMS Provenance module
  6. Newcastle University: Response to JPEG Fake Media standardization call

In the coming months, these proposals will be thoroughly evaluated following a process that is open, transparent, fair and unbiased and allows deep technical discussions to assess which proposals best address identified requirements. Based on the conclusions of these discussions, a new standard will be produced to address fake media and provide solutions for transparency related to media authenticity. The standard will combine the best elements of the six proposals.

To stay informed about the activities please join the JPEG Fake Media & NFT AHG mailing list and regularly check the JPEG website for the latest information.

JPEG AI

JPEG AI (ISO/IEC 6048) aims at the development of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over state-of-the-art image coding standards at similar subjective quality, and improved performance for image processing and computer vision tasks. The evaluation of the Call for Proposals responses had already confirmed the industry interest, and the subjective tests presented at the 96th JPEG meeting showed results that significantly outperform conventional image compression solutions. 

The JPEG AI Verification Model (VM) has been issued as the outcome of this meeting and follows the integration effort of several neural networks and tools. There are several characteristics that make the JPEG AI VM unique, such as the decoupling of the entropy decoding from the sample reconstruction and the exploitation of the spatial correlation between latents using a prediction and a fusion network as well as a massively parallelized auto-regressive network. The performance evaluation has shown significant RD performance improvements (BD-rate savings of as much as 32.2% over H.266/VVC) with competitive decoding complexity. Other functionalities such as rate adaptation and device interoperability have also been addressed with the use of gain units and the quantization of the weights in the entropy decoding module. Moreover, the adoption process for architectural changes and for new or improved coding tools in JPEG AI VM was approved. A set of core experiments has been defined to improve the JPEG AI VM, targeting better coding efficiency and lower encoding and decoding complexity. The core experiments represent a set of promising technologies, such as learning-based GAN training, simplification of the analysis/synthesis transform, adaptive entropy coding alphabet, and even encoder-only tools and procedures for training speed-up.
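For readers unfamiliar with BD-rate figures such as the one quoted above, the sketch below shows the underlying idea: compare the average log-bitrate of two codecs over the quality range they have in common and express the difference as a percentage. The usual Bjøntegaard calculation fits a cubic polynomial to each rate-quality curve; this simplified variant swaps in piecewise-linear interpolation to stay dependency-free, so its numbers will not match reference tools exactly. All rate-quality points are invented for illustration.

```python
import math


def _interp(x, xs, ys):
    """Linear interpolation of ys at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("quality value outside the sampled range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Average log10(bitrate) over the quality interval [lo, hi] (midpoint rule)."""
    pts = sorted(points, key=lambda p: p[1])
    quality = [q for _, q in pts]
    log_rate = [math.log10(r) for r, _ in pts]
    width = (hi - lo) / steps
    total = sum(_interp(lo + (i + 0.5) * width, quality, log_rate)
                for i in range(steps))
    return total / steps


def bd_rate(anchor, test):
    """Average bitrate difference of `test` versus `anchor`, in percent.

    Both arguments are lists of (bitrate, quality) pairs. A negative result
    means the test codec needs less bitrate for the same quality.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if lo >= hi:
        raise ValueError("the two curves share no quality range")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (10 ** diff - 1) * 100


# Invented (kbps, PSNR dB) points; the test codec uses half the anchor bitrate.
anchor = [(1000, 30.0), (2000, 33.0), (4000, 36.0), (8000, 39.0)]
test = [(rate / 2, q) for rate, q in anchor]
print(round(bd_rate(anchor, test), 2))
```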

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with the successful validation of the Verification Model under Consideration (VMuC). The VMuC was confirmed as the Verification Model (VM) to form the core of the future standard: ISO/IEC 21794 Part 6 JPEG Pleno: Learning-based Point Cloud Coding. The JPEG Committee has commenced work on the Working Draft of the standard, with initial text reviewed at this meeting. Prior to the 98th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the area of auto-regressive entropy encoding and sparse tensor convolution as well as sourcing additional point clouds for the JPEG Pleno Point Cloud test set.

JPEG Pleno Light Field

During the 97th meeting, the JPEG Committee released the “JPEG Pleno Final Call for Contributions on Subjective Light Field Quality Assessment”, to collect new procedures and best practices regarding light field subjective quality evaluation methodologies to assess artifacts induced by coding algorithms. All contributions, including test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach. The deadline for submission of contributions is April 1, 2023.


The JPEG Committee organized its 1st workshop on light field quality assessment to discuss challenges and current solutions for subjective light field quality assessment, explore relevant use cases and requirements, and provide a forum for researchers to discuss the latest findings in this area. The JPEG Committee also promoted its 2nd workshop on learning-based light field coding to exchange experiences and to present technological advances in learning-based coding solutions for light field data. The proceedings and video footage of both workshops are now accessible on the JPEG website.

JPEG AIC

At the 97th JPEG Meeting, a new JPEG AIC Final Call for Contributions on Subjective Image Quality Assessment was issued. The JPEG Committee is working on the continuation of the previous standardization efforts (AIC-1 and AIC-2) and aims at developing a new standard, known as AIC-3. The new standard will be focusing on the methodologies for quality assessment of images in a range that goes from high quality to near-visually lossless quality, which are not covered by the previous AIC standards.

The Call for Contributions on Subjective Image Quality Assessment is asking for contributions to the standardization process that will be collaborative from the very beginning. In this context, all received contributions will be considered for the development of the standard by consensus among the JPEG experts.

The JPEG Committee will be releasing a new JPEG AIC-3 Dataset on the 15th of December 2022, and the deadline for submitting contributions to the call is set to the 1st of April 2023, 23:59 UTC. The contributors will be presenting their contributions at the 99th JPEG Meeting in April 2023.

The Call for Contributions on Subjective Image Quality Assessment addresses the development of a suitable subjective evaluation methodology standard. A second stage will address the objective perceptual visual quality evaluation models that perform well and have a good discriminative power in the high quality to near-visually lossless quality range.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 97th JPEG meeting, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions were updated to allow for additional concrete experiments to take place prior to issuing a draft call for proposals at the next meeting. This will also allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA, and in particular to include biochemical noise simulation, which is an essential element in practical implementations.
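To show what a quaternary representation means in practice, the following sketch maps every two bits of a byte stream to one of the four nucleotides A, C, G, and T, and measures the longest run of identical bases. This is a naive mapping for illustration only and is not the JPEG DNA benchmark codec; the bit-to-base assignment and the choice of homopolymer run length as the example biochemical constraint are assumptions.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; the length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for ch in seq[i:i + 4]:
            value = (value << 2) | BASES.index(ch)
        out.append(value)
    return bytes(out)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    best = run = 0
    prev = None
    for ch in seq:
        run = run + 1 if ch == prev else 1
        best = max(best, run)
        prev = ch
    return best


strand = bytes_to_bases(b"\x1b\x00")
print(strand, longest_homopolymer(strand))
assert bases_to_bytes(strand) == b"\x1b\x00"
```

A naive mapping like this can produce long runs of a single base, as the zero byte does here, which is one reason practical DNA codes add constraints rather than using a direct two-bit mapping.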

JPEG XS

The 2nd edition of JPEG XS is now fully completed and published. The JPEG Committee continues its work on the 3rd edition of JPEG XS, starting with Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. During the 97th JPEG meeting, a new Working Draft of Part 1 and a first Working Draft of Part 2 were created. To support the work a new Core Experiment was also issued to further test the proposed technology. Finally, an update to the JPEG XS White Paper has been published.

JPEG 2000

A new edition of Rec. ITU-T T.803 | ISO/IEC 15444-4 (JPEG 2000 conformance) is under development.

This new edition proposes to relax the maximum allowable errors so that well-designed 16-bit fixed-point implementations pass all compliance tests; adds two test codestreams to facilitate testing of inverse wavelet and component decorrelating transform accuracy; and adds several codestreams and files conforming to Rec. ITU-T T.801 | ISO/IEC 15444-2 to facilitate the implementation of decoders and file format readers.

Codestreams and test files can be found on the JPEG GitLab repository at: https://gitlab.com/wg1/htj2k-codestreams/-/merge_requests/14
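As a rough illustration of what a "maximum allowable error" check involves, the sketch below compares decoded samples against reference samples, assuming the error is measured as peak absolute error and mean squared error. The function names and the thresholds passed by the caller are hypothetical and are not taken from the conformance specification.

```python
# Illustrative sketch: thresholds are supplied by the caller and are placeholders,
# not values from Rec. ITU-T T.803 | ISO/IEC 15444-4.
def peak_error_and_mse(decoded, reference):
    """Return (peak absolute error, mean squared error) for two equal-length sample lists."""
    if len(decoded) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [d - r for d, r in zip(decoded, reference)]
    peak = max(abs(x) for x in diffs)
    mse = sum(x * x for x in diffs) / len(diffs)
    return peak, mse


def passes(decoded, reference, max_peak, max_mse):
    """True when both error measures stay within the given (hypothetical) bounds."""
    peak, mse = peak_error_and_mse(decoded, reference)
    return peak <= max_peak and mse <= max_mse
```

Relaxing the bounds, as the new edition proposes, corresponds to raising `max_peak` and `max_mse` so that the rounding error of a careful fixed-point decoder still falls inside them.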

Final Quote

“Motivated by the consumers’ concerns of manipulated contents, the JPEG Committee has taken concrete steps to define a new standard that provides interoperable solutions for a secure and reliable annotation of media assets creation and modifications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 98 will be held in Sydney, Australia, from 14 to 20 January 2023

Students Report from ACM Multimedia 2022

ACM Multimedia 2022 was held in a hybrid format in Lisbon, Portugal from October 10-14, 2022.

For many participants, this was the first on-site attendance in three years, as the strict travel restrictions associated with Covid-19 in 2020 and 2021 had made it difficult to travel to conferences outside their own and neighbouring countries.

In Portugal, the Covid-19 restrictions had almost all been lifted, and the city was bustling with tourists. Participants took care to avoid infection, enjoyed Lisbon’s local wine “Vinho Verde” and cod dishes with their colleagues, and engaged in lively discussions about multimedia research.

For many students, this was their first time presenting at an international conference, and it was a wonderful experience.

To encourage student authors to participate on-site, SIGMM has sponsored a group of students with Student Travel Grant Awards. Students who wanted to apply for this travel grant needed to submit an online form before the submission deadline. The selected students received either 1,000 or 2,000 USD to cover their airline tickets as well as accommodation costs for this event. Of the recipients, 25 were able to attend the conference. We asked them to share their unique experience attending ACM Multimedia 2022. In this article, we share their reports of the event.


Xiangming Gu, PhD student, National University of Singapore, Singapore

It is a great honour to receive a SIGMM Student Grant. ACM Multimedia 2022 was my first time attending an academic conference physically. During the conference, I presented my oral paper “MM-ALT: Multimodal Automatic Lyric Transcription”, which was also selected as one of the “Top Rated Papers”. Besides the presentation, I also met a lot of people who shared similar research interests. It was very inspiring to learn from others’ papers and discuss them with the authors directly. Moreover, I was also a volunteer for ACM Multimedia 2022 and attended the session of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. During the session, I learnt how to organize a workshop, which was a great exercise for me. Now that I am back in Singapore, I still miss the conference. I hope to get my paper accepted next year and attend the conference again.

Avinash Madasu, Computer Science Master’s student at the University of North Carolina Chapel Hill, USA.

It is my absolute honour to receive the student travel grant for attending the ACM Multimedia 2022 conference. This is the first time I have attended a top AI conference in-person. I enjoyed it a lot during the conference and I was sad that the conference ended quickly. Within the conference days, I was able to attend a lot of oral sessions, keynote talks and poster sessions. I was able to interact with fellow researchers from both academia and industry. I learnt a lot about exciting research going on in my area of interest as well as other areas. It provided a new refreshing experience and I hope to bring this to my research. I presented a poster and felt happy when fellow researchers appreciated my work. Apart from technical details, I was able to forge a lot of new friendships which I truly cherish for my whole life.

Moreno La Quatra, PhD student, Politecnico di Torino

The ACM Multimedia 2022 conference was an amazing experience. After a few years of remote conferences, it was a pleasure to be able to attend the conference in person. I got the opportunity to meet many researchers of different seniorities and backgrounds, and I learned a lot from them. The poster sessions were one of the highlights of the conference. They were a very valuable opportunity to present interesting ideas and explore the details of other researchers’ work. I found the keynotes, presentations, and workshops to be very inspiring and engaging as well. Throughout them, I learned about specific topics and interacted with friendly, passionate researchers from around the world. I would like to thank the ACM Multimedia 2022 organization for the opportunity to attend the conference in Lisbon, all the other volunteers for their friendly and helpful attitude, and the SIGMM Student Travel Grant committee for the financial support.

Sheng-Ming Tang, Master student, National Tsing Hua University, Hsinchu, Taiwan

My name is Sheng-Ming Tang from National Tsing Hua University, Hsinchu, Taiwan. It is a great honour for me to receive the student travel grant. First, I want to thank the committee for organizing this fantastic event. As ACM MM 2022 was my first in-person experience presenting at a conference, I felt a little nervous at first. However, I started to feel comfortable at the conference through interacting with those astonishing researchers and the volunteers. It was great not only to present in front of the public but also to participate in the events. I met a lot of people who solved problems with different and creative approaches, learned brand-new mindsets from the keynote sessions, and gained abundant feedback from the audience, which will boost my research. I thank the committee again for giving me this great opportunity to present and share my work in person. I enjoyed the event a lot.

Tai-Chen Tsai, Graduate student, National Tsing Hua University Taiwan

First, I would like to thank ACM for providing a student travel grant that allowed me to attend the conference. This is my first time presenting my work at a conference. The conference I attended was the interactive art session. I was worried that the setup would be complicated abroad. However, as soon as I arrived at the site, volunteers assisted me with the installation. The conference provided complete hardware resources, allowing me to have a smooth and excellent exhibition experience. Also, I took the opportunity to see many interesting researchers from different countries. The work “Emotional Machines” in the interactive art exhibition surprised me. His system collects and combines what participants are saying and their current emotions. The data is transformed into 360-degree image content in the VR environment through the model so that everyone’s information forms a small universe in the VR environment. The idea is creative.
Additionally, I can chat and discuss projects with published researchers while volunteering at workshops. They shared their lifestyle and work experiences as researchers in European countries, and we discussed what interesting study is and what is not. This is the best reward for me.

Bhalaji Nagarajan, PhD Student, Universitat de Barcelona, Spain

ACM-Multimedia was the first big conference I was able to participate in person after two years of complete virtual participation. I presented my work both as oral and poster presentations at the Workshop on Multimedia-Assisted Dietary Management (MADiMa). It gave me an excellent opportunity to present my work and to get valuable input from reputed pioneers regarding the future scope. It gave me a new dimension and helped in expanding my technical skill set. This was also my first volunteering experience on such a massive scale. It gave me a great learning experience to see and learn how to manage conferences of such a large scale.
I am very happy that I attended the conference in person. I was able to meet new people, and reputed pioneers in the field, learn new things and of course, made some new friends. A big thank you for the SIGMM Travel Grant that allowed me to attend the conference in-person in Lisbon.

Kiruthika Kannan, MS by Research, International Institute of Information Technology, Hyderabad, India. 

My paper on “DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games” was accepted at the 30th ACM International Conference on Multimedia. It was my first international conference, and I felt honoured to be able to present my research in front of experienced researchers. The conference also exhibited diverse research projects addressing fascinating scientific and technological problems. The poster sessions and talks at the conference improved my knowledge of the research trends in multimedia. In addition to this, I was able to interact with fellow researchers from diverse cultures. It was interesting to hear about their experiences and learn about their work at their institution. As a volunteer at the conference, I witnessed the hard work of the behind the scene organizers and volunteering team to smoothly run the events. I am grateful to the SIGMM Student Travel Grant for supporting my attendance at the ACMMM 22 conference.

Garima Sharma, PhD Student, Department of Human-Centred Computing, Monash University

It was a pleasure to receive a SIGMM travel grant and to attend the ACM Multimedia 2022 conference in person. ACM Multimedia is one of the top conferences in my research area and it was my first in-person conference during my PhD. I had a great experience interacting with numerous researchers and fellow PhD students. Along with all the interesting keynotes, I attended as many oral sessions as possible. Some of these sessions were aligned with my research work and some were outside of my work. This gave me a new research perspective at different levels. Also, working with organisers in a few sessions gave me a whole new experience in managing these events. Overall, I got many insightful comments, suggestions and feedback which motivated me with some interesting directions in my research work. I would like to thank the organisers for making this year’s ACM Multimedia a wonderful experience for every attendee.

Alon Harell, PhD student at the Multimedia Lab at Simon Fraser University

I had the pleasure to receive the SIGMM Student Travel Grant and to attend and volunteer at ACM Multimedia 22 in Lisbon, Portugal. The work I submitted to the conference was done outside of my regular PhD research, and thus without this grant, I would have not been able to participate. The workshop at which I presented, ACM MM Sport 22, was incredibly eye-opening with many fantastic papers, great presentations, and above all great people with which I was able to exchange ideas, form bonds, and perhaps even create future collaborations. The main conference, which coincides more closely with my main research on image and video coding for machines, was just as good. With fascinating talks, some in person, and some virtual, I was exposed to many new ideas (or perhaps just new to me) and learned a great deal. I was also able to benefit from the generosity and experience of Prof.  Chong Wah Ngo from Singapore Management University, during my PhD. Mentor lunch, who shared with me his thoughts on pursuing a career in academia. Overall, ACM Multimedia 22 was an especially unique experience because it was the first in-person conference I was able to attend since the beginning of the COVID-19 pandemic, and being back face-to-face with fellow researchers was a great pleasure.

Lorenzo Vaiani, Ph.D. student (1st year), Politecnico di Torino, Italy

ACM MM 2022 was my first in-person conference. Being able to present my works and discuss them with other participants in person was an incredible experience. I enjoyed every activity, from presentations and posters to workshops and demos. I received excellent feedback and new inspiration to continue my research. The best part was definitely strengthening the bonds with friends I already knew and making more with the amazing people I met there. I learned a lot from all of them. Volunteer activities helped a lot in making these kinds of connections. Thanks to the organizers for this fantastic opportunity and the SIGMM Student Travel Grant committee for the financial support. This edition of ACM MM was just the first for me, but I hope for many more in the future.

Xiaoyu Lin, third-year PhD student at Inria Grenoble, France

It was a great honour to attend ACM MM 2022 in Lisbon. It was a great experience. I met lots of nice professors and researchers. Discussing with them gave me lots of inspiration on both research directions and career development. I presented my work during the doctoral symposium and got plenty of useful feedback that can help me improve our work. During the “Ask Me Anything” lunch, I had the chance to talk with several senior researchers, who gave me kind and very useful advice on how to do research. I also served as a volunteer for a workshop, which helped me to meet other volunteers and make some new friends. Thanks to all the chairs and organizers who worked hard to make ACM MM 2022 such a wonderful conference. It was a really impressive experience!

Zhixin Ma, PhD student, Singapore Management University, Singapore

I would like to thank the ACM Multimedia Committee for providing me with the student travel grant so that I could attend the conference in person. ACM Multimedia is the top conference worldwide in the multimedia field. It provided me with an opportunity to present my work and communicate with researchers working on the topic of multimedia search.
Besides, the excellent keynotes and passionate panel talk also picture a good vision of future research in the multimedia field. Overall, I must express that ACM MM22 is amazing and well-organized. I again appreciate the ACM MM committee for the student travel grant, which made my attendance possible.

Report from ACM Multimedia 2022 by Nitish Nagesh


Nitish Nagesh (@nitish_nagesh) is a Ph.D. student in the Computer Science department at the University of California, Irvine, USA. He was awarded Best Social Media Reporter of the ACM Multimedia 2022 conference. To celebrate this award, Nitish Nagesh reported on his wonderful experience at ACM Multimedia 2022 as follows.


I was excited when our paper “World Food Atlas for Food Navigation” was accepted to the Multimedia Assisted Dietary Management Workshop (MADiMa). That the workshop was held in conjunction with ACM Multimedia 2022, the premier multimedia conference, was the icing on the cake. That it took place in Lisbon, Portugal, was the cherry on top. It is said that a picture is worth a thousand words, so it is fitting to describe a multimedia conference experience through pictures.

Prof. Ramesh Jain organized an informal meetup at the Choupana Caffe based on the advice of Joao Magalhaes, general chair of ACMMM 2022. It was great to meet researchers working on food computing including Prof. Yoko Yamakata, Prof. Agnieszka, Maija Kale. It was great to also have the company of students and professors from Singapore Management University including Prof. Chong Wah along with Prof. Phoebe Chen. Since this was the first in-person conference for many folks, we had great conversations over waffles, pear salad and watermelon mint juice!

The MADiMa workshop and the Cooking and Eating Activities (CEA) workshop had stellar keynote speakers and presentations about topics ranging from adherence to a Mediterranean diet to mental health estimation through food images.

The workshop was at the Lisbon Congress Center. It was a treat to watch the sun shine brightly on the congress center in the morning and the mellow sunset only a few minutes away near the Tagus Estuary rendering an orangish hue to the red bridge overlooking the train tracks below.

After a great set of presentations, the MADiMa and CEA workshops were drawn to a close with a group picture of one large family of people who love food and want to help people enjoy food while maintaining their health goals. A huge shout out to the workshop chairs Prof. Stavroula Mougiakakou, Prof. Keiji Yanai, Prof. Dario Allegra and Prof. Yoko Yamakata. (I tried my best to include a photo where everyone looks good!)

All work and no play makes us dull people! And all research with no food makes us hungry people! We had a post-workshop dinner at an authentic Portuguese restaurant. The food was great and it was a delightful evening because of the surprise treat from the professors! 

Prof. Jain’s Ph.D. talk was inspiring as he shared his personal journey that led him to focus on healthcare. He urged students in the multimedia community to pursue multimodal healthcare research as he shared his insights on building a personal health navigator.

I had signed up to be a mentee for the Ph.D. school Ask Me Anything (AMA) session. We asked Prof. Ming Dong questions about his time at graduate school, balancing teaching and research responsibilities, tips on maximizing research output and strategies to cope with rejections. He was candid in his responses and emphasized the need to focus on incremental progress while striving to do impactful research. I must thank Prof. Wei Tsang and other organizers for their leadership in organizing a first-of-its-kind session.

In between running around oral sessions, poster presentations, keynote talks, networking, grabbing lunch, and enjoying Portuguese Tart, we managed to have fun while volunteering. Huge credit to the students and staff (the Rafael’s, the Diogo’s, the David’s, the Gustavo’s, the Pedro’s) from Nova university for doing the heavy lifting to ensure a smooth online, hybrid and in-person experience!

It was a pleasure to watch Prof. Alan Smeaton deliver an inspiring speech about the journey of information retrieval and multimedia. The community congratulates you once again on the Technical Achievement Award – more power to you, Alan!

The highlight of the conference was the grand banquet at Centro Cultural de Belém. There could not have been a better climax to the gala event than the Fado music. One aspect of Fado music symbolizes longing where the spouse sings a melancholy when her partner sets sail on long voyages. It is accompanied by the unique 12 string guitar and is sung very close to the audience to heighten the intimacy. I could fully relate to the artists’ melody and rhythms since I had been longing to see my family and friends back home, whom I have not visited for the past three years due to the pandemic. Another tune described the beauty of Lisbon in superlatives including the sun shining the brightest compared to any other part of the world. There was a happy ending to the tune when the artists recreated the moment of joy after the war was over and everyone was merry again. It reinvigorated a fresh hope and breathed a new lease of life into our cluttered worlds. For once, I was truly present in the moment!

VQEG Column: VQEG Meeting May 2022

Introduction

Welcome to this new column in the ACM SIGMM Records from the Video Quality Experts Group (VQEG), which will provide an overview of the last VQEG plenary meeting that took place from 9 to 13 May 2022. It was organized by INSA Rennes (France), and it was the first face-to-face meeting after the series of online meetings due to the Covid-19 pandemic. Remote attendance was also offered, which made it possible for around 100 participants from 17 different countries to attend the meeting (more than 30 of them in person). During the meeting, more than 40 presentations were given, and interesting discussions took place. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented at this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the proposals to update the ITU-T Recommendations P.910 and P.913, as well as the presented publicly available datasets. We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 9-13 May 2022 in Rennes (France).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this sense, the group continues working on extensions of the ITU-T Recommendation P.1204 to cover other encoders (e.g., AV1) apart from H.264, HEVC, and VP9. In addition, the projects Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB) are still ongoing.

In this meeting, several AVHD-related topics were discussed, supported by six different presentations. In the first one, Mikołaj Leszczuk (AGH University, Poland) presented an analysis of how experiment conditions, such as video sequence order, variation and repeatability, influence the subjective assessment of video transmission quality, since they can entail a “learning” process of the test participants during the test. In the second presentation, Lucjan Janowski (AGH University, Poland) presented two proposals towards more ecologically valid experiment designs: the first one using the Absolute Category Rating [1] without scale but in a “think aloud” manner, and the second one called “Your Youtube, our lab”, in which the user selects the content that he or she prefers and a quality question appears during the viewing experience through a specifically designed interface. Also dealing with the study of testing methodologies, Babak Naderi (TU-Berlin, Germany) presented work on subjective evaluation of video quality with a crowdsourcing approach, while Pierre David (Capacités, France) presented a three-lab experiment, involving Capacités (France), RISE (Sweden) and AGH University (Poland), on quality evaluation of social media videos. Kjell Brunnström (RISE, Sweden) continued by giving an overview of video quality assessment of Video Assistant Refereeing (VAR) systems, and lastly, Olof Lindman (SVT, Sweden) presented another effort to reduce the lack of open datasets with the Swedish Television (SVT) Open Content.

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. In this meeting, Lucie Lévêque (Nantes Université, France) provided an overview of the recent activities of the group, including a submitted review paper on objective quality assessment for medical images, a special session accepted for the IEEE International Conference on Image Processing (ICIP) that will take place in October in Bordeaux (France), and a paper submitted to IEEE ICIP on quality assessment through a detection task of Covid-19 pneumonia. The work described in this paper was also presented by Meriem Outtas (INSA Rennes, France).

In addition, there were two more presentations related to the quality assessment of medical images. Firstly, Yuhao Sun (University of Edinburgh, UK) presented their research on a no-reference image quality metric for visual distortions on Computed Tomography (CT) scans [2]. Secondly, Marouane Tliba (Université d’Orléans, France) presented his studies on quality assessment of medical images through deep-learning techniques using domain adaptation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on a proposal to update the ITU-T Recommendation P.913, including new testing methods for subjective quality assessment and statistical analysis of the results. Margaret Pinson presented this work during the meeting.   

In addition, five presentations were delivered addressing topics related to the group activities. Jakub Nawała (AGH University, Poland) presented the Generalised Score Distribution to accurately describe responses from subjective quality experiments. Three presentations were provided by members of Nantes Université (France): Ali Ak presented his work on spammer detection in pairwise comparison experiments, Andreas Pastor talked about how to improve the maximum likelihood difference scaling method in order to measure the inter-content scale, and Chama El Majeny presented the functionalities of a subjective test analysis tool, whose code will be publicly available. Finally, Dietmar Saupe (University of Konstanz, Germany) delivered a presentation on subjective image quality assessment with boosted triplet comparisons.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a focus on gaming in particular. Currently, the group is working on the ITU-T Work Item P.BBQCG on Parametric bitstream-based Quality Assessment of Cloud Gaming Services. Apart from this, Jerry (Xiangxu) Yu (University of Texas at Austin, US) presented work on subjective and objective quality assessment of user-generated gaming videos, and Nasim Jamshidi (TUB, Germany) presented a deep-learning bitstream-based video quality model for CG content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata.

At this meeting, this was one of the most active groups and the corresponding sessions included several presentations and discussions. Firstly, Yiannis Andreopoulos (iSIZE, UK) presented their work on domain-specific fusion of multiple objective quality metrics. Then, Werner Robitza (AVEQ GmbH/TU Ilmenau, Germany) presented the updates on the SI/TI clarification activities, which are leading to an update of the ITU-T Recommendation P.910. In addition, Lukas Krasula (Netflix, US) presented their investigations on the relation between banding annoyance and the overall quality perceived by the viewers. Hadi Amirpour (University of Klagenfurt, Austria) delivered two presentations related to their Video Complexity Analyzer and their Video Complexity Dataset, which are both publicly available. Finally, Mikołaj Leszczuk (AGH University, Poland) gave two talks on their research related to User-Generated Content (UGC) (a.k.a. in-the-wild video content) recognition and on advanced video quality indicators to characterise video content.
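For readers unfamiliar with SI and TI, the following minimal sketch shows the general shape of the computation: SI is the maximum over frames of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over frames of the spatial standard deviation of the difference between consecutive frames. The details that the clarification activity addresses are treated here as assumptions: frames are plain 2-D lists of luma values, border pixels are skipped in the Sobel step, and the population standard deviation is used. All function names are hypothetical.

```python
import math


def _std(values):
    """Population standard deviation of a flat list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sobel_magnitude(frame):
    """Sobel gradient magnitude for the interior pixels of a 2-D luma frame."""
    h, w = len(frame), len(frame[0])
    if h < 3 or w < 3:
        raise ValueError("frame must be at least 3x3")
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: maximum over frames of the std of the Sobel-filtered frame."""
    return max(_std(sobel_magnitude(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over frame pairs of the std of the pixel-wise difference."""
    if len(frames) < 2:
        raise ValueError("at least two frames are required")
    return max(
        _std([c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)])
        for prev, cur in zip(frames, frames[1:])
    )
```

Choices such as the value range of the input, the handling of borders, and the treatment of different bit depths change the resulting numbers, which is why results from different implementations are only comparable once these details are fixed.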

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on joint work to develop hybrid perceptual/bitstream metrics and gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. A report on the ongoing activities of the group was presented by Enrico Masala (Politecnico di Torino, Italy), which included the release of a new website to reflect the evolution that happened in the last few years within the group. Although currently the group is not directly seeking the development of new metrics or tools readily available for VQA, it is still working on related topics such as the studies by Lohic Fotio Tiotsop (Politecnico di Torino, Italy) on the sensitivity of artificial intelligence-based observers to input signal modification.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia, Spain) presented an extended report on the group activities, from which it is worth noting the joint work on a contribution to the ITU-T Work Item G.QoE-5G.

Immersive Media Group (IMG)

The IMG group is focused on research on the quality assessment of immersive media. Currently, the main joint activity of the group is the development of a test plan for evaluating the QoE of immersive interactive communication systems. In this sense, Pablo Pérez (Nokia, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented a follow-up on this test plan, including an overview of the state of the art on related works and a taxonomy classifying the existing systems [3]. This test plan is closely related to the work carried out by the ITU-T on QoE Assessment of eXtended Reality Meetings, so Gunilla Berndtsson (Ericsson, Sweden) presented the latest advances on the development of the P.QXM.

Apart from this, there were four presentations related to the quality assessment of immersive media. Shirin Rafiei (RISE, Sweden) presented a study on QoE assessment of an augmented remote operating system for scaling in smart mining applications. Zhengyu Zhang (INSA Rennes, France) gave a talk on a no-reference quality metric for light field images based on deep-learning and exploiting angular and spatial information. Ali Ak (Nantes Université, France) presented a study on the effect of temporal sub-sampling on the accuracy of the quality assessment of volumetric video. Finally, Waqas Ellahi (Nantes Université, France) showed their research on a machine-learning framework to predict Tone-Mapping Operator (TMO) preference based on image and visual attention features [4].

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods. In this meeting, there were three presentations related to this topic. Mikołaj Leszczuk (AGH University, Poland) presented an objective video quality assessment method for face recognition tasks. Also, Alban Marie (INSA Rennes, France) showed an analysis of the correlation of quality metrics with artificial intelligence accuracy. Finally, Lucie Lévêque (Nantes Université, France) gave an overview of a study on the reliability of existing algorithms for facial expression recognition [5].

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

The IRG-AVQA group studies topics related to video and audiovisual quality assessment (both subjective and objective) among ITU-R Study Group 6 and ITU-T Study Group 12. In this sense, Chulhee Lee (Yonsei University, South Korea) and Alexander Raake (TU Ilmenau, Germany) provided an overview on ongoing activities related to quality assessment within ITU-R and ITU-T.

Other updates

In addition, the Human Factors for Visual Experiences (HFVE) group, whose objective is to uphold the liaison relation between VQEG and the IEEE standardization group P3333.1, presented its advances in relation to two standards: IEEE P3333.1.3 – Deep-Learning-based assessment of VE based on HF, which has been approved and published, and IEEE P3333.1.4 on Light field imaging, which has been submitted and is in the process of being approved. Also, although there were not many activities in this meeting within the Implementer’s Guide for Video Quality Metrics (IGVQM) and the Psycho-Physiological Quality Assessment (PsyPhyQA) groups, they are still active. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place online in December 2022. Please see the VQEG Meeting information page for more information.

References

[1] ITU, “Subjective video quality assessment methods for multimedia applications”, ITU-T Recommendation P.910, Jul. 2022.
[2] Y. Sun, G. Mogos, “Impact of Visual Distortion on Medical Images”, IAENG International Journal of Computer Science, 1:49, Mar. 2022.
[3] P. Pérez, E. González-Sosa, J. Gutiérrez, N. García, “Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment”, Frontiers in Signal Processing, Jul. 2022.
[4] W. Ellahi, T. Vigier, P. Le Callet, “A machine-learning framework to predict TMO preference based on image and visual attention features”, International Workshop on Multimedia Signal Processing, Oct. 2021.
[5] E. M. Barbosa Sampaio, L. Lévêque, P. Le Callet, M. Perreira Da Silva, “Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance”, ACM International Conference on Interactive Media Experiences, Jun. 2022.

JPEG Column: 96th JPEG Meeting

JPEG analyses the responses of the Calls for Proposals for the standardisation of the first codecs based on machine learning

The 96th JPEG meeting was held online from 25 to 29 July 2022. The meeting was one of the most productive in the recent history of JPEG with the analysis of the responses of two Calls for Proposals (CfP) for machine learning-based coding solutions, notably JPEG AI and JPEG Pleno Point Cloud Coding. The superior performance of the CfP responses compared to the state-of-the-art anchors leaves little doubt that coding technologies will become dominated by machine learning-based solutions, with the expected consequences for the standardisation pathway. A new era of multimedia coding standardisation has begun. Both activities have defined a verification model, and are pursuing a collaborative process that will select the best technologies for the definition of the new machine learning-based standards.

The 96th JPEG meeting had the following highlights:

JPEG AI and JPEG Pleno Point Cloud, the two first machine learning-based coding standards under development by JPEG.
  • JPEG AI response to the Call for Proposals
  • JPEG Pleno Point Cloud begins the collaborative standardisation phase
  • JPEG Fake Media and NFT
  • JPEG Systems
  • JPEG Pleno Light Field
  • JPEG AIC
  • JPEG XS
  • JPEG 2000
  • JPEG DNA

The following summarises the major achievements of the 96th JPEG meeting.

JPEG AI

The 96th JPEG meeting represents an important milestone for the JPEG AI standardisation as it marks the beginning of the collaborative phase of this project. The main JPEG AI objective is to design a solution that offers significant compression efficiency improvement over coding standards in common use at equivalent subjective quality and an effective compressed domain processing for machine learning-based image processing and computer vision tasks. 

During the 96th JPEG meeting, several activities occurred, notably the presentation of the eleven responses to all tracks of the Call for Proposals (CfP). Furthermore, discussions on the evaluation process used to assess submissions to the CfP took place, namely, subjective, objective and complexity assessment as well as the identification of device interoperability issues by cross-checking. For the standard reconstruction track, several contributions showed significantly higher compression efficiency in both subjective quality methodologies and objective metrics when compared to the best-performing conventional image coding.

From the analysis and discussion of the results obtained, the most promising technologies were identified and a new JPEG AI verification model under consideration (VMuC) was approved. The VMuC corresponds to a combination of two proponents’ solutions (following the ‘one tool for one functionality’ principle), selected by consensus and considering the CfP decision criteria and factors. In addition, a set of JPEG AI Core Experiments were defined to obtain further improvements in both performance efficiency and complexity, notably the use of learning-based GAN training, alternative analysis/synthesis transforms and an evaluation study for the compressed-domain denoising as an image processing task. Several further activities were also discussed and defined, such as the design of a compressed domain image classification decoder VMuC, the creation of a large screen content dataset for the training of learning-based image coding solutions and the definition of a new and larger JPEG AI test set.

JPEG Pleno Point Cloud begins collaborative standardisation phase

JPEG Pleno integrates various modalities of plenoptic content under a single framework in a seamless manner. Efficient and powerful point cloud representation is a key feature of this vision. A point cloud refers to data representing positions of points in space, expressed in a given three-dimensional coordinate system, the so-called geometry. This geometrical data can be accompanied by per-point attributes of varying nature (e.g. color or reflectance). Such datasets are usually acquired with a 3D scanner, LIDAR or created using 3D design software and can subsequently be used to represent and render 3D surfaces. Combined with other types of data (like light field data), point clouds open a wide range of new opportunities, notably for immersive browsing and virtual reality applications.
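To make the definition above concrete, the following is a minimal Python sketch of a point cloud as geometry plus per-point attributes. The `PointCloud` class, its field names and the `bounding_box` helper are hypothetical and chosen for illustration only; they do not correspond to any JPEG Pleno data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class PointCloud:
    """Hypothetical container: geometry plus optional per-point attributes."""
    geometry: List[Point]                       # (x, y, z) positions
    attributes: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every attribute (e.g. colour, reflectance) needs one value per point.
        for name, values in self.attributes.items():
            if len(values) != len(self.geometry):
                raise ValueError(f"attribute {name!r} needs one value per point")

    def bounding_box(self) -> Tuple[Point, Point]:
        """Return the (min corner, max corner) of the geometry."""
        xs, ys, zs = zip(*self.geometry)
        return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


cloud = PointCloud(
    geometry=[(0.0, 0.0, 0.0), (1.0, 2.0, 0.5), (-1.0, 0.5, 3.0)],
    attributes={"color": [(255, 0, 0), (0, 255, 0), (0, 0, 255)]},
)
print(cloud.bounding_box())
```

Keeping the geometry separate from the attributes mirrors the distinction drawn above, where geometry and colour are treated as separate coding targets.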

Learning-based solutions are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g., image classification, face recognition and object segmentation, but also 3D processing tasks, e.g. visual enhancement and super-resolution. Recently, learning-based point cloud coding solutions have shown great promise to achieve competitive compression efficiency compared to available conventional point cloud coding solutions at equivalent subjective quality. Building on a history of successful and widely adopted coding standards, JPEG is well positioned to develop a standard for learning-based point cloud coding.

During its 94th meeting, the JPEG Committee released a Final Call for Proposals on JPEG Pleno Point Cloud Coding. This call addressed learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. During its 96th meeting, the JPEG Committee evaluated 5 codecs submitted in response to this Call. Following a comprehensive evaluation process, the JPEG Committee selected one of the proposals to form the basis of a future standard and initialised a sub-division to form Part 6 of ISO/IEC 21794. The selected submission was a learning-based approach to point cloud coding that met the requirements of the Call and showed competitive performance, both in terms of coding geometry and color, against existing solutions.

JPEG Fake Media and NFT

At the 96th JPEG meeting, 6 pre-registrations to the Final Call for Proposals (CfP) on JPEG Fake Media were received. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The CfP welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents who have not yet made a pre-registration are still welcome to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

In parallel with the work on Fake Media, JPEG explores use cases and requirements related to Non-Fungible Tokens (NFTs). Although the use cases of the two topics differ, there is a significant overlap in terms of requirements and relevant solutions. The presentations and video recordings of the joint 5th JPEG NFT and Fake Media Workshop that took place prior to the 96th meeting are available on the JPEG website. In addition, a new version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

JPEG Systems

During the 96th JPEG Meeting, the IS texts for both JLINK (ISO/IEC 19566-7) and JPEG Snack (ISO/IEC 19566-8) were prepared and submitted for final publication. JLINK specifies a format to store multiple images inside of JPEG files and supports interactive navigation between them. JLINK addresses use cases such as virtual museum tours, real estate visits, hotspot zoom into other images and many others. JPEG Snack on the other hand enables self-running multimedia experiences such as animated image sequences and moving image overlays. Both standards are based on the JPEG Universal Metadata Box Format (JUMBF, ISO/IEC 19566-5) for which a second edition is in progress. This second edition adds extensions to the native support of CBOR (Concise Binary Object Representation) and attaches private fields to the JUMBF Description Box.

JPEG Pleno Light Field

During its 96th meeting, the JPEG Committee released the “JPEG Pleno Second Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices for light field subjective quality evaluation methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among JPEG experts following a collaborative process approach. The Final Call for Contributions will be issued at the 97th JPEG meeting. The deadline for submission of contributions is 1 April 2023.

A JPEG Pleno Light Field AhG has also started preparing two workshops: a first one on Subjective Light Field Quality Assessment and a second one on Learning-based Light Field Coding. Both aim to exchange experiences and to present technological advances and research results, on light field subjective quality assessment and on learning-based coding solutions for light field data, respectively.

JPEG AIC

During its 96th meeting, a Second Draft Call for Contributions on Subjective Image Quality Assessment was issued. The final Call for Contributions is now planned to be issued at the 97th JPEG meeting. The standardization process will be collaborative from the very beginning, i.e. all submissions will be considered in developing the next extension of the JPEG AIC standard. The deadline for submissions has been extended to 1 April 2023 at 23:59 UTC. Multiple types of contributions are accepted, namely subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature. A dataset of sample images with compression-based distortions in the target quality range is planned to be prepared for the 97th JPEG meeting.

JPEG XS

With the 2nd edition of JPEG XS now in place, the JPEG Committee continues with the development of the 3rd edition of JPEG XS Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but for specific content such as screen content with half of the required bandwidth. In this respect, experiments have indicated that it is possible to increase the quality in static regions of an image sequence by more than 10 dB when compared to the 2nd edition. Based on the input contributions, a first working draft for 21122-1 has been created, along with the necessary core experiments for further evaluation and verification.

In addition, JPEG has finalized the work on the amendment for Part 2 2nd edition that defines a new High 4:2:0 profile and the new sublevel Sublev4bpp. This amendment is now ready for publication by ISO. In the context of Part 4 (Conformance testing) and Part 5 (Reference software), the JPEG Committee decided to make both parts publicly available.

Finally, the JPEG Committee decided to create a series of public documents, called the “JPEG XS in-depth series” that will explain various features and applications of JPEG XS to a broad audience. The first document in this series explains the advantages of using JPEG XS for raw image compression and will be published soon on jpeg.org.

JPEG 2000

The JPEG Committee published a case study that compares HTJ2K, ProRes and JPEG 2000 Part 1 when processing motion picture content with widely available commercial software tools running on notebook computers, available at https://ds.jpeg.org/documents/jpeg2000/wg1n100269-096-COM-JPEG_Case_Study_HTJ2K_performance_on_laptop_desktop_PCs.pdf

JPEG 2000 is widely used in the media and entertainment industry for Digital Cinema distribution, studio video masters and broadcast contribution links. High Throughput JPEG 2000 (HTJ2K or JPEG 2000 Part 15) is an update to JPEG 2000 that provides an order of magnitude speed up over legacy JPEG 2000 Part 1.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as it is particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 96th JPEG meeting, a new version of the overview document on Use Cases and Requirements for DNA-based Media Storage was issued and has been made publicly available. The JPEG Committee also updated two additional documents: the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions in order to allow for concrete exploration experiments to take place. This will allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and in particular, include biochemical noise simulation which is an essential element in practical implementations. A new branch has been created in the JPEG Gitlab that now contains two anchors and two JPEG DNA benchmark codecs.
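As a rough illustration of what a quaternary representation means, the sketch below maps each byte to four nucleotides (two bits per base) and back. This is a deliberately naive mapping and not the JPEG DNA benchmark codec: the function names are hypothetical, and the mapping ignores the biochemical constraints mentioned above. The `longest_homopolymer` helper shows one such constraint, long runs of the same base, which a naive mapping does not avoid.

```python
NUCLEOTIDES = "ACGT"  # one base per 2-bit symbol (assumed ordering)


def bytes_to_dna(data: bytes) -> str:
    """Naively map each byte to four bases, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(base)
        out.append(byte)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


strand = bytes_to_dna(b"\x1b\x00")
print(strand, longest_homopolymer(strand))
```

A practical codec would constrain the output (for instance by limiting such runs) and add robustness to synthesis and sequencing noise, which is the kind of behaviour the exploration experiments described above are meant to evaluate.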

Final Quote

“After successful calls for contributions, the JPEG Committee sets precedence by launching the collaborative phase of two learning based visual information coding standards, hence announcing the start of a new era in coding technologies relying on AI.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 97 will be held online from 24 to 28 October 2022.
  • No 98 will be held in Sydney, Australia, from 14 to 20 January 2023.

MPEG Column: 139th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 139th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Issues Call for Evidence for Video Coding for Machines (VCM)
  • MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption
  • MPEG Completes the Third Edition of the Common Media Application Format (CMAF) by adding Support for 8K and High Frame Rate for High Efficiency Video Coding
  • MPEG Scene Descriptions adds Support for Immersive Media Codecs
  • MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering
  • MPEG Starts New Edition of Video Coding-Independent Code Points Standard
  • MPEG White Paper on the Third Edition of the Common Media Application Format

In this report, I’d like to focus on VCM, Green Metadata, CMAF, VSEI, and a brief update about DASH (as usual).

Video Coding for Machines (VCM)

MPEG’s exploration work on Video Coding for Machines (VCM) aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. With the rise of newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. Due to such architectures, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage could differ from conventional human-viewing-oriented applications to achieve optimized performance. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has rapidly grown. Typical use cases include intelligent transportation, smart city technology, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing features from a video is essential for efficient transmission and storage. Feature compression technology solicited in this Call for Evidence (CfE) can also be helpful in other regards, such as computational offloading and privacy protection.

Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines. The evaluation framework of VCM depicted below comprises neural network tasks (typically informative) at both ends as well as VCM encoder and VCM decoder, respectively. The normative part of VCM typically includes the bitstream syntax which implicitly defines the decoder whereas other parts are usually left open for industry competition and research.

Further details about the CfE and how interested parties can respond can be found in the official press release here.

Research aspects: the main research area for coding-related standards is certainly compression efficiency (and probably runtime). However, this video coding standard will target machines rather than humans as video consumers. Thus, video quality and, in particular, Quality of Experience need to be interpreted differently, which could be another worthwhile research dimension to be studied in the future.

Green Metadata

MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption according to the complexity of the bitstream. Many modern implementations of video decoders can adjust their operating voltage or clock speed to adjust the power consumption level according to the required computational power. Thus, if the decoder implementation knows the variation in the complexity of the incoming bitstream, then the decoder can adjust its power consumption level to the complexity of the bitstream. This will allow less energy use in general and extended video playback for battery-powered devices.
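The idea of matching the decoder's clock speed to the signalled bitstream complexity can be sketched as follows. This is only an illustration under stated assumptions: the list of available frequencies, the notion of a "required frequency" per segment and the `pick_frequency` function are hypothetical and are not part of the Green Metadata syntax.

```python
from bisect import bisect_left

# Hypothetical clock frequencies (MHz) supported by a decoder, sorted ascending.
AVAILABLE_MHZ = [600, 1000, 1400, 1800]


def pick_frequency(required_mhz: float, available=AVAILABLE_MHZ) -> int:
    """Pick the lowest available frequency that covers the required one.

    Falls back to the highest frequency if the demand exceeds every option.
    """
    index = bisect_left(available, required_mhz)
    return available[min(index, len(available) - 1)]


# Assumed per-segment complexity hints, expressed as required MHz.
segment_hints = [450, 1000, 1200, 2500]
print([pick_frequency(hint) for hint in segment_hints])
```

Running at the lowest sufficient frequency, rather than always at the maximum, is what would yield the energy savings described above whenever the content is simple to decode.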

The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.

Research aspects: reducing global greenhouse gas emissions will certainly be a challenge for humanity in the upcoming years. The amount of data on today’s internet is dominated by video, which consumes energy at every step from production to consumption. Therefore, there is a strong need for explicit research efforts to make video streaming in all facets friendly to our environment.

Third Edition of Common Media Application Format (CMAF)

The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, the media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier with 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile will be branded as ‘c8k0’ and will support videos with up to 7680×4320 pixels (8K) and up to 60 frames per second. Regarding the latter, another media profile has been added to the list of CMAF media profiles, branded as ‘c8k1’ and supports HEVC encoded video with up to 8K resolution and up to 120 frames per second. Finally, chroma location indication support has been added to the 3rd edition of CMAF.
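The two new media profiles can be summarised as a small lookup of limits. The sketch below uses only the brands, resolution and frame rates quoted above; the `ProfileLimits` type and the `fits` function are hypothetical, and a real conformance check would also have to consider the HEVC profile, tier and level constraints, which are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileLimits:
    max_width: int
    max_height: int
    max_fps: float


# Limits as stated in the text; everything else about the profiles is omitted.
PROFILES = {
    "c8k0": ProfileLimits(7680, 4320, 60.0),   # 8K, up to 60 fps
    "c8k1": ProfileLimits(7680, 4320, 120.0),  # 8K, high frame rate up to 120 fps
}


def fits(brand: str, width: int, height: int, fps: float) -> bool:
    """Check resolution and frame rate against the limits of a brand."""
    limits = PROFILES[brand]
    return (width <= limits.max_width
            and height <= limits.max_height
            and fps <= limits.max_fps)


print(fits("c8k0", 7680, 4320, 60.0))   # within the 8K profile limits
print(fits("c8k0", 7680, 4320, 120.0))  # frame rate exceeds the 8K profile
print(fits("c8k1", 7680, 4320, 120.0))  # within the HFR profile limits
```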

Research aspects: basically, CMAF serves two purposes: (i) harmonizing DASH and HLS at the segment format level by adopting the ISOBMFF and (ii) enabling low latency streaming applications by introducing chunks (that are smaller than segments). The third edition supports resolutions up to 8K and HFR, which raises the question of how low latency can be achieved for 8K/HFR applications and services and under which conditions.

New Amendment for Versatile Supplemental Enhancement Information (VSEI) containing Technology for Neural Network-based Post Filtering

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the Supplemental Enhancement Information (SEI) message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication which is relevant for variable-resolution video streaming, this new amendment contains two SEI messages for describing and activating post filters using neural network technology in video bitstreams. These post filters could be used for reducing coding noise, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).

Research aspects: quality enhancements such as reducing coding noise, upsampling, colour improvement, or denoising have been researched quite substantially either with or without neural networks. Enabling such quality enhancements via (V)SEI messages enables system-level support for research and development efforts in this area, for example, integration in video streaming applications and/or conversational services, including performance evaluations.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 139th MPEG meeting, MPEG Systems issued a new working draft related to Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) have been updated. Finally, a new part has been added (ISO/IEC 23009-9), which is called encoder and packager synchronization, for which a working draft has also been produced. Publicly available documents (if any) can be found here.

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: in the Christian Doppler Laboratory ATHENA we aim to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the phases (i) multimedia content provisioning (i.e., video coding), (ii) content delivery (i.e., video networking), and (iii) content consumption (i.e., video player incl. ABR and QoE) in the media delivery chain as well as for (iv) end-to-end aspects, with a focus on, but not being limited to, HTTP Adaptive Streaming (HAS). Recent DASH-related publications include “Low Latency Live Streaming Implementation in DASH and HLS” and “Segment Prefetching at the Edge for Adaptive Video Streaming” among others.

The 140th MPEG meeting will be face-to-face in Mainz, Germany, from October 24-28, 2022. Click here for more information about MPEG meetings and their developments.

Report from CBMI 2022

The 19th International Conference on Content-based Multimedia Indexing (CBMI) took place as a hybrid conference in Graz, Austria, from September 14-16, 2022, organized by JOANNEUM RESEARCH and supported by SIGMM. After the 2020 edition was postponed and held as a fully online conference in 2021, this was an important step back to a physical conference. Probably still as an effect of the COVID pandemic, the event was a bit smaller than in previous years, with around 50 participants from 18 countries (13 European countries, the rest from Asia and North America). About 60% were attending on-site, the others via web conference.

Program highlights

The conference program included two keynotes. The opening keynote by Miriam Redi from Wikimedia analysed the role of multimedia assets in a free knowledge ecosystem such as the one around Wikipedia. The closing keynote by Efstratios Gavves from the University of Amsterdam showcased recent progress in machine learning of dynamic information and causality in a diverse range of application domains and highlighted open research challenges.

With the aim of increasing the interaction between the scientific community and the users of multimedia indexing technologies, a panel session titled “Multimedia Indexing and Retrieval Challenges in Media Archives” was organised. The panel featured four distinguished experts from the audiovisual archive domain. Brecht Declercq from meemoo, the Flemish Institute for Archive, is currently the president of FIAT/IFTA, the International Association of TV Archives. Richard Wright started as a researcher in speech processing before he became a renowned expert in digital preservation, setting up a series of successful European projects in the area. Johan Oomen manages the department for Research and Heritage at Beeld en Geluid, the Netherlands Institute of Sound and Vision. Christoph Bauer is an expert from the Multimedia Archive of the Austrian Broadcasting Corporation ORF and consults archives of the Western Balkan countries on digitisation and preservation topics. The panel tried to analyse why only a small part of research outputs makes it into productive use at archives and identified research challenges such as the need for more semantic and contextualised content descriptions, the ability to easily control the amount vs. accuracy of generated metadata and the need for novel paradigms to interact with multimedia collections beyond the textual search box. At the same time, archives face the challenge of dealing with much richer metadata, but without the quality guarantees known from manually documented content.

Panel discussion with Richard Wright, Brecht Declercq, Christoph Bauer and Johan Oomen (online), moderated by Georg Thallinger.

In addition to five regular paper sessions (presenting 16 papers in total), the 2022 conference followed the tradition of previous editions by holding special sessions addressing the use of multimedia indexing in specific application areas or specific settings. This year the special sessions (nine papers in total) covered multimedia in clinical applications and for the protection against natural disasters as well as machine learning from multimedia in cases where data is scarce. The program was completed with a poster & demo session, featuring seven posters and two demos.

Participants enjoyed the return of face-to-face discussions at the poster and demo sessions.

The best paper and the best student paper of the conference were each awarded EUR 500, generously sponsored by SIGMM. The selection committee quickly found consensus to award the best paper award to Maria Eirini Pegia, Anastasia Moumtzidou, Ilias Gialampoukidis, Björn Þór Jónsson, Stefanos Vrochidis and Ioannis Kompatsiaris for their paper “BiasUNet: Learning Change Detection over Sentinel-2 Image Pairs”, and the best student paper award to Sara Sarto, Marcella Cornia, Lorenzo Baraldi and Rita Cucchiara for their paper “Retrieval-Augmented Transformer for Image Captioning”. The authors of the best papers were invited to submit an extended version to the IEEE Transactions on Multimedia journal.

Best student paper award for Sara Sarto, presented by Werner Bailer.
Best paper award for Maria Eirini Pegia and Björn Þór Jónsson, presented by Georges Quénot.

Handling the hybrid setting

As a platform for the online part of the conference, an online event using GoTo Webinar was created. The aim was still to have all presentations and Q&A live; however, speakers were asked to provide a backup video of their talk (which was only used in one case). The poster and demo session was a particular challenge in the hybrid setting. In order to allow all participants to see the contributions in the best setting, all contributions were presented both as printed posters on-site and as a short video online. After discussions took place on-site in front of the posters and demos, a Q&A session connecting the conference room and the remote presenters took place to also enable discussions with the online presenters.

Social events

Getting back to at least hybrid conferences also means having the long-missed opportunities to discuss and exchange with both well-known colleagues and first-time attendees during coffee breaks and over lunch and dinner. In addition to a conference dinner on the second evening, the government of the state of Styria, of which Graz is the capital, hosted a reception for the participants in the beautiful setting of the historic Orangerie in the gardens of Graz castle. The participants had the opportunity to enjoy a guided tour through Graz on their way to the reception.

Concert by François Pineau-Benois (violin), Olga Cepovecka (piano) and Dorottya Standi (cello).

A special event was the Music meets Science concert, held with the support of SIGMM. This was already the fourth concert presented in the framework of the CBMI conference (2007, 2018, 2021, 2022). After a long conference day, the participants could enjoy works by Schubert and Haydn, Austrian composers whose music gave an aspect of local Austrian culture to the event. Reflecting the international spirit of CBMI, the concert was given by a trio of very talented young musicians with international careers from three different countries. We thank SIGMM for its support which made this cultural event happen.

Matthias Rüther, director of JOANNEUM RESEARCH DIGITAL, welcomes the conference participants at the reception

Outlook

The next edition of CBMI will be organised in September 2023 in Orleans, France. While it is likely that the hybrid setting is here to stay for the near future, we hope that the share of participants on site will move back towards the pre-pandemic level.

Diversity and Inclusion in focus at ACM IMX ’22 and MMSys ’22

The 13th ACM Multimedia Systems Conference (MMSys) and its associated workshops (MMVE 2022, NOSSDAV 2022, and GameSys 2022) took place from the 14th to the 17th of June 2022 in Athlone, Ireland. The week after, the ACM International Conference on Interactive Media Experiences (IMX) took place in Aveiro, Portugal, from the 22nd to the 24th of June. Both conferences are strongly committed to creating a diverse, inclusive and accessible forum to discuss the latest research on multimedia systems and the technology experiences they enable, and both have been actively working towards this goal over the last number of years.
While this is challenging in itself, demanding systematic and continuous efforts at various levels, the worldwide COVID-19 pandemic introduced even more challenges. As has repeatedly been noted (and shown), restrictions due to the COVID-19 pandemic have had a significant impact on many scholars, such as female academics [1,2], caregivers [3] and young scientists [4], and may have exacerbated existing inequalities [5], despite the increased participation possibilities introduced by fully online conferences.
The diversity and inclusion chairs of both IMX and MMSys were therefore highly motivated to adopt a set of measures aimed at stimulating the inclusion of underrepresented groups, offering various possibilities for participation, and raising awareness of diversity (and implications of a lack of diversity) for community development and research activities.

Relevant support and activities

With generous support from the ACM Special Interest Group on Multimedia (SIGMM) and ACM, the following support was offered at MMSys ’22 and IMX ’22:

  • SIGMM student travel grants: any student member of SIGMM is eligible to apply for such a grant; however, students who are the first author of an accepted paper (in any track/workshop) are particularly encouraged to apply. The grants can cover any travel expenses such as airfare/shuttle, hotel and meals (but not conference registration fees).
  • SIGMM carer grants: the carer grants are intended to allow SIGMM members to fully engage with the online event or attend in person. They cover extra costs related to caring responsibilities (for example, childcare at home or at the destination) which would otherwise limit a member's participation in the conference.
  • SIGMM-sponsored Equality, Diversity and Inclusion (EDI) travel grants: these grants aim to support researchers who self-identify as marginalized and/or underrepresented in the MMSys community (e.g., scholars who come from non-WEIRD – Western, Educated, Industrialized, Rich, Democratic – societies). The EDI grants have also been used to support researchers who lack other/own funding opportunities, as well as scholars from relevant yet underrepresented research areas.
  • Paper mentoring: this instrument was primarily aimed at those who are new to submitting an academic paper. In particular, those in particularly adverse circumstances, for example those for whom English is a second language or those authoring a particularly novel submission which may require additional input, could apply for paper mentoring.

In addition to the above measures, MMSys ’22 also offered mentoring activities for PhD students, postdocs and more advanced researchers. The PhD mentoring was organized by the doctoral consortium chairs Patrick Le Callet and Carsten Griwodz. PhD students had the possibility to give a short pitch about their PhD research, have discussions with the MMSys ’22 mentors and the wider community, and have a one-on-one in-person talk with their assigned mentor. The postdoc mentoring was organized by Pablo Cesar and Irena Orsolic. Postdocs in the MMSys community were invited to give a lightning talk about their research and to join a dedicated networking lunch with other members of the MMSys community.
IMX ’22, on the other hand, featured an open application process for program committee membership and an active reasonable adjustment policy to ensure that registration fees did not prevent people from attending the conference. In addition, undergraduate and graduate students, as well as early-career researchers, could apply for travel support from the SIGCHI Gary Marsden travel awards, and PhD students could benefit from interaction with and feedback from peers and senior researchers in the Doctoral Consortium. Finally, for both MMSys and IMX, participants had to actively agree to the ACM Policy Against Discrimination and Harassment.

Activities at the conference

At the conferences, additional activities were organized to raise awareness, increase understanding, foster experience sharing and, in particular, trigger reflection about diversity and inclusion. MMSys ’22 featured a panel on “Designing Inclusivity in Technologies”. Inclusive Design is an approach used in many sectors to try and allow everyone to experience our services and products in an equitable way. One of the ways we could do this is by celebrating diversity in how we design and taking into account the different barriers faced by different communities across the globe. The panel brought together experts to discuss what inclusive design looks like for them, the charms of the communities they work with, the challenges they face in designing with and for them, and how other communities can learn from the methods they have used in order to build a more inclusive world that benefits all of us.
The panellists were:

  • Veronica Orvalho: Professor at Porto University’s Instituto de Telecomunicações and the Founder/CEO of Didimo – a platform that enables users to generate digital humans.
  • Nitesh Goyal: Leads research on Responsible AI tools at Google Research.
  • Kellie Morrissey: Researcher & Lecturer at the University of Limerick’s School of Design.

IMX ’22 featured a panel discussion on “Diversity in the Metaverse”. The Metaverse is a hot topic, which has many people wondering both what it is and, more importantly, what it will look like in the future for immersive media experiences. As it is a unique space for social interaction, engagement and connection, it is essential that we address the importance of representation and accessibility while it is still in its infancy. The discussion intended not only to cover the current scenario in virtual and augmented reality worlds, but also the consequences and challenges of building a diverse Metaverse by taking into account design, content, marketing, and the various barriers faced by different communities across the globe.

The panel was moderated by Tara Collingwoode-Williams (Goldsmiths University) and had four panellists to discuss topics related to research and practice around “Diversity and Inclusive design in the Metaverse”:

  • Nina Salomons – (Filmmaker, diversity advocate and XR consultant, XRDI, AnomieXR co-founder UK – London)
  • Micaela Mantegna – (TED Fellow. Video Games Policy/Artificial intelligence, creativity & copyright Professor. AI, XR and Metaverse researcher. BKC Harvard Affiliate. Diversity & Inclusion advocate. Founder of Women In Games, Argentina – Greater Buenos Aires) 
  • Krystal Cooper – (Unity: Emerging Products – Professional Artistry / Virtual production, Spatial Computing, XR researcher, USA – LA)
  • Mmuso Mafisa – (XR consultant, Veza Interactive and Venture Chain Capital, SA – Johannesburg Metropolitan Area)

Short testimonials by two of the EDI grant beneficiaries

Soonbin Lee is a PhD student at Sungkyunkwan University (SKKU) in Korea, who would not have been able to attend MMSys ’22 without the SIGMM support (due to a lack of other funding opportunities). Soonbin wrote a short testimonial.

“The conference consisted of the presentation of a keynote and regular sessions by various speakers. In particular, with the advent of cloud gaming, there are many presentations, including: streaming systems specialized in game videos; haptic media for realistic viewing; and humanoid robots that can empathize with humans. During the conference, I enjoyed the spectacular views of Ireland and the wonderful traditional cuisine that was included in the conference program. Along with the presentations during the regular sessions, demo sessions were also presented. Participants from the industry, including Qualcomm, Fraunhofer FOKUS, INRIA, and TNO, were engaged during the MMSys demo sessions. Being able to participate offered also an excellent opportunity to witness the outcomes of real-time systems, including user-interactive VR games, holographic cube matching instructions, and a mobile-based deep learning video codec decoding demo. I was also able to hear the presentations of various PhD research proposals, and it was very impressive to see many PhD students present their interesting research.

At the MMSys conference, there were also a number of social events, like Viking boat and beer-brewing in Ireland, so I was able to meet with other researchers and get to know them better. This was an amazing experience for me because it is not easy to meet the researchers in person. On the last day, I gave a presentation at the NOSSDAV session on the compression processing of MPEG Immersive Video (MIV). Through this discussion and the Q&A, I was able to learn more about the most recent trends in research. 
More importantly, I made many friends who studied with the same interests. I had a fantastic chance and a wonderful experience meeting other scholars in person. The MMSys Conference was a really impressive conference for me. With the travel grant, I fully enjoyed this opportunity!”

Postdoctoral researcher Alan Guedes also wrote a short reflection:
“I am a researcher from the Brazilian multimedia community, especially concentrated at the WebMedia event (http://webmedia.org.br). Although my community is considerably large and active, it has little presence at ACM events. This lack prevents the visibility of our research and possible international collaboration. In 2022, I was honoured with ACM Diversity and Inclusion Travel Award to attend two ACM SIGMM-supported conferences, namely IMX and MMSys. The events had inspiring presentations and keynotes, which made me energetic about new research directions. Particularly, I had the chance to meet researchers that I only know by their citing names. At these events, I could present some research done in Brazil and collaborate on technical committees and workshops. 

This networking was invaluable and will be essential in my research career. I was also happy to see other Brazilians that, like me, seek to engage and strengthen the bonds of SIGMM and Brazilian communities.”

Final reflections 

Both at IMX and MMSys, there were various actions and initiatives to put EDI-related topics on the agenda and to foster diversity and inclusion, both at the community level and in terms of research-related activities. We believe that a key success factor in this respect is the valuable support mechanisms offered by the ACM and SIGMM. These allow the IMX and MMSys communities to keep goals related to equality, diversity and inclusion continuously and systematically on the agenda, for instance by removing participation barriers (e.g., by adjusting prices depending on the country of the attendees), triggering awareness, and providing a forum for under-represented voices and/or regions (e.g., workshops at IMX focusing on Asia (2016, 2017), Latin America (2020), and others, supported by the SIGCHI Development Fund).

Based on our experiences, it is also important that defined actions and measures are based on a good understanding of the key problems. This means that efforts to gain insights into key aspects (e.g., gender balance, numbers on the participation of under-represented groups, …) and developments over time are highly valuable. Secondly, it is important that EDI aspects are considered holistically, as they relate to all aspects of the conference, from the beginning until the end. This includes, for example, the selection of keynote speakers, who is represented in the technical committees (e.g., having an open call for associate chairs, as has been done at IMX since the beginning), who is represented in the organizing committee, and which efforts are made to reach out to relevant communities in various parts of the world that are currently under-represented (e.g., South America, Africa, …). Lastly, we need more experience sharing through both formal and informal channels. There is a huge potential to share best practices and experiences both within and between the related conferences and communities to combine our efforts towards a common EDI vision and associated goals.

References