JPEG Column: 95th JPEG Meeting

JPEG issues a call for proposals for JPEG Fake Media

The 95th JPEG meeting was held online from 25 to 29 April 2022. A Call for Proposals (CfP) was issued for JPEG Fake Media, which aims at a standardisation framework for the secure annotation of modifications in media assets. With this new initiative, JPEG endeavours to provide standardised means for identifying the provenance of media assets that include imaging information. Assuring the provenance of the coded information is essential given the current trends and possibilities in multimedia technology.

Fake Media standardisation aims at the identification of image provenance.

This new initiative complements the ongoing standardisation of machine learning based codecs for images and point clouds. Both are expected to revolutionise the state of the art of coding standards, leading to compression rates beyond the current state of the art.

The 95th JPEG meeting had the following highlights:

  • JPEG Fake Media issues a Call for Proposals;
  • JPEG AI;
  • JPEG Pleno Point Cloud Coding;
  • JPEG Pleno Light Fields quality assessment;
  • JPEG AIC near perceptual lossless quality assessment;
  • JPEG NFT exploration;
  • JPEG DNA explorations;
  • JPEG XS 2nd edition published;
  • JPEG XL 2nd edition.

The following summarises the major achievements of the 95th JPEG meeting.

JPEG Fake Media

At its 95th JPEG meeting, the committee issued a Final Call for Proposals (CfP) on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The call for proposals welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents are highly encouraged to express their interest in submission of a proposal before 20 July 2022 and submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

JPEG AI

Following the JPEG AI joint ISO/IEC/ITU-T Call for Proposals issued after the 94th JPEG committee meeting, 14 registrations were received among which 12 codecs were submitted for the standard reconstruction task. For computer vision and image processing tasks, several teams have submitted compressed domain decoders, notably 6 for image classification. Prior to the 95th JPEG meeting, the work was focused on the management of the Call for Proposals submissions and the creation of the test sets and the generation of anchors for standard reconstruction, image processing and computer vision tasks. Moreover, a dry run of the subjective evaluation of the JPEG AI anchors was performed with expert subjects and the results were analysed during this meeting, followed by additions and corrections to the JPEG AI Common Training and Test Conditions and the definition of several recommendations for the evaluation of the proposals, notably, the anchors, images and bitrates selection. A procedure for cross-check evaluation was also discussed and approved. The work will now focus on the evaluation of the Call for Proposals submissions, which is expected to be finalized at the 96th JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including metaverse, autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 95th JPEG meeting, the JPEG Committee reviewed the responses to the Final Call for Proposals on JPEG Pleno Point Cloud Coding. Four responses have been received from three different institutions. At the upcoming 96th JPEG meeting, the responses to the Call for Proposals will be evaluated with a subjective quality evaluation and objective metric calculations.

JPEG Pleno Light Field

The JPEG Pleno standard tools provide a framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image modalities addressed by the current standardization activities are light field, holography, and point clouds, where these image modalities describe different sampled representations of the plenoptic function. Therefore, to properly assess the quality of these plenoptic modalities, specific subjective and objective quality assessment methods need to be designed.

In this context, JPEG has launched a new standardisation effort known as JPEG Pleno Quality Assessment. It aims at providing a quality assessment standard, defining a framework that includes subjective quality assessment protocols and objective quality assessment procedures for lossy decoded data of plenoptic modalities for multiple use cases and requirements. The first phase of this effort will address the light field modality.

To assist this task, JPEG has issued the “JPEG Pleno Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices with regard to light field subjective quality assessment methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach.

The Final Call for Contributions will be issued at the 96th JPEG meeting. The deadline for submission of contributions is 18 December 2022.

JPEG AIC

During the 95th JPEG Meeting, the committee released the Draft Call for Contributions on Subjective Image Quality Assessment.

The new JPEG AIC standard will be developed considering all the submissions to the Call for Contributions in a collaborative process. The deadline for the submission is set for 14 October 2022. Multiple types of contributions are accepted, notably subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature.

The JPEG AIC committee has also started the preparation of a workshop on subjective assessment methods for the investigated quality range, which will be held at the end of June. The workshop aims to gather different views on the problem, and will include both internal and external speakers, as well as a Q&A panel. Experts in the field of quality assessment and stakeholders interested in the use cases are invited.

JPEG NFT

After the joint JPEG NFT and Fake Media workshops, it became evident that, even though the use cases of the two topics differ, there is a significant overlap in terms of requirements and relevant solutions. For that reason, it was decided to create a single AHG that covers both JPEG NFT and JPEG Fake Media explorations. The newly established AHG JPEG Fake Media and NFT will use the JPEG Fake Media mailing list.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as this is particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of a storage process based on synthetic DNA polymers. A new version of the overview document on DNA-based Media Storage: State-of-the-Art, Challenges, Use Cases and Requirements was issued and has been made publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, including biochemical noise simulation. During the 95th JPEG meeting, a new document describing the Use Cases and Requirements for DNA-based Media Storage was created and made publicly available. A timeline for the standardization process was also defined. Interested parties are invited to consider joining the effort by registering to the JPEG DNA AHG mailing list.
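
To make the idea of a quaternary representation concrete, the following minimal Python sketch maps each pair of bits to one of the four nucleotides and checks one example of a biochemical constraint, namely a cap on the length of homopolymer runs. The symbol order, the function names, and the run-length limit of 3 are assumptions chosen for illustration; this is not the JPEG DNA benchmark codec.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping, not the JPEG DNA codec.
NUCLEOTIDES = "ACGT"  # assumed symbol order: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        NUCLEOTIDES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    longest = run = 0
    previous = None
    for symbol in strand:
        run = run + 1 if symbol == previous else 1
        previous = symbol
        longest = max(longest, run)
    return longest


def satisfies_run_limit(strand: str, max_run: int = 3) -> bool:
    """Check the assumed constraint that no nucleotide repeats more than max_run times."""
    return longest_homopolymer(strand) <= max_run
```

With this naive mapping, a single zero byte becomes "AAAA" and already exceeds the assumed run limit, which illustrates why a coding scheme for DNA storage has to take such constraints into account rather than map bits to nucleotides directly.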

JPEG XS

The JPEG Committee is pleased to announce that the 2nd editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) were published in March 2022. Furthermore, the committee finalized the work on Part 4 (Conformance testing) and Part 5 (Reference software), which are now entering the final phase for publication. With these last two parts, the committee’s work on the 2nd edition of the JPEG XS standards comes to an end, allowing it to shift its focus to further improving the standard. Meanwhile, in response to the latest Use Cases and Requirements for JPEG XS v3.1, the committee received a number of technology proposals from Fraunhofer and intoPIX that focus on improving the compression performance for desktop content sequences. The proposals will now be evaluated and thoroughly tested and will form the foundation of the work towards a 3rd edition of the JPEG XS suite of standards. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth.

JPEG XL

The second edition of JPEG XL Part 1 (Core coding system), with an improved numerical stability of the edge-preserving filter and numerous editorial improvements, has proceeded to the CD stage. Work on a second edition of Part 2 (File format) was initiated. Hardware coding was also further investigated. Preliminary software support has been implemented in major web browsers, image viewing and editing software, including popular tools such as FFmpeg, ImageMagick, libvips, GIMP, GDK and Qt. JPEG XL is now ready for wide-scale adoption.

Final Quote

“Recent development on creation and modification of visual information call for development of tools that can help protecting the authenticity and integrity of media assets. JPEG Fake Media is a standardised framework to deal with imaging provenance.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 96 will be held online from 25 to 29 July 2022.

VQEG Column: VQEG Meeting Dec. 2021 (virtual/online)

Introduction

Welcome to a new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place from 13 to 17 December 2021, and it was organized online by the University of Surrey, UK. Over five days, more than 100 participants (from more than 20 different countries in America, Asia, Africa, and Europe) could remotely attend the multiple sessions related to the active VQEG projects, which included more than 35 presentations and interesting discussions. This column provides an overview of this VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online on the VQEG meeting website.

Group picture of the VQEG Meeting 13-17 December 2021

Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting can be the new analyses and methodologies discussed within the Statistical Analyses Methods group, the new metrics and datasets presented within the No-Reference Metrics group, and the progress on the plans of the 5G Key Performance Indicators group and the Immersive Media group. We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this sense, it has recently completed a joint project between VQEG and ITU SG12 in which 35 candidate objective quality models were submitted and evaluated through extensive validation tests. The result was the ITU-T Recommendation P.1204, which includes three standardized models: a bit-stream model, a reduced reference model, and a hybrid no-reference model. The group is currently considering extensions of this standard, which originally covered H.264, HEVC, and VP9, to include other encoders, such as AV1. Apart from this, two other projects are active under the scope of AVHD: QoE Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

During the meeting, three presentations related to AVHD activities were provided. In the first one, Mikołaj Leszczuk (AGH University) presented their work on secure and reliable delivery of professional live transmissions with low latency, which highlighted the constant need for video datasets, such as VideoSet. In addition, Andy Quested (ITU-R Working Party 6C) led a discussion on how to assess video quality for very high resolution (e.g., 8K, 16K, 32K, etc.) monitors with interactive applications, which raised the key possibility of zooming in to appreciate the details of the images without pixelation. Finally, Abhinau Kumar (UT Austin) and Cosmin Stejerean (Meta) presented their work on exploring the reduction of the complexity of VMAF by using features in the wavelet domain [1].

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. This group was recently launched and, for the moment, it has been working on a topical review paper on objective quality assessment of medical images and videos, which was submitted in December to Medical Image Analysis [2]. Rafael Rodrigues (Universidade da Beira Interior) and Lucie Lévêque (Nantes Université) presented the main details of this work in a presentation scheduled during the QAH session. The presentation also included information about the review paper published by some members of the group on methodologies for subjective quality assessment of medical images [3] and the efforts in gathering datasets to be listed on the VQEG datasets website. In addition, Lu Zhang (IETR – INSA Rennes) presented her work on model observers for the objective quality assessment of medical images from task-based approaches, considering three tasks: detection, localization, and characterization [4]. It is also worth noting that members of this group are organizing a special session on “Quality Assessment for Medical Imaging” at the IEEE International Conference on Image Processing (ICIP) that will take place in Bordeaux (France) from 16 to 19 October 2022.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. Currently, they are working on statistical analysis methods for subjective tests, which are discussed in their monthly meetings.

In this meeting, there were four presentations related to SAM activities. In the first one, Zhi Li and Lukáš Krasula (Netflix) presented the lessons they learned from the subjective assessment test carried out during the development of their metric Contrast Aware Multiscale Banding Index (CAMBI) [5]. In particular, they found that some subjective tests can have perceptually unbalanced stimuli, which can cause systematic and random errors in the results. In this sense, they explained their statistical data analyses to mitigate these errors, such as the techniques in ITU-T Recommendation P.913 (section 12.6), which can reduce the effects of the random error. The second presentation described the work by Pablo Pérez (Nokia Bell Labs), Lucjan Janowski (AGH University), Narciso García (Universidad Politécnica de Madrid), and Margaret H. Pinson (NTIA/ITS) on a novel subjective assessment methodology with few observers with repetitions (FOWR) [6]. Apart from the description of the methodology, the dataset generated from the experiments is available on the Consumer Digital Video Library (CDVL). Also, they launched a call for other labs to repeat their experiments, which will help to establish the viability, scope and limitations of the FOWR method and, if appropriate, to include it in ITU-T Recommendation P.913 for quasi-experimental assessments when it is not possible to have 16 to 24 subjects (e.g., pre-tests, expert assessment, and resource limitations). For example, performing the experiment with 4 subjects 4 times each on different days would be similar to a test with 15 subjects. In the third presentation, Irene Viola (CWI) and Lucjan Janowski (AGH University) presented their analyses on the standardized methods for subject removal in subjective tests. In particular, the methods proposed in Recommendations ITU-R BT.500 and ITU-T P.913 were considered. They concluded that the first one (described in Annex 1 of Part 1) is not recommended for Absolute Category Rating (ACR) tests, while the one described in the second recommendation provides good performance, although further investigation of the correlation threshold used to discard subjects is required. Finally, the last presentation led the discussion on the future activities of the SAM group, where different possibilities were proposed, such as the analysis of confidence intervals for subjective tests, new methods for comparing subjective tests from more than two labs, how to extend these results to better understand the precision of objective metrics, and research on crowdsourcing experiments in order to make them more reliable and improve their cost-effectiveness. These new activities are discussed in the monthly meetings of the group.
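
The correlation-based subject screening discussed above can be sketched as follows. This is a minimal illustration, assuming that each subject's ratings are compared with the mean ratings of all subjects through the Pearson correlation and that subjects falling below a threshold are discarded. The function names, the data layout, and the threshold value of 0.75 are assumptions made for this example, not details taken from the presentation.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater carries no correlation information
    return cov / sqrt(var_x * var_y)


def screen_subjects(ratings, threshold=0.75):
    """Keep subjects whose scores correlate with the mean scores.

    ratings: dict mapping a subject id to a list of scores, one per stimulus,
             with every subject rating the same stimuli in the same order.
    threshold: assumed cut-off; the appropriate value is an open question.
    """
    n_stimuli = len(next(iter(ratings.values())))
    # Mean opinion score per stimulus, computed over all subjects.
    mos = [mean(scores[i] for scores in ratings.values()) for i in range(n_stimuli)]
    correlations = {s: pearson(scores, mos) for s, scores in ratings.items()}
    kept = [s for s, r in correlations.items() if r >= threshold]
    return kept, correlations


# Hypothetical ACR scores (1 to 5) from four subjects on five stimuli.
example_ratings = {
    "s1": [5, 4, 3, 2, 1],
    "s2": [5, 5, 3, 2, 1],
    "s3": [4, 4, 3, 1, 1],
    "s4": [1, 2, 3, 4, 5],  # rates in the opposite direction to the others
}
kept_subjects, subject_correlations = screen_subjects(example_ratings)
```

In this toy example the fourth subject correlates negatively with the mean scores and is the only one discarded. Note that the mean used here includes the subject being screened, which is one of several design choices that a real analysis would need to justify.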

Computer Generated Imagery (CGI)

The CGI group focuses on quality analysis of computer-generated imagery, with particular emphasis on gaming. Currently, the group is working on topics related to ITU work items, such as ITU-T Recommendation P.809 with the development of a questionnaire for interactive cloud gaming quality assessment, ITU-T Recommendation P.CROWDG related to quality assessment of gaming through crowdsourcing, ITU-T Recommendation P.BBQCG with a bit-stream based quality assessment of cloud gaming services, and a codec comparison for computer-generated content. In addition, a presentation was delivered during the meeting by Nabajeet Barman (Kingston University/Brightcove), who presented the subjective results related to the work presented at the last VQEG meeting on the use of LCEVC for Gaming Video Streaming Applications [7]. For more information on the related activities, do not hesitate to contact the chairs of the group.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, two main topics are being addressed by the group, which are discussed in regular online meetings. The first one is related to the improvement of SI/TI metrics to solve ambiguities that have appeared over time, with the objective of providing reference software and updating the ITU-T Recommendation P.910. The second item is related to the addition of standard metadata of video quality assessment-related information in the encoded video streams. 
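
As background for the SI/TI work item, the commonly cited definitions take SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI as the maximum over time of the spatial standard deviation of the difference between consecutive frames. The sketch below follows those definitions in plain Python. The function names, the list-of-rows frame layout, the skipping of border pixels, and the use of the population standard deviation are assumptions of this illustration (choices of this kind can change the resulting values), and it is not the reference software mentioned above.

```python
from math import hypot
from statistics import pstdev


def sobel_magnitude(frame):
    """Sobel gradient magnitude at the interior pixels of a luma frame.

    frame: list of rows, each a list of luma values (at least 3x3).
    Border pixels are skipped, which is an assumption of this sketch.
    """
    height, width = len(frame), len(frame[0])
    magnitudes = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            magnitudes.append(hypot(gx, gy))
    return magnitudes


def spatial_information(frames):
    """SI: maximum over frames of the std of the Sobel magnitude."""
    return max(pstdev(sobel_magnitude(frame)) for frame in frames)


def temporal_information(frames):
    """TI: maximum over frame pairs of the std of the pixel-wise difference.

    Requires at least two frames of identical size.
    """
    per_pair = []
    for previous, current in zip(frames, frames[1:]):
        differences = [
            c - p
            for row_prev, row_cur in zip(previous, current)
            for p, c in zip(row_prev, row_cur)
        ]
        per_pair.append(pstdev(differences))
    return max(per_pair)
```

A flat frame yields an SI of zero, while a frame containing an edge yields a positive SI; likewise, a uniform brightness change between two frames yields a TI of zero, because only the spread of the frame difference counts, not its mean.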

In this meeting, this group was one of the most active in terms of presentations on related topics, with 11 presentations. Firstly, Lukáš Krasula (Netflix) presented their Contrast Aware Multiscale Banding Index (CAMBI) [5], an objective quality metric that addresses banding degradations that are not detected by other metrics, such as VMAF and PSNR (code is available on GitHub). Mikołaj Leszczuk (AGH University) presented their work on the automatic detection of User-Generated Content (UGC) in the wild. Also, Vignesh Menon and Hadi Amirpour (AAU Klagenfurt) presented their open-source project related to the analysis and online prediction of video complexity for streaming applications. Jing Li (Alibaba) presented their work related to the perceptual quality assessment of internet videos [8], proposing a new objective metric (STDAM, for the moment used only internally) validated on the Youku-V1K dataset. The next presentation was delivered by Margaret Pinson (NTIA/ITS), dealing with a comprehensive analysis of why no-reference metrics fail, which emphasized the need to train these metrics on several datasets and test them on larger ones. The discussion also pointed out the recommendation for researchers to publish their metrics as open source in order to make it easier to validate and improve them. Moreover, Balu Adsumilli and Yilin Wang (YouTube) presented a new no-reference metric for UGC, called YouVQ, based on a transfer-learning approach with pre-training on non-UGC data and re-training on UGC. This metric will be released as open source shortly, and a dataset with videos and subjective scores has also been published.
Also, Margaret Pinson (NTIA/ITS), Mikołaj Leszczuk (AGH University), Lukáš Krasula (Netflix), Nabajeet Barman (Kingston University/Brightcove), Maria Martini (Kingston University), and Jing Li (Alibaba) presented a collection of datasets for no-reference metric research, while Shahid Satti (Opticom GmbH) presented their work on encoding complexity for short video sequences. For his part, Franz Götz-Hahn (Universität Konstanz/Universität Kassel) presented their work on the creation of the KonVid-150k video quality assessment dataset [9], which can be very valuable for training no-reference metrics, and on the development of objective video quality metrics. Finally, regarding the aforementioned two active topics within the NORM group, Ioannis Katsavounidis (Meta) gave a presentation on the advances in the activity related to the inclusion of standard video quality metadata, while Lukáš Krasula (Netflix), Cosmin Stejerean (Meta), and Werner Robitza (AVEQ/TU Ilmenau) presented the updates on the improvement of SI/TI metrics for modern video systems.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on joint work to develop hybrid perceptual/bitstream metrics and on the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. In this sense, a project in collaboration with Sky was finished and presented in the last VQEG meeting.

Related activities were presented in this meeting. In particular, Enrico Masala and Lohic Fotio Tiotsop (Politecnico di Torino) presented the updates on the recent activities carried out by the group, and their work on artificial-intelligence observers for video quality evaluation [10].

Implementer’s Guide for Video Quality Metrics (IGVQM)

The IGVQM group, whose activity started in the VQEG meeting in December 2020, works on creating an implementer’s guide for video quality metrics. In this sense, the current goal is to create a report on the accuracy of video quality metrics following a test plan based on collecting datasets, collecting metrics and methods for assessment, and carrying out statistical analyses. An update on the advances was provided by Ioannis Katsavounidis (Meta), and a call is open for the community to contribute to this activity with datasets and metrics.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new communications networks (especially 5G) and the QoE of video services on top of them. Currently, the group is working on the definition of relevant use cases, which are discussed in monthly audio calls.

In relation to these activities, there were four presentations during this meeting. Werner Robitza (AVEQ/TU Ilmenau) presented a proposal for a KPI message format for gaming QoE over 5G networks. Also, Pablo Pérez (Nokia Bell Labs) presented their work on a parametric quality model for teleoperated driving [11] and an update of the ITU-T GSTR-5GQoE topic, related to the QoE requirements for real-time multimedia services over 5G networks. Finally, Margaret Pinson (NTIA/ITS) presented an overall description of 5G technology, including how differences in spectrum allocation per country impact the propagation, responsiveness, and throughput of 5G devices.

Immersive Media Group (IMG)

The IMG group researches the quality assessment of immersive media. The group recently finished the test plan for quality assessment of short 360-degree video sequences, which supported the development of ITU-T Recommendation P.919. Currently, the group is working on further analyses of the data gathered from the subjective tests carried out for that test plan and on the analysis of data for the quality assessment of long 360-degree videos. In addition, members of the group are contributing to ITU-T SG12 on the topic G.CMVTQS on computational models for QoE/QoS monitoring to assess video telephony services. Finally, the group is also working on the preparation of a test plan for evaluating the QoE with immersive and interactive communication systems, which was presented by Pablo Pérez (Nokia Bell Labs) and Jesús Gutiérrez (Universidad Politécnica de Madrid). Readers interested in this topic should not hesitate to contact them to join the effort.

During the meeting, there were also four presentations covering topics related to IMG. Firstly, Alexander Raake (TU Ilmenau) provided an overview of the projects within the AVT group dealing with the QoE assessment of immersive media. Also, Ashutosh Singla (TU Ilmenau) presented a 360-degree video database with higher-order ambisonics spatial audio. Maria Martini (Kingston University) presented an update on the IEEE standardization activities on Human Factors for Visual Experiences (HFVE), such as the recently submitted draft standard on deep-learning-based quality assessment and the draft standard to be submitted shortly on quality assessment of light field content. Finally, Kjell Brunnström (RISE) presented their work on legibility in virtual reality, also addressing the perception of speech-to-text by Deaf and hard-of-hearing people.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Although in this case there was no official IRG-AVQA meeting, there were various presentations related to ITU activities addressing QoE evaluation topics. In this sense, Chulhee Lee (Yonsei University) presented an overview of ITU-R activities, with a special focus on quality assessment of HDR content, and together with Alexander Raake (TU Ilmenau) presented an update on ongoing ITU-T activities.

Other updates

All the sessions of this meeting and, thus, the presentations were recorded and have been uploaded to YouTube. Also, note that the anonymous FTP will be closed soon; in the meantime, files and presentations can be accessed from old browsers or via an FTP app. All the files, including those corresponding to the VQEG meetings, will be embedded into the VQEG website over the next months. In addition, the GitHub with tools and subjective labs setup is still online and kept updated. Moreover, during this meeting, it was decided to close the Joint Effort Group (JEG) and the Independent Lab Group (ILG), which can be re-established when needed. Finally, although there were not many activities in this meeting within the Quality Assessment for Computer Vision Applications (QACoViA) and the Psycho-Physiological Quality Assessment (PsyPhyQA) groups, they are still active.

The next VQEG plenary meeting will take place in Rennes (France) from 9 to 13 May 2022, and will again be face-to-face after four online meetings.

References

[1] A. K. Venkataramanan, C. Stejerean, A. C. Bovik, “FUNQUE: Fusion of Unified Quality Evaluators”, arXiv:2202.11241, submitted to the IEEE International Conference on Image Processing (ICIP), 2022.
[2] R. Rodrigues, L. Lévêque, J. Gutiérrez, H. Jebbari, M. Outtas, L. Zhang, A. Chetouani, S. Al-Juboori, M. G. Martini, A. M. G. Pinheiro, “Objective Quality Assessment of Medical Images and Videos: Review and Challenges”, submitted to Medical Image Analysis, 2022.
[3] L. Lévêque, M. Outtas, L. Zhang, H. Liu, “Comparative study of the methodologies used for subjective medical image quality assessment”, Physics in Medicine & Biology, vol. 66, no. 15, Jul. 2021.
[4] L. Zhang, C. Cavaro-Ménard, P. Le Callet, “An overview of model observers”, Innovation and Research in Biomedical Engineering, vol. 35, no. 4, pp. 214-224, Sep. 2014.
[5] P. Tandon, M. Afonso, J. Sole, L. Krasula, “CAMBI: Contrast-aware Multiscale Banding Index”, Picture Coding Symposium (PCS), Jul. 2021.
[6] P. Pérez, L. Janowski, N. García, M. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)”, IEEE Transactions on Multimedia (Early Access), Jul. 2021.
[7] N. Barman, S. Schmidt, S. Zadtootaghaj, M. G. Martini, “Evaluation of MPEG-5 part 2 (LCEVC) for live gaming video streaming applications”, Proceedings of the Mile-High Video Conference, Mar. 2022.
[8] J. Xu, J. Li, X. Zhou, W. Zhou, B. Wang, Z. Chen, “Perceptual Quality Assessment of Internet Videos”, Proceedings of the ACM International Conference on Multimedia, Oct. 2021.
[9] F. Götz-Hahn, V. Hosu, H. Lin, D. Saupe, “KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild”, IEEE Access, vol. 9, pp. 72139-72160, May 2021.
[10] L. F. Tiotsop, T. Mizdos, M. Barkowsky, P. Pocta, A. Servetti, E. Masala, “Mimicking Individual Media Quality Perception with Neural Network based Artificial Observers”, ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 18, no. 1, Jan. 2022.
[11] P. Pérez, J. Ruiz, I. Benito, R. López, “A parametric quality model to evaluate the performance of tele-operated driving services over 5G networks”, Multimedia Tools and Applications, Jul. 2021.

JPEG Column: 94th JPEG Meeting

IEC, ISO and ITU issue a call for proposals for joint standardization of image coding based on machine learning

The 94th JPEG meeting was held online from 17 to 21 January 2022. A major milestone has been reached at this meeting with the release of the final call for proposals under the JPEG AI project. This standard aims at the joint standardization of the first image coding standard based on machine learning by the IEC, ISO and ITU, offering a single stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality and effective performance for image processing and computer vision tasks.

The JPEG AI call for proposals was issued in parallel with a call for proposals for point cloud coding based on machine learning. The latter will be conducted in parallel with JPEG AI standardization.

The 94th JPEG meeting had the following highlights:

  • JPEG AI Call for Proposals;
  • JPEG Pleno Point Cloud Call for Proposals;
  • JPEG Pleno Light Fields quality assessment;
  • JPEG AIC near perceptual lossless quality assessment;
  • JPEG Systems;
  • JPEG Fake Media draft Call for Proposals;
  • JPEG NFT exploration;
  • JPEG XS;
  • JPEG XL;
  • JPEG DNA explorations.

The following provides an overview of the major achievements carried out during the 94th JPEG meeting.

JPEG AI

JPEG AI targets a wide range of applications such as cloud storage, visual surveillance, autonomous vehicles and devices, image collection storage and management, live monitoring of visual data and media distribution. The main objective is to design a coding solution that offers significant compression efficiency improvement over coding standards in common use at equivalent subjective quality and an effective compressed domain processing for machine learning-based image processing and computer vision tasks. Other key requirements include hardware/software implementation-friendly encoding and decoding, support for 8- and 10-bit depth, efficient coding of images with text and graphics and progressive decoding.

During the 94th JPEG meeting, several activities toward a JPEG AI learning-based coding standard took place, notably the release of the Final Call for Proposals for JPEG AI, consolidated with the definition of the Use Cases and Requirements and the Common Training and Test Conditions to ensure a fair and complete evaluation of future proposals.

The final JPEG AI Call for Proposals marks an important milestone, as it is the first time that contributions are solicited towards a learning-based image coding solution. The registration deadline for JPEG AI proposals is 25 February 2022. Proponents submit materials in three main phases: on 10 March, the proposed decoder implementation with some fixed coding model; on 2 May, the proposals’ bitstreams and decoded images and/or labels for the test datasets; and on 18 July, the source code for the encoder, decoder and training procedure, together with the proposal description. The presentation and discussion of the JPEG AI proposals will occur during the 96th JPEG meeting. JPEG AI is a joint standardization project between IEC, ISO and ITU.

JPEG AI framework

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature of this vision. Point cloud data supports a wide range of applications for human and machine consumption including metaverse, autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 94th JPEG meeting, the JPEG Committee released a final Call for Proposals on JPEG Pleno Point Cloud Coding. This call addresses learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. This Call was released in conjunction with new releases of the JPEG Pleno Point Cloud Use Cases and Requirements and the JPEG Pleno Point Cloud Common Training and Test Conditions. Interested parties are invited to register for this Call by the deadline of the 31st of March 2022.

JPEG Pleno Light Field

Besides defining coding standards, JPEG Pleno is planning for the creation of quality assessment standards, i.e. defining a framework including subjective quality assessment protocols and objective quality assessment measures for lossy decoded data of plenoptic modalities in the context of multiple use cases. The first phase of this effort will address the light field modality and should build on the light field quality assessment tools developed by JPEG in recent years. Future activities will focus on the holographic and point cloud modalities, for both of which coding-related standardization efforts have also been initiated.

JPEG AIC

During the 94th JPEG Meeting, the first version of the use cases and requirements document was released under the Image Quality Assessment activity. The standardization process was also defined and will be carried out in two stages: in Stage I, a subjective methodology for the assessment of images with visual quality in the range from high quality to near-visually lossless will be standardized, following a collaborative process; subsequently, in Stage II, an objective image quality metric will be standardized by means of a competitive process. A tentative timeline has also been planned, with a call for contributions for subjective quality assessment methodologies to be released in July 2022 and a call for proposals for an objective quality metric planned for July 2023.

JPEG Systems

JPEG Systems produced the FDIS text for JLINK (ISO/IEC 19566-7), which allows the storage of multiple images inside JPEG files and the interactive navigation between them. This enables features like virtual museum tours, real estate visits, hotspot zoom into other images and many others. For JPEG Snack, the Committee produced the DIS text of ISO/IEC 19566-8, which allows storing multiple images for self-running multimedia experiences like animated image sequences and moving image overlays. Both texts have been submitted for their respective ballots. For JUMBF (ISO/IEC 19566-5, JPEG Universal Metadata Box Format), a second edition was initiated which combines the first edition and two amendments. The current extensions add support for CBOR (Concise Binary Object Representation) and private content types. In addition, JPEG Systems started an activity on a technical report on JPEG extension mechanisms to facilitate forwards and backwards compatibility under ISO/IEC 19566-9. This technical report gives guidelines for the design of future JPEG standards and summarizes existing design mechanisms.

JPEG Fake Media

At its 94th meeting, the JPEG Committee released a Draft Call for Proposals for JPEG Fake Media and the associated Use Cases and Requirements for JPEG Fake Media. These documents are the result of the work performed by the JPEG Fake Media exploration. The scope of JPEG Fake Media is the creation of a standard that can facilitate secure and reliable annotation of media asset creation and modifications. The standard shall address both use cases that are in good faith and those with malicious intent. The Committee targets the following timeline for the next steps in the standardization process:

  • April 2022: issue Final Call for Proposals
  • October 2022: evaluation of proposals
  • January 2023: first Working Draft (WD)
  • January 2024: Draft International Standard (DIS)
  • October 2024: International Standard (IS)

The JPEG Committee welcomes feedback on the JPEG Fake Media documents and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in this standardization activity.

JPEG NFT

The Ad hoc Group (AhG) on NFT resumed its exploratory work on the role of JPEG in the NFT ecosystem during the 94th JPEG meeting. Three use cases and four essential requirements were selected. The use cases include the usage of NFT for JPEG-based digital art, NFT for collectable JPEGs, and NFT for JPEG micro-licensing. The following categories of critical requirements are under consideration: metadata descriptions; metadata embedding and referencing; authentication and integrity; and the format for registering media assets. As a result, the JPEG Committee published an output document titled JPEG NFT Use Cases and Requirements. Additionally, the third JPEG NFT and Fake Media Workshop proceedings were published, and arrangements were made to hold another combined workshop between the JPEG NFT and JPEG Fake Media groups.

JPEG XS

At the 94th JPEG meeting a new revision of the Use Cases and Requirements for JPEG XS document was produced, as version 3.1, to clarify and improve the requirements of a frame buffer. In addition, the JPEG Committee reports that the second editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) have been approved and are now scheduled for publication as International Standards. Lastly, the DAM text for Amendment 1 to JPEG XS Part 2, which contains the additional High420.12 profile and a new sublevel at 4 bpp, is ready and will be sent to final balloting for approval.

JPEG XL

JPEG XL Part 4 (Reference software) has proceeded to the FDIS stage. Work continued on the second edition of Part 1 (Core coding system). Core experiments were defined to investigate the numerical stability of the edge-preserving filter and fixed-point implementations. Both Part 1 (core coding system) and Part 2 (file format) are now published as IS, and preliminary support has been implemented in major web browsers, image viewing and editing software. Consequently, JPEG XL is now ready for wide-scale adoption.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of a storage process based on DNA synthetic polymers. A new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA experimentation software to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, including biochemical noise simulation. During the 94th JPEG meeting, the JPEG DNA committee initiated a new document describing the Common Test Conditions that should be used to evaluate different aspects of image coding for storage on DNA support. It was also decided to prepare an outreach video to explain DNA coding and to organize the 6th workshop on JPEG DNA with emphasis on the biochemical process noise simulators. Interested parties are invited to consider joining the effort by registering on the mailing list of the JPEG DNA AhG.
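
To make the idea of a quaternary representation concrete, the following is a minimal sketch that maps each byte to four nucleotides (two bits per symbol) and back. It is not the JPEG DNA codec: the symbol assignment, the function names and the run-length check are all assumptions chosen for illustration. The check for long runs of the same nucleotide stands in for the kind of biochemical constraint mentioned above, which this naive mapping does not enforce.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping, NOT the JPEG DNA codec.
# The assignment 00->A, 01->C, 10->G, 11->T is an arbitrary, hypothetical choice.
NUCLEOTIDES = "ACGT"


def encode_quaternary(data: bytes) -> str:
    """Map every byte to four nucleotides, most significant bit pair first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def decode_quaternary(strand: str) -> bytes:
    """Invert encode_quaternary; the strand length must be a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for symbol in strand[i:i + 4]:
            # str.index raises ValueError for symbols outside A, C, G, T.
            byte = (byte << 2) | NUCLEOTIDES.index(symbol)
        out.append(byte)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    longest = run = 0
    previous = None
    for symbol in strand:
        run = run + 1 if symbol == previous else 1
        previous = symbol
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    strand = encode_quaternary(b"JPEG")
    print(strand, longest_homopolymer(strand))
    assert decode_quaternary(strand) == b"JPEG"
```

A byte of zeros becomes "AAAA" under this mapping, which illustrates why a practical scheme needs constrained coding and error resilience on top of a plain base-4 conversion.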

Final Quote

“JPEG marks a historical milestone with the parallel release of two calls for proposals for learning based coding of images and point clouds,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 95 will be held online from 25 to 29 April 2022

MPEG Column: 137th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 137th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Systems Wins Two More Technology & Engineering Emmy® Awards
  • MPEG Audio Coding selects 6DoF Technology for MPEG-I Immersive Audio
  • MPEG Requirements issues Call for Proposals for Encoder and Packager Synchronization
  • MPEG Systems promotes MPEG-I Scene Description to the Final Stage
  • MPEG Systems promotes Smart Contracts for Media to the Final Stage
  • MPEG Systems further enhanced the ISOBMFF Standard
  • MPEG Video Coding completes Conformance and Reference Software for LCEVC
  • MPEG Video Coding issues Committee Draft of Conformance and Reference Software for MPEG Immersive Video
  • JVET produces Second Editions of VVC & VSEI and finalizes VVC Reference Software
  • JVET promotes Tenth Edition of AVC to Final Draft International Standard
  • JVET extends HEVC for High-Capability Applications up to 16K and Beyond
  • MPEG Genomic Coding evaluated Responses on New Advanced Genomics Features and Technologies
  • MPEG White Papers
    • Neural Network Coding (NNC)
    • Low Complexity Enhancement Video Coding (LCEVC)
    • MPEG Immersive video

In this column, I’d like to focus on the Emmy® Awards, video coding updates (AVC, HEVC, VVC, and beyond), and a brief update about DASH (as usual).

MPEG Systems Wins Two More Technology & Engineering Emmy® Awards

MPEG Systems is pleased to report that MPEG is being recognized this year by the National Academy of Television Arts and Sciences (NATAS) with two Technology & Engineering Emmy® Awards, for (i) “standardization of font technology for custom downloadable fonts and typography for Web and TV devices” and (ii) “standardization of HTTP encapsulated protocols”, respectively.

The first of these Emmys is related to MPEG’s Open Font Format (ISO/IEC 14496-22) and the second of these Emmys is related to MPEG Dynamic Adaptive Streaming over HTTP (i.e., MPEG DASH, ISO/IEC 23009). The MPEG DASH standard is the only commercially deployed international standard technology for media streaming over HTTP and it is widely used in many products. MPEG developed the first edition of the DASH standard in 2012 in collaboration with 3GPP and since then has produced four more editions amending the core specification by adding new features and extended functionality. Furthermore, MPEG has developed six other standards as additional “parts” of ISO/IEC 23009 enabling the effective use of the MPEG DASH standards with reference software and conformance testing tools, guidelines, and enhancements for additional deployment scenarios. MPEG DASH has dramatically changed the streaming industry by providing a standard that is widely adopted by various consortia such as 3GPP, ATSC, DVB, and HbbTV, and across different sectors. The success of this standard is due to its technical excellence, large participation of the industry in its development, addressing the market needs, and working with all sectors of industry all under ISO/IEC JTC 1/SC 29 MPEG Systems’ standard development practices and leadership.

These are MPEG’s fifth and sixth Technology & Engineering Emmy® Awards (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, MPEG-2 Transport Stream in 2013, and ISO Base Media File Format in 2021) and MPEG’s seventh and eighth overall Emmy® Awards (including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017).

I have been actively contributing to the MPEG DASH standard since its inception. My initial blog post dates back to 2010 and the first edition of MPEG DASH was published in 2012. A more detailed MPEG DASH timeline provides many pointers to the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität Klagenfurt and its DASH activities, which are now continued within the Christian Doppler Laboratory ATHENA. In the end, the MPEG DASH community of contributors to and users of the standards can be very proud of this achievement, only 10 years after the first edition was published. So, happy 10th birthday, MPEG DASH, and what a nice birthday gift.

Video Coding Updates

In terms of video coding, there have been many updates across various standards’ projects at the 137th MPEG Meeting.

Advanced Video Coding

The 10th edition of Advanced Video Coding (AVC, ISO/IEC 14496-10 | ITU-T H.264) has been promoted to Final Draft International Standard (FDIS), which is the final stage of the standardization process. Beyond various text improvements, this edition specifies a new SEI message for describing the shutter interval applied during video capture. The shutter interval can vary in video cameras, and conveying this information can be valuable for analysis and post-processing of the decoded video.

High-Efficiency Video Coding

The High-Efficiency Video Coding (HEVC, ISO/IEC 23008-2 | ITU-T H.265) standard has been extended to support high-capability applications. It defines new levels and tiers providing support for very high bit rates and video resolutions up to 16K, as well as defining an unconstrained level. This will enable the usage of HEVC in new application domains, including professional, scientific, and medical video sectors.

Versatile Video Coding

The second editions of Versatile Video Coding (VVC, ISO/IEC 23090-3 | ITU-T H.266) and Versatile supplemental enhancement information messages for coded video bitstreams (VSEI, ISO/IEC 23002-7 | ITU-T H.274) have reached FDIS status. The new VVC version defines profiles and levels supporting larger bit depths (up to 16 bits), including some low-level coding tool modifications to obtain improved compression efficiency with high bit-depth video at high bit rates. VSEI version 2 adds SEI messages giving additional support for scalability, multi-view, display adaptation, improved stream access, and other use cases. Furthermore, a Committee Draft Amendment (CDAM) for the next amendment of VVC was issued to begin the formal approval process to enable linking VVC with the Green Metadata (ISO/IEC 23001-11) and Video Decoding Interface (ISO/IEC 23090-13) standards and add a new unconstrained level for exceptionally high capability applications such as certain uses in professional, scientific, and medical application scenarios. Finally, the reference software package for VVC (ISO/IEC 23090-16) was also completed with its achievement of FDIS status. Reference software is extremely helpful for developers of VVC devices, helping them in testing their implementations for conformance to the video coding specification.

Beyond VVC

Regarding video coding beyond VVC capabilities, the Enhanced Compression Model (ECM 3.1) shows an improvement of close to 15% over VTM-11.0 + JVET-V0056 (i.e., VVC reference software) for Random Access Main 10. This is indeed encouraging and, in general, these activities are currently managed within two exploration experiments (EEs). The first is on neural network-based (NN) video coding technology (EE1) and the second is on enhanced compression beyond VVC capability (EE2). EE1 currently plans to further investigate (i) enhancement filters (loop and post) and (ii) super-resolution (JVET-Y2023). It will further investigate selected NN technologies on top of ECM 4 and the implementation of selected NN technologies in the software library, for platform-independent cross-checking and integerization. Enhanced Compression Model 4 (ECM 4) comprises new elements on MRL for intra, various GPM/affine/MV-coding improvements including TM, adaptive intra MTS, coefficient sign prediction, CCSAO improvements, bug fixes, and encoder improvements (JVET-Y2025). EE2 will investigate intra prediction improvements, inter prediction improvements, improved screen content tools, and improved entropy coding (JVET-Y2024).

Research aspects: video coding performance is usually assessed in terms of compression efficiency and/or encoding runtime (time complexity). Another aspect is related to visual quality, its assessment, and metrics, specifically for neural network-based video coding technologies.
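
Compression efficiency gains of this kind are typically reported as an average bitrate difference at equal quality between two rate-distortion curves. The following is a simplified sketch of such a Bjøntegaard-delta style computation. It is an assumption-laden illustration: it uses piecewise-linear interpolation of log-bitrate over PSNR instead of the polynomial or spline fits used by common evaluation tools, the function names are hypothetical, and the sample points are made up rather than taken from any JVET report.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly ascending xs at x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the range covered by xs")


def _integrate(xs, ys, lo, hi):
    """Integral of the piecewise-linear curve between lo and hi (trapezoids)."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(k, xs, ys) for k in knots]
    return sum(
        (knots[i + 1] - knots[i]) * (vals[i] + vals[i + 1]) / 2
        for i in range(len(knots) - 1)
    )


def bd_rate_linear(anchor, test):
    """Average bitrate difference (%) of `test` versus `anchor` at equal quality.

    Both arguments are lists of (bitrate, psnr) points with distinct PSNR
    values. Negative results mean `test` needs fewer bits than `anchor`.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_lr = [p[1] for p in a], [math.log10(p[0]) for p in a]
    t_q, t_lr = [p[1] for p in t], [math.log10(p[0]) for p in t]
    # Only the quality range covered by both curves is compared.
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    avg_diff = (_integrate(t_q, t_lr, lo, hi) - _integrate(a_q, a_lr, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


if __name__ == "__main__":
    # Made-up RD points: (bitrate in kbit/s, PSNR in dB).
    anchor = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
    test = [(rate * 0.85, psnr) for rate, psnr in anchor]
    print(round(bd_rate_linear(anchor, test), 2))
```

In the usage example the test curve reaches each quality level with 85% of the anchor bitrate, so the result corresponds to a 15% bitrate saving. Real comparisons rarely share identical quality points, which is why the interpolation and the overlap handling are needed.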

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 137th MPEG meeting, MPEG Systems issued a draft amendment to the core MPEG-DASH specification (i.e., ISO/IEC 23009-1) about Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) are available here.

An updated overview of DASH standards/features can be found in the Figure below.

MPEG-DASH status of January 2021.

Research aspects: in the Christian Doppler Laboratory ATHENA we aim to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the phases (i) multimedia content provisioning (i.e., video coding), (ii) content delivery (i.e., video networking), and (iii) content consumption (i.e., video player incl. ABR and QoE) in the media delivery chain as well as for (iv) end-to-end aspects, with a focus on, but not being limited to, HTTP Adaptive Streaming (HAS).

The 138th MPEG meeting will again be an online meeting in July 2022. Click here for more information about MPEG meetings and their developments.

Reports from ACM Multimedia Systems 2021

Introduction

The 12th ACM Multimedia Systems Conference (MMSys’21) took place from September 28th through October 1st, 2021. The MMSys conference is an important forum for researchers in multimedia systems. Due to the ongoing pandemic, the event was held in a hybrid mode: onsite in Istanbul, Turkey, and online. Organizers and chairs (Özgü Alay, Cheng-Hsin Hsu, and Ali C. Begen) worked very hard to make sure the conference was successful, both for the on-site participants (around 50) and the online participants (with a peak of 330 concurrent viewers). For a short description of the event, take a look at the text written by Ali Begen, one of the general chairs.
To encourage student authors to participate on-site, SIGMM sponsored a group of students with Student Travel Grant Awards. Students who wanted to apply for this travel grant needed to submit an online form before the submission deadline. Then, the selection committee chose 7 travel grant winners. The selected students received either 1,000 or 2,000 USD to cover their airline tickets as well as accommodation costs for this event. We asked the travel grant winners to share their unique experiences attending MMSys’21. The following are their comments.

Minh Nguyen

It is my honour to receive the SIGMM Student travel award, which gave me a golden opportunity to attend the MMSys’2021 conference on-site. This conference is the first one I have attended during the Covid pandemic. I attended the whole conference, and I really appreciate the organizing committee, who tried their best to organize this conference in a hybrid mode. It was a very interesting and well-organized conference where many innovative papers were introduced. The venue of the conference is a great place with professional staff and comfortable accommodation and meeting rooms. The local Turkish food appealed to me; it was delicious. At this conference, I was happy to meet, connect, and discuss with experts working in multimedia systems, which is close to the topic of my PhD thesis. I was interested in the informative and passionate keynotes about cutting-edge technologies and the open discussions that followed. In particular, many novel papers motivated me and gave me some ideas for future work in my PhD thesis. Also, the enjoyable social events gave me a chance to visit Istanbul and experience new things. I look forward to attending future editions of the conference.

Lucas Torrealba A.

I found the conference very interesting. It was my first experience of an in-person conference and it was amazing. The research articles presented seem very relevant to me and the organization did a wonderful job as well. In addition, it seems to be quite a good idea for the future to always leave hybrid ways to participate in the conferences.

Paniz Parastar

MMSys’21 was my first in-person conference, and since it was so well organized, it raised my expectations of future conferences. Overall, many interesting topics were covered, and I only mention a couple of instances here.
AI/ML are the hot topics of today. I believe it’s enjoyable to see them applied in various aspects of multimedia streaming and other areas, as well as in computer vision. Notably, I liked the papers in the NOSSDAV sessions on the last day of the conference that adapt learning methods to improve the QoE of users. Since I’m working on distinguishing IoT devices and their traffic on the network these days, the video clustering papers, and mainly the paper that distinguishes 360 videos from regular ones based on traffic features (i.e., flow- and packet-level features), were educational to me. Also, comparing subjective and objective quality assessment metrics under various network conditions, as they do in the paper, may not be a new topic, but it is always interesting to explore.
Plus, one of the most exciting talks for me was ‘Games as a Game Changer’, which was part of the Equality, Diversity, and Inclusion (EDI) Workshop. It changed my perception of games as an entertaining tool that also can help us better understand situations that don’t usually happen in our daily lives.

Ekrem Cetinkaya

MMSys’21 was my first in-person conference experience, and I can gladly say that it was above my expectations. We were welcomed by a fantastic organization, given how difficult the situation was. Everything went so smoothly, from the keynotes to paper presentations to demo sessions, and of course, social events.
Personally, two things were the most impressive for me. First, the keynote by Caitlin Kalinowski (Facebook) was given in person, and she had to fly from the U.S. to Istanbul just for this keynote. Second, the hybrid organization was thought through. There was a team of five whose duty was to make sure the conference was insightful for those who could not make it to Istanbul as well.
Moreover, the social events and the venues were really lovely. I learned that the MMSys community has a long history, and you could feel that, especially in those social events where it was an amicable environment, meaning that it was also easy for me to do some networking. Overall, I can say the MMSys conference was amazing in all aspects without any doubt. I want to thank the SIGMM committee once again for their travel grant, which made this experience possible.

Ivan Bartolec

The ACM MMSys’21 conference held in Istanbul, Turkey, was an excellent opportunity to meet, interact, and discuss ideas with researchers who are working to develop new and engaging multimedia experiences. This was my first MMSys conference, and it was an excellent environment for both learning and networking, with a thoughtfully selected collection of presentations, engaging keynotes (especially the one from a representative of Facebook), and fun social events. I found the sessions based on video or video streaming to be the most interesting and informative for my field of study. The demo sessions concept was also pretty unique, and by being on-site and seeing the demos and asking questions, I learnt a few things about practical implementations that I find incredibly useful. I’m very thankful for the opportunity to present my PhD research as part of the Doctoral symposium and to receive feedback from conference attendees as well as offline comments and ideas via email, which I gladly responded to. It was an absolute pleasure to attend MMSys’21 on-site, courtesy of the Student Travel Grant, and I look forward to visiting future editions of the conference and continuing to interact with the MMSys community.

Jesus Aguilar Armijo

It has been a pleasure to attend MMSys’2021 in person. This would not have been possible without the SIGMM Student travel award.
At the conference, I had the opportunity to attend four keynotes, where I would like to highlight the keynote from Caitlin Kalinowski (Facebook). She presented in person and showed the Virtual Reality devices of her company and future projects with emerging technologies.
I found the different sessions of MMSys truly engaging, as they were related to my work in network-assisted video streaming. For example, the NOSSDAV session named “Session #1: Yet Another Streaming Session” contained the paper “Common Media Client Data (CMCD): Initial Findings”, which I found especially interesting as I use some features of this standard in my work. Moreover, the paper entitled “Beyond throughput, the next generation: a 5G dataset with channel and context metrics” (from MMSys’20 but presented in MMSys’21) in the open dataset session was particularly interesting for me, as I used their previous 4G dataset as radio traces for my last paper.
During the conference, I had the opportunity to discuss and exchange ideas with different researchers, which I found valuable and insightful. I would also like to highlight the good organization of the conference and the social events.
Finally, I presented my work in the Doctoral Symposium session, and I received some interesting questions from the audience. It was a great opportunity, and I am grateful to SIGMM, which allowed me to participate in this extraordinary experience.

Towards an updated understanding of immersive multimedia experiences

Bringing theories and measurement techniques up to date

Development of technology for immersive multimedia experiences

Immersive multimedia experiences, as the name suggests, are experiences built on media that can immerse users, through different interactions, in an experience of an environment. Through different technologies and approaches, immersive media emulates a physical world by means of a digital or simulated world, with the goal of creating a sense of immersion. Users are involved in a technologically driven environment where they may actively join and participate in the experiences offered by the generated world [White Paper, 2020]. As hardware and technologies develop further, these immersive experiences are improving and provide a more advanced feeling of immersion. This means that immersive multimedia experiences go beyond just viewing a screen and open up greater potential. This column aims to present and discuss the need for an up-to-date understanding of immersive media quality. First, the development of the constructs of immersion and presence over time will be outlined. Second, influencing factors of immersive media quality will be introduced, and related standardisation activities will be discussed. Finally, the column concludes by summarising why an updated understanding of immersive media quality is urgent.

Development of theories covering immersion and presence

One of the first definitions of presence was established by Slater and Usoh as early as 1993; they defined presence as a “sense of presence” in a virtual environment [Slater, 1993]. This is in line with other early definitions of presence and immersion. For example, Biocca defined immersion as a system property. Those definitions focused more on the ability of the system to provide technically accurate stimuli to users [Biocca, 1995]. As technology was only slowly becoming capable of providing systems that generate stimulation mimicking the real world, this was naturally the main content of the definitions. Quite early on, questionnaires to capture the experienced immersion were introduced, such as the Igroup Presence Questionnaire (IPQ) [Schubert, 2001]. The early methods for measuring experiences also focused mainly on how well the real world was represented and perceived. With maturing technology, the focus shifted more towards emotions and more cognitive phenomena besides basic stimulus generation. For example, Baños and colleagues showed that experienced emotion and immersion are related to each other and also influence the sense of presence [Baños, 2004]. Newer definitions focus more on these cognitive aspects, e.g., Nilsson defines three factors that can lead to immersion: (i) technology, (ii) narratives, and (iii) challenges, where only the factor technology is a non-cognitive one [Nilsson, 2016]. In 2018, Slater defined the place illusion as the illusion of being in a place while knowing one is not really there. This focuses on a cognitive construct, the removal of disbelief, but still leaves the focus of how the illusion is created mainly on system factors instead of cognitive ones [Slater, 2018]. In recent years, more and more activities have been started to define how to measure immersive experiences as an overall construct.

Constructs of interest in relation to immersion and presence

This section discusses constructs and activities that are related to immersion and presence. In the beginning, subtypes of extended reality (XR) and the relation to user experience (UX) as well as quality of experience (QoE) are outlined. Afterwards, recent standardization activities related to immersive multimedia experiences are introduced and discussed.
Moreover, immersive multimedia experiences can be divided by many different factors, but recently the most common distinction concerns interactivity: content can be made for multi-directional viewing, as in 360-degree videos, or presented through interactive extended reality. Those XR technologies can be divided into mixed reality (MR), augmented reality (AR), augmented virtuality (AV), virtual reality (VR), and everything in between [Milgram, 1995]. Across all those areas, immersive multimedia experiences have found a place on the market and are providing new solutions to challenges in research as well as in industry, with growing potential for adoption in different areas [Chuah, 2018].

While discussing immersive multimedia experiences, it is important to address user experience and the quality of immersive multimedia experiences, which can be defined following the definition of quality of experience itself [White Paper, 2012] as a measure of the delight or annoyance of a customer’s experiences with a service, where in this case the service is an immersive multimedia experience. Furthermore, in defining QoE, the terms experience and application are also defined and can be applied to immersive multimedia experiences: an experience is an individual’s stream of perception and interpretation of one or multiple events, and an application is software and/or hardware that enables usage and interaction by a user for a given purpose [White Paper, 2012].

As already mentioned, immersive media experiences have an impact in many different fields, but one field where the impact of immersion and presence is particularly investigated is gaming, along with the QoE models and optimizations that go with it. Of particular interest is the framework and standardization of subjective evaluation methods for gaming quality [ITU-T Rec. P.809, 2018]. This standard provides instructions on how to assess QoE for gaming services using two possible test paradigms, i.e., passive viewing tests and interactive tests. However, even though detailed information about the environments, test set-ups, questionnaires, and game selection materials is available, it is still focused on the gaming field and on the concepts of flow and immersion in games themselves.

Together with gaming, another step in defining and standardizing the infrastructure of audiovisual services in telepresence, immersive environments, and virtual and extended reality has been taken by defining different service scenarios of immersive live experience [ITU-T Rec. H.430.3, 2018], where live sports, entertainment, and telepresence scenarios are described. This standardization describes several immersive live experience scenarios together with architectural frameworks for delivering such services, but it does not cover all possible use cases. When discussing immersive multimedia experiences, spatial audio, sometimes referred to as “immersive audio”, must also be mentioned, as it is one of the key features especially of AR and VR experiences [Agrawal, 2019]: in AR experiences it can provide immersion on its own, and it can also enhance the visual information in VR.
In order to correctly assess QoE or UX, one must be aware of all characteristics, such as user, system, content, and context, because their actual state may influence the immersive multimedia experience of the user. That is why these characteristics are defined as influencing factors (IF), which can be divided into Human IF, System IF, and Context IF and are also standardized for virtual reality services [ITU-T Rec. G.1035, 2021]. A Human IF that receives particular attention is simulator sickness, as it occurs specifically as a result of exposure to immersive XR environments. Simulator sickness, also known as cybersickness or VR/AR sickness, is visually induced motion sickness triggered by visual stimuli and caused by the sensory conflict between the vestibular and visual systems. Therefore, to achieve the full potential of immersive multimedia experiences, the unwanted sensation of simulator sickness must be reduced. However, as immersive technology changes frequently, hardware improvements lead to better experiences, but requirement specifications, design, and development must be updated continuously to keep up with best practices.

Conclusion – Towards an updated understanding

Considering the development of theories, definitions, and influencing factors around the constructs of immersion and presence, one can see two different streams. First, there is quite a strong focus on the technical ability of systems in most early theories. Second, cognitive aspects and non-technical influencing factors gain importance in newer works. Of course, in the 1990s technology was not yet ready to provide a good simulation of the real world. Therefore, most activities to improve systems, including measurement techniques, focused on that goal. In the last few years, technology has developed quickly, and the basic simulation of a virtual environment is now possible even on mobile devices such as the Oculus Quest 2. Although concepts such as immersion and presence from the past remain applicable, the definitions dealing with those concepts also need to capture today’s technology. Meanwhile, systems have proven to be good real-world simulators and to provide users with a feeling of presence and immersion. While there is already strong and industry-driven activity in standardization, research in many disciplines, such as telecommunication, is still mainly using old questionnaires. These questionnaires are mostly focused on technological/real-world simulation constructs and are thus no longer able to differentiate products and services to an optimal extent. There are some newer attempts to create measurement tools for, e.g., social aspects of immersive systems [Li, 2019; Toet, 2021]. Measurement scales that aim to capture differences in the ability of systems to create realistic simulations cannot reliably differentiate between systems, because most systems now provide realistic real-world simulations.
To enhance research and industrial development in the field of immersive media, we need definitions of constructs and measurement methods that are appropriate for the current technology even if the newer measurement and definitions are not as often cited/used yet. That will lead to improved development and in the future better immersive media experiences.

One step towards understanding immersive multimedia experiences is reflected by QoMEX 2022. The 14th International Conference on Quality of Multimedia Experience will be held from September 5th to 7th, 2022 in Lippstadt, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, Quality of Experience (QoE), and User Experience (UX). It will contribute to excellence in developing multimedia technology towards user well-being and foster the exchange between multidisciplinary communities. One core topic is immersive experiences and technologies as well as new assessment and evaluation methods, and both topics contribute to bringing theories and measurement techniques up to date. For more details, please visit https://qomex2022.itec.aau.at.

References

[Agrawal, 2019] Agrawal, S., Simon, A., Bech, S., Bærentsen, K., Forchhammer, S. (2019). “Defining Immersion: Literature Review and Implications for Research on Immersive Audiovisual Experiences.” In Audio Engineering Society Convention 147. Audio Engineering Society.
[Biocca, 1995] Biocca, F., & Delaney, B. (1995). Immersive virtual reality technology. Communication in the age of virtual reality, 15(32), 10-5555.
[Baños, 2004] Baños, R. M., Botella, C., Alcañiz, M., Liaño, V., Guerrero, B., & Rey, B. (2004). Immersion and emotion: their impact on the sense of presence. Cyberpsychology & behavior, 7(6), 734-741.
[Chuah, 2018] Chuah, S. H. W. (2018). Why and who will adopt extended reality technology? Literature review, synthesis, and future research agenda. Literature Review, Synthesis, and Future Research Agenda (December 13, 2018).
[ITU-T Rec. G.1035, 2021] ITU-T Recommendation G.1035 (2021). Influencing factors on quality of experience for virtual reality services, Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. H.430.3, 2018] ITU-T Recommendation H.430.3 (2018). Service scenario of immersive live experience (ILE), Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. P.809, 2018] ITU-T Recommendation P.809 (2018). Subjective evaluation methods for gaming quality, Int. Telecomm. Union, CH-Geneva.
[Li, 2019] Li, J., Kong, Y., Röggla, T., De Simone, F., Ananthanarayan, S., De Ridder, H., … & Cesar, P. (2019, May). Measuring and understanding photo sharing experiences in social Virtual Reality. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-14).
[Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
[Nilsson, 2016] Nilsson, N. C., Nordahl, R., & Serafin, S. (2016). Immersion revisited: a review of existing definitions of immersion and their relation to different theories of presence. Human Technology, 12(2).
[Schubert, 2001] Schubert, T., Friedmann, F., & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 10(3), 266-281.
[Slater, 1993] Slater, M., & Usoh, M. (1993). Representations systems, perceptual position, and presence in immersive virtual environments. Presence: Teleoperators & Virtual Environments, 2(3), 221-233.
[Toet, 2021] Toet, A., Mioch, T., Gunkel, S. N., Niamut, O., & van Erp, J. B. (2021). Holistic Framework for Quality Assessment of Mediated Social Communication.
[Slater, 2018] Slater, M. (2018). Immersion and the illusion of presence in virtual reality. British Journal of Psychology, 109(3), 431-433.
[White Paper, 2012] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.
[White Paper, 2020] Perkis, A., Timmerer, C., Baraković, S., Husić, J. B., Bech, S., Bosse, S., … & Zadtootaghaj, S. (2020). QUALINET white paper on definitions of immersive media experience (IMEx). arXiv preprint arXiv:2007.07032.

Report from ACM Multimedia Systems 2021 by Neha Sharma


Neha Sharma (@NehaSharma) is a PhD student working with Dr Mohamed Hefeeda in the Network and Multimedia Systems Lab at Simon Fraser University. Her research interests are in computer vision and machine learning with a focus on next-generation multimedia systems and applications. Her current work focuses on designing an inexpensive hyperspectral camera using a hybrid approach that leverages both hardware and software solutions. She was named Best Social Media Reporter of the conference, an award that promotes sharing among researchers on social networks. To celebrate this award, here is a more complete report on the conference.

As a junior researcher in multimedia systems, I must say I feel proud to be part of this amazing community. I joined the ACM Multimedia Systems Conference (MMSys) community in 2020, when I published my first research work. I was excited to attend MMSys ’20 in Istanbul, which unfortunately shifted online due to COVID-19. I presented my first work online and got to learn about other researchers in the community. This year I was able to publish another work with my team and was selected to present my ideas and research plans at the Doctoral Symposium (thanks to the reviewers). MMSys ’21 gave me hope of having a full conference experience, as we were all hoping to get our lives back to normal. However, as the conference date approached, things were still not clear and travel restrictions were still in place. On the bright side, MMSys ’21 became hybrid to give those who could travel the opportunity to attend in person. It was only at the very end that I decided to travel and attend MMSys ’21 in person, and I am glad I made that decision. My experience was overwhelmingly rich in terms of learning about interesting research findings and making inspiring connections in the community. As the recipient of the “Best Social Media Reporter” award, I invite you to enjoy the highlights of MMSys ’21 through my lens.

In light of the ongoing global pandemic, ACM MMSys ’21 was held in hybrid mode, onsite in Istanbul, Turkey and online, from September 28 to October 1, 2021. Ali C. Begen (Ozyegin University and Networked Media, Turkey) opened the conference onsite with a warm welcome. MMSys ’21 became the first-ever hybrid conference where participants presented onsite as well as remotely in real time. Participants joined from 38 different countries. The organizing team did an amazing job in pulling off this complex event. This year the research track implemented a two-round submission system, and accepted papers included public reviews in the proceedings. This, however, was not the only first: MMSys ’21 also had its first Doctoral Symposium, targeting PhD students and aiming to help them find mentors. In addition, there were postponed celebrations for the 30th anniversary of NOSSDAV and the 25th anniversary of Packet Video.

The conference program was very well scheduled. Each day of the conference started with a keynote. There were four insightful and inspiring keynotes from researchers working on cutting-edge multimedia technologies. The first day started with a talk titled “AI-Driven Solutions throughout Games’ Lifecycles Leveraging Big Data” by Qiaolin Chen from Tencent IEG Global. Chen discussed how AI and big data are changing the gaming industry, from intelligent market decisions to data-driven game development. On the second day, Caitlin Kalinowski, who heads the VR Hardware team at Facebook Reality Labs, presented an interesting keynote, “Making Impossible Products: How to Get 0-to-1 Products Right”. She shared insights about Oculus and zero-to-one products. The next day, Chris Bregler (Google) talked about “Synthetic Media: New Opportunities and New Challenges”. He discussed recent trends in generative media creation techniques that have opened new possibilities for societally beneficial uses but have also raised concerns about misuse. On the last day, Sriram Sethuraman and Deepthi Nandakumar (Amazon) provided insights about the “Role of ML in the Prediction of Perceptual Video Quality”. The keynotes are available on YouTube to watch on demand.

This year the conference attracted paper submissions from a range of multimedia topics including immersive media, live video, content preparation, cloud-based and mobile media processing and computer vision systems. Apart from the main research track, MMSys ’21 hosted three workshops:

  • NOSSDAV – Network and Operating System Support for Digital Audio and Video
  • MMVE – Immersive Mixed and Virtual Environment Systems
  • GameSys – Game Systems

These workshops provided an opportunity to meet those who are working in focused areas of multimedia research. This year MMSys held the inaugural ACM workshop on Game Systems (GameSys ’21). This workshop attracted research on all aspects of computer/digital games, emphasizing networks, systems, interaction, and applications. Highlights include the work presented by Mark Claypool et al. (Worcester Polytechnic Institute), a user study measuring attribute scaling for cloud-based games.

In addition to the area-focused workshops, MMSys ’21 also conducted two grand challenges.

Another main highlight of the conference is the EDI (Equality, Diversity and Inclusion) workshop. The workshop was tailored towards PhD students, assistant professors and starting researchers in various research organizations. The event openly discussed core topics about parenthood, work-family policies, career paths and EDI aspects at large. Laura Toni, Mea Wang and Ozgu Alay opened the workshop on the third day of the conference. Miriam Redi shared goals to achieve an equitable and inclusive multimedia community. Susanne Boll talked about the target strategy “25 in 25” to increase the participation of women in SIGMM to at least 25% by 2025. Other guest speakers also highlighted some strategies to achieve target diversity and inclusion in MMSys.

Last but not least, there were amazing social events. Each day of the conference ended with a well-planned social event that gave the in-person attendees a great opportunity to meet, discuss, and develop professional and social links throughout the community in a more relaxed setting. We visited historical venues such as Galata Tower and Adile Sultan Palace and enjoyed a Bosphorus boat tour with a live music band. This year MMSys planned the first inter-continental socials: we travelled from the European side to the Asian side of Istanbul (by bus and by boat). As a token of appreciation, in-person participants received Turkish delights and coffee, a set of traditional towels (peştemal), Istanbul-themed puzzles and a hand-made Kütahya Porcelain vase/coffee set as souvenirs. For me, the best part was sitting together and dining with peers, discussing the prospects of our own research or of multimedia systems research in general.

Ali C. Begen opened the closing session with the announcement of the awards. The Best Paper Award was presented to Xiao Zhu et al. for the paper “Livelyzer: Analyzing the First-Mile Ingest Performance of Live Video Streaming”. See the full list of awards here. The conference closed with the announcement of ACM Multimedia Systems 2022, which will take place in Athlone, Ireland. Looking forward to seeing everyone again next year.

JPEG Column: 93rd JPEG Meeting

JPEG Committee launches a Call for Proposals on Learning based Point Cloud Coding

The 93rd JPEG meeting was held online from 18 to 22 October 2021. The JPEG Committee continued its work on the development of new standardised solutions for the representation of visual information. Notably, the JPEG Committee has decided to release a new call for proposals on point cloud coding based on machine learning technologies that targets both compression efficiency and effective performance for 3D processing as well as machine and computer vision tasks. This activity will be conducted in parallel with JPEG AI standardization. Furthermore, it was also decided to pursue the development of a new standard in the context of the JPEG Fake Media exploration activity.

JPEG coding framework based on machine learning. The latent representation generated by the AI-based coding mechanism can be used for human visualisation, data processing and computer vision tasks.

Considering the response to the Call for Proposals on JPEG Pleno Holography, a first standard for compression of digital holograms has entered its collaborative phase. The response to the call for proposals identified a reliable coding solution for this type of visual information that overcomes the limitations of the state of the art coding solutions for holographic data compression.

The 93rd JPEG meeting had the following highlights:

  • JPEG Pleno Point Cloud Coding draft of the Call for Proposals;
  • JPEG Pleno Holography;
  • JPEG AI drafts of the Call for Proposals and Common Training and Test Conditions;
  • JPEG Fake Media defines the standardisation timeline;
  • JPEG NFT collects use cases;
  • JPEG AIC explores standardisation of near-visually lossless quality models;
  • JPEG XS new profiles and sub-levels;
  • JPEG XL explores fixed point implementations;
  • JPEG DNA considers image quaternary representations suitable for DNA storage.

The following provides an overview of the major achievements of the 93rd JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 93rd JPEG meeting, the JPEG Committee released a Draft Call for Proposals on JPEG Pleno Point Cloud Coding. This call addresses learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. A Final Call for Proposals on JPEG Pleno Point Cloud Coding is planned to be released in January 2022.

JPEG Pleno Holography

At its 93rd JPEG meeting, the committee reviewed the response to the Call for Proposals on JPEG Pleno Holography, which is the first standardization effort aspiring to a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing and display and their associated hologram types. The coding technology selected provides excellent rate-distortion performance for lossy coding, in addition to supporting lossless coding and random access via a space-frequency segmentation approach. The selected technology will serve as a baseline for the standard specification to be developed. This final specification is planned to be published as an international standard in early 2024.

JPEG AI

The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks.

During the 93rd JPEG meeting, the JPEG AI project activities were focused on the analysis of the results of the exploration studies as well as refinements and improvements on common training and test conditions, especially the performance assessment of the image classification and super-resolution tasks. A related topic that received much attention was device interoperability, which was thoroughly analyzed and discussed. Also, the JPEG AI Third Draft Call for Proposals is now available with improvements on evaluation conditions and proposal composition and requirements. A final call for proposals is expected to be issued at the 94th meeting (17-21 January 2022), with a first Working Draft expected by October 2022.

JPEG Fake Media

The scope of the JPEG Fake Media exploration is to assess standardization needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 93rd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. The new version includes an extended set of definitions and a new section related to threat vectors. In addition, the requirements have been substantially enhanced, in particular those related to media asset authenticity and integrity. Given the progress of the exploration, an initial timeline for the standardization process was proposed:

  • April 2022: Issue call for proposals
  • October 2022: Submission of proposals
  • January 2023: Start standardization process
  • January 2024: Draft International Standard (DIS)
  • October 2024: International Standard (IS)

The JPEG Committee welcomes feedback on the working document and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in this standardization activity.

JPEG NFT

Non-Fungible Tokens (NFTs) have recently attracted substantial interest. Numerous digital assets associated with NFTs are encoded in existing JPEG formats or can be represented in JPEG-developed current and future representations. Additionally, several trust and security concerns have been raised about NFTs and the underlying digital assets. The JPEG Committee has established the JPEG NFT exploration initiative to better understand user requirements for media formats. JPEG NFT’s mission is to provide effective specifications that enable various applications that rely on NFTs applied to media assets. The standard shall be secure, trustworthy, and environmentally friendly, enabling an interoperable ecosystem based on NFT within or across applications. The group seeks to engage stakeholders from various backgrounds, including technical, legal, creative, and end-user communities, to develop use cases and requirements. On October 12th, 2021, a second JPEG NFT Workshop was organized in this context. The presentations and video footage from the workshop are now available on the JPEG website. In January 2022, a third workshop will focus on commonalities with the JPEG Fake Media exploration. JPEG encourages interested parties to visit its website frequently for the most up-to-date information and to subscribe to the JPEG NFT Ad Hoc Group’s (AhG) mailing list to participate in this effort.

JPEG AIC

During the 93rd JPEG Meeting, work was initiated on the first draft of a document on use cases and requirements regarding Assessment of Image Coding. The scope of AIC activities was defined to target standards or best practices with respect to subjective and objective image quality assessment methodologies that target a range from high quality to near-visually lossless quality. This is a range of visual qualities where artefacts are not noticeable by an average non-expert viewer without presenting an original reference image but are detectable by a flicker test.

JPEG XS

The JPEG Committee created an updated document “Use Cases and Requirements for JPEG XS V3.0”. It describes new use cases and refines the requirements to allow improving the coding efficiency and to provide additional functionality w.r.t. HDR content, random access and more. In addition, the JPEG XS second editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) went to the final ballot before ISO publication stage. In the meantime, the Committee continued working on the second editions of Part 4 (Conformance Testing) and Part 5 (Reference Software), which are now ready as Draft International Standards. In addition, the decision was made to create an amendment to Part 2 that will add a High420.12 profile and a new sublevel at 4 bpp, to swiftly address market demands.

JPEG XL

Part 3 (Conformance testing) has proceeded to DIS stage. Core experiments were discussed to investigate hardware coding, in particular fixed-point implementations, and will be continued. Work on a second edition of Part 1 (Core coding system) was initiated. With preliminary support in major web browsers, image viewing and editing software, JPEG XL is ready for wide-scale adoption.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. An important step forward in this activity is the implementation of experimentation software to simulate the coding/decoding of images in quaternary code. A thorough explanation of the package has been created, and a wiki for documentation and a link to the code can be found here. A successful fifth workshop on JPEG DNA was held prior to the 93rd JPEG meeting, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA experimentation software to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, as well as by improving the JPEG DNA overview document. Interested parties are invited to consider joining the effort by registering to the mailing list of JPEG DNA.
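
To make the idea of a quaternary representation concrete, the following is a minimal sketch of how binary data can be mapped onto a four-symbol alphabet. It is a toy illustration only and is not the JPEG DNA experimentation software: the function names and the digit-to-nucleotide order (0→A, 1→C, 2→G, 3→T) are assumptions made for this example, and the biochemical constraints that a practical DNA storage code would likely have to respect are ignored.

```python
# Toy illustration only: NOT the JPEG DNA experimentation software.
# Assumed (hypothetical) digit-to-nucleotide order: 0->A, 1->C, 2->G, 3->T.
NUCLEOTIDES = "ACGT"


def bytes_to_quaternary(data: bytes) -> str:
    """Map each byte to four quaternary symbols (2 bits per symbol)."""
    symbols = []
    for byte in data:
        # Split the byte into four 2-bit digits, most significant first.
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)


def quaternary_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_quaternary: four symbols back into one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for symbol in strand[i:i + 4]:
            # Append the 2-bit digit of this symbol to the byte value.
            value = (value << 2) | NUCLEOTIDES.index(symbol)
        out.append(value)
    return bytes(out)
```

With this mapping, every byte of a compressed image file becomes four symbols, and decoding the resulting strand returns the original bytes.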

Final Quote

“Aware of the importance of timely standards in AI-powered imaging applications, the JPEG Committee is moving forward with two concurrent calls for proposals addressing both image and point cloud coding based on machine learning”, said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

No 94, to be held online during 17-21 January 2022.

Reports from ACM Multimedia 2021

Introduction

Due to COVID-19, the annual ACM Multimedia Conference (https://2021.acmmm.org) was held in hybrid mode this year, onsite in Chengdu, China, and online. The organizers made meticulous preparations for the conference, and in total more than 1000 researchers from all over the world participated.

In addition, AI companies such as Huawei and ByteDance were on site trying to attract researchers. It is worth mentioning that, to prevent the spread of COVID-19, staff and volunteers made considerable efforts, such as checking body temperatures and providing free masks for attendees.

To encourage student authors to fully engage with the event, SIGMM sponsored 39 students with Student Travel Grant Awards this year. Students who wanted to apply for this travel grant needed to submit an online form (https://acmsigmm.wufoo.com/forms/sigmm-student-travel-application-form/) before the submission deadline, and the selection committee then chose the travel grant winners according to the selection criteria. The selected students received up to 1000 USD to cover their airline tickets as well as accommodation costs for this event. We interviewed some travel grant winners and asked them to share their experience of attending the conference. The following are their comments.

Students interviewed at ACM Multimedia 2021

Shaoxiang Chen (Fudan University)

It was such a great pleasure to receive the student travel grant and attend the ACM MM 2021 conference in Chengdu. The organizers have devoted a significant amount of effort to ensure the attendees have a nice experience, and in fact, we did. The prepared check-in gifts including masks, an umbrella, and small notebooks were considerate. The onsite covid-19 test was convenient for us to travel back. The keynote talks were closely related to the popular topics in the multimedia community, and I have learned a lot about deep learning and multimodal pre-training. As for the doctoral symposium, I have met excellent PhD students from all over the world and received helpful suggestions from the mentors during my own presentation. Finally, the wonderful performances at the dinner banquet made the entire conference experience even more perfect.

Yuqian Fu (Fudan University)

This is the second time that I have attended ACM Multimedia onsite. The first time was in Nice, France in October 2019, which was also a very nice trip. Another thing that I want to share is that I had one long paper accepted by ACM Multimedia in 2020. The conference was supposed to be held in Seattle, USA. However, due to COVID-19, we had to attend the conference online, which was a great pity. Therefore, I was really happy to participate in this year’s conference in Chengdu. During the conference, I had the opportunity to talk with other researchers face-to-face, and I also actively presented my work to them. I learned a lot in the past few days and had a good experience. Finally, I would like to thank SIGMM for the travel grant, the organizers for all the efforts they made to ensure the progress of the conference, and the volunteers for their kind help.

Zheng Wang (Fudan University)

It has been a wonderful experience for me at ACM Multimedia 2021 in Chengdu this October. After the COVID-19 outbreaks of the past two years, we were so lucky to be together again. Many thanks to the local organizers for their tremendous efforts to hold the conference onsite. At the poster sessions, I was able to present my paper on video moment retrieval to attendees and discuss my idea with them. I could also stop by others’ work, and understanding their work gave me a direct view of what is going on in the multimedia community. I enjoyed the poster session since it helped me know the research trends better. One issue is that the hall for the poster session was relatively crowded, and some walls had two posters arranged one above the other, making communication a bit inconvenient. In the keynote sessions, I was able to see diverse research areas gathered under the same topic, which let me see a problem from different aspects. As I am in my last PhD year, I could talk with several researchers from universities and companies, and I got valuable advice on what I should prepare for pursuing a career in research or business. Thanks to the local organizers for arranging trips to see cute pandas, which made visiting Chengdu a delightful and unforgettable memory.

Yang Jiao (Fudan University)

It was a great honour to attend ACM Multimedia in Chengdu this year. This year’s ACM Multimedia is a special conference, for it is the first top conference held onsite since COVID-19. It was the first time that I attended this conference and I enjoyed the academic atmosphere there. I met a lot of friends with similar research interests as well as famous teachers to share research experiences with. What excited me most was the best paper session, where a great number of outstanding works investigated interesting frontier tasks in the multimedia community, such as generating music according to visual motion, estimating postures based on one’s speech tune, etc. Moreover, the dinner banquet surprised me a lot. Besides the regular host introduction and dining time, the organizers had also carefully prepared wonderful shows as well as a lucky draw. Fortunately, I won the third prize. In summary, thanks to the organizers for all their efforts and to the outstanding researchers for their excellent talks at this year’s Multimedia. It was a really impressive experience for me!

Yechao Zhang (Huazhong University of Science and Technology (HUST))

It was such an honour for me to receive the student travel grant. Frankly, I am merely a second-year graduate student at HUST, and it was the first time for me to attend any academic conference. The acceptance from ACM Multimedia 2021 was a major inspiration for me, and it inspired me to apply for a PhD program so that I could keep contributing to academic research in the area of multimedia in the future. During the conference, I very much enjoyed my time visiting Chengdu. Apart from the amazing food adventure, I had the most beneficial conversations with researchers from all over the world. All these wonderful experiences would not have been possible if it weren’t for the travel grant from SIGMM. Many thanks for the recognition and support from SIGMM. I sincerely hope ACM Multimedia will gain more international influence.

Jingru Gan (University of Chinese Academy of Sciences)

The ACM Multimedia held this year was an extraordinary conference in terms of organization and the attending experience. I was most impressed by the refined arrangement of the hybrid oral sessions, which accommodated onsite and online presenters from everywhere on earth. The great importance of this meeting is that it strengthens the bond between researchers, moving from pages of papers to face-to-face meetings. Getting a chance to learn how others go through months of trial and error before achieving a satisfactory result is inspiring, and it encourages me to dedicate myself completely to my future work.

Yanqiao Zhu (University of Chinese Academy of Sciences)

Although this was not my first time attending international conferences, my experience at ACM Multimedia 2021 was still very exciting and unforgettable, especially after a long-time travel block due to COVID-19. This year, the diverse program not only makes me feel more connected with the multimedia research community but really broadens my vision. During the conference, I presented my paper on multimedia recommendation, met with many prestigious scholars from both academia and industry, and exchanged many interesting ideas. I believe most of the discussions will spur sparks for future research directions. I also participated in social networking programs, during which I made a lot of friends in related research areas. Overall, it was a great honour for me to receive the SIGMM travel grant that supports me attending ACM Multimedia 2021 physically. I would like to sincerely thank all organizers for their effort in making this year’s ACM Multimedia a great success.

Yudong Wang (University of Electronic Science and Technology of China)

As an undergraduate who received the student travel grant, I attended an international conference for the first time. Because of COVID-19, almost all the onsite attendees were Chinese and the room for the posters was a little crowded, but fortunately, people were orderly. At the conference, I stood by my poster and shared my work with some researchers in the same field. Apart from that, I talked with some people who work on recommendation algorithms. They helped me get to know other AI applications and brand new methods to realize intelligence. I listened to some oral presentations from different parts of the world and learned a lot about other fields of multimedia. The most impressive thing was the banquet. Although we came from different schools, the atmosphere among strangers at the table was harmonious. We talked about our daily life at our schools and enjoyed the performances on the stage. By the way, the gifts prepared for the attendees were a nice surprise. If I have any regrets, they are that I was not a volunteer helping others and that I did not win anything in the lucky draw. In summary, thanks to the committee, I had a great experience at ACM Multimedia 2021.

Peidong Liu (Tsinghua University)

I am pleased to have attended the ACM MM 2021 conference onsite in Chengdu, China. Due to the coronavirus pandemic, the conference adopted a hybrid form, i.e. both onsite and online, so that as many people as possible could participate in the academic exchange. This was my first time attending an international conference onsite in the last few years, and I found it more convenient to exchange ideas onsite than online. There are several points worth talking about. First off, the conference used an app called Whova, in which we could fill in our research interests and affiliated institutions to communicate more conveniently with other researchers. Besides that, the volunteers patiently helped us with the check-in process and gave us a nice experience at the conference. Finally, thanks to the support from the conference community, I gained the opportunity to communicate onsite with researchers from all around the globe.

Haoyu Zhang (Shandong University)

This was my first time attending an international conference, and I was very happy to participate offline in Chengdu, Sichuan, China. The feeling of participating in the offline conference was something that cannot be experienced online. The volunteers at the conference were very enthusiastic and answered some of my questions about attending the conference. The ACM Multimedia organizers were very caring: they prepared many exquisite gifts for each participant and provided a dinner with distinctly local characteristics. The delicious food made me linger. During the daily sessions, I attended the talks and browsed the posters that I was interested in, and had detailed exchanges with the authors, which not only broadened my horizons but also inspired my thinking. In short, I was very honoured to be able to attend this ACM Multimedia conference, and it was a very impressive experience. Finally, I wish ACM Multimedia ever greater success.

Summary

Overall, almost everyone rated the experience of participating in this conference highly. We can also tell that the travel grant helps the students a great deal. To summarize, this conference was held successfully and left a very good impression on the participants.

JPEG Column: 92nd JPEG Meeting

JPEG Committee explores NFT standardisation needs

The 92nd JPEG meeting was held online from 7 to 13 July 2021. This meeting has consolidated JPEG’s exploration on standardisation needs related to Non-Fungible Tokens (NFTs). Recently, there has been a growing interest in the use of NFTs in many applications, notably in the trade of digital art and collectables.

Other notable results of the 92nd JPEG meeting have been the release of an update to the Call for Proposals on JPEG Pleno Holography and an initiative to revisit opportunities for standardisation of image quality assessment methodologies and metrics.

The 92nd JPEG meeting had the following highlights:

  • JPEG NFT exploration;
  • JPEG Fake Media defines context, use cases and requirements;
  • JPEG Pleno Holography call for proposals;
  • JPEG AI prepares Call for Proposals;
  • JPEG AIC explores new quality models;
  • JPEG Systems;
  • JPEG XS;
  • JPEG XL;
  • JPEG DNA.

The following provides an overview of the major achievements of the 92nd JPEG meeting.

JPEG NFT exploration

Recently, Non-Fungible Tokens (NFTs) have garnered considerable interest. Numerous digital assets linked with NFTs are either encoded in existing JPEG formats or can be represented in JPEG-developed current and forthcoming representations. Additionally, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee has launched the JPEG NFT exploration initiative. The mission of JPEG NFT is to provide effective specifications that enable various applications that rely on NFTs applied to media assets. A JPEG NFT standard shall be secure, trustworthy, and eco-friendly, enabling an interoperable ecosystem based on NFTs within or across applications. The committee strives to engage stakeholders from diverse backgrounds, including the technical, legal, artistic, and end-user communities, to establish use cases and requirements. In this context, the first JPEG NFT Workshop was held on July 1st, 2021. The workshop’s presentations and video footage are now accessible on the JPEG website, and a second workshop will be held in the near future. JPEG encourages interested parties to frequently visit its website for the most up-to-date information and to subscribe to the mailing list of the JPEG NFT Ad Hoc Group (AhG) in order to participate in this effort.

JPEG Fake Media

The scope of the JPEG Fake Media exploration is to assess standardisation needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 92nd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. This new version includes an improved and extended set of requirements covering three main categories: media creation and modification descriptions, metadata embedding & referencing and authenticity verification. In addition, the document contains several improvements including an extended set of definitions covering key terminologies. The JPEG Committee welcomes feedback to the document and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in the discussion.

JPEG Pleno

Currently, a Call for Proposals is open for JPEG Pleno Holography, which is the first standardisation effort aspiring to provide a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing, and display, and their associated hologram types. Key desired functionalities include support for both lossy and lossless coding, scalability, random access, and integration within the JPEG Pleno system architecture, with the goal of supporting a royalty-free baseline. In support of this Call for Proposals, a Common Test Conditions document and accompanying software have been released, enabling elaborate stress testing from the rate-distortion, functionality and visual rendering quality perspectives. For the latter, numerical reconstruction software has been released enabling viewport rendering from holographic data. References to software and documentation can be found on the JPEG website.

JPEG Pleno Point Cloud continues to progress towards a Call for Proposals on learning-based point cloud coding solutions with the release at the 92nd JPEG meeting of an updated Use Cases and Requirements document. This document details how the JPEG Committee envisions learning-based point cloud coding solutions meeting the requirements of rapidly emerging use cases in this field. This document continues the focus on solutions supporting scalability and random access while detailing new requirements for 3D processing and computer vision tasks performed in the compressed domain to support emerging applications such as autonomous driving and robotics.

JPEG AI

The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualisation, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks. At the 92nd JPEG meeting, several activities were carried out towards the launch of the final JPEG AI Call for Proposals. These included improvements of the training and test conditions for learning-based image coding, especially in the areas of the JPEG AI training dataset, target bitrates, computation of quality metrics, subjective quality evaluation, and complexity assessment. A software package called the JPEG AI objective quality assessment framework, with a reference implementation of all objective quality metrics, has been made available. Moreover, the results of the JPEG AI exploration experiments for image processing and computer vision tasks defined at the previous 91st JPEG meeting were presented and discussed, including their impact on Common Test Conditions.

Moreover, the JPEG AI Use Cases and Requirements were refined with two new core requirements regarding reconstruction reproducibility and hardware platform independence. The second draft of the Call for Proposals was produced and the timeline of the JPEG AI work item was revised. It was decided that the final Call for Proposals will be issued as an outcome of the 94th JPEG Meeting. The deadline for expression of interest and registration is 5 February 2022 and the submission of bitstreams and decoded images for the test dataset is due on 30 April 2022.

JPEG AIC

Image quality assessment remains an essential component in the development of image coding technologies. A new activity has been initiated in the JPEG AIC framework to study the assessment of image coding quality, with particular attention to crowd-sourced subjective evaluation methodologies and image coding at fidelity targets relevant for end-user image delivery on the web and consumer-grade photo archival.

JPEG Systems

JUMBF (ISO/IEC 19566-5 AMD1) and JPEG 360 (ISO/IEC 19566-6 AMD1) are now published standards available through ISO. A request to create the second amendment of JUMBF (ISO/IEC 19566-5) has been produced; this amendment will further extend the functionality to cover use cases and requirements under development in the JPEG Fake Media exploration initiative. The Systems software efforts are progressing on the development of a file parser for most JPEG standards and will include support for metadata within JUMBF boxes. Interested parties are invited to subscribe to the mailing list of the JPEG Systems AhG in order to monitor and contribute to JPEG Systems activities.

JPEG XS

JPEG XS aims at the standardization of a visually lossless low-latency and lightweight compression that can be used as a mezzanine codec in various markets. With the second editions of Part 1 (core coding system), Part 2 (profiles and buffer models), and Part 3 (transport and container formats) under ballot to become International Standards, the work during this JPEG meeting went into the second edition of Part 4 (Conformance Testing) and Part 5 (Reference Software). The second edition primarily brings new coding and signalling capabilities to support raw Bayer sensor content, mathematically lossless coding of images with up to 12 bits per colour component sample, and 4:2:0-sampled image content. In addition, the JPEG Committee continued its initial exploration to study potential future improvements to JPEG XS, while still honouring its low-complexity and low-latency requirements. Among such improvements are better support for high dynamic range (HDR), better support for raw Bayer sensor content, and overall improved compression efficiency. The compression efficiency work also targets improved handling of computer-screen content and artificially-generated rendered content.

JPEG XL

JPEG XL aims at standardization for image coding that offers high compression efficiency, along with features desirable for web distribution and efficient compression of high-quality images. JPEG XL Part 3 (Conformance testing) has been promoted to the Committee Draft stage of the ISO/IEC approval process. New core experiments were defined to investigate hardware-based coding, in particular including fixed-point implementations. With preliminary support in major web browsers, image viewing and manipulation libraries and tools, JPEG XL is ready for wide-scale adoption.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, as is particularly suitable for DNA storage. Two new use cases were identified as well as the sequencing noise models and simulators to use for DNA digital storage. There was a successful presentation of the fourth workshop by the stakeholders, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by organising the fifth workshop and conducting further outreach to stakeholders, as well as to continue improving the JPEG DNA overview document. Moreover, it was also decided to produce software to simulate an end-to-end image storage pipeline using DNA storage for future exploration experiments. Interested parties are invited to consider joining the effort by registering to the mailing list of JPEG DNA.

Final Quote

“The JPEG Committee is considering standardisation needs for timely and effective specifications that can best support the use of NFTs in applications where media assets can be represented with JPEG formats,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 93, to be held online during 18-22 October 2021.
  • No 94, to be held online during 17-21 January 2022.