Students Report from ACM Multimedia 2022

ACM Multimedia 2022 was held in a hybrid format in Lisbon, Portugal from October 10-14, 2022.

For many participants, this was their first on-site conference in three years, as the strict Covid-19 travel restrictions in 2020 and 2021 made it difficult to travel beyond the host and neighbouring countries to attend in person.

In Portugal, the Covid-19 restrictions had been largely lifted, and the city was bustling with tourists. While remaining careful to avoid infection, participants enjoyed Portuguese “Vinho Verde” wine and cod dishes with their colleagues and engaged in lively discussions about multimedia research.

For many students, this was their first time presenting at an international conference, and it was a wonderful experience.

To encourage student authors to participate on-site, SIGMM sponsored a group of students with Student Travel Grant Awards. Students who wanted to apply for this travel grant needed to submit an online form before the submission deadline. The selected students received either 1,000 or 2,000 USD to cover their airline tickets and accommodation costs for the event. Of the recipients, 25 were able to attend the conference. We asked them to share their unique experiences of attending ACM Multimedia 2022, and in this article we present their reports of the event.


Xiangming Gu, PhD student, National University of Singapore, Singapore

It is a great honour to receive a SIGMM Student Travel Grant. ACM Multimedia 2022 was my first time attending an academic conference physically. During the conference, I presented my oral paper “MM-ALT: Multimodal Automatic Lyric Transcription”, which was also selected as a “Top Rated Paper”. Besides the presentation, I met a lot of people who share similar research interests. It was very inspiring to learn from others’ papers and discuss them with the authors directly. Moreover, I was also a volunteer for ACM Multimedia 2022 and attended the session of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. During the session, I learnt how a workshop is organized, which was a great exercise for me. Now that I am back in Singapore, I still miss the conference. I hope to get a paper accepted next year and attend the conference again.

Avinash Madasu, Computer Science Master’s student at the University of North Carolina Chapel Hill, USA.

It is my absolute honour to receive the student travel grant for attending the ACM Multimedia 2022 conference. This is the first time I have attended a top AI conference in person. I enjoyed the conference a lot and was sad that it ended so quickly. During the conference days, I was able to attend a lot of oral sessions, keynote talks and poster sessions, and to interact with fellow researchers from both academia and industry. I learnt a lot about exciting research going on in my area of interest as well as in other areas. It was a refreshing experience, and I hope to bring this back to my research. I presented a poster and felt happy when fellow researchers appreciated my work. Apart from the technical side, I was able to forge a lot of new friendships which I will truly cherish for my whole life.

Moreno La Quatra, PhD student, Politecnico di Torino, Italy

The ACM Multimedia 2022 conference was an amazing experience. After a few years of remote conferences, it was a pleasure to be able to attend the conference in person. I got the opportunity to meet many researchers of different seniorities and backgrounds, and I learned a lot from them. The poster sessions were one of the highlights of the conference. They were a very valuable opportunity to present interesting ideas and explore the details of other researchers’ work. I found the keynotes, presentations, and workshops to be very inspiring and engaging as well. Throughout them, I learned about specific topics and interacted with friendly, passionate researchers from around the world. I would like to thank the ACM Multimedia 2022 organization for the opportunity to attend the conference in Lisbon, all the other volunteers for their friendly and helpful attitude, and the SIGMM Student Travel Grant committee for the financial support.

Sheng-Ming Tang, Master’s student, National Tsing Hua University, Hsinchu, Taiwan

My name is Sheng-Ming Tang from National Tsing Hua University, Hsinchu, Taiwan. It is a great honour for me to receive the student travel grant. First, I want to thank the committee for organizing this fantastic event. As ACM MM 2022 was my first in-person experience presenting at a conference, I felt a little nervous at first. However, I started to get comfortable through interactions with the astonishing researchers and the volunteers. It was great not only to present in front of the public but also to participate in the events. I met a lot of people who solve problems with different and creative approaches, learned brand-new mindsets from the keynote sessions, and gained abundant feedback from the audience, which will boost my research. I thank the committee again for giving me this great opportunity to present and share my work in person. I enjoyed the event a lot.

Tai-Chen Tsai, Graduate student, National Tsing Hua University, Taiwan

First, I would like to thank ACM for providing a student travel grant that allowed me to attend the conference. This was my first time presenting my work at a conference. I participated in the interactive art session and was worried that the setup would be complicated abroad. However, as soon as I arrived at the site, volunteers assisted me with the installation. The conference provided complete hardware resources, allowing me to have a smooth and excellent exhibition experience. I also took the opportunity to meet many interesting researchers from different countries. The work “Emotional Machines” in the interactive art exhibition surprised me: the system collects and combines what participants are saying and their current emotions, and a model transforms the data into 360-degree image content so that everyone’s information forms a small universe in the VR environment. The idea is creative.
Additionally, I could chat and discuss projects with published researchers while volunteering at workshops. They shared their lifestyles and work experiences as researchers in European countries, and we discussed what makes a study interesting and what does not. This was the best reward for me.

Bhalaji Nagarajan, PhD Student, Universitat de Barcelona, Spain

ACM Multimedia was the first big conference I was able to attend in person after two years of entirely virtual participation. I presented my work both as an oral and a poster presentation at the Workshop on Multimedia-Assisted Dietary Management (MADiMa). It gave me an excellent opportunity to present my work and to get valuable input from reputed pioneers regarding its future scope. It gave me a new dimension and helped me expand my technical skill set. This was also my first volunteering experience on such a massive scale, and it was a great learning experience to see how conferences of this size are managed.
I am very happy that I attended the conference in person. I was able to meet new people and reputed pioneers in the field, learn new things and, of course, make some new friends. A big thank you to the SIGMM Travel Grant that allowed me to attend the conference in person in Lisbon.

Kiruthika Kannan, MS by Research, International Institute of Information Technology, Hyderabad, India. 

My paper on “DrawMon: A Distributed System for Detection of Atypical Sketch Content in Concurrent Pictionary Games” was accepted at the 30th ACM International Conference on Multimedia. It was my first international conference, and I felt honoured to be able to present my research in front of experienced researchers. The conference also exhibited diverse research projects addressing fascinating scientific and technological problems. The poster sessions and talks at the conference improved my knowledge of research trends in multimedia. In addition, I was able to interact with fellow researchers from diverse cultures. It was interesting to hear about their experiences and learn about the work at their institutions. As a volunteer at the conference, I witnessed the behind-the-scenes hard work of the organizers and the volunteering team to run the events smoothly. I am grateful to the SIGMM Student Travel Grant for supporting my attendance at the ACMMM 22 conference.

Garima Sharma, PhD Student, Department of Human-Centred Computing, Monash University, Australia

It was a pleasure to receive a SIGMM travel grant and to attend the ACM Multimedia 2022 conference in person. ACM Multimedia is one of the top conferences in my research area and it was my first in-person conference during my PhD. I had a great experience interacting with numerous researchers and fellow PhD students. Along with all the interesting keynotes, I attended as many oral sessions as possible. Some of these sessions were aligned with my research work and some were outside of my work. This gave me a new research perspective at different levels. Also, working with organisers in a few sessions gave me a whole new experience in managing these events. Overall, I got many insightful comments, suggestions and feedback which motivated me with some interesting directions in my research work. I would like to thank the organisers for making this year’s ACM Multimedia a wonderful experience for every attendee.

Alon Harell, PhD student at the Multimedia Lab at Simon Fraser University, Canada

I had the pleasure of receiving the SIGMM Student Travel Grant and of attending and volunteering at ACM Multimedia 22 in Lisbon, Portugal. The work I submitted to the conference was done outside of my regular PhD research, and thus without this grant I would not have been able to participate. The workshop at which I presented, ACM MMSports 22, was incredibly eye-opening, with many fantastic papers, great presentations, and above all great people with whom I was able to exchange ideas, form bonds, and perhaps even create future collaborations. The main conference, which coincides more closely with my main research on image and video coding for machines, was just as good. Through fascinating talks, some in person and some virtual, I was exposed to many new ideas (or perhaps just new to me) and learned a great deal. During the PhD mentor lunch, I was also able to benefit from the generosity and experience of Prof. Chong Wah Ngo from Singapore Management University, who shared with me his thoughts on pursuing a career in academia. Overall, ACM Multimedia 22 was an especially unique experience because it was the first in-person conference I was able to attend since the beginning of the COVID-19 pandemic, and being back face-to-face with fellow researchers was a great pleasure.

Lorenzo Vaiani, Ph.D. student (1st year), Politecnico di Torino, Italy

ACM MM 2022 was my first in-person conference. Being able to present my work and discuss it with other participants in person was an incredible experience. I enjoyed every activity, from presentations and posters to workshops and demos. I received excellent feedback and new inspiration to continue my research. The best part was definitely strengthening the bonds with friends I already knew and forming new ones with the amazing people I met there. I learned a lot from all of them, and the volunteer activities helped a lot in making these kinds of connections. Thanks to the organizers for this fantastic opportunity and to the SIGMM Student Travel Grant committee for the financial support. This edition of ACM MM was just the first for me, but I hope for many more in the future.

Xiaoyu Lin, third-year PhD student at Inria Grenoble, France

It was a great honour to attend ACM MM 2022 in Lisbon, and a great experience. I met many nice professors and researchers, and discussing with them gave me a lot of inspiration on both research directions and career development. I presented my work during the doctoral symposium and got plenty of useful feedback that will help me improve it. During the “Ask Me Anything” lunch, I had the chance to talk with several senior researchers, who gave me kind and very useful advice on how to do research. Besides, I also served as a volunteer for a workshop, which helped me meet other volunteers and make some new friends. Thanks to all the chairs and organizers who worked hard to make ACM MM 2022 such a wonderful conference. It was a really impressive experience!

Zhixin Ma, PhD student, Singapore Management University, Singapore

I would like to thank the ACM Multimedia Committee for providing me with the student travel grant so that I could attend the conference in person. ACM Multimedia is a worldwide top conference in the multimedia field. It provided me with an opportunity to present my work and communicate with researchers working on the topic of multimedia search.
Besides, the excellent keynotes and passionate panel talks also painted a good vision of future research in the multimedia field. Overall, I must say that ACM MM22 was amazing and well-organized. I thank the ACM MM committee again for the student travel grant, which made my attendance possible.

Report from ACM Multimedia 2022 by Nitish Nagesh


Nitish Nagesh (@nitish_nagesh) is a Ph.D. student in the Computer Science department at the University of California, Irvine, USA. He was awarded Best Social Media Reporter of the ACM Multimedia 2022 conference. To celebrate this award, Nitish Nagesh reported on his wonderful experience at ACM Multimedia 2022 as follows.


I was excited when our paper “World Food Atlas for Food Navigation” was accepted to the Multimedia Assisted Dietary Management Workshop (MADiMa). That it was held in conjunction with ACM Multimedia 2022, the premier multimedia conference, was the icing on the cake; that it took place in Lisbon, Portugal was the cherry on top. It is said that a picture is worth a thousand words, so it is fitting to describe a multimedia conference experience through pictures.

Prof. Ramesh Jain organized an informal meetup at the Choupana Caffe on the advice of Joao Magalhaes, general chair of ACMMM 2022. It was great to meet researchers working on food computing, including Prof. Yoko Yamakata, Prof. Agnieszka and Maija Kale, and to have the company of students and professors from Singapore Management University, including Prof. Chong Wah, along with Prof. Phoebe Chen. Since this was the first in-person conference for many folks, we had great conversations over waffles, pear salad and watermelon mint juice!

The MADiMa workshop and the Cooking and Eating Activities (CEA) workshop had stellar keynote speakers and presentations on topics ranging from adherence to a Mediterranean diet to mental health estimation through food images.

The workshop was held at the Lisbon Congress Center. It was a treat to watch the sun shine brightly on the congress center in the morning and, only a few minutes away near the Tagus estuary, to see the mellow sunset casting an orangish hue on the red bridge overlooking the train tracks below.

After a great set of presentations, the MADiMa and CEA workshops were drawn to a close with a group picture of one large family of people who love food and want to help people enjoy food while maintaining their health goals. A huge shout out to the workshop chairs Prof. Stavroula Mougiakakou, Prof. Keiji Yanai, Prof. Dario Allegra and Prof. Yoko Yamakata. (I tried my best to include a photo where everyone looks good!)

All work and no play makes us dull people! And all research with no food makes us hungry people! We had a post-workshop dinner at an authentic Portuguese restaurant. The food was great and it was a delightful evening because of the surprise treat from the professors! 

Prof. Jain’s Ph.D. talk was inspiring as he shared his personal journey that led him to focus on healthcare. He urged students in the multimedia community to pursue multimodal healthcare research as he shared his insights on building a personal health navigator.

I had signed up to be a mentee for the Ph.D. school Ask Me Anything (AMA) session. We asked Prof. Ming Dong questions about his time at graduate school, balancing teaching and research responsibilities, tips on maximizing research output and strategies to cope with rejections. He was candid in his responses and emphasized the need to focus on incremental progress while striving to do impactful research. I must thank Prof. Wei Tsang and other organizers for their leadership in organizing a first-of-its-kind session.

In between running around oral sessions, poster presentations, keynote talks, networking, grabbing lunch, and enjoying Portuguese tarts, we managed to have fun while volunteering. Huge credit to the students and staff (the Rafaels, the Diogos, the Davids, the Gustavos, the Pedros) from NOVA University for doing the heavy lifting to ensure a smooth online, hybrid and in-person experience!

It was a pleasure to watch Prof. Alan Smeaton deliver an inspiring speech about the journey of information retrieval and multimedia. The community congratulates you once again on the Technical Achievement Award – more power to you, Alan!

The highlight of the conference was the grand banquet at Centro Cultural de Belém. There could not have been a better climax to the gala event than the Fado music. One aspect of Fado symbolizes longing: the singer performs a melancholic tune about her partner setting sail on long voyages. It is accompanied by the unique 12-string guitar and is sung very close to the audience to heighten the intimacy. I could fully relate to the artists’ melodies and rhythms, since I had been longing to see my family and friends back home, whom I have not visited for the past three years due to the pandemic. Another tune described the beauty of Lisbon in superlatives, including the sun shining brighter than in any other part of the world. There was a happy ending when the artists recreated the moment of joy after the war was over and everyone was merry again. It reinvigorated a fresh hope and breathed a new lease of life into our cluttered worlds. For once, I was truly present in the moment!

VQEG Column: VQEG Meeting May 2022

Introduction

Welcome to this new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG), which provides an overview of the last VQEG plenary meeting, held from 9 to 13 May 2022. It was organized by INSA Rennes (France) and was the first face-to-face meeting after the series of online meetings due to the Covid-19 pandemic. Remote attendance was also offered, which made it possible for around 100 participants from 17 different countries to attend (more than 30 of them in person). During the meeting, more than 40 presentations were given and interesting discussions took place. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

Many of the works presented at this meeting are relevant for the SIGMM community working on quality assessment. Particularly interesting are the proposals to update the ITU-T Recommendations P.910 and P.913, as well as the publicly available datasets that were presented. We encourage readers interested in any of the ongoing activities to check the working groups’ websites and subscribe to the corresponding reflectors to follow them and get involved.

Group picture of the VQEG Meeting 9-13 May 2022 in Rennes (France).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. The group continues working on extensions of the ITU-T Recommendation P.1204 to cover other encoders (e.g., AV1) apart from H.264, HEVC, and VP9. In addition, the projects Quality of Experience (QoE) Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB) are still ongoing.

In this meeting, several AVHD-related topics were discussed, supported by six different presentations. In the first one, Mikolaj Leszczuk (AGH University, Poland) presented an analysis of how experiment conditions, such as video sequence order, variation and repeatability, can influence the subjective assessment of video transmission quality by inducing a “learning” process in the test participants. In the second presentation, Lucjan Janowski (AGH University, Poland) presented two proposals towards more ecologically valid experiment designs: the first one using the Absolute Category Rating [1] without a scale but in a “think aloud” manner, and the second one, called “Your YouTube, our lab”, in which users select the content they prefer and a quality question appears during the viewing experience through a specifically designed interface. Also dealing with the study of testing methodologies, Babak Naderi (TU Berlin, Germany) presented work on subjective evaluation of video quality with a crowdsourcing approach, while Pierre David (Capacités, France) presented a three-lab experiment, involving Capacités (France), RISE (Sweden) and AGH University (Poland), on quality evaluation of social media videos. Kjell Brunnström (RISE, Sweden) continued by giving an overview of video quality assessment of Video Assistant Refereeing (VAR) systems, and lastly, Olof Lindman (SVT, Sweden) presented another effort to reduce the lack of open datasets with the Swedish Television (SVT) Open Content.

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. In this meeting, Lucie Lévêque (Nantes Université, France) provided an overview of the recent activities of the group, including a submitted review paper on objective quality assessment for medical images, a special session accepted for the IEEE International Conference on Image Processing (ICIP) that will take place in October in Bordeaux (France), and a paper submitted to IEEE ICIP on quality assessment through a detection task of Covid-19 pneumonia. The work described in this paper was also presented by Meriem Outtas (INSA Rennes, France).

In addition, there were two more presentations related to the quality assessment of medical images. First, Yuhao Sun (University of Edinburgh, UK) presented research on a no-reference image quality metric for visual distortions on Computed Tomography (CT) scans [2]. Then, Marouane Tliba (Université d’Orléans, France) presented his studies on quality assessment of medical images through deep-learning techniques using domain adaptation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on a proposal to update the ITU-T Recommendation P.913, including new testing methods for subjective quality assessment and statistical analysis of the results. Margaret Pinson presented this work during the meeting.   
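
To make this concrete, below is a minimal sketch of the basic per-stimulus statistics that P.913-style analyses build on: the mean opinion score (MOS) and its Student-t confidence interval. It assumes a complete subjects-by-stimuli matrix of ACR scores and is not the normative procedure, which also covers subject screening and other refinements.

```python
# Minimal sketch: per-stimulus MOS and 95% confidence interval.
# Assumes `ratings` is a (subjects x stimuli) numpy array of ACR scores (1-5).
import numpy as np
from scipy import stats

def mos_with_ci(ratings, confidence=0.95):
    n = ratings.shape[0]                             # number of subjects
    mos = ratings.mean(axis=0)                       # mean opinion score per stimulus
    sem = ratings.std(axis=0, ddof=1) / np.sqrt(n)   # standard error of the mean
    t = stats.t.ppf(0.5 + confidence / 2, df=n - 1)  # two-sided t quantile
    return mos, t * sem                              # MOS and CI half-width

ratings = np.array([[5, 3, 2], [4, 3, 1], [5, 4, 2], [4, 2, 2]])
mos, ci = mos_with_ci(ratings)
print(mos, ci)  # e.g., MOS of the first stimulus is 4.5
```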

In addition, five presentations were delivered addressing topics related to the group activities. Jakub Nawała (AGH University, Poland) presented the Generalised Score Distribution to accurately describe responses from subjective quality experiments. Three presentations were given by members of Nantes Université (France): Ali Ak presented his work on spammer detection in pairwise comparison experiments, Andreas Pastor talked about how to improve the maximum likelihood difference scaling method in order to measure the inter-content scale, and Chama El Majeny presented the functionalities of a subjective test analysis tool, whose code will be publicly available. Finally, Dietmar Saupe (University of Konstanz, Germany) delivered a presentation on subjective image quality assessment with boosted triplet comparisons.

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a particular focus on gaming. Currently, the group is working on the ITU-T Work Item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services. Apart from this, Jerry (Xiangxu) Yu (University of Texas at Austin, US) presented work on subjective and objective quality assessment of user-generated gaming videos and Nasim Jamshidi (TUB, Germany) presented a deep-learning bitstream-based video quality model for CG content.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, the group is working on three topics: the development of no-reference metrics, the clarification of the computation of the Spatial and Temporal Indexes (SI and TI, defined in the ITU-T Recommendation P.910), and the development of a standard for video quality metadata.
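
For reference, the sketch below shows the textbook SI/TI computation in the spirit of ITU-T Rec. P.910, assuming luma-only frames given as float arrays; the NORM clarification work addresses precisely the details (bit depths, value ranges, border handling) that this naive version glosses over.

```python
# Naive SI/TI sketch in the spirit of ITU-T Rec. P.910.
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """frames: iterable of 2D float arrays (luma planes) in display order."""
    si_values, ti_values, prev = [], [], None
    for frame in frames:
        # Spatial Information: std-dev of the Sobel gradient magnitude.
        grad = np.hypot(ndimage.sobel(frame, axis=0), ndimage.sobel(frame, axis=1))
        si_values.append(grad.std())
        # Temporal Information: std-dev of the pixel-wise frame difference.
        if prev is not None:
            ti_values.append((frame - prev).std())
        prev = frame
    # P.910 reports the maximum over time for both indicators.
    return max(si_values), max(ti_values) if ti_values else 0.0
```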

At this meeting, this was one of the most active groups and the corresponding sessions included several presentations and discussions. Firstly, Yiannis Andreopoulos (iSIZE, UK) presented their work on domain-specific fusion of multiple objective quality metrics. Then, Werner Robitza (AVEQ GmbH/TU Ilmenau, Germany) presented updates on the SI/TI clarification activities, which are leading to an update of the ITU-T Recommendation P.910. In addition, Lukas Krasula (Netflix, US) presented their investigations on the relation between banding annoyance and the overall quality perceived by the viewers. Hadi Amirpour (University of Klagenfurt, Austria) delivered two presentations related to their Video Complexity Analyzer and their Video Complexity Dataset, which are both publicly available. Finally, Mikołaj Leszczuk (AGH University, Poland) gave two talks on research related to User-Generated Content (UGC, a.k.a. in-the-wild video content) recognition and on advanced video quality indicators to characterise video content.

Joint Effort Group (JEG) – Hybrid

The JEG group originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. A report on the ongoing activities of the group was presented by Enrico Masala (Politecnico di Torino, Italy), including the release of a new website that reflects the evolution of the group over the last few years. Although the group is currently not directly seeking the development of new metrics or tools readily available for VQA, it is still working on related topics, such as the studies by Lohic Fotio Tiotsop (Politecnico di Torino, Italy) on the sensitivity of artificial intelligence-based observers to input signal modifications.

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new 5G networks and the QoE of video services on top of them. In this meeting, Pablo Pérez (Nokia, Spain) presented an extended report on the group’s activities, from which it is worth noting the joint work on a contribution to the ITU-T Work Item G.QoE-5G.

Immersive Media Group (IMG)

The IMG group focuses on research on the quality assessment of immersive media. Currently, the main joint activity of the group is the development of a test plan for evaluating the QoE of immersive interactive communication systems. In this sense, Pablo Pérez (Nokia, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) presented a follow-up on this test plan, including an overview of the state of the art on related works and a taxonomy classifying the existing systems [3]. This test plan is closely related to the work carried out by the ITU-T on QoE assessment of eXtended Reality meetings, so Gunilla Berndtsson (Ericsson, Sweden) presented the latest advances in the development of P.QXM.

Apart from this, there were four presentations related to the quality assessment of immersive media. Shirin Rafiei (RISE, Sweden) presented a study on QoE assessment of an augmented remote operating system for scaling in smart mining applications. Zhengyu Zhang (INSA Rennes, France) gave a talk on a no-reference quality metric for light field images based on deep-learning and exploiting angular and spatial information. Ali Ak (Nantes Université, France) presented a study on the effect of temporal sub-sampling on the accuracy of the quality assessment of volumetric video. Finally, Waqas Ellahi (Nantes Université, France) showed their research on a machine-learning framework to predict Tone-Mapping Operator (TMO) preference based on image and visual attention features [4].

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods. In this meeting, there were three presentations related to this topic. Mikołaj Leszczuk (AGH University, Poland) presented an objective video quality assessment method for face recognition tasks. Also, Alban Marie (INSA Rennes, France) presented an analysis of the correlation between quality metrics and artificial intelligence accuracy. Finally, Lucie Lévêque (Nantes Université, France) gave an overview of a study on the reliability of existing algorithms for facial expression recognition [5].

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

The IRG-AVQA group studies topics related to video and audiovisual quality assessment (both subjective and objective) across ITU-R Study Group 6 and ITU-T Study Group 12. In this sense, Chulhee Lee (Yonsei University, South Korea) and Alexander Raake (TU Ilmenau, Germany) provided an overview of ongoing activities related to quality assessment within ITU-R and ITU-T.

Other updates

In addition, the Human Factors for Visual Experiences (HFVE) group, whose objective is to uphold the liaison relation between VQEG and the IEEE standardization group P3333.1, presented its advances in relation to two standards: IEEE P3333.1.3 (deep-learning-based assessment of visual experience based on human factors), which has been approved and published, and IEEE P3333.1.4 on light field imaging, which has been submitted and is in the process of being approved. Also, although there were not many activities in this meeting within the Implementer’s Guide for Video Quality Metrics (IGVQM) and the Psycho-Physiological Quality Assessment (PsyPhyQA) projects, they are still active. Finally, as a reminder, the VQEG GitHub with tools and subjective labs setup is still online and kept updated.

The next VQEG plenary meeting will take place online in December 2022. Please see the VQEG meeting information page for more details.

References

[1] ITU, “Subjective video quality assessment methods for multimedia applications”, ITU-T Recommendation P.910, Jul. 2022.
[2] Y. Sun, G. Mogos, “Impact of Visual Distortion on Medical Images”, IAENG International Journal of Computer Science, 1:49, Mar. 2022.
[3] P. Pérez, E. González-Sosa, J. Gutiérrez, N. García, “Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment”, Frontiers in Signal Processing, Jul. 2022.
[4] W. Ellahi, T. Vigier, P. Le Callet, “A machine-learning framework to predict TMO preference based on image and visual attention features”, International Workshop on Multimedia Signal Processing, Oct. 2021.
[5] E. M. Barbosa Sampaio, L. Lévêque, P. Le Callet, M. Perreira Da Silva, “Are facial expression recognition algorithms reliable in the context of interactive media? A new metric to analyse their performance”, ACM International Conference on Interactive Media Experiences, Jun. 2022.

JPEG Column: 96th JPEG Meeting

JPEG analyses the responses to the Calls for Proposals for the standardisation of the first codecs based on machine learning

The 96th JPEG meeting was held online from 25 to 29 July 2022. The meeting was one of the most productive in the recent history of JPEG, with the analysis of the responses to two Calls for Proposals (CfP) for machine learning-based coding solutions, notably JPEG AI and JPEG Pleno Point Cloud Coding. The superior performance of the CfP responses compared to the state-of-the-art anchors leaves little doubt that coding technologies will become dominated by machine learning-based solutions, with the expected consequences for the standardisation pathway. A new era of multimedia coding standardisation has begun. Both activities have defined a verification model and are pursuing a collaborative process that will select the best technologies for the definition of the new machine learning-based standards.

The 96th JPEG meeting had the following highlights:

JPEG AI and JPEG Pleno Point Cloud, the first two machine learning-based coding standards under development by JPEG.
  • JPEG AI response to the Call for Proposals;
  • JPEG Pleno Point Cloud begins the collaborative standardisation phase;
  • JPEG Fake Media and NFT;
  • JPEG Systems;
  • JPEG Pleno Light Field;
  • JPEG AIC;
  • JPEG XS;
  • JPEG 2000;
  • JPEG DNA.

The following summarises the major achievements of the 96th JPEG meeting.

JPEG AI

The 96th JPEG meeting represents an important milestone for JPEG AI standardisation, as it marks the beginning of the collaborative phase of the project. The main objective of JPEG AI is to design a solution that offers significant compression efficiency improvement over coding standards in common use at equivalent subjective quality, together with effective compressed-domain processing for machine learning-based image processing and computer vision tasks.

During the 96th JPEG meeting, several activities took place, notably the presentation of the eleven responses to all tracks of the Call for Proposals (CfP). Furthermore, discussions were held on the evaluation process used to assess submissions to the CfP, namely subjective, objective and complexity assessment, as well as the identification of device interoperability issues by cross-checking. For the standard reconstruction track, several contributions showed significantly higher compression efficiency, in both subjective quality methodologies and objective metrics, compared to the best-performing conventional image coding.

From the analysis and discussion of the results obtained, the most promising technologies were identified and a new JPEG AI verification model under consideration (VMuC) was approved. The VMuC corresponds to a combination of two proponents’ solutions (following the ‘one tool for one functionality’ principle), selected by consensus and considering the CfP decision criteria and factors. In addition, a set of JPEG AI Core Experiments were defined to obtain further improvements in both performance efficiency and complexity, notably the use of learning-based GAN training, alternative analysis/synthesis transforms and an evaluation study of compressed-domain denoising as an image processing task. Several further activities were also discussed and defined, such as the design of a compressed-domain image classification decoder VMuC, the creation of a large screen content dataset for the training of learning-based image coding solutions and the definition of a new and larger JPEG AI test set.

JPEG Pleno Point Cloud begins collaborative standardisation phase

JPEG Pleno integrates various modalities of plenoptic content under a single framework in a seamless manner. Efficient and powerful point cloud representation is a key feature of this vision. A point cloud refers to data representing positions of points in space, expressed in a given three-dimensional coordinate system, the so-called geometry. This geometrical data can be accompanied by per-point attributes of varying nature (e.g. color or reflectance). Such datasets are usually acquired with a 3D scanner, LIDAR or created using 3D design software and can subsequently be used to represent and render 3D surfaces. Combined with other types of data (like light field data), point clouds open a wide range of new opportunities, notably for immersive browsing and virtual reality applications.
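
As a deliberately simplified illustration of this data model, a point cloud can be held as a geometry array plus named per-point attribute arrays; real scanner or LiDAR content is far larger, and codecs typically treat geometry and attributes as separate (sub-)streams.

```python
# Toy point cloud container: geometry plus per-point attributes.
import numpy as np

num_points = 1000
point_cloud = {
    # Geometry: (N, 3) positions in a 3D coordinate system.
    "geometry": np.random.rand(num_points, 3).astype(np.float32),
    # Per-point attributes, e.g., RGB color (reflectance would be analogous).
    "attributes": {
        "color": np.random.randint(0, 256, size=(num_points, 3), dtype=np.uint8),
    },
}
print(point_cloud["geometry"].shape, point_cloud["attributes"]["color"].dtype)
```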

Learning-based solutions are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g., image classification, face recognition and object segmentation, but also 3D processing tasks, e.g. visual enhancement and super-resolution. Recently, learning-based point cloud coding solutions have shown great promise to achieve competitive compression efficiency compared to available conventional point cloud coding solutions at equivalent subjective quality. Building on a history of successful and widely adopted coding standards, JPEG is well positioned to develop a standard for learning-based point cloud coding.

During its 94th meeting, the JPEG Committee released a Final Call for Proposals on JPEG Pleno Point Cloud Coding. This call addressed learning-based coding technologies for point cloud content and associated attributes with emphasis on both human visualization and decompressed/reconstructed domain 3D processing and computer vision with competitive compression efficiency compared to point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. During its 96th meeting, the JPEG Committee evaluated 5 codecs submitted in response to this Call. Following a comprehensive evaluation process, the JPEG Committee selected one of the proposals to form the basis of a future standard and initiated a sub-division to form Part 6 of ISO/IEC 21794. The selected submission was a learning-based approach to point cloud coding that met the requirements of the Call and showed competitive performance, both in terms of coding geometry and color, against existing solutions.

JPEG Fake Media and NFT

At the 96th JPEG meeting, 6 pre-registrations to the Final Call for Proposals (CfP) on JPEG Fake Media were received. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The CfP welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents who have not yet made a pre-registration are still welcome to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

In parallel with the work on Fake Media, JPEG explores use cases and requirements related to Non Fungible Tokens (NFTs). Although the use cases between both topics are different, there is a significant overlap in terms of requirements and relevant solutions. The presentations and video recordings of the joint 5th JPEG NFT and Fake Media Workshop that took place prior to the 96th meeting are available on the JPEG website. In addition, a new version of the “Use Cases and Requirements for JPEG NFT” was produced and made publicly available for review and feedback.

JPEG Systems

During the 96th JPEG Meeting, the IS texts for both JLINK (ISO/IEC 19566-7) and JPEG Snack (ISO/IEC 19566-8) were prepared and submitted for final publication. JLINK specifies a format to store multiple images inside JPEG files and supports interactive navigation between them. JLINK addresses use cases such as virtual museum tours, real estate visits, hotspot zoom into other images and many others. JPEG Snack, on the other hand, enables self-running multimedia experiences such as animated image sequences and moving image overlays. Both standards are based on the JPEG Universal Metadata Box Format (JUMBF, ISO/IEC 19566-5), for which a second edition is in progress. This second edition adds extensions such as native support of CBOR (Concise Binary Object Representation) and private fields in the JUMBF Description Box.
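
For readers unfamiliar with CBOR, the snippet below illustrates the encoding itself using the third-party Python package cbor2; it concerns only CBOR as a format and says nothing about the JUMBF box syntax. A structured record round-trips through a compact binary form.

```python
# CBOR round-trip example using the third-party `cbor2` package.
import cbor2

metadata = {"creator": "camera-01", "edits": ["crop", "tone-map"], "version": 2}
encoded = cbor2.dumps(metadata)           # compact binary representation
assert cbor2.loads(encoded) == metadata   # decodes back to the same structure
print(f"{len(encoded)} bytes")            # typically smaller than the JSON form
```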

JPEG Pleno Light Field

During its 96th meeting, the JPEG Committee released the “JPEG Pleno Second Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices for light field subjective quality evaluation methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among JPEG experts following a collaborative process approach. The Final Call for Contributions will be issued at the 97th JPEG meeting. The deadline for submission of contributions is 1 April 2023.

A JPEG Pleno Light Field AhG has also started the preparation of two workshops: a first one on subjective light field quality assessment and a second one on learning-based light field coding, to exchange experiences and to present technological advances and research results on these respective topics.

JPEG AIC

During its 96th meeting, a Second Draft Call for Contributions on Subjective Image Quality Assessment was issued. The final Call for Contributions is now planned to be issued at the 97th JPEG meeting. The standardization process will be collaborative from the very beginning, i.e. all submissions will be considered in developing the next extension of the JPEG AIC standard. The deadline for submissions has been extended to 1 April 2023 at 23:59 UTC. Multiple types of contributions are accepted, namely subjective assessment methods including supporting evidence and detailed description, test material, interchange format, software implementation, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature. A dataset of sample images with compression-based distortions in the target quality range is planned to be prepared for the 97th JPEG meeting.

JPEG XS

With the 2nd edition of JPEG XS now in place, the JPEG Committee continues with the development of the 3rd edition of JPEG XS Part 1 (Core coding system) and Part 2 (Profiles and buffer models). These editions will address new use cases and requirements for JPEG XS by defining additional coding tools that further improve coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition with half of the required bandwidth for specific content such as screen content. In this respect, experiments have indicated that it is possible to increase the quality in static regions of an image sequence by more than 10 dB compared to the 2nd edition. Based on the input contributions, a first working draft for ISO/IEC 21122-1 has been created, along with the necessary core experiments for further evaluation and verification.

In addition, JPEG has finalized the work on the amendment for Part 2 2nd edition that defines a new High 4:2:0 profile and the new sublevel Sublev4bpp. This amendment is now ready for publication by ISO. In the context of Part 4 (Conformance testing) and Part 5 (Reference software), the JPEG Committee decided to make both parts publicly available.

Finally, the JPEG Committee decided to create a series of public documents, called the “JPEG XS in-depth series” that will explain various features and applications of JPEG XS to a broad audience. The first document in this series explains the advantages of using JPEG XS for raw image compression and will be published soon on jpeg.org.

JPEG 2000

The JPEG Committee published a case study that compares HTJ2K, ProRes and JPEG 2000 Part 1 when processing motion picture content with widely available commercial software tools running on notebook computers, available at https://ds.jpeg.org/documents/jpeg2000/wg1n100269-096-COM-JPEG_Case_Study_HTJ2K_performance_on_laptop_desktop_PCs.pdf

JPEG 2000 is widely used in the media and entertainment industry for Digital Cinema distribution, studio video masters and broadcast contribution links. High Throughput JPEG 2000 (HTJ2K or JPEG 2000 Part 15) is an update to JPEG 2000 that provides an order of magnitude speed up over legacy JPEG 2000 Part 1.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on DNA synthetic polymers. During the 96th JPEG meeting, a new version of the overview document on Use Cases and Requirements for DNA-based Media Storage was issued and made publicly available. The JPEG Committee also updated two additional documents, the JPEG DNA Benchmark Codec and the JPEG DNA Common Test Conditions, in order to allow concrete exploration experiments to take place. This will allow further validation and extension of the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA and, in particular, to include biochemical noise simulation, which is an essential element in practical implementations. A new branch has been created in the JPEG GitLab that now contains two anchors and two JPEG DNA benchmark codecs.

Final Quote

“After successful calls for contributions, the JPEG Committee sets precedence by launching the collaborative phase of two learning based visual information coding standards, hence announcing the start of a new era in coding technologies relying on AI.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 97, to be held online from 24-28 October 2022;
  • No. 98, to be held in Sydney, Australia, from 14-20 January 2023.

MPEG Column: 139th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 139th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Issues Call for Evidence for Video Coding for Machines (VCM)
  • MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption
  • MPEG Completes the Third Edition of the Common Media Application Format (CMAF) by adding Support for 8K and High Frame Rate for High Efficiency Video Coding
  • MPEG Scene Descriptions adds Support for Immersive Media Codecs
  • MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering
  • MPEG Starts New Edition of Video Coding-Independent Code Points Standard
  • MPEG White Paper on the Third Edition of the Common Media Application Format

In this report, I’d like to focus on VCM, Green Metadata, CMAF, VSEI, and a brief update about DASH (as usual).

Video Coding for Machines (VCM)

MPEG’s exploration work on Video Coding for Machines (VCM) aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. With the rise of newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. Consequently, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage could differ from conventional human-viewing-oriented applications in order to achieve optimized performance. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has grown rapidly. Typical use cases include intelligent transportation, smart city technology, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing features from video is essential for efficient transmission and storage. Feature compression technology solicited in this Call for Evidence (CfE) can also be helpful in other regards, such as computational offloading and privacy protection.
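
To make the collaborative-intelligence setting concrete, here is a deliberately toy sketch (in no way the MPEG evaluation pipeline or a proposed codec): the edge side produces an intermediate feature tensor, a stand-in uniform quantizer plays the role of the feature codec, and the cloud side reconstructs the features for the task back-end. The network stages are stubbed with random data.

```python
# Toy split-inference feature "codec": uniform quantization stands in for
# a real learned coder; random data stands in for the network front-end.
import numpy as np

def encode_features(features, step=0.5):
    return np.round(features / step).astype(np.int16)

def decode_features(symbols, step=0.5):
    return symbols.astype(np.float32) * step

# Edge side: the front part of a split network would output a feature tensor.
features = np.random.randn(64, 56, 56).astype(np.float32)
symbols = encode_features(features)   # in VCM, only the bitstream syntax is normative

# Cloud side: reconstruct the features and hand them to the task back-end.
reconstructed = decode_features(symbols)
mse = float(np.mean((features - reconstructed) ** 2))
print(f"feature MSE after toy coding: {mse:.4f}")
```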

Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines. The evaluation framework of VCM depicted below comprises neural network tasks (typically informative) at both ends as well as VCM encoder and VCM decoder, respectively. The normative part of VCM typically includes the bitstream syntax which implicitly defines the decoder whereas other parts are usually left open for industry competition and research.

Further details about the CfE and how interested parties can respond can be found in the official press release here.

Research aspects: the main research area for coding-related standards is certainly compression efficiency (and probably runtime). However, this video coding standard will target machines rather than humans as consumers of the video. Thus, video quality and, in particular, Quality of Experience need to be interpreted differently, which could be another worthwhile research dimension to be studied in the future.

Green Metadata

MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption to the complexity of the bitstream. Many modern implementations of video decoders can adjust their operating voltage or clock speed to adapt their power consumption to the required computational load. Thus, if the decoder implementation knows the variation in the complexity of the incoming bitstream, it can adjust its power consumption level accordingly. This allows lower energy use in general and extended video playback on battery-powered devices.
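
Purely as an illustration of this adaptation idea (not the normative Green Metadata syntax, and with made-up complexity values), a client could map per-segment complexity hints to dynamic voltage/frequency scaling (DVFS) levels along these lines:

```python
# Hypothetical mapping from complexity hints to decoder DVFS levels.
def pick_operating_point(complexity, max_complexity=100):
    """Map a complexity hint in [0, max_complexity] to a DVFS level 0..3."""
    ratio = complexity / max_complexity
    if ratio < 0.25:
        return 0            # lowest clock/voltage, least power
    if ratio < 0.50:
        return 1
    if ratio < 0.75:
        return 2
    return 3                # full speed for the hardest segments

for hint in [10, 35, 80, 55]:   # made-up per-segment metadata values
    print(f"complexity hint {hint} -> DVFS level {pick_operating_point(hint)}")
```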

The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.

Research aspects: reducing global greenhouse gas emissions will certainly be a challenge for humanity in the upcoming years. The amount of data on today’s internet is dominated by video, which consumes energy from production to consumption. Therefore, there is a strong need for explicit research efforts to make video streaming in all its facets friendly to our environment.

Third Edition of Common Media Application Format (CMAF)

The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, the media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier with 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile will be branded as ‘c8k0’ and will support videos with up to 7680×4320 pixels (8K) and up to 60 frames per second. Regarding the latter, another media profile has been added to the list of CMAF media profiles, branded as ‘c8k1’ and supports HEVC encoded video with up to 8K resolution and up to 120 frames per second. Finally, chroma location indication support has been added to the 3rd edition of CMAF.

Research aspects: basically, CMAF serves two purposes: (i) harmonizing DASH and HLS at the segment format level by adopting the ISOBMFF and (ii) enabling low latency streaming applications by introducing chunks (that are smaller than segments). The third edition supports resolutions up to 8K and HFR, which raises the question of how low latency can be achieved for 8K/HFR applications and services and under which conditions.
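
A quick back-of-the-envelope sketch of why chunks matter for latency (illustrative numbers only): with whole-segment delivery the encoder must finish a complete segment before it can be delivered, whereas with chunked CMAF each chunk is pushed as soon as it is encoded.

```python
# Illustrative startup-delay arithmetic for segment-based vs. chunked delivery.
segment_duration = 4.0   # seconds per segment (hypothetical)
chunk_duration = 0.5     # seconds per CMAF chunk (hypothetical)

# Segment-based: the encoding stage alone contributes at least one segment
# duration to end-to-end latency.
latency_segments = segment_duration

# Chunked: the encoding contribution shrinks to roughly one chunk duration.
latency_chunks = chunk_duration

print(f"segment-based: >= {latency_segments:.1f} s, chunked: >= {latency_chunks:.1f} s")
```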

New Amendment for Versatile Supplemental Enhancement Information (VSEI) containing Technology for Neural Network-based Post Filtering

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the Supplemental Enhancement Information (SEI) message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication, which is relevant for variable-resolution video streaming, this new amendment contains two SEI messages for describing and activating post filters using neural network technology in video bitstreams. Such filters can be used for coding noise reduction, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).

Research aspects: quality enhancements such as coding noise reduction, upsampling, colour improvement, or denoising have been researched quite substantially, both with and without neural networks. Enabling such quality enhancements via (V)SEI messages provides system-level support for research and development efforts in this area, for example, integration in video streaming applications and/or conversational services, including performance evaluations.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 139th MPEG meeting, MPEG Systems issued a new working draft related to Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the DASH email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) have been updated. Finally, a new part called encoder and packager synchronization has been added (ISO/IEC 23009-9), for which a working draft has also been produced. Publicly available documents (if any) can be found here.

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: in the Christian Doppler Laboratory ATHENA we aim to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the phases (i) multimedia content provisioning (i.e., video coding), (ii) content delivery (i.e., video networking), and (iii) content consumption (i.e., video player incl. ABR and QoE) in the media delivery chain as well as for (iv) end-to-end aspects, with a focus on, but not being limited to, HTTP Adaptive Streaming (HAS). Recent DASH-related publications include “Low Latency Live Streaming Implementation in DASH and HLS” and “Segment Prefetching at the Edge for Adaptive Video Streaming” among others.

The 140th MPEG meeting will be face-to-face in Mainz, Germany, from October 24-28, 2022. Click here for more information about MPEG meetings and their developments.

Report from CBMI 2022

The 19th International Conference on Content-based Multimedia Indexing (CBMI) took place as a hybrid conference in Graz, Austria, from September 14-16, 2022, organized by JOANNEUM RESEARCH and supported by SIGMM. After the 2020 edition was postponed and held as a fully online conference in 2021, this was an important step back to a physical conference. Probably still as an effect of the COVID pandemic, the event was a bit smaller than in previous years, with around 50 participants from 18 countries (13 European countries, the rest from Asia and North America). About 60% attended on-site, the others via web conference.

Program highlights

The conference program included two keynotes. The opening keynote by Miriam Redi from Wikimedia analysed the role of multimedia assets in a free knowledge ecosystem such as the one around Wikipedia. The closing keynote by Efstratios Gavves from the University of Amsterdam showcased recent progress in machine learning of dynamic information and causality in a diverse range of application domains and highlighted open research challenges.

With the aim of increasing the interaction between the scientific community and the users of multimedia indexing technologies, a panel session titled “Multimedia Indexing and Retrieval Challenges in Media Archives” was organised. The panel featured four distinguished experts from the audiovisual archive domain. Brecht Declerq from meemoo, the Flemish Institute for Archives, is currently the president of FIAT/IFTA, the International Association of TV Archives. Richard Wright started as a researcher in speech processing before he became a renowned expert in digital preservation, setting up a series of successful European projects in the area. Johan Oomen manages the department for Research and Heritage at Beeld en Geluid, the Netherlands Institute for Sound and Vision. Christoph Bauer is an expert from the Multimedia Archive of the Austrian Broadcasting Corporation ORF and consults archives of the Western Balkan countries on digitisation and preservation topics. The panel tried to analyse why only a small part of research outputs makes it into productive use at archives, and identified research challenges such as the need for more semantic and contextualised content descriptions, the ability to easily control the amount vs. accuracy of generated metadata, and the need for novel paradigms to interact with multimedia collections beyond the textual search box. At the same time, archives face the challenge of dealing with much richer metadata, but without the quality guarantees known from manually documented content.

Panel discussion with Richard Wright, Brecht Declerq, Christoph Bauer and Johan Oomen (online), moderated by Georg Thallinger.

In addition to five regular paper sessions (presenting 16 papers in total), the 2022 conference followed the tradition of previous editions with special sessions addressing the use of multimedia indexing in specific application areas or settings. This year the special sessions (nine papers in total) covered multimedia in clinical applications and for the protection against natural disasters, as well as machine learning from multimedia in cases where data is scarce. The program was completed with a poster & demo session, featuring seven posters and two demos.

Participants enjoyed the return of face-to-face discussions at the poster and demo sessions.

The best paper and the best student paper of the conference were each awarded EUR 500, generously sponsored by SIGMM. The selection committee quickly found consensus to award the best paper award to Maria Eirini Pegia, Anastasia Moumtzidou, Ilias Gialampoukidis, Björn Þór Jónsson, Stefanos Vrochidis and Ioannis Kompatsiaris for their paper “BiasUNet: Learning Change Detection over Sentinel-2 Image Pairs”, and the best student paper award to Sara Sarto, Marcella Cornia, Lorenzo Baraldi and Rita Cucchiara for their paper “Retrieval-Augmented Transformer for Image Captioning”. The authors of the best papers were invited to submit an extended version to the IEEE Transactions on Multimedia journal.

Best student paper award for Sara Sarto, presented by Werner Bailer.
Best paper award for Maria Eirini Pegia and Björn Þór Jónsson, presented by Georges Quénot.

Handling the hybrid setting

As a platform for the online part of the conference, an online event using GoTo Webinar was created. The aim was still to have all presentations and Q&A live; however, speakers were asked to provide a backup video of their talk (which was only used in one case). The poster and demo session was a particular challenge in the hybrid setting. In order to allow all participants to see the contributions in the best setting, all contributions were presented both as printed posters on-site and as short videos online. After discussions took place on-site in front of the posters and demos, a Q&A session connecting the conference room and the remote presenters was held, to also enable discussions with the online presenters.

Social events

Getting back to at least hybrid conferences also means regaining the long-missed opportunities to discuss and exchange with both well-known colleagues and first-time attendees during coffee breaks and over lunch and dinner. In addition to a conference dinner on the second evening, the government of the state of Styria, of which Graz is the capital, hosted a reception for the participants in the beautiful setting of the historic Orangerie in the gardens of Graz castle. The participants had the opportunity to enjoy a guided tour through Graz on their way to the reception.

Concert by François Pineau-Benois (violin), Olga Cepovecka (piano) and Dorottya Standi (cello).

A special event was the Music meets Science concert, held with the support of SIGMM. This was already the fourth concert presented in the framework of the CBMI conference (2007, 2018, 2021, 2022). After a long conference day, the participants could enjoy works by Schubert and Haydn, Austrian composers, which gave the event a touch of local Austrian culture. Reflecting the international spirit of CBMI, the concert was given by a trio of very talented young musicians with international careers from three different countries. We thank SIGMM for its support, which made this cultural event happen.

Matthias Rüther, director of JOANNEUM RESEARCH DIGITAL, welcomes the conference participants at the reception

Outlook

The next edition of CBMI will be organised in September 2023 in Orléans, France. While it is likely that the hybrid setting is here to stay for the near future, we hope that the share of participants on-site will move back towards the pre-pandemic level.

Diversity and Inclusion in focus at ACM IMX ’22 and MMSys ’22

The 13th ACM Multimedia Systems Conference (and its associated workshops: MMVE 2022, NOSSDAV 2022, and GameSys 2022) took place from the 14th to the 17th of June 2022 in Athlone, Ireland. The week after, the ACM International Conference on Interactive Media Experiences took place in Aveiro, Portugal from the 22nd to the 24th of June. Both conferences are strongly committed to creating a diverse, inclusive and accessible forum to discuss the latest research on multimedia systems and the technology experiences they enable, and have been actively working towards this goal over the last several years.
While this is challenging in itself, demanding systematic and continuous efforts at various levels, the worldwide COVID-19 pandemic introduced even more challenges. As has repeatedly been noted (and shown), restrictions due to the COVID-19 pandemic have had a significant impact on many scholars, including female academics [1,2], caregivers [3] and young scientists [4], and may have exacerbated existing inequalities [5], despite the increased participation possibilities introduced by fully online conferences.
The diversity and inclusion chairs of both IMX and MMSys were therefore highly motivated to adopt a set of measures aimed at stimulating the inclusion of underrepresented groups, offering various possibilities for participation, and raising awareness of diversity (and implications of a lack of diversity) for community development and research activities.

Relevant support and activities

With the generous support from the ACM Special Interest Group on Multimedia (SIGMM) and ACM, the provided support at MMSys’22 and IMX’22 included the following:

  • SIGMM student travel grants: any student member of SIGMM is eligible to apply for such a grant; however, students who are the first author of an accepted paper (in any track/workshop) are particularly encouraged to apply. The grants can cover travel expenses such as airfare/shuttle, hotel and meals (but not conference registration fees).
  • SIGMM carer grants: the carer grants are intended to allow SIGMM members to fully engage with the online event or attend in person. These grants are intended to cover extra costs to help with caring responsibilities — for example, childcare at home or at the destination — which would otherwise limit your participation in the conference.
  • SIGMM-sponsored Equality, Diversity and Inclusion (EDI) travel grants: these grants aim to support researchers who self-identify as marginalized and/or underrepresented in the MMSys community (e.g., scholars who come from non-WEIRD – Western, Educated, Industrialized, Rich, Developed – societies). The EDI grants have also been used to support researchers who lack other/own funding opportunities, as well as scholars from relevant yet underrepresented research areas.
  • Paper mentoring: this instrument was primarily aimed at those who are new to submitting an academic paper. In particular, authors in especially adverse circumstances, for example those for whom English is a second language or those authoring a particularly novel submission that may require additional input, could apply for paper mentoring.

In addition to the above measures, MMSys’22 also offered excellent mentoring activities for PhD students, postdocs and more advanced researchers. The PhD mentoring was organized by the doctoral consortium chairs Patrick Le Callet and Carsten Griwodz: PhD students had the possibility to give a short pitch about their PhD research, have discussions with the MMSys’22 mentors and the wider community, and have a one-on-one in-person talk with their assigned mentor. The postdoc mentoring was organized by Pablo Cesar and Irena Orsolic. Postdocs in the MMSys community were invited to give a lightning talk about their research and to join a dedicated networking lunch with other members of the MMSys community.
IMX ’22, on the other hand, featured an open application process for program committee membership and an active reasonable adjustment policy to ensure that registration fees do not prevent people from attending the conference. In addition, undergraduate and graduate students, as well as early-career researchers, could apply for travel support from the SIGCHI Gary Marsden travel awards, and PhD students could benefit from interaction with and feedback from peers and senior researchers in the Doctoral Consortium. Finally, for both MMSys and IMX, participants had to actively agree with the ACM Policy Against Discrimination and Harassment.

Activities at the conference

At the conference, additional activities were organized to raise awareness, increase understanding, foster experience sharing and, especially, trigger reflection about diversity and inclusion. MMSys ’22 featured a panel on “Designing Inclusivity in Technologies“. Inclusive Design is an approach used in many sectors to try to allow everyone to experience services and products in an equitable way. One way to do this is by celebrating diversity in how we design and by taking into account the different barriers faced by different communities across the globe. The panel brought together experts to discuss what inclusive design looks like for them, the charms of the communities they work with, the challenges they face in designing with and for them, and how other communities can learn from the methods they have used in order to build a more inclusive world that benefits all of us.
The panellists were:

  • Veronica Orvalho: Professor at Porto University’s Instituto de Telecomunicações and the Founder/CEO of Didimo – a platform that enables users to generate digital humans.
  • Nitesh Goyal: Leads research on Responsible AI tools at Google Research.
  • Kellie Morrissey: Researcher & Lecturer at the University of Limerick’s School of Design.

IMX ’22 featured a panel discussion on “Diversity in the Metaverse”. The Metaverse is a hot topic, which has many people wondering both what it is and, more importantly, what it will look like in the future for immersive media experiences. As a unique space for social interaction, engagement and connection, it is essential to address the importance of representation and accessibility during its infancy. The discussion intended to cover not only the current scenario in virtual and augmented reality worlds, but also the consequences and challenges of building a diverse Metaverse, taking into account design, content, marketing, and the various barriers faced by different communities across the globe.

The panel was moderated by Tara Collingwoode-Williams (Goldsmiths University) and had four panellists to discuss topics related to research and practice around “Diversity and Inclusive design in the Metaverse”:

  • Nina Salomons (filmmaker, diversity advocate and XR consultant; XRDI and AnomieXR co-founder; London, UK)
  • Micaela Mantegna (TED Fellow; professor of video games policy, artificial intelligence, creativity and copyright; AI, XR and Metaverse researcher; BKC Harvard affiliate; diversity and inclusion advocate; founder of Women in Games Argentina; Greater Buenos Aires, Argentina)
  • Krystal Cooper (Unity, Emerging Products; professional artistry / virtual production, spatial computing and XR researcher; Los Angeles, USA)
  • Mmuso Mafisa (XR consultant; Veza Interactive and Venture Chain Capital; Johannesburg Metropolitan Area, South Africa)

Short testimonials by two of the EDI grant beneficiaries

Soonbin Lee is a PhD student at Sungkyunkwan University (SKKU) in Korea, who would not have been able to attend MMSys ’22 without the SIGMM support (due to a lack of other funding opportunities). Soonbin wrote a short testimonial.

“The conference consisted of keynotes and regular sessions with various speakers. In particular, with the advent of cloud gaming, there were many presentations covering, among others, streaming systems specialized for game videos, haptic media for realistic viewing, and humanoid robots that can empathize with humans. During the conference, I enjoyed the spectacular views of Ireland and the wonderful traditional cuisine that was included in the conference program. Along with the presentations during the regular sessions, demo sessions were also held. Participants from industry, including Qualcomm, Fraunhofer FOKUS, INRIA, and TNO, were engaged during the MMSys demo sessions. Participating also offered an excellent opportunity to witness the outcomes of real-time systems, including user-interactive VR games, holographic cube matching instructions, and a mobile-based deep learning video codec decoding demo. I was also able to hear the presentations of various PhD research proposals, and it was very impressive to see so many PhD students present their interesting research.

At the MMSys conference, there were also a number of social events, like a Viking boat trip and beer brewing in Ireland, so I was able to meet other researchers and get to know them better. This was an amazing experience for me because it is not easy to meet these researchers in person. On the last day, I gave a presentation at the NOSSDAV session on the compression processing of MPEG Immersive Video (MIV). Through this discussion and the Q&A, I was able to learn more about the most recent trends in research.
More importantly, I made many friends who share the same research interests. I had a fantastic chance and a wonderful experience meeting other scholars in person. The MMSys conference was really impressive for me. With the travel grant, I fully enjoyed this opportunity!”

Postdoctoral researcher Alan Guedes also wrote a short reflection:
“I am a researcher from the Brazilian multimedia community, especially concentrated around the WebMedia event (http://webmedia.org.br). Although my community is considerably large and active, it has little presence at ACM events. This lack prevents the visibility of our research and possible international collaboration. In 2022, I was honoured with an ACM Diversity and Inclusion Travel Award to attend two ACM SIGMM-supported conferences, namely IMX and MMSys. The events had inspiring presentations and keynotes, which made me energetic about new research directions. In particular, I had the chance to meet researchers whom I previously knew only from citations. At these events, I could present some research done in Brazil and collaborate on technical committees and workshops.

This networking was invaluable and will be essential in my research career. I was also happy to see other Brazilians who, like me, seek to engage and strengthen the bonds between the SIGMM and Brazilian communities.”

Final reflections 

Both at IMX and MMSys, there were various actions and initiatives to put EDI-related topics on the agenda and to foster diversity and inclusion, both at the community level and in terms of research-related activities. We believe that a key success factor in this respect is the valuable support mechanisms offered by the ACM and SIGMM, allowing the IMX and MMSys communities to continuously and systematically keep goals related to equality, diversity and inclusion on the agenda, e.g., by removing participation barriers (such as registration prices adjusted to the country of the attendees), triggering awareness, and providing a forum for under-represented voices and/or regions (e.g., focused workshops at IMX on Asia (2016, 2017) and Latin America (2020), supported by the SIGCHI Development Fund).

Based on our experiences, it is also important that defined actions and measures are based on a good understanding of the key problems. This means that efforts to gain insights into key aspects (e.g., gender balance, numbers on the participation of under-represented groups) and their development over time are highly valuable. Secondly, it is important that EDI aspects are considered holistically, as they relate to all aspects of the conference from beginning to end, including, e.g., the selection of keynote speakers, who is represented in the technical committees (e.g., an open call for associate chairs, as has been done at IMX since the beginning), who is represented in the organizing committee, and which efforts are made to reach out to relevant communities in various parts of the world that are currently under-represented (e.g., South America, Africa). Lastly, we need more experience sharing through both formal and informal channels. There is a huge potential to share best practices and experiences both within and between the related conferences and communities to combine our efforts towards a common EDI vision and associated goals.

References

Students report on ACM MMSys 2022

The 13th ACM Multimedia Systems Conference (and associated workshops: MMVE 2022, NOSSDAV 2022, GameSys 2022) took place from the 14th to the 17th of June 2022 in Athlone, Ireland. The MMSys conference is an essential forum for researchers in multimedia systems to present and share their latest research findings. After two years of online and hybrid editions, MMSys was held onsite in beautiful Athlone. Besides the many high-quality technical talks spread across different multimedia areas and the wonderful keynote talks, there were a few events targeted especially at students, such as mentoring sessions and the doctoral symposium. The social events were significant this year, since they were the first opportunity in two years for multimedia researchers to meet colleagues, collaborators, and friends and discuss the latest hot topics while sharing a pint of Guinness or a glass of wine.

To encourage student authors to participate on-site, SIGMM has sponsored a group of students with Student Travel Grant Awards. Students who wanted to apply for this travel grant needed to submit an online form before the submission deadline. The selected students received either 1,000 or 2,000 USD to cover their airline tickets as well as accommodation costs for this event. Of the recipients, 11 were able to attend the conference. We asked them to share their unique experience attending MMSys’22. In this article, we share their reports of the event.


Andrea M. Storås, PhD student, Oslo Metropolitan University, Norway

I am grateful for receiving the SIGMM Student Travel Grant and getting the opportunity to participate in the MMSys 2022 conference in Athlone, Ireland. During the conference, I presented my research as a part of the Doctoral Symposium and got valuable advice and mentoring from an experienced professor in the field of multimedia systems. The Doctoral Symposium was a great place for me to gain experience pitching my research and presenting posters at a scientific conference.

In addition to inspiring talks and demos, the conference was filled with social events. One of the highlights was the boat trip to the Glasson Lake House with barbeque afterwards. I found the conference useful for my future career as I got to meet brilliant researchers, connect with other PhD students and discuss topics related to my PhD. I really hope that I will get the opportunity to participate in future editions of MMSys.


Reza Farahani, PhD student, ITEC Dept., Alpen-Adria-University Klagenfurt, Austria

After two years of virtual attendance at ACM MMSys, I had the opportunity to be in Athlone, Ireland, and present our work in front of the community. As in previous years, I expected a well-organized conference, and indeed everything from the keynotes to the paper sessions was perfect. Moreover, the social events were among the best parts of the experience: I could talk with community members and learn many things in a friendly atmosphere. Overall, MMSys 2022 was excellent in all aspects, and I thank the SIGMM committee once again for the travel grant which made this experience possible.


Xiaokun Xu, PhD student, Worcester Polytechnic Institute, USA

MMSys 2022 was my first in-person conference, and it was very well organized, far exceeding my expectations: in the past two years I had participated in some virtual conferences, and they were not very good experiences, so I thought an in-person conference would be similar. In fact, I was totally wrong. MMSys 2022 was a wonderful experience, the first time I built a real connection with the community and peer researchers.
Many things impressed me a lot. Among the papers and presentations, I found poster #75, “Realistic Video Sequences for Subjective QoE Analysis”, really interesting. The presentation by the author was very helpful, and I talked a lot with him. Now he is one of the new friends I made at the conference, and we still keep in touch through email.
Besides the papers, the social events were another part that impressed me. All the social events were well organized and made communication easier for us. I got the opportunity to talk with authors and ask some questions that I didn’t ask during the presentations, and made some new friends who are doing research similar to mine. I also got the chance to talk with some professors who are top researchers in their fields. Those are really precious experiences for a PhD student.
Overall, MMSys 2022 was an amazing conference, and it encourages me to attend more academic events in the future. I’m really grateful to the SIGMM committee for the travel grant, which made this wonderful experience possible.


Sindhu Chellappa, PhD student, University of New Hampshire, US

I am really happy to have been part of MMSys in Athlone, Ireland. This is the first in-person conference I have attended since the pandemic. The conference was organized seamlessly, and the keynotes were very interesting. The keynote “Network is the Renderer” by Dr Morgan from Roblox stole the entire show. Along with that, the keynotes by Dr Ali and Dr Mohamed Hefeeda, on low-latency streaming and DeepGame respectively, were very interesting. The social events were very relaxing and well organized. I had to travel from the US to India and then to Ireland. It was a breathtaking trip, but with the student travel grant, attending the conference in person was a boon.


Tzu-Yi Fan, master’s student, National Tsing Hua University, Taiwan

I am grateful to have received the student grant for MMSys 2022, which was my first in-person conference. I learned a lot at the conference and had a wonderful experience in Athlone, Ireland.
Initially, I felt nervous when I arrived in a distant and unfamiliar place, but the kind and welcoming organization calmed my mind. The conference schedule was fruitful, and I enjoyed the presentations and keynotes a lot. I presented my paper about high-rise firefighting in the special session. Although I did not speak smoothly at the beginning, I still enjoyed interacting with the audience. The keynote given by Professor Mohamed Hefeeda impressed me a lot: he spoke about the challenges of cloud gaming and introduced a video encoding pipeline to reduce the bandwidth. I also loved the coffee breaks between sessions. During that time, people from around the world could discuss each other’s research, which I could not do when participating virtually. It was an excellent opportunity to practice presenting our research to people from different backgrounds.
Moreover, the social events in the evenings were also exciting. Ireland is famous for beer, and I tasted several kinds at the welcome party. I was glad to try the local flavours, which I never thought beer could have.
I thank the MMSys 2022 organizers for holding such a splendid conference and expanding my horizons. I look forward to carrying on my new research and joining more conferences in the future.


Kerim Hodžić, PhD student, University of Sarajevo, Bosnia and Herzegovina

My name is Kerim Hodžić, and I am a PhD student at the Faculty of Electrical Engineering, Computer Science Department, at the University of Sarajevo, Bosnia and Herzegovina. It was my pleasure to attend the ACM MMSys 2022 conference held in Athlone, Ireland, where I presented my paper “Realistic Video Sequences for Subjective QoE Analysis”, which is part of my PhD research. In addition to that, I had the opportunity to learn a lot from attending all the conference sessions, with very interesting paper presentations, and from the special guests who provided us with interesting information about the industry. At the social events, I met many people from industry and academia, and I hope it will lead to some useful cooperation in the future. This is the best conference I have attended so far in my career, and I want to congratulate everyone who organised it. I also want to thank the SIGMM committee for the travel grant, which made this experience possible. Till the next MMSys! All the best.


Juan Antonio De Rus Arance, Universitat Politècnica de València, Spain

MMSys 2022 was an amazing experience and a great opportunity to discover other research works in my field. It gave me the chance to meet colleagues working in the same area and discuss ideas with them, opening the doors to possible collaborations. Moreover, participating in the Doctoral Symposium was very instructive.
It wouldn’t have been possible for me to attend the conference if it weren’t for the SIGMM Student Travel Award, and I’m very grateful.


Miguel Fernández Dasí, PhD student, Universitat Politècnica de Catalunya, Spain

I am a PhD student at the Universitat Politècnica de Catalunya, and MMSys 2022 was my first in-person conference. I attended the Doctoral Symposium to present my paper, “Design, development and evaluation of adaptive and interactive solutions for high-quality viewport-aware VR360 video processing and delivery”.
It was a great experience meeting fellow PhD students and sharing ideas about different topics, especially with those working in the same area. Furthermore, everyone at the conference was always willing to talk, which I greatly appreciated as a PhD student and which always led to fascinating conversations.
All the keynotes were engaging. I was particularly interested in Prof. Mohamed Hefeeda’s keynote “DeepGame: Efficient Video Encoding for Cloud Gaming”, a topic related to my PhD thesis. I also found Prof. Nadia Magnenat Thalmann’s keynote on “Digital and Robotic Humanoid Twins: for Which Purposes” interesting, a topic I did not know about but found great interest in. I am thankful to SIGMM for the Student Travel Grant, which made my attendance at this conference possible.


Melan Vijayaratnam, PhD student, CentraleSupelec, France

I am delighted to have been given a grant to attend the MMSys conference in Athlone, Ireland. This was my first in-person conference, which my supervisor Dr Giuseppe Valenzise really wanted me to attend to meet the multimedia community. I went there by myself, and it was scary at first to go to a conference without knowing anyone. However, as a participant in the doctoral symposium track, my mentor Dr Pablo Cesar helped me with his advice and introduced me to many people, and I got to meet other fellow PhD students. It was definitely an incredible experience, and I am grateful to have been introduced to this welcoming community.


Chun Wei Ooi, PhD student, Trinity College Dublin, Ireland

This year was my first time attending the MMSys conference. I would like to thank the committee for awarding travel grants to students such as myself. I presented my research topic at MMVE and received some good suggestions from senior researchers. It was a very fruitful conference where I met researchers from different backgrounds and career stages. I also benefited tremendously from attending, because my latest work is partly inspired by a research talk I attended. One of the highlights of attending MMSys in person was its many social events. Not only did they show the best side of the venue, but more importantly, I was able to make friends with fellow researchers. Overall, the MMSys community is a very talented and friendly bunch, and I am glad to be a part of it.


Jingwen Zhu, PhD student, Nantes Université, France

I was very disappointed that I had not received my visa by the day before MMSys. However, I got a call from the embassy on the first day of the conference, telling me that my visa application had been approved. I shared the news with my supervisor Patrick Le Callet, who insisted that I book the next flight to come to the conference and present my research proposal in person.

MMSys is the first conference for me since the beginning of my PhD. As a first-year PhD student, it was a very good opportunity for me to get to know this excellent community and discuss my research with more experienced researchers. I really appreciated the breakfast with my mentor Dr Ketan Mayer-Patel, who gave me very nice suggestions for my PhD. After the conference, he even sent me a good tutorial about how to make a good academic poster. I would like to thank the conference organizers and the travel grant for giving me the opportunity to meet everyone in person. Thanks to everyone who exchanged ideas with me during the conference, and especially my DS mentor Ketan. I hope that I can continue to attend MMSys next year!

JPEG Column: 95th JPEG Meeting

JPEG issues a call for proposals for JPEG Fake Media

The 95th JPEG meeting was held online from 25 to 29 April 2022. A Call for Proposals (CfP) was issued for JPEG Fake Media, which aims at a standardisation framework for secure annotation of modifications in media assets. With this new initiative, JPEG endeavours to provide standardised means for the identification of the provenance of media assets that include imaging information. Assuring the provenance of the coded information is essential considering the current trends and possibilities in multimedia technology.

Fake Media standardisation aims at the identification of image provenance.

This new initiative complements the ongoing standardisation of machine learning-based codecs for images and point clouds. Both are expected to revolutionise coding standards, leading to compression rates beyond the current state of the art.

The 95th JPEG meeting had the following highlights:

  • JPEG Fake Media issues a Call for Proposals;
  • JPEG AI;
  • JPEG Pleno Point Cloud Coding;
  • JPEG Pleno Light Fields quality assessment;
  • JPEG AIC near perceptual lossless quality assessment;
  • JPEG NFT exploration;
  • JPEG DNA explorations;
  • JPEG XS 2nd edition published;
  • JPEG XL 2nd edition.

The following summarises the major achievements of the 95th JPEG meeting.

JPEG Fake Media

At its 95th meeting, the JPEG Committee issued a Final Call for Proposals (CfP) on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address both use cases in good faith and those with malicious intent. The call for proposals welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents are highly encouraged to express their interest in submitting a proposal before 20 July 2022 and to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

JPEG AI

Following the JPEG AI joint ISO/IEC/ITU-T Call for Proposals issued after the 94th JPEG meeting, 14 registrations were received, among which 12 codecs were submitted for the standard reconstruction task. For computer vision and image processing tasks, several teams submitted compressed-domain decoders, notably 6 for image classification. Prior to the 95th JPEG meeting, the work focused on managing the Call for Proposals submissions, creating the test sets, and generating anchors for the standard reconstruction, image processing and computer vision tasks. Moreover, a dry run of the subjective evaluation of the JPEG AI anchors was performed with expert subjects, and the results were analysed during this meeting, followed by additions and corrections to the JPEG AI Common Training and Test Conditions and the definition of several recommendations for the evaluation of the proposals, notably the selection of anchors, images and bitrates. A procedure for cross-check evaluation was also discussed and approved. The work will now focus on the evaluation of the Call for Proposals submissions, which is expected to be finalized at the 96th JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including metaverse, autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 95th JPEG meeting, the JPEG Committee reviewed the responses to the Final Call for Proposals on JPEG Pleno Point Cloud Coding. Four responses have been received from three different institutions. At the upcoming 96th JPEG meeting, the responses to the Call for Proposals will be evaluated with a subjective quality evaluation and objective metric calculations.

JPEG Pleno Light Field

The JPEG Pleno standard tools provide a framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image modalities addressed by the current standardization activities are light field, holography, and point clouds, where these image modalities describe different sampled representations of the plenoptic function. Therefore, to properly assess the quality of these plenoptic modalities, specific subjective and objective quality assessment methods need to be designed.

In this context, JPEG has launched a new standardisation effort known as JPEG Pleno Quality Assessment. It aims at providing a quality assessment standard, defining a framework that includes subjective quality assessment protocols and objective quality assessment procedures for lossy decoded data of plenoptic modalities for multiple use cases and requirements. The first phase of this effort will address the light field modality.

To assist this task, JPEG has issued the “JPEG Pleno Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices with regard to light field subjective quality assessment methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach.

The Final Call for Contributions will be issued at the 96th JPEG meeting. The deadline for submission of contributions is 18 December 2022.

JPEG AIC

During the 95th JPEG Meeting, the committee released the Draft Call for Contributions on Subjective Image Quality Assessment.

The new JPEG AIC standard will be developed considering all the submissions to the Call for Contributions in a collaborative process. The deadline for submissions is set for 14 October 2022. Multiple types of contributions are accepted, notably subjective assessment methods including supporting evidence and detailed descriptions, test material, interchange formats, software implementations, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature.

The JPEG AIC committee has also started the preparation of a workshop on subjective assessment methods for the investigated quality range, which will be held at the end of June. The workshop aims to obtain different views on the problem and will include both internal and external speakers, as well as a Q&A panel. Experts in the field of quality assessment and stakeholders interested in the use cases are invited.

JPEG NFT

After the joint JPEG NFT and Fake Media workshops, it became evident that, even though the use cases of the two topics differ, there is a significant overlap in terms of requirements and relevant solutions. For that reason, it was decided to create a single AHG covering both the JPEG NFT and JPEG Fake Media explorations. The newly established AHG JPEG Fake Media and NFT will use the JPEG Fake Media mailing list.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on synthetic DNA polymers. A new version of the overview document on DNA-based Media Storage: State-of-the-Art, Challenges, Use Cases and Requirements was issued and has been made publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, including biochemical noise simulation. During the 95th JPEG meeting, a new document describing the Use Cases and Requirements for DNA-based Media Storage was created and made publicly available. A timeline for the standardization process was also defined. Interested parties are invited to consider joining the effort by registering to the JPEG DNA AHG mailing list.
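To give a flavour of what a quaternary representation means in practice, the toy sketch below maps two bits to one nucleotide and back. This is only the naive transcoding step, written as an illustrative assumption; an actual JPEG DNA codec must additionally respect biochemical constraints (e.g., avoiding long homopolymer runs or skewed GC content), which this example deliberately ignores.

```python
# Naive 2-bits-per-nucleotide quaternary mapping, for illustration only.
# Real DNA coding must also respect biochemical constraints (e.g., GC content,
# homopolymer runs), which are the core of the JPEG DNA exploration.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data: bytes) -> str:
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # four nucleotides per byte
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)

def dna_to_bytes(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

payload = b"JPEG"
strand = bytes_to_dna(payload)  # 'CAGGCCAACACCCACT'
assert dna_to_bytes(strand) == payload
```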

JPEG XS

The JPEG Committee is pleased to announce that the 2nd editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) were published in March 2022. Furthermore, the committee finalized the work on Part 4 (Conformance testing) and Part 5 (Reference software), which are now entering the final phase for publication. With these last two parts, the committee’s work on the 2nd edition of the JPEG XS standards comes to an end, allowing the focus to shift to further improving the standard. Meanwhile, in response to the latest Use Cases and Requirements for JPEG XS v3.1, the committee received a number of technology proposals from Fraunhofer and intoPIX that focus on improving the compression performance for desktop content sequences. The proposals will now be evaluated and thoroughly tested, and will form the foundation of the work towards a 3rd edition of the JPEG XS suite of standards. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth.

JPEG XL

The second edition of JPEG XL Part 1 (Core coding system), with an improved numerical stability of the edge-preserving filter and numerous editorial improvements, has proceeded to the CD stage. Work on a second edition of Part 2 (File format) was initiated. Hardware coding was also further investigated. Preliminary software support has been implemented in major web browsers, image viewing and editing software, including popular tools such as FFmpeg, ImageMagick, libvips, GIMP, GDK and Qt. JPEG XL is now ready for wide-scale adoption.

Final Quote

“Recent developments in the creation and modification of visual information call for the development of tools that can help protect the authenticity and integrity of media assets. JPEG Fake Media is a standardised framework to deal with imaging provenance,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 96, will be held online during 25-29 July 2022.

VQEG Column: VQEG Meeting Dec. 2021 (virtual/online)

Introduction

Welcome to a new column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place from 13 to 17 December 2021, organized online by the University of Surrey, UK. Over five days, more than 100 participants (from more than 20 countries across the Americas, Asia, Africa, and Europe) remotely attended the multiple sessions related to the active VQEG projects, which included more than 35 presentations and interesting discussions. This column provides an overview of this VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online on the VQEG meeting website.

Group picture of the VQEG Meeting 13-17 December 2021

Many of the works presented in this meeting can be relevant for the SIGMM community working on quality assessment. Particularly interesting are the new analyses and methodologies discussed within the Statistical Analysis Methods group, the new metrics and datasets presented within the No-Reference Metrics group, and the progress on the plans of the 5G Key Performance Indicators group and the Immersive Media group. We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this sense, it has recently completed a joint project between VQEG and ITU SG12 in which 35 candidate objective quality models were submitted and evaluated through extensive validation tests. The result was the ITU-T Recommendation P.1204, which includes three standardized models: a bit-stream model, a reduced reference model, and a hybrid no-reference model. The group is currently considering extensions of this standard, which originally covered H.264, HEVC, and VP9, to include other encoders, such as AV1. Apart from this, two other projects are active under the scope of AVHD: QoE Metrics for Live Video Streaming Applications (Live QoE) and Advanced Subjective Methods (AVHD-SUB).

During the meeting, three presentations related to AVHD activities were given. In the first one, Mikolaj Leszczuk (AGH University) presented their work on secure and reliable delivery of professional live transmissions with low latency, which brought to the floor the constant need for video datasets, such as the VideoSet. In addition, Andy Quested (ITU-R Working Party 6C) led a discussion on how to assess video quality for very high resolution (e.g., 8K, 16K, 32K) monitors with interactive applications, which raised the question of zooming in to absorb the details of the images without pixelation. Finally, Abhinau Kumar (UT Austin) and Cosmin Stejerean (Meta) presented their work on reducing the complexity of VMAF by using features in the wavelet domain [1].
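For readers curious about what "features in the wavelet domain" can look like, the snippet below is a rough, unofficial illustration (not the FUNQUE implementation from [1]): a one-level Haar decomposition computed with PyWavelets and a toy per-band similarity between a reference and a distorted frame. The similarity formula and the stabilising constant are assumptions made for this sketch.

```python
# Rough illustration of wavelet-domain features for quality evaluation.
# Not the FUNQUE implementation; just a one-level Haar DWT and a toy
# band-wise similarity, with an assumed stabilising constant eps.
import numpy as np
import pywt  # PyWavelets

def wavelet_band_similarity(ref: np.ndarray, dist: np.ndarray, eps: float = 1e-3):
    ref_a, (ref_h, ref_v, ref_d) = pywt.dwt2(ref, "haar")
    dst_a, (dst_h, dst_v, dst_d) = pywt.dwt2(dist, "haar")
    sims = []
    for b_ref, b_dst in zip((ref_a, ref_h, ref_v, ref_d),
                            (dst_a, dst_h, dst_v, dst_d)):
        num = 2.0 * np.sum(b_ref * b_dst) + eps
        den = np.sum(b_ref ** 2) + np.sum(b_dst ** 2) + eps
        sims.append(num / den)  # 1.0 means the band is identical
    return sims  # per-band scores that a fused evaluator could combine

ref = np.random.rand(256, 256)                       # toy luma frame
dist = np.clip(ref + 0.05 * np.random.randn(256, 256), 0, 1)
print(wavelet_band_similarity(ref, dist))
```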

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. This group was recently launched and, for the moment, has been working on a topical review paper on objective quality assessment of medical images and videos, which was submitted in December to Medical Image Analysis [2]. Rafael Rodrigues (Universidade da Beira Interior) and Lucie Lévêque (Nantes Université) presented the main details of this work in a presentation scheduled during the QAH session. The presentation also included information about the review paper published by some members of the group on methodologies for subjective quality assessment of medical images [3] and the efforts in gathering datasets to be listed on the VQEG datasets website. In addition, Lu Zhang (IETR – INSA Rennes) presented her work on model observers for the objective quality assessment of medical images from task-based approaches, considering three tasks: detection, localization, and characterization [4]. Finally, it is worth noting that members of this group are organizing a special session on “Quality Assessment for Medical Imaging” at the IEEE International Conference on Image Processing (ICIP) that will take place in Bordeaux (France) from 16 to 19 October 2022.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. Currently, they are working on statistical analysis methods for subjective tests, which are discussed in their monthly meetings.

In this meeting, there were four presentations related to SAM activities. In the first one, Zhi Li and Lukáš Krasula (Netflix) exposed the lessons they learned from the subjective assessment test carried out during the development of their metric Contrast Aware Multiscale Banding Index (CAMBI) [5]. In particular, they found that some subjective tests can have perceptually unbalanced stimuli, which can cause systematic and random errors in the results. In this sense, they explained their statistical data analyses to mitigate these errors, such as the techniques in ITU-T Recommendation P.913 (section 12.6), which can reduce the effects of the random error. The second presentation described the work by Pablo Pérez (Nokia Bell Labs), Lucjan Janowski (AGH University), Narciso Garcia (Universidad Politécnica de Madrid), and Margaret H. Pinson (NTIA/ITS) on a novel subjective assessment methodology with few observers with repetitions (FOWR) [6]. Apart from the description of the methodology, the dataset generated from the experiments is available on the Consumer Digital Video Library (CDVL). Also, they launched a call for other labs to repeat their experiments, which will help in discovering the viability, scope and limitations of the FOWR method and, if appropriate, in including this method in the ITU-T Recommendation P.913 for quasi-experimental assessments when it is not possible to have 16 to 24 subjects (e.g., pre-tests, expert assessments, and resource limitations); for example, performing the experiment with 4 subjects 4 times each on different days would be similar to a test with 15 subjects. In the third presentation, Irene Viola (CWI) and Lucjan Janowski (AGH University) presented their analyses of the standardized methods for subject removal in subjective tests. In particular, the methods proposed in the recommendations ITU-R BT.500 and ITU-T P.913 were considered, with the result that the first one (described in Annex 1 of Part 1) is not recommended for Absolute Category Rating (ACR) tests, while the one described in the second recommendation provides good performance, although further investigation of the correlation threshold used to discard subjects is required. Finally, the last presentation led the discussion on the future activities of the SAM group, where different possibilities were proposed, such as the analysis of confidence intervals for subjective tests, new methods for comparing subjective tests from more than two labs, how to extend these results to better understand the precision of objective metrics, and research on crowdsourcing experiments in order to make them more reliable and improve cost-effectiveness. These new activities are discussed in the monthly meetings of the group.
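As a small illustration of the kind of analysis discussed here, the sketch below implements correlation-based subject screening in the spirit of ITU-T Rec. P.913: each subject's ratings are correlated with the mean opinion scores of the remaining subjects, and low-correlation subjects are discarded. The leave-one-out MOS and the 0.75 threshold are assumptions chosen for illustration, not the normative procedure.

```python
# Minimal sketch of correlation-based subject screening (in the spirit of
# ITU-T Rec. P.913). The leave-one-out MOS and the 0.75 threshold are
# illustrative assumptions, not the normative procedure.
import numpy as np
from scipy.stats import pearsonr

def screen_subjects(scores: np.ndarray, threshold: float = 0.75):
    """scores: (n_subjects, n_stimuli) matrix of opinion scores."""
    kept = []
    for s in range(scores.shape[0]):
        others = np.delete(scores, s, axis=0)
        mos = others.mean(axis=0)          # MOS computed without subject s
        r, _ = pearsonr(scores[s], mos)
        if r >= threshold:
            kept.append(s)
    return kept

rng = np.random.default_rng(0)
quality = rng.uniform(1, 5, size=20)                         # hidden true quality
panel = quality + rng.normal(0, 0.4, size=(15, 20))          # 15 consistent raters
panel = np.vstack([panel, rng.uniform(1, 5, size=(1, 20))])  # 1 random rater
print(screen_subjects(panel))  # the random rater (index 15) is likely dropped
```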

Computer Generated Imagery (CGI)

The CGI group focuses on quality analysis of computer-generated imagery, with a particular focus on gaming. Currently, the group is working on topics related to ITU work items, such as ITU-T Recommendation P.809 with the development of a questionnaire for interactive cloud gaming quality assessment, ITU-T Recommendation P.CROWDG related to quality assessment of gaming through crowdsourcing, ITU-T Recommendation P.BBQCG with a bit-stream based quality assessment of cloud gaming services, and a codec comparison for computer-generated content. In addition, a presentation was delivered during the meeting by Nabajeet Barman (Kingston University/Brightcove), who presented the subjective results related to the work presented at the last VQEG meeting on the use of LCEVC for Gaming Video Streaming Applications [7]. For more information on the related activities, do not hesitate to contact the chairs of the group.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. Currently, two main topics are being addressed by the group, which are discussed in regular online meetings. The first one is related to the improvement of SI/TI metrics to solve ambiguities that have appeared over time, with the objective of providing reference software and updating ITU-T Recommendation P.910. The second is related to the addition of standard metadata with video quality assessment-related information to encoded video streams.
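For reference, the classic SI/TI definitions in ITU-T Rec. P.910 (spatial information as the standard deviation of Sobel-filtered luma, temporal information as the standard deviation of successive frame differences, each maximised over time) are easy to sketch. The snippet below is a simplified version: it ignores details such as border handling and bit depth, which are precisely the kinds of ambiguities the group is resolving.

```python
# Simplified SI/TI computation following the definitions in ITU-T Rec. P.910.
# Border handling and bit-depth details are deliberately ignored here; such
# ambiguities are exactly what the NORM activity is clarifying.
import numpy as np
from scipy import ndimage

def spatial_information(frame: np.ndarray) -> float:
    sx = ndimage.sobel(frame, axis=0)
    sy = ndimage.sobel(frame, axis=1)
    return float(np.std(np.hypot(sx, sy)))  # stdev of gradient magnitude

def si_ti(frames):
    """frames: iterable of 2-D luma arrays; returns (SI, TI)."""
    si, ti, prev = [], [], None
    for frame in frames:
        si.append(spatial_information(frame))
        if prev is not None:
            ti.append(float(np.std(frame - prev)))  # stdev of frame difference
        prev = frame
    return max(si), max(ti)

clip = [np.random.rand(144, 176) for _ in range(10)]  # toy 10-frame sequence
print(si_ti(clip))
```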

In this meeting, this group was one of the most active in terms of presentations on related topics, with 11 presentations. Firstly, Lukáš Krasula (Netflix) presented their Contrast Aware Multiscale Banding Index (CAMBI) [5], an objective quality metric that addresses banding degradations that are not detected by other metrics, such as VMAF and PSNR (code is available on GitHub). Mikolaj Leszczuk (AGH University) presented their work on automatic detection of User-Generated Content (UGC) in the wild. Also, Vignesh Menon and Hadi Amirpour (AAU Klagenfurt) presented their open-source project related to the analysis and online prediction of video complexity for streaming applications. Jing Li (Alibaba) presented their work related to the perceptual quality assessment of internet videos [8], proposing a new objective metric (STDAM, for the moment used internally), validated on the Youku-V1K dataset. The next presentation was delivered by Margaret Pinson (NTIA/ITS), dealing with a comprehensive analysis of why no-reference metrics fail, which emphasized the need to train these metrics on several datasets and test them on larger ones. The discussion also pointed out the recommendation for researchers to publish their metrics in open source in order to make it easier to validate and improve them. Moreover, Balu Adsumilli and Yilin Wang (YouTube) presented a new no-reference metric for UGC, called YouVQ, based on a transfer-learning approach with pre-training on non-UGC data and re-training on UGC. This metric will be released in open source shortly, and a dataset with videos and subjective scores has also been published. Also, Margaret Pinson (NTIA/ITS), Mikołaj Leszczuk (AGH University), Lukáš Krasula (Netflix), Nabajeet Barman (Kingston University/Brightcove), Maria Martini (Kingston University), and Jing Li (Alibaba) presented a collection of datasets for no-reference metric research, while Shahid Satti (Opticom GmbH) exposed their work on encoding complexity for short video sequences. On his side, Franz Götz-Hahn (Universität Konstanz/Universität Kassel) presented their work on the creation of the KonVid-150k video quality assessment dataset [9], which can be very valuable for training no-reference metrics, and on the development of objective video quality metrics. Finally, regarding the aforementioned two active topics within the NORM group, Ioannis Katsavounidis (Meta) provided a presentation on advances in the activity related to the inclusion of standard video quality metadata, while Lukáš Krasula (Netflix), Cosmin Stejerean (Meta), and Werner Robitza (AVEQ/TU Ilmenau) presented updates on the improvement of SI/TI metrics for modern video systems.

Joint Effort Group (JEG) – Hybrid

The JEG group was focused on joint work to develop hybrid perceptual/bitstream metrics and on the creation of a large dataset for training such models using full-reference metrics instead of subjective scores. In this sense, a project in collaboration with Sky was finished and presented at the last VQEG meeting.

Related activities were presented in this meeting. In particular, Enrico Masala and Lohic Fotio Tiotsop (Politecnico di Torino) presented the updates on the recent activities carried out by the group, and their work on artificial-intelligence observers for video quality evaluation [10].

Implementer’s Guide for Video Quality Metrics (IGVQM)

The IGVQM group, whose activity started at the VQEG meeting in December 2020, works on creating an implementer’s guide for video quality metrics. The current goal is to create a report on the accuracy of video quality metrics, following a test plan based on collecting datasets, collecting metrics and methods for assessment, and carrying out statistical analyses. An update on the advances was provided by Ioannis Katsavounidis (Meta), and an open call invites the community to contribute to this activity with datasets and metrics.
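As an illustration of the statistical analyses such a report relies on, the sketch below computes the usual accuracy indicators (PLCC, SROCC, RMSE) between hypothetical metric outputs and subjective MOS values. Real evaluations typically also fit a monotonic (e.g., logistic) mapping before computing PLCC and RMSE, which is omitted here for brevity; all values are made up.

```python
# Common accuracy indicators for a video quality metric against subjective MOS.
# Values are made up; a monotonic (logistic) mapping, usually fitted before
# computing PLCC/RMSE, is omitted for brevity.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def metric_accuracy(metric_scores, mos) -> dict:
    metric_scores, mos = np.asarray(metric_scores), np.asarray(mos)
    plcc, _ = pearsonr(metric_scores, mos)    # linearity
    srocc, _ = spearmanr(metric_scores, mos)  # monotonicity
    rmse = float(np.sqrt(np.mean((metric_scores - mos) ** 2)))
    return {"PLCC": plcc, "SROCC": srocc, "RMSE": rmse}

mos = [2.1, 3.4, 4.0, 1.5, 4.6, 3.0]   # subjective scores on a 1-5 scale
vq  = [2.4, 3.1, 4.2, 1.9, 4.4, 2.7]   # hypothetical metric, same scale
print(metric_accuracy(vq, mos))
```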

5G Key Performance Indicators (5GKPI)

The 5GKPI group studies the relationship between key performance indicators of new communications networks (especially 5G) and the QoE of video services on top of them. Currently, the group is working on the definition of relevant use cases, which are discussed in monthly audio calls.

In relation to these activities, there were four presentations during this meeting. Werner Robitza (AVEQ/TU Ilmenau) presented a proposal for a KPI message format for gaming QoE over 5G networks. Also, Pablo Pérez (Nokia Bell Labs) presented their work on a parametric quality model for teleoperated driving [11] and an update on the ITU-T GSTR-5GQoE topic, related to the QoE requirements for real-time multimedia services over 5G networks. Finally, Margaret Pinson (NTIA/ITS) presented an overall description of 5G technology, including how differences in spectrum allocation per country impact the propagation, responsiveness and throughput of 5G devices.

Immersive Media Group (IMG)

The IMG group researches the quality assessment of immersive media. The group recently finished the test plan for the quality assessment of short 360-degree video sequences, which supported the development of ITU-T Recommendation P.919. Currently, the group is working on further analyses of the data gathered from the subjective tests carried out for that test plan, and on the analysis of data for the quality assessment of long 360-degree videos. In addition, members of the group are contributing to ITU-T SG12 on the topic G.CMVTQS, on computational models for QoE/QoS monitoring to assess video telephony services. Finally, the group is also preparing a test plan for evaluating the QoE of immersive and interactive communication systems, which was presented by Pablo Pérez (Nokia Bell Labs) and Jesús Gutiérrez (Universidad Politécnica de Madrid). Readers interested in this topic are encouraged to contact them to join the effort.
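
For readers new to such analyses, the sketch below computes the statistic most commonly reported per stimulus in these subjective tests: the mean opinion score (MOS) with a 95% confidence interval based on the t-distribution. The panel size and ratings in the example are invented for illustration.

```python
# Sketch of a standard per-stimulus analysis of subjective test data:
# mean opinion score (MOS) with a t-distribution confidence interval.
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    scores = np.asarray(scores, dtype=float)
    mos = scores.mean()
    # Standard error of the mean, widened by the t quantile for small panels.
    sem = scores.std(ddof=1) / np.sqrt(len(scores))
    half_width = stats.t.ppf((1 + confidence) / 2, df=len(scores) - 1) * sem
    return mos, (mos - half_width, mos + half_width)

# Example: ratings from a hypothetical 24-subject panel on a 5-point ACR scale.
ratings = [4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 3, 4, 4, 4, 5, 3, 4, 4, 4, 5, 4, 4]
print(mos_with_ci(ratings))
```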

During the meeting, there were also four presentations covering topics related to the IMG activities. Firstly, Alexander Raake (TU Ilmenau) provided an overview of the projects within the AVT group dealing with the QoE assessment of immersive media. Also, Ashutosh Singla (TU Ilmenau) presented a 360-degree video database with higher-order ambisonics spatial audio. Maria Martini (Kingston University) presented an update on the IEEE standardization activities on Human Factors for Visual Experiences (HFVE), such as the recently submitted draft standard on deep-learning-based quality assessment and the draft standard on the quality assessment of light field content, to be submitted shortly. Finally, Kjell Brunnström (RISE) presented their work on legibility in virtual reality, also addressing the perception of speech-to-text by Deaf and hard-of-hearing users.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Although there was no official IRG-AVQA meeting on this occasion, there were various presentations related to ITU activities addressing QoE evaluation topics. In this sense, Chulhee Lee (Yonsei University) presented an overview of ITU-R activities, with a special focus on the quality assessment of HDR content, and, together with Alexander Raake (TU Ilmenau), presented an update on ongoing ITU-T activities.

Other updates

All the sessions of this meeting, and thus the presentations, were recorded and have been uploaded to YouTube. Also, it is worth noting that the anonymous FTP will be closed soon; in the meantime, files and presentations can still be accessed from older browsers or via an FTP client. All the files, including those corresponding to previous VQEG meetings, will be embedded into the VQEG website over the coming months. In addition, the GitHub repository with tools and subjective lab setups is still online and kept up to date. Moreover, during this meeting it was decided to close the Joint Effort Group (JEG) and the Independent Lab Group (ILG), which can be re-established when needed. Finally, although there was not much activity in this meeting within the Quality Assessment for Computer Vision Applications (QACoViA) and Psycho-Physiological Quality Assessment (PsyPhyQA) groups, they are still active.

The next VQEG plenary meeting will take place in Rennes (France) from 9 to 13 May 2022; it will again be face-to-face, after four online meetings.

References

[1] A. K. Venkataramanan, C. Stejerean, A. C. Bovik, “FUNQUE: Fusion of Unified Quality Evaluators”, arXiv:2202.11241, submitted to the IEEE International Conference on Image Processing (ICIP), 2022.
[2] R. Rodrigues, L. Lévêque, J. Gutiérrez, H. Jebbari, M. Outtas, L. Zhang, A. Chetouani, S. Al-Juboori, M. G. Martini, A. M. G. Pinheiro, “Objective Quality Assessment of Medical Images and Videos: Review and Challenges”, submitted to Medical Image Analysis, 2022.
[3] L. Lévêque, M. Outtas, L. Zhang, H. Liu, “Comparative study of the methodologies used for subjective medical image quality assessment”, Physics in Medicine & Biology, vol. 66, no. 15, Jul. 2021.
[4] L. Zhang, C. Cavaro-Ménard, P. Le Callet, “An overview of model observers”, Innovation and Research in Biomedical Engineering, vol. 35, no. 4, pp. 214–224, Sep. 2014.
[5] P. Tandon, M. Afonso, J. Sole, L. Krasula, “CAMBI: Contrast-aware Multiscale Banding Index”, Picture Coding Symposium (PCS), Jul. 2021.
[6] P. Pérez, L. Janowski, N. García, M. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)”, IEEE Transactions on Multimedia (Early Access), Jul. 2021.
[7] N. Barman, S. Schmidt, S. Zadtootaghaj, M. G. Martini, “Evaluation of MPEG-5 part 2 (LCEVC) for live gaming video streaming applications”, Proceedings of the Mile-High Video Conference, Mar. 2022.
[8] J. Xu, J. Li, X. Zhou, W. Zhou, B. Wang, Z. Chen, “Perceptual Quality Assessment of Internet Videos”, Proceedings of the ACM International Conference on Multimedia, Oct. 2021.
[9] F. Götz-Hahn, V. Hosu, H. Lin, D. Saupe, “KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild”, IEEE Access, vol. 9, pp. 72139–72160, May 2021.
[10] L. F. Tiotsop, T. Mizdos, M. Barkowsky, P. Pocta, A. Servetti, E. Masala, “Mimicking Individual Media Quality Perception with Neural Network based Artificial Observers”, ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 18, no. 1, Jan. 2022.
[11] P. Pérez, J. Ruiz, I. Benito, R. López, “A parametric quality model to evaluate the performance of tele-operated driving services over 5G networks”, Multimedia Tools and Applications, Jul. 2021.