Report from ACM Multimedia Systems 2021 by Neha Sharma


Neha Sharma (@NehaSharma) is a PhD student working with Dr Mohamed Hefeeda in the Network and Multimedia Systems Lab at Simon Fraser University. Her research interests are in computer vision and machine learning with a focus on next-generation multimedia systems and applications. Her current work focuses on designing an inexpensive hyperspectral camera using a hybrid approach that leverages both hardware and software solutions. She received the conference's Best Social Media Reporter award, which recognizes sharing among researchers on social networks. To celebrate this award, here is a more complete report on the conference.

Being a junior researcher in multimedia systems, I must say I feel proud to be part of this amazing community. I joined the ACM Multimedia Systems Conference (MMSys) community in 2020, when I published my first research work. I was excited to attend MMSys ’20 in Istanbul, which unfortunately shifted online due to COVID-19, so I presented my first work online and got to learn about other researchers in the community. This year I published another work with my team and was selected to present my ideas and research plans in the Doctoral Symposium (thanks to the reviewers). MMSys ’21 gave me hope for a full conference experience, as we were all hoping our lives would return to normal. As the conference date approached, things were still not clear and travel restrictions were still in place. On the positive side, MMSys ’21 went hybrid to give those who could travel the opportunity to attend in person. Only at the very end did I decide to travel and attend MMSys ’21 in person, and I am glad I made that decision. My experience was overwhelmingly rich in terms of learning interesting research findings and making inspiring connections in the community. As the recipient of the “Best Social Media Reporter” award, I invite you to enjoy the highlights of MMSys ’21 through my lens.

In light of the ongoing global pandemic, ACM MMSys ’21 was held in hybrid mode – onsite in Istanbul, Turkey, and online – on September 28 – October 1, 2021. Ali C. Begen (Ozyegin University and Networked Media, Turkey) opened the conference onsite with a warm welcome. MMSys ’21 was the conference's first hybrid edition, with participants presenting onsite as well as remotely in real time, and attendees joining from 38 different countries. The organizing team did an amazing job in pulling off this complex event. This year the research track implemented a two-round submission system, and accepted papers included public reviews in the proceedings. This, however, was not the only first: MMSys ’21 also held its first Doctoral Symposium, targeting PhD students and aiming to connect them with mentors. In addition, there were postponed celebrations for the 30th anniversary of NOSSDAV and the 25th anniversary of Packet Video.

The conference program was very well scheduled. Each day of the conference started with a keynote; in total there were four insightful and inspiring keynotes from researchers working on cutting-edge multimedia technologies. The first day started with a talk titled “AI-Driven Solutions throughout Games’ Lifecycles Leveraging Big Data” by Qiaolin Chen from Tencent IEG Global. Chen discussed how AI and big data are evolving the gaming industry, from intelligent market decisions to data-driven game development. On the second day, Caitlin Kalinowski, who heads the VR Hardware team at Facebook Reality Labs, presented an interesting keynote, “Making Impossible Products: How to Get 0-to-1 Products Right”, sharing insights about Oculus and zero-to-one products. The next day, Chris Bregler (Google) talked about “Synthetic Media: New Opportunities and New Challenges”. He discussed recent trends in generative media creation techniques that have opened new possibilities for societally beneficial uses but have also raised concerns about misuse. On the last day, Sriram Sethuraman and Deepthi Nandakumar (Amazon) provided insights on the “Role of ML in the Prediction of Perceptual Video Quality”. The keynotes are available on YouTube to watch on demand.

This year the conference attracted paper submissions on a range of multimedia topics, including immersive media, live video, content preparation, cloud-based and mobile media processing, and computer vision systems. Apart from the main research track, MMSys ’21 hosted three workshops:

  • NOSSDAV – Network and Operating System Support for Digital Audio and Video
  • MMVE – Immersive Mixed and Virtual Environment Systems
  • GameSys – Game Systems

These workshops provided an opportunity to meet those who are working in focused areas of multimedia research. This year MMSys conducted the inaugural ACM workshop on Game Systems (GameSys ’21), which attracted research on all aspects of computer/digital games, emphasizing networks, systems, interaction, and applications. Highlights include the work presented by Mark Claypool et al. (Worcester Polytechnic Institute), a user study measuring attribute scaling for cloud-based games.

In addition to the area-focused workshops, MMSys ’21 also conducted two grand challenges.

Another main highlight of the conference was the EDI (Equality, Diversity and Inclusion) workshop. The workshop was tailored towards PhD students, assistant professors and early-career researchers in various research organizations. The event openly discussed core topics around parenthood, work-family policies, career paths and EDI aspects at large. Laura Toni, Mea Wang and Ozgu Alay opened the workshop on the third day of the conference. Miriam Redi shared goals for achieving an equitable and inclusive multimedia community, and Susanne Boll talked about the “25 in 25” strategy to increase the participation of women in SIGMM to at least 25% by 2025. Other guest speakers also highlighted strategies to achieve the targeted diversity and inclusion in MMSys.

Last but not least, the amazing social events. Each day of the conference ended with a well-planned social event, providing a great opportunity for the in-person attendees to meet, discuss, and develop professional and social links throughout the community in a more relaxed setting. We visited historical venues such as Galata Tower and Adile Sultan Palace and enjoyed a Bosphorus boat tour with a live music band. This year MMSys planned the first inter-continental socials: we travelled from the European side to the Asian side of Istanbul (by bus and by boat). As a token of appreciation, in-person participants received Turkish delights and coffee, a set of traditional towels (peştemal), Istanbul-themed puzzles and a hand-made Kütahya porcelain vase/coffee set as souvenirs. For me, the best part was sitting together and dining with peers, discussing the prospects of our own research and of multimedia systems research in general.

Ali C. Begen closed the conference with the announcement of the awards. The Best Paper Award was presented to Xiao Zhu et al. for the paper “Livelyzer: Analyzing the First-Mile Ingest Performance of Live Video Streaming”. See the full list of awards here. The conference closed with the announcement of ACM Multimedia Systems 2022, which will take place in Athlone, Ireland. Looking forward to seeing everyone again next year.

JPEG Column: 93rd JPEG Meeting

JPEG Committee launches a Call for Proposals on Learning based Point Cloud Coding

The 93rd JPEG meeting was held online from 18 to 22 October 2021. The JPEG Committee continued its work on the development of new standardised solutions for the representation of visual information. Notably, the JPEG Committee decided to release a new call for proposals on point cloud coding based on machine learning technologies that targets both compression efficiency and effective performance for 3D processing as well as machine and computer vision tasks. This activity will be conducted in parallel with JPEG AI standardisation. Furthermore, it was also decided to pursue the development of a new standard in the context of the JPEG Fake Media exploration.

JPEG coding framework based on machine learning. The latent representation generated by the AI-based coding mechanism can be used for human visualisation, data processing and computer vision tasks.

Considering the response to the Call for Proposals on JPEG Pleno Holography, a first standard for the compression of digital holograms has entered its collaborative phase. The response to the call identified a reliable coding solution for this type of visual information that overcomes the limitations of state-of-the-art coding solutions for holographic data compression.

The 93rd JPEG meeting had the following highlights:

  • JPEG Pleno Point Cloud Coding draft of the Call for Proposals;
  • JPEG Pleno Holography;
  • JPEG AI drafts of the Call for Proposals and Common Training and Test Conditions;
  • JPEG Fake Media defines the standardisation timeline;
  • JPEG NFT collects use cases;
  • JPEG AIC explores standardisation of near-visually lossless quality models;
  • JPEG XS new profiles and sub-levels;
  • JPEG XL explores fixed point implementations;
  • JPEG DNA considers image quaternary representations suitable for DNA storage.

The following provides an overview of the major achievements of the 93rd JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption, including autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research, and advanced sensing and analysis. During the 93rd JPEG meeting, the JPEG Committee released a Draft Call for Proposals on JPEG Pleno Point Cloud Coding. This call addresses learning-based coding technologies for point cloud content and associated attributes, with emphasis on both human visualization and 3D processing and computer vision in the decompressed/reconstructed domain. Submissions are expected to offer compression efficiency competitive with point cloud coding standards in common use, with the goal of supporting a royalty-free baseline. A Final Call for Proposals on JPEG Pleno Point Cloud Coding is planned to be released in January 2022.
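For readers new to the data type: point cloud codecs typically operate on voxelized geometry, i.e., point coordinates quantized to an integer grid. The sketch below illustrates this common preprocessing step (a generic illustration with an assumed bit depth, not part of the JPEG Pleno call itself):

```python
import numpy as np

def voxelize(points: np.ndarray, bit_depth: int = 10) -> np.ndarray:
    """Quantize float XYZ coordinates to an integer voxel grid.

    points: (N, 3) array of raw coordinates.
    Returns deduplicated (M, 3) integer voxel coordinates.
    """
    # Normalize the cloud into the unit cube.
    mins = points.min(axis=0)
    scale = (points.max(axis=0) - mins).max()
    unit = (points - mins) / scale
    # Quantize to a (2^bit_depth)^3 grid, a typical test-condition setup.
    levels = 2 ** bit_depth - 1
    grid = np.clip(np.round(unit * levels), 0, levels)
    # Merge points that fall into the same voxel.
    return np.unique(grid.astype(np.int32), axis=0)

# Example: 100k random points quantized to a 10-bit grid.
cloud = np.random.rand(100_000, 3).astype(np.float32)
print(voxelize(cloud, bit_depth=10).shape)
```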

JPEG Pleno Holography

At its 93rd JPEG meeting, the committee reviewed the response to the Call for Proposals on JPEG Pleno Holography, the first standardisation effort aspiring to a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing and display, and their associated hologram types. The coding technology selected provides excellent rate-distortion performance for lossy coding, in addition to supporting lossless coding and random access via a space-frequency segmentation approach. The selected technology will serve as a baseline for the standard specification to be developed. This final specification is planned to be published as an international standard in early 2024.

JPEG AI

The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed-domain representation. It targets both human visualization, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks.
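To make the single-stream idea concrete, the toy sketch below shows the general shape of such a codec (purely an illustration of my own, with a downsampled image standing in for the latent; the actual JPEG AI design uses learned transforms and entropy coding and is still under evaluation):

```python
import numpy as np

class SingleStreamCodec:
    """Toy illustration of a shared compressed-domain representation."""

    def encode(self, image: np.ndarray) -> np.ndarray:
        # Analysis transform: stand-in for a learned encoder network.
        # This single latent is what would be entropy-coded once.
        return image[::8, ::8].astype(np.float32)

    def reconstruct(self, latent: np.ndarray) -> np.ndarray:
        # Synthesis head for human visualization.
        return np.repeat(np.repeat(latent, 8, axis=0), 8, axis=1)

    def classify(self, latent: np.ndarray) -> int:
        # Vision head operating directly on the latent,
        # skipping full pixel reconstruction.
        return int(latent.mean() > 128)

codec = SingleStreamCodec()
img = np.random.randint(0, 256, (256, 256)).astype(np.float32)
latent = codec.encode(img)
print(codec.reconstruct(latent).shape, codec.classify(latent))
```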

During the 93rd JPEG meeting, the JPEG AI project activities focused on the analysis of the results of the exploration studies as well as refinements and improvements of the common training and test conditions, especially the performance assessment of the image classification and super-resolution tasks. A related topic that received much attention was device interoperability, which was thoroughly analyzed and discussed. Also, the JPEG AI Third Draft Call for Proposals is now available, with improvements to the evaluation conditions and to the proposal composition and requirements. A final call for proposals is expected to be issued at the 94th meeting (17-21 January 2022), with a first Working Draft targeted for October 2022.

JPEG Fake Media

The scope of the JPEG Fake Media exploration is to assess standardization needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 93rd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. The new version includes an extended set of definitions and a new section related to threat vectors. In addition, the requirements have been substantially enhanced, in particular those related to media asset authenticity and integrity. Given the progress of the exploration, an initial timeline for the standardization process was proposed:

  • April 2022: Issue call for proposals
  • October 2022: Submission of proposals
  • January 2023: Start standardization process
  • January 2024: Draft International Standard (DIS)
  • October 2024: International Standard (IS)

The JPEG Committee welcomes feedback on the working document and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in this standardization activity.

JPEG NFT

Non-Fungible Tokens (NFTs) have recently attracted substantial interest. Numerous digital assets associated with NFTs are encoded in existing JPEG formats or can be represented in JPEG-developed current and future representations. Additionally, several trust and security concerns have been raised about NFTs and the underlying digital assets. The JPEG Committee has established the JPEG NFT exploration initiative to better understand user requirements for media formats. JPEG NFT’s mission is to provide effective specifications that enable various applications relying on NFTs applied to media assets. The standard shall be secure, trustworthy, and environmentally friendly, enabling an interoperable ecosystem based on NFTs within or across applications. The group seeks to engage stakeholders from various backgrounds, including technical, legal, creative, and end-user communities, to develop use cases and requirements. In this context, a second JPEG NFT Workshop was organised on October 12th, 2021. The presentations and video footage from the workshop are now available on the JPEG website. In January 2022, a third workshop will focus on commonalities with the JPEG Fake Media exploration. JPEG encourages interested parties to visit its website frequently for the most up-to-date information and to subscribe to the JPEG NFT Ad Hoc Group’s (AhG) mailing list to participate in this effort.

JPEG AIC

During the 93rd JPEG meeting, work was initiated on the first draft of a document on use cases and requirements for the Assessment of Image Coding. The scope of the AIC activities was defined to target standards or best practices for subjective and objective image quality assessment methodologies covering a range from high quality to near-visually lossless quality. This is a range of visual quality where artefacts are not noticeable to an average non-expert viewer without presentation of an original reference image but are detectable by a flicker test.

JPEG XS

The JPEG Committee created an updated document, “Use Cases and Requirements for JPEG XS V3.0”. It describes new use cases and refines the requirements to enable improved coding efficiency and additional functionality with respect to HDR content, random access and more. In addition, the second editions of JPEG XS Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) went to final ballot before the ISO publication stage. In the meantime, the Committee continued working on the second editions of Part 4 (Conformance Testing) and Part 5 (Reference Software), which are now ready as Draft International Standards. In addition, the decision was made to create an amendment to Part 2 that will add a High420.12 profile and a new sublevel at 4 bpp, to swiftly address market demands.

JPEG XL

Part 3 (Conformance testing) has proceeded to the DIS stage. Core experiments were discussed to investigate hardware coding, in particular fixed-point implementations, and will be continued. Work on a second edition of Part 1 (Core coding system) was initiated. With preliminary support in major web browsers and in image viewing and editing software, JPEG XL is ready for wide-scale adoption.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. Important progress in this activity is the implementation of experimentation software to simulate the coding/decoding of images in quaternary code. A thorough explanation of the package has been created, and a wiki for documentation and a link to the code can be found here. A successful fifth workshop on JPEG DNA was held prior to the 93rd JPEG meeting, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA experimentation software to simulate an end-to-end image storage pipeline using DNA for future exploration experiments, as well as by improving the JPEG DNA overview document. Interested parties are invited to consider joining the effort by registering to the JPEG DNA mailing list.
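To illustrate what a quaternary representation means (a simplified sketch of the basic idea only; the actual JPEG DNA work must additionally deal with biochemical constraints such as homopolymer runs, and with sequencing noise), each byte can be transcoded into four symbols of the nucleotide alphabet {A, C, G, T} by mapping consecutive 2-bit pairs:

```python
NUCLEOTIDES = "ACGT"  # one symbol per 2-bit value

def bytes_to_dna(data: bytes) -> str:
    """Transcode binary data into a quaternary nucleotide string."""
    symbols = []
    for byte in data:
        # Split each byte into four 2-bit values, most significant first.
        for shift in (6, 4, 2, 0):
            symbols.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(symbols)

def dna_to_bytes(seq: str) -> bytes:
    """Invert the mapping back to binary data."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for symbol in seq[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(symbol)
        out.append(byte)
    return bytes(out)

payload = b"JPEG"
seq = bytes_to_dna(payload)
print(seq)  # 'CAGGCCAACACCCACT'
assert dna_to_bytes(seq) == payload
```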

Final Quote

“Aware of the importance of timely standards in AI-powered imaging applications, the JPEG Committee is moving forward with two concurrent calls for proposals addressing both image and point cloud coding based on machine learning”, said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 94, to be held online during 17-21 January 2022.

Reports from ACM Multimedia 2021

Introduction

Due to COVID-19, the annual ACM Multimedia Conference (https://2021.acmmm.org) was held in a hybrid mode this year – onsite in Chengdu, China, and online. The organizers made meticulous preparations for this conference, and in total more than 1,000 researchers from all over the world participated.

Besides, AI companies such as Huawei and ByteDance were on site trying to attract researchers. It is worth mentioning that, in order to prevent the spread of COVID-19, staff and volunteers made great efforts, such as checking body temperatures and providing free masks for attendees.

To encourage student authors to fully engage with the event, SIGMM sponsored 39 students with Student Travel Grant Awards this year. Students who wanted to apply for this travel grant needed to submit an online form (https://acmsigmm.wufoo.com/forms/sigmm-student-travel-application-form/) before the submission deadline, and the selection committee then chose the travel grant winners according to the selection criteria. The selected students received up to 1000 USD to cover their airline tickets as well as accommodation costs for this event. We interviewed some travel grant winners to share their wonderful experience of attending the conference. The following are their comments.

Students interviewed at ACM Multimedia 2021

Shaoxiang Chen (Fudan University)

It was such a great pleasure to receive the student travel grant and attend the ACM MM 2021 conference in Chengdu. The organizers devoted a significant amount of effort to ensuring the attendees had a nice experience, and in fact, we did. The check-in gifts, including masks, an umbrella, and small notebooks, were considerate, and the onsite COVID-19 test made it convenient for us to travel back. The keynote talks were closely related to the popular topics in the multimedia community, and I learned a lot about deep learning and multimodal pre-training. At the doctoral symposium, I met excellent PhD students from all over the world and received helpful suggestions from the mentors during my own presentation. Finally, the wonderful performances at the dinner banquet made the entire conference experience even more perfect.

Yuqian Fu (Fudan University)

This is the second time that I have attended ACM Multimedia onsite. The first time was in Nice, France in October 2019, which was also a very nice trip. Another thing that I want to share is that I had one long paper accepted by ACM Multimedia in 2020. That conference was supposed to be held in Seattle, USA; however, due to COVID-19, we had to attend online, which was a big pity. Therefore, it was a really happy thing to participate in this year’s conference in Chengdu. During the conference, I had the opportunity to talk with other researchers face-to-face, and I actively presented my work to them. I learned a lot over those few days and had a good experience. Finally, I would like to thank SIGMM for the travel grant, the organizers for all the efforts they made to ensure the smooth running of the conference, and the volunteers for their kind help.

Zheng Wang (Fudan University)

It has been a wonderful experience for me at ACM Multimedia 2021 in Chengdu this October. Owing to the COVID-19 outbreaks of the past two years, we were very lucky to be together again. Many thanks to the local organizers for their tremendous efforts to hold the conference onsite. At the poster sessions, I was able to present my paper on video moment retrieval to attendees and discuss my ideas with them. I could also stop by others’ work, and understanding their work gave me a direct view of what is going on in the multimedia community. I enjoyed the poster sessions since they helped me know the research trends better. One issue is that the hall for the poster session was relatively crowded, and some walls had two posters arranged one above the other, making communication a bit inconvenient. In the keynote sessions, I was able to see diverse research areas gathered under the same topic, which let me see a problem from different angles. As I am in my last PhD year, I could talk with several researchers from university institutions and companies, and I got valuable advice on how I should prepare for pursuing a career in research or business. Thanks to the local organizers for arranging trips to see the cute pandas, which made visiting Chengdu a delightful and unforgettable memory.

Yang Jiao (Fudan University)

It was a great honour to attend ACM Multimedia in Chengdu this year. This year’s ACM Multimedia was a special conference, for it was the first top conference held onsite since COVID-19. It was the first time that I attended this conference, and I enjoyed the academic atmosphere there. I met a lot of friends with similar research interests as well as renowned researchers willing to share their research experiences. What excited me most was the best paper session, where a great number of outstanding works investigated interesting frontier tasks in the multimedia community, such as generating music according to visual motion and estimating postures based on one’s speech tone. Moreover, the dinner banquet surprised me a lot: besides the regular host introduction and dining time, the organizers also elaborately prepared wonderful shows as well as a lucky draw, in which I fortunately won the third prize. In summary, thanks for all the efforts of the organizers and the excellent talks given by outstanding researchers at this year’s Multimedia. It was a really impressive experience for me!

Yechao Zhang (Huazhong University of Science and Technology (HUST) )

It was such an honour for me to receive the student travel grant. Frankly, I am merely a grad student in my second year at HUST, and it was the first time for me to attend any academic conference. The acceptance from ACM Multimedia 2021 was a major inspiration for me and has encouraged me to apply for a PhD program so that I can keep contributing to academic research in the area of multimedia. During the conference, I very much enjoyed my time visiting Chengdu. Apart from the amazing food adventure, I had the most beneficial conversations with researchers from all over the world. None of these wonderful experiences would have been possible without the travel grant from SIGMM. Many thanks for the recognition and support from SIGMM. I sincerely hope ACM Multimedia will gain ever more international influence.

Jingru Gan (University of Chinese Academy of Sciences)

The ACM Multimedia held this year was an extraordinary conference in terms of organization and attending experience. I was most impressed by the refined arrangement of the hybrid oral sessions, which accommodated onsite and online presenters from everywhere on earth. The great value of this meeting is that it strengthens the bonds between researchers, going from pages of papers to face-to-face exchanges. Getting a chance to learn how others went through months of trial and error before achieving a satisfactory result was inspiring, and it encourages me to dedicate myself completely to my future work.

Yanqiao Zhu (University of Chinese Academy of Sciences)

Although this was not my first time attending international conferences, my experience at ACM Multimedia 2021 was still very exciting and unforgettable, especially after a long travel block due to COVID-19. This year, the diverse program not only made me feel more connected with the multimedia research community but really broadened my vision. During the conference, I presented my paper on multimedia recommendation, met many prestigious scholars from both academia and industry, and exchanged many interesting ideas. I believe most of the discussions will spur sparks for future research directions. I also participated in social networking programs, during which I made a lot of friends in related research areas. Overall, it was a great honour for me to receive the SIGMM travel grant that supported my attending ACM Multimedia 2021 physically. I would like to sincerely thank all the organizers for their effort in making this year’s ACM Multimedia a great success.

Yudong Wang (University of Electronic Science and Technology of China)

As an undergraduate who received the student travel grant, this was my first time attending an international conference. Due to COVID-19, the onsite attendees were almost all Chinese and the poster room was a little crowded, but fortunately, people were orderly. At the conference, I stood by my poster and shared my work with researchers in the same field. Apart from that, I talked with some people who work on recommendation algorithms. They helped me get to know other AI applications and brand-new methods for realizing intelligence. I listened to oral presentations from different parts of the world and learned a lot about other fields of multimedia. The most impressive thing was the banquet. Although we came from different schools, the atmosphere among strangers at the table was harmonious: we talked about our daily lives at our schools and enjoyed the performances on the stage. By the way, the gifts prepared for the attendees were a surprise. If there is any regret, it must be that I was not a volunteer helping others and failed to win the lucky draw. In summary, thanks to the committee, I had a great experience at ACM Multimedia 2021.

Peidong Liu (Tsinghua University)

I was pleased to attend the ACM MM 2021 conference onsite in Chengdu, China. Due to the coronavirus pandemic, the conference adopted a hybrid form, i.e., both onsite and online, to let as many people as possible participate in the academic exchange. This was my first time attending an onsite international conference in the last few years, and I found it more convenient to exchange ideas onsite than online. There are several points worth mentioning. First off, the conference used an app called Whova throughout the event, in which we could fill in our personal research interests and affiliated institutions to communicate more conveniently with other researchers. Besides that, the volunteers patiently helped us with the check-in process and gave us a nice experience at the conference. Finally, thanks to the support from the conference community, I had the opportunity to communicate onsite with researchers from all around the globe.

Haoyu Zhang (Shandong University)

This was my first time attending an international conference, and I was very happy to participate offline in Chengdu, Sichuan, China. The feeling of participating in an offline conference is something that cannot be experienced online. The volunteers at the conference were very enthusiastic and answered my questions about attending the conference. ACM Multimedia was very caring: it prepared many exquisite gifts for each participant and provided dinners with distinctly local character, whose delicious food made me linger. Each day of the meeting, I browsed the talks and posters that interested me and had detailed exchanges with the authors, which not only broadened my horizons but also inspired my thinking. In short, I was very honoured to attend this ACM Multimedia conference, and it was a very impressive experience. Finally, I wish ACM Multimedia ever greater success.

Summary

Overall, almost everyone rated the experience of participating in this conference highly, and we can tell that the travel grant really helps the students. To summarize, this conference was held successfully and left a very good impression on the participants.

JPEG Column: 92nd JPEG Meeting

JPEG Committee explores NFT standardisation needs

The 92nd JPEG meeting was held online from 7 to 13 July 2021. This meeting consolidated JPEG’s exploration of standardisation needs related to Non-Fungible Tokens (NFTs). Recently, there has been growing interest in the use of NFTs in many applications, notably in the trade of digital art and collectables.

Other notable results of the 92nd JPEG meeting have been the release of an update to the Call for Proposals on JPEG Pleno Holography and an initiative to revisit opportunities for standardisation of image quality assessment methodologies and metrics.

The 92nd JPEG meeting had the following highlights:

  • JPEG NFT exploration;
  • JPEG Fake Media defines context, use cases and requirements;
  • JPEG Pleno Holography call for proposals;
  • JPEG AI prepares the Call for Proposals;
  • JPEG AIC explores new quality models;
  • JPEG Systems;
  • JPEG XS;
  • JPEG XL;
  • JPEG DNA.

The following provides an overview of the major achievements of the 92nd JPEG meeting.

JPEG NFT exploration

Recently, Non-Fungible Tokens (NFTs) have garnered considerable interest. Numerous digital assets linked with NFTs are either encoded in existing JPEG formats or can be represented in JPEG-developed current and forthcoming representations. Additionally, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee has launched the JPEG NFT exploration initiative. The mission of JPEG NFT is to provide effective specifications that enable various applications that rely on NFTs applied to media assets. A JPEG NFT standard shall be secure, trustworthy, and eco-friendly, enabling an interoperable ecosystem based on NFTs within or across applications. The committee strives to engage stakeholders from diverse backgrounds, including the technical, legal, artistic, and end-user communities, to establish use cases and requirements. In this context, the first JPEG NFT Workshop was held on July 1st, 2021. The workshop’s presentations and video footage are now accessible on the JPEG website, and a second workshop will be held in the near future. JPEG encourages interested parties to frequently visit its website for the most up-to-date information and to subscribe to the mailing list of the JPEG NFT Ad Hoc Group (AhG) in order to participate in this effort.

JPEG Fake Media

The scope of the JPEG Fake Media exploration is to assess standardisation needs to facilitate secure and reliable annotation of media asset creation and modifications in good-faith usage scenarios as well as in those with malicious intent. At the 92nd meeting, the JPEG Committee released an updated version of the “JPEG Fake Media Context, Use Cases and Requirements” document. This new version includes an improved and extended set of requirements covering three main categories: media creation and modification descriptions, metadata embedding & referencing and authenticity verification. In addition, the document contains several improvements including an extended set of definitions covering key terminologies. The JPEG Committee welcomes feedback to the document and invites interested experts to join the JPEG Fake Media AhG mailing list to get involved in the discussion.

JPEG Pleno

Currently, a Call for Proposals is open for JPEG Pleno Holography, which is the first standardisation effort aspiring to provide a versatile solution for efficient compression of holograms for a wide range of applications such as holographic microscopy, tomography, interferometry, printing, and display, and their associated hologram types. Key desired functionalities include support for both lossy and lossless coding, scalability, random access, and integration within the JPEG Pleno system architecture, with the goal of supporting a royalty-free baseline. In support of this Call for Proposals, a Common Test Conditions document and accompanying software have been released, enabling elaborate stress testing from the rate-distortion, functionality and visual rendering quality perspectives. For the latter, numerical reconstruction software has been released enabling viewport rendering from holographic data. References to software and documentation can be found on the JPEG website.

JPEG Pleno Point Cloud continues to progress towards a Call for Proposals on learning-based point cloud coding solutions with the release at the 92nd JPEG meeting of an updated Use Cases and Requirements document. This document details how the JPEG Committee envisions learning-based point cloud coding solutions meeting the requirements of rapidly emerging use cases in this field. This document continues the focus on solutions supporting scalability and random access while detailing new requirements for 3D processing and computer vision tasks performed in the compressed domain to support emerging applications such as autonomous driving and robotics.

JPEG AI

The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed-domain representation, targeting both human visualisation, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks. At the 92nd JPEG meeting, several activities were carried out towards the launch of the final JPEG AI Call for Proposals. These included improvements of the training and test conditions for learning-based image coding, especially in the areas of the JPEG AI training dataset, target bitrates, computation of quality metrics, subjective quality evaluation, and complexity assessment. A software package called the JPEG AI objective quality assessment framework, with a reference implementation of all objective quality metrics, has been made available. Moreover, the results of the JPEG AI exploration experiments for image processing and computer vision tasks defined at the previous 91st JPEG meeting were presented and discussed, including their impact on the Common Test Conditions.

Moreover, the JPEG AI Use Cases and Requirements were refined with two new core requirements regarding reconstruction reproducibility and hardware platform independence. The second draft of the Call for Proposals was produced, and the timeline of the JPEG AI work item was revised. It was decided that the final Call for Proposals will be issued as an outcome of the 94th JPEG meeting. The deadline for expressions of interest and registration is 5 February 2022, and the submission of bitstreams and decoded images for the test dataset is due on 30 April 2022.

JPEG AIC

Image quality assessment remains an essential component in the development of image coding technologies. A new activity has been initiated in the JPEG AIC framework to study the assessment of image coding quality, with particular attention to crowd-sourced subjective evaluation methodologies and image coding at fidelity targets relevant for end-user image delivery on the web and consumer-grade photo archival.

JPEG Systems

JUMBF (ISO/IEC 19566-5 AMD1) and JPEG 360 (ISO/IEC 19566-6 AMD1) are now published standards available through ISO. A request to create the second amendment of JUMBF (ISO/IEC 19566-5) has been produced; this amendment will further extend the functionality to cover use cases and requirements under development in the JPEG Fake Media exploration initiative. The Systems software efforts are progressing on the development of a file parser for most JPEG standards and will include support for metadata within JUMBF boxes. Interested parties are invited to subscribe to the mailing list of the JPEG Systems AhG in order to monitor and contribute to JPEG Systems activities.

JPEG XS

JPEG XS aims at the standardization of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec in various markets. With the second editions of Part 1 (core coding system), Part 2 (profiles and buffer models), and Part 3 (transport and container formats) under ballot to become International Standards, the work during this JPEG meeting went into the second editions of Part 4 (Conformance Testing) and Part 5 (Reference Software). The second editions primarily bring new coding and signalling capabilities to support raw Bayer sensor content, mathematically lossless coding of images with up to 12 bits per colour component sample, and 4:2:0-sampled image content. In addition, the JPEG Committee continued its initial exploration to study potential future improvements to JPEG XS, while still honouring its low-complexity and low-latency requirements. Among such improvements are better support for high dynamic range (HDR), better support for raw Bayer sensor content, and overall improved compression efficiency. The compression efficiency work also targets improved handling of computer-screen content and artificially generated rendered content.

JPEG XL

JPEG XL aims at standardization of image coding that offers high compression efficiency, along with features desirable for web distribution and efficient compression of high-quality images. JPEG XL Part 3 (Conformance testing) has been promoted to the Committee Draft stage of the ISO/IEC approval process. New core experiments were defined to investigate hardware-based coding, in particular fixed-point implementations. With preliminary support in major web browsers and in image viewing and manipulation libraries and tools, JPEG XL is ready for wide-scale adoption.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage. Two new use cases were identified, as well as the sequencing noise models and simulators to use for DNA digital storage. A successful fourth workshop was held with presentations by the stakeholders, and a new version of the JPEG DNA overview document was issued and is now publicly available. It was decided to continue this exploration by organising a fifth workshop and conducting further outreach to stakeholders, as well as by continuing to improve the JPEG DNA overview document. Moreover, it was also decided to produce software to simulate an end-to-end image storage pipeline using DNA storage for future exploration experiments. Interested parties are invited to consider joining the effort by registering to the JPEG DNA mailing list.

Final Quote

“The JPEG Committee is considering standardisation needs for timely and effective specifications that can best support the use of NFTs in applications where media assets can be represented with JPEG formats.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No 93, to be held online during 18-22 October 2021.
  • No 94, to be held online during 17-21 January 2022.

MPEG Column: 135th MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 135th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Video Coding promotes MPEG Immersive Video (MIV) to the FDIS stage
  • Verification tests for more application cases of Versatile Video Coding (VVC)
  • MPEG Systems reaches first milestone for Video Decoding Interface for Immersive Media
  • MPEG Systems further enhances the extensibility and flexibility of Network-based Media Processing
  • MPEG Systems completes support of Versatile Video Coding and Essential Video Coding in High Efficiency Image File Format
  • Two MPEG White Papers:
    • Versatile Video Coding (VVC)
    • MPEG-G and its application of regulation and privacy

In this column, I’d like to focus on MIV and VVC including systems-related aspects as well as a brief update about DASH (as usual).

MPEG Immersive Video (MIV)

At the 135th MPEG meeting, MPEG Video Coding has promoted the MPEG Immersive Video (MIV) standard to the Final Draft International Standard (FDIS) stage. MIV was developed to support compression of immersive video content in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables storage and distribution of immersive video content over existing and future networks for playback with 6 Degrees of Freedom (6DoF) of view position and orientation.

From a technical point of view, MIV is a flexible standard for multiview video with depth (MVD) that leverages the strong hardware support for commonly used video codecs to code volumetric video. The source views may use any of three projection formats: (i) equirectangular, (ii) perspective, or (iii) orthographic. By packing and pruning views, MIV can achieve bit rates around 25 Mb/s and a pixel rate equivalent to HEVC Level 5.2.
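As a back-of-the-envelope illustration of what that pixel-rate constraint means (the view count, resolution, frame rate and pruning ratio below are invented for illustration; only the Level 5.2 maximum luma sample rate is taken from the HEVC specification):

```python
# Illustrative pixel-rate budget check for multiview content.
LEVEL_5_2_LUMA_SAMPLES_PER_SEC = 1_069_547_520  # HEVC Level 5.2 limit

views, width, height, fps = 24, 1920, 1080, 30
raw_rate = views * width * height * fps  # ~1.49G samples/s if coded naively
print(raw_rate > LEVEL_5_2_LUMA_SAMPLES_PER_SEC)  # True: over budget

# Pruning removes pixels that are redundant across views; if only ~40%
# survive as patches packed into atlases, the decoder-side pixel rate is:
pruned_rate = int(raw_rate * 0.4)  # ~597M samples/s
print(pruned_rate <= LEVEL_5_2_LUMA_SAMPLES_PER_SEC)  # True: within budget
```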

The MIV standard is designed as a set of extensions and profile restrictions for the Visual Volumetric Video-based Coding (V3C) standard (ISO/IEC 23090-5). The main body of this standard is shared between MIV and the Video-based Point Cloud Coding (V-PCC) standard (ISO/IEC 23090-5 Annex H). It may potentially be used by other MPEG-I volumetric codecs under development. The carriage of MIV is specified through the Carriage of V3C Data standard (ISO/IEC 23090-10).

The test model and objective metrics are publicly available at https://gitlab.com/mpeg-i-visual.

At the same time, MPEG Systems has begun developing the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13), which specifies the input and output interfaces of video decoders to provide more flexible use of video decoder resources for such applications. At the 135th MPEG meeting, MPEG Systems reached the first formal milestone of developing ISO/IEC 23090-13 by promoting the text to Committee Draft ballot status. The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures in such a way that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions that are actually to be presented to the users rather than considering only the number of video elementary streams in use.
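To picture the core idea, here is a conceptual sketch (the data structures are hypothetical, my own illustration rather than the interfaces defined in ISO/IEC 23090-13): several elementary streams, e.g., only the viewport-relevant portions of each, are multiplexed onto fewer physical decoder instances than there are streams:

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalDecoder:
    """One hardware decoder instance with a sample-rate budget."""
    budget: int  # luma samples/s it can sustain
    load: int = 0
    streams: list = field(default_factory=list)

def schedule(streams: dict, decoders: list) -> None:
    """Greedy first-fit mapping of streams onto fewer decoders."""
    for name, rate in sorted(streams.items(), key=lambda s: -s[1]):
        for dec in decoders:
            if dec.load + rate <= dec.budget:
                dec.load += rate
                dec.streams.append(name)
                break
        else:
            raise RuntimeError(f"no decoder capacity left for {name}")

# Four elementary streams decoded on two physical instances instead of four.
streams = {"viewA": 250_000_000, "viewB": 250_000_000,
           "viewC": 125_000_000, "viewD": 125_000_000}
decoders = [PhysicalDecoder(budget=534_773_760) for _ in range(2)]
schedule(streams, decoders)
for dec in decoders:
    print(dec.streams, dec.load)
```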

Research aspects: It seems that visual compression and systems standards enabling immersive media applications and services are becoming mature. However, the Quality of Experience (QoE) of such applications and services is still in its infancy. The QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) provides a survey of definitions of immersion and presence, which leads to a definition of Immersive Media Experience (IMEx). Consequently, the next step is working towards QoE metrics in this domain, which requires subjective quality assessments that impose various challenges during the current COVID-19 pandemic.

Versatile Video Coding (VVC) updates

The third round of verification testing for Versatile Video Coding (VVC) has been completed. This includes the testing of High Dynamic Range (HDR) content of 4K ultra-high-definition (UHD) resolution using the Hybrid Log-Gamma (HLG) and Perceptual Quantization (PQ) video formats. The test was conducted using state-of-the-art high-quality consumer displays, emulating an internet streaming-type scenario.

On average, VVC showed approximately a 50% bit rate reduction compared to High Efficiency Video Coding (HEVC).

Additionally, the ISO/IEC 23008-12 Image File Format has been amended to support images coded using Versatile Video Coding (VVC) and Essential Video Coding (EVC).

Research aspects: The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards, including the evaluation thereof. For example, the tradeoff between compression efficiency and encoding runtime (time complexity) for live and video-on-demand scenarios is always an interesting research aspect.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 135th MPEG meeting, MPEG Systems issued a draft amendment to the core MPEG-DASH specification (i.e., ISO/IEC 23009-1) that provides further improvements of Preroll, which is renamed to Preperiod; it will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Additionally, this amendment includes some minor improvements for nonlinear playback. The so-called Technologies under Consideration (TuC) document comprises new proposals that have not yet reached consensus for promotion to any official standards documents (e.g., amendments to existing DASH standards or new parts). Currently, proposals for minimizing initial delay are discussed, among others. Finally, libdash has been updated to support the MPEG-DASH schema according to the 5th edition.

An updated overview of DASH standards/features can be found in the Figure below.

MPEG-DASH status of July 2021.

Research aspects: The informative aspects of MPEG-DASH, such as adaptive bitrate (ABR) algorithms, have been subject to research for many years. New editions of the standard mostly introduced incremental improvements, but disruptive ideas rarely reached the surface. Perhaps it’s time to take a step back and re-think how streaming should work for today's and future media applications and services.

The 136th MPEG meeting will again be an online meeting in October 2021, but MPEG is aiming to meet in person again in January 2022 (if possible). Click here for more information about MPEG meetings and their developments.

Dataset Column: Overview, Scope and Call for Contributions

Overview and Scope

The Dataset Column (https://records.sigmm.org/open-science/datasets/) of ACM SIGMM Records provides timely updates on the developments in the domain of publicly available multimedia datasets as enabling tools for reproducible research in numerous related areas. It is intended as a platform for further dissemination of useful information on multimedia datasets and studies of datasets covering various domains, published in peer-reviewed journals, conference proceedings, dissertations, or as results of applied research in industry.

The aim of the Dataset Column is therefore not to substitute already established platforms for disseminating multimedia datasets, e.g., Qualinet Databases (https://qualinet.github.io/databases/) [2] and the Multimedia Evaluation Benchmark (https://multimediaeval.github.io/), but to promote such platforms and particularly interesting datasets and benchmarking challenges associated with them. Registration is now open for the Multimedia Evaluation Benchmark, MediaEval 2021 (https://multimediaeval.github.io). This year’s MediaEval features a wide variety of tasks and datasets tackling a large number of domains, including video privacy, social media data analysis and understanding, news items analysis, medicine and wellbeing, affective and subjective content analysis, and game and sports associated media.

The Column will also continue reporting on contributions presented within Dataset Tracks at relevant conferences, e.g., ACM Multimedia (MM), ACM Multimedia Systems (MMSys), the International Conference on Quality of Multimedia Experience (QoMEX), and the International Conference on Multimedia Modeling (MMM).

Dataset Column in the SIGMM Records

Previously published Dataset Columns are listed below in chronological order.

Call for Contributions

Those who have created a dataset, benchmarking initiative, or study of datasets relevant to the multimedia community (even one previously published elsewhere) are very welcome to submit their contribution to the ACM SIGMM Records Dataset Column. Examples are the datasets accepted to the open dataset and software track of the ACM MMSys 2021 conference or the datasets presented at the QoMEX 2021 conference. Please contact the editor responsible for the respective area, Mihai Gabriel Constantin (mihai.constantin84@upb.ro), Karel Fliegel (fliegek@fel.cvut.cz), or Maria Torres Vega (maria.torresvega@ugent.be) to report your contribution.

Column Editors

Since September 2021, the Dataset Column has been edited by Mihai Gabriel Constantin, Karel Fliegel, and Maria Torres Vega. The current editors appreciate the work of the previous team, Martha Larson, Bart Thomee and all other contributors, and will continue and further develop this dissemination platform.

The general scope of the Dataset Column is reviewed above, with the more specific areas of the editors listed below:

  • Mihai Gabriel Constantin will be responsible for the datasets related to multimedia analysis, understanding, retrieval and exploration,
  • Karel Fliegel for the datasets with subjective annotations related to Quality of Experience (QoE) [1] research,
  • Maria Torres Vega for the datasets related to immersive multimedia systems, networked QoE and cognitive network management.

Mihai Gabriel Constantin is a researcher at the AI Multimedia Lab, University Politehnica of Bucharest, Romania, and received his PhD from the Faculty of Electronics, Telecommunications, and Information Technology at the same university, with the topic “Automatic Analysis of the Visual Impact of Multimedia Data”. He has authored over 25 scientific papers in international conferences and high-impact journals, with an emphasis on the prediction of the subjective impact of multimedia items on human viewers and on deep ensembles. He has participated as a researcher in more than 10 research projects, and is a member of program committees and a reviewer for several workshops, conferences and journals. He is also an active member of the multimedia processing community, being part of the MediaEval benchmarking initiative organization team and leading or co-organizing several MediaEval tasks, including Predicting Media Memorability [3] and Recommending Movies Using Content [4], as well as publishing several papers that analyze the data, annotations, participant features, methods, and observed best practices for MediaEval tasks and datasets [5]. More details can be found on his webpage: https://gconstantin.aimultimedialab.ro/.

Karel Fliegel received his M.Sc. (Ing.) in 2004 (electrical engineering and audiovisual technology) and his Ph.D. in 2011 (research on modeling of visual perception of image impairment features), both from the Czech Technical University in Prague, Faculty of Electrical Engineering (CTU FEE), Czech Republic. He is an assistant professor in the Multimedia Technology Group of CTU FEE. His research interests include multimedia technology, image processing, image and video compression, subjective and objective image quality assessment, Quality of Experience, HVS modeling, and imaging photonics. He has been a member of research teams within various projects, especially in the area of visual information processing. He has participated in COST ICT Actions IC1003 Qualinet and IC1105 3D-ConTourNet, responsible for the development of Qualinet Databases [2] (https://qualinet.github.io/databases/), which are especially relevant to QoE research.

Maria Torres Vega is an FWO (Research Foundation Flanders) Senior Postdoctoral fellow in the multimedia delivery cluster of the IDLab group of Ghent University (UGent), currently working on the perception of immersive multimedia applications. She received her M.Sc. degree in Telecommunication Engineering from the Polytechnic University of Madrid, Spain, in 2009. Between 2009 and 2013 she worked as a software and test engineer in Germany with a focus on embedded systems and signal processing. In October 2013, she decided to return to academia and started her PhD at the Eindhoven University of Technology (Eindhoven, The Netherlands), where she researched the impact of beam-steered optical wireless networks on the users’ perception of services. This work earned her a PhD in Electrical Engineering in September 2017. In her years in academia (since October 2013), she has authored more than 40 publications and received three best paper awards. Furthermore, she serves as a reviewer for a plethora of journals and conferences. In 2020 she served as general chair of the 4th Quality of Experience Management workshop, as tutorial chair of the 2020 Network Softwarization conference (NetSoft), and as demo chair of the Quality of Multimedia Experience conference (QoMEX 2020). In 2021, she served as Technical Program Committee (TPC) chair of the Quality of Multimedia Experience conference (QoMEX 2021).

References

MPEG Visual Quality Assessment Advisory Group: Overview and Perspectives

Introduction

The perceived visual quality is of utmost importance in the context of visual media compression, such as 2D, 3D, immersive video, and point clouds. The trade-off between compression efficiency and computational/implementation complexity has a crucial impact on the success of a compression scheme. This specifically holds for the development of visual media compression standards, which typically aim at maximum compression efficiency using state-of-the-art coding technology. In MPEG, the subjective and objective assessment of visual quality has always been an integral part of the standards development process. Because formal subjective evaluations require significant effort, the standardization process typically relies on such formal tests in the starting phase and for verification, while objective metrics are used during the development phase. In the new MPEG structure, established in 2020, a dedicated advisory group has been installed for the purpose of providing, maintaining, and developing visual quality assessment methods suitable for use in the standardization process.
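As a simple illustration of the objective side, consider PSNR, one of the most basic objective metrics used throughout codec development (a minimal sketch; the images and noise level below are made up, and formal evaluations rely on far more elaborate metrics and subjective protocols):

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # images are identical
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: compare a frame against a noisy stand-in for a coded version.
ref = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
deg = np.clip(ref + np.random.normal(0.0, 5.0, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, deg):.2f} dB")
```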

This column lays out the scope and tasks of this advisory group and reports on its first achievements and developments. After a brief overview of the organizational structure, current projects and initial results are presented.

Organizational Structure

MPEG: A Group of Groups in ISO/IEC JTC 1/SC 29

The Moving Picture Experts Group (MPEG) is a standardization group that develops standards for the coded representation of digital audio, video, 3D graphics and genomic data. Since its establishment in 1988, the group has produced standards that enable the industry to offer interoperable devices for an enhanced digital media experience [1]. In its new structure as defined in 2020, MPEG is established as a set of Working Groups (WGs) and Advisory Groups (AGs) in Sub-Committee (SC) 29 “Coding of audio, picture, multimedia and hypermedia information” of the Joint Technical Committee (JTC) 1 of ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). The lists of WGs and AGs in SC 29 are shown in Figure 1. Besides MPEG, SC 29 also includes JPEG (the Joint Photographic Experts Group, WG 1) as well as an Advisory Group for Chair Support Team and Management (AG 1) and an Advisory Group for JPEG and MPEG Collaboration (AG 4), thereby covering the wide field of media compression and transmission. Within this structure, the focus of AG 5 MPEG Visual Quality Assessment (MPEG VQA) is on interaction and collaboration with the working groups directly working on MPEG visual media compression, including WG 4 (Video Coding), WG 5 (JVET), and WG 7 (3D Graphics).

Figure 1. MPEG Advisory Groups (AGs) and Working Groups (WGs) in ISO/IEC JTC 1/SC 29 [2].

Setting the Field for MPEG VQA: The Terms of Reference

SC 29 has defined Terms of Reference (ToR) for all its WGs and AGs. The scope of AG5 MPEG Visual Quality Assessment is to support needs for quality assessment testing in close coordination with the relevant MPEG Working Groups, dealing with visual quality, with the following activities [2]:

  • to assess the visual quality of new technologies to be considered to begin a new standardization project;
  • to contribute to the definition of Calls for Proposals (CfPs) for new standardization work items;
  • to select and design subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies, e.g., in the context of a Call for Evidence (CfE) and CfP;
  • to contribute to the selection of test material and coding conditions for a CfP;
  • to define the procedures useful to assess the visual quality of the submissions to a CfP;
  • to design and conduct visual quality tests, process, and analyze the raw data, and make the report of the evaluation results available conclusively;
  • to support in the assessment of the final status of a standard, verifying its performance compared to the existing standard(s);
  • to maintain databases of test material;
  • to recommend guidelines for selection of testing laboratories (verifying their current capabilities);
  • to liaise with ITU and other relevant organizations on the creation of new Quality Assessment standards or the improvement of the existing ones.

Way of Working

Given the fact that MPEG Visual Quality Assessment is an advisory group, and given the above-mentioned ToR, the goal of AG5 is not to produce new standards on its own. Instead, AG5 strives to communicate and collaborate with relevant SDOs in the field, applying existing standards and recommendations and potentially contributing to further development by reporting results and working practices to these groups.

In terms of meetings, AG5 adopts the common MPEG meeting cycle of typically four MPEG AG/WG meetings per year, which -due to the ongoing pandemic situation- so far have all been held online. The meetings are held to review the progress of work, agree on recommendations, and decide on further plans. During the meeting, AG5 closely collaborates with the MPEG WGs and conducts experts viewing sessions in various MPEG standardization activities. The focus of such activities includes the preparation of new standardization projects, the performance verification of completed projects, as well as support of ongoing projects, where frequent subjective evaluation results are required in the decision process. Between meetings, AG5 work is carried out in the context of Ad-hoc Groups (AhGs) which are established from meeting to meeting with well-defined tasks.

Focus Groups

Due to the broad field of ongoing standardization activities, AG5 has established so-called focus groups which cover the relevant fields of development. The focus group structure and the appointed chairs are shown in Figure 2.

Figure 2. MPEG VQA focus groups.

The focus groups are mandated to coordinate with other relevant MPEG groups and other standardization bodies on activities of mutual interest, and to facilitate the formal and informal assessment of the visual media type under their consideration. The focus groups are described as follows:

  • Standard Dynamic Range Video (SDR): This is the ‘classical’ video quality assessment domain. The group strives to support, design, and conduct testing activities on SDR content at any resolution and coding condition, and to maintain existing testing methods and best practice procedures.
  • High Dynamic Range Video (HDR): The focus group on HDR strives to facilitate the assessment of HDR video quality using different devices with combinations of spatial resolution, colour gamut, and dynamic range, and further to maintain and refine methodologies for measuring HDR video quality. A specific focus of the starting phase was on the preparation of the verification tests for Versatile Video Coding (VVC, ISO/IEC 23090-3 / ITU-T H.266).
  • 360° Video: The omnidirectional characteristics of 360° video content have to be taken into account for visual quality assessment. The groups’ focus is on continuing the development of 360° video quality assessment methodologies, including those using head-mounted devices. Like with the focus group on HDR, the verification tests for VVC had priority in the starting phase.
  • Immersive Video (MPEG Immersive Video, MIV): Since MIV allows for movement of the user at six degrees of freedom, the assessment of this type of content bears even more challenges and the variability of the user’s perception of the media has to be factored in. Given the absence of an original reference or ground truth, for the synthetically rendered scene, objective evaluation with conventional objective metrics is a challenge. The focus group strives to develop appropriate subjective expert viewing methods to support the development process of the standard and also evaluates and improve objective metrics in the context of MIV.

Ad hoc Groups

AG5 currently has three AhGs defined which are briefly presented with their mandates below:

  • Quality of immersive visual media (chaired by Christian Timmerer of AAU/Bitmovin, Joel Jung of Tencent, and Aljosa Smolic of Trinity College Dublin): Study Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (AG 05/N00013) with respect to new updates presented at this meeting; Solicit inputs for subjective evaluation methods and objective metrics for immersive video (e.g., 360, MIV, V-PCC, G-PCC); Organize public online workshop(s) on Quality of Immersive Media: Assessment and Metrics.
  • Learning-based quality metrics for 2D video (chaired by Yan Ye of Alibaba and Mathias Wien of RWTH Aachen University): Compile and maintain a list of video databases suitable and available to be used in AG5’s studies; Compile a list of learning-based quality metrics for 2D video to be studied; Evaluate the correlation between the learning-based quality metrics and subjective quality scores in the databases;
  • Guidelines for subjective visual quality evaluation (chaired by Mathias Wien of RWTH Aachen University, Lu Yu of Zhejiang University and Convenor of MPEG Video Coding (ISO/IEC JTC1 SC29/WG4), and Joel Jung of Tencent): Prepare the third draft of the Guidelines for Verification Testing of Visual Media Specifications; Prepare the second draft of the Guidelines for remote experts viewing test methods for use in the context of Ad-hoc Groups, and Core or Exploration Experiments.

AG 5 First Achievements

Reports and Guidelines

The results of the work of the AhGs are aggregated in AG5 output documents which are public (or will become public soon) in order to allow for feedback also from outside of the MPEG community.

The AhG on “Quality for Immersive Visual Media” maintains a report “Overview of Quality Metrics and Methodologies for Immersive Visual Media” [3] which documents the state-of-the-art in the field and shall serve as a reference for MPEG working groups in their work on compression standards in this domain. The AhG further organizes a public workshop on “Quality of Immersive Media: Assessment and Metrics” which takes place in an online form at the beginning of October 2021 [4]. The scope of this workshop is to raise awareness about MPEG efforts in the context of quality of immersive visual media and to invite experts outside of MPEG to present new techniques relevant to the scope of this workshop.

The AhG on “Guidelines for Subjective Visual Quality Evaluation” currently develops two guideline documents supporting the MPEG standardization work. The “Guidelines for Verification Testing of Visual Media Specifications” [5] define the process of assessing the performance of a completed standard after its publication. The concept of verification testing has already been established MPEG working practice for its media compression standards since the 1990ties. The document is intended to formalize the process, describe the steps and conditions for the verification tests, and set the requirements to meet MPEG procedural quality expectations.

The AhG has further released a first draft of “Guidelines for Remote Experts Viewing Sessions” with the intention to establish a formalized procedure for ad-hoc generation subjective test results as input to the standards development process [6]. This activity has been driven by the ongoing pandemic situation which forced MPEG to continue its work in virtual online meetings since early 2020. The procedure for remote experts viewing is intended to be applied during the (online) meeting phase or in the AhG phase and to provide measurable and reproducible subjective results in order to be input to the decision-making process in the project under consideration.

Verification Testing

With Essential Video Coding (EVC) [7], Low Complexity Enhancement Video Coding (LCEVC) [8] of ISO/IEC, and the joint coding standard Versatile Video Coding (VVC) of ISO/IEC and ITU-T [9][10], a significant number of new video coding standards has been recently released. Since its first meeting in October 2020, AG5 has been engaged in the preparation and conduction of verification tests for these video coding specifications. Further verification tests for MPEG Immersive Video (MIV) and Video-based Point Cloud Compression (V-PCC) [11] are under preparation and more are to come. Results of the verification test activities which have been completed in the first year of AG5 are summarized in the following subsections. All reported results have been achieved by formal subjective assessments according to established assessment protocols [12][13] and performed by qualified test laboratories. The bitstreams were generated with reference software encoders of the specification under consideration using established encoder configurations with comparable settings for both, the reference and the evaluated coding schemes. It has to be noted that all testing had to be done under the constrained conditions of the ongoing pandemic situation which induced an additional challenge for the test laboratories in charge.

MPEG-5 Part 1: Essential Video Coding (EVC)

The EVC standard was developed with the goal to provide a royalty-free Baseline profile and a Main profile with higher compression efficiency compared to High-Efficiency Video Coding (HEVC) [15][16][17]. Verification tests were conducted for Standard Dynamic Range (SDR) and high dynamic range (HDR, BT.2100 PQ) video content at both, HD (1920×1080 pixels) and UHD (3840×2160 pixels) resolution. The tests revealed around 40% bitrate savings at a comparable visual quality for the Main profile when compared to HEVC, and around 36% bitrate saving for the Baseline profile when compared to Advanced Video Coding (AVC) [18][19], both for SDR content [20]. For HDR PQ content, the Main profile provided around 35% bitrate savings for both resolutions [21].

MPEG-5 Part 2: Low-Complexity Enhancement Video Coding (LCEVC)

The LCEVC standard follows a layered approach where an LCEVC enhancement layer is added to a lower resolution base layer of an existing codec in order to achieve the full resolution video [22]. Since the base layer codec operates at a lower resolution and the separate enhancement layer decoding process is relatively lightweight, the computational complexity of the decoding process is typically lower compared to decoding of the full resolution with the base layer codec. The addition of the enhancement layer would typically be provided on top of the established base layer decoder implementation by an additional decoding entity, e.g., in a browser.

For verification testing, LCEVC was evaluated using AVC, HEVC, EVC, and VVC base layer bitstreams at half resolution, and comparing the performance to the respective schemes with full resolution coding as well half-resolution coding with a simple upsampling tool. For UHD resolution, the bitrate savings for LCEVC at comparable visual quality were at 46% when compared to full resolution AVC and 31% when compared to full resolution HEVC. The comparison to the more recent and more efficient EVC and VVC coding schemes led to partially overlapping confidence intervals of the subjective scores of the test subjects. The curves still revealed some benefits for the application of LCEVC. The gains compared to half-resolution coding with simple upsampling provided approximately 28%, 34%, 38%, and 33% bitrate savings at comparable visual quality, demonstrating the benefit of LCEVC enhancement layer coding compared to straight-forward plain upsampling [23].

MPEG-I Part 3 / ITU-T H.266: Versatile Video Coding (VVC)

VVC is the most recent video coding standard in the historical line of joint specifications of ISO/IEC and ITU-T, such as AVC and HEVC. The development focus for VVC was on compression efficiency improvement at a moderate increase of decode complexity as well as the versatility of the design [24][25]. Versatility features include tools designed to address HDR, WCG, resolution-adaptive multi-rate video streaming services, 360-degree immersive video, bitstream extraction and merging, temporal scalability, gradual decoding refresh, and multilayer coding to deliver layered video content to support application features such as multiview, alpha maps, depth maps, and spatial and quality scalability.

A series of verification tests have been conducted covering SDR UHD and HD, HDR PQ and HLG, as well as 360° video contents [26][27][28]. An early open-source encoder (VVenC, [14]) was additionally assessed in some categories. For SDR coding, both, the VVC reference software (VTM) and the open-source VVenC were evaluated against the HEVC reference software (HM). The results revealed bit rate savings of around 46% (SDR UHD, VTM and VVenC), 50% (SDR HD, VTM and VVenC), 49% (HDR UHD, PQ and HLG), 52%, and 50-56% (360° with different projection formats) at a similar visual quality compared to HEVC. In Figure 3, pooled MOS (Mean Opinion Score) over bit rate points for the mentioned categories are provided. The MOS values range from 10 (imperceptible impairments) down to 0 (everywhere severely annoying impairments). Pooling was done by computing the geometric mean of the bitrates and the arithmetic mean of the MOS scores across the test sequences of each test category. The results reveal a consistent benefit of VVC over its predecessor HEVC in terms of visual quality over the required bitrate.

Figure 3. Pooled MOS over bitrate plots of the VVC verification tests for the SDR UHD, SDR HD, HDR HLG, and 360° video test categories. Curves cited from [26][27][28].

Summary

This column presented an overview of the organizational structure and the activities of the Advisory Group on MPEG Visual Quality Assessment, ISO/IEC JTC 1/SC 29/AG 5, which has been formed about one year ago. The work items of AG5 include the application, documentation, evaluation, and improvement of objective quality metrics and subjective quality assessment procedures. In its first year of existence, the group has produced an overview on immersive quality metrics, draft guidelines for verification tests and for remote experts viewing sessions as well as reports of formal subjective quality assessments for the verification tests of EVC, LCEVC, and VVC. The work of the group will continue towards studying and developing quality metrics suitable for the assessment tasks emerging by the development of the various MPEG visual media coding standards and towards subjective quality evaluation in upcoming and future verification tests and new standardization projects.

References

[1] MPEG website, https://www.mpegstandards.org/.
[2] ISO/IEC JTC1 SC29, “Terms of Reference of SC 29/WGs and AGs,” Doc. SC29N19020, July 2020.
[3] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (v2)”, doc. AG5N13, 2nd meeting: January 2021.
[4] MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics, https://multimediacommunication.blogspot.com/2021/08/mpeg-ag-5-workshop-on-quality-of.html, October 5th, 2021.
[5] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for Verification Testing of Visual Media Specifications (draft 2)”, doc. AG5N30, 4th meeting: July 2021.
[6] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for remote experts viewing sessions (draft 1)”, doc. AG5N31, 4th meeting: July 2021.
[7] ISO/IEC 23094-1:2020, “Information technology — General video coding — Part 1: Essential video coding”, October 2020.
[8] ISO/IEC 23094-2, “Information technology – General video coding — Part 2: Low complexity enhancement video coding”, September 2021.
[9] ISO/IEC 23090-3:2021, “Information technology — Coded representation of immersive media — Part 3: Versatile video coding”, February 2021.
[10] ITU-T H.266, “Versatile Video Coding“, August 2020. https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-H.266-202008-I.
[11] ISO/IEC 23090-5:2021, “Information technology — Coded representation of immersive media — Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC)”, June 2021.
[12] ITU-T P.910 (2008), Subjective video quality assessment methods for multimedia applications.
[13] ITU-R BT.500-14 (2019), Methodologies for the subjective assessment of the quality of television images.
[14] Fraunhofer HHI VVenC software repository. [Online]. Available: https://github.com/fraunhoferhhi/vvenc.
[15] K. Choi, J. Chen, D. Rusanovskyy, K.-P. Choi and E. S. Jang, “An overview of the MPEG-5 essential video coding standard [standards in a nutshell]”, IEEE Signal Process. Mag., vol. 37, no. 3, pp. 160-167, May 2020.
[16] ISO/IEC 23008-2:2020, “Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 2: High efficiency video coding”, August 2020.
[17] ITU-T H.265, “High Efficiency Video Coding”, August 2021.
[18] ISO/IEC 14496-10:2020, “Information technology — Coding of audio-visual objects — Part 10: Advanced video coding”, December 2020.
[19] ITU-T H.264, “Advanced Video Coding”, August 2021.
[20] ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for SDR Content”, doc WG4N47, 2nd meeting: January 2021.
[21] ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for HDR/WCG content”, doc WG4N30, 1st meeting: October 2020.
[22] G. Meardi et al., “MPEG-5—Part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation”, Proc. SPIE, vol. 11510, pp. 238-257, Aug. 2020.
[23] ISO/IEC JTC1 SC29/WG4, “Verification Test Report on the Compression Performance of Low Complexity Enhancement Video Coding”, doc. WG4N76, 3rd meeting: April 2020.
[24] Benjamin Bross, Jianle Chen, Jens-Rainer Ohm, Gary J. Sullivan, and Ye-Kui Wang, “Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC)”, Proceedings of the IEEE, Vol. 109, Issue 9, pp. 1463–1493, doi 10.1109/JPROC.2020.3043399, Sept. 2021 (open access publication), available at https://ieeexplore.ieee.org/document/9328514.
[25] Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Gary J. Sullivan, and Jens-Rainer Ohm, “Overview of the Versatile Video Coding (VVC) Standard and its Applications”, IEEE Trans. Circuits & Systs. for Video Technol. (open access publication), available online at https://ieeexplore.ieee.org/document/9395142.
[26] Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for Ultra High Definition (UHD) Standard Dynamic Range (SDR) Video Content”, doc. JVET-T2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 20th meeting: October 2020.
[27] Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for High Definition (HD) and 360° Standard Dynamic Range (SDR) Video Content”, doc. JVET-V2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 22nd meeting: April 2021.
[28] Mathias Wien and Vittorio Baroncini, “VVC verification test report for high dynamic range video content”, doc. JVET-W2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 23rd meeting: July 2021.

Report from ACM IMX 2021 by Lingyuan Li

Although the Covid-19 pandemic has forced international researchers and practitioners to share their research at virtual conferences, ACM Interactive Media Experiences (IMX) 2021 clearly invested significant time and effort to provide all attendees with an accessible, interactive, and vibrant online academic feast. Serving on the Organizing Committee of IMX 2021 as the Student Volunteer Chair as well as a Doctoral Consortium student, I was happy and honoured to take part in the conference, to help support it, and to see how attendees enjoyed and benefited from it. 

I was also delighted to receive the ACM SIGMM Best Social Media Reporter Award which offered me the opportunity to write this report as a summary of my experiences with IMX 2021 (and of course a free ACM SIGMM conference registration!!).

OhYay Platform

IMX 2020 was the first time for the conference to go entirely virtual. In its second year as an entirely virtual conference, IMX 2021 collaborated with OhYay to create a very realistic and immersive experience for the conference attendees. On OhYay, attendees felt like they were in a real conference venue in New York City. There was a reception, lobbies, main hall, showcase rooms, rooftop, pool, and so forth. In addition to the high-fidelity environment, IMX 2021 and the OhYay development team added many interaction features into the platform to help attendees have a more human-centred and engaging experience: for example, attendees were able to “whisper” to each other without others being able to hear; they could send reactions, like applause emoji with sound effects; they could join some social events together, such as lip-sync, jigsaw.

Informative Conference

IMX 2021 contained a high number of inspiring talks, insightful discussions, and quality communication. On Day 1, IMX hosted a series of workshops: XR in Games, Life Improvement in Quality by Ubiquitous Experiences (LIQUE), DataTV and SensoryX. I had a three-hour doctoral consortium (DC) in the morning on Day 1 as well. 8 PhD students presented ongoing dissertation research and had 2 one-on-one sessions with distinguished researchers as mentors! I was so excited to meet people in a ‘real’ virtual space and the OhYay platform also enabled DC attendees to take group pictures in the photo booth. I could not help but Tweet my first-day experience with lots of photos.

My Tweet of DC in IMX 

On Day 2 and Day 3, with artist Sougwen Chung’s amazing keynote “Where does ‘AI’ end and ‘we’ begin?” kicking off the main conference, a set of paper sessions and panel discussions regarding mixed-reality (AR/VR), AI, gaming and inclusive design brought inspiration, new ideas and state-of-the-art research topics to attendees. Admittedly, AR/VR as well as AI technology as the focus of the current development of science and technology, lead the progress of civilization of the times. IMX helped us to see this trend of balance and integration of AI, AR, VR and MR in the future: the downstream of the hyper-reality terminal products dips into various fields, including games, consumer applications, enterprise applications, health care, education and others. With the increase of downstream application scenarios, the market space is expected to further expand. This opens up a broader world for all researchers, designers and practitioners including IMXers to explore how we can put warmth into products delivered by the developing technologies which come with many unknowns and create a need for establishing best practices, standards, and design patterns for as many people as reasonably possible.

My Tweet of the IMX main conference: Enjoyed a great deal of quality discussion and amazing interactive social events.

Every time I tweeted, I picked up representative screenshots, made them into a pretty collage, and gave infectious enthusiasm to the text. That may be my secret of winning the Social Media Award to help disseminate IMX information.

Novelties

Social Events

In addition to the world-leading interactive media research sessions, panels, speakers and showcases presented, IMX 2021 also aimed for some interactive fun for networking and chilling for attendees. There was a virtual elevator that could be seen as an events hub for attendees to select which event they wanted to join. Various social events were provided to enrich breaks in between research sessions: Mukbang, Yoga, Lip Sync, Jigsaw, etc. For example, attendees sometimes needed to collaborate with Jigsaw, which spontaneously enhanced mutual understanding through the interactive collaborative engagement even if IMX was a virtual conference. 

In this sense, IMX 2021 succeeded in its aim to allow attendees to have an “in-person” and immersive experience as much as possible because there were many opportunities for attendees to communicate more deeply, network, and socialize.

Doctoral Consortium

IMX 2021 DC provided an opportunity for 8 PhD students to present, explore and develop our research interests, under the mentorship of a panel of 14 distinguished researchers, including 2 one-on-one sessions. The virtual conference enabled mentors from all over the world to make exchanges views with students without geographical limitations. We were also able to have in-depth communication to obtain valuable instruction on dissertation research in such an immersive environment. Moreover, each student not only gave a presentation at the DC before the main conference but also presented a poster at the conference, enabling wider visibility of our work. 

Doctoral Consortium Reception Room

Accessibility

It is noteworthy that IMX 2021 made accessibility design an integral part of the conference. Except for closed-caption for ready-made videos, IMX 2021 had a captioner to provide an accurate real-time caption for a live discussion. In addition, some attendees were excited to find out that an ASL option was also offered! 

Optional ASL and live caption

IMX also took efforts to make the platform more friendly to screen-reader users. 

Conclusion

In conclusion, IMX 2021 was an excellent example of an engaging, interactive, fun, informative and nice virtual conference. The organizing team clearly not only made significant efforts to represent the diversity in which interactive media is used in our lives but also already presented an amazing show of how interactive the media could be to even benefit our online communication. I look forward to IMX 2022!

Multidisciplinary column: the importance of talking to a 12-year old

In 2018, while on a research visit to Bordeaux, I felt it would be good to connect more closely to the local community. As a consequence, colleagues convinced me to join the Femmes & Sciences movement, in which women researchers in STEM proactively did local outreach.

My French was conversational, though not stellar. But I thought it hopefully should be good enough to converse with young teenagers. Furthermore, as for ‘local community’, it would be a nice idea to both get to know colleagues and the culture of the local schools. So there I went, speaking at a countryside school in one of the many wine regions, and at a secondary school in Bordeaux where students would not trivially think of STEM university careers.

It was an amazing and enlightening experience. As soon as I started to talk about search engines, recommender systems, music and video services, a spark really ignited in the students. They knew these, and they used them daily!

But it only was because of me mentioning it, that they started realizing there was computer science technology behind all these services. Before, they had no clue.

And I think this is a real problem, that we as a community severely undervalue.

In my own family, my father (electrical engineering), sister (civil engineering & geomatics) and I (computer science) studied to become engineers. For the rest of my family, this meant we were ‘the technical people’, getting called in when computers were slow, cell phones were updated and printers started malfunctioning. This especially happened to my father and me, as we ‘were good with computers, since that was our profession’.

But I had not studied to fix printers. And, as I joked during university open days to prospective students, my sister never got asked to go fix the kitchen sink, even though she had been taught about water management.

It always has been striking to me how malfunctioning hardware and software were the first associations that laypeople outside of our field seemed to have with our work. Today, this is broadening to fears of hacking, and on the less negative side, (overblown?) hopes in AI and cryptocurrency. In all these cases, the technology is something alien, something that ‘normal’ humans do not understand and grasp well.

Yet at the same time, the technologies we build affect everyone’s lives, increasingly so. Frequently, they silently work in the back, and we indeed only visibly notice them if something goes wrong. But then, rather strange associations and dialogues emerge.

Recently, I became a member of the national Young Academy, a body of earlier-career faculty across disciplines in The Netherlands, playing a public opinion-making role on academic culture, the image of academia and its findings, and associated policy-making. Through this role, and with my background in search and recommendation, I am increasingly being invited into committees, workshops and other forms of public appearances, that involve policy-makers and laypeople concerned with the impact of AI technologies (especially: possible exclusion of humans, as a consequence of the use of AI technologies).

In these activities, it has again been striking to me how little common vocabulary is present, and how questions thus get formulated awkwardly. More than once, I get asked ‘what the algorithm exactly is doing’, when my discussion partners actually refer to broader decision-making processes, where problems may occur across the pipeline, also already before any algorithm would be deployed.

When I try to explain that much of the applications of interest focus on prioritization with a cutoff within a larger collection, and I ask how my discussion partners would prioritize, I get blank stares if I keep this story at the current, general, abstract level that would come naturally to me as a computer scientist. If I’m unlucky, I may even get an answer back that my discussion partners don’t want to take a stance themselves, as it is ‘difficult and subjective’ matter, but ‘surely AI can do this better than we humans?’. Now that will form a problem if we will frame the problem in a supervised learning setup, without a sense of solid ground truth or criteria to optimize for.

However, going through simple, concrete examples ‘close to home’ does seem to help. Here, I really benefited from the experience I had learnt while in Bordeaux and beyond, especially in setups where I had to work with children.

Try to explain concepts of information retrieval and data modelling in a non-native tongue to a 12-year old, and you are forced to ask simple questions, that will give insight into these children’s own world views and contexts. It will give them building blocks they recognize and can build on.

Working in music and multimedia has greatly helped me here; as said before, everyone is a heavy daily user of music and multimedia services, and thus (without explicitly knowing) actually has some world view ready on preferences, priorities and ways to navigate larger information collections. This will greatly help as a discussion starter, with the discussion elements remaining tangible for everyone.

I would argue that working on a better public understanding of our work is among the most societally impactful roles that we, as researchers in the field, can play. Our discussion partners are stakeholders who don’t realize they are stakeholders. And of course, in the case of children, they may at the same time be the future technologists, who in the future will build forth on our work.

It takes serious time investment and a lot of practice to get this right. I have always been puzzled at how this typically meant this would be considered too much of a time sink, and not our prime responsibility as academics. But who else would otherwise take this up?

And if I think of how much time I have been encouraged to sink into endlessly rewriting grant proposals or papers at the micro-level, just to hopefully please reviewers, something does not feel right. Any acceptances following this have arguably been good for my career. But I am not quite convinced this has been more meaningful use of the public money my contract is funded from.

Or, in a more positive interpretation: in our community, we actually care about communicating well, and are clearly willing to invest in it. But so far, we really have been focusing our attention inward, while there is a lot to gain when we’d rather look outward.

So for those who would be interested in engaging more with those outsides of our field: please do. Outreach is much more than cute PR. And with the applications that we work on being so close to people’s daily lives, we in music/multimedia hold some very important keys, and really should learn the perspectives of our end users.

So let’s use those keys, and finally, get some doors opened that have remained shut for too long.

Editor Biographies

Cynthia_Liem_2017

Dr. Cynthia C. S. Liem is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests focus on making people discover new interests and content which would not trivially be retrieved, and assessing questions of validation and validity, especially in the context of music and multimedia search and recommendation. She initiated and co-coordinated the European research projects PHENICX (2013-2016) and TROMPA (2018-2021), focusing on technological enrichment of digital musical heritage, and gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach, and is a member of the Dutch national Young Academy.

jochen_huber

Dr. Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

VQEG Column: VQEG Meeting Jun. 2021 (virtual/online)

Introduction

Welcome to the fifth column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 7 to 11 June 2021. As the previous meeting celebrated in December 2020, it was organized online (this time by Kingston University) with multiple sessions spread over five days, allowing remote participation of people from 22 different countries of America, Asia, and Europe. More than 100 participants registered to the meeting and they could attend the 40 presentations and several discussions that took place in all working groups. 
This column provides an overview of the recently completed VQEG plenary meeting, while all the information, minutes and files (including the presented slides) from the meeting are available online in the VQEG meeting website

Group picture of the VQEG Meeting 7-11 June 2021.

Several interesting presentations of state-of-the-art works can be of interest to the SIGMM community, in addition to the contributions to several working items of ITU from various VQEG groups. The progress on the new activities launched in the last VQEG plenary meeting (in relation to Live QoE assessment, SI/TI clarification, implementers guide for video quality metrics for coding applications, and the inclusion of video quality metrics as metadata in compressed streams), as well as the proposal for a new joint work on evaluation of immersive communication systems from a task-based or interactive perspective within the Immersive Media Group.

We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

AVHD group works on improved subjective and objective methods for video-only and audiovisual quality of commonly available systems. Currently, after the project AVHD/P.NATS2 (a joint collaboration between VQEG and ITU SG12) finished in 2020 [1], two projects are ongoing within AVHD group: QoE Metrics for Live Video Streaming Applications (Live QoE), which was launched in the last plenary meeting, and Advanced Subjective Methods (AVHD-SUB).
The main discussion during the AVHD sessions was related to the Live QoE project, which was led by Shahid Satti (Opticom) and Rohit Puri (Twitch). In addition to the presentation of the project proposal, the main decisions reached until now were exposed (e.g., use of videos of 20-30 seconds with resolution 1080p and framerates up to 60fps, use ACR as subjective test methodology, generation of test conditions, etc.), as well as open questions were brought up for discussion, especially in relation to how to acquire premium content and network traces. 
In addition to this discussion, Steve Göring (TU Ilmenau) presented and open-source platform (AVrate Voyager) for crowdsourcing/online subjective tests [2], and Shahid Satti (Opticom) presented the performance results of the Opticom models on the project AVHD/P.NATS Phase 2. Finally, Ioannis Katsavounidis (Facebook) presented the subjective testing validation of the AV1 performance from the Alliance for Open Media (AOM) to gather feedback on the test plan and possible interested testing labs from VQEG. It is also worth noting that this session was recorded to be used as raw multimedia data for the Live QoE project. 

Quality Assessment for Health applications (QAH)

The session related to the QAH group group allocated three presentations apart from the project summary provided by Lucie Lévêque (Polytech Nantes). In particular, Meriem Outtas (INSA Rennes) provided a review on objective quality assessment of medical images and videos. This is is one of the topics jointly addressed by the group, which is working on an overview paper in line with the recent review on subjective medical image quality assessment [3]. Moreover, Zohaib Amjad Khan (Université Sorbonne Paris Nord) presented a work on video quality assessment of laparoscopic videos, while Aditja Raj and Maria Martini (Kingston University) presented their work on multivariate regression-based convolutional neural network model for fundus image quality assessment.

Statistical Analysis Methods (SAM)

The SAM session consisted of three presentations followed by discussions on the topics. One of this was related to the description of subjective experiment consistency by p-value p-p plot [4], which was presented by Jakub Nawała (AGH University of Science and Technology). In addition, Zhi Li (Netflix) and Rafał Figlus (AGH University of Science and Technology) presented the progress on the contribution from SAM to the ITU-T to modify the recommendation P.913 to include the MLE model for subject behavior in subjective experiments [5] and the recently available implementation of this model in Excel. Finally, Pablo Pérez (Nokia Bell Labs) and Lucjan Janowski (AGH University of Science and Technology) presented their work on the possibility of performing subjective experiments with four subjects [6].

Computer Generated Imagery (CGI)

Nabajeet Barman (Kingston University) presented a report on the current activities of the CGI group. The main current working topics are related to gaming quality assessment methodologies and quality prediction, and codec comparison for CG content. This group is closely collaborating with the ITU-T SG12, as reflected by its support on the completion of the 3 work items: ITU-T Rec. G.1032 on influence factors on gaming quality of experience, ITU-T Rec. P.809 on subjective evaluation methods for gaming quality, and ITU-T Rec. G.1072 on opinion model for gaming applications. Furthermore, CGI is contributing to 3 new work items: ITU-T work item P.BBQCG on parametric bitstream-based quality assessment of cloud gaming services, ITU-T work item G.OMMOG on opinion models for mobile online gaming applications, and ITU-T work item P.CROWDG on subjective evaluation of gaming quality with a crowdsourcing approach. 
In addition, four presentations were scheduled during the CGI slots. The first one was delivered by Joel Jung (Tencent Media Lab) and David Lindero (Ericsson), who presented the details of the ITU-T work item P.BBQCG. Another one was related to the evaluation of MPEG-5 Part 2 (LCEVC) for gaming video streaming applications, which was presented by Nabajeet Barman (Kingston University) and Saman Zadtootaghaj (Dolby Laboratories). Also Nabajeet together with Maria Martini (Kingston University) presented a dataset, codec comparison and challenges related to user generated HDR gaming video streaming [7]. Finally, JP Tauscher (Technische Universität Braunschweig) presented his work on EEG-based detection of deep fake images. 

No Reference Metrics (NORM)

The session for NORM group included a presentation on the impact of Spatial and Temporal Information (SI and TI) on video quality and compressibility [8], delivered by Werner Robitza (AVEQ GmbH), which was followed by a fruitful discussion on the compression complexity and on the activity related to SI/TI clarification launched in the last VQEG plenary meeting. In addition, there was another presentation from Mikołaj Leszczuk (AGH University of Science and Technology) on content type indicators for technologies supporting video sequence summarization. Finally, Ioannis Katsavounidis (Facebook) led a discussion on the inclusion of video quality metrics as metadata in compressed streams, with a report on the progress on this activity that was started in the last meeting. 

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working on the development of a generally applicable no-reference hybrid perceptual/bitstream model. In this sense, Enrico Masala and Lohic Fotio Tiotsop (Politecnico di Tornio) presented the progress on designing a neural-network approach to model single observers using existing subjectively-annotated image and video datasets [9] (the design of subjective tests tailored for the training of this approach is envisioned for future work). In addition to this activity, the group is working in collaboration with the Sky Group on the “Hodor Project”, which is based on developing a measure that could allow to automatically identify video sequences for which quality metrics are likely to deliver inaccurate Mean Opinion Score (MOS) estimation.
Apart from these joint activities Dr. Yendo Hu (Carnation Communications Inc. and Jimei University) delivered a presentation proposing to work on a benchmarking standard to bring quality, bandwidth, and latency into a common measurement domain.

Quality Assessment for Computer Vision Applications (QACoViA)

In addition to a progress report, the QACoViA group scheduled two interesting presentations on enhancing artificial intelligence resilience to image coding artifacts through expert training (by Alban Marie from INSA Rennes) and on providing datasets to rain no-reference metrics for computer vision applications (by Carolina Whitaker from NTIA/ITS). 

5G Key Performance Indicators (5GKPI)

The 5GKPI session consisted of a presentation by Pablo Pérez (Nokia Bell-Labs) of the progress achieved by the group since the last plenary meeting in the following efforts: 1) the contribution to ITU-T Study Group 12 Question 13 related through the Technical Report about QoE in 5G video services (GSTR-5GQoE), which addresses QoE requirements and factors for some use cases like Tele-operated Driving (ToD), wireless content production, mixed reality offloading and first responder networks; 2) the contribution to the 5G Automotive Association (5GAA) through a high-level contribution on general QoE requirements for remote driving, considering for the near future the execution of subjective tests for ToD video quality; and 3) the long-term plan on working on a methodology to create simple opinion models to estimate average QoE for a network and use case.

Immersive Media Group (IMG)

Several presentations were delivered during the IMG session that were divided into two blocks: one covering technologies and studies related to the evaluation of immersive communication systems from a task-based or interactive perspective, and another one covering other topics related to the assessment of QoE of immersive media. 
The first set of presentations is related to a new proposal for a joint work within IMG related to the ITU-T work item P.QXM on QoE assessment of eXtended Reality meetings. Thus, Irene Viola (CWI) presented an overview of this work item. In addition, Carlos Cortés (Universidad Politécncia de Madrid) presented his work on evaluating the impact of delay on QoE in immersive interactive environments, Irene Viola (CWI) presented a dataset of point cloud dynamic humans for immersive telecommunications, Pablo César (CWI) presented their pipeline for social virtual reality [10], and Narciso García (Universidad Politécncia de Madrid) presented their real-time free-viewpoint video system (FVVLive) [11]. After these presentations, Jesús Gutiérrez (Universidad Politécncia de Madrid) led the discussion on joint next steps with IMG, which, in addition, to identify interested parties in joining the effort to study the evaluation of immersive communication systems, also covered the further analyses to be done from the subjective tests carried out with short 360-degree videos [12] and the studies carried out to assess quality and other factors (e.g., presence) with long omnidirectional sequences. In this sense, Marta Orduna (Universidad Politécnica de Madrid) presented her subjective study to validate a methodology to assess quality, presence, empathy, attitude, and attention in Social VR [13]. Future progress on these joint activities will be discussed in the group audio-calls. 
Within the other block of presentations related to immersive media topics, Maria Martini (Kingston University), Chulhee Lee (Yonsei University), and Patrick Le Callet (Université de Nantes) presented the status of IEEE standardization on QoE for immersive experiences (IEEE P3333.1.4 – Light Field, and IEEE P3333.1.3, deep learning-based quality assessment), Kjell Brunnström (RISE) presented their work on legibility and readability in augmented reality [14], Abdallah El Ali (CWI) presented his work on investigating the relationship between momentary emotion self-reports and head and eye movements in HMD-based 360° videos [15], Elijs Dima (Mid Sweden University) exposed his study on quality of experience in augmented telepresence considering the effects of viewing positions and depth-aiding augmentation [16], Silvia Rossi (UCL) presented her work towards behavioural analysis of 6-DoF user when consuming immersive media [17], and Yana Nehme (INSA Lyon) presented a study on exploring crowdsourcing for subjective quality assessment of 3D Graphics.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

During the IRG-AVQA session, an overview on the progress and recent works within ITU-R SG6 and ITU-T SG12 was provided. In particular, Chulhee Lee (Yonsei University) in collaboration with other ITU rapporteurs presented the progress of ITU-R WP6C on recommendations for HDR content, the work items within: ITU-T SG12 Question 9 on audio-related work items, SG12 Question 13 on gaming and immersive technologies (e.g., augmented/extended reality) among others, SG12 Question 14 recommendations and work items related to the development of video quality models, and SG12 Question 19 on work items related to television and multimedia. In addition, the progress of the group “Implementers Guide for Video Quality Metrics (IGVQM)”, launched in the last plenary meeting by Ioannis Katsavounidis (Facebook) was discussed addressing specific points to push the collection of video quality models and datasets to be used to develop an implementer’s guide for objective video quality metrics for coding applications. 

Other updates

The next VQEG plenary meeting will take place online in December 2021.

In addition, VQEG is investigating the possibility to disseminate the videos from all the talks from these plenary meetings via platforms such as Youtube and Facebook.

Finally, given that some modifications are being made to the public FTP of VQEG, if the links to the presentations included in this column are not opened by the browser, the reader can download all the presentations in one compressed file.

References

[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, and R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[2] R.R.R. Rao, S. Göring, and A. Raake, “Towards High Resolution Video Quality Assessment in the Crowd”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[3] L. Lévêque, M. Outtas, H. Liu, and L. Zhang, “Comparative study of the methodologies used for subjective medical image quality assessment”, Physics in Medicine & Biology, Jul. 2021 (Accepted).
[4] J. Nawala, L. Janowski, B. Cmiel, and K. Rusek, “Describing Subjective Experiment Consistency by p-Value P–P Plot”, ACM International Conference on Multimedia (ACM MM), Oct. 2020.
[5] Z. Li, C. G. Bampis, L. Krasula, L. Janowski, and I. Katsavounidis, “A Simple Model for Subject Behavior in Subjective Experiments”, arXiv:2004.02067v3, May 2021.
[6] P. Perez, L. Janowski, N. Garcia, M. Pinson, “Subjective Assessment Experiments That Recruit Few Observers With Repetitions (FOWR)”, arXiv:2104.02618, Apr. 2021.
[7] N. Barman, and M. G. Martini, “User Generated HDR Gaming Video Streaming: Dataset, Codec Comparison and Challenges”, IEEE Transactions on Circuits and Systems for Video Technology, May 2021.
[8] W. Robitza, R.R.R. Rao, S. Göring, and A. Raake, “Impact of Spatial and Temporal Information on Video Quality and Compressibility”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[9] L. Fotio Tiotsop, T. Mizdos, M. Uhrina, M. Barkowsky, P. Pocta, and E. Masala, “Modeling and estimating the subjects’ diversity of opinions in video quality assessment: a neural network based approach”, Multimedia Tools and Applications, vol. 80, pp. 3469–3487, Sep. 2020.
[10] J. Jansen, S. Subramanyam, R. Bouqueau, G. Cernigliaro, M. Martos Cabré, F. Pérez, and P. Cesar, “A Pipeline for Multiparty Volumetric Video Conferencing: Transmission of Point Clouds over Low Latency DASH”, ACM Multimedia Systems Conference (MMSys), May 2020.
[11] P. Carballeira, C. Carmona, C. Díaz, D. Berjón, D. Corregidor, J. Cabrera, F. Morán, C. Doblado, S. Arnaldo, M.M. Martín, and N. García, “FVV Live: A real-time free-viewpoint video system with consumer electronics hardware”, IEEE Transactions on Multimedia, May 2021.
[12] J. Gutiérrez, P. Pérez, M. Orduna, A. Singla, C. Cortés, P. Mazumdar, I. Viola, K. Brunnström, F. Battisti, N. Cieplińska, D. Juszka, L. Janowski, M. Leszczuk, A. Adeyemi-Ejeye, Y. Hu, Z. Chen, G. Van Wallendael, P. Lambert, C. Díaz, J. Hedlund, O. Hamsis, S. Fremerey, F. Hofmeyer, A. Raake, P. César, M. Carli, N. García, “Subjective evaluation of visual quality and simulator sickness of short 360° videos: ITU-T Rec. P.919”, IEEE Transactions on Multimedia, Jul. 2021 (Early Access).
[13] M. Orduna, P. Pérez, J. Gutiérrez, and N. García, “Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in Social VR: International Experiences Use Case”, arXiv:2103.02550, 2021.
[14] J. Falk, S. Eksvärd, B. Schenkman, B. Andrén, and K. Brunnström “Legibility and readability in Augmented Reality”, IEEE Int. Conference on Quality of Multimedia Experience (QoMEX), Jun. 2021.
[15] T. Xue,  A. El Ali,  G. Ding,  and P. Cesar, “Investigating the Relationship between Momentary Emotion Self-reports and Head and Eye Movements in HMD-based 360° VR Video Watching”, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, May 2021.
[16] E. Dima, K. Brunnström, M. Sjöström, M. Andersson, J. Edlund, M. Johanson, and T. Qureshi, “Joint effects of depth-aiding augmentations and viewing positions on the quality of experience in augmented telepresence”, Quality and User Experience, vol. 5, Feb. 2020.
[17] S. Rossi, I. Viola, J. Jansen, S. Subramanyam, L. Toni, and P. Cesar, “Influence of Narrative Elements on User Behaviour in Photorealistic Social VR”, International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE), Sep. 28, 2021.