VQEG Column: VQEG Meeting June 2023


This column provides a report on the last Video Quality Experts Group (VQEG) plenary meeting, which took place from 26 to 30 June 2023 in San Mateo (USA), hosted by Sony Interactive Entertainment. More than 90 participants worldwide registered for the hybrid meeting, with more than 40 attending in person. This meeting was co-located with the ITU-T SG12 meeting, which took place in the first two days of the week. In addition, more than 50 presentations related to the ongoing projects within VQEG were given, leading to interesting discussions among the researchers attending the meeting. All the related information, minutes, and files from the meeting are available online on the VQEG meeting website, and video recordings of the meeting are available on YouTube.

In this meeting, there were several aspects that can be relevant for the SIGMM community working on quality assessment. For instance, there are interesting new work items and efforts on updating existing recommendations discussed in the ITU-T SG12 co-located meeting (see the section about the Intersector Rapporteur Group on Audiovisual Quality Assessment). In addition, there was an interesting panel related to deep learning for video coding and video quality with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) (see the Emerging Technologies Group section). Also, a special session on Quality of Experience (QoE) for gaming was organized, involving researchers from several international institutions. Apart from this, readers may be interested in the presentation about MPEG activities on quality assessment and the different developments from industry and academia on tools, algorithms and methods for video quality assessment.

We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Group picture of the VQEG Meeting 26-30 June 2023 hosted by Sony Interactive Entertainment (San Mateo, USA).

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD group investigates improved subjective and objective methods for analyzing commonly available video systems. In this meeting, there were several presentations related to topics covered by this group, which were distributed in different sessions during the meeting.

Nabajeet Barman (Kingston University, UK) presented a datasheet for subjective and objective quality assessment datasets. Ali Ak (Nantes Université, France) delivered a presentation on the acceptability and annoyance of video quality in context. Mikołaj Leszczuk (AGH University, Poland) presented a crowdsourcing pixel quality study using non-neutral photos. Kamil Koniuch (AGH University, Poland) discussed the role of theoretical models in ecologically valid studies, covering the example of a video quality of experience model. Jingwen Zhu (Nantes Université, France) presented her work on evaluating the streaming experience of viewers with Just Noticeable Difference (JND)-based encoding. Also, Lucjan Janowski (AGH University, Poland) proposed a more ecologically valid experiment protocol using the YouTube platform.

In addition, there were four presentations by researchers from the industry sector. Hojat Yeganeh (SSIMWAVE/IMAX, USA) talked about how more accurate video quality assessment metrics would lead to more savings. Lukas Krasula (Netflix, USA) delivered a presentation on subjective video quality for 4K HDR-WCG content using a browser-based approach for at-home testing. Also, Christos Bampis (Netflix, USA) presented the work done by Netflix on improving video quality with neural networks. Finally, Pranav Sodhani (Apple, USA) talked about how to evaluate videos with the Advanced Video Quality Tool (AVQT).

Quality Assessment for Health applications (QAH)

The QAH group works on the quality assessment of health applications, considering both subjective evaluation and the development of datasets, objective metrics, and task-based approaches. The group is currently working towards an ITU-T recommendation for the assessment of medical contents. In this sense, Meriem Outtas (INSA Rennes, France) led an editing session of a draft of this recommendation.

Statistical Analysis Methods (SAM)

The SAM group works on improving analysis methods both for the results of subjective experiments and for objective quality models and metrics. The group is currently working on updating and merging the ITU-T recommendations P.913, P.911, and P.910.

Apart from this, several researchers presented their work on related topics. For instance, Pablo Pérez (Nokia XR Lab, Spain) presented (not so) new findings about transmission rating scale and subjective scores. Also, Jingwen Zhu (Nantes Université, France) presented ZREC, an approach for mean and percentile opinion scores recovery. In addition, Andreas Pastor (Nantes Université, France) presented three works: 1) on the accuracy of open video quality metrics for local decision in AV1 video codec, 2) on recovering quality scores in noisy pairwise subjective experiments using negative log-likelihood, and 3) on guidelines for subjective haptic quality assessment, considering a case study on quality assessment of compressed haptic signals. Lucjan Janowski (AGH University, Poland) discussed experiment precision, proposing experiment precision measures and methods for comparing experiments. Finally, there were three presentations from members of the University of Konstanz (Germany). Dietmar Saupe presented the JPEG AIC-3 activity on fine-grained assessment of subjective quality of compressed images, Mohsen Jenadeleh talked about how relaxed forced choice improves performance of visual quality assessment methods, and Mirko Dulfer presented his work on quantization for Mean Opinion Score (MOS) recovery in Absolute Category Rating (ACR) experiments.
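To give readers less familiar with pairwise experiments an idea of what recovering scores by minimizing a negative log-likelihood can look like, the following is a minimal sketch based on the classic Bradley-Terry model. It is not the method presented at the meeting; the function names, the example vote counts, and the plain gradient-descent optimizer are assumptions made for illustration only.

```python
import math

def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def bt_nll(scores, wins):
    """Negative log-likelihood of the vote counts under Bradley-Terry.

    wins[i][j] is the number of times stimulus i was preferred over j.
    """
    nll = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i != j and wins[i][j]:
                nll -= wins[i][j] * math.log(_sigmoid(scores[i] - scores[j]))
    return nll

def fit_bt(wins, steps=2000, lr=0.05):
    """Estimate one quality score per stimulus by gradient descent on the NLL."""
    n = len(wins)
    s = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                p = _sigmoid(s[i] - s[j])  # model probability that i beats j
                grad[i] += -wins[i][j] * (1.0 - p) + wins[j][i] * p
        s = [si - lr * g for si, g in zip(s, grad)]
        mean = sum(s) / n  # scores are only defined up to a constant shift
        s = [si - mean for si in s]
    return s

# Hypothetical vote counts for three stimuli (10 votes per pair).
example_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
example_scores = fit_bt(example_wins)
print([round(v, 3) for v in example_scores])
```

The recovered scores live on an interval scale with an arbitrary origin, which is why the sketch re-centers them to zero mean after every step.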

Computer Generated Imagery (CGI)

The CGI group is devoted to analyzing and evaluating computer-generated content, with a focus on gaming in particular. In this meeting, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) and Nabajeet Barman (Kingston University, UK) organized a special gaming session, in which researchers from several international institutions presented their work on this topic. Among them, Yu-Chih Chen (UT Austin LIVE Lab, USA) presented GAMIVAL, a video quality prediction model for mobile cloud gaming content. Also, Urvashi Pal (Akamai, USA) delivered a presentation on web streaming quality assessment via computer vision applications over cloud. Mathias Wien (RWTH Aachen University, Germany) provided updates on the ITU-T P.BBQCG work item, dataset and model development. Avinab Saha (UT Austin LIVE Lab, USA) presented a study of subjective and objective quality assessment of mobile cloud gaming videos. Finally, Irina Cotanis (Infovista, Sweden) and Karan Mitra (Luleå University of Technology, Sweden) presented their work towards QoE models for mobile cloud and virtual reality games.

No Reference Metrics (NORM)

The NORM group is an open collaborative project for developing no-reference metrics for monitoring visual service quality. In this meeting, Margaret Pinson (NTIA, USA) and Ioannis Katsavounidis (Meta, USA), two of the chairs of the group, provided a summary of NORM successes and a discussion of current efforts towards an improved complexity metric. In addition, there were six presentations dealing with related topics. C.-C. Jay Kuo (University of Southern California, USA) talked about blind visual quality assessment for mobile/edge computing. Vignesh V. Menon (University of Klagenfurt, Austria) presented the updates of the Video Quality Analyzer (VQA). Yilin Wang (Google/YouTube, USA) gave a talk on the recent updates on the Universal Video Quality (UVQ). Farhad Pakdaman (Tampere University, Finland) and Li Yu (Nanjing University, China) presented a low-complexity no-reference image quality assessment based on a multi-scale attention mechanism with natural scene statistics. Finally, Mikołaj Leszczuk (AGH University, Poland) presented his work on visual quality indicators adapted to resolution changes and on considering in-the-wild video content as a special case of user generated content and a system for its recognition.

Emerging Technologies Group (ETG)

The main objective of the ETG group is to address various aspects of multimedia that do not fall under the scope of any of the existing VQEG groups. The topics addressed are not necessarily directly related to “video quality” but can indirectly impact the work addressed as part of VQEG. This group aims to provide a common platform for people to gather together and discuss new emerging topics, discuss possible collaborations in the form of joint survey papers/whitepapers, funding proposals, etc.

One of the topics addressed by this group is the application of artificial-intelligence technologies to different domains, such as compression, super-resolution, and video quality assessment. In this sense, Saman Zadtootaghaj (Sony Interactive Entertainment, Germany) organized a panel session with experts from different companies (e.g., Netflix, Adobe, Meta, and Google) on deep learning in the video coding and video quality domains. On the same topic, Marcos Conde (Sony Interactive Entertainment, Germany) and David Minnen (Google, USA) gave a talk on generative compression and the challenges for quality assessment.

Another topic covered by this group is the greening of streaming and related trends. In this sense, Vignesh V. Menon and Samira Afzal (University of Klagenfurt, Austria) presented their work on green variable framerate encoding for adaptive live streaming. Also, Prajit T. Rajendran (Université Paris Saclay, France) and Vignesh V. Menon (University of Klagenfurt, Austria) delivered a presentation on energy-efficient live per-title encoding for adaptive streaming. Finally, Berivan Isik (Stanford University, USA) talked about sandwiched video compression to efficiently extend the reach of standard codecs with neural wrappers.

Joint Effort Group (JEG) – Hybrid

The JEG group originally focused on joint work to develop hybrid perceptual/bitstream metrics and has gradually evolved over time to include several areas of Video Quality Assessment (VQA), such as the creation of a large dataset for training such models using full-reference metrics instead of subjective metrics. In addition, the group will include under its activities the VQEG project Implementer’s Guide for Video Quality Metrics (IGVQM).

Apart from this, there were three presentations addressing related topics in this meeting. Nabajeet Barman (Kingston University, UK) presented a subjective dataset for multi-screen video streaming applications. Also, Lohic Fotio (Politecnico di Torino, Italy) presented his work on a “human-in-the-loop” training procedure for the artificial-intelligence-based observer (AIO) of a real subject, and on advances on the “template” for reporting DNN-based video quality metrics.

The website of the group includes a list of activities of interest, freely available publications, and other resources.

Immersive Media Group (IMG)

The IMG group is focused on the research on quality assessment of immersive media. The main joint activity going on within the group is the development of a test plan to evaluate the QoE of immersive interactive communication systems, which is carried out in collaboration with ITU-T through the work item P.IXC. In this meeting, Pablo Pérez (Nokia XR Lab, Spain) and Jesús Gutiérrez (Universidad Politécnica de Madrid, Spain) provided a report on the status of the test plan, including the test proposals from 13 different groups that have joined the activity, which will be launched in September.

In addition to this, Shirin Rafiei (RISE, Sweden) delivered a presentation on her work on human interaction in industrial tele-operated driving through a laboratory investigation.

Quality Assessment for Computer Vision Applications (QACoViA)

The goal of the QACoViA group is to study the visual quality requirements for computer vision methods, where the “final observer” is an algorithm. In this meeting, Avrajyoti Dutta (AGH University, Poland) delivered a presentation dealing with the subjective quality assessment of video summarization algorithms through a crowdsourcing approach.

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA)

This VQEG meeting was co-located with the rapporteur group meeting of ITU-T Study Group 12 – Question 19, coordinated by Chulhee Lee (Yonsei University, Korea). During the first two days of the week, the experts from ITU-T and VQEG worked together on various topics. For instance, there was an editing session to work together on the VQEG proposal to merge the ITU-T Recommendations P.910, P.911, and P.913, including updates with new methods. Another topic addressed during this meeting was the working item “P.obj-recog”, related to the development of an object-recognition-rate-estimation model in surveillance video of autonomous driving. In this sense, a liaison statement was also discussed with the VQEG AVHD group. Also in relation to this group, another liaison statement was discussed on the new work item “P.SMAR” on subjective tests for evaluating the user experience for mobile Augmented Reality (AR) applications.

Other updates

One interesting presentation was given by Mathias Wien (RWTH Aachen University, Germany) on the quality evaluation activities carried out within the MPEG Visual Quality Assessment group, including the expert viewing tests. This presentation and the follow-up discussions will help to strengthen the collaboration between VQEG and MPEG on video quality evaluation activities.

The next VQEG plenary meeting will take place in autumn 2023 and will be announced soon on the VQEG website.

MPEG Column: 143rd MPEG Meeting in Geneva, Switzerland

The 143rd MPEG meeting took place in person in Geneva, Switzerland. The official press release can be accessed here and includes the following details:

  • MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
  • MPEG reaches the First Milestone for two ISOBMFF Enhancements
  • MPEG ratifies Third Editions of VVC and VSEI
  • MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
  • MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
  • MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression

We have adjusted the press release to suit the audience of ACM SIGMM and emphasized research on video technologies. This edition of the MPEG column centers around ISOBMFF and video codecs. As always, the column will conclude with an update on MPEG-DASH.

ISOBMFF Enhancements

The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.

ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.
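For readers who have not worked with the file format directly, the sketch below walks the top-level boxes of an ISOBMFF byte stream. It relies only on the generic box header layout (a 32-bit big-endian size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "until the end"). It does not implement anything specific to ISO/IEC 23001-17; the function name and the tiny hand-built sample are assumptions for illustration.

```python
import struct
from typing import Iterator, Optional, Tuple

def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated input: stop instead of guessing
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size

# Hand-built sample: a 16-byte 'ftyp' box followed by a 12-byte 'mdat' box.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x00\x00"
    + struct.pack(">I4s", 12, b"mdat") + b"\x01\x02\x03\x04"
)
for box_type, start, stop in iter_boxes(sample):
    print(box_type, start, stop)
```

Container boxes can be traversed by calling the same function again on a box's payload range, which is how timing, colour, and sample description information would be reached in a real file.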

ISO/IEC 14496-15 (based on ISOBMFF) provides the basis for “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.

Research aspects: While the former, the carriage of uncompressed video and images in ISOBMFF, seems to be something obvious to be supported within a file format, the latter enables the use of neural network-based post-processing filters to enhance video quality after the decoding process, which is an active field of research. The current extensions of the file format provide a baseline for the evaluation (cf. also next section).

Video Codec Enhancements

MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).

The newly specified SEI messages include two systems-related SEI messages, (a) one for signaling of green metadata as specified in ISO/IEC 23001-11 and (b) the other for signaling of an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics SEI message and the neural-network post-filter activation SEI message have been added to AVC, HEVC, and VVC.

The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).

Research aspects: SEI messages for neural network post-filters (NNPF) for AVC, HEVC, and VVC, including systems support within the ISOBMFF, are a powerful tool(box) for interoperable visual quality enhancements at the client. This tool(box) will (i) allow for Quality of Experience (QoE) assessments and (ii) enable the analysis thereof across codecs once integrated within the corresponding reference software.


MPEG-DASH Updates

The current status of MPEG-DASH is depicted in the figure below:

The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:

  • ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
  • ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point, and the Draft Amendment (DAM) was approved at this meeting. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
  • ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.

Additionally, MPEG Technologies under Consideration (TuC) comprises a few new work items, such as content selection and adaptation logic based on device orientation and signalling of haptics data within DASH.

Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming meetings.

Research aspects: Random access has been extensively evaluated in the context of video coding but not (low latency) streaming. Additionally, the TuC item related to content selection and adaptation logic based on device orientation raises QoE issues to be further explored.

The 144th MPEG meeting will be held in Hannover from October 16 to 20, 2023. Click here for more information about MPEG meetings and their developments.

JPEG Column: 99th JPEG Meeting

JPEG Trust on a mission to re-establish trust in digital media

The 99th JPEG meeting was held online, from 24th to 28th April 2023.

Providing tools suitable for establishing provenance, authenticity and ownership of multimedia content is one of the most difficult challenges faced nowadays, considering the technological models that allow effective multimedia data manipulation and generation. As in the past, the JPEG Committee is again answering the emerging challenges in multimedia. JPEG Trust is a standard offering solutions to media authenticity, provenance and ownership.

Furthermore, learning-based coding standards, JPEG AI and JPEG Pleno Learning-based Point Cloud Coding, continue their development. New verification models that incorporate the technological developments resulting from verification experiments and contributions have been approved.

Also relevant, the Calls for Contributions on standardization of quality models of JPEG AIC and JPEG Pleno Light Field Quality Assessment received responses, which started a collaborative process to define new standards.

The 99th JPEG meeting had the following highlights:

Trust, Authenticity and Provenance.
  • New JPEG Trust international standard targets media authenticity
  • JPEG AI new verification model
  • JPEG DNA releases its call for proposals
  • JPEG Pleno Light Field Quality Assessment analyses the response to the call for contributions
  • JPEG AIC analyses the response to the call for contributions
  • JPEG XE identifies use cases and requirements for event based vision
  • JPEG Systems: JUMBF second edition is progressing to publication stage
  • JPEG NFT prepares a call for proposals
  • JPEG XS progress for its third edition

The following summarizes the major achievements during the 99th JPEG meeting.

New JPEG Trust international standard targets media authenticity

Drawing reliable conclusions about the authenticity of digital media is complicated, and becoming more so as AI-based synthetic media such as Deep Fakes and Generative Adversarial Networks (GANs) start appearing. Consumers of social media are challenged to assess the trustworthiness of the media they encounter, and agencies that depend on the authenticity of media assets must be concerned with mistaking fake media for real, with risks of real-world consequences.

To address this problem and to provide leadership in global interoperable media asset authenticity, JPEG initiated development of a new international standard: JPEG Trust. JPEG Trust defines a framework for establishing trust in media. This framework addresses aspects of authenticity, provenance and integrity through secure and reliable annotation of media assets throughout their life cycle. The first part, “Core foundation”, defines the JPEG Trust framework and provides building blocks for more elaborate use cases. It is expected that the standard will evolve over time and be extended with additional specifications.

JPEG Trust arises from a four-year exploration of requirements for addressing mis- and dis-information in online media, followed by a 2022 Call for Proposals, conducted by international experts from industry and academia from all over the world.

The new standard is expected to be published in 2024. To stay updated on JPEG Trust, please regularly check the JPEG website for the latest information.


JPEG AI

The JPEG AI activity progressed at this meeting with more than 60 technical contributions submitted for improvements and additions to the Verification Model (VM), which, after some discussion and analysis, resulted in several adoptions for integration into the future VM3.0. These adoptions target the speed-up of the decoding process, namely the replacement of the range coder by an asymmetric numeral system, support for multi-threading and/or single instruction multiple data operations, and parallel decoding with sub-streams. The JPEG AI context module was significantly accelerated with a new network architecture along with other synthesis transform and entropy decoding network simplifications. Moreover, a lightweight model targeting mobile devices was also adopted, providing 10%-15% compression efficiency gains over VVC Intra at just 20-30 kMAC/pxl. In this context, JPEG AI will start the development and evaluation of two JPEG AI VM configurations at two different operating points: lightweight and high.
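To put the quoted per-pixel complexity into perspective, the short calculation below converts kMAC/pxl into total multiply-accumulate operations for one image. The 3840x2160 resolution and the function name are assumptions chosen for illustration; the source only states the 20-30 kMAC/pxl range.

```python
def total_gmacs(width: int, height: int, kmac_per_pixel: float) -> float:
    """Total multiply-accumulate operations for one image, in GMAC."""
    return width * height * kmac_per_pixel * 1e3 / 1e9

# Assumed example resolution: 3840x2160 (about 8.3 megapixels).
for kmac in (20, 30):
    gmacs = total_gmacs(3840, 2160, kmac)
    print(f"{kmac} kMAC/pxl on 3840x2160: {gmacs:.1f} GMAC")
```

Under these assumptions, the lightweight operating point corresponds to roughly 166 to 249 GMAC per 4K image.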

At the 99th meeting, the JPEG AI requirements were reviewed and it was concluded that most of the key requirements will be achieved by the previously anticipated timeline for DIS (scheduled for Oct. 2023) and thus version 1 of the JPEG AI standard will go as planned without changes in its timeline and with a clear focus on image reconstruction. Some core requirements, such as those addressing computer vision and image processing tasks as well as progressive decoding, will be addressed in a version 2 along with other tools that further improve requirements already addressed in version 1, such as better compression efficiency.

JPEG Pleno Learning-based Point Cloud coding

The JPEG Pleno Point Cloud activity progressed at this meeting with a major improvement to its VM providing improved performance and control over the balance between the coding of geometry and colour via a split geometry and colour coding framework. Colour attribute information is encoded using JPEG AI resulting in enhanced performance and compatibility with the ecosystem of emerging high-performance JPEG codecs. Prior to the 100th JPEG Meeting, JPEG experts will investigate possible advancements to the VM in the areas of attention models, sparse tensor convolution, and support for residual lossless coding.


JPEG DNA

The JPEG Committee has been working on an exploration for coding of images in quaternary representations particularly suitable for image archival on DNA storage. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to noise introduced by the different stages of the storage process that is based on DNA synthetic polymers. During the 99th JPEG meeting, a final call for proposals for JPEG DNA was issued and made public, as a first concrete step towards standardization.

The final call for proposals for JPEG DNA is complemented by a JPEG DNA Common Test Conditions document which is also made public, describing details about the dataset, operating points, anchors and performance assessment methodologies and metrics that will be used to evaluate anchors and future proposals to be submitted. A set of exploration studies has validated the procedures outlined in the final call for proposals for JPEG DNA. The deadline for submission of proposals to the Call for Proposals for JPEG DNA is 2 October 2023, with a pre-registration due by 10 July 2023. The JPEG DNA international standard is expected to be published by early 2025.
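As a rough illustration of what a quaternary representation means, the sketch below maps each pair of bits to one of the four nucleotides and back. This naive mapping is not the JPEG DNA codec and ignores the biochemical constraints mentioned above; the bit-to-nucleotide assignment and all function names are assumptions. The last helper measures the longest run of identical nucleotides (homopolymers), which is commonly cited in the DNA storage literature as one such constraint and shows why a plain mapping is insufficient.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for ch in strand[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(ch)
        out.append(byte)
    return bytes(out)

def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    longest = run = 0
    prev = None
    for ch in strand:
        run = run + 1 if ch == prev else 1
        prev = ch
        longest = max(longest, run)
    return longest

print(bytes_to_dna(b"\x1b"))                            # ACGT
print(longest_homopolymer(bytes_to_dna(b"\x00\x00")))   # 8
```

A run of zero bytes turns into a long run of "A" under this mapping, which is exactly the kind of pattern a constraint-aware code would need to avoid.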

JPEG Pleno Light Field Quality Assessment

At the 99th JPEG meeting two contributions were received in response to the JPEG Pleno Final Call for Contributions (CfC) on Subjective Light Field Quality Assessment.

  • Contribution 1: presents a 3-step subjective quality assessment framework, with a pre-processing step; a scoring step; and a data processing step. The contribution includes a software implementation of the quality assessment framework.
  • Contribution 2: presents a multi-view light field dataset, comprising synthetic light fields. It provides RGB + ground-truth depth data, realistic and challenging blender scenes, with various textures, fine structures, rich depth, specularities, non-Lambertian areas, and difficult materials (water, patterns, etc).

The received contributions will be considered in the development of a modular framework based on a collaborative process addressing the use cases and requirements under the JPEG Pleno Quality Assessment of light fields standardization effort.


JPEG AIC

Three contributions in response to the JPEG Call for Contributions (CfC) on Subjective Image Quality Assessment were received at the 99th JPEG meeting. One contribution presented a new subjective quality assessment methodology that combines relative and absolute data. The second contribution reported a new subjective quality assessment methodology based on triplet comparison with boosting techniques. Finally, the last contribution reported a new pairwise sampling methodology.

These contributions will be considered in the development of the standard, following a collaborative process. Several core experiments were designed to assist the creation of a Working Draft (WD) for the future JPEG AIC Part 3 standard.


JPEG XE

The JPEG committee continued with the exploration activity on Event-based Vision, called JPEG XE. Event-based Vision revolves around a new and emerging image modality created by event-based visual sensors. At this meeting, the scope was defined to be the creation and development of a standard to represent events in an efficient way allowing interoperability between sensing, storage, and processing, targeting machine vision applications. Events in the context of this standard are defined as the messages that signal the result of an observation at a precise point in time, typically triggered by a detected change in the physical world. The exploration activity is currently working on the definition of the use cases and requirements.

An Ad-hoc Group has been established. To stay informed about the activities please join the event based imaging Ad-hoc Group mailing list.
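To make the notion of an event more concrete, the following sketch models an event as a small record and derives events from the difference between two intensity frames. This is not the JPEG XE representation, which is still being scoped; the field names, the threshold, and the frame-differencing approach are assumptions for illustration, and real event sensors report changes asynchronously per pixel rather than by comparing frames.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class Event:
    t_us: int      # timestamp of the observation, in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 if brightness increased, -1 if it decreased

def events_from_frames(prev: Sequence[Sequence[int]],
                       curr: Sequence[Sequence[int]],
                       t_us: int,
                       threshold: int = 10) -> List[Event]:
    """Emit one event per pixel whose intensity changed by at least threshold."""
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            diff = b - a
            if abs(diff) >= threshold:
                events.append(Event(t_us, x, y, 1 if diff > 0 else -1))
    return events

# Hypothetical 2x2 frames: only two pixels change by 10 or more.
frame_a = [[100, 100], [100, 100]]
frame_b = [[100, 120], [85, 105]]
for event in events_from_frames(frame_a, frame_b, t_us=1_000):
    print(event)
```

Pixels that do not change produce no output at all, which is the property that makes an efficient event representation attractive for machine vision pipelines.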


JPEG XL

The second editions of JPEG XL Part 1 (Core coding system) and Part 2 (File format) have proceeded to the DIS stage. These second editions provide clarifications, corrections and editorial improvements that will facilitate independent implementations. Experiments are planned to prepare for a second edition of JPEG XL Part 3 (Conformance testing), including conformance testing of the independent implementations J40, jxlatte, and jxl-oxide.

JPEG Systems

The second edition of JUMBF (JPEG Universal Metadata Box Format, ISO/IEC 19566-5) is progressing to the IS publication stage; the second edition brings new capabilities and support for additional types of media.


JPEG NFT

Many Non-Fungible Tokens (NFTs) point to assets represented in JPEG formats or can be represented in current and emerging formats under development by the JPEG Committee. However, various trust and security concerns have been raised about NFTs and the digital assets on which they rely. To better understand user requirements for media formats, the JPEG Committee conducted an exploration on NFTs. The scope of JPEG NFT is the creation of effective specifications that support a wide range of applications relying on NFTs applied to media assets. The standard will be secure, trustworthy and eco-friendly, allowing for an interoperable ecosystem relying on NFT within a single application or across applications. As a result of the exploration, at the 99th JPEG Meeting the committee released a “Draft Call for Proposals on JPEG NFT” and associated updated “Use Cases and Requirements for JPEG NFT”. Both documents are made publicly available for review and feedback.


The JPEG committee continued its work on the JPEG XS 3rd edition. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but with half of the required bandwidth. For Part 1 – Core coding tools – the Draft International Standard will proceed to ISO/IEC ballot. This is a significant step in the standardization process with all the core coding technology now final. Most notably, Part 1 adds a temporal decorrelation coding mode to further improve the coding efficiency, while keeping the low-latency and low-complexity core aspects of JPEG XS. Furthermore, Part 2 – Profiles and buffer models – and Part 3 – Transport and container formats – will proceed to Committee Draft consultation. Part 2 is important as it defines the conformance points for JPEG XS compliance. Completion of the JPEG XS 3rd edition standard is scheduled for January 2024.

Final Quote

“The creation of standardized tools to bring assurance of authenticity, provenance and ownership for multimedia content is the most efficient path to suppress the abusive use of fake media. JPEG Trust will be the first international standard that provides such tools,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 100 will be in Covilhã, Portugal, from 17 to 21 July 2023
  • No. 101 will be online from 30 October to 3 November 2023

A zip package containing the official JPEG logo and logos of all JPEG standards can be downloaded here.

Video Interviews at ACM Multimedia 2022

This column showcases a series of video interviews shot at ACM Multimedia 2022.
The social media editors-in-chief of the Records, Silvia Rossi and Conor Keighrey, interviewed the authors behind some of the most intriguing and compelling demos and interactive artworks. Silvia and Conor started this initiative and will continue it, when possible, at conferences supported by SIGMM.

ACM Multimedia is the premier international conference in the area of multimedia within the field of computer science.
As in every edition of ACM MM, the conference once again played host to riveting demonstrations and interactive showcases of the latest research concepts. These sessions serve a dual purpose: they stand as a testament to the presenters’ invaluable scientific and engineering contributions while also providing a unique opportunity for multimedia researchers and practitioners to delve into real-world applications, prototypes, and proofs-of-concept.

This dynamic setting is where conference attendees come face-to-face with groundbreaking multimedia systems. It’s a chance for them to gain insights into the innovative solutions and ideas that are actively shaping the future of this ever-evolving field. From visionary demonstrations of emerging technologies to interactive showcases that push the boundaries of creativity, these sessions are at the heart of what makes ACM MM a unique event in the world of multimedia.

Below is the list of video interviews with references to the corresponding authors and papers.

  • Varvara Guljajeva and Mar Canet Sola. 2022. Dream Painter: An Interactive Art Installation Bridging Audience Interaction, Robotics, and Creative AI. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7235–7236. https://doi.org/10.1145/3503161.3549976
  • Jorge Forero, Gilberto Bernardes, and Mónica Mendes. 2022. Emotional Machines: Toward Affective Virtual Environments. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7237–7238. https://doi.org/10.1145/3503161.3549973
  • Ignacio Reimat, Yanni Mei, Evangelos Alexiou, Jack Jansen, Jie Li, Shishir Subramanyam, Irene Viola, Johan Oomen, and Pablo Cesar. 2022. Mediascape XR: A Cultural Heritage Experience in Social VR. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6955–6957. https://doi.org/10.1145/3503161.3547732
  • Manuel Silva, Luana Santos, Luís Teixeira, and José Vasco Carvalho. 2022. All is Noise: In Search of Enlightenment, a VR Experience. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7223–7224. https://doi.org/10.1145/3503161.3549958
  • Pin-Xuan Liu, Tse-Yu Pan, Hsin-Shih Lin, Hung-Kuo Chu, and Min-Chun Hu. 2022. BetterSight: Immersive Vision Training for Basketball Players. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6979–6981. https://doi.org/10.1145/3503161.3547745
  • Tiago Fornelos, Pedro Valente, Rafael Ferreira, Diogo Tavares, Diogo Silva, David Semedo, Joao Magalhaes, and Nuno Correia. 2022. A Conversational Shopping Assistant for Online Virtual Stores. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6994–6996. https://doi.org/10.1145/3503161.3547738
  • Ting-Yang Kao, Tse-Yu Pan, Chen-Ni Chen, Tsung-Hsun Tsai, Hung-Kuo Chu, and Min-Chun Hu. 2022. ScoreActuary: Hoop-Centric Trajectory-Aware Network for Fine-Grained Basketball Shot Analysis. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 6991–6993. https://doi.org/10.1145/3503161.3547736
  • Maria Giovanna Donadio, Filippo Principi, Andrea Ferracani, Marco Bertini, and Alberto Del Bimbo. 2022. Engaging Museum Visitors with Gamification of Body and Facial Expressions. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7000–7002. https://doi.org/10.1145/3503161.3547744

Explainable Artificial Intelligence for Quality of Experience Modelling

Data-driven Quality of Experience (QoE) modelling using Machine Learning (ML) arose as a promising alternative to the cumbersome and potentially biased manual QoE modelling. However, the reasoning of a majority of ML models is not explainable due to their black-box characteristics, which prevents us from gaining insights into how the model actually relates QoE influence factors to QoE. Yet these fundamental relationships are highly relevant for QoE researchers as well as service and network providers.

With the emerging field of eXplainable Artificial Intelligence (XAI) and its recent technological advances, these issues can now be resolved. As a consequence, XAI enables data-driven QoE modelling to obtain generalizable QoE models and provides us simultaneously with the model’s reasoning on which QoE factors are relevant and how they affect the QoE score. In this work, we showcase the feasibility of explainable data-driven QoE modelling for video streaming and web browsing, before we discuss the opportunities and challenges of deploying XAI for QoE modelling.


In order to enhance services and networks and prevent users from switching to competitors, researchers and service providers need a deep understanding of the factors that influence the Quality of Experience (QoE) [1]. However, developing an effective QoE model is a complex and costly endeavour. Typically, it requires dedicated and extensive studies, which can only cover a limited portion of the parameter space and may be influenced by the study design. These studies often generate a relatively small sample of QoE ratings from a comparatively small population, making them vulnerable to poor performance when applied to unseen data. Moreover, the process of collecting and processing data for QoE modelling is not only arduous and time-consuming, but it can also introduce biases and self-fulfilling prophecies, such as perceiving an exponential relationship when one is expected.

To overcome these challenges, data-driven QoE modelling utilizing machine learning (ML) has emerged as a promising alternative, especially in scenarios where there is a wealth of data available or where data streams can be continuously obtained. A notable example is the ITU-T standard P.1203 [2], which estimates video streaming QoE by combining manual modelling – accounting for 75% of the Mean Opinion Score (MOS) estimation – and ML-based Random Forest modelling – accounting for the remaining 25%. The inclusion of the ML component in P.1203 indicates its ability to enhance performance. However, the inner workings of P.1203’s Random Forest model, specifically how it calculates the output score, are not obvious. Also, the survey in [3] shows that ML-based QoE modelling in multimedia systems is already widely used, including Virtual Reality, 360-degree video, and gaming. However, the QoE models are based on shallow learning methods, e.g., Support Vector Machines (SVM), or on deep learning methods, which lack explainability. Thus, it is difficult to understand what QoE factors are relevant and how they affect the QoE score [13], resulting in a lack of trust in data-driven QoE models and impeding their widespread adoption by researchers and providers [14].

Fortunately, recent advancements in the field of eXplainable Artificial Intelligence (XAI) [6] have paved the way for interpretable ML-based QoE models, thereby fostering trust between stakeholders and the QoE model. These advancements encompass a diverse range of XAI techniques that can be applied to existing black-box models, as well as novel and sophisticated ML models designed with interpretability in mind. Considering the use case of modelling video streaming QoE from real subjective ratings, the work in [4] evaluates the feasibility of explainable, data-driven QoE modelling and discusses the deployment of XAI for QoE research.

The utilization of XAI for QoE modelling brings several benefits. Not only does it speed up the modelling process, but it also enables the identification of the most influential QoE factors and their fundamental relationships with the Mean Opinion Score (MOS). Furthermore, it helps eliminate biases and preferences from different research teams and datasets that could inadvertently influence the model. All that is required is a selective dataset with descriptive features and corresponding QoE ratings (labels), which covers the most important QoE influence factors and, in particular, also rare events, e.g., many stalling events in a session. Generating such complete datasets is still an open research question and calls for data-centric AI [15]. By merging datasets from various studies, more robust and generalizable QoE models can theoretically be created, provided that these studies share a common ground. Another benefit is that the models can be automatically refined over time as new QoE studies are conducted and additional data becomes available.

XAI: eXplainable Artificial Intelligence

For a comprehensive understanding of eXplainable Artificial Intelligence (XAI), a general overview can be found in [5], while a thorough survey and taxonomy of XAI methods is available in [6].

XAI methods can be categorized into two main types: local and global explainability techniques. Local explainability aims to provide explanations for individual stimuli in terms of QoE factors and QoE ratings. On the other hand, global explainability focuses on offering general reasoning for how a model derives the QoE rating from the underlying QoE factors. Furthermore, XAI methods can be classified into post-hoc explainers and interpretable models.

Post-hoc explainers [6] are commonly used to explain various black-box models, such as neural networks or ensemble techniques, after they have been trained. One widely utilized post-hoc explainer is SHAP values [7], which originate from game theory. SHAP values quantify the contribution of each feature to the model’s prediction by considering all possible feature subsets and learning a model for each subset. Other post-hoc explainers include LIME and Anchors, although they are limited to classification tasks.
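The subset-based idea behind SHAP values can be illustrated with a minimal sketch. The function `toy_value` is a hypothetical stand-in for "the model's prediction when only the features in a coalition are known"; its feature names and numbers are invented for illustration and are not taken from [4] or [7]. The code computes exact Shapley values by enumerating all feature subsets, which is only feasible for a handful of features; practical SHAP implementations rely on approximations instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value` over `features`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that does not contain f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical value function (illustrative numbers only)."""
    score = 3.0  # baseline prediction with no feature known
    if "bitrate" in coalition:
        score += 1.0
    if "stalling" in coalition:
        score -= 1.5
    if "bitrate" in coalition and "stalling" in coalition:
        score += 0.5  # interaction term, split evenly between both features
    return score


features = ["bitrate", "stalling", "switches"]
phi = shapley_values(features, toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

In this toy setting the contributions sum to the difference between the prediction with all features and the baseline, which is the property that makes Shapley-based attributions additive.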

Interpretable models, by design, provide explanations for how the model arrives at its output. Well-known interpretable models include linear models and decision trees. Additionally, generalized additive models (GAM) are gaining recognition as interpretable models.

A GAM is a generalized linear model in which the model output is computed by summing up each of the arbitrarily transformed input features along with a bias [8]. The form of a GAM enables a direct interpretation of the model by analyzing the learned functions and the transformed inputs, which makes it possible to estimate the influence of a feature. Two state-of-the-art ML-based GAM models are Explainable Boosting Machine (EBM) [9] and Neural Additive Model (NAM) [8]. While EBM uses decision trees to learn the functions and gradient boosting to improve training, NAM utilizes arbitrary neural networks to learn the functions, resulting in a neural network architecture with one sub-network per feature. EBM extends GAM by also considering additional pairwise feature interaction terms while maintaining explainability.
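The additive structure described above can be sketched in a few lines. The shape functions, feature names, and bias below are hypothetical and hand-written; in EBM or NAM they would be learned from data (by boosted trees or by one sub-network per feature). The sketch only shows why a GAM is directly interpretable: the prediction is a bias plus one term per feature, so each term can be read off separately.

```python
import math

# Hypothetical, hand-written shape functions (one per feature).
# In EBM/NAM these functions would be learned from the dataset.
shape_functions = {
    "avg_bitrate_kbps": lambda x: 0.6 * math.log10(x / 1000.0),
    "stalling_s": lambda x: -0.08 * x,
    "quality_switches": lambda x: -0.05 * x,
}
bias = 3.5  # hypothetical intercept, e.g. the average MOS of a dataset


def gam_contributions(sample):
    """Per-feature contribution f_i(x_i) for one sample."""
    return {name: f(sample[name]) for name, f in shape_functions.items()}


def gam_predict(sample):
    """GAM output: bias plus the sum of all per-feature contributions."""
    return bias + sum(gam_contributions(sample).values())


sample = {"avg_bitrate_kbps": 1000.0, "stalling_s": 5.0, "quality_switches": 2.0}
print(gam_contributions(sample))
print(round(gam_predict(sample), 2))
```

Plotting each shape function over the range of its feature gives the kind of per-feature curve that is analyzed in the figures of this article.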

Exemplary XAI-based QoE Modelling using GAMs

We demonstrate the learned predictor functions for both EBM (red) and NAM (blue) on a video QoE dataset in Figure 1. All technical details about the dataset and the methodology can be found in [4]. We observe that both models provide smooth shape functions, which are easy to interpret. EBM and NAM differ only marginally and mostly in areas where the data density is low. Here, EBM outperforms NAM by overfitting on single data points using the feature interaction terms. We can see this, for example, for a high total stalling duration and a high number of quality switches, where at some point EBM stops the negative trend and strongly contrasts its previous trend to improve predictions for extreme outliers.

Figure 1: EBM and NAM for video QoE modelling

Using the smooth predictor functions, it is easy to apply curve fitting. In the bottom right plot of Figure 1, we fit the average bitrate predictor function of NAM, which was shifted by the average MOS of the dataset to obtain the original MOS scale on the y-axis, on an inverted x-axis using exponential (IQX), logarithmic (WQL), and linear functions (LIN). Note that this constitutes a univariate mapping of average bitrate to MOS, neglecting the other influencing factors. We observe that our predictor function follows the WQL hypothesis [10] (red) with a high R²=0.967. This is in line with the mechanics of P.1203, where the authors of [11] showed the same logarithmic behavior for the bitrate in mode 0.
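A logarithmic fit of this kind, together with its R², can be reproduced with ordinary least squares on the log-transformed input. The sketch below is a simplified illustration and not the procedure of [4]: it ignores the axis inversion mentioned above, and the bitrate and MOS values are hypothetical placeholders rather than points of the actual predictor function.

```python
import math


def fit_log(x, y):
    """Least-squares fit of y = a * ln(x) + b; returns (a, b, r_squared)."""
    t = [math.log(v) for v in x]  # linearize: y is linear in ln(x)
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    a = sxy / sxx
    b = mean_y - a * mean_t
    ss_res = sum((yi - (a * ti + b)) ** 2 for ti, yi in zip(t, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical samples of a bitrate predictor function (not real data)
bitrate_kbps = [250, 500, 1000, 2000, 4000, 8000]
mos = [2.1, 2.6, 3.2, 3.6, 4.1, 4.4]

a, b, r2 = fit_log(bitrate_kbps, mos)
print(f"MOS ~ {a:.3f} * ln(bitrate) + {b:.3f}, R^2 = {r2:.3f}")
```

Exponential and linear candidates can be compared in the same way by changing the transformation and comparing the resulting R² values.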

Figure 2: EBM and NAM for web QoE modelling

As the presented XAI methods are universally applicable to any QoE dataset, Figure 2 shows a similar GAM-based QoE modelling for a web QoE dataset obtained from [12]. We can see that the loading behavior in terms of ByteIndex-Page Load Time (BI-PLT) and time to last byte (TTLB) has the strongest impact on web QoE. Moreover, we see that different URLs/webpages have a different effect on the MOS, which shows that web QoE is content dependent. In summary, using GAMs, we obtain valuable, easy-to-interpret functions, which explain fundamental relationships between QoE factors and MOS. Nevertheless, further XAI methods can be utilized, as detailed in [4,5,6].


In addition to expediting the modelling process and mitigating modelling biases, data-driven QoE modelling offers significant advantages in terms of improved accuracy and generalizability compared to manual QoE models. ML-based models are not constrained to specific classes of continuous functions typically used in manual modelling, allowing them to capture more complex relationships present in the data. However, a challenge with ML-based models is the risk of overfitting, where the model becomes overly sensitive to noise and fails to capture the underlying relationships. Overfitting can be avoided through techniques like model regularization or by collecting sufficiently large or complete datasets.

Successful implementation of data-driven QoE modelling relies on purposeful data collection. It is crucial to ensure that all (or at least the most important) QoE factors are included in the dataset, covering their full parameter range with an adequate number of samples. Controlled lab or crowdsourcing studies can define feature values easily, but budget constraints (time and cost) often limit data collection to a small set of selected feature values. Conversely, field studies can encompass a broader range of feature values observed in real-world scenarios, but they may only gather limited data samples for rare events, such as video sessions with numerous stalling events. To prevent data bias, it is essential to balance feature values, which may require purposefully generating rare events in the field. Additionally, thorough data cleaning is necessary. While it is possible to impute missing features resulting from measurement errors, doing so increases the risk of introducing bias. Hence, it is preferable to filter out missing or unusual feature values.

Moreover, adding new data and retraining an ML model is a natural and straightforward process in data-driven modelling, offering long-term advantages. Eventually, data-driven QoE models would be capable of handling concept drift, which refers to changes in the importance of influencing factors over time, such as altered user expectations. However, QoE studies are rarely conducted as temporal and population-based snapshots, limiting frequent model updates. Ideally, a pipeline could be established to provide a continuous stream of features and QoE ratings, enabling online learning and ensuring the QoE models remain up to date. Although challenging for research endeavors, service providers could incorporate such QoE feedback streams into their applications.

Comparing black-box and interpretable ML models, there is a slight trade-off between performance and explainability. However, as shown in [4], it should be negligible in the context of QoE modelling. Instead, XAI makes it possible to fully understand the model decisions, identifying relevant QoE factors and their relationships to the QoE score. Nevertheless, it has to be considered that explaining models becomes inherently more difficult when the number of input features increases. Highly correlated features and interactions may further lead to misinterpretations when using XAI since the influence of a feature may also depend on other features. To obtain reliable and trustworthy explainable models, it is, therefore, crucial to exclude highly correlated features.
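One simple way to exclude highly correlated features is a greedy filter on pairwise Pearson correlation, sketched below. This is an illustrative assumption rather than the procedure used in [4]: the threshold of 0.9, the feature names, and the values are hypothetical, and the filter keeps whichever feature of a correlated pair appears first.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| <= threshold to all kept ones."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (illustrative values only)
columns = {
    "stalling_count": [0, 1, 2, 3, 4, 5],
    "stalling_s": [0.0, 2.1, 3.9, 6.2, 8.0, 9.8],
    "avg_bitrate_kbps": [4000, 800, 2500, 1200, 3000, 500],
}
kept = drop_correlated(columns)
print(kept)
```

Here the stalling duration is dropped because it is almost perfectly correlated with the number of stalling events, so the remaining shape functions can be interpreted independently of each other.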

Finally, although we demonstrated XAI-based QoE modelling only for video streaming and web browsing, from a research perspective, it is important to understand that the whole process is easily applicable in other domains like speech or gaming. Apart from that, it can also be highly beneficial for providers of services and networks to use XAI when implementing continuous QoE monitoring. They could integrate visualizations of trends like Figure 1 or Figure 2 into dashboards, making it easy to obtain a deeper understanding of the QoE in their system.


In conclusion, the progress in technology has made data-driven explainable QoE modeling suitable for implementation. As a result, it is crucial for researchers and service providers to consider adopting XAI-based QoE modeling to gain a comprehensive and broader understanding of the factors influencing QoE and their connection to users’ subjective experiences. By doing so, they can enhance services and networks in terms of QoE, effectively preventing user churn and minimizing revenue losses.


[1] K. Brunnström, S. A. Beker, K. De Moor, A. Dooms, S. Egger, M.-N. Garcia, T. Hossfeld, S. Jumisko-Pyykkö, C. Keimel, M.-C. Larabi et al., “Qualinet White Paper on Definitions of Quality of Experience,” 2013.

[2] W. Robitza, S. Göring, A. Raake, D. Lindegren, G. Heikkilä, J. Gustafsson, P. List, B. Feiten, U. Wüstenhagen, M.-N. Garcia et al., “HTTP Adaptive Streaming QoE Estimation with ITU-T Rec. P.1203: Open Databases and Software,” in ACM MMSys, 2018.

[3] G. Kougioumtzidis, V. Poulkov, Z. D. Zaharis, and P. I. Lazaridis, “A Survey on Multimedia Services QoE Assessment and Machine Learning-Based Prediction,” IEEE Access, 2022.

[4] N. Wehner, A. Seufert, T. Hoßfeld, and M. Seufert, “Explainable Data-Driven QoE Modelling with XAI,” QoMEX, 2023.

[5] C. Molnar, Interpretable Machine Learning, 2nd ed., 2022. Available: https://christophm.github.io/interpretable-ml-book

[6] A. B. Arrieta, N. Díaz-Rodríguez et al., “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI,” Information Fusion, 2020.

[7] S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” NIPS, 2017.

[8] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, R. Caruana, and G. E. Hinton, “Neural Additive Models: Interpretable Machine Learning with Neural Nets,” NIPS, 2021.

[9] H. Nori, S. Jenkins, P. Koch, and R. Caruana, “InterpretML: A Unified Framework for Machine Learning Interpretability,” arXiv preprint arXiv:1909.09223, 2019.

[10] T. Hoßfeld, R. Schatz, E. Biersack, and L. Plissonneau, “Internet Video Delivery in YouTube: From Traffic Measurements to Quality of Experience,” in Data Traffic Monitoring and Analysis, 2013.

[11] M. Seufert, N. Wehner, and P. Casas, “Studying the Impact of HAS QoE Factors on the Standardized QoE Model P.1203,” in ICDCS, 2018.

[12] D. N. da Hora, A. S. Asrese, V. Christophides, R. Teixeira, and D. Rossi, “Narrowing the Gap Between QoS Metrics and Web QoE Using Above-the-fold Metrics,” PAM, 2018.

[13] A. Seufert, F. Wamser, D. Yarish, H. Macdonald, and T. Hoßfeld, “QoE Models in the Wild: Comparing Video QoE Models Using a Crowdsourced Data Set,” in QoMEX, 2021.

[14] D. Shin, “The Effects of Explainability and Causability on Perception, Trust, and Acceptance: Implications for Explainable AI,” International Journal of Human-Computer Studies, 2021.

[15] D. Zha, Z. P. Bhat, K. H. Lai, F. Yang, and X. Hu, “Data-centric AI: Perspectives and Challenges,” in SIAM International Conference on Data Mining, 2023.

Students Report from ACM MMSys 2023

The 14th ACM Multimedia Systems Conference (with the associated workshops: NOSSDAV 2023, MMVE 2023, and the first edition of GMSys 2023) took place from 7 to 10 June 2023 in Vancouver, Canada. The MMSys conference brings together researchers in multimedia systems to showcase and exchange their cutting-edge research findings. Once again, there were technical talks spanning various multimedia domains and inspiring keynote presentations. Participants also had the opportunity to further interact with colleagues while enjoying the sunset with a 360° view of Vancouver on the Lookout tower or during a dinner in the core of the rainforest. Additionally, this year’s event included a special session dedicated to the memory of Dr. Kuan-Ta Chen, to honor his invaluable contributions to the multimedia community and to inspire the future generation of researchers.

To encourage junior researchers to participate on-site, SIGMM has sponsored a group of students with Student Travel Grant Awards. For many of them, this was their first time presenting at an international conference, and it was a wonderful experience. In this article, the recipients of the travel grants share their experiences at MMSys 2023.

Mike Vandersanden, PhD student from Hasselt University, Belgium

As a new PhD student starting my professional academic career less than a year before MMSys ’23, I focused on finding my place in the academic community. My advisor and colleagues encouraged me to achieve two goals: get feedback on my research and build a network. I submitted my first paper and it got accepted for the Doctoral Symposium at the conference, which was a great opportunity to work towards achieving my goals. Presenting my paper allowed me to receive helpful feedback, have interesting discussions, and gain new perspectives. It was motivating to see people interested in my work. During the rest of the conference, I connected with many attendees from different parts of the world. The social events were a great way to meet others, and we also had enjoyable evenings downtown. Upon returning home, I was happy to report to my advisor that I accomplished all my goals for the first year. I am grateful for receiving a student travel grant, as it made it easier to travel to another continent. It also gave me the freedom to manage my budget and increases my chances of attending the conference again next year.

May Lim, PhD student from National University of Singapore, Singapore

I was both excited and nervous for MMSys 2023 as it was not only my first in-person conference but also in a country I had never visited before. The conference turned out to be one of the most unforgettable and pleasant experiences I have ever had. It was well-organized with very insightful presentations, many opportunities to interact and exchange contacts with fellow researchers, and not forgetting the organizers’ thoughtful efforts to ensure the great comfort and welfare of the participants. Vancouver’s weather and people were very kind as well.

I am thankful for the travel grant and the support from ACM SIGMM was truly heartening. I hope to continue to be part of this community and pay it forward in other ways.

Tiago Soares da Costa, PhD student from FEUP, Portugal

MMSys 2023 marked my return to in-person conferences and was one of the most well-organized conferences I have had the pleasure to participate in. After several virtual conferences, being able to actively meet and discuss interesting topics related to multimedia with fellow researchers was a breath of fresh air. The keynote presentation from Klara Nahrstedt was one of my highlights from MMSys 2023, due to its extensive focus on multi-view streaming, one of the main topics of my PhD research. Ihab Amer was another welcome surprise, presenting us with the current trends in AI encoding from one of the leading tech enterprises, AMD. Regarding paper presentations, I have to highlight the following works: 1) “The AD△ER Framework: Tools for Event Video Representations”, for providing us with a new approach to frameless videos; 2) “Remote Expert Assistance System for Mixed-HMD Clients over 5G Infrastructure”, for delivering an impressive tech demonstration; 3) “FleXR: A System Enabling Flexibly Distributed Extended Reality”, for presenting us with a distributed stream processing solution which can be effectively applied to XR-based environments. As for the social events from MMSys 2023, the sights and sounds from Grouse Mountain and the impressive view from the Vancouver Lookout Tower were among my favourite moments in Vancouver, which I will forever cherish. Overall, MMSys 2023 was an amazing conference and I’m particularly grateful to the SIGMM committee for providing me with the travel grant.

Yu-Szu Wei from National Tsing Hua University, Taiwan

It is a great honor for me to receive the student travel grant and I appreciate it so much. ACM MMSys 2023 was my first in-person experience attending an international conference, and it certainly was a fantastic experience for me. I met lots of astonishing researchers and volunteers who solve problems with different, creative, and novel approaches. I exchanged my ideas with them and learned a lot from them. The keynote sessions also gave me brand-new mindsets, showing me that there are lots of issues for us to investigate and deal with. The most impressive thing for me was to stand on the stage and present my work to those experts. I’m so proud of myself for delivering my research ideas in front of the public and gaining abundant feedback from the audience.

Thanks to the committee that organized this awesome event, and provided me the travel grant to attend the conference. I’m looking forward to attending ACM MMSys again in the future. 

Goodbye Multidisciplinary Column!

In June 2017, we were invited to serve as editors of the newly established column on multidisciplinary aspects of Multimedia. Our major goal back then was to portray a look beyond ‘the big pond’ — multimedia research that is.

We set out to establish a multidisciplinary dialogue within the multimedia community and raise awareness for, as well as underline mutual benefits between neighbouring disciplines. Towards this end, we chose various formats: interviews of peers whose work sits at the intersection of disciplines, and opinion-based articles on multidisciplinary aspects of multimedia, also including community and conference spotlights.

We look back at a rich volume of articles. Over the past 5 years, we gave voice to 6 peers who work at the intersection of multimedia and other disciplines, e.g. accessibility, musical interfaces, digital naturalism or security and privacy. One common recurring theme amongst those interviewed is that they draw upon a variety of disciplines in their daily work. As Andy Quitmeyer put it, his “work is anti-disciplinary. Instead of relying on [a] specific field of practice, the work simply sets out towards some basic goals and happily uses any means necessary to get there. Currently, this includes a blend of naturalistic experimentation, performance art, filmmaking, interaction design, software and hardware engineering, industrial design, ergonomics illustration, and storytelling.” We also discussed grand research challenges for our community. In retrospect, these were again mostly interdisciplinary, e.g. the likes of ‘universal design’, ‘generative everything’, the blending of real and virtual worlds through AI-powered multimedia and ‘reproducibility, openness and accessibility of research and communities’. In this, Odette Scharenborg also emphasized the importance of being a visible role model and of ensuring that a diverse user audience is accounted for, as exemplified in her work on “making speech technology available for everyone, irrespective of how one speaks and what language one speaks”.

As for our opinion-based articles, we both highlighted communities that actively fostered interdisciplinarity (assistive augmentation and music information retrieval). Next to this, we shared further examples of ways to inclusively teach and design, as well as establish communities and reach more diverse audiences. Here, we often gave examples in which more established infrastructures in the academic community (such as conferences and workshops) could be combined with lesser-trodden paths of outreach.

Making connections between disciplines takes energy and commitment, which often needs to be invested next to other duties and services. In this, lately, we realized that both of us do not have the necessary capacity anymore to continue this series of columns. While this last piece marks the end of this column, for now, we are positive this column gave stage to multidisciplinary dialogues within, and inspirational to, our community. Given the grand research challenges our fields of research face, we speculate that inter- and multidisciplinary work will remain key to working towards addressing those challenges–there is ‘multi’ in multimedia.

About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber.

Editor Biographies


Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.


Dr. Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com