VQEG Column: VQEG Meeting Dec. 2020 (virtual/online)


Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
The last VQEG plenary meeting took place online from 14 to 18 December 2020. Given the current circumstances, it was held entirely online for the second time, with multiple sessions distributed over five to six hours each day to allow remote participation from different time zones. About 130 participants from 24 countries registered for the meeting and could attend the presentations and discussions that took place in all working groups.
This column provides an overview of this meeting, while all the information, minutes, files (including the presented slides), and video recordings from the meeting are available online on the VQEG meeting website. As highlights of interest for the SIGMM community, apart from several presentations of state-of-the-art works, relevant contributions to ITU recommendations related to multimedia quality assessment were reported from various groups (e.g., on adaptive bitrate streaming services, on subjective quality assessment of 360-degree videos, on statistical analysis of quality assessments, and on gaming applications). The new group on quality assessment for health applications was launched, and a session on 5G use cases took place, as well as a workshop dedicated to user testing during Covid-19. In addition, new efforts have been launched on quality metrics for live media streaming applications, and on providing guidelines on implementing objective video quality metrics (beyond PSNR) to the video compression community.
We encourage those readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors, to follow them and get involved.

Overview of VQEG Projects

Audiovisual HD (AVHD)

The AVHD/P.NATS2 project was a joint collaboration between VQEG and ITU SG12, whose goal was to develop a multitude of objective models, varying in terms of complexity, type of input, and use cases, for the assessment of video quality in adaptive bitrate streaming services over reliable transport up to 4K. The report of this project, which finished in January 2020, was approved in this meeting. In summary, it resulted in 10 model categories with models trained and validated on 26 subjective datasets. This activity resulted in 4 ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during this effort, and a journal publication reporting details on the validation tests [5]. In this sense, one presentation by Alexander Raake (TU Ilmenau) provided details on the P.NATS Phase 2 project and the resulting ITU recommendations, while details of the processing chain used in the project were presented by Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson).
In addition to this activity, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on the research on QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project within AVHD group on this topic.

Psycho-Physiological Quality Assessment (PsyPhyQA)

The chairs of the PsyPhyQA group provided an update on the activities carried out. A test plan for psychophysiological video quality assessment has been established, and the group is currently aiming to develop ideas for running quality assessment tests with psychophysiological measures in times of a pandemic and to collect and discuss ideas about possible joint works. In addition, the project is trying to learn about physiological correlates of simulator sickness, and in this sense, a presentation was delivered by J.P. Tauscher (Technische Universität Braunschweig) on exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on visual fidelity of tone mapping operators from gaze data using HMM [7].

Quality Assessment for Health applications (QAH)

This was the first meeting for this new QAH group. The chairs reported on the first audio call, which took place in November to launch the project and to learn how many people are interested in it, what each member has already done on medical images, what each member wants to do in this joint project, etc.
The plenary meeting served to collect ideas about possible joint works and to share experiences on related studies. In this sense, Lucie Lévêque (Université Gustave Eiffel) presented a review on subjective assessment of the perceived quality of medical images and videos, Maria Martini (Kingston University London) talked about the suitability of VMAF for quality assessment of medical videos (ultrasound & wireless capsule endoscopy), and Jorge Caviedes (ASU) delivered a presentation on cognition inspired diagnostic image quality models.

Statistical Analysis Methods (SAM)

The update report from the SAM group presented the ongoing progress on new methods for data analysis, including the discussion with ITU-T (P.913 [9]) and ITU-R (BT.500 [8]) about including a new method in the recommendations.
Several interesting presentations related to the ongoing work within SAM were delivered. For instance, Jakub Nawala (AGH University) presented “su-JSON”, a uniform JSON-based subjective data format, as well as his work on describing subjective experiment consistency by p-value P–P plots. An interesting discussion was raised by Lucjan Janowski (AGH University) on how to define the quality of a single sequence, analyzing different perspectives (e.g., crowd, experts, psychology, etc.). Also, Babak Naderi (TU Berlin) presented an analysis of the relation between Mean Opinion Score (MOS) and rank-based statistics. Recent advances on the Netflix quality metric VMAF were presented by Zhi Li (Netflix), especially on the properties of VMAF in the presence of image enhancement. Finally, two more presentations addressed the progress on statistical analyses of quality assessment data, one by Margaret Pinson (NTIA/ITS) on the computation of confidence intervals, and one by Suiyi Ling (Université de Nantes) on a probabilistic model to recover the ground truth and annotators’ behavior.
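To make the notion of a confidence interval around a MOS concrete, the following is a minimal sketch and not the method presented at the meeting. It assumes a normal approximation (quantile 1.96 for roughly 95% coverage), which is only reasonable for larger subject panels; the function name and the ratings are hypothetical.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of the confidence interval) for one stimulus.

    ratings: opinion scores from individual subjects (e.g. a 1..5 scale).
    z: normal quantile; 1.96 gives an approximate 95% interval.
       Assumption: normal approximation, not a Student-t interval.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = z * sd / math.sqrt(n)
    return mos, half_width


# Hypothetical ratings from 8 subjects for a single video sequence.
scores = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mos_with_ci(scores)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With small panels the interval is wide, which is one reason why the way such intervals are computed matters when comparing conditions.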

Computer Generated Imagery (CGI)

The report from the chairs of the CGI group covered the progress on the research on assessment methodologies for quality assessment of gaming services (e.g., ITU-T P.809 [10]), on crowdsourcing quality assessment for gaming application (P.808 [11]), on quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and on models (signal-, bitstream-, and parametric-based models) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.).
In terms of planned activities, the group is targeting the generation of new gaming datasets and tools for metrics to assess gaming QoE, and is also aiming at identifying topics of interest in CGI other than gaming content.
In addition, there was a presentation on updates on gaming standardization activities and deep learning models for gaming quality prediction by Saman Zadtootaghaj (TU Berlin), another one on subjective assessment of multi-dimensional aesthetic assessment for mobile game images by Suiyi Ling (Université de Nantes), and one addressing quality assessment of gaming videos compressed via AV1 by Maria Martini (Kingston University London), leading to interesting discussions on those topics.

No Reference Metrics (NORM)

The session for NORM group included a presentation on the differences among existing implementations of spatial and temporal perceptual information indices (SI and TI as defined in ITU-T P.910 [13]) by Cosmin Stejerean (Facebook), which led to an open discussion and to the agreement on launching an effort to clarify the ambiguous details that have led to different implementations (and different results), to generate test vectors for reference and validation of the implementations and to address the computation of these indicators for HDR content. In addition, Margaret Pinson (NTIA/ITS) presented the paradigm of no-reference metric research analyzing design problems and presenting a framework for collaborative development of no-reference metrics for image and video quality. Finally, Ioannis Katsavounidis (Facebook) delivered a talk on addressing the addition of video quality metadata in compressed bitstreams. Further discussions on these topics are planned in the next month within the group.
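As an illustration of where implementations can diverge, the following minimal sketch computes SI and TI along the lines commonly attributed to ITU-T P.910: SI as the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI as the maximum over time of the spatial standard deviation of successive frame differences. The choices marked as assumptions (population standard deviation, skipping border pixels, frames as plain lists of rows) are the kind of detail that may differ between implementations; the function names are hypothetical, and the pure-Python loops are meant for illustration rather than for real video.

```python
import math
import statistics


def sobel_magnitude(frame):
    """Sobel gradient magnitude of a luma frame given as a list of rows.

    Assumption: border pixels are skipped (only the interior is filtered).
    """
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def si_ti(frames):
    """Return (SI, TI) for a sequence of equally sized luma frames.

    Assumption: population standard deviation (pstdev) over space.
    """
    si = max(statistics.pstdev(sobel_magnitude(f)) for f in frames)
    ti_values = []
    for prev, cur in zip(frames, frames[1:]):
        diff = [c - p for row_p, row_c in zip(prev, cur)
                for p, c in zip(row_p, row_c)]
        ti_values.append(statistics.pstdev(diff))
    ti = max(ti_values) if ti_values else 0.0
    return si, ti


# Toy 4x6 frames: a flat grey frame followed by a frame with a vertical edge.
flat = [[100] * 6 for _ in range(4)]
edge = [[0, 0, 0, 200, 200, 200] for _ in range(4)]
print(si_ti([flat, edge]))
```

Switching to the sample standard deviation, or padding the borders instead of skipping them, changes the numbers, which is why shared test vectors are useful for validating implementations.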

Joint Effort Group (JEG) – Hybrid

The JEG-Hybrid group is currently working in collaboration with Sky Group on determining when video quality metrics are likely to inaccurately predict the MOS and on modelling single observers’ quality perception based on artificial intelligence techniques. In this sense, Lohic Fotio (Politecnico di Torino) presented his work on artificial intelligence-based observers for media quality assessment. Also, together with Florence Agboma (Sky UK), he presented their work on comparing commercial and open source video quality metrics for HD constant bitrate videos. Finally, Dariusz Grabowski (AGH University) presented his work on comparing full-reference video quality metrics using cluster analysis.

Quality Assessment for Computer Vision Applications (QACoViA)

The QACoViA group announced Lu Zhang (INSA Rennes) as new third co-chair, who will also work in the near future on a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently finished project related to an objective video quality assessment method for recognition tasks, in collaboration with Huawei through its Innovation Research Programme.

5G Key Performance Indicators (5GKPI)

The 5GKPI session aimed to identify possible interested partners and joint works (e.g., contribution to ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). In this sense, it included four presentations of use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote controlled use cases by Kjell Brunnström (RISE).

Immersive Media Group (IMG)

A report on the updates within the IMG group was initially presented, especially covering the current joint work investigating the subjective quality assessment of 360-degree video. In particular, a cross-lab test involving 10 different labs was carried out at the beginning of 2020, resulting in relevant outcomes including various contributions to ITU SG12/Q13 and MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T recommendation P.919 [15], related to subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] or ITU-T P.910 [13]), was approved in mid-October, and was supported by the results of these cross-lab tests.
Furthermore, since these tests have already finished, there was a presentation by Pablo Pérez (Nokia Bell-Labs) on possible future joint activities within IMG, which led to an open discussion after it that will continue in future audio calls.
In addition, a total of four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of the TU Ilmenau on immersive media topics, and a presentation of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image by Ali Ak and Patrick Le Callet (Université de Nantes) [16]. Also, there were two presentations related to 3D graphical contents, one addressing the perceptual characterization of 3D graphical contents based on visual attention patterns by Mona Abid (Université de Nantes), and another one comparing subjective methods for quality assessment of 3D graphics in virtual reality by Yana Nehmé (INSA Lyon). 

Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting

Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview on the progress and recent works within ITU-R WP6C in HDR related topics and ITU-T SG12 Questions 9, 13, 14, 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item was announced within ITU-T SG9: End-to-end network characteristics requirements for video services (J.pcnp-char [17]).
From the discussions raised during this session, a new dedicated group was set up to work on introducing objective video quality metrics, beyond PSNR, to the video compression community and on providing guidelines for implementing them. The group was named “Implementers Guide for Video Quality Metrics (IGVQM)” and will be chaired by Ioannis Katsavounidis (Facebook), with the involvement of several people from VQEG.
After the IRG-AVQA session, the Q19 interim meeting took place with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) on improvements to the subjective experiment data analysis process.

Other updates

Apart from the aforementioned groups, the Human Factors for Visual Experience (HFVE) group is still active coordinating VQEG activities in liaison with the IEEE Standards Association Working Groups on HFVE, especially on perceptual quality assessment of 3D, UHD and HD contents, quality of experience assessment for VR and MR, quality assessment of light-field imaging contents, and deep-learning-based assessment of visual experience based on human factors. In this sense, there are ongoing contributions from VQEG members to IEEE Standards.
In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation on precaution for lab experiments by Kjell Brunnström (RISE), another presentation by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussions on the topic.

Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.


[1] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K, 2020.
[2] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information, 2020.
[3] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information, 2020.
[4] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information, 2020.
[5] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, vol. 8, pp. 193020-193049, Oct. 2020.
[6] S.L. Regunathan, H. Wang, Y. Zhang, Y. R. Liu, D. Wolstencroft, S. Reddy, C. Stejerean, S. Gandhi, M. Chen, P. Sethi, A. Puntambekar, M. Coward, I. Katsavounidis, “Efficient measurement of quality at scale in Facebook video ecosystem”, in Applications of Digital Image Processing XLIII, vol. 11510, p. 115100J, Aug. 2020.
[7] W. Ellahi, T. Vigier and P. Le Callet, “HMM-Based Framework to Measure the Visual Fidelity of Tone Mapping Operators”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures, 2019.
[9] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution, 2016.
[10] ITU-T Rec. P.809. Subjective evaluation methods for gaming quality, 2018.
[11] ITU-T Rec. P.808. Subjective evaluation of speech quality with a crowdsourcing approach, 2018.
[12] ITU-T Rec. G.1072. Opinion model predicting gaming quality of experience for cloud gaming services, 2020.
[13] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications, 2008.
[14] ITU-T Rec. G.QoE-5G. QoE factors for new services in 5G networks, 2020 (under study).
[15] ITU-T Rec. P.919. Subjective test methodologies for 360º video on head-mounted displays, 2020.
[16] A. Ak, S. Ling and P. Le Callet, “No-Reference Quality Evaluation of Light Field Content Based on Structural Representation of The Epipolar Plane Image”, IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, United Kingdom, Jul. 2020.
[17] ITU-T Rec. J.pcnp-char. E2E Network Characteristics Requirement for Video Services, 2020 (under study).

JPEG Column: 89th JPEG Meeting

JPEG initiates standardisation of image compression based on AI

The 89th JPEG meeting was held online from 5 to 9 October 2020.

During this meeting multiple JPEG standardisation activities and explorations were discussed and progressed. Notably, the call for evidence on learning-based image coding was successfully completed and evidence was found that this technology promises several new functionalities while offering at the same time superior compression efficiency, beyond the state of the art. A new work item, JPEG AI, that will use learning-based image coding as core technology has been proposed, enlarging the already wide families of JPEG standards.

Figure 1. JPEG Families of standards and JPEG AI.

The 89th JPEG meeting had the following highlights:

  • JPEG AI call for evidence report
  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud Coding reviews status of the call for evidence
  • JPEG Pleno Holography call for proposals timeline
  • JPEG DNA identifies use cases and requirements
  • JPEG XL standard defines the final specification
  • JPEG Systems JLINK reaches committee draft stage
  • JPEG XS 2nd Edition Parts 1, 2 and 3.


JPEG AI

At the 89th meeting the submissions to the Call for Evidence on learning-based image coding were presented and discussed. Four submissions were received in response to the Call for Evidence. The results of the subjective evaluation of the submissions were reported and discussed in detail by experts. It was agreed that there is strong evidence that learning-based image coding solutions can outperform the already defined anchors in terms of compression efficiency, when compared to state-of-the-art conventional image coding architecture. Thus, it was decided to create a new standardisation activity, JPEG AI, on a learning-based image coding system that applies machine learning tools to achieve substantially better compression efficiency compared to current image coding systems, while offering unique features desirable for an efficient distribution and consumption of images. This type of approach should make it possible to obtain an efficient compressed domain representation not only for visualisation, but also for machine learning based image processing and computer vision. JPEG AI releases to the public the results of the objective and subjective evaluations as well as a first version of common test conditions for assessing the performance of learning-based image coding systems.

JPEG explores standardization needs to address fake media

Recent advances in media modification, particularly deep learning-based approaches, can produce near realistic media content that is almost indistinguishable from authentic content. These developments open opportunities for production of new types of media contents that are useful for many creative industries but also increase risks of spread of maliciously modified content (e.g., ‘deepfake’) leading to social unrest, spreading of rumours or encouragement of hate crimes. The JPEG Committee is interested in exploring if a JPEG standard can facilitate a secure and reliable annotation of media modifications, both in good faith and malicious usage scenarios. 

The JPEG Committee is currently in discussions with stakeholders from academia, industry and other organisations to explore the use cases that will define a roadmap to identify the requirements leading to a potential standard. The Committee has received significant interest and has released a public document outlining the context, use cases and requirements. JPEG invites experts and technology users to actively participate in this activity and attend a workshop, to be held online in December 2020. Details on the activities of JPEG in this area can be found on the JPEG.org website. Interested parties are notably encouraged to register for the mailing list of the ad hoc group that has been set up to facilitate the discussions and coordination on this topic.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 89th JPEG meeting, the JPEG Committee reviewed expressions of interest in the Final Call for Evidence on JPEG Pleno Point Cloud Coding. This Call for Evidence focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between its 89th and 90th meetings, the JPEG Committee will be actively promoting this activity and collecting submissions to participate in the Call for Evidence.

JPEG Pleno Holography

At the 89th meeting, the JPEG Committee released an updated draft of the Call for Proposals for JPEG Pleno Holography. A final Call for Proposals on JPEG Pleno Holography will be released in April 2021. JPEG Pleno Holography is seeking compression solutions for holographic content. The scope of the activity is quite large and addresses diverse use cases such as holographic microscopy and tomography, but also holographic displays and printing. Current activities are centred around refining the objective and subjective quality assessment procedures. Interested parties are already invited at this stage to participate in these activities.


JPEG DNA

JPEG standards are used in storage and archival of digital pictures. This puts the JPEG Committee in a good position to address the challenges of DNA-based storage by proposing an efficient image coding format to create artificial DNA molecules. JPEG DNA has been established as an exploration activity within the JPEG Committee to study use cases, to identify requirements and to assess the state of the art in DNA storage for the purpose of image archival using DNA in order to launch a standardization activity. To this end, a first workshop was organised on 30 September 2020. Presentations made at the workshop are available from the following URL:


At its 89th meeting, the JPEG Committee released a second version of a public document that describes its findings regarding storage of digital images using artificial DNA. In this framework, the JPEG DNA ad hoc group was renewed in order to continue its activities, to further refine the above-mentioned document and to organise a second workshop. Interested parties are invited to join this activity by participating in the AHG through the following URL: http://listregistration.jpeg.org.


JPEG XL

Final technical comments by national bodies have been addressed and incorporated into the JPEG XL specification (ISO/IEC 18181-1) and the reference implementation. A draft FDIS study text has been prepared and final validation experiments are planned.

JPEG Systems

The JLINK (ISO/IEC 19566-7) standard has reached committee draft stage and will be made public. The JPEG Committee invites technical feedback on the document, which is available on the JPEG website. Development of the JPEG Snack (ISO/IEC 19566-8) standard has begun to support the defined use cases and requirements. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.


JPEG XS

The JPEG Committee is finalizing its work on the 2nd Editions of JPEG XS Part 1, Part 2 and Part 3. Part 1 defines new coding tools required to efficiently compress raw Bayer images. The observed quality gains of raw Bayer compression over compressing in the RGB domain can be as high as 5 dB PSNR. Moreover, the second edition adds support for mathematically lossless image compression and allows compression of 4:2:0 sub-sampled images. Part 2 defines new profiles for such content. With the support for low-complexity, high-quality compression of raw Bayer (or Color-Filtered Array) data, JPEG XS proves to also be an excellent compression scheme in the professional and consumer digital camera market, as well as in the machine vision and automotive industry.
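To put the quoted PSNR gain in perspective, the following minimal sketch shows the standard PSNR definition and the mean squared error (MSE) reduction that a given gain in dB implies when the peak value is unchanged. It is not taken from the JPEG XS specification; the function names, the 8-bit peak value and the sample values are assumptions chosen for illustration.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of sample values.

    Assumption: max_value is the peak sample value (255 for 8-bit data).
    """
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain (same peak value)."""
    return 10 ** (gain_db / 10)


# Hypothetical 8-bit samples with an error of one level per sample.
print(round(psnr([100, 100], [101, 99]), 2))
# A gain of 5 dB corresponds to an MSE reduced by a factor of about 3.16.
print(round(mse_ratio_for_gain(5), 2))
```

Because PSNR is logarithmic in the MSE, a gain of a few dB reflects a substantial reduction in squared error, although it does not by itself say how the difference is perceived.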

Final Quote

“JPEG AI will be a new work item completing the collection of JPEG standards. JPEG AI relies on artificial intelligence to compress images. This standard not only will offer superior compression efficiency beyond the current state of the art but also will open new possibilities for vision tasks by machines and computational imaging for humans,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 90 will be held online from January 18 to 22, 2021.
  • No. 91 will be held online from April 19 to 23, 2021.

Immersive Media Experiences – Why finding Consensus is Important

An introduction to the QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) [1].


Immersive media are reshaping the way users experience reality. They are increasingly incorporated across enterprise and consumer sectors to offer experiential solutions to a diverse range of industries. Current technologies that afford an immersive media experience (IMEx) include Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and 360-degree video. Popular uses can be found in enhancing connectivity applications, supporting knowledge-based tasks, learning & skill development, as well as adding immersive and interactive dimensions to the retail, business, and entertainment industries. Whereas the evolution of immersive media can be traced over the past 50 years, its current popularity boost is primarily owed to significant advances in the last decade brought about by improved connectivity, superior computing, and device capabilities. Specifically, advancements have been witnessed in display technologies, visualizations, interaction & tracking devices, recognition technologies, platform development, and new media formats, along with increasing user demand for real-time & dynamic content across platforms.

Though still in its infancy, the immersive economy is growing into a dynamic and confident sector. As it is an emerging sector, official data are hard to find, but some estimates project the global immersive media market to continue its upward growth at around 30% CAGR to reach USD 180 Bn by 2022 [2,3]. Country-wise, the USA is expected to secure one third of the global immersive media market share, followed by China, Japan, Germany, and the UK as likely immersive media markets where significant spending is anticipated. Consumer products and devices are poised to be the largest contributing segment. The growth in immersive consumer products is expected to continue as Head-Mounted Displays (HMD) become commonplace and interest in mobile augmented reality increases [4]. However, immersive media are no longer just a pursuit of alternative display technologies but are pushing towards holistic ecosystems that seek contributions from hardware manufacturers, application & platform developers, content producers, and users. These ecosystems are making way for sophisticated content creation available on platforms that allow user participation, interaction, and skill integration through advanced tools.

Immersive media experience (IMEx), today, is not only how users view media but in fact a transformative way to consume media altogether. It draws considerable interest from multiple disciplines. As the number of stakeholders increases, the need for clarity and coherence on definitions and concepts becomes all the more important. In this article, we provide an overview and a brief survey of some of the key definitions that are central to IMEx including its Quality of Experience (QoE), application areas, influencing factors, and assessment methods. Our aim is to enable some clarity and initiate consensus on topics related to IMEx that can be useful for researchers and practitioners working in both academia and industry.

Why understand IMEx?

IMEx combines reality with technology, enabling emplaced multimedia experiences of standard media (film, photographic, or animated) as well as synthetic and interactive environments for users. These experiences utilize visual, auditory, and haptic feedback to stimulate physical senses such that users psychologically feel immersed within these multidimensional media environments. This sense of “being there” is also referred to as presence.

As mentioned earlier, the enthusiasm for IMEx is mainly driven by the gaming, entertainment, retail, healthcare, digital marketing, and skill training industries. So far, research has tilted favourably towards innovation, with a particular interest in image capture, recognition, mapping, and display technologies over the past few years. However, the prevalence of IMEx has also ushered in a plethora of definitions, frameworks, and models to understand the psychological and phenomenological concepts associated with these media forms. Central, of course, are the closely related concepts of immersion and presence, which are interpreted varyingly across fields; for example, when one moves from literature to narratology to computer sciences. However, with immersive media, these three separate fields come together inside interactive digital narrative applications where immersive narratives are used to solve real-world problems. This is when noticeable interdisciplinary differences regarding definitions, scope, and constituents require urgent redressal to achieve a coherent understanding of the used concepts. Such consensus is vital for giving directionality to the future of immersive media that can be shared by all.

A White Paper on IMEx

A recent White Paper [1] by QUALINET, the European Network on Quality of Experience in Multimedia Systems and Services [5], is a contribution to the discussions related to Immersive Media Experience (IMEx). It attempts to build consensus around ideas and concepts that are related to IMEx but originate from multidisciplinary groups with a joint interest in multimedia experiences.

The QUALINET community aims at extending the notion of network-centric Quality of Service (QoS) in multimedia systems, by relying on the concept of Quality of Experience (QoE). The main scientific objective is the development of methodologies for subjective and objective quality metrics considering current and new trends in multimedia communication systems as witnessed by the appearance of new types of content and interactions.

The white paper was created based on an activity launched at the 13th QUALINET meeting on June 4, 2019, in Berlin as part of Task Force 7, Immersive Media Experiences (IMEx). The paper received contributions from 44 authors under 10 section leads, which were consolidated into a first draft and released among all section leads and editors for internal review. After incorporating the feedback from all section leads, the editors initially released the White Paper within the QUALINET community for review. Following feedback from QUALINET at large, the editors distributed the White Paper widely for an open, public community review (e.g., research communities/committees in ACM and IEEE, standards development organizations, various open email reflectors related to this topic). The feedback received from this public consultation process resulted in the final version, which was approved at the 14th QUALINET meeting on May 25, 2020.

Understanding the White Paper

The White Paper surveys definitions and concepts that contribute to IMEx. It describes the Quality of Experience (QoE) for immersive media by establishing a relationship between the concepts of QoE and IMEx. This article provides an outline of these concepts by looking at:

  • Survey of definitions of immersion and presence discusses various frameworks and conceptual models that are most relevant to these phenomena in terms of multimedia experiences.
  • Definition of immersive media experience describes experiential determinants for IMEx characterized through its various technological contexts.
  • Quality of experience for immersive media applies existing QoE concepts to understand the user-centric subjective feelings of “a sense of being there”, “a sense of agency”, and “cybersickness”.
  • Application area for immersive media experience presents an overview of immersive technologies in use within gaming, omnidirectional content, interactive storytelling, health, entertainment, and communications.
  • Influencing factors on immersive media experience looks at the three existing influence factors on QoE, with a pronounced emphasis on the human influence factor, which is highly relevant to IMEx.
  • Assessment of immersive media experience underscores the importance of proper examination of multimedia systems, including IMEx, by highlighting three methods currently in use, i.e., subjective, behavioral, and psychophysiological.
  • Standardization activities discusses the three clusters of activities currently underway to achieve interoperability for IMEx: (i) data representation & formats; (ii) guidelines, systems standards, & APIs; and (iii) Quality of Experience (QoE).


Immersive media have significantly changed the use and experience of new digital media. These innovative technologies transcend traditional formats and present new ways to interact with digital information inside synthetic or enhanced realities, which include VR, AR, MR, and haptic communications. Earlier the need for a multidisciplinary consensus was discussed vis-à-vis definitions of IMEx. The QUALINET white paper provides such “a toolbox of definitions” for IMEx. It stands out for bringing together insights from multimedia groups spread across academia and industry, specifically the Video Quality Experts Group (VQEG) and the Immersive Media Group (IMG). This makes it a valuable asset for those working in the field of IMEx going forward.


[1] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[2] Mateos-Garcia, J., Stathoulopoulos, K., & Thomas, N. (2018). The immersive economy in the UK (Rep. No. 18.1137.020). Innovate UK.
[3] Infocomm Media 2025 Supplementary Information (pp. 31-43, Rep.). (2015). Singapore: Ministry of Communications and Information.
[4] Hadwick, A. (2020). XR Industry Insight Report 2019-2020 (Rep.). San Francisco: VRX Conference & Expo.
[5] http://www.qualinet.eu/

MPEG Column: 132nd MPEG Meeting (virtual/online)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.

The new structure is as follows (incl. Convenors and position within the former WG 11 structure):

  • AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
  • WG 2 MPEG Technical Requirements (Convenor: Dr. Igor Curcio; former Requirements subgroup)
  • WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
  • WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
  • WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
  • WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
  • WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda; former 3DG subgroup)
  • WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
  • AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
  • AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup).

The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.

The official press release can be found here and comprises the following items:

  • Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
  • MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
  • MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
  • MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
  • MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
  • MPEG Completes Standard on Harmonization of DASH and CMAF
  • MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
  • MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard

In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.

Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone

MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):

  • Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
  • Using VVenC, additional bit rate savings of more than 10% relative to VTM were observed, while VVenC at the same time runs significantly faster than the reference software implementation.
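
To see how the two reported figures relate, the following minimal sketch combines them arithmetically. It assumes that the savings compound multiplicatively at comparable subjective quality and treats "more than 10%" as exactly 10%, so the result is only a rough lower bound. The 10,000 kbit/s HEVC baseline and the function name compound_savings are hypothetical and not taken from the verification test.

```python
def compound_savings(*savings):
    """Combine successive relative bit rate savings (fractions in [0, 1))."""
    remaining = 1.0
    for s in savings:
        if not 0.0 <= s < 1.0:
            raise ValueError("each saving must be a fraction in [0, 1)")
        remaining *= 1.0 - s  # share of the bit rate that is still needed
    return 1.0 - remaining


HEVC_KBPS = 10_000                     # hypothetical HEVC bit rate
vtm_kbps = HEVC_KBPS * (1 - 0.45)      # roughly 45% saving with VTM
vvenc_kbps = vtm_kbps * (1 - 0.10)     # at least a further 10% with VVenC
total = compound_savings(0.45, 0.10)   # combined saving relative to HEVC

print(f"VTM: {vtm_kbps:.0f} kbit/s, VVenC: {vvenc_kbps:.0f} kbit/s, "
      f"combined saving: {total:.1%}")
```

Under these assumptions, VVenC would need about half the HEVC bit rate for comparable subjective quality.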

Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open-source implementation will be available soon. Together with AOMedia’s AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.

MPEG Completes Geometry-based Point Cloud Compression Standard

MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.

By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.

Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency, but one should not dismiss their delivery aspects, including dynamic adaptive streaming. A recent paper on this topic has been published in the IEEE Communications Magazine and is entitled “From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom”.

MPEG Finalizes the Harmonization of DASH and CMAF

MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with Common Media Application Format (CMAF) featuring a DASH profile for the use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019 4th edition).

CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use Media Presentation Description (MPD) as a manifest format.
Additional tools added to this amendment include

  • DASH events and timed metadata track timing and processing models with in-band event streams,
  • a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
  • an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
  • content protection enhancements for efficient signalling.
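
The idea behind the MPD patch framework, i.e., sending only the changed parts of a manifest, can be illustrated with a small sketch. It is loosely modelled on the add/replace/remove operations of IETF RFC 5261 and is not the normative DASH patch format: the Patch root element, the toy MPD, the omission of XML namespaces, and the restriction of selectors to the XPath subset of Python's ElementTree (relative to the root) are all simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Toy manifest and patch; structure and values are illustrative only.
MPD = """<MPD id="demo" publishTime="2020-10-16T10:00:00Z">
  <Period id="p0">
    <AdaptationSet id="1">
      <Representation id="v1" bandwidth="1000000"/>
      <Representation id="v2" bandwidth="3000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

PATCH = """<Patch>
  <replace sel="./@publishTime">2020-10-16T10:00:10Z</replace>
  <remove sel="./Period/AdaptationSet/Representation[@id='v1']"/>
  <add sel="./Period/AdaptationSet">
    <Representation id="v3" bandwidth="6000000"/>
  </add>
</Patch>"""


def apply_patch(mpd_xml, patch_xml):
    """Apply simplified add/replace/remove operations to an MPD string."""
    root = ET.fromstring(mpd_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for op in ET.fromstring(patch_xml):
        # Split "path/@attribute" selectors into element path and attribute.
        path, _, attr = op.get("sel").partition("/@")
        target = root if path == "." else root.find(path)
        if target is None:
            raise ValueError(f"selector matched nothing: {op.get('sel')}")
        if op.tag == "replace" and attr:
            target.set(attr, (op.text or "").strip())
        elif op.tag == "remove" and not attr:
            parent_of[target].remove(target)
        elif op.tag == "add" and not attr:
            for child in list(op):
                target.append(child)
        else:
            raise ValueError(f"unsupported operation: {op.tag}")
    return root


patched = apply_patch(MPD, PATCH)
print(ET.tostring(patched, encoding="unicode"))
```

In this toy example the patch is about the same size as the manifest. The benefit appears with large manifests, where a client that already holds the previous MPD only needs to fetch the few changed elements.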

It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.

Research aspects: one of the features enabled by CMAF is low latency streaming that is actively researched within the multimedia systems community (e.g., here). The main research focus has been related to the ABR logic while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).

MPEG Completes 2nd Edition of the Omnidirectional Media Format

MPEG completed the standardization of the 2nd edition of the Omnidirectional MediA Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status including the following features:

  • “Late binding” technologies to deliver and present only that part of the content that adapts to the dynamically changing users’ viewpoint. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream is dynamically generated that, by combining the received portions of the bitstream, covers only the users’ viewport on the client.
  • Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
  • Enhances the use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.
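
Delivering only the part of the content that covers the user's viewport implies that a client has to decide which parts to request. The sketch below shows one simple way to do this, under the assumption that the 360-degree video is split into equally wide tiles along the yaw axis only. OMAF does not prescribe this selection logic, and the function name visible_tiles and its parameters are hypothetical.

```python
def visible_tiles(yaw_deg, fov_deg, num_tiles):
    """Return indices of yaw tiles that overlap a horizontal viewport.

    yaw_deg:   viewing direction (viewport center) in degrees
    fov_deg:   horizontal field of view in degrees
    num_tiles: number of equally wide tiles covering 360 degrees
    """
    tile_width = 360.0 / num_tiles
    visible = []
    for i in range(num_tiles):
        center = (i + 0.5) * tile_width
        # Smallest angular distance between tile center and viewport center,
        # taking the wrap-around at 0/360 degrees into account.
        distance = abs((center - yaw_deg + 180.0) % 360.0 - 180.0)
        # Two arcs overlap if their centers are closer than the sum of
        # their half-widths.
        if distance < (fov_deg + tile_width) / 2.0:
            visible.append(i)
    return visible


# Looking straight ahead with a 90-degree field of view and 8 tiles:
print(visible_tiles(0, 90, 8))
```

A practical client would typically also request a margin around the viewport, or a low-quality version of the remaining tiles, to cope with head movements between requests.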

Research aspects: standards usually define formats to enable interoperability, but various informative aspects are left open for industry competition and are subject to research and development. The same holds for OMAF, whose 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.

MPEG Completes the Low Complexity Enhancement Video Coding Standard

MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.

  • LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video with an effective compression efficiency of limited complexity by building on top of existing and future video codecs.
  • LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
  • LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.
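
The principle of an enhancement stream on top of a base codec can be sketched as follows. This is a deliberately simplified illustration that works on a single row of samples, uses pair averaging and sample repetition as stand-ins for the base codec and the upsampler, and keeps the residuals uncompressed. It does not reflect the normative LCEVC tools, and all function names are hypothetical.

```python
def downsample(row):
    """Halve the resolution by averaging neighbouring sample pairs."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]


def upsample(row):
    """Double the resolution by repeating each sample."""
    return [value for value in row for _ in range(2)]


def encode(row):
    """Split a row (even length) into a base layer and enhancement residuals."""
    base = downsample(row)           # stand-in for the base codec output
    predicted = upsample(base)       # what a base-only decoder could show
    residuals = [o - p for o, p in zip(row, predicted)]
    return base, residuals


def decode(base, residuals):
    """Reconstruct full resolution from base layer plus residuals."""
    return [p + r for p, r in zip(upsample(base), residuals)]


original = [10, 12, 50, 54, 200, 190, 0, 7]
base, residuals = encode(original)
print("base layer:", base)
print("residuals: ", residuals)
print("decoded:   ", decode(base, residuals))
```

A device that only understands the base layer can still display the upsampled base, while an enhanced decoder adds the residuals to restore detail. In a real system the residuals would be quantized and entropy coded, so reconstruction would generally be lossy.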

Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders that allow for certain flexibility as part of regular software life cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect for future work.

The 133rd MPEG meeting will again be an online meeting in January 2021.

Click here for more information about MPEG meetings and their developments.

Report from ACM MMSys 2020 by Conor Keighrey

Conor Keighrey (@ConorKeighrey) recently completed his PhD at the Athlone Institute of Technology, which aimed to capture and understand the quality of experience (QoE) within a novel immersive multimedia speech and language assessment. He is currently interested in exploring the application of immersive multimedia technologies within health, education and training.

With a warm welcome from Istanbul, Ali C. Begen (Ozyegin University and Networked Media, Turkey) opened MMSys 2020 this year. In light of the global pandemic, the conference took on a new format, being delivered online for the first time. This, however, was not the only first for MMSys: Laura Toni (University College London, UK) was introduced as the first-ever female co-chair of the conference. This year, the organising committee presented a gender- and culturally diverse line-up of researchers from all around the globe. In parallel, two new grand challenges were introduced on the topics of “Improving Open-Source HEVC Encoding” and “Low-latency live streaming” for the first time ever at MMSys.

The conference attracted paper submissions from a range of multimedia topics including but not limited to streaming technologies, networking, machine learning, volumetric media, and fake media detection tools. Core areas were complemented with in-depth keynotes delivered by academic and industry experts. 

Examples of such include Ryan Overbeck’s (Google, USA) keynote on “Light Fields – Building the Core Immersive Photo and Video Format for VR and AR” presented on the first day. Light fields provide the opportunity to capture full 6DOF and photo-realism in virtual reality. In his talk, Ryan provided key insight into the camera rigs and results from Google’s recent approach to perfect the capture of virtual representations of real-world spaces.

Later during the conference, Roderick Hodgson from Amber Video presented an interesting keynote on “Preserving Video Truth: an Anti-Deepfakes Narrative”. Roderick delivered a fantastic overview of the emerging area of deep fakes and of the application platforms being developed to detect what will without a doubt be used as highly influential media streams in the future. The discussion closed with Stefano Petrangeli asking how the concept of deep fakes could be applied within the context of AR filters. Although AR is in its infancy from a visual quality perspective, the future may rapidly change how we perceive faces through immersive multimedia experiences utilizing AR filters. The concept is interesting, and it leads to the question of what future challenges will be seen with these emerging technologies.

Although not the main focus of the MMSys conference, the co-located workshops have always stood out for me. I have attended MMSys for the last three years and the warm welcome expressed by all members of the research community has been fantastic. However, the workshops have always shone through, as they provide the opportunity to meet those who are working in focused areas of multimedia research. This year’s MMSys was no different, as it hosted three workshops:

  • NOSSDAV – The International workshop on Network and Operating System Support for Digital Audio and Video
  • PV – The International Packet Video Workshop
  • MMVE – The International Workshop on Immersive Mixed and Virtual Environment Systems

With a focus on novel immersive media experiences, the MMVE workshop was highly successful with five key presentations exploring the topics of game mechanics, cloud computing, head-mounted display field of view prediction, navigation, and delay. Highlights include the work presented by Na Wang et al. (George Mason University), which explored field of view prediction within augmented reality experiences on mobile platforms. With the emergence of new and proposed areas of research in augmented reality cloud, field of view prediction will alleviate some of the challenges associated with the optimization of network communication for novel immersive multimedia experiences in the future.

Unlike previous years, conference organisers faced the challenge of creating social events which were completely online. A trivia night hosted on Zoom brought over 40 members of the MMSys community together virtually to test their general knowledge. Utilizing the online platform “Kahoot”, attendees were challenged with a series of 47 questions. With great interaction from the audience, the event provided a great opportunity to socialise in a relaxing manner, much like its real-world counterpart!

Leader boards towards the end were close, with Wei Tsang Ooi taking first place with a last-minute bonus question! Jean Botev and Roderick Hodgson took second and third place respectively. Events like this have always been a highlight of the MMSys community; we hope to see it take place in person this coming year over some quiet beers and snacks!

Mea Wang opened the N2Women Meeting on the 10th of June. The event openly discussed core influential topics such as the separation of work and life needs within the research community, with the primary objective of helping new researchers maintain a healthy work and life balance. Overall, the event was a success; the topic of work and life balance is important for those at all stages of their research careers. Reflecting on my own personal experiences during my PhD, it can be a struggle to determine when to “clock out” and when to spend a few extra hours engaged with research. Key members of the community shared their own personal experiences, discussing other topics such as the importance of mentoring, as academic supervisors can often become a mentor for life. Özgü Alay discussed the importance of developing connections at research-orientated events. Those new to the community should not be afraid to spark a conversation with experts in the field; often the ideal approach is to take an interest in their work and begin the discussion from there.

Lastly, Mea Wang mentioned that the initiative had initially acquired funding for the purpose of travel supports and childcare for those attending the conference. Due to the online nature this year, the supports have now been placed aside for next year’s event. Such funding provides a fantastic opportunity to support the cost of attending an international conference and engage with the multimedia community!

Closing the conference, Ali C. Begen began with the announcement of the awards. The Best Paper Award was presented by Özgü Alay and Christian Timmerer, who announced Nan Jiang et al. as the winners for their paper on “QuRate: Power-Efficient Mobile Immersive Video Streaming”. The paper is available for download on the ACM Digital Library at the following link. The conference closed with the announcement of key celebrations for next year, as the NOSSDAV workshop celebrates its 30th anniversary and the Packet Video workshop celebrates its 25th anniversary!

Overall, the expertise in multimedia shone through in this year’s ACM MMSys, with fantastic keynotes, presentations, and demonstrations from researchers around the globe. Although there are many benefits to attending a virtual conference, after numerous experiences this year I can’t help but feel there is something missing. Over the past 3 years, I’ve attended ACM MMSys in person as a PhD candidate, and one of the major benefits of in-person events is the social encounters. Although this year’s iteration of ACM MMSys did a phenomenal job of presenting these events in the new and unexpected virtual format, I believe that it is the in-person social events which shine through, as they provide the opportunity to meet, discuss, and develop professional and social links throughout the multimedia research community in a more relaxed setting.

As a result, I look forward to what Özgü Alay, Cheng-Hsin Hsu, and Ali C. Begen have in store for us at ACM Multimedia Systems 2021, located in the beautiful city of Istanbul, Turkey.

ACM IMX 2020: What does “going virtual” mean?

I work in the department of Research & Development, based in London, at the BBC. My interests include Interactive and Immersive Media, Interaction Design, Evaluative Methods, Virtual Reality, Augmented Reality, Synchronised Experiences & Connected Homes.
In the interest of full disclosure, I serve on the steering board of ACM Interactive Media Experiences (IMX) as Vice President for Conferences. It was an honour to be invited to the organising committee as one of IMX’s first Diversity Co-Chairs and as a Doctoral Consortium Co-Chair. I will also be the General Co-Chair for ACM IMX 2021.
I hope you join us at IMX 2021 but if you need convincing, please read on about my experiences with IMX 2020!
I am quite active on Twitter (@What2DoNext), so I don’t think it came as a massive surprise to the IMX community that I won the award of the Best Social Media Reporter for ACM IMX 2020. Here are some of the award-winning tweets describing a workshop, a creative challenge, the opening keynote, my co-author presenting our paper (which incidentally won an honourable mention), the closing keynote and announcing the venue for ACM IMX 2021. This report is a summary of my experiences with IMX 2020.

Before the conference

Summary of activities at IMX 2020.

For the first time in the history of IMX, it was going entirely virtual. As if that wasn’t enough, IMX 2020 was the conference that got rebranded. In 2019, it was called TVX – Interactive Experiences for Television and Online Video! However, the steering committee unanimously voted to rename and rebrand it to reflect the fact that the conference had outgrown its original remit. The new name – Interactive Media Experiences (IMX) – was succinct and all-encompassing of the conference’s current scope. With the rebrand came a revival of principles and ethos. For the first time in the history of IMX, the organising committee worked with the steering committee to include Diversity co-chairs.

The tech industry has suffered from a lack of diverse representation, and 2020 was the year we decided to try to improve the situation in the IMX community. So, in addition to holding the position of the Doctoral Consortium co-chair, a relatively well-defined role, I was invited to be one of two Diversity chairs. The conference was going to take place in Barcelona, Spain – a city I have been lucky to visit multiple times. I love the people, the culture, the food (and wine) and the city, especially in the summer. The organisation was on track when, due to the unprecedented global pandemic, we called an emergency meeting to immediately transfer conference activities to various online platforms. Unfortunately, we lost one keynote, a panel, & 3 workshops, but we managed to transfer the rest into a live virtual event over a combination of platforms: Zoom, Mozilla Hubs, Miro, Slack & Sli.do.

The organising committee came together to reach out to the IMX community to ask for their help in converting their paper, poster and demo presentations to a format suitable for a virtual conference. We were quite amazed at how the community came together to make the virtual conference possible. Quite a few of us spent a lot of late nights getting everything ready!

We set about creating an accessible program and proceedings with links to the various online spaces scheduled to host track sessions and links to papers for better access using the SIGCHI progressive web app and the ACM Publishing System. It didn’t hurt that one of our Technical Program chairs, David A. Shamma, is the current SIGCHI VP of Operations. It was also helpful to have access to the ACM’s guide for virtual conferences and the experience gained by folks like Blair MacIntyre (general co-chair of IEEE VR 2020 & Professor at Georgia Institute of Technology). We also got lots of support from Liv Erickson (Emerging Tech Product Manager at Mozilla).

About a week before the conference, Mario Montagud (General Co-Chair) sent an email to all registered attendees to inform them about how to join. Honestly, there were moments when I thought it might be touch and go. I had issues with my network, last-minute committee jobs kept popping up, and social distancing was becoming problematic.

During the conference…

Traditionally, IMX brings together international researchers and practitioners from a wide range of disciplines to attend workshops and challenges on the first day followed by two days of keynotes, panels, paper presentations, posters and demos. The activities are interspersed with lunches, networking with colleagues, copious coffee and a social event. 

The advantage of a virtual event is that I had no jet lag and I woke up in my bed at home on the day of the conference. However, I had to provide my own coffee and lunches in the 2020 instantiation while (very briefly) considering the option of attending an international conference in my pyjamas. The other early difference is that I didn’t get a name badge in a conference-branded registration packet. However, due to my committee roles at IMX 2020, the communications team made us Zoom background ‘badges’ – which I loved!

Virtual Backgrounds for use in Zoom.

My first day was exciting and diverse! I had a three-hour workshop in the morning (starting 10 AM BST) titled “Toys & the TV: Serious Play”, which I had organised with my colleagues Suzanne Clark and Barbara Zambrini from BBC R&D, Christoph Ziegler from IRT and Rainer Kirchknopf from ZDF. We had a healthy interest in the workshop and enthusiastic contributions. A few of the attendees contributed idea/position papers while the other attendees were asked to support their favourite amongst the presented ideas. The groups of people were then sent to a breakout group to work on the concept and produce a newspaper-type summary page of an exemplar manifestation of the idea. We all worked over Zoom and a collaborative whiteboard on Miro. It was the virtual version of an interactive “post-it on a wall” type workshop.

Then it was time for lunch and a cup of tea while managing home learning activities for my kids. Usually, I would have been hunting for a quiet place in the conference venue (depending on the time difference) to FaceTime with my kids. None of that in 2020! I could chat with my fellow organising committee members to make sure things were running smoothly and offer aid if needed. Most of the day’s activities were being efficiently coordinated by Mario, who was based at the i2Cat offices in Barcelona during the conference.

Around 4 PM (BST), I had a near four-hour creative challenge meet up. However, before that, I dropped into the IMX in Latin America workshop which was organised by colleagues in (you guessed it) Latin America as a way to introduce the work they do to IMX. Things were going well in that workshop, so after a quick hello to the organisers, I rushed over to take part in the creative challenge!

The creative challenge, titled “Snap Creative Challenge: Reimagine the Future of Storytelling with Augmented Reality (AR)”, was an invited event. It was sponsored by Snap (Andrés Monroy-Hernández) and co-organised by Microsoft Research (Mar González-Franco) and BBC Research & Development (myself). Earlier in the year, over six months, eleven academic teams from eight countries created AR projects to demonstrate their vision of what storytelling would look like in a world where AR is more prevalent. We mentored the teams with the help of Anthony Steed (University College London), Nonny de La Peña (Emblematic Group), Rajan Vaish (Snap), Vanessa Pope (Queen Mary, University of London), and some colleagues who generously donated their time and expertise. We started with a welcome to the event (hosted on Zoom) given by Andrés Monroy-Hernández and then it was straight into presentations of the projects. Snap created a summary video of the ideas presented on the day.

Each project was distinct, unique and had the potential for so much more development and expansion. The creative challenge was closed by one of the co-founders of Snap (Bobby Murphy). After closing, some teams had office hours where we could go and have an extended chat about the various projects. Everyone was super enthusiastic and keen to share ideas.

It was 8.20 PM, so I had to end the day with a glass of wine with my other half, but I had a brilliant day and couldn’t get over how many interesting people I got to chat to – and it was just the first day of the conference!

On the second day of the conference, Christian Timmerer (Alpen-Adria-Universität Klagenfurt & Bitmovin) and I had an hour-long doctoral consortium to host bright and early at 9 AM (BST). Three doctoral students presented a variety of topics. Each student was assigned two mentors who were experts in the field the student was working in. This year, the organising committee were keen to ensure diverse participation through all streams of the conference, so Christian and I kept this in mind in choosing mentors for the doctoral students. We were also able to invite mentors regardless of whether they would travel to a venue or not since everyone was attending online. In a way, it gave us more freedom to be diverse in our choices and thinking. It turns out one hour only whetted everyone’s appetite, but the conference had other activities scheduled in the day, so I quite liked having a short break before my next session at noon! Time for another cup of coffee and a piece of chocolate!

The general chairs (Pablo Cesar – CWI, Mario Montagud & Sergi Fernandez – i2Cat) welcomed everyone to the conference at noon (BST). Pablo gave a summary of the number of participants we had at IMX. This is one of the most unfortunate things in a virtual conference. It’s difficult to get a sense of ‘being together’ with the other attendees at the conference but we got some idea from Pablo. Asreen Rostami (RISE) and I gave a summary of diversity & inclusion activities we put in place through the organisation of the conference to begin the process of improving the representation of under-represented groups within the IMX community. Unfortunately, a lot of the plans were not implemented once IMX 2020 went virtual but some of the guidance to inject diverse thinking into all parts of the conference was still carried out – ensuring that the make-up of the ACs was diverse, encouraging workshop organisers to include a diverse set of participants and use inclusive language, casting a wider net in our search for keynotes and mentors, and selecting a time period to run the conference that was best suited to a majority of our attendees. The Technical Program Co-Chair (Lucia D’Acunto, TNO) gave a summary of how the tracks were populated with respect to papers. To round off the opening welcome for IMX 2020, Mario gave an overview of communication channels, the tools used and the conference program. The wonderful thing about being in a virtual conference is that you can easily screenshot presentations, so you have a good record of what happened. Under pre-pandemic situations, I would have photographed the slides on a screen on stage from my seat in the auditorium hall. So unfashionable in 2020 – you will agree. Getting a visual reminder of talks is useful if you want to remember key points! It is also exceedingly good for illustrations as part of a report you might write about the conference three months later.

Sergi Fernandez introduced the opening keynote: Mel Slater (University of Barcelona) who talked about using Virtual Reality to Change Attitudes and Behaviour. Mel was my doctoral supervisor between 2001 and 2006 when I did a PhD at UCL. He was the reason I decided to focus my postgraduate studies on building expressive virtual characters. It was fantastic to “go to a conference with him” again even if he got the seat with the better weather. His opening keynote was engaging, entertaining and gave a lot of food for thought. He also had a new video of his virtual self being a rock star. To this day, I believe this is the main reason he got into VR in the first place! And why ever not?

Immediately after Mel’s talk and Q&A session, it was time to inform attendees about the demos and posters available for viewing as part of the conference. The demos and posters were displayed in a series of Mozilla Hubs rooms (domes) created by Jesús Gutierrez (Universidad Politecnica de Madrid, Demo co-chair) and me, based on some models given to us by Liv (Mozilla). We were able to personalise the virtual spaces and give them a Spanish twist using a couple of panorama images David A. Shamma (FXPAL & Technical Program co-chair for IMX 2020) found on Flickr. Ayman and Julie Williamson (Univ. of Glasgow) also enabled the infrastructure behind the IMX Hub spaces. Jesús and I gave a short ‘how-to’ presentation to let attendees know what to expect in the IMX Hub Spaces. After our presentation, Mario played a video of pitches giving us quick lightning summaries of the demos, work-in-progress poster presentations and doctoral consortium poster displays.

Thirty minutes later, it was time for the first paper session of the day (and the conference)! Ayman chaired the first four papers in the conference in a session titled ‘Augmented TV’. The first paper presented was one I co-authored with Radu-Daniel Vatavu (Univ. Stefan cel Mare of Suceava), Pejman Saeghe (Univ. of Manchester), Teresa Chambel (Univ. of Lisbon), and Marian F Ursu (Univ. of York). The paper (‘Conceptualising Augmented Reality Television for the Living Room’) examined the characteristics of Augmented Reality Television (ARTV) by analysing commonly accepted views on augmented and mixed reality systems, by looking at previous work, by looking at tangential fields (ambient media, interactive TV, 3D TV etc.) and by proposing a conceptual framework for ARTV – the “Augmented Reality Television Continuum”. The presentation is on the ACM SIGCHI’s YouTube channel if you feel like watching Pejman talk about the paper instead of reading it or maybe in addition to reading it!

Ayman and Pejman talking about our paper ‘Conceptualising Augmented Reality Television for the Living Room’

I did not present the paper, but I was still relieved that it was done! I have noticed that once a paper I was involved with is done, I tend to have enough headspace to engage and ask questions of other authors. So that’s what I was able to do for the rest of the conference. In that same first paper session, Simon von der Au (IRT) et al. presented ‘The SpaceStation App: Design and Evaluation of an AR Application for Educational Television’ in which they got to work with models and videos of the International Space Station! Now, I love natural history documentaries so when I need to work with content, I don’t think I can go wrong if I choose David Attenborough narrated content – think Blue Planet. However, the ISS is a close second! They also cited two of my co-authored papers – Ziegler et al. 2018 and Saeghe et al. 2019 – which is always lovely to see.

After the first session, we had a 30-minute break before making our way to the Hubs Domes to look at demos and posters. Our outstanding student volunteers were deployed to guide IMX attendees to various domes. It was very satisfying seeing all our Hubs space populated with demos/posters with snippets of conversation flowing past as I passed through the domes to see how folks fared in the space. The whole experience resulted in a lot of selfies and images!

There were moments of delight throughout the event. I thought I’d rebel against my mom and get pink hair! Pablo got purple hair and IRL he does not have hair that colour (or that uniformly distributed). Ayman and I tried getting some virtual drinks – I got myself a pina colada while Ayman stayed sober. I also visited all the posters and demos which seldom happens when I attend conferences IRL. In Hubs, it was an excellent way to ‘bump into’ folks. I have been in the IMX community for a while, so I was able to recognise many people by reading their floating name labels. Most of their avatars looked nothing like the people I knew! Christian and Omar Niamut (TNO) had more photorealistic avatars but even those were only recognisable if I squinted! I was also very jealous of Omar’s (and Julie’s) virtual hands which they got because they visited the domes using their VR headsets. It was loads of fun seeing how people represented themselves through their virtual clothes, hair and body choice. 

All of the demos and posters were well presented but ‘Watching Together but Apart’ caught my eye because I knew my colleagues Rajiv Ramdhany, Libby Miller, and Kristian Hentschel built ‘BBC Together’ – an experimental BBC R&D prototype to enable people to watch and listen to BBC programmes together while they are physically apart. It was a response to the situation brought to a lot of our doorsteps by the pandemic! It was amazing to see that another research group responded in the same way to build a similar application. It was great fun talking to Jannik Munk Bryld about their project and comparing notes.

Once the paper session was over, there was a 45-minute break to stretch our legs and rest our eyes. Longer in-between session breaks are a necessity in virtual conferences. At 2:30 PM (BST), it was time to listen to two industry talks chaired by Steve Schirra (YouTube) and Mikel Zorrilla (Vicomtech). Mike Darnell (Samsung Electronics America) talked of conclusions he drew from a survey study of hundreds of participants which focused on user behaviour when it came to choosing what to watch on the TV. The main take-home message was that people generally knew in advance exactly what they wanted to watch on TV.

Natàlia Herèdia (Media UX Design) talked of her pop-up media lab focusing on designing an OTT for a local public channel. She spoke of the process she used and gave a summary of her work on reaching new audiences. 

After the industry talks, it was time for a half-hour break. The organising committee and student volunteers went out to the demo domes in Hubs to get a group selfie! We realised that Ayman has serious ambitions when it comes to cinematography. After we got our shots, we attended another paper session chaired by Aisling Kelliher (Virginia Tech) titled ‘Live Production and Audience’. Other people might have mosquitos or mice as a pest problem. In this paper session, I learnt that there are people like Aisling whose pest problems are a little more significant – like bear-sized bigger! So many revelations in such a short time!

The first paper of the last session, titled ‘DAX: Data-Driven Audience Experiences in Esports’, was presented by Athanasios Vasileios Kokkinakis (Univ. of York). He gave a fascinating insight into how companion screen applications might allow audiences to consume interesting data-driven insights during and around the broadcasts of Esports. It was great to see this sort of work since I have some history of working on companion screen applications, with sports being one of the genres that could benefit from multi-device applications. The paper won the best paper award! Yvette Wohn (New Jersey Institute of Technology) presented a paper, titled ‘Audience Management practices of Live Streamers on Twitch’, in which she interviewed Twitch streamers to understand how streamers discover audience composition and use appropriate mechanisms to interact with them. The last paper of the conference was presented by Marian – ‘Authoring Interactive Fictional Stories in Object-Based Media (OBM)’. The paper referred to quite a few BBC R&D OBM projects. Again, it was quite lovely to see some reaffirmation of ideas with similar thought processes flowing through the screen.

At 6 PM (BST), I had the honour of chairing the closing keynote by Nonny. Nonny had a lot of unique immersive journalism pieces to show us! She also gave us a live demo of her XR creation, remixing and sharing platform – REACH.love. She imported a virtual character inspired by the Futurama animated character – Bender. Incidentally, my very first virtual character was also created in Bender’s image. I had to remove the antenna off his head because Anthony Steed, who was my project lead at the time, wasn’t as appreciative of my character design – tragic times. 

Alas, we had come near the end of the conference which meant it was time for Mario to give a summary of numbers to indicate how many attendees participated in IMX 2020 – spoiler: it was the highest attendance yet. He also handed out various awards. It turns out that our co-authored paper on ‘Conceptualising Augmented Reality Television for the Living Room’ got an honourable mention! More importantly, I was awarded the best social media reporter which is of course why you are reading this report! I guess this is an encouragement to keep on tweeting about IMX!

Frank Bentley (Verizon Media, IMX Steering Committee president) gave a short presentation in which he acknowledged that it was June the 19th – Juneteenth (Freedom Day) in the US. He gave a couple of poignant suggestions on how we might consider marking the day. He also talked about the rebranding exercise that resulted in the conference going from TVX to IMX.

Frank also announced that we are looking for host bids for IMX 2022! As VP of Conferences, I would be very excited to hear from you! Please do email me if you are looking for information about hosting an IMX conference in 2022 or beyond. You can also drop me a tweet @What2DoNext!

He then handed over the floor to Yvette and me to announce the proposed venue of IMX 2021 – New York! A few of the organising committee positions are still up for grabs. Do consider joining our exciting and diverse organising committee if you feel like you could contribute to making IMX 2021 a success! In the meantime, I managed to persuade my lovely colleague at BBC R&D (Vicky Barlow) to make a teaser video to introduce IMX 2021.

That brought us to the end of IMX 2020, sadly. The stragglers of the IMX community lingered for a bit of a chat over Zoom, which was lovely.

After the conference…

You would think that once the conference was over, that was it but no, not so. In years past, all that was left to do was to stalk people you met at the conference on LinkedIn to make sure the ‘virtual business cards’ were saved. Of course, I did a bit of that this year as well. However, this year had been a much more involved experience. I have had a chance to define the role of Diversity chairs with Asreen. I have had the chance to work with Ayman, Julie, Jesús, Liv and Blair to bring demos and posters to Hubs as part of the IMX 2020 virtual experience. It was a blast! You might have thought that I would be taking a rest! You would be wrong! 

I am joining forces with Yvette and the rest of a whole new committee to start organising IMX 2021 – New York into a format that continues the success of IMX 2020 and strives to improve on it. Finally, let’s not forget Frank’s reminder that we are looking for colleagues out there (maybe you?) to host IMX 2022 and beyond!

The story continues… Do get in touch!

VQEG Column: Recent contributions to ITU recommendations

Welcome to the second column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG).
VQEG plays a major role in research and the development of standards on video quality and this column presents examples of recent contributions to International Telecommunication Union (ITU) recommendations, as well as ongoing contributions to recommendations to come in the near future. In addition, the formation of a new group within VQEG addressing Quality Assessment for Health Applications (QAH) has been announced.  

VQEG website: www.vqeg.org
Jesús Gutiérrez (jesus.gutierrez@upm.es), Universidad Politécnica de Madrid (Spain)
Kjell Brunnström (kjell.brunnstrom@ri.se), RISE (Sweden) 
Thanks to Lucjan Janowski (AGH University of Science and Technology), Alexander Raake (TU Ilmenau) and Shahid Satti (Opticom) for their help and contributions.


VQEG is an international and independent organisation that provides a forum for technical experts in perceptual video quality assessment from industry, academia, and standardization organisations. Although VQEG does not develop or publish standards, several activities (e.g., validation tests, multi-lab test campaigns, objective quality models developments, etc.) carried out by VQEG groups have been instrumental in the development of international recommendations and standards. VQEG contributions have been mainly submitted to relevant ITU Study Groups (e.g., ITU-T SG9, ITU-T SG12, ITU-R WP6C), but also to other standardization bodies, such as MPEG, ITU-R SG6, ATIS, IEEE P.3333 and P.1858, DVB, and ETSI. 

In our first column on the ACM SIGMM Records we provided a table summarizing the VQEG studies that have resulted in ITU Recommendations. In this new column, we describe in more detail the latest contributions to recent ITU standards, and we provide insight into the ongoing contributions that may result in ITU recommendations in the near future.

ITU Recommendations with recent inputs from VQEG

ITU-T Rec. P.1204 standard series

A campaign within the ITU-T Study Group (SG) 12 (Question 14) in collaboration with the VQEG AVHD group resulted in the development of three new video quality model standards for the assessment of sequences of up to UHD/4K resolution. This campaign was carried out over more than two years under the project “AVHD-AS / P.NATS Phase 2”. While “P.NATS Phase 1” (finalized in 2016 and resulting in the standards series ITU-T Rec. P.1203, P.1203.1, P.1203.2 and P.1203.3) addressed the development of improved bitstream-based models for the prediction of the overall quality of long (1-5 minutes) video streaming sessions, the second phase addressed the development of short-term video quality models covering a wider scope with bitstream-based, pixel-based and hybrid models. The P.NATS Phase 2 project was executed as a competition between nine participating institutions in different tracks, resulting in the aforementioned three types of video quality models.

For the competition, a total of 26 databases were created, 13 used for training and 13 for validation and selection of the winning models. In order to establish the ground truth, subjective video quality tests were performed on four different display devices (PC-monitors, 55-75” TVs, mobile, and tablet) with at least 24 subjects each and using the 5-point Absolute Category Rating (ACR) scale. In total, about 5000 test sequences with a duration of around 8 seconds were evaluated, containing a variety of resolutions, encoding configurations, bitrates, and framerates using the codecs H.264/AVC, H.265/HEVC and VP9.   

More details about the whole workflow and results of the competition can be found in [1]. As a result of this competition, the new standard series ITU-T Rec. P.1204 [2] has been recently published, including a bitstream-based model  (ITU-T Rec. P.1204.3 [3]), a pixel-based model (ITU-T Rec. P.1204.4 [4]) and a hybrid model (ITU-T Rec. P.1204.5 [5]).

ITU-T Rec. P.1401

ITU-T Rec. P.1401 [6] provides guidelines for the statistical analysis, evaluation and reporting of quality measurements, and was revised in January 2020. Based on the article by Brunnström and Barkowsky [7], VQEG pointed out that this very useful Recommendation lacked a section on multiple comparisons and their potential impact on the performance evaluation of objective quality methods. In the latest revision, Section 7.6.5 covers this topic.
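To illustrate why multiple comparisons matter, the following minimal sketch applies two widely used corrections, Bonferroni and Holm, to a set of p-values such as those obtained when several objective models are compared pairwise. It is only an illustration of the general issue: the function names and the example p-values are hypothetical, and the sketch does not claim to reproduce the specific procedure given in Section 7.6.5 of P.1401.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test sorted p-values against
    alpha / m, alpha / (m - 1), ... and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values from four pairwise model comparisons.
p = [0.001, 0.015, 0.04, 0.30]
uncorrected = [pv <= 0.05 for pv in p]  # [True, True, True, False]
print(uncorrected, bonferroni(p), holm(p))
```

Without correction, three of the four comparisons would be declared significant at the 5% level; with Bonferroni only one survives, and with the less conservative Holm procedure two do.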

Ongoing VQEG Inputs to ITU Recommendations

ITU-T Rec. P.919

ITU has been working on a recommendation for subjective test methodologies for 360º video on Head-Mounted Displays (HMDs), under SG12 Question 13 (Q13). The Immersive Media Group (IMG) of VQEG has collaborated in this effort by carrying out Phase 1 of the Test Plan for Quality Assessment of 360-degree Video. In particular, Phase 1 of this test plan addresses the assessment of short sequences (less than 30 seconds), in the spirit of ITU-R BT.500 [8] and ITU-T P.910 [9]. In this sense, the evaluation of audiovisual quality and simulator sickness was considered. On the other hand, Phase 2 of the test plan (envisioned for the near future) covers the assessment of other factors that can be more influential with longer sequences (several minutes), such as immersiveness and presence.

Therefore, within Phase 1 the IMG designed and executed a cross-lab test with the participation of ten international laboratories: AGH University of Science and Technology (Poland), Centrum Wiskunde & Informatica (The Netherlands), Ghent University (Belgium), Nokia Bell-Labs (Spain), Roma TRE University (Italy), RISE Acreo (Sweden), TU Ilmenau (Germany), Universidad Politécnica de Madrid (Spain), University of Surrey (England), and Wuhan University (China).

This test was aimed at assessing and validating subjective evaluation methodologies for 360º video. Thus, the single-stimulus methodology Absolute Category Rating (ACR) and the double-stimulus Degradation Category Rating (DCR) were considered to evaluate audiovisual quality of 360º videos distorted with uniform and non-uniform degradations. In particular, different configurations of uniform and tile-based coding were applied to eight video sources with different spatial, temporal and exploration properties. Other influence factors were also studied, such as the sequence duration (from 10 to 30 s) and the test setup (considering different HMDs and methods to collect the observers’ ratings, using audio or not, etc.). Finally, in addition to the evaluation of audiovisual quality, the assessment of simulator sickness symptoms was addressed by studying the use of different questionnaires. As a result of this work, the IMG of VQEG presented two contributions to the recommendation ITU-T Rec. P.919 (ex P.360-VR), which was consented at the last SG12 meeting (7-11 September 2020) and is envisioned to be published soon. In addition, the results and the annotated dataset coming from the cross-lab test will be published soon.

ITU-T Rec. P.913

Another upcoming contribution is being prepared by the Statistical Analysis Group (SAM). The main goal of the proposal is to increase the precision of the subjective experiment analysis by describing a subjective answer as a random variable. The random variable is described by three key influencing factors: the sequence quality, a subject bias, and a subject precision. It is a further development of the ITU-T P.913 [10] recommendation, where subject bias was introduced. Adding subject precision brings two benefits: better handling of unreliable subjects and an easier estimation procedure.

Current standards describe a way to remove an unreliable subject. The problem is that the methods proposed in BT.500 [8] and P.913 [10] are different and point to different subjects. Also, both methods have some arbitrary parameters (e.g., thresholds) deciding when a subject should be removed. It means that two subjects can be similarly imprecise, but one is over the threshold, and we accept all his answers as correct, while the other is under the threshold, and we remove all her answers. The proposed method weights the impact of each subject’s answers depending on the subject precision. As a consequence, each subject is to some extent both removed and kept. The balance between how much information we keep and how much we remove depends on the subject precision.
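The model and the weighting described above can be written compactly as follows. The symbols are chosen here for illustration, and the Gaussian noise term is an assumption commonly made for this kind of model; the exact formulation should be checked against the white paper [11]. Here $u_{ij}$ is the score given by subject $i$ to stimulus $j$, $\psi_j$ the sequence quality, $\Delta_i$ the subject bias and $\upsilon_i$ the subject inconsistency (the inverse of precision), with $I$ subjects in total.

```latex
% Opinion score as a random variable (notation is illustrative)
U_{ij} = \psi_j + \Delta_i + \upsilon_i X, \qquad X \sim \mathcal{N}(0, 1)

% Quality estimate: bias-corrected scores weighted by subject precision
\hat{\psi}_j = \frac{\sum_{i=1}^{I} \upsilon_i^{-2}\,\bigl(u_{ij} - \Delta_i\bigr)}
                    {\sum_{i=1}^{I} \upsilon_i^{-2}}
```

A subject with a large $\upsilon_i$ contributes little to $\hat{\psi}_j$ but is never discarded entirely, which is the sense in which each subject is partly removed and partly kept.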

The estimation procedure for the proposed model described in the literature is Maximum Likelihood Estimation (MLE). Such estimation is computationally costly and needs a careful setup to obtain a reliable solution. Therefore, we proposed an Alternating Projection (AP) solver, which is less general than MLE but works as well as MLE for the subject model estimation. This solver is called “alternating projection” because, in a loop, we alternate between projecting (or averaging) the opinion scores along the subject dimension and the stimulus dimension. It increases the precision of the obtained model parameters step by step, giving more weight to the information coming from the more precise subjects. More details can be found in the white paper in [11].
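The loop described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the reference solver from [11]: it assumes a complete score matrix (every subject rated every stimulus), and the function name, the stopping rule and the small floor `eps` on the inconsistency are choices made here for the example.

```python
import numpy as np


def alternating_projection(scores, max_iter=100, tol=1e-8, eps=1e-6):
    """Estimate stimulus quality, subject bias and subject inconsistency.

    scores: array of shape (n_subjects, n_stimuli) with no missing values.
    Returns (quality, bias, inconsistency).
    """
    u = np.asarray(scores, dtype=float)
    quality = u.mean(axis=0)  # start from the plain mean opinion score
    for _ in range(max_iter):
        # Project along the stimulus dimension: per-subject statistics.
        residual = u - quality
        bias = residual.mean(axis=1)
        # Floor the spread so perfectly consistent subjects do not
        # produce a division by zero (eps is an assumption of this sketch).
        inconsistency = np.maximum(residual.std(axis=1), eps)
        # Project along the subject dimension: precision-weighted average
        # of the bias-corrected scores.
        weights = inconsistency ** -2
        corrected = u - bias[:, None]
        new_quality = (weights[:, None] * corrected).sum(axis=0) / weights.sum()
        converged = np.max(np.abs(new_quality - quality)) < tol
        quality = new_quality
        if converged:
            break
    return quality, bias, inconsistency


# Hypothetical data: two consistent subjects and one erratic subject.
true_quality = np.array([1.0, 2.0, 3.0, 4.0])
ratings = np.vstack([
    true_quality,
    true_quality,
    true_quality + np.array([1.0, -1.0, 1.0, -1.0]),
])
quality, bias, inconsistency = alternating_projection(ratings)
```

In this toy example the plain mean is pulled away from the true quality by the erratic third subject, whereas the iterated estimate moves back towards it as that subject’s weight shrinks.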

Other updates 

A new VQEG group has been recently established related to Quality Assessment for Health Applications (QAH), with the motivation to study visual quality requirements for medical imaging and telemedicine. The main goals of this new group are:

  • Assemble all the existing publicly accessible databases on medical quality.
  • Develop databases with new diagnostic tasks and new objective quality assessment models.
  • Provide methodologies, recommendations and guidelines for subjective test of medical image quality assessment.
  • Study the quality requirements and Quality of Experience in the context of telemedicine and other telehealth services.

For any further questions or expressions of interest to join this group, please contact QAH Chair Lu Zhang (lu.ge@insa-rennes.fr), Vice Chair Meriem Outtas (Meriem.Outtas@insa-rennes.fr), and Vice Chair Hantao Liu (hantao.liu@cs.cardiff.ac.uk).


[1] A. Raake, S. Borer, S. Satti, J. Gustafsson, R.R.R. Rao, S. Medagli, P. List, S. Göring, D. Lindero, W. Robitza, G. Heikkilä, S. Broom, C. Schmidmer, B. Feiten, U. Wüstenhagen, T. Wittmann, M. Obermann, R. Bitto, “Multi-model standard for bitstream-, pixel-based and hybrid video quality assessment of UHD/4K: ITU-T P.1204”, IEEE Access, 2020 (Available online soon).
[2] ITU-T Rec. P.1204. Video quality assessment of streaming services over reliable transport for resolutions up to 4K. Geneva, Switzerland: ITU, 2020.
[3] ITU-T Rec. P.1204.3. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full bitstream information. Geneva, Switzerland: ITU, 2020.
[4] ITU-T Rec. P.1204.4. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to full and reduced reference pixel information. Geneva, Switzerland: ITU, 2020.
[5] ITU-T Rec. P.1204.5. Video quality assessment of streaming services over reliable transport for resolutions up to 4K with access to transport and received pixel information. Geneva, Switzerland: ITU, 2020.
[6] ITU-T Rec. P.1401. Methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models. Geneva, Switzerland: ITU, 2020.
[7] K. Brunnström and M. Barkowsky, “Statistical quality of experience analysis: on planning the sample size and statistical significance testing”, Journal of Electronic Imaging, vol. 27, no. 5, p. 11, Sep. 2018 (DOI: 10.1117/1.JEI.27.5.053013).
[8] ITU-R Rec. BT.500-14. Methodology for the subjective assessment of the quality of television pictures. Geneva, Switzerland: ITU, 2019.
[9] ITU-T Rec. P.910. Subjective video quality assessment methods for multimedia applications. Geneva, Switzerland: ITU, 2008.
[10] ITU-T Rec. P.913. Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment. Geneva, Switzerland: ITU, 2016.
[11] Z. Li, C. G. Bampis, L. Janowski, I. Katsavounidis, “A simple model for subject behavior in subjective experiments”, arXiv:2004.02067, Apr. 2020.

An interview with Benoit Huet

Benoit at the beginning of his research career.

Describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

This is an excellent question. Indeed, life is a journey, and every step is a lesson. I was originally attracted by electronics but as I was studying it, I discovered computers. Remember, for those who were old enough in the 1980’s, this was the start of personal computers. So I decided to learn more about them, and as I was studying computer science I found out about AI, yes AI 1990’s style. I was interested, but this coincided with one of the AI winters and I was advised, or rather decided, to go in a different direction. The area that attracted me most was computer vision. The reason was that it seemed like a very hard problem which would clearly have a very broad impact. It turns out that vision alone is indeed very hard and using additional information or signals could help obtain better results, hence reducing time to impact for such a scientific approach/method. This was what attracted me to multimedia and kept me busy for many years at EURECOM. What did I learn along the way? Follow your instinct and your heart as you go along as it is rare to know where to go from the very start. Your destination might not even exist at the time you started your journey!

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

Since July 2019 I have headed the Data Science team of MEDIAN Technologies. The objective is to bring recent advances from the fields of computer vision, neural networks, and also multimedia to depart from the way medical imaging is currently performed while providing solutions to detect cancer at the earliest possible stage of the disease and help identify the best treatment for each patient. Concretely, we are currently working on the identification of biomarkers for Hepatocellular Carcinoma (HCC), which is the most common type of primary liver cancer, the liver being known as a difficult organ for medical imaging solutions.

Can you profile your current research, its challenges, opportunities, and implications?

To answer this very broad question concisely, I will limit myself to one challenge, one opportunity, and one implication. For the challenge, I will mention one key challenge which I have encountered many times in many projects: interdisciplinary communication. In most projects, whether small or large and involving multiple domains of expertise, the communication between people of different backgrounds is not as straightforward as one would assume. It is important to address this challenge proactively. For the opportunity, medical imaging is nowadays still mostly employing “traditional” machine learning on top of man-made features (Grey-Level Co-occurrence Matrices, Gabor, etc). The “end to end” paradigm shift brought in by the recent developments in the field of deep neural nets is still to take place in the medical domain at large. This is what we aim to achieve for medical imaging for Oncology. The implication is a significant improvement in the detection of cancer, such as the early detection of tumors. Such early detection allows for therapy to take place at the earliest possible stage, hence drastically increasing the patient’s chance of survival. Saving lives, in short.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

There are a number of research works originating from my team that would be worth mentioning here: EventEnricher, for collecting media illustrating events; the Hyper Video Browser, an interactive tool for searching and hyperlinking in broadcast media; or the work resulting from the collaborative project NexGen-TV, providing real-time insight during political debates in a second screen application… to name just a few.

But the one with the highest impact is the work performed while on sabbatical at the IBM T.J. Watson Research Center. As I onboarded, the research group received a request from 20th Century Fox regarding the possibility of using AI to help generate the trailer of a sci-fi horror movie that was about to be released. The project was both challenging and interesting, as I had previously addressed video summarization and multimedia emotion recognition as part of previous research projects. The challenge was the limited amount of time available to deliver the shots which, using state-of-the-art machine learning, were identified as the best suited to be part of the trailer. The team worked hard, and the hard work was rewarded multiple times. First, because the hard deadline was met, having the “AI Trailer” on time for the screening of the movie in the US. Second, because Fox sent a whole video crew to shoot the making of the trailer behind the scenes. The video was posted on YouTube and got about 2 million views in about a week. This was the level of impact this scientific research work had. And if that was not enough, the work got another reward at the ACM Multimedia 2017 conference for being the Best Brave New Ideas paper that year.

Over your distinguished career, what are your top lessons you want to share with the audience?

Over the years I have observed that, as a researcher, one needs to be curious while being able to find a good compromise between being focused and exploring new or alternative options/approaches. I feel that it is easy for today’s young researchers to be overwhelmed by the pace at which high-quality publications are becoming available. Social media (e.g., Twitter) and online repositories (e.g., arXiv) are no strangers to this situation. There will always be a new paper reporting something potentially interesting with respect to your research, yet it doesn’t mean you should keep reading and reading at the cost of making slow or no progress on your own work! Reading and being aware of the state of the art is one thing; contributing and being innovative is another, and the latter is the key to a successful PhD. Life as a researcher, whether in academia or in industry, is made of choices, directions to follow, etc. While more senior people may sometimes rely on their experience, I believe it is important to listen to your inner self and follow what motivates you whenever possible. I have always believed and often witnessed that it is easier to work toward something of interest (to yourself), and in most cases the outcome exceeds expectations.

What is the best joke you know?

I have a very bad memory for jokes. Tell me one and I will laugh because I have a good sense of humor. But ask me to tell the story the next day and I will not be able to. So I looked up jokes on the internet and here is the first one that made me laugh (I did read quite a few before!!!):

Two men meet on opposite sides of a river.  One shouts to the other, “I need you to help me get to the other side!” The other guy replies, “You’re on the other side!”

Not the best, but it will do for now!

If you were conducting this interview, what questions would you ask, and then what would be your answers?

The COVID-19 pandemic is affecting people’s lives on an international scale, do you think this will have an influence on research and in particular multimedia research?

Indeed, the situation is forcing us to change the way we collaborate and interact. As a researcher, one regularly travels for project meetings, conferences, PhD presentations, etc., in addition to local activities and events such as teaching, labs, group meetings, etc. With current travel restrictions and social distancing recommendations, remote work relying heavily on high-bandwidth internet has developed to an unprecedented level, exposing both its limitations and advantages. Similarly, scientific conferences, where a lot of interaction takes place, have been forced to adapt. At first, organizers postponed the events, hoping the situation would quickly return to normal. However, with the extended duration of the pandemic, the shift from physical to remote or virtual conferencing, using online tools and systems, had to be performed. This clearly demonstrated the possibility of organizing such events online, but also showed some limitations regarding interaction. On this topic, this could be a great opportunity for the multimedia community to have an impact at large. Indeed, who would be better suited to contribute to the next generation of tools for effective interactive remote work and conferencing than the multimedia community? I believe we have a role to play and look forward to seeing and using such tools. I didn’t touch on the health aspect of this question, but that is also something multimedia researchers, usually well acquainted with state-of-the-art machine learning, can contribute to. On that note, if Medical Imaging is a topic that attracts you and that you are motivated by, do not hesitate to reach out.

Disclaimer: All views expressed in this interview are my own and do not represent the opinions of any entity with which I have been or am now affiliated.

A recent photo of Benoit.

Bio: Benoit Huet heads the data science team at MEDIAN Technologies. His research interests include computer vision, machine learning, and large-scale multimedia data mining and indexing.

JPEG Column: 88th JPEG Meeting

The 88th JPEG meeting, initially planned to be held in Geneva, Switzerland, was held online because of the Covid-19 outbreak.

JPEG experts organised a large number of sessions spread over day and night to allow remote participation from multiple time zones. This very intense activity has resulted in multiple outputs and initiatives. In particular, two new exploration activities were initiated. The first explores possible standardisation needs to address the growing emergence of fake media by introducing appropriate security features to prevent the misuse of media content. The second considers the use of DNA for media content archival.

Furthermore, JPEG has started work on the new Part 8 of the JPEG Systems standard, called JPEG Snack, for interoperable rich image experiences, and it is holding two Calls for Evidence, on JPEG AI and on JPEG Pleno Point Cloud coding.

Despite travel restrictions, the JPEG Committee has managed to keep up with the majority of its plans, defined prior to the COVID-19 outbreak. An overview of the different activities is presented in Fig. 1.

Figure 1 – JPEG Planned Timeline.

The 88th JPEG meeting had the following highlights:

  • JPEG explores standardization needs to address fake media
  • JPEG Pleno Point Cloud call for evidence
  • JPEG DNA – based archival of media content using DNA
  • JPEG AI call for evidence
  • JPEG XL standard evolves to a final specification
  • JPEG Systems Part 8, named JPEG Snack, progresses
  • JPEG XS Part-1 2nd Edition first ballot.

JPEG explores standardization needs to address fake media

Recent advances in media manipulation, particularly deep learning-based approaches, can produce near realistic media content that is almost indistinguishable from authentic content to the human eye. These developments open opportunities for production of new types of media contents that are useful for the entertainment industry and other business usage, e.g., creation of special effects or artificial natural scene production with actors in the studio. However, this also leads to issues relating to fake media generation undermining the integrity of the media (e.g., deepfakes), copyright infringements and defamation to mention a few examples. Misuse of manipulated media can cause social unrest, spread rumours for political gain or encourage hate crimes. In this context, the term ‘fake’ is used here to refer to any manipulated media, independently of its ‘good’ or ‘bad’ intention.

In many application domains, fake media producers may want or may be required to declare the type of manipulations performed, in contrast to other situations where the intention is to ‘hide’ the mere existence of such manipulations. This is already leading various governmental organizations to plan new legislation, and companies (especially social media platforms and news outlets) to develop mechanisms that would clearly detect and annotate manipulated media contents when they are shared. While growing efforts are noticeable in developing technologies, there is a need for a standard for the media/metadata format, e.g., a JPEG standard that facilitates a secure and reliable annotation of fake media, in both good-faith and malicious usage scenarios. To better understand the fake media ecosystem and its needs in terms of standardization, the JPEG Committee has initiated an in-depth analysis of fake media use cases, naturally independently of the “intentions”.

More information on the initiative is available on the JPEG website. Interested parties are invited to join the above AHG through the following URL: http://listregistration.jpeg.org.

JPEG Pleno Point Cloud

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 88th JPEG meeting, the JPEG Committee released a Final Call for Evidence on JPEG Pleno Point Cloud Coding that focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between the 88th and 89th meetings, the JPEG Committee will be actively promoting this activity and collecting registrations to participate in the Call for Evidence.


JPEG DNA

In digital media information, notably images, the relevant representation symbols, e.g., quantized DCT coefficients, are expressed in bits (i.e., binary units), but they could be expressed in any other units, for example the DNA units, which follow a 4-ary representation basis. This would mean that DNA molecules may be created with a specific configuration of DNA units which stores some media representation symbols, e.g., the symbols of a JPEG image, thus leading to DNA-based media storage as a form of molecular data storage. JPEG standards have been used in storage and archival of digital pictures as well as moving images. While the legacy JPEG format is widely used for photo storage on SD cards, as well as archival of pictures by consumers, JPEG 2000 as described in ISO/IEC 15444 is used in many archival applications, notably for preservation of cultural heritage in the form of visual data such as pictures and video in digital format. This puts the JPEG Committee in a unique position to address the challenges in DNA-based storage by creating a standard image representation and coding for such applications. To explore the latter, an AHG has been established. Interested parties are invited to join the above AHG through the following URL: http://listregistration.jpeg.org.
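
To make the 4-ary idea concrete, here is a minimal sketch that maps each pair of bits to one of the four nucleotides. The mapping table (00→A, 01→C, 10→G, 11→T) and the function names are assumptions chosen for illustration; this is not the coding scheme under exploration by the JPEG Committee, and practical DNA storage codes would typically also have to respect biochemical constraints that this naive mapping ignores.

```python
# Naive 2-bits-per-nucleotide mapping (illustrative assumption only).
BASES = "ACGT"  # index 0..3 corresponds to bit pairs 00, 01, 10, 11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


payload = bytes([0xFF, 0xD8])  # the two-byte JPEG start-of-image marker
strand = bytes_to_dna(payload)
print(strand)  # TTTTTCGA
```

Each byte becomes four nucleotides, so the strand is four times as long as the payload in symbols while carrying exactly the same information.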


JPEG AI

At the 88th meeting, the submissions to the Call for Evidence were reported and analysed. Six submissions were received in response to the Call for Evidence made in coordination with the IEEE MMSP 2020 Challenge. The submissions, along with the anchors, had already been evaluated using objective quality metrics. Following this initial process, subjective experiments have been designed to compare the performance of all submissions. Thus, during this meeting, the main focus of JPEG AI was on the presentation and discussion of the objective performance evaluation of all submissions, as well as the definition of the methodology for the subjective evaluation that will be carried out next.


JPEG XL

The standardization of the JPEG XL image coding system is nearing completion. Final technical comments by national bodies have been received for the codestream (Part 1); the DIS has been approved and an FDIS text is under preparation. The container file format (Part 2) is progressing to the DIS stage. A white paper summarizing key features of JPEG XL is available at http://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf.

JPEG Systems

ISO/IEC has approved the JPEG Snack initiative to deliver interoperable rich image experiences. As a result, JPEG Systems Part 8 (ISO/IEC 19566-8) has been created to define the file format construction and the metadata signalling and descriptions which enable animation with transition effects. A Call for Participation (CfP) and updated use cases and requirements have been issued. The use cases and requirements document and the CfP are available at http://ds.jpeg.org/documents/wg1n87035-REQ-JPEG_Snack_Use_Cases_and_Requirements_v2_2.pdf and http://ds.jpeg.org/documents/wg1n88032-SI-CfP_JPEG_Snack.pdf, respectively.

An updated working draft for the JLINK initiative was completed. Interested parties are encouraged to review the JLINK Working Draft 3.0, available at http://ds.jpeg.org/documents/wg1n88031-SI-JLINK_WD_3_0.pdf


JPEG XS

The JPEG Committee is pleased to announce a significant step in the standardization of an efficient Bayer image compression scheme, with the first ballot of the 2nd Edition of JPEG XS Part-1.

The new edition of this visually lossless, low-latency and lightweight compression scheme now includes image sensor coding tools allowing efficient compression of Color Filter Array (CFA) data. This compression enables better quality and lower complexity than the corresponding compression in the RGB domain. It can be used as a mezzanine codec in various markets, such as real-time video storage in and outside of cameras, and data compression onboard autonomous cars.

Final Quote

“Fake Media has become a challenge with the wide-spread manipulated contents in the news. JPEG is determined to mitigate this problem by providing standards that can securely identify manipulated contents.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Future JPEG meetings are planned as follows:

  • No. 89 will be held online from October 5 to 9, 2020.

Towards Interactive QoE Assessment of Robotic Telepresence

Telepresence robots (TPRs) are remote-controlled, wheeled devices with an internet connection. A TPR can “teleport” you to a remote location, let you drive around and interact with people.  A TPR user can feel present in the remote location by being able to control the robot position, movements, actions, voice and video. A TPR facilitates human-to-human interaction, wherever you want and whenever you want. The human user sends commands to the TPR by pressing buttons or keys from a keyboard, mouse, or joystick.

A Robotic Telepresence Environment

In recent years, people from different environments and backgrounds have started to adopt TPRs for private and business purposes such as attending a class, roaming around the office and visiting patients. Due to the COVID-19 pandemic, adoption in healthcare has increased in order to facilitate social distancing and staff safety [Ackerman 2020, Tavakoli et al. 2020].

Robotic Telepresence Sample Use Cases

Despite this increase in adoption, a research gap remains from a QoE perspective, as TPRs offer interaction beyond the well-understood QoE issues in traditional static audio-visual conferencing. TPRs, as remote-controlled vehicles, provide users with some form of physical presence at the remote location. Furthermore, for those people interacting with the TPR at the remote location, the robot is a physical representation or proxy agent of its remote operator. The operator can physically interact with the remote location by driving over an object or pushing an object forward. These aspects of teleoperation and navigation represent an additional dimension in terms of functionality, complexity and experience.

Navigating a TPR may pose challenges to end-users and influence their perceived quality of the system. For instance, when a TPR operator is driving the robot, he/she expects an instantaneous reaction from the robot. An increased delay in sending commands to the robot may thus negatively impact robot mobility and the user’s satisfaction, even if the audio-visual communication functionality itself is not affected.

In a recent paper published at QoMEX 2020 [Jahromi et al. 2020], we addressed this gap in research by means of a subjective QoE experiment that focused on the QoE aspects of live TPR teleoperation over the internet. We were interested in understanding how network QoS-related factors influence the operator’s QoE when using a TPR in an office context.

TPR QoE User Study and Experimental Findings

In our study, we investigated the QoE of TPR navigation along three research questions: 1) impact of network factors including bandwidth, delay and packet loss on the TPR navigation QoE, 2) discrimination between navigation QoE and video QoE, 3) impact of task on TPR QoE sensitivity.

The QoE study participants were situated in a laboratory setting in Dublin, Ireland, where they navigated a Beam Plus TPR via keyboard input on a desktop computer. The TPR was placed in a real office setting of California Telecom in California, USA. Bandwidth, delay and packet loss rate were manipulated on the operator’s PC.

A User Participating in the Robotic Telepresence QoE Study

A total of 23 subjects participated in our QoE lab study: 8 subjects were female and 15 male, and the average test duration was 30 minutes per participant. We followed ITU-R Recommendation BT.500 and identified three participants as outliers, who were excluded from subsequent analysis. A post-test survey shows that none of the participants reported task boredom as a factor. In fact, many reported that they enjoyed the experience!

The influence of network factors on Navigation QoE

All three network influence factors exhibited a significant impact on navigation QoE but in different ways. Above a threshold of 0.9 Mbps, bandwidth showed no influence on navigation QoE, while 1% packet loss already showed a noticeable impact on the navigation QoE.  A mixed-model ANOVA confirms that the impact of the different network factors on navigation quality ratings is statistically significant (see [Jahromi et al. 2020] for details).  From the figure below, one can see that the levels of navigation QoE MOS, as well as their sensitivity to network impairment level, depend on the actual impairment type.

The bar plots illustrate the influence of network QoS factors on the navigation quality (left) and the video quality (right).
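
As a reminder of how a MOS value such as those plotted above is obtained, the following sketch averages the ratings collected for one test condition and attaches an approximate 95% confidence interval. The ratings, the 5-point scale, and the normal-approximation factor of 1.96 are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses the normal approximation z * s / sqrt(n); for small panels a
    Student-t quantile would give a somewhat wider interval.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings")
    return mean(ratings), z * stdev(ratings) / n ** 0.5


# Hypothetical ratings for a single condition, invented for illustration.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, half_width = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```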

Discrimination between navigation QoE and video QoE

Our study results show that the subjects were capable of discriminating between video quality and navigation quality, as they treated them as separate concepts when it comes to experience assessment. Based on ANOVA analysis [Jahromi et al. 2020], we see that the impact of bandwidth and packet loss on TPR video quality ratings was statistically significant. However, for the delay, this was not the case (in contrast to navigation quality). A comparison of navigation quality and video quality subplots shows that changes in MOS across different impairment levels diverge between the two in terms of amplitude. To quantify this divergence, we performed a Spearman Rank Order Correlation Coefficient (SROCC) analysis, revealing only a weak correlation between video and navigation quality (SROCC = 0.47).
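
For readers unfamiliar with SROCC, the sketch below computes it from scratch as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The MOS values are hypothetical and invented for illustration, so the result is unrelated to the 0.47 reported above; the function names are likewise assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-condition MOS values, invented for illustration only.
navigation_mos = [4.2, 3.9, 2.1, 3.0, 1.8, 3.5]
video_mos = [4.0, 2.5, 3.1, 3.3, 2.0, 3.8]
rho = srocc(navigation_mos, video_mos)
print(f"SROCC = {rho:.2f}")
```

Because only the ordering of the conditions matters, SROCC captures whether the two quality dimensions rise and fall together without assuming a linear relationship between the ratings.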

Impact of task on TPR QoE sensitivity

Our study showed that the type of TPR task had more impact on navigation QoE than streaming video QoE. Statistical analysis reveals that the actual task at hand significantly affects QoE impairment sensitivity, depending on the network impairment type. For example, the interaction between bandwidth and task is statistically significant for navigation QoE, which means that changes in bandwidth were rated differently depending on the task type. On the other hand, this was not the case for delay and packet loss. Regarding video quality, we do not see a significant impact of task on QoE sensitivity to network impairments, except for the borderline case for packet loss rate.

Conclusion: Towards a TPR QoE Research Agenda

There were three key findings from this study. First, we understand that users can differentiate between visual and navigation aspects of TPR operation. Secondly, all three network factors have a significant impact on TPR navigation QoE. Thirdly,  visual and navigation QoE sensitivity to specific impairments strongly depends on the actual task at hand. We also found the initial training phase to be essential in order to ensure familiarity of participants with the system and to avoid bias caused by novelty effects. We observed that participants were highly engaged when navigating the TPR, as was also reflected in the positive feedback received during the debriefing interviews. We believe that our study methodology and design, including task types, worked very well and can serve as a solid basis for future TPR QoE studies. 

We also see the necessity of developing a more generic, empirically validated, TPR experience framework that allows for systematic assessment and modelling of QoE and UX in the context of TPR usage. Beyond integrating concepts and constructs that have been already developed in other related domains such as (multi-party) telepresence, XR, gaming, embodiment and human-robot interaction, the development of such a framework must take into account the unique properties that distinguish the TPR experience from other technologies:

  • Asymmetric conditions
    The factors influencing QoE for TPR users are not only bidirectional, they are also different on both sides of the TPR, i.e., the experience is asymmetric. Considering the differences between the local and the remote location, a TPR setup features a noticeable number of asymmetric conditions as regards the number of users, content, context, and even stimuli: while the robot is typically controlled by a single operator, the remote location may host a number of users (asymmetry in the number of users). An asymmetry also exists in the number of stimuli. For instance, the remote users perceive the physical movement and presence of the operator by the actual movement of the TPR. The experience of encountering a TPR rolling into an office is a hybrid kind of intrusion, somewhere between a robot and a physical person. However, from the operator’s perspective, the experience is a rather virtual one, as he/she becomes conscious of physical impact at the remote location only by means of technically mediated feedback.
  • Social Dimensions
    According to [Haans et al. 2012], the experience of telepresence is defined as “a consequence of the way in which we are embodied, and that the capability to feel as if one is actually there in a technologically mediated or simulated environment is a natural consequence of the same ability that allows us to adjust to, for example, a slippery surface or the weight of a hammer”.
    The experience of being present in a TPR-mediated context goes beyond AR and VR. It is a blended physical reality. The sense of ownership of a wheeled TPR, gained through mobility and remote navigation of a “physical” object, allows the users to feel as if they are physically present in the remote environment (e.g. as a physical avatar). This allows the TPR users to get involved in social activities, such as accompanying people and participating in discussions while navigating, sharing the same visual scenes, visiting a place and getting involved in social discussions, parties and celebrations. In healthcare, a doctor can use a TPR for visiting patients as well as dispensing and administering medication remotely.
  • TPR Mobility and Physical Environment
    Mobility is a key dimension of telepresence frameworks [Rae et al. 2015]. TPR mobility and navigation features introduce new interactions between the operators and the physical environment.  The environmental aspect becomes an integral part of the interaction experience [Hammer et al. 2018].
    During TPR usage, the navigation path and the number of obstacles that a remote user may face can influence the user’s experience. The ease or complexity of navigation can change the operator’s focus and attention from one influence factor to another (e.g., video quality to navigation quality). Paloski et al. found that cognitive impairment as a result of fatigue can influence user performance concerning robot operation [Paloski et al. 2008]. This raises the question of how driving and interacting through a TPR impacts the user’s cognitive load and results in fatigue compared to physical presence.
    The mobility aspects of TPRs can also influence the perception of spatial configurations of the physical environment. This allows the TPR user to manipulate and interact with the environment from a spatial configuration aspect [Narbutt et al. 2017]. For example,  the ambient noise of the environment can be perceived at different levels. The TPR operator can move the robot closer to the source of the noise or keep a distance from it. This can enhance his/her feelings of being present [Rae et al. 2015].

The above distinctive characteristics of a TPR-mediated context illustrate the complexity and the broad range of aspects that potentially have a significant influence on the TPR quality of user experience. Consideration of these features and factors provides a useful basis for the development of a comprehensive TPR experience framework.


References

  • [Tavakoli et al. 2020] M. Tavakoli, J. Carriere, and A. Torabi (2020). Robotics For COVID-19: How Can Robots Help Health Care in the Fight Against Coronavirus.
  • [Ackerman 2020] E. Ackerman (2020). Telepresence Robots Are Helping Take Pressure Off Hospital Staff. IEEE Spectrum, Apr 2020.
  • [Jahromi et al. 2020] H. Z. Jahromi, I. Bartolec, E. Gamboa, A. Hines, and R. Schatz, “You Drive Me Crazy! Interactive QoE Assessment for Telepresence Robot Control,” in 12th International Conference on Quality of Multimedia Experience (QoMEX 2020), Athlone, Ireland, 2020.
  • [Hammer et al. 2018] F. Hammer, S. Egger-Lampl, and S. Möller, “Quality-of-user-experience: a position paper,” Quality and User Experience, vol. 3, no. 1, Dec. 2018, doi: 10.1007/s41233-018-0022-0.
  • [Haans et al. 2012] A. Haans and W. A. Ijsselsteijn (2012). Embodiment and telepresence: Toward a comprehensive theoretical framework. Interacting with Computers, 24(4), 211-218.
  • [Rae et al. 2015] I. Rae, G. Venolia, J. C. Tang, and D. Molnar (2015, February). A framework for understanding and designing telepresence. In Proceedings of the 18th ACM conference on computer supported cooperative work & social computing (pp. 1552-1566).
  • [Narbutt et al. 2017] M. Narbutt, S. O’Leary, A. Allen, J. Skoglund, and A. Hines (2017, October). Streaming VR for immersion: Quality aspects of compressed spatial audio. In 2017 23rd International Conference on Virtual System & Multimedia (VSMM) (pp. 1-6). IEEE.
  • [Paloski et al. 2008] W. H. Paloski, C. M. Oman, J. J. Bloomberg, M. F. Reschke, S. J. Wood, D. L. Harm, … and L. S. Stone (2008). Risk of sensory-motor performance failures affecting vehicle control during space missions: a review of the evidence. Journal of Gravitational Physiology, 15(2), 1-29.