The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 133rd MPEG meeting was once again held online and this time kicked off with great news: MPEG is one of the organizations honored as a recipient of the 72nd Annual Technology & Engineering Emmy® Awards, specifically the MPEG Systems File Format Subgroup for its ISO Base Media File Format (ISOBMFF) et al.
The official press release can be found here and comprises the following items:
6th Emmy® Award for MPEG Technology: MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
Essential Video Coding (EVC) verification test finalized
MPEG issues a Call for Evidence on Video Coding for Machines
Neural Network Compression for Multimedia Applications – MPEG calls for technologies for incremental coding of neural networks
MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
MPEG Systems reached the first milestone to carry event messages in tracks of the ISO Base Media File Format
In this report, I’d like to focus on ISOBMFF, EVC, CMAF, and DASH.
MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
MPEG is pleased to report that the File Format subgroup of MPEG Systems is being recognized this year by the National Academy of Television Arts and Sciences (NATAS) with a Technology & Engineering Emmy® for its 20 years of work on the ISO Base Media File Format (ISOBMFF). This format was first standardized in 1999 as part of the MPEG-4 Systems specification and is now in its 6th edition as ISO/IEC 14496-12. It has been used and adopted by many other specifications, e.g.:
MP4 and 3GP file formats;
Carriage of NAL unit structured video in the ISO Base Media File Format which provides support for AVC, HEVC, VVC, EVC, and probably soon LCEVC;
MPEG-21 file format;
Dynamic Adaptive Streaming over HTTP (DASH) and Common Media Application Format (CMAF);
High-Efficiency Image Format (HEIF);
Timed text and other visual overlays in ISOBMFF;
Common encryption format;
Carriage of timed metadata metrics of media;
Derived visual tracks;
Event message track format;
Carriage of uncompressed video;
Omnidirectional Media Format (OMAF);
Carriage of visual volumetric video-based coding data;
Carriage of geometry-based point cloud compression data;
… to be continued!
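Because so many of the formats listed above share ISOBMFF's box structure, a minimal sketch of walking the top-level boxes of a file may help readers new to the format. It assumes only the basic box header of ISO/IEC 14496-12 (a 32-bit big-endian size, a four-character type, a 64-bit "largesize" when size is 1, and size 0 meaning "to the end of the data"); it ignores `uuid` extended types and does not descend into container boxes. The helper name `iter_boxes` is illustrative, not part of any library.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level box in data[offset:end].

    Trailing bytes shorter than a box header are ignored.
    """
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Box header: 32-bit big-endian size followed by a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


if __name__ == "__main__":
    # A tiny synthetic file: an 'ftyp' box (major brand 'isom') and an empty 'free' box.
    ftyp = struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0x200)
    free = struct.pack(">I4s", 8, b"free")
    for box in iter_boxes(ftyp + free):
        print(box)
```

Descending into a container such as `moov` would amount to calling `iter_boxes(data, offset + header_len, offset + size)` on that box's payload.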
This is MPEG’s fourth Technology & Engineering Emmy® Award (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, and MPEG-2 Transport Stream in 2013) and its sixth Emmy® Award overall, including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017.
Essential Video Coding (EVC) verification test finalized
At the 133rd MPEG meeting, the verification test of the Essential Video Coding (EVC) standard was completed. The first part of the EVC verification test, using high dynamic range (HDR) and wide color gamut (WCG) content, had been completed at the 132nd MPEG meeting; the part completed at this meeting used standard dynamic range (SDR) content. Subjective quality evaluations compared the EVC Main profile to the HEVC Main 10 profile and the EVC Baseline profile to the AVC High 10 profile:
Analysis of the subjective test results showed that the average bitrate savings for the EVC Main profile are approximately 40% compared to the HEVC Main 10 profile, using UHD and HD SDR content encoded in both random access and low delay configurations.
The average bitrate savings for the EVC Baseline profile compared to the AVC High 10 profile are approximately 40% using UHD SDR content encoded in the random access configuration and approximately 35% using HD SDR content encoded in the low delay configuration.
The earlier verification test with HDR content showed average bitrate savings of approximately 35% for the EVC Main profile compared to the HEVC Main 10 profile.
By providing significantly improved compression efficiency compared to HEVC and earlier video coding standards while encouraging the timely publication of licensing terms, the MPEG-5 EVC standard is expected to meet the market needs of emerging delivery protocols and networks, such as 5G, enabling the delivery of high-quality video services to an ever-growing audience.
In addition to the verification tests, the systems support for EVC, as well as for VVC, was further improved, notably in CMAF.
Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. Additionally, the availability of (efficient) open-source implementations (e.g., x264, x265, soon x266, VVenC, aomenc) is vital for its adoption in the (academic) research community.
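Compression efficiency claims such as the approximately 40% savings reported above are usually expressed as an average bitrate difference at equal quality. The sketch below illustrates that idea with a simplified Bjøntegaard-style delta rate: it interpolates log-bitrate linearly over quality and averages the difference over the overlapping quality range. This is an assumption-laden simplification: the original Bjøntegaard method fits cubic polynomials, and the EVC verification test derived its figures from subjective scores rather than from this procedure. The function names and sample rate-quality points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (bitrate in kbps, quality score)


def _interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation; xs must be strictly ascending."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError(f"{x} outside interpolation range")


def delta_rate(ref: List[Point], test: List[Point], samples: int = 1000) -> float:
    """Average relative bitrate difference of `test` vs. `ref` at equal quality.

    Negative values mean `test` needs less bitrate (e.g., -0.4 = 40% savings).
    Quality must increase strictly with bitrate on each curve.
    """
    ref = sorted(ref, key=lambda p: p[1])
    test = sorted(test, key=lambda p: p[1])
    lo = max(ref[0][1], test[0][1])
    hi = min(ref[-1][1], test[-1][1])
    if lo >= hi:
        raise ValueError("quality ranges do not overlap")
    ref_q, ref_log = [q for _, q in ref], [math.log10(r) for r, _ in ref]
    test_q, test_log = [q for _, q in test], [math.log10(r) for r, _ in test]
    step = (hi - lo) / samples
    diffs = []
    for k in range(samples + 1):
        q = min(lo + k * step, hi)  # guard against rounding past hi
        diffs.append(_interp(q, test_q, test_log) - _interp(q, ref_q, ref_log))
    # Trapezoidal integration of the log-rate difference, then its mean over [lo, hi].
    integral = step * (sum(diffs) - 0.5 * (diffs[0] + diffs[-1]))
    return 10 ** (integral / (hi - lo)) - 1


if __name__ == "__main__":
    anchor = [(1000, 30.0), (2000, 34.0), (4000, 37.0), (8000, 40.0)]
    candidate = [(r / 2, q) for r, q in anchor]  # same quality at half the bitrate
    print(f"{delta_rate(anchor, candidate):.1%}")  # prints -50.0%
```

Working in the log-rate domain is what makes a uniform 2x bitrate reduction show up as a constant offset and hence a clean -50%.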
MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
At the 133rd MPEG meeting, MPEG Systems promoted Amendment 2 of the Common Media Application Format (CMAF) to Committee Draft Amendment (CDAM) status, the first major milestone in the ISO/IEC approval process. This amendment defines:
constraints on (i) Versatile Video Coding (VVC) and (ii) Essential Video Coding (EVC) video elementary streams when carried in a CMAF video track;
codec parameters to be used for CMAF switching sets with VVC and EVC tracks; and
support of the newly introduced MPEG-H 3D Audio profile.
It is expected to reach its final milestone in early 2022. For research aspects related to CMAF, the reader is referred to the next section about DASH.
MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
At the 133rd MPEG meeting, MPEG Systems promoted Part 8 of Dynamic Adaptive Streaming over HTTP (DASH), also referred to as “Session-based DASH”, to its final stage of standardization (i.e., Final Draft International Standard (FDIS)).
Historically, in DASH, every client uses the same Media Presentation Description (MPD), as it best serves the scalability of the service. However, there have been increasing requests from the industry to enable customized manifests for enabling personalized services. MPEG Systems has standardized a solution to this problem without sacrificing scalability. Session-based DASH adds a mechanism to the MPD to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests.
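To make the idea of deriving per-session request URLs concrete, the sketch below expands a SegmentTemplate-style URL using the DASH template identifiers (`$RepresentationID$`, `$Number$`, `$Number%05d$`, and `$$` for a literal dollar sign) and then appends per-session key/value pairs as query parameters. How values are actually carried in an SBD and applied to URLs is defined in ISO/IEC 23009-8 and is not reproduced here; the query-parameter approach, the `session` variable, the host name, and the function names are all hypothetical.

```python
import re
from typing import Dict, Mapping, Union
from urllib.parse import urlencode

# DASH template identifiers: $Name$ or $Name%0<width>d$; "$$" is an escaped "$".
_IDENTIFIER = re.compile(r"\$(\w*)(%0\d+d)?\$")


def expand_template(template: str, values: Mapping[str, Union[str, int]]) -> str:
    """Substitute SegmentTemplate-style identifiers with values from the MPD context."""
    def substitute(match) -> str:
        name, width_format = match.group(1), match.group(2)
        if not name:
            return "$"  # "$$" escape
        value = values[name]  # KeyError if the identifier is unknown
        return width_format % value if width_format else str(value)
    return _IDENTIFIER.sub(substitute, template)


def session_segment_url(base_url: str, template: str,
                        mpd_values: Mapping[str, Union[str, int]],
                        session_values: Dict[str, str]) -> str:
    """Build a segment URL, then attach per-session values as query parameters (assumed)."""
    url = base_url.rstrip("/") + "/" + expand_template(template, mpd_values)
    if session_values:
        url += "?" + urlencode(session_values)
    return url


if __name__ == "__main__":
    print(session_segment_url(
        "https://cdn.example.com/video/",
        "$RepresentationID$/seg_$Number%05d$.m4s",
        {"RepresentationID": "720p", "Number": 42},
        {"session": "abc123"},  # hypothetical per-session variable
    ))
```

The point of the split is that the MPD (and thus the template) stays identical for every client, preserving cacheability, while only the small per-session part varies.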
An updated overview of DASH standards/features can be found in the Figure below.
MPEG DASH Status as of January 2021.
Research aspects: CMAF is most likely becoming the main segment format used in the context of HTTP adaptive streaming (HAS) and, thus, also DASH (hence the name common media application format). Supporting a plethora of media coding formats will inevitably result in a multi-codec dilemma to be addressed in the near future, as there will be no flag day on which everyone switches to a new coding format. Thus, designing efficient bitrate ladders for multi-codec delivery will be an interesting research aspect, which needs to consider device/player support (i.e., some devices/players will support only a subset of the available codecs), storage capacity/costs within the cloud as well as within the delivery network, and network distribution capacity/costs (i.e., CDN costs).
The 134th MPEG meeting will again be held online in April 2021. Click here for more information about MPEG meetings and their developments.
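As a toy illustration of the multi-codec trade-off described above, the sketch picks the most efficient ladder a device supports and estimates the storage needed to keep several ladders side by side. The ladders, the codec preference order, and the helper names are made-up assumptions, not recommended encoding settings.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Rung:
    height: int  # vertical resolution in pixels
    kbps: int    # target bitrate


# Hypothetical per-codec ladders; real ladders would come from per-title analysis.
LADDERS: Dict[str, List[Rung]] = {
    "avc": [Rung(360, 800), Rung(720, 3000), Rung(1080, 6000)],
    "hevc": [Rung(360, 500), Rung(720, 1800), Rung(1080, 3600)],
    "vvc": [Rung(360, 350), Rung(720, 1200), Rung(1080, 2400)],
}
# Assumed preference order, from most to least compression-efficient.
PREFERENCE = ["vvc", "hevc", "avc"]


def pick_codec(supported: Iterable[str]) -> str:
    """Return the most preferred codec the device/player can decode."""
    supported = set(supported)
    for codec in PREFERENCE:
        if codec in supported:
            return codec
    raise ValueError("no codec in common with the device")


def storage_gb(codecs: Iterable[str], duration_s: float) -> float:
    """Storage needed to keep the full ladder of each listed codec (decimal GB)."""
    total_kbps = sum(r.kbps for c in codecs for r in LADDERS[c])
    return total_kbps * 1000 * duration_s / 8 / 1e9
```

With these made-up numbers, a one-hour title needs about 4.41 GB for the AVC ladder alone but about 8.84 GB for all three ladders, and the AVC ladder cannot be dropped as long as AVC-only devices must be served.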
The 90th JPEG meeting was held online from 18 to 22 January 2021. This meeting was distinguished by very relevant activities, notably the planning of the new JPEG AI standardization project and the analysis of the Call for Evidence on JPEG Pleno Point Cloud Coding.
The new JPEG AI Learning-based Image Coding System has become an official new work item registered under ISO/IEC 6048 and aims at providing compression efficiency in addition to supporting image processing and computer vision tasks without the need for decompression.
The response to the Call for Evidence on JPEG Pleno Point Cloud Coding was a learning-based method that was found to offer state-of-the-art compression efficiency. Considering this response, the JPEG Pleno Point Cloud activity will analyse the possibility of preparing a future call for proposals on learning-based coding solutions that will also consider new functionalities, building on the relevant use cases already identified that require machine learning tasks processed in the compressed domain.
Meanwhile, the new JPEG XL coding system has reached FDIS stage and is ready for adoption. JPEG XL offers compression efficiency similar to the best state of the art in image coding, the best lossless compression performance, affordable low complexity, and integration with the legacy JPEG image coding standard, allowing a friendly transition between the two standards.
The new JPEG AI logo.
The 90th JPEG meeting had the following highlights:
JPEG AI,
JPEG Pleno Point Cloud response to the Call for Evidence,
JPEG XL Core Coding System reaches FDIS stage,
JPEG Fake Media exploration,
JPEG DNA continues the exploration on image coding suitable for DNA storage,
JPEG systems,
JPEG XS 2nd edition of Profiles reaches DIS stage.
JPEG AI
The scope of JPEG AI is the creation of a learning-based image coding standard offering a single-stream, compact compressed domain representation, targeting both human visualization, with significant compression efficiency improvement over image coding standards in common use at equivalent subjective quality, and effective performance for image processing and computer vision tasks, with the goal of supporting a royalty-free baseline.
JPEG AI made several advances during the 90th technical meeting. During this meeting, the JPEG AI Use Cases and Requirements were discussed and collaboratively defined. Moreover, the JPEG AI vision and the overall system framework of an image compression solution with an efficient compressed domain representation were defined. Following this approach, a set of exploration experiments was defined to assess the capabilities of the compressed representation generated by learning-based image codecs, considering some specific computer vision and image processing tasks.
Moreover, the performance of the most popular objective quality metrics, assessed using subjective scores obtained during the Call for Evidence, was discussed, as were anchors and some techniques to perform spatial prediction and entropy coding.
JPEG Pleno Point Cloud response to the Call for Evidence
JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications, including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research, and advanced sensing and analysis. During the 90th JPEG meeting, the JPEG Committee reached an exciting major milestone and reviewed the results of its Final Call for Evidence on JPEG Pleno Point Cloud Coding. An innovative deep-learning-based point cloud codec supporting scalability and random access was submitted, and the Call for Evidence results highlighted the emerging role of deep learning in point cloud representation and processing. Between the 90th and 91st meetings, the JPEG Committee will be refining the scope and direction of this activity in light of the results of the Call for Evidence.
JPEG XL Core Coding System reaches FDIS stage
The JPEG Committee has finalized JPEG XL Part 1 (Core Coding System), which is now at FDIS stage. The committee has defined new core experiments to determine appropriate profiles and levels for the codec, as well as appropriate criteria for defining conformance. With Part 1 complete, and Part 2 close to completion, JPEG XL is ready for evaluation and adoption by the market.
JPEG Fake Media exploration
The JPEG Committee initiated the JPEG Fake Media exploration study with the objective of creating a standard that can facilitate the secure and reliable annotation of media asset generation and modifications. The initiative aims to support usage scenarios that are in good faith as well as those with malicious intent. During the 90th JPEG meeting, the committee released a new version of the document entitled “JPEG Fake Media: Context Use Cases and Requirements”, which is available on the JPEG website. A first workshop on the topic was organized on 15 December 2020. The program, presentations, and a video recording of this workshop are available on the JPEG website. A second workshop will be organized around March 2021. More details will be made available soon on JPEG.org.
JPEG invites interested parties to regularly visit https://jpeg.org/jpegfakemedia for the latest information and to subscribe to the mailing list via http://listregistration.jpeg.org.
JPEG DNA continues the exploration on image coding suitable for DNA storage
The JPEG Committee continued its exploration of coding images in a quaternary representation, which is particularly suitable for DNA storage. After a second successful workshop with presentations by stakeholders, additional requirements were identified, and a new version of the JPEG DNA overview document was issued and made publicly available. It was decided to continue this exploration by organising a third workshop and further outreach to stakeholders, as well as by proposing an updated version of the JPEG DNA overview document. Interested parties are invited to refer to the following URL and to consider joining the effort by registering to the mailing list of JPEG DNA here: https://jpeg.org/jpegdna/index.html.
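For readers unfamiliar with quaternary representations, the sketch shows the most naive mapping of binary data to nucleotides, two bits per base. It is an illustration under assumptions only: the bit-to-base assignment is arbitrary, and practical DNA coding schemes (including whatever JPEG DNA eventually specifies) add constraints that this mapping ignores, such as avoiding long homopolymer runs and balancing GC content.

```python
# Hypothetical 2-bit-per-base assignment: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"
_BASE_VALUE = {base: value for value, base in enumerate(BASES)}


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna; the strand length must be a multiple of 4."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | _BASE_VALUE[base]
        out.append(value)
    return bytes(out)
```

For example, `bytes_to_dna(b"\x00")` yields `"AAAA"`, exactly the kind of homopolymer run that a practical DNA codec would have to avoid.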
JPEG Systems
JUMBF (ISO/IEC 19566-5) Amendment 1 draft review is complete, and it is proceeding to International Standard and subsequent publication; additional features to support new applications are under consideration. Likewise, JPEG 360 (ISO/IEC 19566-6) Amendment 1 draft review is complete, and it is proceeding to International Standard and subsequent publication. The JLINK (ISO/IEC 19566-7) standard completed the committee draft review and is preparing a DIS study text ahead of the 91st meeting. JPEG Snack (ISO/IEC 19566-8) will proceed to a second working draft. Interested parties can subscribe to the mailing list of the JPEG Systems AHG in order to contribute to the above activities.
JPEG XS 2nd edition of Profiles reaches DIS stage
The 2nd edition of Part 2 (Profiles) is now at the DIS stage and defines the required new profiles and levels to support the compression of raw Bayer content, mathematically lossless coding of up to 12-bit per component images, and 4:2:0 sampled image content. With the second editions of Parts 1, 2, and 3 completed, and the scheduled second editions of Part 4 (Conformance) and 5 (Reference Software), JPEG XS will soon have received a complete backwards-compatible revision of its entire suite of standards. Moreover, the committee defined a new exploration study to create new coding tools for improving the HDR and mathematically lossless compression capabilities, while still honoring the low-complexity and low-latency requirements.
Final Quote
“The official approval of JPEG AI by JPEG Parent Bodies ISO and IEC is a strong signal of support of this activity and its importance in the creation of AI-based imaging applications” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Future JPEG meetings are planned as follows:
No. 91 will be held online from April 19 to 23, 2021.
No. 92 will be held online from July 7 to 13, 2021.
The gaming industry has eminently managed to intrinsically motivate users to interact with its services. According to the latest report by Newzoo, there will be an estimated total of 2.7 billion players across the globe by the end of 2020. The global games market will generate revenues of $159.3 billion in 2020 [1]. This is about four times the value of the movie industry (box office and streaming services) and almost three times that of the music industry [2].
The rapidly growing domain of online gaming emerged in the late 1990s and early 2000s, allowing social relatedness among a great number of players. In traditional online gaming, the game logic and the game user interface are typically executed and rendered locally on the player’s hardware. The client device is connected via the internet to a game server to exchange information influencing the game state, which is then shared and synchronized with all other players connected to the server. However, in 2009 a new concept called cloud gaming emerged, comparable to the rise of Netflix for video consumption and Spotify for music consumption. In contrast to traditional online gaming, cloud gaming is characterized by the execution of the game logic, rendering of the virtual scene, and video encoding on a cloud server, while the player’s client is solely responsible for video decoding and capturing client input [3].
For online gaming and cloud gaming services, in contrast to applications such as voice, video, and web browsing, little information existed on factors influencing the Quality of Experience (QoE) of online video games, on subjective methods for assessing gaming QoE, or on instrumental prediction models to plan and manage QoE during service set-up and operation. For this reason, Study Group (SG) 12 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) decided to work on these three interlinked research tasks [4]. This was especially required since the evaluation of gaming applications is fundamentally different from that of task-oriented human-machine interactions. Traditional aspects such as effectiveness and efficiency as part of usability cannot be directly applied to gaming applications: for example, a game that poses no challenges and merely lets time pass would result in boredom and thus a bad player experience (PX). The absence of standardized assessment methods, as well as of knowledge about the quantitative and qualitative impact of influence factors, resulted in a situation where many researchers tended to use their own self-developed research methods. This makes collaborative work through reliable, valid, and comparable research very difficult. Therefore, the aim of this report is to provide an overview of the achievements reached by ITU-T standardization activities targeting gaming QoE.
Theory of Gaming QoE
As a basis for the gaming research carried out, Möller et al. proposed a taxonomy of gaming QoE aspects in 2013 [5]. The taxonomy is divided into two layers, of which the top layer contains various influencing factors grouped into user (also human), system (also content), and context factors. The bottom layer consists of game-related aspects, including hedonic concepts such as appeal, pragmatic concepts such as learnability and intuitivity (part of playing quality, which can be considered a kind of game usability), and finally, the interaction quality. The latter is composed of output quality (e.g., audio and video quality), as well as input quality and interactive behaviour. Interaction quality can be understood as the playability of a game, i.e., the degree to which all functional and structural elements of a game (hardware and software) enable a positive PX. The second part of the bottom layer summarizes concepts related to the PX, such as immersion (see [6]), positive and negative affect, as well as the well-known concept of flow, which describes an equilibrium between requirements (i.e., challenges) and abilities (i.e., competence). Consequently, based on the theory depicted in the taxonomy, the question arises which of these aspects are relevant (i.e., dominant), how they can be assessed, and to what extent they are impacted by the influencing factors.
Fig. 1: Taxonomy of gaming QoE aspects. Upper panel: Influence factors and interaction performance aspects; lower panel: quality features (cf. [5]).
Introduction to Standardization Activities
Building upon this theory, ITU-T SG 12 decided during the 2013-2016 Study Period to start work on three new work items called P.GAME, G.QoE-gaming, and G.OMG. In addition, there are other related activities at the ITU-T, summarized in Fig. 2, on evaluation methods (P.CrowdG) and on gaming QoE modelling (G.OMMOG and P.BBQCG).
Fig. 2: Overview of ITU-T SG12 recommendations and on-going work items related to gaming services.
The efforts on the three initial work items continued during the 2017-2020 Study Period resulting in the recommendations G.1032, P.809, and G.1072, for which an overview will be given in this section.
ITU-T Rec. G.1032 (G.QoE-gaming)
ITU-T Rec. G.1032 aims at identifying the factors that potentially influence gaming QoE. For this purpose, the Recommendation provides an overview table and then roughly classifies the influence factors into (A) human, (B) system, and (C) context influence factors. This classification is based on [7] but is now detailed with respect to cloud and online gaming services. Furthermore, the Recommendation considers whether an influencing factor carries an influence mainly in a passive viewing-and-listening scenario, in an interactive online gaming scenario, or in an interactive cloud gaming scenario. This classification helps evaluators decide which type of impact may be evaluated with which type of test paradigm [4]. An overview of the influencing factors identified in ITU-T Rec. G.1032 is presented in Fig. 3. For subjective user studies, in most cases the human and context factors should be controlled and their influence reduced as much as possible. For example, even though social interaction might be a highly impactful aspect of today’s gaming domain, within the scope of the ITU-T cloud gaming modelling activities only single-player user studies are conducted, to reduce the impact of social aspects, which are very difficult to control. On the other hand, as network operators and service providers are the intended stakeholders of gaming QoE models, the relevant system factors must be included in the development process of the models, in particular the game content as well as network and encoding parameters.
Fig. 3: Overview of influencing factors on gaming QoE summarized in ITU-T Rec. G.1032 (cf. [3]).
ITU-T Rec. P.809 (P.GAME)
The aim of ITU-T Rec. P.809 is to describe subjective evaluation methods for gaming QoE. Since there is no single standardized evaluation method available that would cover all aspects of gaming QoE, the Recommendation mainly summarizes the state of the art of subjective evaluation methods to help choose suitable methods for conducting subjective experiments, depending on the purpose of the experiment. In its main body, the Recommendation consists of five parts: (A) definitions of the games considered in the Recommendation, (B) definitions of QoE aspects relevant in gaming, (C) a description of test paradigms, (D) a description of the general experimental set-up, including recommendations regarding passive viewing-and-listening tests and interactive tests, and (E) a description of questionnaires to be used for gaming QoE evaluation. It is complemented by two paragraphs regarding performance and physiological response measurements and by (non-normative) appendices illustrating the questionnaires, as well as an extensive list of literature references [4].
Fundamentally, ITU-T Rec. P.809 defines two test paradigms to assess gaming quality:
Passive tests with predefined audio-visual stimuli passively observed by a participant.
Interactive tests with game scenarios interactively played by a participant.
The passive paradigm can be used for gaming quality assessment when the impairment does not influence the interaction of players. This method suggests a short stimulus duration of 30s which allows investigating a great number of encoding conditions while reducing the influence of user behaviours on the stimulus due to the absence of their interaction. Even for passive tests, as the subjective ratings will be merged with those derived from interactive tests for QoE model developments, it is recommended to give instruction about the game rules and objectives to allow participants to have similar knowledge of the game. The instruction should also explain the difference between video quality and graphic quality (e.g., graphical details such as abstract and realistic graphics), as this is one of the common mistakes of participants in video quality assessment of gaming content.
The interactive test should be used when other quality features such as interaction quality, playing quality, immersion, and flow are under investigation. While for the interaction quality, a duration of 90s is proposed, a longer duration of 5-10min is suggested in the case of research targeting engagement concepts such as flow. Finally, the recommendation provides information about the selection of game scenarios as stimulus material for both test paradigms, e.g., ability to provide repetitive scenarios, balanced difficulty, representative scenes in terms of encoding complexity, and avoiding ethically questionable content.
ITU-T Rec. G.1072 (G.OMG)
The quality management of gaming services requires quantitative prediction models. Such models should be able to predict either “overall quality” (e.g., in terms of a Mean Opinion Score) or individual QoE aspects from characteristics of the system, potentially considering the player characteristics and the usage context. ITU-T Rec. G.1072 aims at the development of quality models for cloud gaming services based on the impact of impairments introduced by typical Internet Protocol (IP) networks on the quality experienced by players. G.1072 is a network planning tool that estimates gaming QoE based on assumptions about network and encoding parameters as well as game content.
The impairment factors are derived from subjective ratings of the corresponding quality aspects, e.g., spatial video quality or interaction quality, and modelled by non-linear curve fitting. For the prediction of the overall score, linear regression is used. To create the impairment factors and regression, a data transformation from the MOS values of each test condition to the R-scale was performed, similar to the well-known E-model [8]. The R-scale, which results from an s-shaped conversion of the MOS scale, promises benefits regarding the additivity of the impairments and compensation for the fact that participants tend to avoid using the extremes of rating scales [3].
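The MOS-to-R transformation can be sketched with the E-model's R-to-MOS mapping from ITU-T G.107 [8], MOS = 1 + 0.035R + R(R - 60)(100 - R)·7·10^-6 for 0 ≤ R ≤ 100 (clamped to 1 and 4.5 outside that range), inverted numerically by bisection. This assumes G.1072 follows the G.107 curve; its exact transformation and regression coefficients are defined in the Recommendation and are not reproduced here. The `overall_mos` helper and its weights are hypothetical and only illustrate why an additive R-scale is convenient: weighted impairments are subtracted from a maximum R before converting back to MOS.

```python
from typing import Dict


def r_to_mos(r: float) -> float:
    """E-model (ITU-T G.107) mapping from the R-scale to MOS (1.0 .. 4.5)."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


def mos_to_r(mos: float, tol: float = 1e-9) -> float:
    """Invert r_to_mos by bisection.

    The curve dips slightly below 1 for very small R, but for any target MOS
    above 1 the crossing point is unique, so bisection on [0, 100] is safe.
    """
    if mos <= 1.0:
        return 0.0
    if mos >= 4.5:
        return 100.0
    lo, hi = 0.0, 100.0  # invariant: r_to_mos(lo) < mos <= r_to_mos(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def overall_mos(impairments: Dict[str, float], weights: Dict[str, float],
                r_max: float = 100.0) -> float:
    """Hypothetical additive combination: R = R_max - sum(w_i * I_i), mapped to MOS."""
    r = r_max - sum(weights[name] * value for name, value in impairments.items())
    return r_to_mos(r)
```

In this scheme, per-aspect MOS ratings would first be converted with `mos_to_r`, impairments would be fitted on the R-scale, and only the final prediction would be mapped back with `r_to_mos`.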
As the impact of the input parameters, e.g., delay, was shown to be highly content-dependent, the model offers two modes of operation. If no assumption about a game’s sensitivity class towards degradations is available to the user of the model (e.g., a network provider), the “default” mode should be used, which considers the highest (i.e., most sensitive) game class. The “default” mode will therefore result in a pessimistic quality prediction for games that are not of high complexity and sensitivity. If the user of the model can make an assumption about the game class (e.g., a service provider), the “extended” mode can predict the quality with a higher degree of accuracy based on the assigned game classes.
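The difference between the two modes can be sketched as a lookup in which a missing class falls back to the most sensitive one. The three class names, the per-class sensitivity values, and the linear impairment shape are hypothetical placeholders; G.1072 defines its own game classes and non-linear impairment functions.

```python
from typing import Optional

# Hypothetical delay sensitivity per game class (R-scale impairment per ms of delay).
DELAY_SENSITIVITY = {"low": 0.02, "medium": 0.05, "high": 0.10}


def delay_impairment(delay_ms: float, game_class: Optional[str] = None) -> float:
    """Return a (hypothetical, linear) delay impairment on the R-scale, capped at 100.

    game_class=None emulates the "default" mode: the most sensitive class is
    assumed, giving a pessimistic estimate. Passing a class emulates the
    "extended" mode.
    """
    if game_class is None:
        game_class = max(DELAY_SENSITIVITY, key=DELAY_SENSITIVITY.get)
    return min(100.0, DELAY_SENSITIVITY[game_class] * delay_ms)
```

For a game that is actually in the "low" class, `delay_impairment(100)` overstates the impairment compared with `delay_impairment(100, "low")`, which is exactly the pessimism of the default mode.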
On-going Activities
While the three Recommendations provide a basis for researchers, as well as for network operators and cloud gaming service providers, towards improving gaming QoE, the standardization activities continue with new work items focusing on QoE assessment methods and on gaming QoE model development for cloud gaming and online gaming applications. Thus, three work items have been established within the past two years.
ITU-T P.BBQCG
P.BBQCG is a work item that aims at the development of a bitstream model predicting cloud gaming QoE. The model will thus benefit from bitstream information, i.e., from the header and payload of packets, to reach a higher accuracy of audiovisual quality prediction compared to G.1072. In addition, three different types of codecs and a wider range of network parameters will be considered to develop a generalizable model. The model will be trained and validated for H.264, H.265, and AV1 video codecs and video resolutions up to 4K. For the development of the model, both the passive and the interactive paradigms will be followed. The passive paradigm will be used to cover a wide range of encoding parameters, while the interactive paradigm will cover the network parameters that might strongly influence the interaction of players with the game.
ITU-T P.CrowdG
A gaming QoE study is per se a challenging task due to the multidimensionality of the QoE concept and the large number of influence factors. However, it becomes even more challenging if the test follows a crowdsourcing approach, which is of particular interest in times of the COVID-19 pandemic or if subjective ratings are required from a highly diverse audience, e.g., for the development or investigation of questionnaires. The aim of the P.CrowdG work item is to develop a framework that describes the best practices and guidelines that have to be considered for gaming QoE assessment using a crowdsourcing approach. In particular, the crowd gaming framework provides the means to ensure reliable and valid results despite the absence of an experimenter, a controlled network, and visual observation of test participants. In addition to the crowd gaming framework, guidelines will be given that provide recommendations to ensure the collection of valid and reliable results, addressing issues such as how to make sure workers put enough focus on the gaming and rating tasks. While a possible framework for interactive tests of simple web-based games is already presented in [9], more work is required to complete the ITU-T work item for more advanced setups and passive tests.
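The kind of reliability checks such guidelines address can be sketched as a post-hoc filter over crowdsourced sessions. The criteria below (a failed attention-check question, too little actual play time, identical ratings on every item) are common crowdsourcing practices chosen here as assumptions, not the P.CrowdG rules; the 90 s threshold merely mirrors the interactive test duration mentioned for P.809, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrowdSession:
    worker_id: str
    play_seconds: float            # measured active play time
    passed_attention_check: bool   # answered a trapping question correctly
    ratings: List[int] = field(default_factory=list)


def screen_sessions(sessions: List[CrowdSession],
                    min_play_seconds: float = 90.0) -> List[CrowdSession]:
    """Keep only sessions that pass all (hypothetical) reliability checks."""
    kept = []
    for s in sessions:
        if not s.passed_attention_check:
            continue  # failed an attention ("trapping") question
        if s.play_seconds < min_play_seconds:
            continue  # did not actually play long enough
        if len(s.ratings) > 3 and len(set(s.ratings)) == 1:
            continue  # straight-lining: the same score on every item
        kept.append(s)
    return kept
```

Screening after the fact is only part of the picture; the guidelines described above also aim to prevent such sessions through the test design itself.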
ITU-T G.OMMOG
G.OMMOG is a work item that focuses on the development of an opinion model predicting gaming Quality of Experience (QoE) for mobile online gaming services. The work item is a possible extension of ITU-T Rec. G.1072. In contrast to G.1072, the game is not executed and rendered on a cloud server; instead, a game server exchanges game states, rather than a video stream, with the users’ clients. This more traditional gaming concept represents a very popular service, especially considering multiplayer games such as recently published AAA titles of the Multiplayer Online Battle Arena (MOBA) and battle royale genres.
So far, it has been decided to follow a model structure similar to that of ITU-T Rec. G.1072. However, the spatial video quality component, which was a major part of G.1072, will be removed, and the corresponding game type information will not be used. In addition, for the development of the model, it was decided to investigate the impact of variable delay and packet loss bursts, especially as their interaction can have a high impact on gaming QoE. It is assumed that greater variability of these factors and their interplay will weaken the error handling of mobile online gaming services. Due to missing information on the server caused by packet loss or strong delays, the gameplay is assumed to no longer be smooth (in the gaming domain, this is called ‘rubber banding’), which will lead to reduced temporal video quality.
About ITU-T SG12
ITU-T Study Group 12 is the expert group responsible for the development of international standards (ITU-T Recommendations) on performance, quality of service (QoS), and quality of experience (QoE). This work spans the full spectrum of terminals, networks, and services, ranging from speech over fixed circuit-switched networks to multimedia applications over mobile and packet-based networks.
In this article, the previous achievements of ITU-T SG12 with respect to gaming QoE have been described. The focus was, in particular, on subjective assessment methods, influencing factors, and modelling of gaming QoE. We hope that this information will help improve work and research in this domain by enabling more reliable, comparable, and valid findings. Lastly, the report also points out many ongoing activities in this rapidly changing domain, in which everyone is warmly invited to participate.
More information about the SG12, which will host its next E-meeting from 4-13 May 2021, can be found at ITU Study Group (SG) 12.
For more information about the gaming activities described in this report, please contact Sebastian Möller (sebastian.moeller@tu-berlin.de).
Acknowledgement
The authors would like to thank all colleagues of ITU-T Study Group 12, as well as of the Qualinet gaming Task Force, for their support. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No 871793 and No 643072, as well as from the German Research Foundation (DFG) within project MO 1038/21-1.
[3] S. Schmidt, “Assessing the Quality of Experience of Cloud Gaming Services”, Ph.D. dissertation, Technische Universität Berlin, 2021.
[4] S. Möller, S. Schmidt, and S. Zadtootaghaj, “New ITU-T Standards for Gaming QoE Evaluation and Management”, in 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2018.
[5] S. Möller, S. Schmidt, and J. Beyer, “Gaming Taxonomy: An Overview of Concepts and Evaluation Methods for Computer Gaming QoE”, in 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2013.
[8] ITU-T Recommendation G.107, “The E-model: A Computational Model for Use in Transmission Planning”, Geneva: International Telecommunication Union, 2015.
[9] S. Schmidt, B. Naderi, S. S. Sabet, S. Zadtootaghaj, and S. Möller, “Assessing Interactive Gaming Quality of Experience Using a Crowdsourcing Approach”, in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2020.
Alex, could you tell us a bit about your background, and what the road to your current position was?
Alex Thayer, PhD. Head of Research, Amazon (Search); Affiliate Assistant Professor, University of Washington
Sure! I began my career in the tech industry in 1998, when I interned at the IBM Silicon Valley Lab in San Jose, California. Back then it was called the Santa Teresa Lab, and I completed a year-long internship because I wanted to get a richer professional experience than a single school quarter would provide. I also wanted to find an internship at a company that future employers would recognize when they saw my resume.
At the time, I thought about my career as a narrative that would span decades: What story would I want to tell about my employment history 20 or 30 years later? In a sense, each job would become a “chapter” in that story. As I have learned over the years, this metaphor holds up and each chapter has a slightly different theme: from drama to comedy to Greek tragedy. After about 13 different tech industry jobs, I think I’ve got a lot of genres covered.
After the year at IBM, I returned to Seattle and spent another year completing my degrees in Technical Communication (College of Engineering) and Art History (College of Art). After graduation, I focused on building my career as a technical writer. I worked at a voice recognition startup, then at a consulting firm, and I wound up doing a lot of “UX work” that was not quite codified into specific roles yet. For example, in a typical week I might work on the design of a UI component, rewrite the JavaScript for a website, change the physical layout of a printed user manual, and write copy for a tutorial. I went back to the University of Washington in 2002 to get a Master of Science degree in the Technical Communication program, and to try teaching courses at the college level.
Eventually I began working full-time at Microsoft in 2006. It was during my time there that I realized technical writing was not my passion. I decided to “adjust my career narrative” and shift toward UX design and research. I was able to make that happen partly because I worked on a cross-disciplinary team at Microsoft: We had interaction design, industrial design, user research, and content publishing included in the same team. I worked on software and hardware projects in a variety of capacities. For one project, I helped design the physical product packaging; on another project, I collaborated with my teammates on the vision for an adaptive keyboard.
Eventually I hit the limits of what I could do professionally without returning to school and advancing my knowledge about people and their practices. I returned to the University of Washington and spent 4 years working on my PhD in Human Centered Design & Engineering. I moved with my family to the Bay Area in California near the conclusion of my PhD work, and I looked for a role with a focus on emerging technology and interfaces. I found that role at Intel, where I stayed for a year and a half before shifting to a very different research role at VMware. When an opportunity to work at HP Labs arose, I decided to make another career move after a year and a half. It was never my intention to work for different companies so quickly, but I thought about the career narrative perspective and the story I wanted to tell. That perspective helped me make my decision to change roles and work at HP.
What is the professional role of interdisciplinarity in your experience?
Because I have an interdisciplinary skill set, I have discovered that it can be tricky to find a job! As a “T-shaped” person, it’s not always easy to know how to bring my full set of skills to a specific role or organization. In my experience, companies are looking for experts who can go deep in a particular area, but who can also span a variety of topics and skills as needed. In practice, this means collaborating with colleagues who have an assortment of technical backgrounds and methodologies. In a typical week at my current role, I engage with product managers, designers, design technologists, business leaders, engineers, economists, and scientists. All of these roles have different requirements and dialects, which means I am constantly surrounded by “interdisciplinarity,” if that makes sense!
Also, because of my academic research focus on how people collaborate, it’s hard for me to imagine a world without “interdisciplinarity.” That’s how I think about the “role” of interdisciplinarity: It’s more of a fabric or texture that underpins the teams on which I work. And as a leader, I need to consider how different members of a team or organization come together and bring their unique skills and backgrounds to bear on the tasks at hand.
As a tangible example, we had a terrific undergraduate intern at HP who was working on Computer Science and Humanities degrees at Stanford. His approach to his education resonated with me since I had taken a similar Engineering/Arts path in my own undergrad education. It was fun to watch him apply his thought processes and knowledge on a team of senior engineers, designers, and researchers. I believe he was successful in his intern role because he could reframe problems or goals in creative ways.
In 2012, you successfully defended your dissertation on “Understanding University Students’ Use of Tools and Artifacts in Support of Collaborative Project Work”. Almost a decade later: what are your thoughts on today’s use of (multimedia) tools and devices at a university level?
This is a great segue from the question about interdisciplinarity and collaboration!
As a social scientist, I am excited to see how new tools and processes “come with” students as they graduate and enter the workforce. The space of design prototyping is evolving rapidly, for example, as recent grads expect to use the same tools on the job that they learned how to use while in school. My role at HP included people management, and I had a number of conversations about how to get access to the specific software and hardware tools that employees needed to achieve their vision. Some of these discussions were easy: one of my colleagues asked if he could buy an iron and an ironing board, for example. I said yes. Other discussions required more planning, like when our team wanted to purchase a laser cutter. So perhaps I am taking this question in an unexpected direction, but I do see an opportunity to bridge a gap between the tools and devices in use at the university level and the availability of those same tools and devices in industry.
To be honest, I have a lot to learn about how students are doing their work today. It’s been several years since I finished my PhD. I spent an entire academic quarter observing a class of advanced design students. When I think about how they were doing their project work nearly a decade ago, and when I think about how I saw students working at Yale a couple of years ago, it’s easy for me to see the advances in technology. Or when we took a trip to Wellesley a few years ago, I watched my young daughter play with the VR headsets and try her hand at archaeology. And yet we still love whiteboards and paper! Once university students are able to safely return to in-person learning, I’m sure we will keep using whiteboards and paper as two of our main tools for learning and collaboration.
Looking at your impressive set of published patents: your inventions draw from and actually span many different disciplines.
Thanks! All of those patents represent the work of teams: I have been lucky to have worked with amazing people who, quite frankly, did the hard work to make those patents happen. So, returning to that topic of interdisciplinarity, I can only point to these published patents because of the amazing work of my colleagues.
One anecdote stands out for me now, as I think back about my experience at HP Labs in particular. I was meeting with one of my teammates, an amazing colleague named Ian Robinson, and we were having our weekly one-on-one meeting. We were talking about tracking digital pen devices in Virtual Reality (VR) spaces. At one point we began riffing on the idea of a “low-cost” VR controller, and then we had a realization: rather than putting a lot of expensive technology inside a single pen, what if you designed a pair of objects that relied on a different VR tracking method? We could conceivably eliminate the need for some of the guts of the single object if we had two objects moving in virtual space. We stopped our meeting and walked over to our desks, hoping to catch some of our teammates. We described the essential concept to a few of our peers and that was the genesis of the “VR Grabbers” idea. Jackie Yang was a Stanford grad student who was working as an intern in our lab at the time, and he did an incredible amount of work on the project from that point on. His effort culminated in our UIST 2018 paper on which Jackie was the first author!
How do you work across disciplines?
Continuing that “VR Grabbers” story, I was lucky enough to have a stimulating conversation with a really smart person in a place that enabled us to pursue the idea. Ian and I came from different professional backgrounds. We happened to find ourselves working together and, on that project, we made the most of our different skills. My role after that initial conversation was to evangelize the project inside the organization rather than develop the prototype, for example. So, while it was great to help a team come together around an idea, my involvement on the project was quite different from what it would have been if I were earlier in my career.
I said a bit about collaboration earlier, but I’d like to go a bit deeper on this topic. In my dissertation I spent a lot of time in the literature review section exploring the different types of collaboration. I am a big believer in “contested collaboration,” which occurs when a team of people come from different backgrounds and bring their specific perspectives and experiences to bear on a project. It is certainly more challenging to lead a team that engages in contested collaboration: It would be a lot easier if everyone agreed all the time! I’m not saying anything new here, of course.
Could you name a grand research challenge in your current field of work?
I recently saw the 2021 AI Index Report from Stanford (https://aiindex.stanford.edu/report/) and I thought each topic raised in the summary of that report could represent a “grand research challenge.” On the topic of “generative everything”, I am particularly curious about the future of ideas. In 2019 I delivered one of the keynote presentations at the IEEE Games, Entertainment, and Media (IEEE GEM) conference at Yale University in New Haven, Connecticut. In part of my presentation, I raised the question about attribution of ideas and intellectual property when we “partner” with AI. I can imagine a future where it seems less clear “who” came up with an idea: the person or the AI agent? Thinking about the “VR Grabbers” story I told earlier, I wonder how that same story will play out 20 years from now. In my capacity as an affiliate assistant professor at the University of Washington, I’m excited to continue thinking about this topic!
How and in what form do you feel we as academics can be most impactful?
I think academics need to keep doing what they’re doing. Perhaps that’s a trite answer, but as a society we need to preserve and protect the ability of academics to do their work, to ask very basic questions and be surprised by what they find. I’m not just talking about the need for basic R&D so we can find the next penicillin. I’m also talking about how companies incentivize the effort to identify and use academic work.
I also think others know a lot more about this topic, though! I’d suggest reviewing the 2017 DIS paper, Translational Resources: Reducing the Gap Between Academic Research and HCI Practice, as a useful starting point. Lucas Colusso recently completed his PhD in Human Centered Design & Engineering at the University of Washington, and he was the first author on that paper. Thanks to Professor Gary Hsieh in that department, I became aware of Lucas’ work and now I reference it with my team members when we talk about how to pursue research topics that will have lasting impact. I believe academics are the experts at generating knowledge, and in industry we can apply similar approaches on our projects.
Bios
Alex Thayer, PhD is the Head of Research for Amazon (Search) in Palo Alto. He completed his PhD in Human Centered Design & Engineering at the University of Washington, where he is currently an Affiliate Assistant Professor. Prior to joining Amazon, Alex was the Chief Experience Architect for HP Labs. He has also worked at VMware, Intel, Microsoft, YouTube, and a voice recognition startup that was partly funded by James Doohan (Scotty from Star Trek). Alex’s professional work focuses on explorations of the social-technical gap and how we make sense of people’s habits, practices, and messy lives. His academic work spans topics from AR/VR to professional collaboration to digital gaming. He has published 12 patents on medical testing, haptic feedback systems, 3D and 4D printing, immersive displays, and wearable technology. He also co-leads his daughter’s Girl Scout troop.
Editor Biographies
Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.
Dr. Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com
The 2020 edition of SISAP was planned to be held at IT University of Copenhagen, Denmark, but was converted into an online event due to the on-going pandemic.
A strong technical program was assembled by three program committee co-chairs, 63 program committee members, and 18 additional reviewers. Each of the 50 valid submissions, with authors from 22 countries, was reviewed by at least three referees. Of these, 31 papers were accepted, 12 of them as short papers. The doctoral symposium accepted 2 papers.
Gallery view from the special session on Artificial Intelligence and Similarity
The program included four regular sessions, the doctoral symposium, and a special session on Artificial Intelligence and Similarity, chaired by Anshumali Shrivastava, with four talks followed by a panel discussion. The technical program was completed with three distinguished keynote speakers:
Marcel Worring from the University of Amsterdam spoke about Interactive Exploration using Hypergraphs. In his engaging presentation, Marcel focused on an interactive exploration of large multimedia collections. He first reviewed recent successes in supporting scalable categorisation, and then highlighted the opportunities provided by the new field of hypergraph learning.
Divesh Srivastava from AT&T Labs-Research spoke about Exploiting Similarity Relationships to Repair Graphs. In an entertaining talk, Divesh showed how similarity concepts are important in data management tasks such as entity resolution and taxonomies for noisy data.
Ilya Razenshteyn from Microsoft Research spoke about Scalable Nearest Neighbor Search for Optimal Transport. The Wasserstein (aka Optimal Transport) distance is a popular similarity measure for structured data domains, modelled as collections of point sets. The talk focused on efficient algorithms for approximating the distance between a pair of point sets, showing both theoretically well-founded and practical results.
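To make the distance concrete, here is a minimal Python sketch of the exact Wasserstein-1 distance between two point sets of the same size, assuming every point carries equal mass and the ground distance is Euclidean. Under those assumptions an optimal transport plan can be chosen as a one-to-one matching, so the sketch simply tries every matching. This brute force is for illustration only and is not one of the efficient approximation algorithms from the talk; the function name `wasserstein1_equal_sets` is hypothetical.

```python
import itertools
import math


def wasserstein1_equal_sets(a, b):
    """Exact W1 distance between equal-size point sets with uniform mass 1/n.

    With uniform weights and len(a) == len(b), some optimal transport plan
    is a permutation (Birkhoff-von Neumann theorem), so we enumerate all n!
    matchings. Exponential in n: only usable for tiny toy inputs.
    """
    if len(a) != len(b):
        raise ValueError("point sets must have the same number of points")
    n = len(a)
    if n == 0:
        return 0.0
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Total Euclidean cost of moving a[i] onto b[perm[i]].
        cost = sum(math.dist(a[i], b[j]) for i, j in enumerate(perm))
        best = min(best, cost)
    # Each matched pair transports mass 1/n.
    return best / n


print(wasserstein1_equal_sets([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0
```

Practical systems replace the enumeration with a polynomial-time assignment solver (for example, the Hungarian algorithm) or, for large inputs, with approximation schemes of the kind discussed in the talk.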
The program committee identified five papers as candidates for the best paper award. It was decided to give the award to Vladimir Mic and Pavel Zezula for their paper “Accelerating Metric Filtering by Improving Bounds on Estimated Distances”. The best student paper award was given to Erik Thordsen and Erich Schubert for the paper “ABID: Angle Based Intrinsic Dimensionality”. The best doctoral symposium paper award was given to Shima Moghtasedi for the paper “Temporal Similarity of Trajectories in Graphs”. Top papers from the conference were invited for a special issue of the journal Information Systems.
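Metric filtering relies on lower bounds that let an index discard objects without computing their exact distance to the query. The sketch below shows only the classic pivot-based bound derived from the triangle inequality, |d(q, p) - d(o, p)| <= d(q, o); it is a baseline for illustration, not the improved bounds proposed in the award-winning paper. The function names are hypothetical, and `math.dist` (Euclidean distance) stands in for an arbitrary metric.

```python
import math


def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p (done once, offline)."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_query(query, radius, objects, pivots, table, dist):
    """Return indices of objects within `radius` of `query`, plus the number
    of exact query-object distance computations that were needed.

    For any pivot p, the triangle inequality gives
    |d(q, p) - d(o, p)| <= d(q, o), so if this lower bound exceeds the
    radius, object o cannot be an answer and is skipped.
    """
    q_to_pivots = [dist(query, p) for p in pivots]
    hits, exact_computations = [], 0
    for idx, (obj, row) in enumerate(zip(objects, table)):
        lower_bound = max(
            (abs(qp - op) for qp, op in zip(q_to_pivots, row)), default=0.0
        )
        if lower_bound > radius:
            continue  # filtered without computing d(query, obj)
        exact_computations += 1
        if dist(query, obj) <= radius:
            hits.append(idx)
    return hits, exact_computations


objects = [(0.0,), (1.0,), (10.0,), (11.0,)]
pivots = [(0.0,)]
table = build_pivot_table(objects, pivots, math.dist)
print(range_query((0.5,), 1.0, objects, pivots, table, math.dist))  # ([0, 1], 2)
```

In this toy run, the two distant objects are pruned by the pivot bound alone, so only two exact distances are computed; tighter bounds, such as those studied in the paper, aim to prune more candidates per query.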
A total of 116 participants signed up for the conference, about half of them from Europe and the other half from institutions around the world. Thanks to generous sponsorships from Springer, Google, and the IT University of Copenhagen, we were able to make registration completely free. To allow participation from many time zones, a condensed schedule was used, with a 5-6 hour main time slot each day. Speakers provided pre-recorded long versions of their talks and gave a short, interactive version on Zoom during the conference. Most participants were active, with 30-40 participants on average in the poster sessions and 30-60 in the technical sessions.
To facilitate interaction, there were three poster sessions placed such that it was possible to attend two at reasonable hours in any time zone. There was also a social event, featuring a popular quiz about Copenhagen. For the poster and social events, we used the gather.town platform, in which a small virtual conference venue had been built.
The conference venue in gather.town: poster room
The conference venue in gather.town: room for gatherings
A scene from the Copenhagen quiz during the social event.
Acknowledgements: Many people worked hard to make SISAP 2020 a success, despite the challenging circumstances. We are particularly indebted to the PC chairs Shin’ichi Satoh, Lucia Vadicamo, and Arthur Zimek, the doctoral symposium chair Ilaria Bartolini, the publication chair Fabio Carrara, and our local arrangements chair Julie Tollund.
Towards SISAP 2021:
As is traditional, the venue for SISAP 2021 was unveiled during the social event. SISAP 2021 is planned to be held in Dortmund, Germany, with Erich Schubert as general chair. We hope that by fall of 2021, the pandemic will have subsided sufficiently to allow us to travel to Dortmund, but the experience from SISAP 2020 should provide a template for an online event. On behalf of the organisers, we thank all authors and participants for their contributions, and look forward to seeing you all at SISAP 2021!
About SISAP:
The International Conference on Similarity Search and Applications (SISAP) is an annual forum for researchers and application developers in the area of similarity data management. It aims at the technological problems shared by numerous application domains, such as data mining, information retrieval, multimedia, computer vision, pattern recognition, computational biology, geography, biometrics, machine learning, and many others that make use of similarity search as a necessary supporting service.
From its roots as a regional workshop in metric indexing, SISAP has expanded to become the only international conference entirely devoted to the issues surrounding the theory, design, analysis, practice, and application of content-based and feature-based similarity search. The SISAP initiative has also created a repository serving the similarity search community, for the exchange of examples of real-world applications, the source code for similarity indexes, and experimental testbeds and benchmark data sets (http://www.sisap.org). The proceedings of SISAP are published by Springer as a volume in the Lecture Notes in Computer Science (LNCS) series.
The 2019 edition of SISAP was held at the New Jersey Institute of Technology in Newark, New Jersey, USA. Newark is an attractive location in the New York City metropolitan area, with easy and convenient travel to and from the conference. The organization was smooth, and a strong technical program was assembled by two co-chairs and sixty program committee members. Each paper was reviewed by at least three referees. SISAP 2019 received 42 papers and accepted 12 as full papers (28% acceptance rate). The program was completed by three high-calibre keynote speakers and one panel.
The first keynote speaker was Fabrizio Silvestri, a Software Engineer at Facebook London working in the Search Systems team. The Facebook AI team in London deals with applying artificial intelligence techniques to address societal problems such as the spread of online misinformation, or the integrity of election processes around the world. To do so, the team has developed a set of tools that exploit similarity search technologies to efficiently and effectively run a very high number of classification tasks on a massive set of data. Fabrizio Silvestri’s talk reviewed some of the problems studied and the solutions adopted.
The second keynote speaker was Alexander Tuzhilin, the Leonard N. Stern Professor of Business in the Department of Technology, Operations and Statistics at the Stern School of Business, NYU. Alex Tuzhilin discussed the role of similarity measures in recommender systems. Measures of similarity between users and between items to be recommended to the users lie at the core of many recommendation algorithms, and numerous metrics have been proposed in the recommender systems field since its inception. The talk explored the evolution of various similarity-based measures from the initial class of rating-based measures to the more recently proposed latent metrics and the metric learning methods. It also explored possible future research directions and novel applications of similarity measures in recommender systems.
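As an illustration of the rating-based family of measures mentioned in the talk, here is a minimal sketch of one classic user-user measure: the Pearson correlation computed over the items two users have both rated. The data layout (a dict from item to rating per user) and the choice to return 0.0 when overlap is too small are assumptions for this example; published variants differ, for instance in whether the means are taken over all of a user’s ratings or only the co-rated ones.

```python
import math


def pearson_user_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items.

    ratings_u, ratings_v: dict mapping item id -> numeric rating.
    Returns 0.0 if fewer than two items are co-rated or if either user's
    co-rated ratings have zero variance (correlation undefined).
    """
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    norm_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    norm_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


alice = {"film_a": 5, "film_b": 3, "film_c": 1}
bob = {"film_a": 4, "film_b": 3, "film_c": 2, "film_d": 5}
print(pearson_user_similarity(alice, bob))  # approximately 1.0
```

Latent-factor and learned metrics, which the talk traced as later developments, replace this hand-crafted formula with similarities computed in a learned embedding space.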
The third keynote speaker was Dr. Cong Yu, a research scientist and manager at Google Research in New York City, where he leads the Structured Data Research Group. The group’s mission is to understand and leverage structured data on the Web to enhance user experience for Google products, and the group has been responsible for several impactful products such as WebTables, Structured Snippets, and Fact-Checking at Google. Currently, his group focuses on technical research for news and has been partnering with journalists and policy advisors to combat online misinformation and improve news consumption. The ClaimReview structured data (http://schema.org/ClaimReview) is a successful example of such collaborations and powers various fact-check features at Google. The talk described the genesis of ClaimReview and its role in combating online misinformation.
The SISAP 2019 panel was on Deep Learning meets Similarity Search. The panel was moderated by K. Selçuk Candan (Arizona State University, USA). The panellists were James Bailey (University of Melbourne, Australia), Ilaria Bartolini (University of Bologna, Italy), Michael Houle (National Institute of Informatics, Japan) and Stéphane Marchand-Maillet (University of Geneva, Switzerland).
As is usually the case, SISAP 2019 included a program with papers exploring various similarity-aware data analysis and processing problems from multiple perspectives. The papers presented in 2019 studied the role of similarity processing in the context of metric search, visual search, nearest neighbour queries, clustering, outlier detection, and graph analysis. Some of the papers had a theoretical emphasis, while others had a systems perspective, presenting experimental evaluations comparing against state-of-the-art methods. An interesting feature of the 2019 conference, as of the two previous editions, was an electronic poster session that included all accepted papers. This component of the conference generated many lively interactions between presenters and attendees, who could not only learn more about the presented techniques but also identify potential topics for future collaboration.
In a tradition that began with the 2009 conference in Prague, extended versions of the top-ranked papers were invited for a Special Issue of the Information Systems journal. A shortlist for the best papers was created from those conference papers nominated by at least one of their 3 reviewers. An award committee of 3 researchers ranked the shortlisted papers, from which a final ranking was decided. The Best Paper Award was presented to Martin Aumüller and Matteo Ceccarello (IT University of Copenhagen, Copenhagen, Denmark) for the paper titled “The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search” during the Conference Dinner. The best paper reconsiders common benchmarking approaches to nearest neighbour search and studies the effect of different local intrinsic dimensionality (LID) distributions on the running time performance of different implementations.
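For readers unfamiliar with LID, a common way to estimate it at a query point is the maximum-likelihood (Hill-type) estimator over the distances r_1 <= ... <= r_k to the point’s k nearest neighbours: LID = -((1/k) * sum of ln(r_i / r_k))^(-1). The sketch below implements that estimator with a brute-force neighbour search; whether it matches the exact estimator used in the award-winning paper is an assumption, and the function name `lid_mle` is hypothetical.

```python
import math


def lid_mle(query, data, k):
    """Maximum-likelihood estimate of local intrinsic dimensionality at `query`.

    LID = -1 / ((1/k) * sum_{i=1..k} ln(r_i / r_k)), where r_1 <= ... <= r_k
    are the distances from `query` to its k nearest neighbours in `data`.
    Points at distance 0 (e.g. the query itself) are excluded.
    """
    dists = sorted(d for d in (math.dist(query, x) for x in data) if d > 0)
    if len(dists) < k:
        raise ValueError("need at least k points at non-zero distance")
    knn = dists[:k]
    r_k = knn[-1]
    mean_log_ratio = sum(math.log(r / r_k) for r in knn) / k
    if mean_log_ratio == 0:
        return math.inf  # all k neighbours equidistant: estimate diverges
    return -1.0 / mean_log_ratio


# Evenly spaced points on a line: the estimate comes out a little above 1.
line = [(float(i),) for i in range(1, 51)]
print(lid_mle((0.0,), line, k=20))
```

Because each query gets its own estimate, a benchmark workload can be characterised by the distribution of these values, which is the kind of LID distribution the best paper relates to running-time performance.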
In addition to the excellent conference facilities at NJIT, we had several student volunteers who were ready to help ensure that the logistical aspects of the conference ran smoothly. Our conference banquet was held at the Newark Museum (https://www.newarkmuseum.org), the largest museum in the state of New Jersey. It holds major collections of American art, decorative arts, contemporary art, and arts of Asia, Africa, the Americas, and the ancient world. The participants were given a highlights tour of the museum prior to the banquet, held in the Ballantine House. The Ballantine House has been part of the Newark Museum since 1937 and was designated a National Historic Landmark in 1985. Built in 1885 for Jeannette and John Holme Ballantine, of the celebrated Newark beer-brewing family, this brick and limestone mansion originally had 27 rooms, including eight bedrooms and three bathrooms.
SISAP 2019 demonstrated that the SISAP community has a strong, stable core of researchers who are active in the field of similarity search and committed to fostering the growth of the community. Organizing SISAP is a smooth experience thanks to the support of the Steering Committee and dedicated participants.
The SISAP 2019 Doctoral Symposium provided a forum for PhD students to present their research ideas and receive feedback from senior members of the research community. The Symposium fostered a collaborative environment with constructive discussions that benefited the students.
SISAP 2020 was planned to be organized in Copenhagen by Martin Aumüller, Björn Þór Jónsson and Rasmus Pagh from the IT University of Copenhagen, but it will instead become a virtual event because of the COVID-19 pandemic. One of the major challenges for the SISAP conference series is to continue to raise its profile in the landscape of scientific events related to information indexing, database and search systems.
Welcome to the third column on the ACM SIGMM Records from the Video Quality Experts Group (VQEG). The last VQEG plenary meeting took place online from 14 to 18 December. Given the current circumstances, it was organized entirely online for the second time, with multiple sessions distributed over five to six hours each day, allowing remote participation of people from different time zones. About 130 participants from 24 different countries registered for the meeting and could attend the numerous presentations and discussions that took place in all working groups. This column provides an overview of this meeting, while all the information, minutes, files (including the presented slides), and video recordings from the meeting are available online on the VQEG meeting website. As highlights of interest for the SIGMM community, apart from several interesting presentations of state-of-the-art works, relevant contributions to ITU recommendations related to multimedia quality assessment were reported from various groups (e.g., on adaptive bitrate streaming services, subjective quality assessment of 360-degree videos, statistical analysis of quality assessments, and gaming applications), the new group on quality assessment for health applications was launched, an interesting session on 5G use cases took place, and a workshop was dedicated to user testing during Covid-19. In addition, new efforts have been launched on research into quality metrics for live media streaming applications and on guidelines for implementing objective video quality metrics (beyond PSNR) for the video compression community. We encourage readers interested in any of the activities going on in the working groups to check their websites and subscribe to the corresponding reflectors to follow them and get involved.
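Since the column refers to guidelines for objective video quality metrics relative to PSNR, a minimal sketch of PSNR itself may serve as a reference point. It assumes two equally sized frames given as flat sequences of 8-bit luma samples (peak value 255); real measurement tools add colour-plane handling, averaging over frames of a sequence, and other details not shown here. The function name `psnr` is illustrative.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    PSNR = 10 * log10(peak^2 / MSE). Returns math.inf for identical frames.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


ref_frame = [16, 32, 64, 128]
dist_frame = [17, 31, 65, 127]  # every sample off by one, so MSE = 1
print(round(psnr(ref_frame, dist_frame), 2))  # 48.13
```

PSNR depends only on the pixel-wise error energy, not on where or how the error appears, which is a common motivation for perceptually oriented metrics such as SSIM and the P.1204 family discussed below.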
Overview of VQEG Projects
Audiovisual HD (AVHD)
The AVHD/P.NATS2 project was a joint collaboration between VQEG and ITU SG12 whose goal was to develop a multitude of objective models, varying in complexity, type of input, and use case, for assessing video quality in adaptive bitrate streaming services over reliable transport at resolutions up to 4K. The report of this project, which finished in January 2020, was approved at this meeting. In summary, it resulted in 10 model categories with models trained and validated on 26 subjective datasets. This activity produced 4 ITU standards (ITU-T Rec. P.1204 [1], P.1204.3 [2], P.1204.4 [3], and P.1204.5 [4]), a dataset created during this effort, and a journal publication reporting details on the validation tests [5]. In this context, Alexander Raake (TU Ilmenau) gave a presentation with details on the P.NATS Phase 2 project and the resulting ITU recommendations, while Werner Robitza (AVEQ GmbH) and David Lindero (Ericsson) presented details of the processing chain used in the project. In addition, there were various presentations covering topics related to this group. For instance, Cindy Chen, Deepa Palamadai Sundar, and Visala Vaduganathan (Facebook) presented their work on hardware acceleration of video quality metrics. Also from Facebook, Haixiong Wang presented their work on efficient measurement of quality at scale in their video ecosystem [6]. Lucjan Janowski (AGH University) proposed a discussion on more ecologically valid subjective experiments, Alan Bovik (University of Texas at Austin) presented a hitchhiker’s guide to SSIM, and Ali Ak (Université de Nantes) presented a comprehensive analysis of crowdsourcing for subjective evaluation of tone mapping operators. Finally, Rohit Puri (Twitch) opened a discussion on research into QoE metrics for live media streaming applications, which led to the agreement to start a new sub-project on this topic within the AVHD group.
Psycho-Physiological Quality Assessment (PsyPhyQA)
The chairs of the PsyPhyQA group provided an update on the activities carried out. A test plan for psychophysiological video quality assessment was established, and the group is currently aiming to develop ideas for conducting quality assessment tests with psychophysiological measures during a pandemic, as well as to collect and discuss ideas for possible joint work. In addition, the project is trying to learn about physiological correlates of simulator sickness; in this regard, J.P. Tauscher (Technische Universität Braunschweig) delivered a presentation exploring neural and peripheral physiological correlates of simulator sickness. Finally, Waqas Ellahi (Université de Nantes) gave a presentation on the visual fidelity of tone mapping operators from gaze data using HMMs [7].
Computer Generated Imagery (CGI)
The report from the chairs of the CGI group covered progress on research into methodologies for the quality assessment of gaming services (e.g., ITU-T P.809 [10]), crowdsourced quality assessment for gaming applications (P.808 [11]), quality prediction and opinion models for cloud gaming (e.g., ITU-T G.1072 [12]), and models (signal-, bitstream-, and parametric-based) for video quality assessment of CGI content (e.g., nofu, NDNetGaming, GamingPara, DEMI, NR-GVQM, etc.). In terms of planned activities, the group is targeting the creation of new gaming datasets and of tools for metrics to assess gaming QoE, and it also aims to identify topics of interest in CGI other than gaming content. In addition, Saman Zadtootaghaj (TU Berlin) presented updates on gaming standardization activities and deep learning models for gaming quality prediction, Suiyi Ling (Université de Nantes) presented a subjective, multi-dimensional aesthetic assessment of mobile game images, and Maria Martini (Kingston University London) addressed the quality assessment of gaming videos compressed with AV1, leading to interesting discussions on those topics.
Quality Assessment for Computer Vision Applications (QACoViA)
The QACoViA group announced Lu Zhang (INSA Rennes) as its new third co-chair, who will also be working in the near future on a project related to image compression for optimized recognition by distributed neural networks. In addition, Mikołaj Leszczuk (AGH University) presented a report on a recently completed project, carried out in collaboration with Huawei through its Innovation Research Programme, on an objective video quality assessment method for recognition tasks.
5G Key Performance Indicators (5GKPI)
The 5GKPI session was aimed at identifying potential interested partners and joint work (e.g., contributions to ITU-T SG12 recommendation G.QoE-5G [14], generation of open/reference datasets, etc.). To this end, it included four presentations of use cases of interest: tele-operated driving by Yungpeng Zang (5G Automotive Association), content production related to the European project 5G-Records by Paola Sunna (EBU), Augmented/Virtual Reality by Bill Krogfoss (Bell Labs Consulting), and QoE for remote-controlled use cases by Kjell Brunnström (RISE).
Immersive Media Group (IMG)
A report on the updates within the IMG group was presented first, covering in particular the current joint work investigating the subjective quality assessment of 360-degree video. A cross-lab test involving 10 different labs was carried out at the beginning of 2020, yielding relevant outcomes including various contributions to ITU SG12/Q13 and the MPEG AhG on Quality of Immersive Media. It is worth noting that the new ITU-T recommendation P.919 [15] on subjective quality assessment of 360-degree videos (in line with ITU-R BT.500 [8] or ITU-T P.910 [13]) was approved in mid-October, supported by the results of these cross-lab tests. Furthermore, since these tests have finished, Pablo Pérez (Nokia Bell-Labs) gave a presentation on possible future joint activities within IMG, which led to an open discussion that will continue in future audio calls. In addition, four talks covered topics related to immersive media technologies, including an update from the Audiovisual Technology Group of TU Ilmenau on immersive media topics and a presentation by Ali Ak and Patrick Le Callet (Université de Nantes) of a no-reference quality metric for light field content based on a structural representation of the epipolar plane image [16]. There were also two presentations related to 3D graphical content: one by Mona Abid (Université de Nantes) on the perceptual characterization of 3D graphical content based on visual attention patterns, and another by Yana Nehmé (INSA Lyon) comparing subjective methods for the quality assessment of 3D graphics in virtual reality.
Intersector Rapporteur Group on Audiovisual Quality Assessment (IRG-AVQA) and Q19 Interim Meeting
Chulhee Lee (Yonsei University) chaired the IRG-AVQA session, providing an overview of the progress and recent work within ITU-R WP6C on HDR-related topics and ITU-T SG12 Questions 9, 13, 14, and 19 (e.g., P.NATS Phase 2 and follow-ups, subjective assessment of 360-degree video, QoE factors for AR applications, etc.). In addition, a new work item was announced within ITU-T SG9: End-to-end network characteristics requirements for video services (J.pcnp-char [17]). As a result of the discussions raised during this session, a new dedicated group was set up to introduce objective video quality metrics beyond PSNR to the video compression community and to provide guidelines on implementing them. The group was named “Implementers Guide for Video Quality Metrics (IGVQM)” and will be chaired by Ioannis Katsavounidis (Facebook), with the involvement of several people from VQEG. After the IRG-AVQA session, the Q19 interim meeting took place, with a report by Chulhee Lee and a presentation by Zhi Li (Netflix) on improvements to the subjective experiment data analysis process.
Other updates
Apart from the aforementioned groups, the Human Factors for Visual Experience (HFVE) group is still active, coordinating VQEG activities in liaison with the IEEE Standards Association Working Groups on HFVE, especially on perceptual quality assessment of 3D, UHD, and HD content, quality of experience assessment for VR and MR, quality assessment of light-field imaging content, and deep-learning-based assessment of visual experience based on human factors. In this regard, VQEG members are making ongoing contributions to IEEE Standards. In addition, there was a workshop dedicated to user testing during Covid-19, which included a presentation by Kjell Brunnström (RISE) on precautions for lab experiments, another by Babak Naderi (TU Berlin) on subjective tests during the pandemic, and a break-out session for discussions on the topic.
Finally, the next VQEG plenary meeting will take place in spring 2021 (exact dates still to be agreed), probably online again.
JPEG initiates standardisation of image compression based on AI
The 89th JPEG meeting was held online from 5 to 9 October 2020.
During this meeting, multiple JPEG standardisation activities and explorations were discussed and progressed. Notably, the call for evidence on learning-based image coding was successfully completed, and evidence was found that this technology promises several new functionalities while at the same time offering superior compression efficiency beyond the state of the art. A new work item, JPEG AI, that will use learning-based image coding as its core technology has been proposed, enlarging the already broad family of JPEG standards.
Figure 1. JPEG Families of standards and JPEG AI.
The 89th JPEG meeting had the following highlights:
JPEG AI call for evidence report
JPEG explores standardization needs to address fake media
JPEG Pleno Point Cloud Coding reviews the status of the call for evidence
JPEG Pleno Holography call for proposals timeline
JPEG DNA identifies use cases and requirements
JPEG XL standard defines the final specification
JPEG Systems JLINK reaches committee draft stage
JPEG XS 2nd Edition Parts 1, 2 and 3.
JPEG AI
At the 89th meeting, the four submissions received in response to the Call for Evidence on learning-based image coding were presented and discussed, and the results of their subjective evaluation were reported and discussed in detail by experts. It was agreed that there is strong evidence that learning-based image coding solutions can outperform the already defined anchors in terms of compression efficiency, i.e., perform better than state-of-the-art conventional image coding architectures. Thus, it was decided to create a new standardisation activity, JPEG AI, for a learning-based image coding system that applies machine learning tools to achieve substantially better compression efficiency than current image coding systems, while offering unique features desirable for the efficient distribution and consumption of images. Such an approach should yield an efficient compressed-domain representation not only for visualisation but also for machine learning-based image processing and computer vision. JPEG AI has released to the public the results of the objective and subjective evaluations as well as the first version of the common test conditions for assessing the performance of learning-based image coding systems.
JPEG explores standardization needs to address fake media
Recent advances in media modification, particularly deep learning-based approaches, can produce near-realistic media content that is almost indistinguishable from authentic content. These developments open opportunities for producing new types of media content that are useful for many creative industries, but they also increase the risk of spreading maliciously modified content (e.g., ‘deepfakes’), leading to social unrest, spreading of rumours, or encouragement of hate crimes. The JPEG Committee is interested in exploring whether a JPEG standard can facilitate secure and reliable annotation of media modifications, in both good-faith and malicious usage scenarios.
JPEG is currently in discussion with stakeholders from academia, industry, and other organisations to explore the use cases that will define a roadmap for identifying the requirements leading to a potential standard. The Committee has received significant interest and has released a public document outlining the context, use cases, and requirements. JPEG invites experts and technology users to actively participate in this activity and to attend a workshop to be held online in December 2020. Details on JPEG's activities in this area can be found on the JPEG.org website. Interested parties are especially encouraged to register for the mailing list of the ad hoc group that has been set up to facilitate discussion and coordination on this topic.
JPEG Pleno Point Cloud Coding
JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications, including computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research, and advanced sensing and analysis. During the 89th JPEG meeting, the JPEG Committee reviewed expressions of interest in the Final Call for Evidence on JPEG Pleno Point Cloud Coding. This Call for Evidence focuses specifically on point cloud coding solutions supporting scalability and random access of decoded point clouds. Between its 89th and 90th meetings, the JPEG Committee will be actively promoting this activity and collecting submissions to participate in the Call for Evidence.
JPEG Pleno Holography
At the 89th meeting, the JPEG Committee released an updated draft of the Call for Proposals for JPEG Pleno Holography. A final Call for Proposals on JPEG Pleno Holography will be released in April 2021. JPEG Pleno Holography is seeking compression solutions for holographic content. The scope of the activity is quite broad and addresses diverse use cases such as holographic microscopy and tomography, as well as holographic displays and printing. Current activities are centred on refining the objective and subjective quality assessment procedures. Interested parties are invited to participate in these activities already at this stage.
JPEG DNA
JPEG standards are used in the storage and archival of digital pictures. This puts the JPEG Committee in a good position to address the challenges of DNA-based storage by proposing an efficient image coding format for creating artificial DNA molecules. JPEG DNA has been established as an exploration activity within the JPEG Committee to study use cases, identify requirements, and assess the state of the art in DNA storage for image archival, in order to launch a standardization activity. To this end, a first workshop was organised on 30 September 2020. Presentations made at the workshop are available from the following URL: http://ds.jpeg.org/proceedings/JPEG_DNA_1st_Workshop_Proceedings.zip. At its 89th meeting, the JPEG Committee released a second version of a public document that describes its findings regarding the storage of digital images using artificial DNA. In this framework, the JPEG DNA ad hoc group was re-conducted in order to further refine the above-mentioned document and to organise a second workshop. Interested parties are invited to join this activity by participating in the AHG through the following URL: http://listregistration.jpeg.org.
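To make the idea of DNA-based storage more concrete, the following minimal Python sketch maps bytes to nucleotides with a hypothetical fixed code of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). This mapping is an illustrative assumption, not a JPEG DNA design: practical DNA coding schemes also constrain the generated sequences, for example to avoid long runs of the same base (homopolymers), which is why the sketch includes a helper that measures the longest such run.

```python
# Hypothetical 2-bit-per-nucleotide mapping (illustration only, not JPEG DNA):
# 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; assumes len(seq) is a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical nucleotides (0 for an empty string)."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


encoded = bytes_to_dna(b"JPEG")
print(encoded, dna_to_bytes(encoded))
```

A run of zero bytes becomes a long string of A's under this naive mapping, which is exactly the kind of sequence that real DNA codes are designed to avoid.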
JPEG XL
Final technical comments by national bodies have been addressed and incorporated into the JPEG XL specification (ISO/IEC 18181-1) and the reference implementation. A draft FDIS study text has been prepared and final validation experiments are planned.
JPEG Systems
The JLINK (ISO/IEC 19566-7) standard has reached the committee draft stage and will be made public. The JPEG Committee invites technical feedback on the document, which is available on the JPEG website. Development of the JPEG Snack (ISO/IEC 19566-8) standard has begun to support the defined use cases and requirements. Interested parties can subscribe to the mailing list of the JPEG Systems AHG to contribute to these activities.
JPEG XS
The JPEG Committee is finalizing its work on the 2nd editions of JPEG XS Part 1, Part 2, and Part 3. Part 1 defines new coding tools required to efficiently compress raw Bayer images. The observed quality gains of raw Bayer compression over compression in the RGB domain can be as high as 5 dB PSNR. Moreover, the second edition adds support for mathematically lossless image compression and allows compression of 4:2:0 sub-sampled images. Part 2 defines new profiles for such content. With support for low-complexity, high-quality compression of raw Bayer (or Color Filter Array) data, JPEG XS also proves to be an excellent compression scheme for the professional and consumer digital camera market, as well as for the machine vision and automotive industries.
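As a reading aid for the PSNR gain of up to 5 dB mentioned above, the sketch below uses the standard definition PSNR = 10 log10(MAX^2 / MSE). Because PSNR is logarithmic in the mean squared error, a 5 dB gain corresponds to roughly 3.16 times lower MSE at the same peak value. The default `max_value` of 255 (8-bit samples) is an assumption for illustration; raw Bayer data is often captured at higher bit depths, in which case the peak value changes accordingly.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # mathematically lossless reconstruction
    return 10 * math.log10(max_value ** 2 / mse)


def mse_reduction_factor(gain_db):
    """Factor by which the MSE must shrink to raise PSNR by gain_db decibels."""
    return 10 ** (gain_db / 10)


print(round(mse_reduction_factor(5.0), 2))  # about 3.16
```

The infinite PSNR returned for identical inputs also illustrates why PSNR is not a meaningful figure of merit for the mathematically lossless mode.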
Final Quote
“JPEG AI will be a new work item completing the collection of JPEG standards. JPEG AI relies on artificial intelligence to compress images. This standard not only will offer superior compression efficiency beyond the current state of the art but also will open new possibilities for vision tasks by machines and computational imaging for humans,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
Future JPEG meetings are planned as follows:
No. 90 will be held online from January 18 to 22, 2021.
No. 91 will be held online from April 19 to 23, 2021.
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 132nd MPEG meeting was the first meeting with the new structure. That is, ISO/IEC JTC 1/SC 29/WG 11 — the official name of MPEG under the ISO structure — was disbanded after the 131st MPEG meeting and some of the subgroups of WG 11 (MPEG) have been elevated to independent MPEG Working Groups (WGs) and Advisory Groups (AGs) of SC 29 rather than subgroups of the former WG 11. Thus, the MPEG community is now an affiliated group of WGs and AGs that will continue meeting together according to previous MPEG meeting practices and will further advance the standardization activities of the MPEG work program.
The new structure, as applied at the 132nd MPEG meeting, is as follows (including Convenors and the corresponding position within the former WG 11 structure):
AG 2 MPEG Technical Coordination (Convenor: Prof. Jörn Ostermann; for overall MPEG work coordination and prev. known as the MPEG chairs meeting; it’s expected that one can also provide inputs to this AG without being a member of this AG)
WG 2 MPEG Technical Requirements (Convenor: Dr. Igor Curcio; former Requirements subgroup)
WG 3 MPEG Systems (Convenor: Dr. Youngkwon Lim; former Systems subgroup)
WG 4 MPEG Video Coding (Convenor: Prof. Lu Yu; former Video subgroup)
WG 5 MPEG Joint Video Coding Team(s) with ITU-T SG 16 (Convenor: Prof. Jens-Rainer Ohm; former JVET)
WG 6 MPEG Audio Coding (Convenor: Dr. Schuyler Quackenbush; former Audio subgroup)
WG 7 MPEG Coding of 3D Graphics (Convenor: Prof. Marius Preda; former 3DG subgroup)
WG 8 MPEG Genome Coding (Convenor: Prof. Marco Mattavelli; newly established WG)
AG 3 MPEG Liaison and Communication (Convenor: Prof. Kyuheon Kim; former Communications subgroup)
AG 5 MPEG Visual Quality Assessment (Convenor: Prof. Mathias Wien; former Test subgroup)
The 132nd MPEG meeting was held as an online meeting and more than 300 participants continued to work efficiently on standards for the future needs of the industry. As a group, MPEG started to explore new application areas that will benefit from standardized compression technology in the future. A new web site has been created and can be found at http://mpeg.org/.
The official press release can be found here and comprises the following items:
Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance and Reference Software Standards Reach their First Milestone
MPEG Completes Geometry-based Point Cloud Compression (G-PCC) Standard
MPEG Evaluates Extensions and Improvements to MPEG-G and Announces a Call for Evidence on New Advanced Genomics Features and Technologies
MPEG Issues Draft Call for Proposals on the Coded Representation of Haptics
MPEG Evaluates Responses to MPEG IPR Smart Contracts CfP
MPEG Completes Standard on Harmonization of DASH and CMAF
MPEG Completes 2nd Edition of the Omnidirectional Media Format (OMAF)
MPEG Completes the Low Complexity Enhancement Video Coding (LCEVC) Standard
In this report, I’d like to focus on VVC, G-PCC, DASH/CMAF, OMAF, and LCEVC.
Versatile Video Coding (VVC) Ultra-HD Verification Test Completed and Conformance & Reference Software Standards Reach their First Milestone
MPEG completed a verification testing assessment of the recently ratified Versatile Video Coding (VVC) standard for ultra-high definition (UHD) content with standard dynamic range, as may be used in newer streaming and broadcast television applications. The verification test was performed using rigorous subjective quality assessment methods and showed that VVC provides a compelling gain over its predecessor — the High-Efficiency Video Coding (HEVC) standard produced in 2013. In particular, the verification test was performed using the VVC reference software implementation (VTM) and the recently released open-source encoder implementation of VVC (VVenC):
Using its reference software implementation (VTM), VVC showed bit rate savings of roughly 45% over HEVC for comparable subjective video quality.
Using VVenC, which also runs significantly faster than the reference software implementation, additional bit rate savings of more than 10% relative to VTM were observed.
Additionally, the standardization work for both conformance testing and reference software for the VVC standard reached its first major milestone, i.e., progressing to the Committee Draft ballot in the ISO/IEC approval process. The conformance testing standard (ISO/IEC 23090-15) will ensure interoperability among the diverse applications that use the VVC standard, and the reference software standard (ISO/IEC 23090-16) will provide an illustration of the capabilities of VVC and a valuable example showing how the standard can be implemented. The reference software will further facilitate the adoption of the standard by being available for use as the basis of product implementations.
Research aspects: as for every new video codec, compression efficiency and computational complexity are important performance metrics. While the reference software (VTM) provides a valid reference in terms of compression efficiency, it is not optimized for runtime. VVenC already seems to provide a significant improvement, and with x266 another open-source implementation will be available soon. Together with AOMedia’s AV1 (including its possible successor AV2), we are looking forward to a lively future in the area of video codecs.
MPEG Completes Geometry-based Point Cloud Compression Standard
MPEG promoted its ISO/IEC 23090-9 Geometry-based Point Cloud Compression (G-PCC) standard to the Final Draft International Standard (FDIS) stage. G-PCC addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is particularly suitable for sparse point clouds. ISO/IEC 23090-5 Video-based Point Cloud Compression (V-PCC), which reached the FDIS stage in July 2020, addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images using video compression techniques. The generalized approach of G-PCC, where the 3D geometry is directly coded to exploit any redundancy in the point cloud itself, is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.
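The idea of coding the 3D geometry directly, rather than projecting it onto 2D planes, can be illustrated with octree occupancy coding, the structure around which G-PCC's geometry coding is built. The sketch below is a simplified illustration under stated assumptions, not the G-PCC algorithm: points are taken to be already quantized to integer coordinates inside a cube of side 2**depth, and there is no entropy coding or context modelling. It only shows how each occupied node is described by an 8-bit code whose set bits mark its non-empty children, so that empty regions, which dominate sparse point clouds, are never subdivided further.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]


def _child_index(point: Point, origin: Point, half: int) -> int:
    """3-bit child index: bit 2 for x, bit 1 for y, bit 0 for z (upper half = 1)."""
    return (
        ((point[0] - origin[0]) >= half) << 2
        | ((point[1] - origin[1]) >= half) << 1
        | ((point[2] - origin[2]) >= half)
    )


def _child_origin(origin: Point, index: int, half: int) -> Point:
    """Lower corner of the child cube selected by a 3-bit index."""
    return (
        origin[0] + half * ((index >> 2) & 1),
        origin[1] + half * ((index >> 1) & 1),
        origin[2] + half * (index & 1),
    )


def encode_octree(points: Iterable[Point], depth: int) -> List[int]:
    """Breadth-first occupancy codes for integer points inside [0, 2**depth)^3."""
    codes: List[int] = []
    nodes = [((0, 0, 0), list(points))]
    for level in range(depth):
        half = 2 ** (depth - level - 1)
        next_nodes = []
        for origin, pts in nodes:
            children = {}
            for p in pts:
                children.setdefault(_child_index(p, origin, half), []).append(p)
            codes.append(sum(1 << i for i in children))  # one bit per occupied child
            for i in sorted(children):  # fixed child order shared with the decoder
                next_nodes.append((_child_origin(origin, i, half), children[i]))
        nodes = next_nodes
    return codes


def decode_octree(codes: List[int], depth: int) -> List[Point]:
    """Rebuild the (deduplicated, sorted) occupied leaf positions from the codes."""
    stream = iter(codes)
    nodes: List[Point] = [(0, 0, 0)]
    for level in range(depth):
        half = 2 ** (depth - level - 1)
        next_nodes = []
        for origin in nodes:
            code = next(stream)
            for i in range(8):
                if (code >> i) & 1:
                    next_nodes.append(_child_origin(origin, i, half))
        nodes = next_nodes
    return sorted(nodes)


codes = encode_octree([(0, 0, 0), (3, 3, 3), (3, 2, 3)], depth=2)
print(codes, decode_octree(codes, depth=2))
```

Duplicate points collapse into a single leaf in this sketch; how real codecs handle duplicates and attributes is outside its scope.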
Point clouds are typically represented by extremely large amounts of data, which is a significant barrier to mass-market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for displaying immersive volumetric data. The current draft reference software implementation of a lossless, intra-frame G‐PCC encoder provides a compression ratio of up to 10:1 and lossy coding of acceptable quality for a variety of applications with a ratio of up to 35:1.
By providing high immersion at currently available bit rates, the G‐PCC standard will enable various applications such as 3D mapping, indoor navigation, autonomous driving, advanced augmented reality (AR) with environmental mapping, and cultural heritage.
Research aspects: the main research focus related to G-PCC and V-PCC is currently on compression efficiency, but one should not neglect their delivery aspects, including dynamic adaptive streaming. A recent paper on this topic has been published in the IEEE Communications Magazine, entitled “From Capturing to Rendering: Volumetric Media Delivery With Six Degrees of Freedom”.
MPEG Finalizes the Harmonization of DASH and CMAF
MPEG successfully completed the harmonization of Dynamic Adaptive Streaming over HTTP (DASH) with the Common Media Application Format (CMAF), featuring a DASH profile for use with CMAF (as part of the 1st Amendment of ISO/IEC 23009-1:2019 4th edition).
CMAF and DASH segments are both based on the ISO Base Media File Format (ISOBMFF), which per se enables smooth integration of both technologies. Most importantly, this DASH profile defines (a) a normative mapping of CMAF structures to DASH structures and (b) how to use the Media Presentation Description (MPD) as a manifest format. Additional tools added in this amendment include:
DASH events and timed metadata track timing and processing models with in-band event streams,
a method for specifying the resynchronization points of segments when the segments have internal structures that allow container-level resynchronization,
an MPD patch framework that allows the transmission of partial MPD information as opposed to the complete MPD using the XML patch framework as defined in IETF RFC 5261, and
content protection enhancements for efficient signalling.
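The MPD patch idea can be illustrated with a minimal Python sketch that applies RFC 5261-style `add`, `replace`, and `remove` operations to a toy MPD using the standard library's `xml.etree.ElementTree`. It is deliberately simplified, and its patch layout is an assumption rather than the normative DASH patch format: the sample documents are illustrative; XML namespaces, the `pos` and `type` attributes of RFC 5261, and the extra metadata a real DASH patch document carries are all ignored; and selectors are limited to ElementTree's path syntax plus a trailing `/@attribute`.

```python
import xml.etree.ElementTree as ET

# Toy documents for illustration only; real MPDs and DASH patch documents use
# XML namespaces and carry more metadata than shown here.
MPD = """<MPD publishTime="2020-10-01T00:00:00Z">
  <Period id="p0"/>
</MPD>"""

PATCH = """<Patch>
  <replace sel="/MPD/@publishTime">2020-10-01T00:00:10Z</replace>
  <add sel="/MPD"><Period id="p1"/></add>
  <remove sel="/MPD/Period[@id='p0']"/>
</Patch>"""


def _resolve(root, sel):
    """Split an absolute selector into (element, attribute name or None)."""
    attr = None
    if "/@" in sel:
        sel, attr = sel.rsplit("/@", 1)
    first, _, rest = sel.strip("/").partition("/")
    if first != root.tag:
        raise ValueError(f"selector must start at <{root.tag}>: {sel}")
    target = root if not rest else root.find(rest)
    if target is None:
        raise ValueError(f"selector matched nothing: {sel}")
    return target, attr


def apply_patch(mpd_xml, patch_xml):
    """Apply add/replace/remove operations in document order; return the patched root."""
    root = ET.fromstring(mpd_xml)
    for op in ET.fromstring(patch_xml):
        target, attr = _resolve(root, op.get("sel"))
        # Recompute parents each time because earlier operations may change the tree.
        parents = {child: parent for parent in root.iter() for child in parent}
        if op.tag == "add":
            target.extend(list(op))
        elif op.tag == "replace" and attr:
            target.set(attr, op.text or "")
        elif op.tag == "replace":
            parent = parents[target]  # replacing the root element is not supported
            position = list(parent).index(target)
            parent.remove(target)
            parent.insert(position, op[0])
        elif op.tag == "remove" and attr:
            del target.attrib[attr]
        elif op.tag == "remove":
            parents[target].remove(target)  # removing the root is not supported
        else:
            raise ValueError(f"unsupported operation <{op.tag}>")
    return root


patched = apply_patch(MPD, PATCH)
print(ET.tostring(patched, encoding="unicode"))
```

The benefit for live services is that the client only receives the few changed nodes instead of re-downloading the complete, possibly large, MPD.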
It is expected that the 5th edition of the MPEG DASH standard (ISO/IEC 23009-1) containing this change will be issued at the 133rd MPEG meeting in January 2021. An overview of DASH standards/features can be found in the Figure below.
Research aspects: one of the features enabled by CMAF is low latency streaming that is actively researched within the multimedia systems community (e.g., here). The main research focus has been related to the ABR logic while its impact on the network is not yet fully understood and requires strong collaboration among stakeholders along the delivery path including ingest, encoding, packaging, (encryption), content delivery network (CDN), and consumption. A holistic view on ABR is needed to enable innovation and the next step towards the future generation of streaming technologies (https://athena.itec.aau.at/).
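Since ABR logic is singled out above, the following deliberately basic throughput-based rule may help readers new to the topic: estimate throughput as the harmonic mean of recent segment download rates, apply a safety margin, and pick the highest representation that fits. It is a generic textbook-style heuristic, not a method proposed in this report or by any standard; the five-sample window, the 0.8 safety factor, and the bitrate ladder are arbitrary assumptions. It also ignores the buffer level, which low-latency players must manage carefully, and with chunked CMAF delivery, per-segment throughput measurements can be distorted because downloads partly track the encoder's production rate.

```python
def harmonic_mean(values):
    """Harmonic mean, which damps the influence of short throughput spikes."""
    return len(values) / sum(1.0 / v for v in values)


def select_bitrate(ladder_kbps, throughput_kbps, window=5, safety=0.8):
    """Pick the highest ladder rung not exceeding the damped throughput estimate."""
    if not throughput_kbps:
        return min(ladder_kbps)  # no measurements yet: start conservatively
    estimate = safety * harmonic_mean(throughput_kbps[-window:])
    fitting = [rate for rate in sorted(ladder_kbps) if rate <= estimate]
    return fitting[-1] if fitting else min(ladder_kbps)


LADDER = [300, 750, 1500, 3000, 6000]  # hypothetical bitrate ladder in kbit/s
print(select_bitrate(LADDER, [4000, 4200, 3900]))
```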
MPEG Completes 2nd Edition of the Omnidirectional Media Format
MPEG completed the standardization of the 2nd edition of the Omnidirectional Media Format (OMAF) by promoting ISO/IEC 23090-2 to Final Draft International Standard (FDIS) status, including the following features:
“Late binding” technologies to deliver and present only the part of the content that adapts to the dynamically changing users’ viewpoint. To enable an efficient implementation of such a feature, this edition of the specification introduces the concept of bitstream rewriting, in which a compliant bitstream covering only the users’ viewport is dynamically generated on the client by combining the received portions of the bitstream.
Extension of OMAF beyond 360-degree video. This edition introduces the concept of viewpoints, which can be considered as user-switchable camera positions for viewing content or as temporally contiguous parts of a storyline to provide multiple choices for the storyline a user can follow.
Enhanced use of video, image, or timed text overlays on top of omnidirectional visual background video or images related to a sphere or a viewport.
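Viewport-dependent delivery of the kind described above relies on the client knowing which parts of the picture its viewport covers. How a client makes that choice is left to implementations, so the following Python sketch shows just one possible first step under simplifying assumptions that are not taken from OMAF: an equirectangular picture split into equally wide tile columns, a viewport described only by yaw and horizontal field of view (pitch is ignored), and column 0 starting at a yaw of -180 degrees.

```python
import math


def tiles_in_viewport(yaw_deg, hfov_deg, num_columns):
    """Return indices of equirectangular tile columns overlapping the viewport.

    Columns are equally wide; column 0 spans yaw [-180, -180 + width).
    Wrap-around at +/-180 degrees is handled with a modulo.
    """
    width = 360.0 / num_columns
    left = yaw_deg - hfov_deg / 2.0
    right = yaw_deg + hfov_deg / 2.0
    first = math.floor((left + 180.0) / width)
    last = math.ceil((right + 180.0) / width) - 1
    if last - first + 1 >= num_columns:
        return list(range(num_columns))  # viewport covers the full circle
    return sorted({index % num_columns for index in range(first, last + 1)})


print(tiles_in_viewport(yaw_deg=180, hfov_deg=90, num_columns=8))
```

Looking backwards (yaw 180) with a 90-degree field of view selects the two columns on either side of the picture's left/right seam, which is the wrap-around case a real client also has to handle.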
Research aspects: standards usually define formats to enable interoperability, while various informative aspects are left open for industry competition and are subject to research and development. The same holds for OMAF: its 2nd edition enables researchers and developers to work towards efficient viewport-adaptive implementations focusing on the users’ viewport.
MPEG Completes the Low Complexity Enhancement Video Coding Standard
MPEG is pleased to announce the completion of the new ISO/IEC 23094-2 standard, i.e., Low Complexity Enhancement Video Coding (MPEG-5 Part 2 LCEVC), which has been promoted to Final Draft International Standard (FDIS) at the 132nd MPEG meeting.
LCEVC adds an enhancement data stream that can appreciably improve the resolution and visual quality of reconstructed video, with effective compression efficiency at limited complexity, by building on top of existing and future video codecs.
LCEVC can be used to complement devices originally designed only for decoding the base layer bitstream, by using firmware, operating system, or browser support. It is designed to be compatible with existing video workflows (e.g., CDNs, metadata management, DRM/CA) and network protocols (e.g., HLS, DASH, CMAF) to facilitate the rapid deployment of enhanced video services.
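The base-plus-enhancement principle can be illustrated with a toy one-dimensional example: the encoder downsamples the signal, codes it coarsely (standing in for the base codec), upsamples the result as a prediction, and sends the quantized difference as an enhancement layer, while a device without enhancement support decodes only the base layer. This is an assumption-driven illustration of the layering idea only; LCEVC's actual residual layers, transforms, and upsampling filters are not modelled, and the pair averaging, sample repetition, and uniform quantizers below are arbitrary choices.

```python
def downsample(samples):
    """Halve the resolution by averaging neighbouring pairs (assumes even length)."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]


def upsample(samples):
    """Double the resolution by repeating each sample (nearest neighbour)."""
    return [value for value in samples for _ in range(2)]


def quantize(values, step):
    """Uniform scalar quantization; a crude stand-in for lossy coding."""
    return [step * round(v / step) for v in values]


def encode(samples, base_step, residual_step):
    """Return (base layer, enhancement residuals) for a toy two-layer scheme."""
    base = quantize(downsample(samples), base_step)  # stands in for the base codec
    prediction = upsample(base)
    residual = quantize([s - p for s, p in zip(samples, prediction)], residual_step)
    return base, residual


def decode(base, residual=None):
    """Base-only decoding models a legacy device; adding residuals models enhancement."""
    prediction = upsample(base)
    if residual is None:
        return prediction
    return [p + r for p, r in zip(prediction, residual)]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


samples = [10, 12, 20, 24, 30, 30, 40, 44]
base, residual = encode(samples, base_step=4, residual_step=1)
print(max_error(samples, decode(base)), max_error(samples, decode(base, residual)))
```

Because the enhancement is a separate residual stream added after base decoding, the base layer can keep running on existing hardware decoders while the enhancement decoding is handled in software, which matches the deployment model described above.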
LCEVC can be used to deliver higher video quality in limited bandwidth scenarios, especially when the available bit rate is low for high-resolution video delivery and decoding complexity is a challenge. Typical use cases include mobile streaming and social media, and services that benefit from high-density/low-power transcoding.
Research aspects: LCEVC provides a kind of scalable video coding by combining hardware- and software-based decoders, which allows for a certain flexibility as part of regular software life-cycle updates. However, LCEVC has never been compared to Scalable Video Coding (SVC) and Scalable High-Efficiency Video Coding (SHVC), which could be an interesting aspect for future work.
The 133rd MPEG meeting will again be an online meeting, in January 2021.
Click here for more information about MPEG meetings and their developments.
An introduction to the QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) [1].
Introduction
Immersive media are reshaping the way users experience reality. They are increasingly incorporated across enterprise and consumer sectors to offer experiential solutions to a diverse range of industries. Current technologies that afford an immersive media experience (IMEx) include Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and 360-degree video. Popular uses can be found in enhancing connectivity applications, supporting knowledge-based tasks, learning & skill development, as well as adding immersive and interactive dimensions to the retail, business, and entertainment industries. Whereas the evolution of immersive media can be traced over the past 50 years, the current boost in its popularity is primarily owed to significant advances in the last decade brought about by improved connectivity, superior computing, and device capabilities. Specifically, advances have been made in display technologies, visualizations, interaction & tracking devices, recognition technologies, platform development, and new media formats, accompanied by increasing user demand for real-time & dynamic content across platforms.
Though still in its infancy, the immersive economy is growing into a dynamic and confident sector. As it is an emerging sector, official data are hard to find, but some estimates project the global immersive media market to continue its upward growth at around 30% CAGR to reach USD 180 billion by 2022 [2,3]. Country-wise, the USA is expected to secure one third of the global immersive media market share, followed by China, Japan, Germany, and the UK as likely immersive media markets where significant spending is anticipated. Consumer products and devices are poised to be the largest contributing segment. The growth in immersive consumer products is expected to continue as Head-Mounted Displays (HMDs) become commonplace and interest in mobile augmented reality increases [4]. However, immersive media are no longer just a pursuit of alternative display technologies; they are pushing towards holistic ecosystems that seek contributions from hardware manufacturers, application & platform developers, content producers, and users. These ecosystems are making way for sophisticated content creation on platforms that allow user participation, interaction, and skill integration through advanced tools.
Today, immersive media experience (IMEx) is not only about how users view media; it is a transformative way to consume media altogether. IMEx draws considerable interest from multiple disciplines. As the number of stakeholders grows, clarity and coherence on definitions and concepts become all the more important. In this article, we provide an overview and a brief survey of some of the key definitions that are central to IMEx, including its Quality of Experience (QoE), application areas, influencing factors, and assessment methods. Our aim is to bring some clarity and initiate consensus on topics related to IMEx that can be useful for researchers and practitioners working in both academia and industry.
Why understand IMEx?
IMEx combines reality with technology, enabling emplaced multimedia experiences of standard media (film, photographic, or animated) as well as synthetic and interactive environments for users. Immersive media use visual, auditory, and haptic feedback to stimulate the physical senses such that users psychologically feel immersed within these multidimensional media environments. This sense of “being there” is also referred to as presence.
As mentioned earlier, enthusiasm for IMEx is mainly driven by the gaming, entertainment, retail, healthcare, digital marketing, and skill training industries. So far, research has tilted towards innovation, with particular interest in image capture, recognition, mapping, and display technologies over the past few years. However, the prevalence of IMEx has also ushered in a plethora of definitions, frameworks, and models for understanding the psychological and phenomenological concepts associated with these media forms. Central among them are the closely related concepts of immersion and presence, which are interpreted differently across fields, for example, when one moves from literature to narratology to computer science. With immersive media, these three separate fields come together in interactive digital narrative applications, where immersive narratives are used to solve real-world problems. At this point, noticeable interdisciplinary differences regarding definitions, scope, and constituents urgently need to be resolved to achieve a coherent understanding of the concepts used. Such consensus is vital for giving the future of immersive media a direction that can be shared by all.
A White Paper on IMEx
A recent White Paper [1] by QUALINET, the European Network on Quality of Experience in Multimedia Systems and Services [5], is a contribution to the discussions related to Immersive Media Experience (IMEx). It attempts to build consensus around ideas and concepts that are related to IMEx but originate from multidisciplinary groups with a joint interest in multimedia experiences.
The QUALINET community aims to extend the notion of network-centric Quality of Service (QoS) in multimedia systems by relying on the concept of Quality of Experience (QoE). Its main scientific objective is the development of methodologies for subjective and objective quality metrics that consider current and new trends in multimedia communication systems, as witnessed by the appearance of new types of content and interactions.
The White Paper was created based on an activity launched at the 13th QUALINET meeting on June 4, 2019, in Berlin as part of Task Force 7, Immersive Media Experiences (IMEx). The paper received contributions from 44 authors under 10 section leads; these were consolidated into a first draft and circulated among all section leads and editors for internal review. After incorporating the feedback from all section leads, the editors released the White Paper within the QUALINET community for review. Following feedback from QUALINET at large, the editors distributed the White Paper widely for an open, public community review (e.g., research communities/committees in ACM and IEEE, standards development organizations, and various open email reflectors related to this topic). The feedback received from this public consultation process resulted in the final version, which was approved at the 14th QUALINET meeting on May 25, 2020.
Understanding the White Paper
The White Paper surveys definitions and concepts that contribute to IMEx. It describes the Quality of Experience (QoE) for immersive media by establishing a relationship between the concepts of QoE and IMEx. This article provides an outline of these concepts by looking at:
Survey of definitions of immersion and presence: discusses various frameworks and conceptual models that are most relevant to these phenomena in terms of multimedia experiences.
Definition of immersive media experience: describes experiential determinants for IMEx, characterized through its various technological contexts.
Quality of experience for immersive media: applies existing QoE concepts to understand the user-centric subjective feelings of “a sense of being there”, “a sense of agency”, and “cybersickness”.
Application areas for immersive media experience: presents an overview of immersive technologies in use within gaming, omnidirectional content, interactive storytelling, health, entertainment, and communications.
Influencing factors on immersive media experience: looks at the three existing influence factors on QoE, with a pronounced emphasis on the human influence factor as highly relevant to IMEx.
Assessment of immersive media experience: underscores the importance of properly examining multimedia systems, including IMEx, by highlighting three methods currently in use, i.e., subjective, behavioural, and psychophysiological.
Standardization activities: discusses the three clusters of activities currently underway to achieve interoperability for IMEx: (i) data representation & formats; (ii) guidelines, systems standards, & APIs; and (iii) Quality of Experience (QoE).
Conclusions
Immersive media have significantly changed the use and experience of new digital media. These innovative technologies transcend traditional formats and present new ways to interact with digital information inside synthetic or enhanced realities, including VR, AR, MR, and haptic communications. Earlier, we discussed the need for a multidisciplinary consensus on the definitions of IMEx. The QUALINET White Paper provides such “a toolbox of definitions” for IMEx. It stands out for bringing together insights from multimedia groups spread across academia and industry, specifically the Video Quality Experts Group (VQEG) and the Immersive Media Group (IMG). This makes it a valuable asset for those working in the field of IMEx going forward.
References
[1] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[2] Mateos-Garcia, J., Stathoulopoulos, K., & Thomas, N. (2018). The immersive economy in the UK (Rep. No. 18.1137.020). Innovate UK.
[3] Infocomm Media 2025 Supplementary Information (pp. 31-43, Rep.). (2015). Singapore: Ministry of Communications and Information.
[4] Hadwick, A. (2020). XR Industry Insight Report 2019-2020 (Rep.). San Francisco: VRX Conference & Expo.
[5] QUALINET. Online: http://www.qualinet.eu/