What is the trade-off between CO₂ emissions and video-conferencing QoE?

Users of multimedia services naturally want the highest possible Quality of Experience (QoE) when using those services. This is especially true for video-conferencing and video streaming, which are nowadays a large part of many users’ daily lives, be it work-related Zoom calls or relaxing while watching Netflix. Providing these services consumes energy (think of the cloud services involved, the networks, and the users’ own devices) and therefore causes CO₂ emissions. In this column, we look at the potential trade-offs between varying levels of QoE (which for video services is strongly correlated with the bit rates used) and the resulting CO₂ emissions. We also look at other factors that should be taken into account when making decisions based on these calculations, in order to provide a more holistic view of the environmental impact of these types of services and of whether that impact is significant.

Energy Consumption and CO₂ Emissions for Internet Service Delivery

Understanding the footprint of Internet service delivery is a challenging task. On the one hand, the infrastructure and software components involved in the service delivery need to be known. A very fine-grained model requires knowledge of all components along the entire service delivery chain: end-user devices, fixed or mobile access network, core network, data center, and Internet service infrastructure. Furthermore, the footprint may need to consider the CO₂ emitted in producing and manufacturing the hardware components as well as the CO₂ emitted during runtime; a life cycle assessment is then necessary to obtain the CO₂ emissions per year attributable to hardware production. However, one may argue that the infrastructure is already there, and therefore we focus on the energy consumption and CO₂ emissions during runtime and delivery of the services. This is also the approach we follow here to provide quantitative numbers for Internet-based video services. On the other hand, beyond the complexity of understanding and modelling the contributors to energy consumption and CO₂ emissions, quantitative numbers are needed.

To overcome this complexity, the literature typically considers key figures on the overall data traffic and service consumption times, aggregated over users and services over a longer period of time, e.g., one year. In addition, the total energy consumption of mobile operators and data centres is considered. Together with information on, e.g., the number of base station sites, this yields estimates such as the average power consumption per site or the average data traffic per base station site [Feh11]. As a result, we obtain measures such as energy per bit (Joule/bit), which determines the energy efficiency of a network segment. In [Yan19], the annual energy consumption of Akamai is converted to power consumption and then divided by the maximum network traffic, which again yields the energy consumption per bit of Akamai’s data centers. Knowing the share of energy sources (nonrenewable energy, including coal, natural gas, oil, diesel, and petroleum; renewable energy, including solar, geothermal, wind, biomass, and hydropower) allows relating the energy consumption to the total CO₂ emissions. For example, the total contribution from renewables exceeded 40% in 2021 in Germany and Finland, while Norway reached about 60% and Croatia about 36% (statistics from 2020).
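To illustrate this conversion, the following sketch derives an energy-per-bit value from an annual energy consumption and a traffic rate. The figures in the usage example are made-up placeholders, not Akamai’s actual data:

```python
# Energy-per-bit estimate in the style of [Yan19]: convert an annual energy
# consumption into an average power, then divide by the traffic rate.
# The numbers in the example below are illustrative, not real operator data.

SECONDS_PER_YEAR = 365 * 24 * 3600

def energy_per_bit(annual_energy_kwh: float, traffic_bps: float) -> float:
    """Return Joule/bit: average power (W) divided by traffic (bit/s)."""
    annual_energy_joule = annual_energy_kwh * 3.6e6  # 1 kWh = 3.6e6 J
    avg_power_watt = annual_energy_joule / SECONDS_PER_YEAR
    return avg_power_watt / traffic_bps

# Hypothetical data centre: 200 GWh per year and 50 Tbit/s of traffic
# give on the order of 5e-7 J/bit.
j_per_bit = energy_per_bit(200e6, 50e12)
```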

A detailed model of the total energy consumption of mobile network services and applications is provided in [Yan19]. Their model structure considers important factors from each network segment, from cloud to core network, mobile network, and end-user devices. Furthermore, service-specific energy consumption figures are provided. They found strong differences depending on the service type and the resulting data traffic patterns; the key factors, however, are the amount of data traffic and the duration of service use. They also consider different end-to-end network topologies (user-to-data center, user-to-user via data center, direct user-to-user, and P2P communication). Their model expresses the total energy consumption as the sum of the energy consumption of the different segments:

  • Smartphone: service-specific energy depends, among other factors, on the CPU usage and the network usage (e.g., 4G) over the duration of use,
  • Base station and access network: data traffic and signalling traffic over the duration of use,
  • Wireline core network: service-specific energy consumption of a mobile service, taking into account the data traffic volume and the energy per bit,
  • Data center: the energy per bit of the data center multiplied by the data traffic volume of the mobile service.

The Shift Project [TSP19] provides a similar model which is called the “1 Byte Model”. The computation of energy consumption is transparently provided in calculation sheets and discussed by the scientific community. As a result of the discussions [Kam20a,Kam20b], an updated model was released [TSP20] clarifying a simple bit/byte conversion issue. The suggested models in [TSP20, Kam20b] finally lead to comparable numbers in terms of energy consumption and CO₂ emission. As a side remark: Transparency and reproducibility are key for developing such complex models!

The basic idea of the 1 Byte Model for computing energy consumption is to take into account the time t of Internet service usage and the overall data volume v. The time of use directly drives the energy consumption of the end-user device’s display, but also the allocation of network resources. The data volume to be transmitted through the network, and to be generated or processed by cloud services, drives the energy consumption in addition. The model does not differentiate between Internet services, but different services will result in different traffic volumes over the time of use. For each segment i (device, network, cloud), a linear model E_i(t,v) = a_i * t + b_i * v + c_i quantifies the energy consumption; the coefficients for each segment are provided by [TSP20]. The overall energy consumption is then E_total = E_device + E_network + E_cloud.

CO₂ emissions are then again a linear function of the total energy consumption (over the time of use of a service), with a coefficient that depends on the share of nonrenewable and renewable energies. The Shift Project derives such coefficients for different countries, and we finally obtain CO₂ = k_country * E_total.
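To make the model concrete, here is a minimal sketch of these two linear relations. All coefficients (a_i, b_i, c_i, and k_country) are illustrative placeholders; the actual per-segment coefficients and country factors are tabulated in [TSP20]:

```python
# Sketch of the 1 Byte Model: E_i(t, v) = a_i * t + b_i * v + c_i per segment,
# followed by CO2 = k_country * E_total.
# All coefficients below are illustrative placeholders, not the [TSP20] values.

# (a_i [kWh/h], b_i [kWh/GB], c_i [kWh]) per segment
COEFFS = {
    "device":  (0.02, 0.00, 0.0),   # mostly time-driven (display, CPU)
    "network": (0.01, 0.03, 0.0),   # both time- and volume-driven
    "cloud":   (0.00, 0.02, 0.0),   # mostly volume-driven
}

K_COUNTRY = {"EU": 0.28, "US": 0.49, "CN": 0.68}  # kg CO2 per kWh (assumed)

def energy_kwh(hours: float, volume_gb: float) -> float:
    """Total energy as the sum of the per-segment linear models."""
    return sum(a * hours + b * volume_gb + c for a, b, c in COEFFS.values())

def co2_kg(hours: float, volume_gb: float, country: str = "EU") -> float:
    """CO2 emissions as a linear function of the total energy consumption."""
    return K_COUNTRY[country] * energy_kwh(hours, volume_gb)
```

For example, a 6-hour conference day at 1.5 Mbps corresponds to roughly 4 GB of traffic, which these placeholder coefficients map to a per-user energy and emission figure; swapping in the real [TSP20] coefficients gives the numbers used in the figures.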

The Trade-off between QoE and CO₂ Emissions

As a use case, we consider hosting a scientific conference online through video-conferencing services. Assume that 200 conference participants attend the video-conferencing sessions. The conference lasts for one week, with 6 hours of online program per day. The video-conferencing software requires the following data rates for streaming the sessions (video including audio and screen sharing):

  • high-quality video: 1.0 Mbps
  • 720p HD video: 1.5 Mbps
  • 1080p HD video: 3 Mbps

However, group video calls require even higher bandwidth. To make such experiences more immersive, still higher bit rates may be necessary, for instance when using VR systems for attendance.

A simple QoE model maps the video bit rate of the current video session to a mean opinion score (MOS). [Lop18] provides a logarithmic regression model for the MOS depending on the video bit rate x in Mbps: MOS(x) = m_1 log x + m_2.

Then, we can connect the QoE model with the energy consumption and CO₂ emission models from above in the following way. We assume a user attends the conference for time t. With a video bit rate x, the resulting data traffic is v = x*t. These input parameters are then fed into the 1 Byte Model for a particular device (laptop, smartphone), type of network (wired, WiFi, mobile), and country (EU, US, China).
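As an illustration, the following sketch computes both the MOS and the traffic volume from the bit rate. The coefficients m1 and m2 are our own assumptions, chosen so that the model roughly reproduces the values mentioned in the text (a MOS of about 4.75 at 11 Mbps and about 4 at 4.5 Mbps); they are not the fitted values from [Lop18]:

```python
# Connecting the QoE model MOS(x) = m1 * log(x) + m2 with the traffic volume
# v = x * t that feeds the energy model. M1 and M2 are assumed coefficients,
# not the [Lop18] fit.
import math

M1, M2 = 0.839, 2.738

def mos(bitrate_mbps: float) -> float:
    """Logarithmic bitrate-to-MOS mapping, clipped to the 1..4.75 scale."""
    return max(1.0, min(4.75, M1 * math.log(bitrate_mbps) + M2))

def traffic_gb(bitrate_mbps: float, hours: float) -> float:
    """Data volume v = x * t in gigabytes (1 GB = 8000 Mbit)."""
    return bitrate_mbps * hours * 3600 / 8000.0
```

With these assumed coefficients, mos(4.5) is close to 4.0, and a participant attending the full 30-hour conference at 4.5 Mbps generates about 60.75 GB of traffic.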

Figure 1 shows the trade-off between the MOS and energy consumption (left y-axis). The energy consumption is mapped to CO₂ emissions by assuming the corresponding parameter for the EU and that all conference participants are connected with a laptop. It can be seen that there is a strong increase in energy consumption and CO₂ emissions in order to reach the best possible QoE. A MOS of 4.75 is reached with a video bit rate of roughly 11 Mbps, but with 4.5 Mbps, a MOS of 4 is already reached according to that logarithmic model. This logarithmic behaviour is a typical observation in QoE and is connected to the Weber-Fechner law, see [Rei10]. As a consequence, we may significantly save energy and CO₂ by not providing the maximum QoE, but “only” good quality (i.e., a MOS of 4). The meaning of the MOS ratings is 5=Excellent, 4=Good, 3=Fair, 2=Poor, 1=Bad quality.

Figure 1: Trade-off between MOS and energy consumption or CO₂ emission.

Figure 2, therefore, visualizes the gain when delivering the video in lower quality at lower video bit rates; the gains are shown relative to the energy and emissions required for a MOS of 5. To get a better feeling for the meaning of those CO₂ numbers, we express the CO₂ savings in terms of thousands of kilometers driven by car. Since the CO₂ emissions depend on the share of renewable energies, we consider different countries, with the parameters as provided in [TSP20]. We see that providing each conference participant a MOS of 4 instead of 5 results in savings corresponding to driving approximately 40,000 kilometers by car assuming the renewable energy share in the EU – this is the distance around the Earth! Assuming the energy share in China, this would save more than 90,000 kilometers. Of course, you could also cover those 90,000 kilometers by walking – which, however, takes about 2 years of non-stop walking at a speed of 5 km/h. Note that this large amount of CO₂ emissions is calculated assuming a data rate of 15 Mbps over 5 days (6 hours per day), resulting in about 40.5 TB of data that needs to be transferred to the 200 conference participants.
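The 40.5 TB figure is easy to verify with a back-of-the-envelope calculation:

```python
# Back-of-the-envelope check of the data volume quoted above:
# 200 participants streaming 15 Mbps for 6 hours a day over 5 days.
participants = 200
bitrate_bps = 15e6          # 15 Mbps
seconds = 5 * 6 * 3600      # 5 days x 6 hours per day

total_bits = participants * bitrate_bps * seconds
total_tb = total_bits / 8 / 1e12  # bits -> bytes -> terabytes
print(total_tb)  # 40.5
```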

Figure 2: Relating the CO₂ emissions in different countries for achieving a given MOS to the distance travelled by car (in thousands of kilometers).

Discussions

Raising awareness of the CO₂ emissions caused by Internet service consumption is crucial. Abstract CO₂ emission numbers may be difficult to grasp, but relating them to more familiar quantities helps people understand the impact individuals have. Of course, the numbers provided here only give an impression, since the models are very simple and do not take various facets into account. However, they nicely demonstrate the potential trade-off between the QoE of end-users and sustainability in terms of energy consumption and CO₂ emissions. In fact, [Gna21] conducted qualitative interviews and found a lack of awareness of the environmental impact of digital applications and services, even among digital natives. An underlying issue is that end-users often do not understand how Internet service delivery works, which infrastructure components play a role along the end-to-end service delivery path, etc. Hence, the environmental impact is unclear to many users. Our aim is thus to contribute to overcoming this issue by raising awareness on the matter, starting with simplified models and visualizations.

[Gna21] also found that users indicate a certain willingness to make compromises between their digital habits and their environmental footprint. Given global climate change and increased environmental awareness among the general population, such willingness to compromise can be expected to increase further in the near future. Hence, it may be interesting for service providers to empower users to decide on their environmental footprint at the cost of lower (yet still satisfactory) quality. This would also reduce costs for operators and seems to be a win-win situation if properly implemented in Internet services and user interfaces.

Nevertheless, tremendous efforts are also currently being undertaken by Internet companies to become CO₂ neutral. For example, Netflix claims in [Netflix21] that they plan to achieve net-zero greenhouse gas emissions by the close of 2022. Similarly, economic, societal, and environmental sustainability is seen as a key driver for 6G research and development [Mat21]. For other players, however, the time horizon is longer: a German provider, for instance, claims it will reach climate neutrality for in-house emissions by 2025 at the latest and net-zero from production to the customer by 2040 at the latest [DT21]. Hence, given the urgency of the matter, end-users and all stakeholders along the service delivery chain can significantly contribute to speeding up the process of ultimately achieving net-zero greenhouse gas emissions.

References

  • [TSP19] The Shift Project, “Lean ict: Towards digital sobriety,” directed by Hugues Ferreboeuf, Tech. Rep., 2019, last accessed: March 2022. Available online (last accessed: March 2022)
  • [Yan19] M. Yan, C. A. Chan, A. F. Gygax, J. Yan, L. Campbell, A. Nirmalathas, and C. Leckie, “Modeling the total energy consumption of mobile network services and applications,” Energies, vol. 12, no. 1, p. 184, 2019.
  • [TSP20] Maxime Efoui Hess and Jean-Noël Geist, “Did The Shift Project really overestimate the carbon footprint of online video? Our analysis of the IEA and Carbonbrief articles”, The Shift Project website, June 2020, available online (last accessed: March 2022) PDF
  • [Kam20a] George Kamiya, “Factcheck: What is the carbon footprint of streaming video on Netflix?”, CarbonBrief website, February 2020. Available online (last accessed: March 2022)
  • [Kam20b] George Kamiya, “The carbon footprint of streaming video: fact-checking the headlines”, IEA website, December 2020. Available online (last accessed: March 2022)
  • [Feh11] Fehske, A., Fettweis, G., Malmodin, J., & Biczok, G. (2011). The global footprint of mobile communications: The ecological and economic perspective. IEEE communications magazine, 49(8), 55-62.
  • [Lop18] J. P. López, D. Martín, D. Jiménez, and J. M. Menéndez, “Prediction and modeling for no-reference video quality assessment based on machine learning,” in 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), IEEE, 2018, pp. 56–63.
  • [Gna21] Gnanasekaran, V., Fridtun, H. T., Hatlen, H., Langøy, M. M., Syrstad, A., Subramanian, S., & De Moor, K. (2021, November). Digital carbon footprint awareness among digital natives: an exploratory study. In Norsk IKT-konferanse for forskning og utdanning (No. 1, pp. 99-112).
  • [Rei10] Reichl, P., Egger, S., Schatz, R., & D’Alconzo, A. (2010, May). The logarithmic nature of QoE and the role of the Weber-Fechner law in QoE assessment. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.
  • [Netflix21] Netflix: “Environmental Social Governance 2020”,  Sustainability Accounting Standards Board (SASB) Report, (2021, March). Available online (last accessed: March 2022)
  • [Mat21] Matinmikko-Blue, M., Yrjölä, S., Ahokangas, P., Ojutkangas, K., & Rossi, E. (2021). 6G and the UN SDGs: Where is the Connection?. Wireless Personal Communications, 121(2), 1339-1360.
  • [DT21] Hannah Schauff. Deutsche Telekom tightens its climate targets (2021, January). Available online (last accessed: March 2022)

Towards an updated understanding of immersive multimedia experiences

Bringing theories and measurement techniques up to date

Development of technology for immersive multimedia experiences

Immersive multimedia experiences, as the name suggests, are experiences focusing on media that immerses users, with different forms of interaction, in an experience of an environment. Through different technologies and approaches, immersive media emulates a physical world by means of a digital or simulated world, with the goal of creating a sense of immersion. Users are involved in a technologically driven environment where they may actively join and participate in the experiences offered by the generated world [White Paper, 2020]. As hardware and technologies develop further, these immersive experiences keep improving, with an ever more advanced feeling of immersion. Immersive multimedia experiences thus go beyond merely viewing a screen and enable a much larger potential. This column aims to present and discuss the need for an up-to-date understanding of immersive media quality. First, the development of the constructs of immersion and presence over time is outlined. Second, influencing factors of immersive media quality are introduced, and related standardisation activities are discussed. Finally, the column concludes by summarising why an updated understanding of immersive media quality is urgent.

Development of theories covering immersion and presence

One of the first definitions of presence was established by Slater and Usoh in 1993, who defined presence as a “sense of presence” in a virtual environment [Slater, 1993]. This is in line with other early definitions of presence and immersion. For example, Biocca defined immersion as a system property [Biocca, 1995]. Those definitions focused mainly on the ability of a system to technically provide accurate stimuli to users. As technology was only slowly becoming capable of generating stimuli that mimic the real world, this was naturally the main focus of definitions. Quite early on, questionnaires to capture experienced immersion were introduced, such as the Igroup Presence Questionnaire (IPQ) [Schubert, 2001]. The early measurement methods likewise focused mainly on how well the representation of the real world was rendered and perceived. With maturing technology, the focus shifted towards emotions and other cognitive phenomena beyond basic stimulus generation. For example, Baños and colleagues showed that experienced emotion and immersion are related to each other and also influence the sense of presence [Baños, 2004]. Newer definitions concentrate more on these cognitive aspects; e.g., Nilsson defines three factors that can lead to immersion: (i) technology, (ii) narratives, and (iii) challenges, where only the technology factor is a non-cognitive one [Nilsson, 2016]. In 2018, Slater defined the place illusion as the illusion of being in a place while knowing one is not really there. This focuses on a cognitive construct, the suspension of disbelief, but still attributes the creation of the illusion mainly to system factors rather than cognitive ones [Slater, 2018]. In recent years, more and more activities have been started to define how to measure immersive experiences as an overall construct.

Constructs of interest in relation to immersion and presence

This section discusses constructs and activities that are related to immersion and presence. In the beginning, subtypes of extended reality (XR) and the relation to user experience (UX) as well as quality of experience (QoE) are outlined. Afterwards, recent standardization activities related to immersive multimedia experiences are introduced and discussed.
Immersive multimedia experiences can be divided along many different dimensions, but recently the most common distinction concerns interactivity: content can be made for multi-directional viewing, as in 360-degree videos, or presented through interactive extended reality. XR technologies can be divided into mixed reality (MR), augmented reality (AR), augmented virtuality (AV), virtual reality (VR), and everything in between [Milgram, 1995]. Across all those areas, immersive multimedia experiences have found a place on the market and are providing new solutions to challenges in research as well as in industry, with a growing potential of being adopted in further areas [Chuah, 2018].

While discussing immersive multimedia experiences, it is important to address user experience and the quality of immersive multimedia experiences, which can be defined following the definition of quality of experience itself [White Paper, 2012]: a measure of the delight or annoyance of a customer’s experiences with a service, where in this case the service is an immersive multimedia experience. The QoE definition also covers the terms experience and application, which can likewise be applied to immersive multimedia experiences: an experience is an individual’s stream of perception and interpretation of one or multiple events, and an application is software and/or hardware that enables usage and interaction by a user for a given purpose [White Paper, 2012].

As already mentioned, immersive media experiences have an impact in many different fields, but one where the impact of immersion and presence is particularly investigated is gaming applications, along with the QoE models and optimizations that go with them. Of specific interest is the framework and standardization of subjective evaluation methods for gaming quality [ITU-T Rec. P.809, 2018]. This standard provides instructions on how to assess the QoE of gaming services using two possible test paradigms, i.e., passive viewing tests and interactive tests. However, even though detailed information about the environments, test set-ups, questionnaires, and game selection materials is available, it is still focused on the gaming field and the concepts of flow and immersion in games themselves.

Together with gaming, another step in defining and standardizing the infrastructure of audiovisual services in telepresence, immersive environments, and virtual and extended reality has been taken by defining different service scenarios of immersive live experience [ITU-T Rec. H.430.3, 2018], in which live sports, entertainment, and telepresence scenarios are described. This standard describes several immersive live experience scenarios together with architectural frameworks for delivering such services, but does not cover all possible use cases. When talking about immersive multimedia experiences, spatial audio, sometimes referred to as “immersive audio”, must also be mentioned, as it is one of the key features especially of AR and VR experiences [Agrawal, 2019]: in AR it can provide immersive experiences on its own, while in VR it enhances the visual information.
In order to correctly assess QoE or UX, one must be aware of all characteristics, such as user, system, content, and context, because their actual state may influence the immersive multimedia experience of the user. These characteristics are therefore defined as influencing factors (IFs), divided into Human IFs, System IFs, and Context IFs, and they have been standardized for virtual reality services [ITU-T Rec. G.1035, 2021]. A particularly relevant Human IF is simulator sickness, as it specifically occurs as a result of exposure to immersive XR environments. Simulator sickness, also known as cybersickness or VR/AR sickness, is a visually induced motion sickness triggered by visual stimuli and caused by the sensory conflict arising between the vestibular and visual systems. To achieve the full potential of immersive multimedia experiences, this unwanted sensation must therefore be reduced. While immersive technology changes frequently and hardware improvements lead to better experiences, requirement specifications, design, and development need to be constantly updated to keep up with best practices.

Conclusion – Towards an updated understanding

Considering the development of theories, definitions, and influencing factors around the constructs of immersion and presence, one can see two different streams. First, most early theories focus quite strongly on the technical ability of systems. Second, cognitive aspects and non-technical influencing factors gain importance in newer work. Of course, in the 1990s technology was not yet ready to provide a good simulation of the real world; therefore, most activities, including measurement techniques, were focused on improving the systems themselves. In the last few years, technology has developed rapidly, and basic simulation of a virtual environment is now possible even on mobile devices such as the Oculus Quest 2. Although concepts such as immersion and presence from the past still apply, the definitions dealing with those concepts need to capture today’s technology as well. Meanwhile, systems have proven to be good real-world simulators that provide users with a feeling of presence and immersion. While standardization activity is already quite strong and industry-driven, research in many disciplines, such as telecommunications, is still mainly using old questionnaires. These questionnaires mostly focus on technological/real-world simulation constructs and are thus no longer able to differentiate products and services to an optimal extent. There are some newer attempts to create measurement tools for, e.g., social aspects of immersive systems [Li, 2019; Toet, 2021]. Measurement scales aiming to capture differences in the ability of systems to create realistic simulations can no longer reliably differentiate systems, because most systems now provide realistic real-world simulations.
To enhance research and industrial development in the field of immersive media, we need definitions of constructs and measurement methods that are appropriate for current technology, even if the newer measurements and definitions are not yet cited or used as often. That will lead to improved development and, in the future, better immersive media experiences.

One step towards understanding immersive multimedia experiences is reflected by QoMEX 2022. The 14th International Conference on Quality of Multimedia Experience will be held from September 5th to 7th, 2022 in Lippstadt, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, Quality of Experience (QoE), and User Experience (UX). It will contribute to excellence in developing multimedia technology towards user well-being and foster the exchange between multidisciplinary communities. One core topic is immersive experiences and technologies as well as new assessment and evaluation methods, and both topics contribute to bringing theories and measurement techniques up to date. For more details, please visit https://qomex2022.itec.aau.at.

References

[Agrawal, 2019] Agrawal, S., Simon, A., Bech, S., Bærentsen, K., Forchhammer, S. (2019). “Defining Immersion: Literature Review and Implications for Research on Immersive Audiovisual Experiences.” In Audio Engineering Society Convention 147. Audio Engineering Society.
[Biocca, 1995] Biocca, F., & Delaney, B. (1995). Immersive virtual reality technology. Communication in the age of virtual reality, 15(32), 10-5555.
[Baños, 2004] Baños, R. M., Botella, C., Alcañiz, M., Liaño, V., Guerrero, B., & Rey, B. (2004). Immersion and emotion: their impact on the sense of presence. Cyberpsychology & behavior, 7(6), 734-741.
[Chuah, 2018] Chuah, S. H. W. (2018). Why and who will adopt extended reality technology? Literature review, synthesis, and future research agenda. Literature Review, Synthesis, and Future Research Agenda (December 13, 2018).
[ITU-T Rec. G.1035, 2021] ITU-T Recommendation G:1035 (2021). Influencing factors on quality of experience for virtual reality services, Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. H.430.3, 2018] ITU-T Recommendation H:430.3 (2018). Service scenario of immersive live experience (ILE), Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. P.809, 2018] ITU-T Recommendation P:809 (2018). Subjective evaluation methods for gaming quality, Int. Telecomm. Union, CH-Geneva.
[Li, 2019] Li, J., Kong, Y., Röggla, T., De Simone, F., Ananthanarayan, S., De Ridder, H., … & Cesar, P. (2019, May). Measuring and understanding photo sharing experiences in social Virtual Reality. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-14).
[Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
[Nilsson, 2016] Nilsson, N. C., Nordahl, R., & Serafin, S. (2016). Immersion revisited: a review of existing definitions of immersion and their relation to different theories of presence. Human Technology, 12(2).
[Schubert, 2001] Schubert, T., Friedmann, F., & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 10(3), 266-281.
[Slater, 1993] Slater, M., & Usoh, M. (1993). Representations systems, perceptual position, and presence in immersive virtual environments. Presence: Teleoperators & Virtual Environments, 2(3), 221-233.
[Slater, 2018] Slater, M. (2018). Immersion and the illusion of presence in virtual reality. British Journal of Psychology, 109(3), 431-433.
[Toet, 2021] Toet, A., Mioch, T., Gunkel, S. N., Niamut, O., & van Erp, J. B. (2021). Holistic Framework for Quality Assessment of Mediated Social Communication.
[White Paper, 2012] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.
[White Paper, 2020] Perkis, A., Timmerer, C., Baraković, S., Husić, J. B., Bech, S., Bosse, S., … & Zadtootaghaj, S. (2020). QUALINET white paper on definitions of immersive media experience (IMEx). arXiv preprint arXiv:2007.07032.

MPEG Visual Quality Assessment Advisory Group: Overview and Perspectives

Introduction

Perceived visual quality is of utmost importance in the context of visual media compression, such as 2D, 3D, immersive video, and point clouds. The trade-off between compression efficiency and computational/implementation complexity has a crucial impact on the success of a compression scheme. This specifically holds for the development of visual media compression standards, which typically aim at maximum compression efficiency using state-of-the-art coding technology. In MPEG, the subjective and objective assessment of visual quality has always been an integral part of the standards development process. Because formal subjective evaluations require significant effort, the standardization process typically relies on such formal tests in the starting phase and for verification, while objective metrics are used during the development phase. In the new MPEG structure, established in 2020, a dedicated advisory group has been set up for the purpose of providing, maintaining, and developing visual quality assessment methods suitable for use in the standardization process.

This column lays out the scope and tasks of this advisory group and reports on its first achievements and developments. After a brief overview of the organizational structure, current projects and initial results are presented.

Organizational Structure

MPEG: A Group of Groups in ISO/IEC JTC 1/SC 29

The Moving Picture Experts Group (MPEG) is a standardization group that develops standards for the coded representation of digital audio, video, 3D graphics, and genomic data. Since its establishment in 1988, the group has produced standards that enable the industry to offer interoperable devices for an enhanced digital media experience [1]. In its new structure as defined in 2020, MPEG is established as a set of Working Groups (WGs) and Advisory Groups (AGs) in Sub-Committee (SC) 29 “Coding of audio, picture, multimedia and hypermedia information” of the Joint Technical Committee (JTC) 1 of ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). The lists of WGs and AGs in SC 29 are shown in Figure 1. Besides MPEG, SC 29 also includes JPEG (the Joint Photographic Experts Group, WG 1) as well as an Advisory Group for Chair Support Team and Management (AG 1) and an Advisory Group for JPEG and MPEG Collaboration (AG 4), thereby covering the wide field of media compression and transmission. Within this structure, the focus of AG 5, MPEG Visual Quality Assessment (MPEG VQA), is on interaction and collaboration with the working groups directly working on MPEG visual media compression, including WG 4 (Video Coding), WG 5 (JVET), and WG 7 (3D Graphics).

Figure 1. MPEG Advisory Groups (AGs) and Working Groups (WGs) in ISO/IEC JTC 1/SC 29 [2].

Setting the Field for MPEG VQA: The Terms of Reference

SC 29 has defined Terms of Reference (ToR) for all its WGs and AGs. The scope of AG5 MPEG Visual Quality Assessment is to support the quality assessment testing needs of the relevant MPEG Working Groups dealing with visual quality, in close coordination with them, through the following activities [2]:

  • to assess the visual quality of new technologies to be considered to begin a new standardization project;
  • to contribute to the definition of Calls for Proposals (CfPs) for new standardization work items;
  • to select and design subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies, e.g., in the context of a Call for Evidence (CfE) and CfP;
  • to contribute to the selection of test material and coding conditions for a CfP;
  • to define the procedures useful to assess the visual quality of the submissions to a CfP;
  • to design and conduct visual quality tests, process and analyze the raw data, and make a conclusive report of the evaluation results available;
  • to support the assessment of the final status of a standard, verifying its performance compared to the existing standard(s);
  • to maintain databases of test material;
  • to recommend guidelines for selection of testing laboratories (verifying their current capabilities);
  • to liaise with ITU and other relevant organizations on the creation of new Quality Assessment standards or the improvement of the existing ones.

Way of Working

Given that MPEG Visual Quality Assessment is an advisory group, and given the above-mentioned ToR, the goal of AG5 is not to produce new standards on its own. Instead, AG5 strives to communicate and collaborate with relevant SDOs in the field, applying existing standards and recommendations and potentially contributing to their further development by reporting results and working practices to these groups.

In terms of meetings, AG5 adopts the common MPEG meeting cycle of typically four MPEG AG/WG meetings per year, which, due to the ongoing pandemic, have so far all been held online. The meetings are held to review the progress of work, agree on recommendations, and decide on further plans. During the meetings, AG5 closely collaborates with the MPEG WGs and conducts expert viewing sessions for various MPEG standardization activities. The focus of such activities includes the preparation of new standardization projects, the performance verification of completed projects, as well as support of ongoing projects, where frequent subjective evaluation results are required in the decision process. Between meetings, AG5 work is carried out in Ad-hoc Groups (AhGs), which are established from meeting to meeting with well-defined tasks.

Focus Groups

Due to the broad field of ongoing standardization activities, AG5 has established so-called focus groups which cover the relevant fields of development. The focus group structure and the appointed chairs are shown in Figure 2.

Figure 2. MPEG VQA focus groups.

The focus groups are mandated to coordinate with other relevant MPEG groups and other standardization bodies on activities of mutual interest, and to facilitate the formal and informal assessment of the visual media type under their consideration. The focus groups are described as follows:

  • Standard Dynamic Range Video (SDR): This is the ‘classical’ video quality assessment domain. The group strives to support, design, and conduct testing activities on SDR content at any resolution and coding condition, and to maintain existing testing methods and best practice procedures.
  • High Dynamic Range Video (HDR): The focus group on HDR strives to facilitate the assessment of HDR video quality using different devices with combinations of spatial resolution, colour gamut, and dynamic range, and further to maintain and refine methodologies for measuring HDR video quality. A specific focus of the starting phase was on the preparation of the verification tests for Versatile Video Coding (VVC, ISO/IEC 23090-3 / ITU-T H.266).
  • 360° Video: The omnidirectional characteristics of 360° video content have to be taken into account for visual quality assessment. The group’s focus is on continuing the development of 360° video quality assessment methodologies, including those using head-mounted devices. As with the focus group on HDR, the verification tests for VVC had priority in the starting phase.
  • Immersive Video (MPEG Immersive Video, MIV): Since MIV allows for user movement with six degrees of freedom, the assessment of this type of content poses even greater challenges, and the variability of the user’s perception of the media has to be factored in. Given the absence of an original reference or ground truth for the synthetically rendered scene, objective evaluation with conventional objective metrics is a challenge. The focus group strives to develop appropriate subjective expert viewing methods to support the development process of the standard, and also evaluates and improves objective metrics in the context of MIV.

Ad hoc Groups

AG5 currently has three AhGs defined which are briefly presented with their mandates below:

  • Quality of immersive visual media (chaired by Christian Timmerer of AAU/Bitmovin, Joel Jung of Tencent, and Aljosa Smolic of Trinity College Dublin): Study Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (AG 05/N00013) with respect to new updates presented at this meeting; Solicit inputs for subjective evaluation methods and objective metrics for immersive video (e.g., 360, MIV, V-PCC, G-PCC); Organize public online workshop(s) on Quality of Immersive Media: Assessment and Metrics.
  • Learning-based quality metrics for 2D video (chaired by Yan Ye of Alibaba and Mathias Wien of RWTH Aachen University): Compile and maintain a list of video databases suitable and available to be used in AG5’s studies; Compile a list of learning-based quality metrics for 2D video to be studied; Evaluate the correlation between the learning-based quality metrics and subjective quality scores in the databases;
  • Guidelines for subjective visual quality evaluation (chaired by Mathias Wien of RWTH Aachen University, Lu Yu of Zhejiang University and Convenor of MPEG Video Coding (ISO/IEC JTC1 SC29/WG4), and Joel Jung of Tencent): Prepare the third draft of the Guidelines for Verification Testing of Visual Media Specifications; Prepare the second draft of the Guidelines for remote experts viewing test methods for use in the context of Ad-hoc Groups, and Core or Exploration Experiments.

AG 5 First Achievements

Reports and Guidelines

The results of the work of the AhGs are aggregated in AG5 output documents, which are public (or will become public soon) in order to also allow for feedback from outside the MPEG community.

The AhG on “Quality of Immersive Visual Media” maintains a report, “Overview of Quality Metrics and Methodologies for Immersive Visual Media” [3], which documents the state of the art in the field and shall serve as a reference for MPEG working groups in their work on compression standards in this domain. The AhG further organizes a public workshop on “Quality of Immersive Media: Assessment and Metrics”, which takes place online at the beginning of October 2021 [4]. The scope of this workshop is to raise awareness of MPEG efforts in the context of quality of immersive visual media and to invite experts from outside MPEG to present new techniques relevant to its scope.

The AhG on “Guidelines for Subjective Visual Quality Evaluation” is currently developing two guideline documents supporting the MPEG standardization work. The “Guidelines for Verification Testing of Visual Media Specifications” [5] define the process of assessing the performance of a completed standard after its publication. Verification testing has been established MPEG working practice for its media compression standards since the 1990s. The document is intended to formalize the process, describe the steps and conditions for the verification tests, and set the requirements to meet MPEG procedural quality expectations.

The AhG has further released a first draft of “Guidelines for Remote Experts Viewing Sessions”, with the intention of establishing a formalized procedure for the ad-hoc generation of subjective test results as input to the standards development process [6]. This activity has been driven by the ongoing pandemic, which has forced MPEG to continue its work in virtual online meetings since early 2020. The procedure for remote experts viewing is intended to be applied during the (online) meeting phase or in the AhG phase, and to provide measurable and reproducible subjective results as input to the decision-making process of the project under consideration.

Verification Testing

With Essential Video Coding (EVC) [7] and Low Complexity Enhancement Video Coding (LCEVC) [8] of ISO/IEC, and the joint coding standard Versatile Video Coding (VVC) of ISO/IEC and ITU-T [9][10], a significant number of new video coding standards have recently been released. Since its first meeting in October 2020, AG5 has been engaged in the preparation and conduct of verification tests for these video coding specifications. Further verification tests for MPEG Immersive Video (MIV) and Video-based Point Cloud Compression (V-PCC) [11] are under preparation, and more are to come. Results of the verification test activities completed in the first year of AG5 are summarized in the following subsections. All reported results were obtained by formal subjective assessments according to established assessment protocols [12][13], performed by qualified test laboratories. The bitstreams were generated with reference software encoders of the specification under consideration, using established encoder configurations with comparable settings for both the reference and the evaluated coding schemes. It has to be noted that all testing had to be done under the constrained conditions of the ongoing pandemic, which posed an additional challenge for the test laboratories in charge.
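Bitrate savings "at comparable visual quality", as reported in the following subsections, were established by formal subjective tests; as background, such savings are commonly summarized in the coding community with Bjøntegaard-delta-style calculations over rate-quality curves. The following sketch is only an illustration of that general technique, not the procedure used in the verification tests; the function name and data are ours. It fits log-rate as a cubic polynomial of the quality score and integrates the difference over the overlapping quality range:

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Average bitrate difference (in %) of a test codec vs. a reference at
    equal quality, Bjontegaard-delta style: fit log-rate as a cubic polynomial
    of the quality score, integrate over the overlapping quality interval."""
    p_ref = np.polyfit(quality_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(quality_test, np.log(rates_test), 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    # definite integrals of the fitted log-rate curves over [lo, hi]
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

A negative value means the test codec needs less rate for the same quality; for instance, identical quality points at half the bitrate yield -50%.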

MPEG-5 Part 1: Essential Video Coding (EVC)

The EVC standard was developed with the goal of providing a royalty-free Baseline profile and a Main profile with higher compression efficiency compared to High Efficiency Video Coding (HEVC) [15][16][17]. Verification tests were conducted for standard dynamic range (SDR) and high dynamic range (HDR, BT.2100 PQ) video content at both HD (1920×1080 pixels) and UHD (3840×2160 pixels) resolution. The tests revealed around 40% bitrate savings at comparable visual quality for the Main profile when compared to HEVC, and around 36% bitrate savings for the Baseline profile when compared to Advanced Video Coding (AVC) [18][19], both for SDR content [20]. For HDR PQ content, the Main profile provided around 35% bitrate savings at both resolutions [21].

MPEG-5 Part 2: Low-Complexity Enhancement Video Coding (LCEVC)

The LCEVC standard follows a layered approach in which an LCEVC enhancement layer is added to a lower-resolution base layer of an existing codec in order to obtain the full-resolution video [22]. Since the base layer codec operates at a lower resolution and the separate enhancement layer decoding process is relatively lightweight, the computational complexity of the decoding process is typically lower than decoding the full resolution with the base layer codec alone. The enhancement layer would typically be handled on top of the established base layer decoder implementation by an additional decoding entity, e.g., in a browser.

For verification testing, LCEVC was evaluated using AVC, HEVC, EVC, and VVC base layer bitstreams at half resolution, comparing the performance to the respective schemes with full-resolution coding as well as to half-resolution coding with a simple upsampling tool. For UHD resolution, the bitrate savings for LCEVC at comparable visual quality were 46% when compared to full-resolution AVC and 31% when compared to full-resolution HEVC. The comparison to the more recent and more efficient EVC and VVC coding schemes led to partially overlapping confidence intervals of the subjective scores of the test subjects; the curves still revealed some benefit for the application of LCEVC. The gains compared to half-resolution coding with simple upsampling were approximately 28%, 34%, 38%, and 33% bitrate savings at comparable visual quality, demonstrating the benefit of LCEVC enhancement layer coding over straightforward upsampling [23].
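Conceptually, the layered reconstruction described above can be sketched in a few lines. This is a toy illustration of the principle only: nearest-neighbour upsampling stands in for the normative LCEVC upsampler, the enhancement residual is assumed to be already decoded, and the function names are ours:

```python
import numpy as np

def upsample_2x(base):
    # nearest-neighbour 2x upsampling; a stand-in for the normative upsampler
    return np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)

def lcevc_style_reconstruct(base_half_res, enhancement_residual):
    """Predict the full-resolution picture from the upsampled base layer,
    then add the (already decoded) enhancement residual, clipping to 8 bits."""
    prediction = upsample_2x(base_half_res.astype(np.int32))
    return np.clip(prediction + enhancement_residual, 0, 255)
```

The base layer carries the coarse picture cheaply, while the residual restores the detail lost by downsampling, which is why the scheme beats plain upsampling of the base layer.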

MPEG-I Part 3 / ITU-T H.266: Versatile Video Coding (VVC)

VVC is the most recent video coding standard in the historical line of joint specifications of ISO/IEC and ITU-T, following AVC and HEVC. The development focus for VVC was on compression efficiency improvement at a moderate increase of decoder complexity, as well as on the versatility of the design [24][25]. Versatility features include tools designed to address HDR, WCG, resolution-adaptive multi-rate video streaming services, 360-degree immersive video, bitstream extraction and merging, temporal scalability, gradual decoding refresh, and multilayer coding to deliver layered video content, supporting application features such as multiview, alpha maps, depth maps, and spatial and quality scalability.

A series of verification tests has been conducted covering SDR UHD and HD, HDR PQ and HLG, as well as 360° video content [26][27][28]. An early open-source encoder (VVenC, [14]) was additionally assessed in some categories. For SDR coding, both the VVC reference software (VTM) and the open-source VVenC were evaluated against the HEVC reference software (HM). The results revealed bitrate savings of around 46% (SDR UHD, VTM and VVenC), 50% (SDR HD, VTM and VVenC), 49% and 52% (HDR UHD, PQ and HLG, respectively), and 50-56% (360° video with different projection formats) at similar visual quality compared to HEVC. Figure 3 provides pooled MOS (Mean Opinion Score) over bitrate for the mentioned categories. The MOS values range from 10 (imperceptible impairments) down to 0 (severely annoying impairments everywhere). Pooling was done by computing the geometric mean of the bitrates and the arithmetic mean of the MOS scores across the test sequences of each test category. The results reveal a consistent benefit of VVC over its predecessor HEVC in terms of visual quality over the required bitrate.
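The pooling described above (geometric mean of bitrates, arithmetic mean of MOS across the test sequences at each rate point) can be written compactly. A minimal sketch with illustrative data; the function name is ours:

```python
import math

def pool_rate_point(per_sequence_points):
    """Pool per-sequence (bitrate, MOS) pairs measured at one rate point:
    geometric mean of the bitrates, arithmetic mean of the MOS values."""
    rates = [r for r, _ in per_sequence_points]
    scores = [m for _, m in per_sequence_points]
    pooled_rate = math.exp(sum(math.log(r) for r in rates) / len(rates))
    pooled_mos = sum(scores) / len(scores)
    return pooled_rate, pooled_mos
```

The geometric mean keeps sequences with very different bitrate ranges from dominating the pooled rate axis, while MOS, being an interval-style scale, is averaged arithmetically.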

Figure 3. Pooled MOS over bitrate plots of the VVC verification tests for the SDR UHD, SDR HD, HDR HLG, and 360° video test categories. Curves cited from [26][27][28].

Summary

This column presented an overview of the organizational structure and the activities of the Advisory Group on MPEG Visual Quality Assessment, ISO/IEC JTC 1/SC 29/AG 5, which was formed about one year ago. The work items of AG5 include the application, documentation, evaluation, and improvement of objective quality metrics and subjective quality assessment procedures. In its first year of existence, the group has produced an overview of immersive quality metrics, draft guidelines for verification tests and for remote experts viewing sessions, as well as reports of formal subjective quality assessments for the verification tests of EVC, LCEVC, and VVC. The work of the group will continue towards studying and developing quality metrics suitable for the assessment tasks emerging from the development of the various MPEG visual media coding standards, and towards subjective quality evaluation in upcoming and future verification tests and new standardization projects.

References

[1] MPEG website, https://www.mpegstandards.org/.
[2] ISO/IEC JTC1 SC29, “Terms of Reference of SC 29/WGs and AGs,” Doc. SC29N19020, July 2020.
[3] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Draft Overview of Quality Metrics and Methodologies for Immersive Visual Media (v2)”, doc. AG5N13, 2nd meeting: January 2021.
[4] MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics, https://multimediacommunication.blogspot.com/2021/08/mpeg-ag-5-workshop-on-quality-of.html, October 5th, 2021.
[5] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for Verification Testing of Visual Media Specifications (draft 2)”, doc. AG5N30, 4th meeting: July 2021.
[6] ISO/IEC JTC1 SC29/AG5 MPEG VQA, “Guidelines for remote experts viewing sessions (draft 1)”, doc. AG5N31, 4th meeting: July 2021.
[7] ISO/IEC 23094-1:2020, “Information technology — General video coding — Part 1: Essential video coding”, October 2020.
[8] ISO/IEC 23094-2, “Information technology – General video coding — Part 2: Low complexity enhancement video coding”, September 2021.
[9] ISO/IEC 23090-3:2021, “Information technology — Coded representation of immersive media — Part 3: Versatile video coding”, February 2021.
[10] ITU-T H.266, “Versatile Video Coding“, August 2020. https://www.itu.int/rec/recommendation.asp?lang=en&parent=T-REC-H.266-202008-I.
[11] ISO/IEC 23090-5:2021, “Information technology — Coded representation of immersive media — Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC)”, June 2021.
[12] ITU-T P.910 (2008), Subjective video quality assessment methods for multimedia applications.
[13] ITU-R BT.500-14 (2019), Methodologies for the subjective assessment of the quality of television images.
[14] Fraunhofer HHI VVenC software repository. [Online]. Available: https://github.com/fraunhoferhhi/vvenc.
[15] K. Choi, J. Chen, D. Rusanovskyy, K.-P. Choi and E. S. Jang, “An overview of the MPEG-5 essential video coding standard [standards in a nutshell]”, IEEE Signal Process. Mag., vol. 37, no. 3, pp. 160-167, May 2020.
[16] ISO/IEC 23008-2:2020, “Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 2: High efficiency video coding”, August 2020.
[17] ITU-T H.265, “High Efficiency Video Coding”, August 2021.
[18] ISO/IEC 14496-10:2020, “Information technology — Coding of audio-visual objects — Part 10: Advanced video coding”, December 2020.
[19] ITU-T H.264, “Advanced Video Coding”, August 2021.
[20] ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for SDR Content”, doc WG4N47, 2nd meeting: January 2021.
[21] ISO/IEC JTC1 SC29/WG4, “Report on Essential Video Coding compression performance verification testing for HDR/WCG content”, doc WG4N30, 1st meeting: October 2020.
[22] G. Meardi et al., “MPEG-5—Part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation”, Proc. SPIE, vol. 11510, pp. 238-257, Aug. 2020.
[23] ISO/IEC JTC1 SC29/WG4, “Verification Test Report on the Compression Performance of Low Complexity Enhancement Video Coding”, doc. WG4N76, 3rd meeting: April 2020.
[24] Benjamin Bross, Jianle Chen, Jens-Rainer Ohm, Gary J. Sullivan, and Ye-Kui Wang, “Developments in International Video Coding Standardization After AVC, With an Overview of Versatile Video Coding (VVC)”, Proceedings of the IEEE, Vol. 109, Issue 9, pp. 1463–1493, doi 10.1109/JPROC.2020.3043399, Sept. 2021 (open access publication), available at https://ieeexplore.ieee.org/document/9328514.
[25] Benjamin Bross, Ye-Kui Wang, Yan Ye, Shan Liu, Gary J. Sullivan, and Jens-Rainer Ohm, “Overview of the Versatile Video Coding (VVC) Standard and its Applications”, IEEE Trans. Circuits & Systs. for Video Technol. (open access publication), available online at https://ieeexplore.ieee.org/document/9395142.
[26] Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for Ultra High Definition (UHD) Standard Dynamic Range (SDR) Video Content”, doc. JVET-T2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 20th meeting: October 2020.
[27] Mathias Wien and Vittorio Baroncini, “VVC Verification Test Report for High Definition (HD) and 360° Standard Dynamic Range (SDR) Video Content”, doc. JVET-V2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 22nd meeting: April 2021.
[28] Mathias Wien and Vittorio Baroncini, “VVC verification test report for high dynamic range video content”, doc. JVET-W2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 23rd meeting: July 2021.

ITU-T Standardization Activities Targeting Gaming Quality of Experience

Motivation for Research in the Gaming Domain

The gaming industry has been remarkably successful at intrinsically motivating users to interact with its services. According to the latest report of Newzoo, there will be an estimated total of 2.7 billion players across the globe by the end of 2020, and the global games market will generate revenues of $159.3 billion in 2020 [1]. This surpasses the movie industry (box offices and streaming services) by a factor of four and is almost three times the value of the music industry market [2].

The rapidly growing domain of online gaming emerged in the late 1990s and early 2000s, enabling social relatedness for a great number of players. In traditional online gaming, the game logic and the game user interface are typically executed and rendered locally on the player’s hardware. The client device is connected via the Internet to a game server to exchange information influencing the game state, which is then shared and synchronized with all other players connected to the server. In 2009, however, a new concept called cloud gaming emerged, comparable to the rise of Netflix for video consumption and Spotify for music consumption. In contrast to traditional online gaming, cloud gaming is characterized by the execution of the game logic, the rendering of the virtual scene, and video encoding on a cloud server, while the player’s client is solely responsible for video decoding and the capturing of client input [3].

For online gaming and cloud gaming services, in contrast to applications such as voice, video, and web browsing, little information existed on factors influencing the Quality of Experience (QoE) of online video games, on subjective methods for assessing gaming QoE, or on instrumental prediction models to plan and manage QoE during service set-up and operation. For this reason, Study Group (SG) 12 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) decided to work on these three interlinked research tasks [4]. This was especially necessary since the evaluation of gaming applications is fundamentally different from that of task-oriented human-machine interactions. Traditional aspects such as effectiveness and efficiency, as part of usability, cannot be directly applied to gaming applications: a game without any challenge, in which time simply passes, would result in boredom, and thus a bad player experience (PX). The absence of standardized assessment methods, as well as of knowledge about the quantitative and qualitative impact of influence factors, resulted in a situation where many researchers tended to use their own self-developed research methods. This makes collaborative work based on reliable, valid, and comparable research very difficult. Therefore, the aim of this report is to provide an overview of the achievements of ITU-T standardization activities targeting gaming QoE.

Theory of Gaming QoE

As a basis for the gaming research carried out, a taxonomy of gaming QoE aspects was proposed by Möller et al. in 2013 [5]. The taxonomy is divided into two layers, of which the top layer contains various influencing factors grouped into user (also human), system (also content), and context factors. The bottom layer consists of game-related aspects including hedonic concepts such as appeal, pragmatic concepts such as learnability and intuitiveness (part of playing quality, which can be considered a kind of game usability), and finally interaction quality. The latter is composed of output quality (e.g., audio and video quality), as well as input quality and interactive behaviour. Interaction quality can be understood as the playability of a game, i.e., the degree to which all functional and structural elements of a game (hardware and software) enable a positive PX. The second part of the bottom layer summarizes concepts related to the PX, such as immersion (see [6]), positive and negative affect, as well as the well-known concept of flow, which describes an equilibrium between requirements (i.e., challenges) and abilities (i.e., competence). Consequently, based on the theory depicted in the taxonomy, the question arises which of these aspects are relevant (i.e., dominant), how they can be assessed, and to what extent they are impacted by the influencing factors.

Fig. 1: Taxonomy of gaming QoE aspects. Upper panel: Influence factors and interaction performance aspects; lower panel: quality features (cf. [5]).

Introduction to Standardization Activities

Building upon this theory, SG 12 of the ITU-T decided during the 2013-2016 Study Period to start work on three new work items called P.GAME, G.QoE-gaming, and G.OMG. There are also other related activities at the ITU-T, summarized in Fig. 2, on evaluation methods (P.CrowdG) and on gaming QoE modelling (G.OMMOG and P.BBQCG).

Fig. 2: Overview of ITU-T SG12 recommendations and on-going work items related to gaming services.

The efforts on the three initial work items continued during the 2017-2020 Study Period, resulting in Recommendations G.1032, P.809, and G.1072, an overview of which is given in this section.

ITU-T Rec. G.1032 (G.QoE-gaming)

ITU-T Rec. G.1032 aims at identifying the factors which potentially influence gaming QoE. For this purpose, the Recommendation provides an overview table and roughly classifies the influence factors into (A) human, (B) system, and (C) context influence factors. This classification is based on [7] but is now detailed with respect to cloud and online gaming services. Furthermore, the Recommendation considers whether an influencing factor carries an influence mainly in a passive viewing-and-listening scenario, in an interactive online gaming scenario, or in an interactive cloud gaming scenario. This classification helps evaluators decide which type of impact may be evaluated with which type of test paradigm [4]. An overview of the influencing factors identified in ITU-T Rec. G.1032 is presented in Fig. 3. For subjective user studies, in most cases the human and context factors should be controlled and their influence reduced as much as possible. For example, even though multiplayer interaction might be a highly impactful aspect of today’s gaming domain, within the scope of the ITU-T cloud gaming modelling activities only single-player user studies are conducted, to reduce the impact of social aspects, which are very difficult to control. On the other hand, as network operators and service providers are the intended stakeholders of gaming QoE models, the relevant system factors must be included in the development process of the models, in particular the game content as well as network and encoding parameters.

Fig. 3: Overview of influencing factors on gaming QoE summarized in ITU-T Rec. G.1032 (cf. [3]).

ITU-T Rec. P.809 (P.GAME)

The aim of ITU-T Rec. P.809 is to describe subjective evaluation methods for gaming QoE. Since no single standardized evaluation method covers all aspects of gaming QoE, the Recommendation mainly summarizes the state of the art of subjective evaluation methods in order to help choose suitable methods for subjective experiments, depending on the purpose of the experiment. In its main body, the Recommendation consists of five parts: (A) definitions for games considered in the Recommendation, (B) definitions of QoE aspects relevant in gaming, (C) a description of test paradigms, (D) a description of the general experimental set-up, with recommendations regarding passive viewing-and-listening tests and interactive tests, and (E) a description of questionnaires to be used for gaming QoE evaluation. It is amended by two paragraphs regarding performance and physiological response measurements and by (non-normative) appendices illustrating the questionnaires, as well as an extensive list of literature references [4].

Fundamentally, the ITU-T Rec. P.809 defines two test paradigms to assess gaming quality:

  • Passive tests with predefined audio-visual stimuli passively observed by a participant.
  • Interactive tests with game scenarios interactively played by a participant.

The passive paradigm can be used for gaming quality assessment when the impairment does not influence the interaction of players. This method suggests a short stimulus duration of 30 s, which allows investigating a great number of encoding conditions while reducing the influence of user behaviour on the stimulus due to the absence of interaction. Even for passive tests, as the subjective ratings will be merged with those derived from interactive tests for QoE model development, it is recommended to give instructions about the game rules and objectives, so that participants have similar knowledge of the game. The instructions should also explain the difference between video quality and graphics quality (e.g., graphical details such as abstract versus realistic graphics), as confusing the two is one of the common mistakes of participants in video quality assessment of gaming content.

Interactive tests should be used when other quality features, such as interaction quality, playing quality, immersion, and flow, are under investigation. While a duration of 90 s is proposed for interaction quality, a longer duration of 5-10 min is suggested for research targeting engagement concepts such as flow. Finally, the Recommendation provides information about the selection of game scenarios as stimulus material for both test paradigms, e.g., the ability to provide repetitive scenarios, balanced difficulty, scenes representative in terms of encoding complexity, and the avoidance of ethically questionable content.

ITU-T Rec. G.1072 (G.OMG)

The quality management of gaming services requires quantitative prediction models. Such models should be able to predict either the overall quality (e.g., in terms of a Mean Opinion Score) or individual QoE aspects from characteristics of the system, potentially considering the player characteristics and the usage context. ITU-T Rec. G.1072 provides quality models for cloud gaming services based on the impact of impairments introduced by typical Internet Protocol (IP) networks on the quality experienced by players. G.1072 is a network planning tool that estimates gaming QoE based on assumptions about network and encoding parameters as well as the game content.

The impairment factors are derived from subjective ratings of the corresponding quality aspects, e.g., spatial video quality or interaction quality, and modelled by non-linear curve fitting. For the prediction of the overall score, linear regression is used. To create the impairment factors and the regression, the MOS values of each test condition were transformed to the R-scale, similar to the well-known E-model [8]. The R-scale, which results from an s-shaped conversion of the MOS scale, promises benefits regarding the additivity of impairments and compensates for the fact that participants tend to avoid the extremes of rating scales [3].
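To illustrate the idea of additive impairments on an R-scale, the following sketch uses the E-model's published MOS-R mapping from ITU-T G.107 as a stand-in. G.1072 defines its own conversion and impairment functions, so this is an analogy for the mechanism, not the actual G.1072 model; the function names are ours:

```python
def r_to_mos(r):
    """E-model mapping from the transmission rating R to MOS (ITU-T G.107);
    used here only as an analogy for an s-shaped R-scale conversion."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

def mos_to_r(mos, tol=1e-6):
    # invert by bisection; the mapping is monotonic over the practical range
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def overall_r(r_max, impairments):
    # additivity on the R-scale: subtract each impairment from the maximum
    return max(0.0, r_max - sum(impairments))
```

Working on the R-scale lets separately measured impairments (e.g., one for video quality, one for interaction quality) be summed before converting the result back to a predicted MOS.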

As the impact of the input parameters (e.g., delay) was shown to be highly content-dependent, the model provides two modes of operation. If no assumption about a game’s sensitivity class towards degradations is available to the user of the model (e.g., a network provider), the “default” mode should be used, which assumes the highest (most sensitive) game class. The “default” mode will result in a pessimistic quality prediction for games that are not of high complexity and sensitivity. If the user of the model can make an assumption about the game class (e.g., a service provider), the “extended” mode can predict the quality with higher accuracy based on the assigned game classes.

On-going Activities

While the three recommendations provide a basis for researchers, network operators, and cloud gaming service providers to improve gaming QoE, standardization activities continue with new work items focusing on QoE assessment methods and gaming QoE model development for cloud gaming and online gaming applications. To this end, three work items have been established within the past two years.

ITU-T P.BBQCG

P.BBQCG is a work item that aims at the development of a bitstream model predicting cloud gaming QoE. The model will exploit bitstream information from both packet headers and payloads to reach a higher accuracy of audiovisual quality prediction compared to G.1072. In addition, three different codecs and a wider range of network parameters will be considered to develop a generalizable model: it will be trained and validated for the H.264, H.265, and AV1 video codecs and video resolutions up to 4K. For the development of the model, two test paradigms, passive and interactive, will be followed. The passive paradigm will cover a wide range of encoding parameters, while the interactive paradigm will cover the network parameters that can strongly influence the players' interaction with the game.

ITU-T P.CrowdG

A gaming QoE study is a challenging task in itself, due to the multidimensionality of the QoE concept and the large number of influence factors. It becomes even more challenging if the test follows a crowdsourcing approach, which is of particular interest in times of the COVID-19 pandemic, or if subjective ratings are required from a highly diverse audience, e.g., for the development or investigation of questionnaires. The aim of the P.CrowdG work item is to develop a framework that describes the best practices and guidelines to be considered for gaming QoE assessment using a crowdsourcing approach. In particular, the crowd gaming framework must ensure reliable and valid results despite the absence of an experimenter, a controlled network, and visual observation of test participants. In addition to the framework, guidelines will be given that help to collect valid and reliable results, addressing issues such as how to make sure workers put enough focus on the gaming and rating tasks. While a possible framework for interactive tests of simple web-based games is already presented in [9], more work is required to complete the ITU-T work item for more advanced setups and passive tests.

ITU-T G.OMMOG

G.OMMOG is a work item that focuses on the development of an opinion model predicting gaming Quality of Experience (QoE) for mobile online gaming services. The work item is a possible extension of ITU-T Rec. G.1072. In contrast to G.1072, the games are not executed on a cloud server but on a gaming server that exchanges game states with the users' clients instead of a video stream. This more traditional gaming concept represents a very popular class of services, especially considering multiplayer games such as recently published AAA titles of the Multiplayer Online Battle Arena (MOBA) and battle royale genres.

So far, it has been decided to follow a model structure similar to that of ITU-T Rec. G.1072. However, the component of spatial video quality, which was a major part of G.1072, will be removed, and the corresponding game type information will not be used. In addition, it was decided to investigate the impact of variable delay and bursty packet loss, especially as their interaction can have a strong impact on gaming QoE. It is assumed that greater variability of these factors, and their interplay, will weaken the error handling of mobile online gaming services. Due to information missing at the server because of packet loss or large delays, the gameplay is expected to lose its smoothness (in the gaming domain this is called "rubber banding"), which will lead to reduced temporal video quality.

About ITU-T SG12

ITU-T Study Group 12 is the expert group responsible for the development of international standards (ITU-T Recommendations) on performance, quality of service (QoS), and quality of experience (QoE). This work spans the full spectrum of terminals, networks, and services, ranging from speech over fixed circuit-switched networks to multimedia applications over mobile and packet-based networks.

In this article, the previous achievements of the ITU-T SG12 with respect to gaming QoE are described. The focus was in particular on subjective assessment methods, influencing factors, and modelling of gaming QoE. We hope that this information will significantly improve the work and research in this domain by enabling more reliable, comparable, and valid findings. Lastly, the report also points out many on-going activities in this rapidly changing domain, to which everyone is gladly invited to participate.

More information about the SG12, which will host its next E-meeting from 4-13 May 2021, can be found at ITU Study Group (SG) 12.

For more information about the gaming activities described in this report, please contact Sebastian Möller (sebastian.moeller@tu-berlin.de).

Acknowledgement

The authors would like to thank all colleagues of ITU-T Study Group 12, as well as of the Qualinet gaming Task Force, for their support. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871793 and No 643072 as well as by the German Research Foundation (DFG) within project MO 1038/21-1.

References

[1] T. Wijman, The World’s 2.7 Billion Gamers Will Spend $159.3 Billion on Games in 2020; The Market Will Surpass $200 Billion by 2023, 2020.

[2] S. Stewart, Video Game Industry Silently Taking Over Entertainment World, 2019.

[3] S. Schmidt, Assessing the Quality of Experience of Cloud Gaming Services, Ph.D. dissertation, Technische Universität Berlin, 2021.

[4] S. Möller, S. Schmidt, and S. Zadtootaghaj, “New ITU-T Standards for Gaming QoE Evaluation and Management”, in 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2018.

[5] S. Möller, S. Schmidt, and J. Beyer, “Gaming Taxonomy: An Overview of Concepts and Evaluation Methods for Computer Gaming QoE”, in 2013 Fifth International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2013.

[6] A. Perkis and C. Timmerer, Eds., QUALINET White Paper on Definitions of Immersive Media Experience (IMEx), European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting, 2020.

[7] P. Le Callet, S. Möller, and A. Perkis, Eds, Qualinet White Paper on Definitions of Quality of Experience, COST Action IC 1003, 2013.

[8] ITU-T Recommendation G.107, The E-model: A Computational Model for Use in Transmission Planning. Geneva: International Telecommunication Union, 2015.

[9] S. Schmidt, B. Naderi, S. S. Sabet, S. Zadtootaghaj, and S. Möller, “Assessing Interactive Gaming Quality of Experience Using a Crowdsourcing Approach”, in 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2020.

Immersive Media Experiences – Why finding Consensus is Important

An introduction to the QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) [1].

Introduction

Immersive media are reshaping the way users experience reality. They are increasingly incorporated across enterprise and consumer sectors to offer experiential solutions to a diverse range of industries. Current technologies that afford an immersive media experience (IMEx) include Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and 360-degree video. Popular uses can be found in enhancing connectivity applications, supporting knowledge-based tasks, learning & skill development, as well as adding immersive and interactive dimensions to the retail, business, and entertainment industries. Whereas the evolution of immersive media can be traced over the past 50 years, its current popularity boost is primarily owed to significant advances in the last decade brought about by improved connectivity and superior computing and device capabilities. Specifically, advances have been witnessed in display technologies, visualization, interaction & tracking devices, recognition technologies, platform development, and new media formats, together with increasing user demand for real-time, dynamic content across platforms.

Though still in its infancy, the immersive economy is growing into a dynamic and confident sector. Being an emerging sector, official data are hard to find, but some estimations project the global immersive media market to continue its upward growth at around 30% CAGR to reach USD 180 Bn by 2022 [2,3]. By country, the USA is expected to secure one-third of the global immersive media market share, followed by China, Japan, Germany, and the UK as markets where significant spending is anticipated. Consumer products and devices are poised to be the largest contributing segment, and growth in immersive consumer products is expected to continue as Head-Mounted Displays (HMDs) become commonplace and interest in mobile augmented reality increases [4]. However, immersive media are no longer just a pursuit of alternative display technologies; they are pushing towards holistic ecosystems that draw contributions from hardware manufacturers, application & platform developers, content producers, and users. These ecosystems are making way for sophisticated content creation on platforms that allow user participation, interaction, and skill integration through advanced tools.

Immersive media experience (IMEx), today, is not only a way for users to view media but a transformative way to consume media altogether. It draws considerable interest from multiple disciplines, and as stakeholders increase, the need for clarity and coherence on definitions and concepts becomes all the more important. In this article, we provide an overview and a brief survey of some of the key definitions that are central to IMEx, including its Quality of Experience (QoE), application areas, influencing factors, and assessment methods. Our aim is to bring some clarity and initiate consensus on topics related to IMEx that can be useful for researchers and practitioners working in both academia and industry.

Why understand IMEx?

Immersive media combine reality with technology, enabling emplaced multimedia experiences of standard media (film, photographic, or animated) as well as synthetic and interactive environments for users. They utilize visual, auditory, and haptic feedback to stimulate the physical senses such that users psychologically feel immersed within these multidimensional media environments. This sense of "being there" is also referred to as presence.

As mentioned earlier, the enthusiasm for IMEx is mainly driven by the gaming, entertainment, retail, healthcare, digital marketing, and skill training industries. So far, research has tilted favourably towards innovation, with particular interest in image capture, recognition, mapping, and display technologies over the past few years. However, the prevalence of IMEx has also ushered in a plethora of definitions, frameworks, and models for understanding the psychological and phenomenological concepts associated with these media forms. Central, of course, are the closely related concepts of immersion and presence, which are interpreted differently across fields, for example, as one moves from literature to narratology to computer science. With immersive media, these separate fields come together inside interactive digital narrative applications, where immersive narratives are used to solve real-world problems. The resulting interdisciplinary differences regarding definitions, scope, and constituents require redress in order to achieve a coherent understanding of the concepts used. Such consensus is vital for giving the future of immersive media a direction that can be shared by all.

A White Paper on IMEx

A recent White Paper [1] by QUALINET, the European Network on Quality of Experience in Multimedia Systems and Services [5], is a contribution to the discussions related to Immersive Media Experience (IMEx). It attempts to build consensus around ideas and concepts that are related to IMEx but originate from multidisciplinary groups with a joint interest in multimedia experiences.

The QUALINET community aims at extending the notion of network-centric Quality of Service (QoS) in multimedia systems, by relying on the concept of Quality of Experience (QoE). The main scientific objective is the development of methodologies for subjective and objective quality metrics considering current and new trends in multimedia communication systems as witnessed by the appearance of new types of content and interactions.

The white paper was created based on an activity launched at the 13th QUALINET meeting on June 4, 2019, in Berlin as part of Task Force 7, Immersive Media Experiences (IMEx). The paper received contributions from 44 authors under 10 section leads, which were consolidated into a first draft and released among all section leads and editors for internal review. After incorporating the feedback from all section leads, the editors initially released the White Paper within the QUALINET community for review. Following feedback from QUALINET at large, the editors distributed the White Paper widely for an open, public community review (e.g., research communities/committees in ACM and IEEE, standards development organizations, various open email reflectors related to this topic). The feedback received from this public consultation process resulted in the final version which has been approved during the 14th QUALINET meeting on May 25, 2020.

Understanding the White Paper

The White Paper surveys definitions and concepts that contribute to IMEx. It describes the Quality of Experience (QoE) for immersive media by establishing a relationship between the concepts of QoE and IMEx. This article provides an outline of these concepts by looking at:

  • Survey of definitions of immersion and presence discusses various frameworks and conceptual models that are most relevant to these phenomena in terms of multimedia experiences.
  • Definition of immersive media experience describes experiential determinants for IMEx characterized through its various technological contexts.
  • Quality of experience for immersive media applies existing QoE concepts to understand the user-centric subjective feelings of “a sense of being there”, “a sense of agency”, and “cybersickness”.
  • The application area for immersive media experience presents an overview of immersive technologies in use within gaming, omnidirectional content, interactive storytelling, health, entertainment, and communications.
  • Influencing factors on immersive media experience looks at the three established categories of QoE influence factors, with a pronounced emphasis on human influence factors as being of very high relevance to IMEx.
  • Assessment of immersive media experience underscores the importance of proper examination of multimedia systems, including IMEx, by highlighting three methods currently in use, i.e., subjective, behavioural, and psychophysiological.
  • Standardization activities discuss the three clusters of activities currently underway to achieve interoperability for IMEx: (i) data representation & formats; (ii) guidelines, systems standards, & APIs; and (iii) Quality of Experience (QoE).

Conclusions

Immersive media have significantly changed the use and experience of new digital media. These innovative technologies transcend traditional formats and present new ways to interact with digital information inside synthetic or enhanced realities, which include VR, AR, MR, and haptic communications. Earlier the need for a multidisciplinary consensus was discussed vis-à-vis definitions of IMEx. The QUALINET white paper provides such “a toolbox of definitions” for IMEx. It stands out for bringing together insights from multimedia groups spread across academia and industry, specifically the Video Quality Experts Group (VQEG) and the Immersive Media Group (IMG). This makes it a valuable asset for those working in the field of IMEx going forward.

References

[1] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
[2] Mateos-Garcia, J., Stathoulopoulos, K., & Thomas, N. (2018). The immersive economy in the UK (Rep. No. 18.1137.020). Innovate UK.
[3] Infocomm Media 2025 Supplementary Information (pp. 31-43, Rep.). (2015). Singapore: Ministry of Communications and Information.
[4] Hadwick, A. (2020). XR Industry Insight Report 2019-2020 (Rep.). San Francisco: VRX Conference & Expo.
[5] http://www.qualinet.eu/

Towards Interactive QoE Assessment of Robotic Telepresence

Telepresence robots (TPRs) are remote-controlled, wheeled devices with an internet connection. A TPR can “teleport” you to a remote location, let you drive around and interact with people.  A TPR user can feel present in the remote location by being able to control the robot position, movements, actions, voice and video. A TPR facilitates human-to-human interaction, wherever you want and whenever you want. The human user sends commands to the TPR by pressing buttons or keys from a keyboard, mouse, or joystick.

A Robotic Telepresence Environment

In recent years, people from different environments and backgrounds have started to adopt TPRs for private and business purposes such as attending a class, roaming around the office and visiting patients. Due to the COVID-19 pandemic, adoption in healthcare has increased in order to facilitate social distancing and staff safety [Ackerman 2020, Tavakoli et al. 2020].

Robotic Telepresence Sample Use Cases

Despite this increase in adoption, a research gap remains from a QoE perspective, as TPRs offer interaction beyond the well-understood QoE issues of traditional static audio-visual conferencing. TPRs, as remote-controlled vehicles, provide users with some form of physical presence at the remote location. Furthermore, for the people interacting with the TPR at the remote location, the robot is a physical representation, or proxy agent, of its remote operator. The operator can physically interact with the remote location, for example by driving over an object or pushing an object forward. These aspects of teleoperation and navigation represent an additional dimension in terms of functionality, complexity and experience.

Navigating a TPR may pose challenges to end-users and influence their perceived quality of the system. For instance, when a TPR operator is driving the robot, he/she expects an instantaneous reaction from the robot. An increased delay in sending commands to the robot may thus negatively impact robot mobility and the user’s satisfaction, even if the audio-visual communication functionality itself is not affected.

In a recent paper published at QoMEX 2020 [Jahromi et al. 2020], we addressed this gap in research by means of a subjective QoE experiment that focused on the QoE aspects of live TPR teleoperation over the internet. We were interested in understanding how network QoS-related factors influence the operator’s QoE when using a TPR in an office context.

TPR QoE User Study and Experimental Findings

In our study, we investigated the QoE of TPR navigation along three research questions: 1) impact of network factors including bandwidth, delay and packet loss on the TPR navigation QoE, 2) discrimination between navigation QoE and video QoE, 3) impact of task on TPR QoE sensitivity.

The QoE study participants were situated in a laboratory setting in Dublin, Ireland, where they navigated a Beam Plus TPR via keyboard input on a desktop computer. The TPR was placed in a real office setting of California Telecom in California, USA. Bandwidth, delay and packet loss rate were manipulated on the operator’s PC.

A User Participating in the Robotic Telepresence QoE Study

A total of 23 subjects participated in our QoE lab study (8 female, 15 male), and the average test duration was 30 minutes per participant. We followed ITU-R Recommendation BT.500 and detected three participants as outliers, who were excluded from subsequent analysis. A post-test survey shows that none of the participants reported task boredom as a factor. In fact, many reported that they enjoyed the experience!
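The outlier screening step can be sketched as follows. This is a simplified BT.500-style procedure under assumptions of our own (one rating per subject and condition, population moments), not the complete algorithm from the Recommendation:

```python
def screen_subjects(ratings):
    """Simplified BT.500-style subject screening.

    ratings[i][j] is the rating of subject i for condition j; this data
    layout and the single-rating-per-condition setup are assumptions.
    Returns the indices of subjects flagged as outliers.
    """
    n_subj, n_cond = len(ratings), len(ratings[0])
    # Per-condition mean, standard deviation, and kurtosis (beta2)
    stats = []
    for j in range(n_cond):
        xs = [ratings[i][j] for i in range(n_subj)]
        m = sum(xs) / n_subj
        var = sum((x - m) ** 2 for x in xs) / n_subj
        s = var ** 0.5
        beta2 = (sum((x - m) ** 4 for x in xs) / n_subj) / var ** 2 if var > 0 else 3.0
        stats.append((m, s, beta2))
    rejected = []
    for i in range(n_subj):
        p = q = 0  # counts of ratings above / below the acceptance band
        for j, (m, s, beta2) in enumerate(stats):
            # 2*sigma band if the distribution is roughly normal
            # (2 <= beta2 <= 4), otherwise the wider sqrt(20)*sigma band
            k = 2.0 if 2 <= beta2 <= 4 else 20 ** 0.5
            if ratings[i][j] >= m + k * s:
                p += 1
            if ratings[i][j] <= m - k * s:
                q += 1
        # reject frequent *and* inconsistent deviators
        if p + q > 0.05 * n_cond and abs(p - q) < 0.3 * (p + q):
            rejected.append(i)
    return rejected
```

A subject who alternates between extreme low and high ratings while the rest of the panel agrees would be flagged, whereas a consistently strict or lenient rater would not.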

The influence of network factors on Navigation QoE

All three network influence factors exhibited a significant impact on navigation QoE but in different ways. Above a threshold of 0.9 Mbps, bandwidth showed no influence on navigation QoE, while 1% packet loss already showed a noticeable impact on the navigation QoE.  A mixed-model ANOVA confirms that the impact of the different network factors on navigation quality ratings is statistically significant (see [Jahromi et al. 2020] for details).  From the figure below, one can see that the levels of navigation QoE MOS, as well as their sensitivity to network impairment level, depend on the actual impairment type.

The bar plots illustrate the influence of network QoS factors on the navigation quality (left) and the video quality (right).

Discrimination between navigation QoE and video QoE

Our study results show that the subjects were capable of discriminating between video quality and navigation quality, as they treated them as separate concepts when it comes to experience assessment. Based on ANOVA analysis [Jahromi et al. 2020], we see that the impact of bandwidth and packet loss on TPR video quality ratings was statistically significant. However, for the delay, this was not the case (in contrast to navigation quality). A comparison of the navigation quality and video quality subplots shows that changes in MOS across different impairment levels diverge between the two in terms of amplitude. To quantify this divergence, we performed a Spearman Rank Ordered Correlation Coefficient (SROCC) analysis, revealing only a weak correlation between video and navigation quality (SROCC = 0.47).
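An SROCC analysis of this kind can be reproduced as Pearson correlation computed on average ranks. A minimal pure-Python sketch (the helper names are ours; the study's own data are not reproduced here):

```python
def _ranks(xs):
    # Rank the values, assigning tied values their average rank
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def srocc(x, y):
    # Spearman rank correlation = Pearson correlation of the rank vectors
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only ranks enter the computation, any monotone relationship yields an SROCC of 1.0, which makes the measure suitable for comparing quality scales that need not be linearly related.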

Impact of task on TPR QoE sensitivity

Our study showed that the type of TPR task had more impact on navigation QoE than streaming video QoE. Statistical analysis reveals that the actual task at hand significantly affects QoE impairment sensitivity, depending on the network impairment type. For example, the interaction between bandwidth and task is statistically significant for navigation QoE, which means that changes in bandwidth were rated differently depending on the task type. On the other hand, this was not the case for delay and packet loss. Regarding video quality, we do not see a significant impact of task on QoE sensitivity to network impairments, except for the borderline case for packet loss rate.

Conclusion: Towards a TPR QoE Research Agenda

There were three key findings from this study. First, we understand that users can differentiate between visual and navigation aspects of TPR operation. Secondly, all three network factors have a significant impact on TPR navigation QoE. Thirdly,  visual and navigation QoE sensitivity to specific impairments strongly depends on the actual task at hand. We also found the initial training phase to be essential in order to ensure familiarity of participants with the system and to avoid bias caused by novelty effects. We observed that participants were highly engaged when navigating the TPR, as was also reflected in the positive feedback received during the debriefing interviews. We believe that our study methodology and design, including task types, worked very well and can serve as a solid basis for future TPR QoE studies. 

We also see the necessity of developing a more generic, empirically validated, TPR experience framework that allows for systematic assessment and modelling of QoE and UX in the context of TPR usage. Beyond integrating concepts and constructs that have been already developed in other related domains such as (multi-party) telepresence, XR, gaming, embodiment and human-robot interaction, the development of such a framework must take into account the unique properties that distinguish the TPR experience from other technologies:

  • Asymmetric conditions
    The factors influencing QoE for TPR users are not only bidirectional, they also differ between the two sides of the TPR, i.e., the experience is asymmetric. Considering the differences between the local and the remote location, a TPR setup features a noticeable number of asymmetric conditions as regards the number of users, content, context, and even stimuli: while the robot is typically controlled by a single operator, the remote location may host a number of users (asymmetry in the number of users). An asymmetry also exists in the number of stimuli. For instance, the remote users perceive the physical movement and presence of the operator through the actual movement of the TPR. The experience of encountering a TPR rolling into an office is a hybrid kind of intrusion, somewhere between a robot and a physical person. From the operator's perspective, however, the experience is a rather virtual one, as he/she becomes conscious of physical impact at the remote location only by means of technically mediated feedback.
  • Social Dimensions
    According to [Haans et al. 2012], the experience of telepresence is defined as “a consequence of the way in which we are embodied, and that the capability to feel as if one is actually there in a technologically mediated or simulated environment is a natural consequence of the same ability that allows us to adjust to, for example, a slippery surface or the weight of a hammer”.
    The experience of being present in a TPR-mediated context goes beyond AR and VR: it is a blended physical reality. The sense of ownership of a wheeled TPR, established through mobility and the remote navigation of a "physical" object, allows users to feel as if they are physically present in the remote environment (the TPR acting as a physical avatar). This allows TPR users to take part in social activities, such as accompanying people, participating in discussions while navigating, sharing the same visual scenes, and visiting places, parties and celebrations. In healthcare, a doctor can use a TPR to visit patients as well as to dispense and administer medication remotely.
  • TPR Mobility and Physical Environment
    Mobility is a key dimension of telepresence frameworks [Rae et al. 2015]. TPR mobility and navigation features introduce new interactions between the operators and the physical environment.  The environmental aspect becomes an integral part of the interaction experience [Hammer et al. 2018].
    During TPR usage, the navigation path and the number of obstacles that a remote user may face can influence the user's experience. The ease or complexity of navigation can shift the operator's focus and attention from one influence factor to another (e.g., from video quality to navigation quality). Paloski et al. found that cognitive impairment as a result of fatigue can influence user performance in robot operation [Paloski et al. 2008]. This raises the question of how driving and interaction through a TPR impact the user's cognitive load and result in fatigue, compared to physical presence.
    The mobility aspects of TPRs can also influence the perception of spatial configurations of the physical environment. This allows the TPR user to manipulate and interact with the environment from a spatial configuration aspect [Narbutt et al. 2017]. For example,  the ambient noise of the environment can be perceived at different levels. The TPR operator can move the robot closer to the source of the noise or keep a distance from it. This can enhance his/her feelings of being present [Rae et al. 2015].

The distinctive characteristics of a TPR-mediated context outlined above illustrate the complexity and the broad range of aspects that potentially have a significant influence on the TPR user experience. Consideration of these features and factors provides a useful basis for the development of a comprehensive TPR experience framework.

References

  • [Tavakoli et al. 2020] M. Tavakoli, J. Carriere, and A. Torabi, "Robotics for COVID-19: How Can Robots Help Health Care in the Fight Against Coronavirus", 2020.
  • [Ackerman 2020] E. Ackerman, "Telepresence Robots Are Helping Take Pressure Off Hospital Staff", IEEE Spectrum, Apr. 2020.
  • [Jahromi et al. 2020] H. Z. Jahromi, I. Bartolec, E. Gamboa, A. Hines, and R. Schatz, "You Drive Me Crazy! Interactive QoE Assessment for Telepresence Robot Control", in 12th International Conference on Quality of Multimedia Experience (QoMEX 2020), Athlone, Ireland, 2020.
  • [Hammer et al. 2018] F. Hammer, S. Egger-Lampl, and S. Möller, "Quality-of-User-Experience: A Position Paper", Quality and User Experience, vol. 3, no. 1, Dec. 2018, doi: 10.1007/s41233-018-0022-0.
  • [Haans et al. 2012] A. Haans and W. A. IJsselsteijn, "Embodiment and Telepresence: Toward a Comprehensive Theoretical Framework", Interacting with Computers, 24(4), 211-218, 2012.
  • [Rae et al. 2015] I. Rae, G. Venolia, J. C. Tang, and D. Molnar, "A Framework for Understanding and Designing Telepresence", in Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, pp. 1552-1566, Feb. 2015.
  • [Narbutt et al. 2017] M. Narbutt, S. O'Leary, A. Allen, J. Skoglund, and A. Hines, "Streaming VR for Immersion: Quality Aspects of Compressed Spatial Audio", in 2017 23rd International Conference on Virtual System & Multimedia (VSMM), pp. 1-6, IEEE, Oct. 2017.
  • [Paloski et al. 2008] W. H. Paloski, C. M. Oman, J. J. Bloomberg, M. F. Reschke, S. J. Wood, D. L. Harm, … and L. S. Stone, "Risk of Sensory-Motor Performance Failures Affecting Vehicle Control During Space Missions: A Review of the Evidence", Journal of Gravitational Physiology, 15(2), 1-29, 2008.

Definitions of Crowdsourced Network and QoE Measurements

1 Introduction and Definitions

Crowdsourcing is a well-established concept in the scientific community, used for instance by Jeff Howe and Mark Robinson in 2005 to describe how businesses were using the Internet to outsource work to the crowd [2], but it can be dated back as far as 1849 (weather prediction in the US). Crowdsourcing has enabled a huge number of new engineering approaches and commercial applications. To better define crowdsourcing in the context of network measurements, a seminar was held in Würzburg, Germany, on 25-26 September 2019 on the topic "Crowdsourced Network and QoE Measurements". It notably showed the need for a white paper whose goal is to provide a scientific discussion of the terms "crowdsourced network measurements" and "crowdsourced QoE measurements", to describe relevant use cases for such crowdsourced data, and to discuss the underlying challenges.

The outcome of the seminar is the white paper [1], which is – to our knowledge – the first document covering the topic of crowdsourced network and QoE measurements. This document serves as a basis for differentiation and a consistent view from different perspectives on crowdsourced network measurements, with the goal of providing a commonly accepted definition in the community. The scope is focused on the context of mobile and fixed network operators, but also on measurements of different layers (network, application, user layer). In addition, the white paper shows the value of crowdsourcing for selected use cases, e.g., to improve QoE, or address regulatory issues. Finally, the major challenges and issues for researchers and practitioners are highlighted.

This article summarizes the current state of the art in crowdsourcing research and lays down the foundation for the definition of crowdsourcing in the context of network and QoE measurements as provided in [1]. A first important step is to properly define the various elements of crowdsourcing.

1.1 Crowdsourcing

The word crowdsourcing itself is a mix of the crowd and the traditional outsourcing work-commissioning model. Since the publication of [2], the research community has been struggling to find a definition of the term crowdsourcing [3,4,5] that fits the wide variety of its applications and new developments. For example, in ITU-T P.912, crowdsourcing has been defined as:

Crowdsourcing consists of obtaining the needed service by a large group of people, most probably an on-line community.

The above definition was written with the main purpose of collecting subjective feedback from users. For the purposes of the white paper, which focuses on network measurements, this definition needs to be refined. In the following, the term crowdsourcing is defined as follows:

Crowdsourcing is an action by an initiator who outsources tasks to a crowd of participants to achieve a certain goal.

The following terms are further defined to clarify the above definition:

A crowdsourcing action is part of a campaign that includes processes such as campaign design and methodology definition, data capturing and storage, and data analysis.

The initiator of a crowdsourcing action can be a company, an agency (e.g., a regulator), a research institute or an individual.

Crowdsourcing participants (also “workers” or “users”) work on the tasks set up by the initiator. They are third parties with respect to the initiator, and they must be human.

The goal of a crowdsourcing action is its main purpose from the initiator’s perspective.

The goals of a crowdsourcing action can be manifold and may include, for example:

  • Gathering subjective feedback from users about an application (e.g., ranks expressing the experience of users when using an application)
  • Leveraging existing capacities (e.g., storage, computing, etc.) offered by companies or individual users to perform some tasks
  • Leveraging cognitive efforts of humans for problem-solving in a scientific context.

In general, an initiator adopts a crowdsourcing approach to remedy a lack of resources (e.g., running a large-scale computation by using the resources of a large number of users to overcome its own limitations) or to broaden a test basis much further than classical opinion polls. Crowdsourcing thus covers a wide range of actions with various degrees of involvement by the participants.

In crowdsourcing, there are various methods of identifying, selecting, receiving, and rewarding users contributing to a crowdsourcing initiative and related services. Individuals or organizations obtain goods and/or services in many different ways from a large, relatively open and often rapidly evolving group of crowdsourcing participants (also called users). How the goods or information obtained by crowdsourcing are used to achieve a cumulative result can also depend on the type of task, the collected goods or information, and the final goal of the crowdsourcing task.

1.2 Roles and Actors

Given the above definitions, the actors involved in a crowdsourcing action are the initiator and the participants. The role of the initiator is to design and initiate the crowdsourcing action, distribute the required resources to the participants (e.g., a piece of software or the task instructions, assign tasks to the participants or start an open call to a larger group), and finally to collect, process and evaluate the results of the crowdsourcing action.

The role of participants depends on their degree of contribution or involvement. In general, their role is described as follows. At least, they offer their resources to the initiator, e.g., time, ideas, or computation resources. In higher levels of contributions, participants might run or perform the tasks assigned by the initiator, and (optionally) report the results to the initiator.

Finally, the relationships between the initiator and the participants are governed by policies specifying the contextual aspects of the crowdsourcing action such as security and confidentiality, and any interest or business aspects specifying how the participants are remunerated, rewarded or incentivized for their participation in the crowdsourcing action.

2 Crowdsourcing in the Context of Network Measurements

The above model considers crowdsourcing at large. In this section, we analyse crowdsourcing for network measurements, which creates crowd data. This is a more restricted instance of the broader definitions introduced above, but one with strong contextual aspects such as security and confidentiality rules.

2.1 Definition: Crowdsourced Network Measurements

Crowdsourcing enables a distributed and scalable approach to performing network measurements. It can reach a large number of end-users all over the world, clearly surpassing traditional measurement campaigns launched by network operators or regulatory agencies, which can reach only a limited sample of users. Primarily, crowd data may be used for evaluating QoS, that is, for network performance measurements. Crowdsourcing may, however, also be relevant for evaluating QoE, as it may involve asking users about their experience, depending on the type of campaign.

With regard to the previous section and the specific aspects of network measurements, crowdsourced network measurements and crowd data are defined as follows, based on the general definition of crowdsourcing introduced above:

Crowdsourced network measurements are actions by an initiator who outsources tasks to a crowd of participants to achieve the goal of gathering network measurement-related data.

Crowd data is the data that is generated in the context of crowdsourced network measurement actions.

The format of the crowd data is specified by the initiator and depends on the type of crowdsourcing action. For instance, crowd data can be the results of large-scale computation experiments, analytics, measurement data, etc. In addition, the semantic interpretation of the crowd data is the responsibility of the initiator. The participants cannot interpret the crowd data, which must be thoroughly processed by the initiator to reach the objective of the crowdsourcing action.

We consider in this paper the contribution of human participants only. Distributed measurement actions made solely by robots, IoT devices or automated probes are excluded. Additionally, we require that participants consent to contribute to the crowdsourcing action. This consent might, however, range from actively fulfilling dedicated task instructions provided by the initiator to merely accepting terms of service that include the option of analysing usage artefacts generated while interacting with a service.

It follows that, in the present document, measurements via crowdsourcing (namely, crowd data) are assumed to be performed by human participants who are aware that they are participating in a crowdsourcing campaign. With this stated, more details can be given on the slightly adapted roles of the actors and their relationships in a crowdsourcing initiative in the context of network measurements.

2.2 Active and Passive Measurements

For a better classification of crowdsourced network measurements, it is important to differentiate between active and passive measurements. Similar to the current working definition within the ITU-T Study Group 12 work item “E.CrowdESFB” (Crowdsourcing Approach for the assessment of end-to-end QoS in Fixed Broadband and Mobile Networks), the following definitions are made:

Active measurements create artificial traffic to generate crowd data.

Passive measurements do not create artificial traffic, but measure crowd data that is generated by the participant.

For example, a typical case of an active measurement is a speed test that generates artificial traffic against a test server in order to estimate bandwidth or QoS. A passive measurement instead may be realized by fetching cellular information from a mobile device, which has been collected without additional data generation.
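To make the distinction concrete, an active measurement can be as simple as timing an artificial download. The sketch below (Python; the test-file URL and function names are illustrative, not part of the white paper) estimates downlink throughput from the bytes transferred and the elapsed time:

```python
import time
import urllib.request

def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert a transferred byte count and a duration into Mbit/s."""
    return (num_bytes * 8) / (seconds * 1e6)

def active_speed_test(url: str, chunk_size: int = 64 * 1024) -> float:
    """Active measurement: create artificial traffic by downloading a
    test file, then estimate the downlink throughput."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return throughput_mbps(total, time.monotonic() - start)

# Example: 1.25 MB transferred in 1 s corresponds to 10 Mbit/s.
print(throughput_mbps(1_250_000, 1.0))  # 10.0
```

A passive measurement, by contrast, would only read counters or radio information already produced by the participant's normal usage, without generating any artificial download.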

2.3 Roles of the Actors

Participants have to commit to participation in the crowdsourcing measurements. The level of contribution can vary depending on the corresponding effort or level of engagement. The simplest action is to subscribe to or install a specific application, which collects data through measurements as part of its functioning – often in the background and not as part of the core functionality provided to the user. A more complex, task-driven engagement requires a greater cognitive effort, such as providing subjective feedback on the performance or quality of certain Internet services. Hence, one must differentiate between participant-initiated measurements and automated measurements:

Participant-initiated measurements require the participant to initiate the measurement. The measurement data are typically provided to the participant.

Automated measurements can be performed without the need for the participant to initiate them. They are typically performed in the background.

A participant can thus be a user or a worker. The distinction depends on the main focus of the person making the contribution and on his/her level of engagement:

A crowdsourcing user is providing crowd data as the side effect of another activity, in the context of passive, automated measurements.

A crowdsourcing worker is providing crowd data as a consequence of his/her engagement when performing specific tasks, in the context of active, participant-initiated measurements.

The term “users” should, therefore, be used when the crowdsourced activity is not the main focus of engagement, but comes as a side effect of another activity – for example, when using a web browsing application which collects measurements in the background, which is a passive, automated measurement.

“Workers” are involved when the crowdsourced activity is the main driver of engagement, for example, when the worker is paid to perform specific tasks and is performing an active, participant-initiated measurement. Note that in some cases, workers can also be incentivized to provide passive measurement data (e.g. with applications collecting data in the background if not actively used).

In general, workers are paid on the basis of clear guidelines for their specific crowdsourcing activity, whereas users provide their contribution on the basis of a more ambiguous, indirect engagement, such as via the utilization of a particular service provided by the beneficiary of the crowdsourcing results, or a third-party crowd provider. Regardless of the participants’ level of engagement, the data resulting from the crowdsourcing measurement action is reported back to the initiator.

The initiator of the crowdsourcing measurement action often has to design the measurement campaign, recruit the participants (selectively or openly), provide them with the necessary means to run the action (e.g., the required backend infrastructure, software tools or task instructions), collect, process and analyse the information, and possibly publish the results.

2.4 Dimensions of Crowdsourced Network Measurements

In light of the previous section, there are multiple dimensions to consider for crowdsourcing in the context of network measurements. A preliminary list of dimensions includes:

  • Level of subjectivity (subjective vs. objective measurements) in the crowd data
  • Level of engagement of the participant (participant-initiated or background), including their cognitive effort and awareness (consciousness) of the measurement
  • Level of traffic generation (active vs. passive)
  • Type and level of incentives (attractiveness/appeal, paid or unpaid)

Besides these key dimensions, other features are relevant for characterizing a crowdsourced network measurement activity. These include scale, cost, and value; the type of data collected; and the goal or intention, i.e. the intention of the user (shaped by incentives) versus the intention of the crowdsourcing initiator regarding the resulting output.

Figure 1: Dimensions for network measurements crowdsourcing definition, and relevant characterization features (examples with two types of measurement actions)

In Figure 1, we have illustrated some dimensions of network measurements based on crowdsourcing. Only the subjectivity, engagement and incentive dimensions are displayed, on an arbitrary scale. The objective of this figure is to show that an initiator has a wide range of possible combinations for a crowdsourcing action. The success of a measurement action with regard to an objective (number of participants, relevance of the results, etc.) is multifactorial. As an example, action 1 may indicate QoE measurements involving a limited number of participants, while action 2 visualizes a network measurement action involving a large number of participants.

3 Summary

The attendees of the Würzburg seminar on “Crowdsourced Network and QoE Measurements” have produced a white paper, which defines terms in the context of crowdsourcing for network and QoE measurements, lists relevant use cases from the perspective of different stakeholders, and discusses the challenges associated with designing crowdsourcing campaigns and with analyzing and interpreting the data. The goal of the white paper is to provide definitions to be commonly accepted by the community and to summarize the most important use cases and challenges from industrial and academic perspectives.

References

[1] White Paper on Crowdsourced Network and QoE Measurements – Definitions, Use Cases and Challenges (2020). Tobias Hoßfeld and Stefan Wunderer, eds., Würzburg, Germany, March 2020. doi: 10.25972/OPUS-20232.

[2] Howe, J. (2006). The rise of crowdsourcing. Wired magazine, 14(6), 1-4.

[3] Estellés-Arolas, E., & González-Ladrón-De-Guevara, F. (2012). Towards an integrated crowdsourcing definition. Journal of Information science, 38(2), 189-200.

[4] Kietzmann, J. H. (2017). Crowdsourcing: A revised definition and introduction to new research. Business Horizons, 60(2), 151-153.

[5] ITU-T P.912, “Subjective video quality assessment methods for recognition tasks”, 08/2016.

[6] ITU-T P.808 (ex P.CROWD), “Subjective evaluation of speech quality with a crowdsourcing approach”, 06/2018

Collaborative QoE Management using SDN

The Software-Defined Networking (SDN) paradigm offers flexibility and programmability in the deployment and management of network services by separating the control plane from the data plane. Being based on network abstractions and virtualization techniques, SDN simplifies the implementation of traffic engineering techniques as well as the communication among different service providers, including Internet Service Providers (ISPs) and Over-The-Top (OTT) providers. For these reasons, SDN architectures have been widely used in recent years for the QoE-aware management of multimedia services.

The paper [1] presents Timber, an open source SDN-based emulation platform that provides the research community with a tool for experimenting with new QoE management approaches and algorithms, which may also rely on information exchange between the ISP and the OTT [2]. We believe that this exchange of information between the OTT and the ISP is extremely important because:

  1. QoE models depend on different influence factors, i.e., network, application, system and context factors [3];
  2. OTT and ISP have different information in their hands, i.e., network state and application Key Quality Indicators (KQIs), respectively;
  3. End-to-end encryption of OTT services makes it difficult for the ISP to access application KQIs and thus to perform QoE-aware network management.

In the following we briefly describe Timber and the impact of collaborative QoE management.

Timber architecture

Figure 1 represents the reference architecture, which is composed of four planes. The Service Management Plane is a cloud space owned by the OTT provider, which includes: a QoE Monitoring module to estimate the user’s QoE on the basis of service parameters acquired at the client side; a DB where QoE measurements are stored and can be shared with third parties; and a Content Distribution service to deliver multimedia content. Through RESTful APIs, the OTT gives the ISP access to part of the information stored in the DB, on the basis of appropriate agreements.

The Network Data Plane, Network Control Plane, and Network Management Plane are those in the hands of the ISP. The Network Data Plane includes all the SDN-enabled data forwarding network devices; the Network Control Plane consists of the SDN controller, which manages the network devices through Southbound APIs; and the Network Management Plane is the application layer of the SDN architecture, controlled by the ISP to perform network-wide control operations, which communicates with the OTT via RESTful APIs. The SDN application includes a QoS Monitoring module to monitor the performance of the network, a Management Policy module to take into account Service Level Agreements (SLAs), and a Control Actions module that decides on the network control actions to be implemented by the SDN controller in order to optimize the network resources and improve the service quality.

Timber implements this architecture on top of the Mininet SDN emulator and the Ryu SDN controller, which provides the major traffic engineering abstractions. In the depicted scenario, the OTT has the potential to monitor the level of QoE for the provided services, as it has access to the needed application and network level KQIs (Key Quality Indicators). On the other hand, the ISP has the potential to control the network level quality by changing the allocated resources. This scenario is implemented in Timber and allows for setting up the needed emulated network and application configuration to test QoE-aware service management algorithms.

Specifically, the OTT performs QoE monitoring of the delivered service by acquiring service information from the client side, based on passive measurements of service-related KQIs obtained through probes installed in the users’ devices. Based on these measurements, specific QoE models can be used to predict the user experience. The QoE measurements of active clients’ sessions are also stored in the OTT DB, which can be accessed by the ISP through the mentioned RESTful APIs. The ISP’s SDN application periodically checks the OTT-reported QoE and, in case of observed QoE degradations, implements network-wide policies by communicating with the SDN controller through the Northbound APIs. Accordingly, the SDN controller performs network management operations, such as link aggregation, addition of new flows, or network slicing, by controlling the network devices through Southbound APIs.

QoE management based on information exchange: video service use-case

The previously described scenario, implemented by Timber, portrays a collaborative arrangement between the ISP and the OTT, where the latter provides QoE-related data and the former takes care of controlling the resources allocated to the deployed services. Ahmad et al. [4] make use of Timber to conduct experiments aimed at investigating how the frequency of information exchange between an OTT providing a video streaming service and the ISP impacts the end-user QoE.

Figure 2 shows the experiment topology. Mininet in Timber is used to create the network topology, which in this case regards the streaming of video sequences from the media server to User1 (U1), while web traffic is also transmitted on the same network towards User2 (U2). U1 and U2 are two virtual hosts sharing the same access network and acting as the clients. U1 runs the client-side video player, and the Apache server provides both the web and HAS (HTTP Adaptive Streaming) video services.

In the considered collaboration scenario, QoE-related KQIs are extracted from the client side and sent to the MongoDB database (managed by the OTT), as depicted by the red dashed arrows. This information is then retrieved by the SDN controller of the ISP at frequency f (see the green dashed arrow). The aim is to provide different network level resources to video streaming and regular web traffic when QoE degradation is observed for the video service. These control actions on the network are needed because TCP-based web traffic sessions of 4 Mbps start randomly towards U2 during the HD video streaming sessions, causing time-varying bottlenecks on the S1-S2 link. In these cases, the SDN controller implements virtual network slicing at the S1 and S2 OVS switches, which provides a minimum guaranteed throughput of 2.5 Mbps to video streaming and 1 Mbps to web traffic. The SDN controller application utilizes flow matching criteria to assign flows to the virtual slice. The objective of these emulations is to show the impact of f on the resulting QoE.
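The controller-side logic can be sketched as follows (a minimal Python sketch with hypothetical function names and a simple trigger rule; Timber's actual implementation relies on Ryu and OVS and is more involved): poll the OTT-reported stalling statistics at frequency f, and install the guaranteed-throughput slices whenever degradation is observed.

```python
GUARANTEED_VIDEO_MBPS = 2.5  # minimum guaranteed slice for video streaming
GUARANTEED_WEB_MBPS = 1.0    # minimum guaranteed slice for web traffic

def qoe_degraded(num_stalls: int, avg_stall_s: float) -> bool:
    """Hypothetical trigger rule: any stalling reported by the OTT in
    the last sampling interval counts as a QoE degradation."""
    return num_stalls > 0 or avg_stall_s > 0.0

def slice_allocation(num_stalls: int, avg_stall_s: float):
    """Return the (video, web) minimum-throughput guarantees in Mbps to
    install at the OVS switches, or None for best-effort forwarding."""
    if qoe_degraded(num_stalls, avg_stall_s):
        return (GUARANTEED_VIDEO_MBPS, GUARANTEED_WEB_MBPS)
    return None

print(slice_allocation(2, 1.3))   # (2.5, 1.0) -> enforce the slices
print(slice_allocation(0, 0.0))   # None -> keep default forwarding
```

The longer the sampling interval, the longer such a trigger waits before reacting, which is exactly the effect the emulations quantify.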

The 60-second Big Buck Bunny video sequence at 1280 × 720 was streamed from the server to U1, considering 5 different sampling intervals T for the information exchange between OTT and ISP, i.e., 2s, 4s, 8s, 16s, and 32s. The information exchanged consisted of the average stalling duration and the number of stalling events measured by the probe at the client video player. Accordingly, the QoE for the video streaming service was measured in terms of predicted MOS, using the QoE model for HTTP video streaming defined in [5]:

MOSp = α · exp(−β(L) · N) + γ

where L and N are the average stalling duration and the number of stalling events, respectively, whereas α = 3.5, γ = 1.5, and β(L) = 0.15·L + 0.19.
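The model is straightforward to evaluate; the following sketch implements it directly from the parameters above (for instance, with no stalling, N = 0, the predicted MOS reaches its maximum of α + γ = 5.0):

```python
import math

ALPHA, GAMMA = 3.5, 1.5

def beta(L: float) -> float:
    """Stalling-length-dependent exponent from [5]: beta(L) = 0.15*L + 0.19."""
    return 0.15 * L + 0.19

def predicted_mos(L: float, N: int) -> float:
    """MOSp = alpha * exp(-beta(L) * N) + gamma, with L the average
    stalling duration in seconds and N the number of stalling events."""
    return ALPHA * math.exp(-beta(L) * N) + GAMMA

print(predicted_mos(0.0, 0))              # 5.0 (no stalling)
print(round(predicted_mos(2.0, 3), 2))    # more stalling -> lower MOS
```

Each additional stalling event multiplies the first term by exp(−β(L)), so longer and more frequent stalls push the prediction towards the floor value γ = 1.5.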

Figure 3.a shows the average predicted MOS when information is exchanged at different sampling intervals (the inverse of f). The greatest MOSp is 4.34, obtained for T=2s and T=4s. An exponential decay in MOSp is observed as the frequency of information exchange decreases. The lowest MOSp is 3.07, obtained for T=32s. This result shows that a greater frequency of information exchange leads to lower latency in the controller’s response to QoE degradation. The reason is that, for higher T, the buffer at the client player keeps starving for longer durations, resulting in longer stalling periods until the SDN controller is triggered to provide the guaranteed network resources to support the video streaming service.

Figure 3.b Initial loading time, average stalling duration and latency in controller response to quality degradation for different sampling intervals.

Figure 3.b shows the video initial loading time, the average stalling duration, and the latency in the controller’s response to quality degradation for the different sampling intervals. The latency in the controller’s response to QoE degradation increases linearly as the frequency of information exchange decreases, while the stalling duration grows exponentially as the frequency decreases. The initial loading time does not appear to be noticeably affected by the different sampling intervals.

Conclusions

Experiments were conducted in an SDN emulation environment to investigate the impact of the frequency of information exchange between OTT and ISP when a collaborative network management approach is adopted. The QoE for a video streaming service was measured by considering 5 different sampling intervals for the information exchange between OTT and ISP, i.e., 2s, 4s, 8s, 16s, and 32s. The information exchanged consisted of the average stalling duration and the number of stalling events.

The experimental results showed that a higher frequency of information exchange results in greater delivered QoE, but a sampling interval lower than 4s (frequency > ¼ Hz) may not further improve the delivered QoE. Clearly, this threshold depends on the variability of the network conditions. Further studies are needed to understand how frequently the ISP and OTT should share data to obtain observable QoE benefits under varying network status and deployed services.

References

[1] A. Ahmad, A. Floris and L. Atzori, “Timber: An SDN based emulation platform for QoE Management Experimental Research,” 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), Cagliari, 2018, pp. 1-6.

[2] https://github.com/arslan-ahmad/Timber-DASH

[3] P. Le Callet, S. Möller, A. Perkis et al., “Qualinet White Paper on Definitions of Quality of Experience (2012),” in European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013.

[4] A. Ahmad, A. Floris and L. Atzori, “Towards Information-centric Collaborative QoE Management using SDN,” 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 2019, pp. 1-6.

[5] T. Hoßfeld, C. Moldovan, and C. Schwartz, “To each according to his needs: Dimensioning video buffer for specific user profiles and behavior,” in IFIP/IEEE Int. Symposium on Integrated Network Management (IM), 2015. IEEE, 2015, pp. 1249–1254.

Can the Multimedia Research Community via Quality of Experience contribute to a better Quality of Life?

Can the multimedia community contribute to a better Quality of Life? Delivering a higher-resolution, distortion-free media stream so you can enjoy the latest movie on Netflix or YouTube may provide instantaneous satisfaction, but does it make your long-term life better? Whilst the QoMEX conference series has traditionally considered the former, in more recent years, and with a view to QoMEX 2020, research works that consider the latter are also welcome. In this context, rather than looking at what we do, reflecting on how we do it could offer opportunities for sustained rather than instantaneous impact in fields such as health, inclusive of assistive technologies (AT), and digital heritage, among many others.

In this article, we ask whether concepts from the Quality of Experience (QoE) framework model [1] can be applied, adapted and reimagined to inform and develop tools and systems that enhance our Quality of Life. The World Health Organisation (WHO) definition of health states that “[h]ealth is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity” [2]. This definition is well aligned with the familiar yet ill-defined term Quality of Life (QoL). Whilst QoL still requires work towards a concrete definition, the definition of QoE has been developed through the QUALINET EU COST Network [3]. This effort resulted in a white paper [1] that, using multimedia quality as a use case, describes the human, context, service and system factors that influence the quality of experience for multimedia systems.

Fig. 1: (a) Quality of Experience and (b) Quality of Life. (reproduced from [2]).

The QoE formation process has been mapped to a conceptual model that allows systems and services to be evaluated and improved, and such models have been used to predict QoE. Adapting and applying these methods to health-related QoL would allow predictive models of QoL to be developed.

In this context, the best paper award winner at QoMEX in 2017 [4] proposed such a mapping for QoL in stroke prevention, care and rehabilitation (Fig. 1) along with examining practical challenges for modeling and applications. The process of identifying and categorizing factors and features was illustrated using stroke patient treatment as an example use case and this work has continued through the European Union Horizon 2020 research project PRECISE4Q [5]. For medical practitioners, a QoL framework can assist in the development of decision support systems solutions, patient monitoring, and imaging systems.

At more of a “systems” level in e-health applications, the WHO defines assistive devices and technologies as “those whose primary purpose is to maintain or improve an individual’s functioning and independence to facilitate participation and to enhance overall well-being” [6]. A proposed application of immersive technologies as an assistive technology (AT) training solution applied QoE as a mechanism to evaluate the usability and utility of the system [7]. The assessment of the immersive AT used a number of physiological signals: EEG, GSR/EDA, body surface temperature, accelerometry, heart rate (HR) and blood volume pulse (BVP). These allow objective analysis while the individual is operating the wheelchair simulator. Performing such evaluations in an ecologically valid manner is a challenging task. However, the QoE framework provides a concrete mechanism for considering the human, context and system factors that influence the usability and utility of such a training simulator. In particular, the use of implicit and objective metrics can complement qualitative approaches to evaluation.

In the same vein, another work presented at QoMEX 2017 [8] employed Augmented Reality (AR) and Virtual Reality (VR) as a clinical aid for the diagnosis of speech and language difficulties, specifically aphasia (see Fig. 2). It is estimated that speech or language difficulties affect more than 12% of people internationally [9]. Individuals who suffer a stroke or traumatic brain injury (TBI) often experience symptoms of aphasia as a result of damage to the left frontal lobe. Anomic aphasia [10] is a mild form of aphasia in which patients experience word retrieval problems and semantic memory difficulties. Opportunities exist to digitalize well-accepted clinical approaches, which can be augmented through QoE-based objective and implicit metrics. Understanding the user via advanced processing techniques is an area in dire need of further research, with significant opportunities to understand the user at the cognitive, interaction and performance levels, moving far beyond the binary pass/fail of traditional approaches.

Fig. 2: Prototype System Framework (Reproduced from [8]). I. Physiological wearable sensors used to capture data. (a) Neurosky mindwave® device. (b) Empatica E4® wristband. II. Representation of user interaction with the wheelchair simulator. III. The compatibles displays. (a) Common screen. (b) Oculus Rift® HMD device. (c) HTC Vive® HMD device.

Moving beyond health, the QoE concept can also be extended to other areas, such as digital heritage. Organizations that collect media recordings, such as broadcasters and national archives, are digitizing their material because analog storage media degrade over time. Archivists, restoration experts, content creators, and consumers are all stakeholders, but they have different perspectives when it comes to their expectations and needs. Hence their QoE for archive material can be very different, as discussed at QoMEX 2019 [11]. For those viewing the quality of media archives through a QoE lens, QoE aids in understanding the issues and priorities of the stakeholders. Applying the QoE framework to explore the different stakeholders and the influencing factors that affect their quality perceptions over time allows different kinds of QoE models to be developed and used across the stages of the archived material lifecycle, from digitization through restoration and consumption.

The QoE framework’s simple yet comprehensive conceptual model of the quality formation process has had a major impact on multimedia quality. The examples presented here highlight how it can be used as a blueprint in other domains and to reconcile different perspectives and attitudes to quality. With an eye on the next and future editions of QoMEX, will we see other use cases and applications of QoE in domains and concepts beyond multimedia quality evaluation? The QoMEX conference series has evolved and adapted based on emerging application domains, industry engagement, and approaches to quality evaluation. It is clear that the scope of QoE research has broadened significantly over the last 11 years. Please take a look at [12] for details on the conference topics and special sessions through which the organizing team for QoMEX 2020 in Athlone, Ireland hope to broaden the range of use cases that apply QoE towards QoL and other application domains, in a spirit of inclusivity and diversity.

References:

[1] P. Le Callet, S. Möller, and A. Perkis, eds., “Qualinet White Paper on Definitions of Quality of Experience,” European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013.

[2] World Health Organization, “Preamble to the Constitution of the World Health Organization,” 1946. [Online]. Available: http://apps.who.int/gb/bd/PDF/bd47/EN/constitution-en.pdf. [Accessed: 21-Jan-2020].

[3] QUALINET. [Online]. Available: https://www.qualinet.eu. [Accessed: 21-Jan-2020].

[4] A. Hines and J. D. Kelleher, “A framework for post-stroke quality of life prediction using structured prediction,” 9th International Conference on Quality of Multimedia Experience, QoMEX 2017, Erfurt, Germany, June 2017.

[5] European Union Horizon 2020 research project PRECISE4Q, https://precise4q.eu/. [Accessed: 21-Jan-2020].

[6] World Health Organization, “Assistive devices and technologies,” 2017. [Online]. Available: http://www.who.int/disabilities/technology/en/. [Accessed: 21-Jan-2020].

[7] D. Pereira Salgado, F. Roque Martins, T. Braga Rodrigues, C. Keighrey, R. Flynn, E. L. Martins Naves, and N. Murray, “A QoE assessment method based on EDA, heart rate and EEG of a virtual reality assistive technology system”, In Proceedings of the 9th ACM Multimedia Systems Conference (Demo Paper), pp. 517-520, 2018.

[8] C. Keighrey, R. Flynn, S. Murray, and N. Murray, “A QoE Evaluation of Immersive Augmented and Virtual Reality Speech & Language Assessment Applications”, 9th International Conference on Quality of Multimedia Experience, QoMEX 2017, Erfurt, Germany, June 2017.

[9] American Speech-Language-Hearing Association, “Scope of Practice in Speech-Language Pathology,” 2016. [Online]. Available: http://www.asha.org/uploadedFiles/SP2016-00343.pdf. [Accessed: 21-Jan-2020].

[10] J. Reilly, “Semantic Memory and Language Processing in Aphasia and Dementia,” Seminars in Speech and Language, vol. 29, no. 1, pp. 3-4, 2008.

[11] A. Ragano, E. Benetos, and A. Hines, “Adapting the Quality of Experience Framework for Audio Archive Evaluation,” 11th International Conference on Quality of Multimedia Experience, QoMEX 2019, Berlin, Germany, June 2019.

[12] QoMEX 2020, Athlone, Ireland. [Online]. Available: https://www.qomex2020.ie. [Accessed: 21-Jan-2020].

Report on QoMEX 2019: QoE and User Experience in Times of Machine Learning, 5G and Immersive Technologies


QoMEX 2019 was held from 5 to 7 June 2019 in Berlin, with Sebastian Möller (TU Berlin and DFKI) and Sebastian Egger-Lampl (AIT Vienna) as general chairs. The annual conference celebrated its 10th anniversary in Berlin, the first edition having taken place in 2009 in San Diego. That first edition focused on classic multimedia voice and video services. Among the fundamental questions back then: how can quality be measured and quantified from the user’s point of view in order to improve such services? Answers to these questions were also presented and discussed at QoMEX 2019, where technical developments and innovations in video and voice quality were considered. The scope has, however, broadened significantly over the last decade: interactive applications, games and immersive technologies, which require new methods for the subjective assessment of perceived quality of service and QoE, were addressed. With a focus on 5G and its implications for QoE, the influence of communication networks and network conditions on the transmission of data and the provisioning of services was also examined. In this sense, QoMEX 2019 looked at both classic multimedia applications such as voice, audio and video and interactive and immersive services: gaming QoE, virtual reality applications such as VR exergames, augmented reality applications such as smart shopping, 360° video, point clouds, Web QoE, text QoE, perception of medical ultrasound videos by radiologists, QoE of visually impaired users with appropriately adapted videos, QoE in smart home environments, and more.

In addition to this application-oriented perspective, methodological approaches and fundamental models of QoE were also discussed at QoMEX 2019. While suitable methods for carrying out user studies and assessing quality remain core topics of QoMEX, advanced statistical methods and machine learning (ML) techniques emerged as another focus topic at this year’s edition. The applicability, performance and accuracy of, for example, neural networks and deep learning approaches were studied for a wide variety of QoE models and in several domains: video quality in games, image quality and compression methods, quality metrics for high-dynamic-range (HDR) images, instantaneous QoE for adaptive video streaming over the Internet and in wireless networks, speech quality metrics, and ML-based voice quality improvement. Research questions addressed at QoMEX 2019 included the impact of crowdsourcing study design on the outcomes, and the reliability of crowdsourcing, for example in assessing voice quality. In addition to such data-driven approaches, fundamental theoretical work on QoE and its quantification in systems, as well as fundamental relationships and modelling approaches, were presented.

The TPC Chairs were Lynne Baillie (HWU Edinburgh), Tobias Hoßfeld (Univ. Würzburg), Katrien De Moor (NTNU Trondheim), and Raimund Schatz (AIT Vienna). In total, the program included 11 sessions on the above topics, 6 of which were special sessions on dedicated topics organized via an open call. A total of 82 full paper contributions were submitted, out of which 35 were accepted (acceptance rate: 43%). Out of the 77 short papers submitted, 33 were accepted and presented in two dedicated poster sessions. The QoMEX 2019 Best Paper Award went to Dominik Keller, Tamara Seybold, Janto Skowronek and Alexander Raake for “Assessing Texture Dimensions and Video Quality in Motion Pictures using Sensory Evaluation Techniques”. The Best Student Paper Award went to Alexandre De Masi and Katarzyna Wac for “Predicting Quality of Experience of Popular Mobile Applications in a Living Lab Study”.

The keynote speakers addressed several timely topics. Irina Cotanis gave an inspiring talk on QoE in 5G, addressing both the emerging challenges and services in 5G and the question of how to measure quality and QoE in these networks. Katrien De Moor highlighted the similarities and differences between QoE and User Experience (UX), tracing the evolution and current status of the two terms. She discussed an integrated view of QoE and UX and how the two concepts may develop in the future. In particular, she posed the question of how the two communities could empower each other and what would be needed to bring them together in the future. The final day of QoMEX 2019 began with the keynote of artist Martina Menegon, who presented some of her art projects based on VR technology.

Additional activities and events within QoMEX 2019 comprised the following. (1) In the Speed PhD mentoring organized by Sebastian Möller and Saman Zadtootaghaj, participating doctoral students could apply for a short mentoring session (10 minutes per mentor) with various researchers from industry and academia in order to ask technical or general questions. (2) In a session organized by Sebastian Egger-Lampl, the best works of the last 5 years of the TVX conference, which took place at the same time, and of QoMEX were presented to show the similarities and differences between the QoE and UX communities. This was followed by a panel discussion. (3) There was a 3-minute madness session organized by Raimund Schatz and Tobias Hoßfeld, which featured short presentations of “crazy” new ideas in a stimulating atmosphere. The intention of this session was to playfully encourage the QoMEX community to generate new unconventional ideas and approaches and to provide a forum for mutual creative inspiration.

The next edition, QoMEX 2020, will be held from May 26th to 28th, 2020, in Athlone, Ireland. More information: http://qomex2020.ie/