Challenges in Experiencing Realistic Immersive Telepresence


Immersive imaging technologies offer a transformative way to experience and interact with remote environments, i.e., telepresence. By leveraging advancements in light field imaging, omnidirectional cameras, and head-mounted displays, these systems enable realistic, real-time visual experiences that can revolutionize how we interact with the remote scene in fields such as healthcare, education, remote collaboration, and entertainment. However, the field faces significant technical and experiential challenges, including efficient data capture and compression, real-time rendering, and quality of experience (QoE) assessment. Expanding on the findings of the authors’ recent publication and situating them within a broader theoretical framework, this article provides an integrated overview of immersive telepresence technologies, focusing on their technological foundations, applications, and the challenges that must be addressed to advance this field.

1. Redefining Telepresence Through Immersive Imaging

Telepresence is defined as the “sense of being physically present at a remote location through interaction with the system’s human interface” [Minsky1980]. Such virtual presence is made possible by digital imaging systems and real-time communication of visuals and interaction signals. Immersive imaging systems such as light fields and omnidirectional imaging enhance the visual sense of presence, i.e., “being there” [IJsselsteijn2000], with photorealistic recreation of the remote scene. This emerging field has seen rapid growth, both in research and development [Valenzise2022], due to advancements in imaging and display technologies, combined with increasing demand for interactive and immersive experiences. Figure 1 visualizes a telepresence system that uses traditional cameras and controls alongside an immersive telepresence system.

Figure 1 – A side-by-side visualization of a traditional telepresence system (left) and an immersive telepresence system (right).

The experience of “presence” consists of three components according to Schubert et al. [Schubert2001], which are renamed in this article to take into account other definitions:

  1. Realness – “Realness” [Schubert2001] or “realism” [Takatalo2008] of the environment (i.e., in this case, the remote scene) relates to the “believability, the fidelity and validity of sensory features within the generated environments, e.g., photorealism” [Perkis2020].
  2. Immersion – User’s level of “involvement” [Schubert2001] and “concentration to the virtual environment instead of real world, loss of time” [Takatalo2008]. “The combination of sensory cues with symbolic cues essential for user emplacement and engagement” [Perkis2020].
  3. Spatiality – An attribute of the environment that helps “transport” the user by inducing spatial awareness [Schubert2001], which allows “spatial presence” [Takatalo2008] and “the possibility for users to move freely and discover the world offered” [Perkis2020].

Immersion can occur without realness or spatiality, for example, while reading a novel. Telepresence using traditional imaging systems might not be immersive if the display is relatively small and other distractors are present in the visual field. Realistic immersive telepresence necessitates higher degrees of freedom (e.g., 3DoF+ or 6DoF) compared to a telepresence application with a traditional display. In this context, new view synthesis methods and spherical light field representations (cf. Section 3) will be crucial in providing correct depth cues and depth perception, which will greatly increase realness and spatiality.

The rapid progress of immersive imaging technologies and their adoption can largely be attributed to advancements in processing and display systems, including light field displays and extended reality (XR) headsets. These XR headsets are becoming increasingly affordable while delivering excellent user experiences [Jackson2023], paving the way for the widespread adoption of immersive communication and telepresence applications in the near future. To further accelerate this transition, extensive efforts are being undertaken in both academia and industry.

The visual realism (i.e., realness) in realistic immersive telepresence relies on acquired photos rather than computer-generated imagery (CGI). In healthcare, it enables realistic remote consultations and surgical collaborations [Wisotzky2025]. In education and training, it facilitates immersive, location-independent learning environments [Kachach2021]. Similarly, visual realism can enhance remote collaboration by creating lifelike meeting spaces, while in media and entertainment, it can provide unprecedented realism for live events and performances, offering users a closer connection and a feeling of being present at remote sites.

This article provides a brief overview of the technological foundations, applications, and challenges in immersive telepresence. The novel contribution of this article is setting up the theoretical framework for realistic immersive telepresence informed by prior literature and positioning the findings of the authors’ recent publication [Zerman2024] within this broader theoretical framework. It explores how foundational technologies like light field imaging and real-time rendering drive the field forward, while also identifying critical obstacles, such as dataset availability, compression efficiency, and QoE evaluation.

2. Technological Foundations for Immersive Telepresence

A realistic immersive telepresence can be made possible by enabling its main defining factors of realness (e.g., photorealism), immersion, and spatiality. Although these factors can be satisfied with other modalities (e.g., spatial audio), this article focuses on the visual modality and visual recreation of the remote scene.

2.1 Immersive Imaging Modalities

Immersive imaging technologies encompass a wide range of methods aimed at capturing and recreating realistic visual and spatial experiences. These include light fields, omnidirectional images, volumetric videos using either point clouds or 3D meshes, holography, multi-view stereo imaging, neural radiance fields, Gaussian splats, and other extended reality (XR) applications — all of which contribute to recreating highly realistic and interactive representations of scenes and environments.

Light fields (LF) are vector fields of all the light rays passing through a given region in space, describing the intensity and direction of light at every point. This is fully described through the plenoptic function [Adelson1991] as follows: P(x,y,z,θ,ϕ,λ,t), where x, y, and z describe the 3D position of sampling, θ and ϕ are the angular direction, λ is the wavelength of the light ray, and t is time. Traditionally, LFs are represented using the two-plane parametrization [Levoy1996] with 2 spatial dimensions and 2 angular dimensions; however, this parametrization limits the use case of LFs to processing planar visual stimuli. The plenoptic function can be leveraged beyond the two-plane parameterization for a highly detailed view reconstruction or view synthesis. Newer capture scenarios and representations enable increased immersion with LFs [Overbeck2018],[Broxton2020], which can be further advanced in the future.
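As an illustration, a discretely sampled two-plane light field can be treated as a 4D array indexed by angular and spatial coordinates. The sketch below (with made-up dimensions and random data standing in for captured radiance) shows how fixing the angular coordinates yields a single sub-aperture view:

```python
import numpy as np

# Hypothetical discretely sampled two-plane light field L(u, v, s, t):
# (u, v) indexes the camera plane, (s, t) the image plane. Dimensions
# and random data below are placeholders for captured radiance.
U, V, S, T = 9, 9, 64, 64                 # 9x9 angular grid, 64x64 spatial
rng = np.random.default_rng(0)
lf = rng.random((U, V, S, T, 3))          # RGB radiance samples

# Fixing the angular coordinates (u, v) yields one sub-aperture view,
# i.e., the 2D image seen from a single position on the camera plane.
center_view = lf[U // 2, V // 2]          # shape (64, 64, 3)

# A naive in-between view: blend two neighboring angular samples
# (ignores parallax; proper rendering warps rays instead).
blended = 0.5 * lf[4, 4] + 0.5 * lf[4, 5]
```

A disparity-aware renderer would warp rather than blend neighboring views; the naive blend here merely illustrates how angular samples relate to viewpoints.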

Omnidirectional image (or video) representation can provide an all-encompassing 360-degree view of a scene from a point in space for immersive visualization [Yagi1999], [Maugey2023]. This is made possible by stitching multiple views together. The resulting spherical image can be stored using traditional image formats (i.e., 2D planar formats) by projecting the sphere onto a plane (e.g., equirectangular projection, cubemap projection, and others); however, processing these projected representations without proper consideration of their spherical nature results in errors or biases.
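To make the projection step concrete, the following sketch maps a viewing direction to pixel coordinates under the equirectangular projection; the function name and conventions are illustrative, not taken from any specific library:

```python
import math

def dir_to_equirect(theta, phi, width, height):
    """Map a viewing direction to equirectangular pixel coordinates.

    theta: azimuth in [-pi, pi); phi: elevation in [-pi/2, pi/2].
    Pixels near the poles cover far less solid angle than pixels at
    the equator, which is why naive planar processing of the projected
    image introduces errors or biases.
    """
    x = (theta + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - phi) / math.pi * height
    return x, y

# The forward direction (theta = 0, phi = 0) lands at the image center.
```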

2.2 Processing Requirements for Realistic Immersive Telepresence

Immersive telepresence relies on capturing, transmitting, and rendering realistic representations of remote environments. “Capturing” can be considered an inherent part of the imaging modalities discussed in the previous section. For transmitting and rendering, there are different requirements to take into account.

Compression is an important step for telepresence, which relies heavily on real-time transmission of the visual data from the remote scene. The importance of compression increases even further for immersive telepresence, as immersive imaging modalities capture (and represent) more information and therefore require stronger compression than telepresence using traditional 2D imaging systems. Compression of LFs [Stepanov2023], omnidirectional images and video [Croci2020], and other forms of immersive video such as MPEG Immersive Video [Boyce2021], volumetric 3D representations using point clouds [Graziosi2020], and textured 3D meshes [Marvie2022] has been a highly active research topic over the last decade, which has led to the standardization of compression methods for some immersive imaging modalities.

Rendering [Eisert2023], [Maugey2023] is yet another important aspect, especially for LFs [Overbeck2018]. The LF data needs to be rendered correctly for the position of the viewer (i.e., to render interpolated or extrapolated views) to provide a realistic and immersive experience to the user. Without such view rendering, the displayed visuals will appear jittery, making it harder to sustain the “suspension of disbelief” necessary for an immersive experience. Furthermore, this rendering has to run in real time, as required for telepresence. Although technologies such as GPU acceleration and advanced compression algorithms ensure seamless interaction while minimizing latency, the quality and realness of the rendered remote scene remain open problems.
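A minimal sketch of position-dependent rendering, assuming views captured along a line, illustrates why the viewer's position must drive the rendering; real systems warp pixels using depth or disparity instead of the naive blending used here:

```python
import numpy as np

def interpolate_view(views, positions, viewer_x):
    """Render for a viewer located between captured camera positions.

    views: list of HxWx3 images captured along a line; positions:
    their sorted 1D camera coordinates. Real LF renderers warp pixels
    using depth/disparity; this linear blend of the two nearest views
    only illustrates position-dependent rendering. Without it, the
    display would snap between discrete captures and appear jittery.
    """
    idx = int(np.clip(np.searchsorted(positions, viewer_x),
                      1, len(positions) - 1))
    p0, p1 = positions[idx - 1], positions[idx]
    w = (viewer_x - p0) / (p1 - p0)
    return (1.0 - w) * views[idx - 1] + w * views[idx]
```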

Immersive telepresence systems rely on specialized hardware, including omnidirectional cameras, head-mounted displays, and motion tracking systems. These components must work in harmony to deliver high-quality, immersive experiences. Falling prices and increasing availability of such specialized devices make them easier to deploy in industrial settings [Jackson2023] regardless of business size and enable the democratization of immersive imaging applications in a broader sense.

3. Efforts in Creating a Realistic Immersive Telepresence Experience

Creating an immersive telepresence system has been the topic of many scholarly studies. These include frameworks for group-to-group telepresence [Beck2013], capture and delivery frameworks for volumetric 3D models [Fechteler2013], and various other social XR applications [Cortés2024]. Google’s Project Starline can also be mentioned here, as it incorporates realness and immersion in its delivery of the visuals, creating an immersive experience [Lawrence2024], [Starline2025], although its main functionality is interpersonal video communication. In support of realness, LFs [Broxton2020] and other types of neural representations [Suhail2022] can create views that reproduce reflections and similar non-Lambertian light–material interactions occurring in the remote scene, whereas texturing of reconstructed 3D objects usually assumes Lambertian materials [Zhi2020].

Light field reconstruction [Gond2023] and new view synthesis from a single view [Lin2023] or sparse views [Chibane2021] can be valid ways to approach creating realistic immersive telepresence experiences. Various representations can be used to recreate views that support the movement of the user and the spatial awareness factor of presence in the remote scene. These representations include Multi-Planar Image (MPI) [Srinivasan2019], Multi-Cylinder Image (MCI) [Waidhofer2022], layered mesh representation [Broxton2020], and neural representations [Chibane2021], [Lin2023], [Gond2023], which rely on structured or unstructured 2D image captures of the remote scene.

Another way of creating a realistic immersive experience is to combine different imaging modalities, i.e., omnidirectional content and light fields, in the form of spherical light fields (SLFs). SLFs then enable rendering and view synthesis that can generate more realistic and immersive content. There have been various attempts to create SLFs by collecting linear captures vertically [Krolla2014], capturing omnidirectional content from the scene with multiple cameras [Maugey2019], and moving a single camera along a circular trajectory and utilizing deep neural networks to generate an image grid [Lo2023]. Nevertheless, these works either did not yield publicly available datasets or lacked precise camera localization. To address this, the Spherical Light Field Database (SLFDB) was introduced in previous work [Zerman2024], providing a foundational dataset for testing and developing realistic immersive telepresence applications.

4. Challenges and Limitations

Studies on creating realistic immersive telepresence environments showed that certain challenges and limitations still need to be addressed to improve the QoE and immersive media experience (IMEx) for these systems. These challenges include dataset availability, compression of structured and unstructured LFs, new view synthesis and rendering, and QoE estimation. Most of these challenges are also discussed in our recent study [Zerman2024].

Figure 2 – A set of captures highlighting the effects of dynamically changing scene: lighting change and its effect on white balance (top) and dynamic capture environment, where people appear and disappear (bottom).

Datasets relevant to realistic immersive telepresence tasks, such as the SLFDB [Zerman2024], are crucial for developing and validating immersive telepresence technologies. However, creating and using such datasets with high spatial and angular resolution and very precise camera positioning faces significant hurdles. Traditional camera grid setups are ineffective for capturing spherical light fields due to occlusions. This necessitates static scenes and meticulous camera positioning for a consistent capture of the scene. A dynamic scene brings the risk of inconsistent views within the same light field, as shown in Figure 2, which is non-ideal. Additionally, variations in lighting present significant challenges when capturing spherical light fields, as they impact the scene’s dynamic range, white balance, and color grading, creating yet another challenge in database creation. Brightness and color variations, such as sunlight’s yellow tint compared to cloudy daylight, are not easy to correct and often require advanced algorithms for adjustment. Capturing static outdoor scenes remains a challenge for future work, as they still encounter lighting-related issues despite lacking movement. These challenges highlight the critical need for innovative approaches to spherical light field dataset generation and sharing to ensure future advancements in the field.
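As a simple illustration of the color-consistency problem, a gray-world white balance can equalize channel means across captures; practical pipelines need far more robust (often learned) correction, so the sketch below is only an assumption-laden illustration:

```python
import numpy as np

def gray_world_balance(img):
    """Gray-world white balance: scale each RGB channel so that all
    channel means become equal. A deliberately simple stand-in for
    the per-capture color correction that inconsistent lighting
    forces on a light field dataset.
    """
    img = img.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel averages
    gain = means.mean() / means               # equalizing gains
    return np.clip(img * gain, 0.0, 1.0)
```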

LF compression is another challenge that requires attention after combining imaging modalities. The JPEG Pleno compression algorithm [ISO2021] is designed for 2-dimensional grid-like structured LFs (e.g., LFs captured by a microlens array or structured camera grids) and does not work for linear or unstructured captures. The situation is the same for many other compression methods, as most of them require some form of structured representation. Considering how well scene regression and other new view synthesis algorithms can adapt to unstructured inputs, one can see the importance of advancing compression for unstructured LFs (e.g., the volume of light captured by cameras in various positions or in-the-wild user captures). Furthermore, any such LF compression method needs to run in real time to support immersive telepresence applications while maintaining a visual QoE good enough not to impede realism.

Figure 3 – Strong artifacts created at the extremes of view synthesis with a large baseline (i.e., 30 cm), where either the scene is warped (left – 360ViewSynth) or strong ghosting artifacts occur (right – PanoSynthVR).

Current new view synthesis methods are primarily designed to handle small baselines, typically just a few centimeters, and face significant challenges when applied to larger baselines required in telepresence applications. Challenges such as ghosting artifacts and unrealistic distortions (e.g., nonlinear distortions, stretching) occur when interpolating views, particularly for larger baselines, as shown in Figure 3. A recent comparative evaluation of PanoSynthVR and 360ViewSynth [Zerman2024] reveals that while 360ViewSynth marginally outperforms PanoSynthVR on average quality metrics, the scores for both methods remain suboptimal. PanoSynthVR struggles with large baselines, exhibiting prominent layer-like ghosting artifacts due to limitations in its MCI structure. Although 360ViewSynth produces visually better results, closer inspection shows that it distorts object perspectives by stretching them rather than accurately rendering the scene, leading to an unnatural user experience. These findings underscore the limitations of current state-of-the-art view synthesis methods for SLFs and highlight the complexity of addressing larger baselines effectively in view synthesis.

Assessing user satisfaction and immersion in telepresence systems is a multidimensional challenge, requiring assessments in three different strands as described in the IMEx white paper: subjective assessment, behavioral assessment, and assessment via psycho-physiological methods [Perkis2020]. Quantitative metrics can be used for interaction latency and task performance in a user study, and individual preferences and experiences can be collected qualitatively. Certain aspects of user experience, such as visual quality and user engagement, can also be collected as quantitative data during user studies via user self-reporting. Additionally, behavioral assessment (e.g., user movement, interaction patterns) can be used to identify different use patterns. Here, the limiting factor is mainly the time and expense of running such user studies. Therefore, the challenge is to prepare a framework that can model the user experience for realistic immersive telepresence scenarios, which can speed up the assessment strategies.
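For the quantitative, self-reported strand, aggregation is straightforward; the sketch below computes a Mean Opinion Score and a 95% confidence interval from hypothetical 5-point ACR ratings (invented numbers, for illustration only):

```python
import statistics

# Hypothetical 5-point ACR ratings collected via self-reporting
# in a user study (invented numbers, for illustration only).
ratings = [4, 5, 3, 4, 4, 5, 2, 4, 3, 4]

mos = statistics.mean(ratings)                            # Mean Opinion Score
ci95 = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
print(f"MOS = {mos:.2f} +/- {ci95:.2f} (95% CI)")
```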

Other limitations and aspects to consider include accessibility, privacy issues, and ethics. Regarding accessibility, it is important to ensure that immersive telepresence technologies are affordable and usable by diverse populations. The situation is improving as the cameras and headsets are getting cheaper and easier to use (e.g., faster and stronger on-device processing, removal of headset connection cables, increased ease of use with hand gestures, etc.). Nevertheless, hardware costs, connectivity requirements, and usability barriers must be further addressed to make these systems widely accessible. Regarding privacy and ethics, the realistic nature of immersive telepresence may raise ethical and privacy concerns. Capturing and transmitting live environments may involve sensitive data, necessitating robust privacy safeguards and ethical guidelines to prevent misuse. Also, privacy concerns regarding the headsets that rely on visual cameras for localization and mapping must be addressed.

5. Conclusions and Future Directions

Realistic immersive telepresence systems represent a transformative shift in how people interact with remote environments. By combining advanced imaging, rendering, and interaction technologies, these systems promise to revolutionize industries ranging from healthcare to entertainment. However, significant challenges remain, including data availability, compression, rendering, and QoE assessment. Addressing these obstacles will require collaboration across disciplines and industries.

To address these challenges, future research should focus on creating relevant spherical LF datasets with accurate camera positioning that address challenges such as dynamic lighting conditions and occlusions. Developing real-time, robust compression methods for unstructured LFs, which maintain visual quality and support immersive applications, is another critical area. Developing advanced view synthesis algorithms capable of handling large baselines without introducing artifacts or distortions, and creating frameworks for user experience and QoE assessment methodologies, are still open research questions.

Further into the future, the remaining challenges can be tackled by using learning-based algorithms for the realness and spatiality factors as well as QoE estimation, by increasing the level of interactivity and the feeling of immersion through integrating additional senses into existing systems (e.g., spatial audio, haptics, natural interfaces), and by increasing standardization to create common frameworks that manage interoperability across different systems. Long-term goals include the integration of realistic immersive displays, such as LF displays or improved holographic displays, and the convergence of telepresence systems with emerging technologies like 5G or 6G networks and edge computing, on which efforts are already underway [Mahmoud2023].

References

  • [Adelson1991] Adelson, E. H., & Bergen, J. R. (1991). The plenoptic function and the elements of early vision (Vol. 2). Cambridge, MA, USA: Vision and Modeling Group, Media Laboratory, Massachusetts Institute of Technology.
  • [Beck2013] Beck, S., Kunert, A., Kulik, A., & Froehlich, B. (2013). Immersive group-to-group telepresence. IEEE transactions on visualization and computer graphics, 19(4), 616-625.
  • [Boyce2021] Boyce, J. M., Doré, R., Dziembowski, A., Fleureau, J., Jung, J., Kroon, B., … & Yu, L. (2021). MPEG immersive video coding standard. Proceedings of the IEEE, 109(9), 1521-1536.
  • [Broxton2020] Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., … & Debevec, P. (2020). Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG), 39(4), 86-1.
  • [Chibane2021] Chibane, J., Bansal, A., Lazova, V., & Pons-Moll, G. (2021). Stereo radiance fields (SRF): Learning view synthesis for sparse views of novel scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7911-7920).
  • [Cortés2024] Cortés, C., Pérez, P., & García, N. (2023). Understanding latency and QoE in social XR. IEEE Consumer Electronics Magazine.
  • [Croci2020] Croci, S., Ozcinar, C., Zerman, E., Knorr, S., Cabrera, J., & Smolic, A. (2020). Visual attention-aware quality estimation framework for omnidirectional video using spherical Voronoi diagram. Quality and User Experience, 5, 1-17.
  • [Eisert2023] Eisert, P., Schreer, O., Feldmann, I., Hellge, C., & Hilsmann, A. (2023). Volumetric video – acquisition, interaction, streaming and rendering. In Immersive Video Technologies (pp. 289-326). Academic Press.
  • [Fechteler2013] Fechteler, P., Hilsmann, A., Eisert, P., Broeck, S. V., Stevens, C., Wall, J., … & Zahariadis, T. (2013, June). A framework for realistic 3D tele-immersion. In Proceedings of the 6th International Conference on Computer Vision/Computer Graphics Collaboration Techniques and Applications.
  • [Gond2023] Gond, M., Zerman, E., Knorr, S., & Sjöström, M. (2023, November). LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image. In Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (pp. 1-10).
  • [Graziosi2020] Graziosi, D., Nakagami, O., Kuma, S., Zaghetto, A., Suzuki, T., & Tabatabai, A. (2020). An overview of ongoing point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC). APSIPA Transactions on Signal and Information Processing, 9, e13.
  • [IJsselsteijn2000] IJsselsteijn, W. A., De Ridder, H., Freeman, J., & Avons, S. E. (2000, June). Presence: concept, determinants, and measurement. In Human Vision and Electronic Imaging V (Vol. 3959, pp. 520-529). SPIE.
  • [ISO2021] ISO/IEC 21794-2:2021 (2021) Information technology – Plenoptic image coding system (JPEG Pleno) — Part 2: Light field coding.
  • [Jackson2023] Jackson, A. (2023, September) Meta Quest 3: Can businesses use VR day-to-day?, Technology Magazine. https://technologymagazine.com/digital-transformation/meta-quest-3-can-businesses-use-vr-day-to-day, Accessed: 2024-02-05.
  • [Kachach2021] Kachach, R., Orduna, M., Rodríguez, J., Pérez, P., Villegas, Á., Cabrera, J., & García, N. (2021, July). Immersive telepresence in remote education. In Proceedings of the International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE’21) (pp. 21-24).
  • [Krolla2014] Krolla, B., Diebold, M., Goldlücke, B., & Stricker, D. (2014, September). Spherical Light Fields. In BMVC (No. 67.1–67.12).
  • [Lawrence2024] Lawrence, J., Overbeck, R., Prives, T., Fortes, T., Roth, N., & Newman, B. (2024). Project starline: A high-fidelity telepresence system. In ACM SIGGRAPH 2024 Emerging Technologies (pp. 1-2).
  • [Levoy1996] Levoy, M. & Hanrahan, P. (1996) Light field rendering, in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (pp. 31-42), New York, NY, USA, Association for Computing Machinery.
  • [Lin2023] Lin, K. E., Lin, Y. C., Lai, W. S., Lin, T. Y., Shih, Y. C., & Ramamoorthi, R. (2023). Vision transformer for nerf-based view synthesis from a single input image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 806-815).
  • [Lo2023] Lo, I. C., & Chen, H. H. (2023). Acquiring 360° Light Field by a Moving Dual-Fisheye Camera. IEEE Transactions on Image Processing.
  • [Mahmoud2023] Mahmood, A., Abedin, S. F., O’Nils, M., Bergman, M., & Gidlund, M. (2023). Remote-timber: an outlook for teleoperated forestry with first 5g measurements. IEEE Industrial Electronics Magazine, 17(3), 42-53.
  • [Marvie2022] Marvie, J. E., Krivokuća, M., Guede, C., Ricard, J., Mocquard, O., & Tariolle, F. L. (2022, September). Compression of time-varying textured meshes using patch tiling and image-based tracking. In 2022 10th European Workshop on Visual Information Processing (EUVIP) (pp. 1-6). IEEE.
  • [Maugey2019] Maugey, T., Guillo, L., & Cam, C. L. (2019, June). FTV360: A multiview 360° video dataset with calibration parameters. In Proceedings of the 10th ACM Multimedia Systems Conference (pp. 291-295).
  • [Maugey2023] Maugey, T. (2023). Acquisition, representation, and rendering of omnidirectional videos. In Immersive Video Technologies (pp. 27-48). Academic Press.
  • [Minsky1980] Minsky, M. (1980). Telepresence. Omni, pp. 45-51.
  • [Overbeck2018] Overbeck, R. S., Erickson, D., Evangelakos, D., Pharr, M., & Debevec, P. (2018). A system for acquiring, processing, and rendering panoramic light field stills for virtual reality. ACM Transactions on Graphics (TOG), 37(6), 1-15.
  • [Perkis2020] Perkis, A., Timmerer, C., et al. (2020, May) “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), Online: https://arxiv.org/abs/2007.07032
  • [Schubert2001] Schubert, T., Friedmann, F., & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 10(3), 266-281.
  • [Srinivasan2019] Srinivasan, P. P., Tucker, R., Barron, J. T., Ramamoorthi, R., Ng, R., & Snavely, N. (2019). Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 175-184).
  • [Starline2025] Project Starline: Be there from anywhere with our breakthrough communication technology. (n.d.). Online: https://starline.google/. Accessed: 2025-01-14
  • [Stepanov2023] Stepanov, M., Valenzise, G., & Dufaux, F. (2023). Compression of light fields. In Immersive Video Technologies (pp. 201-226). Academic Press.
  • [Suhail2022] Suhail, M., Esteves, C., Sigal, L., & Makadia, A. (2022). Light field neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8269-8279).
  • [Takatalo2008] Takatalo, J., Nyman, G., & Laaksonen, L. (2008). Components of human experience in virtual environments. Computers in Human Behavior, 24(1), 1-15.
  • [Valenzise2022] Valenzise, G., Alain, M., Zerman, E., & Ozcinar, C. (Eds.). (2022). Immersive Video Technologies. Academic Press.
  • [Waidhofer2022] Waidhofer, J., Gadgil, R., Dickson, A., Zollmann, S., & Ventura, J. (2022, October). PanoSynthVR: Toward light-weight 360-degree view synthesis from a single panoramic input. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 584-592). IEEE.
  • [Wisotzky2025] Wisotzky, E. L., Rosenthal, J. C., Meij, S., van den Dobblesteen, J., Arens, P., Hilsmann, A., … & Schneider, A. (2025). Telepresence for surgical assistance and training using eXtended reality during and after pandemic periods. Journal of telemedicine and telecare, 31(1), 14-28.
  • [Yagi1999] Yagi, Y. (1999). Omnidirectional sensing and its applications. IEICE transactions on information and systems, 82(3), 568-579.
  • [Zerman2024] Zerman, E., Gond, M., Takhtardeshir, S., Olsson, R., & Sjöström, M. (2024, June). A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications. In 2024 16th International Conference on Quality of Multimedia Experience (QoMEX) (pp. 200-206). IEEE.
  • [Zhi2020] Zhi, T., Lassner, C., Tung, T., Stoll, C., Narasimhan, S. G., & Vo, M. (2020). TexMesh: Reconstructing detailed human texture and geometry from RGB-D video. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16 (pp. 492-509). Springer International Publishing.

From Theory to Practice: System QoE Assessment by Providers


Service and network providers actively evaluate and derive Quality of Experience (QoE) metrics within their systems, which necessitates suitable monitoring strategies. Objective QoE monitoring involves mapping Quality of Service (QoS) parameters into QoE scores, such as calculating Mean Opinion Scores (MOS) or Good-or-Better (GoB) ratios, by using appropriate mapping functions. Alternatively, individual QoE monitoring directly assesses user experience based on self-reported feedback. We discuss the strengths, weaknesses, opportunities, and threats of both approaches. Based on the collected data from individual or objective QoE monitoring, providers can calculate the QoE metrics across all users in the system, who are subjected to a range of varying QoS conditions. The aggregated QoE across all users in the system for a dedicated time frame is referred to as system QoE. Based on a comprehensive simulation study, the expected system QoE, the system GoB ratio, as well as the QoE fairness across all users are computed. Our numerical results explore whether objective and individual QoE monitoring lead to similar conclusions. In our previous work [Hoss2024], we provided a theoretical framework and the mathematical derivation of the corresponding relationships between QoS and system QoE for both monitoring approaches. Here, the focus is on illustrating the key differences between individual and objective QoE monitoring and their consequences in practice.

System QoE: Assessment of QoE of Users in a System

The term “System QoE” refers to the assessment of user experience from a provider’s perspective, focusing on the perceived quality of the users of a particular service. These providers may be different stakeholders along the service delivery chain, for example, network service providers (in particular, Internet service providers) or application service providers. QoE monitoring delivers the necessary information to evaluate the system QoE, which is the basis for appropriate actions to ensure high-quality services and high QoE, e.g., through resource and network management.

Typically, QoE monitoring and management involves evaluating how well the network and services perform by analyzing objective metrics such as Quality of Service (QoS) parameters (e.g., latency, jitter, packet loss) and mapping them to QoE metrics such as Mean Opinion Scores (MOS). QoE monitoring thus involves a series of steps that providers need to follow: 1) identify the relevant QoE metrics of interest, such as MOS or GoB ratio; 2) deploy a monitoring framework to collect and analyze data. We discuss both steps in the following.

The scope of system QoE metrics is to quantify the QoE across all users consuming the service over a dedicated time frame, e.g., one day, one week, or one month. Of interest are the expected QoE of an arbitrary user in the system, the ratio of users experiencing Good-or-Better (GoB) or Poor-or-Worse (PoW) quality, and the QoE fairness across all users. The users in the system may achieve different QoS on the network level, e.g., different latency, jitter, or throughput, since resources are shared among the users. The same is true on the application level, with varying application-specific QoS parameters such as video resolution, buffering time, or startup delay for video streaming. The varying QoS conditions then manifest in the system QoE. Fundamental relationships between system QoE and QoS metrics were derived in [Hoss2020].

Expected system QoE: The expected system QoE is the average QoE rating of an arbitrary user in the system. The fundamental relationship in [Hoss2020] shows that the expected system QoE may be derived by mapping the QoS as experienced by a user to the corresponding MOS value and computing the average MOS over the varying QoS conditions. Thus, a MOS mapping function is required to map the QoS parameters to MOS values.
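This relationship can be sketched in a few lines of Python. The mapping function below is a hypothetical IQX-style exponential decay for page load times, not the one from [Hoss2020]; the point is only that the expected system QoE is the mean of the mapped MOS values over the observed QoS conditions.

```python
import numpy as np

def mos_mapping(plt_s):
    """Hypothetical MOS mapping f: QoS -> MOS for page load time (seconds).
    An IQX-style exponential decay, clipped to the 5-point ACR range."""
    return np.clip(1 + 4 * np.exp(-0.4 * plt_s), 1, 5)

def expected_system_qoe(qos_samples):
    """Expected system QoE: average of f(QoS) over the observed QoS conditions."""
    return float(np.mean(mos_mapping(qos_samples)))

plts = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # page load times of five users
print(expected_system_qoe(plts))
```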

System GoB and System PoW: The Mean Opinion Score provides an average but does not capture user rating diversity: users experiencing the same QoS conditions may still rate them differently. Metrics like the percentage of users rating the experience as Good or Better or as Poor or Worse provide more granular insights. Such metrics help service providers understand not just the average quality, but how quality is distributed across the user base. The fundamental relationship in [Hoss2020] shows that the system GoB and PoW may be derived by mapping the QoS experienced by a user to the corresponding GoB or PoW value and averaging over the varying QoS conditions. Thus, a GoB or PoW mapping function is required.
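As a minimal illustration of these metrics on the user side, GoB and PoW are simple threshold shares over collected 5-point ratings (the rating values below are made up for the example):

```python
import numpy as np

ratings = np.array([5, 4, 4, 3, 2, 5, 1, 4, 3, 4])  # example 5-point ACR ratings

mos = ratings.mean()         # the average score alone hides the distribution
gob = np.mean(ratings >= 4)  # share rating "Good" (4) or "Excellent" (5)
pow_ = np.mean(ratings <= 2) # share rating "Poor" (2) or "Bad" (1)
print(mos, gob, pow_)        # -> 3.5 0.6 0.2
```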

QoE Fairness: Operators must not only ensure that users are sufficiently satisfied, but also that this is achieved in a fair manner. However, what is considered fair in the QoS domain does not necessarily translate to fairness in the QoE domain, which motivates a dedicated QoE fairness index. [Hoss2018] defines the QoE fairness index as a linear transformation of the standard deviation of MOS values to the range [0;1]. The observed standard deviation is normalized by the maximal standard deviation that is theoretically possible for MOS values in a finite range, typically between 1 (poor quality) and 5 (excellent quality). Subtracting this normalized standard deviation (the degree of unfairness) from 1 yields the fairness index, with 1 indicating perfect fairness.
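A minimal sketch of this index, following the description above: the standard deviation of the per-user MOS values is normalized by the maximum standard deviation attainable on a finite scale, which is (hi - lo)/2. Details of the exact definition follow [Hoss2018].

```python
import numpy as np

def qoe_fairness(mos_values, lo=1.0, hi=5.0):
    """QoE fairness index in [0, 1]: 1 minus the MOS standard deviation
    normalized by the maximum possible std (hi - lo) / 2 on the scale."""
    sigma = np.std(mos_values)    # population std of the per-user MOS values
    sigma_max = (hi - lo) / 2     # largest std attainable on [lo, hi]
    return 1 - sigma / sigma_max

print(qoe_fairness([3.0, 3.0, 3.0]))  # identical MOS -> perfect fairness 1.0
print(qoe_fairness([1.0, 5.0]))       # maximally spread -> fairness 0.0
```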

The fundamental relationships allow different implementations of QoE monitoring in practice, which are visualized in Figure 1 and discussed in the following. We differentiate between individual QoE monitoring and objective QoE monitoring and provide a qualitative strengths-weaknesses-opportunities-threats (SWOT) analysis.

Figure 1. QoE monitoring approaches to assess system QoE: individual and objective QoE monitoring.

Individual QoE Monitoring

Individual QoE monitoring refers to the assessment of system QoE by collecting individual ratings, e.g., on a 5-point rating scale, from users through their personal feedback. This approach captures the unique and individual nature of user experiences, accounting for factors like personal preferences and context. It allows optimizing services in a personalized manner, which is regarded as a challenging future research objective, see [Schmitt2017, Zhu2018, Gao2020, Yamazaki2021, Skorin-Kapov2018].

The term “individual QoE” was nicely described in [Zhu2018]: “QoE, by definition, is supposed to be subjective and individual. However, we use the term ‘individual QoE’, since the majority of the literature on QoE has not treated it as such. […] The challenge is that the set of individual factors upon which an individual’s QoE depends is not fixed; rather this (sub)set varies from one context to another, and it is this what justifies even more emphatically the individuality and uniqueness of a user’s experience – hence the term ‘individual QoE’.”

Strengths: Individual QoE monitoring provides valuable insights into how users personally experience a service, capturing the variability and uniqueness of individual perceptions that objective metrics often miss. A key strength is that it gathers direct feedback from a provider’s own users, ensuring a representative sample rather than relying on external or unrepresentative populations. Additionally, it does not require a predefined QoE model, allowing for flexibility in assessing user satisfaction. This approach enables service providers to directly derive various system QoE metrics.

Weaknesses: Individual QoE monitoring is mainly feasible for application service providers and requires additional monitoring efforts beyond the typical QoS tools already in place. Privacy concerns are significant, as collecting sensitive user data can raise issues with data protection and regulatory compliance, such as with GDPR. Additionally, users may use the system primarily as a complaint tool, focusing on reporting negative experiences, which could skew results. Feedback fatigue is another challenge, where users may become less willing to provide ongoing input over time, limiting the validity and reliability of the data collected.

Opportunities: Data from individual QoE monitoring can be utilized to enhance individual user QoE through better resource and service management. From a business perspective, offering personalized QoE can set providers apart in competitive markets, and the collected data has monetization potential, supporting personalized marketing. By correlating individual ratings with QoS parameters, providers can derive objective metrics like MOS or GoB, update existing QoE models, or develop new QoE models for novel services. Those insights can drive innovation, leading to new features or services that meet evolving customer needs.

Threats: Individual QoE monitoring accounts for factors outside the provider’s control, such as environmental context (e.g., noisy surroundings [Reichl2015, Jiménez2020]), which may affect user feedback but does not reflect actual service performance. Additionally, as mentioned, it may be used as a complaint tool, with users disproportionately reporting negative experiences. There is also the risk of over-engineering solutions by focusing too much on minor individual issues, potentially diverting resources from more significant, system-wide challenges that could have a broader impact on overall service quality.

Objective QoE Monitoring

Objective QoE monitoring involves assessing user experience by translating measurable QoS parameters at the network level, such as latency, jitter, and packet loss, and at the application level, such as video resolution or stalling duration for video streaming, into QoE metrics using predefined models and mapping functions. Unlike individual QoE monitoring, it does not require direct user feedback and instead relies on technically measurable parameters to estimate user satisfaction and various QoE metrics [Hoss2016]. Thereby, the fundamental relationships between system QoE and QoS [Hoss2020] are utilized. For computing the expected system QoE, a MOS mapping function is required, which maps a dedicated QoS value to a MOS value. For computing the system GoB, a GoB mapping function between QoS and GoB is required. Note that the QoS may be a vector of various QoS parameters, which are the input values for the mapping function.

Recent work [Hoss2022] indicates that industrial user experience index values, as obtained by the Threshold-Based Quality (TBQ) model for QoE monitoring, may be accurate enough to derive system QoE metrics. The TBQ model is a framework that defines application-specific thresholds for QoS parameters to assess and classify the user experience; such thresholds may be derived with simple and interpretable machine learning models like decision trees.
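A TBQ-style classifier can be sketched as a handful of threshold rules. The QoS parameters and threshold values below are purely illustrative, not those of [Hoss2022]; a decision tree trained on labeled data would learn rules of exactly this shape.

```python
def tbq_class(plt_s, stall_s):
    """Threshold-based quality (TBQ) style classification of user experience
    from page load time and stalling duration (illustrative thresholds)."""
    if plt_s <= 2.0 and stall_s == 0.0:
        return "good"
    if plt_s <= 5.0 and stall_s <= 1.0:
        return "fair"
    return "bad"

print(tbq_class(1.2, 0.0))  # -> good
print(tbq_class(6.0, 0.5))  # -> bad
```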

Strengths: Objective QoE monitoring relies solely on QoS monitoring, making it applicable for network providers, even for encrypted data streams, as long as appropriate QoE models are available, see for example [Juluri2015, Orsolic2020, Casas2022]. It can be easily integrated into existing QoS monitoring tools already deployed, reducing the need for additional resources or infrastructure. Moreover, it offers an objective assessment of user experience, ensuring that the same QoS conditions for different users are consistently mapped to the same QoE scores, as required for QoE fairness.

Weaknesses: Objective QoE monitoring requires specific QoE models and mapping functions for each desired QoE metric, which can be complex and resource-intensive to develop. Additionally, it has limited visibility into the full user experience, as it primarily relies on network-level metrics like bandwidth, latency, and jitter, which may not capture all factors influencing user satisfaction. Its effectiveness is also dependent on the accuracy of the monitored QoS metrics; inaccurate or incomplete data, such as from encrypted packets, can lead to misguided decisions and misrepresentation of the actual user experience.

Opportunities: Objective QoE monitoring enables user-centric resource and network management for application and network service providers by tracking QoS metrics, allowing for dynamic adjustments to optimize resource utilization and improve service delivery. The integration of AI and automation with QoS monitoring can increase the efficiency and accuracy of network management from a user-centric perspective. The objective QoE monitoring data can also enhance Service Level Agreements (SLAs) towards Experience Level Agreements (ELAs) as discussed in [Varela2015].

Threats: One risk of objective QoE monitoring is the potential for incorrect traffic flow characterization, where data flows may be misattributed to the wrong applications, leading to inaccurate QoE assessments. Additionally, rapid technological changes can quickly make existing QoS monitoring tools and QoE models outdated, necessitating constant upgrades and investment to keep pace with new technologies. These challenges can undermine the accuracy and effectiveness of objective QoE monitoring, potentially leading to misinformed decisions and increased operational costs.

Numerical Results: Visualizing the Differences

In this section, we explore and visualize the obtained system QoE metrics, which are based on data collected either through i) individual QoE monitoring or ii) objective QoE monitoring. The question arises whether the two monitoring approaches lead to the same results and conclusions for the provider. The obvious approach for computing the system QoE metrics is to use i) the individual ratings collected directly from the users and ii) the MOS scores obtained by mapping the objectively collected QoS parameters. While the discrepancies are derived mathematically in [Hoss2024], this article presents a visual representation of the differences between individual and objective QoE monitoring through a comprehensive simulation study. This simulation approach allows us to quantify the expected system QoE, the system GoB ratio, and the QoE fairness for a multitude of potential system configurations, which we manipulate in the simulation with varying QoS distributions. Furthermore, we demonstrate how data obtained through either individual or objective QoE monitoring can be used to accurately calculate the system QoE metrics as intended by a provider.

For the numerical results, the web QoE use case in [Hoss2024] is employed. We conduct a comprehensive simulation study in which the QoS settings are varied. To be more precise, the page load times (PLTs) are varied, such that the users in the system experience a range of different loading times. For each simulation run, the average PLT and the standard deviation of the PLT across all users in the system are fixed. Each user is then assigned a random PLT sampled from a beta distribution on the range between 0 s and 8 s, parameterized with the specified average and standard deviation.
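This sampling step can be sketched as follows, assuming standard moment matching to obtain the beta shape parameters from the desired mean and standard deviation (the exact parameterization used in [Hoss2024] may differ):

```python
import numpy as np

def sample_plts(mean_s, std_s, n, lo=0.0, hi=8.0, seed=42):
    """Sample n page load times from a beta distribution on [lo, hi],
    parameterized by the desired mean and std via moment matching."""
    rng = np.random.default_rng(seed)
    mu = (mean_s - lo) / (hi - lo)    # mean rescaled to the unit interval
    var = (std_s / (hi - lo)) ** 2    # variance rescaled to the unit interval
    k = mu * (1 - mu) / var - 1       # common factor from moment matching
    a, b = mu * k, (1 - mu) * k       # beta shape parameters
    return lo + (hi - lo) * rng.beta(a, b, n)

plts = sample_plts(mean_s=3.0, std_s=1.0, n=10_000)
print(round(plts.mean(), 2), round(plts.std(), 2))  # close to 3.0 and 1.0
```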

For a concrete PLT, the corresponding user rating distribution is available and, in our case, follows a shifted binomial distribution whose mean reflects the MOS value for that condition. To be clear, this binomial distribution is a conditional random variable with discrete values on a 5-point scale: the user ratings are conditioned on the actual QoS value. For individual QoE monitoring, the user ratings are sampled from this conditional random variable, while the QoS values are sampled from the beta distribution. For objective QoE monitoring, only the QoS values are used, together with the MOS mapping function provided in [Hoss2024]; each QoS value is thus mapped to a continuous MOS value within the range of 1 to 5.
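The shifted binomial rating model can be sketched as follows. We assume the common parameterization for 5-point scales, rating = 1 + Binomial(4, p) with p chosen so that the mean equals the MOS of the condition; this is our reading of the setup, not necessarily the exact variant in [Hoss2024].

```python
import numpy as np

def sample_ratings(mos, n, seed=0):
    """Sample 5-point user ratings from a shifted binomial whose mean equals
    the given MOS: rating = 1 + Binomial(4, p) with p = (mos - 1) / 4."""
    rng = np.random.default_rng(seed)
    p = (mos - 1) / 4                       # chosen so that E[rating] = mos
    return 1 + rng.binomial(4, p, size=n)   # discrete ratings in {1, ..., 5}

ratings = sample_ratings(mos=3.5, n=100_000)
print(round(ratings.mean(), 2))  # close to 3.5
```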

Figure 2 shows the expected system QoE using individual as well as objective QoE monitoring, depending on the average QoS and the standard deviation of the QoS, which is indicated by the color. Each point in the figure represents a single simulation run with a fixed average QoS and a fixed standard deviation. It can be seen that both QoE monitoring approaches lead to the same results, which was also formally proven in [Hoss2024]. Note that higher QoS variance also results in a higher expected system QoE: for the same average QoS, some users experience larger QoS values and others lower ones, and due to the non-linear mapping between QoS and QoE this results in higher QoE scores.

Figure 3 shows the system GoB ratio, which can simply be computed with individual QoE monitoring. In the case of objective QoE monitoring, however, we assume that only a MOS mapping function is available. It is tempting to derive the GoB ratio as the ratio of MOS values which are good or better. However, this leads to wrong results, see [Hoss2020]. Nevertheless, the GoB mapping function can be approximated from an existing MOS mapping function, see [Hoss2022, Hoss2017, Perez2023]. With this adjustment, objective QoE monitoring leads to the same conclusions as individual QoE monitoring.
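Under the shifted-binomial rating model used in the simulation, one possible GoB approximation from MOS can be sketched as below; this is in the spirit of [Hoss2017, Perez2023], not necessarily the exact approximation used for the figures. The naive share of MOS values ≥ 4 is included as the tempting but wrong baseline.

```python
from math import comb

def gob_from_mos(mos):
    """Approximate GoB from MOS via the shifted-binomial rating model:
    GoB = P(rating >= 4) = P(Binomial(4, p) >= 3) with p = (mos - 1) / 4."""
    p = (mos - 1) / 4
    return comb(4, 3) * p**3 * (1 - p) + p**4

def naive_gob(mos_values):
    """The tempting but wrong approach: share of MOS values >= 4."""
    return sum(m >= 4 for m in mos_values) / len(mos_values)

print(round(gob_from_mos(3.5), 3))  # -> 0.519
```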

Figure 4 now considers QoE fairness for both monitoring approaches. It is tempting to apply the QoE fairness index to the user rating values from individual QoE monitoring. In that case, however, the fairness index reflects the variance of the system QoS plus the variance due to user rating diversity, as shown in [Hoss2024]. This is not the intended application of the QoE fairness index, which aims to evaluate fairness objectively from a user-centric perspective, so that resource management can be adjusted to provide users with high and fairly distributed quality. Therefore, the QoE fairness index uses MOS values, such that users with the same QoS are assigned the same MOS value. In a system with deterministic QoS conditions, i.e., when the standard deviation diminishes, the QoE fairness index is 100%; see the results for objective QoE monitoring. Nevertheless, individual QoE monitoring also allows computing MOS values for similar QoS values and then applying the QoE fairness index, which yields results comparable to objective QoE monitoring.

Figure 2. Expected system QoE when using individual and objective QoE monitoring. Both approaches lead to the same expected system QoE.
Figure 3. System GoB ratio: Deriving the ratio of MOS values which are good or better does not work for objective QoE monitoring. But an adjusted GoB computation, by approximating GoB through MOS, leads to the same conclusions as individual QoE monitoring, which simply measures the system GoB.
Figure 4. QoE Fairness: Using the user rating values obtained through individual QoE monitoring additionally includes the user rating diversity, which is not desired in network or resource management. However, individual QoE monitoring also allows computing the MOS values for similar QoS values and then applying the QoE fairness index, which leads to comparable insights as objective QoE monitoring.

Conclusions

Individual QoE monitoring and objective QoE monitoring are fundamentally distinct approaches for assessing system QoE from a provider’s perspective. Individual QoE monitoring relies on direct user feedback to capture personalized experiences, while objective QoE monitoring uses QoS metrics and QoE models to estimate QoE metrics. Both methods have strengths and weaknesses, offering opportunities for service optimization and innovation while facing challenges such as over-engineering and the risk of models becoming outdated due to technological advancements, as summarized in our SWOT analysis. However, as the numerical results have shown, both approaches can, with appropriate modifications and adjustments, be used to derive various system QoE metrics like expected system QoE, system GoB and PoW ratio, and QoE fairness. A promising direction for future research is the development of hybrid approaches that combine both methods, allowing providers to benefit from objective monitoring while integrating the personalization of individual feedback. Such hybrid approaches could also be integrated into existing proposals like the QoS/QoE Monitoring Engine [Siokis2023] or into upcoming 6G networks, which may allow the radio access network (RAN) to autonomously adjust QoS metrics in collaboration with the application to enhance the overall QoE [Bertenyi2024].

References

[Bertenyi2024] Bertenyi, B., Kunzmann, G., Nielsen, S., Pedersen, K., & Andres, P. (2024). Transforming the 6G vision to action. Nokia Whitepaper, 28 June 2024. URL: https://www.bell-labs.com/institute/white-papers/transforming-the-6g-vision-to-action/.

[Casas2022] Casas, P., Seufert, M., Wassermann, S., Gardlo, B., Wehner, N., & Schatz, R. (2022). DeepCrypt: Deep learning for QoE monitoring and fingerprinting of user actions in adaptive video streaming. In 2022 IEEE 8th International Conference on Network Softwarization (NetSoft) (pp. TBD). IEEE.

[Gao2020] Gao, Y., Wei, X., & Zhou, L. (2020). Personalized QoE improvement for networking video service. IEEE Journal on Selected Areas in Communications, 38(10), 2311-2323.

[Hoss2016] Hoßfeld, T., Schatz, R., Egger, S., & Fiedler, M. (2016). QoE beyond the MOS: An in-depth look at QoE via better metrics and their relation to MOS. Quality and User Experience, 1, 1-23.

[Hoss2017] Hoßfeld, T., Fiedler, M., & Gustafsson, J. (2017, May). Betas: Deriving quantiles from MOS-QoS relations of IQX models for QoE management. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) (pp. 1011-1016). IEEE.

[Hoss2018] Hoßfeld, T., Skorin-Kapov, L., Heegaard, P. E., & Varela, M. (2018). A new QoE fairness index for QoE management. Quality and User Experience, 3, 1-23.

[Hoss2020] Hoßfeld, T., Heegaard, P. E., Skorin-Kapov, L., & Varela, M. (2020). Deriving QoE in systems: from fundamental relationships to a QoE-based Service-level Quality Index. Quality and User Experience, 5(1), 7.

[Hoss2022] Hoßfeld, T., Schatz, R., Egger, S., & Fiedler, M. (2022). Industrial user experience index vs. quality of experience models. IEEE Communications Magazine, 61(1), 98-104.

[Hoss2024] Hoßfeld, T., & Pérez, P. (2024). A theoretical framework for provider’s QoE assessment using individual and objective QoE monitoring. In 2024 16th International Conference on Quality of Multimedia Experience (QoMEX) (pp. TBD). IEEE.

[Jiménez2020] Jiménez, R. Z., Naderi, B., & Möller, S. (2020, May). Effect of environmental noise in speech quality assessment studies using crowdsourcing. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX) (pp. 1-6). IEEE.

[Juluri2015] Juluri, P., Tamarapalli, V., & Medhi, D. (2015). Measurement of quality of experience of video-on-demand services: A survey. IEEE Communications Surveys & Tutorials, 18(1), 401-418.

[Orsolic2020] Orsolic, I., & Skorin-Kapov, L. (2020). A framework for in-network QoE monitoring of encrypted video streaming. IEEE Access, 8, 74691-74706.

[Perez2023] Pérez, P. (2023). The Transmission Rating Scale and its Relation to Subjective Scores. In 2023 15th International Conference on Quality of Multimedia Experience (QoMEX) (pp. 31-36). IEEE.

[Reichl2015] Reichl, P., et al. (2015, May). Towards a comprehensive framework for QoE and user behavior modelling. In 2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX) (pp. 1-6). IEEE.

[Schmitt2017] Schmitt, M., Redi, J., Bulterman, D., & César, P. (2017). Towards individual QoE for multiparty videoconferencing. IEEE Transactions on Multimedia, 20(7), 1781-1795.

[Siokis2023] Siokis, A., Ramantas, K., Margetis, G., Stamou, S., McCloskey, R., Tolan, M., & Verikoukis, C. V. (2023). 5GMediaHUB QoS/QoE monitoring engine. In 2023 IEEE 28th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) (pp. TBD). IEEE.

[Skorin-Kapov2018] Skorin-Kapov, L., Varela, M., Hoßfeld, T., & Chen, K. T. (2018). A survey of emerging concepts and challenges for QoE management of multimedia services. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(2s), 1-29.

[Varela2015] Varela, M., Zwickl, P., Reichl, P., Xie, M., & Schulzrinne, H. (2015, June). From service level agreements (SLA) to experience level agreements (ELA): The challenges of selling QoE to the user. In 2015 IEEE International Conference on Communication Workshop (ICCW) (pp. 1741-1746). IEEE.

[Yamazaki2021] Yamazaki, T. (2021). Quality of experience (QoE) studies: Present state and future prospect. IEICE Transactions on Communications, 104(7), 716-724.

[Zhu2018] Zhu, Y., Guntuku, S. C., Lin, W., Ghinea, G., & Redi, J. A. (2018). Measuring individual video QoE: A survey, and proposal for future directions using social media. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(2s), 1-24.

Energy-Efficient Video Streaming: Open-Source Tools, Datasets, and Solutions


Abstract: Energy efficiency has become a crucial aspect of today’s IT infrastructures, and video (streaming) accounts for over half of today’s Internet traffic. This column highlights open-source tools, datasets, and solutions addressing energy efficiency in video streaming presented at ACM Multimedia Systems 2024 and its co-located workshop ACM Green Multimedia Systems.

Introduction

Across various platforms, users seek the highest Quality of Experience (QoE) in video communication and streaming. Whether it’s a crucial business meeting or a relaxing evening of entertainment, individuals desire seamless and high-quality video experiences. However, meeting this demand for high-quality video comes with a cost: increased energy usage [1],[2]. This energy consumption occurs at every stage of the process, including content provision via cloud services and consumption on end users’ devices [3]. Unfortunately, this heightened energy consumption inevitably leads to higher CO2 emissions (except where renewable energy sources are used), posing environmental challenges and emphasizing the need for studies that assess the carbon footprint of video streaming.

Content provision is a critical stage in video streaming, involving encoding videos into various formats, resolutions, and bitrates. Encoding demands computing power and energy, especially in cloud-based systems. Cloud computing has become popular for video encoding due to its scalability [4], adjusting cloud resources to handle changing workloads, and its flexibility [5], allowing providers to scale operations based on demand. However, this convenience comes at a cost. Data centers, the heart of cloud computing, consume a significant portion of global electricity, around 3% [6]. Video encoding is one of the biggest energy consumers within these data centers. Therefore, optimizing video encoding for lower energy consumption is crucial for reducing the environmental impact of cloud-based video delivery.

Content consumption [7] involves the device using the network interface card to request and download video segments from the server, decompressing them for playback, and finally rendering the decoded frames on the screen, where the energy consumption depends on the screen technology and brightness settings.

The GAIA project showcased its research on the environmental impact of video streaming at the recent 15th ACM Multimedia Systems Conference (April 15-18, Bari, Italy). We presented our findings at relevant conference sessions: Open-Source Software and Dataset and the Green Multimedia Systems (GMSys) workshop.

Open Source Software

GREEM: An Open-Source Benchmark Tool Measuring the Environmental Footprint of Video Streaming [PDF] [Github] [Poster]

GREEM (Gaia Resource Energy and Emission Monitoring) aims to measure energy usage during video encoding and decoding processes. GREEM tracks the effects of video processing on hardware performance and provides a suite of analytical scenarios. This tool offers easy-to-use scenarios covering the most common video streaming situations, such as measuring sequential and parallel video encoding and decoding.

Contributions:

  • Accessible:  GREEM is available in a GitHub repository (https://github.com/cd-athena/GREEM) for energy measurement of video processing.
  • Automates experimentation: It allows users to easily configure and run various encoding scenarios with different parameters to compare results.
  • In-depth monitoring: The tool traces numerous hardware parameters, specifically monitoring energy consumption and GPU metrics, including core and memory utilization, temperature, and fan speed, providing a complete picture of video processing resource usage.
  • Visualization: GREEM offers scripts that generate analytic plots, allowing users to visualize and understand their measurement results easily.

  • Verifiable: GREEM has earned the ACM Reproducibility Badge, allowing others to reproduce the experiments and results reported in the paper.

Open Source Datasets

VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances [PDF] [Github] [Poster]

As video encoding increasingly shifts to cloud-based services, concerns about the environmental impact of massive data centers arise. The Video Encoding Energy and CO2 Emissions Dataset (VEED) provides the energy consumption and CO2 emissions associated with video encoding on Amazon’s Elastic Compute Cloud (EC2) instances. Additionally, VEED goes beyond energy consumption as it also captures encoding duration and CPU utilization.

Contributions:

  • Findability: A comprehensive metadata description file ensures VEED’s discoverability for researchers.
  • Accessibility: VEED is open for download on GitHub (https://github.com/cd-athena/VEEDdataset), removing access barriers for researchers. Core findings in the research that leverages the VEED dataset have been independently verified (ACM Reproducibility Badge).
  • Interoperability: The dataset is provided in a comma-separated value (CSV) format, allowing integration with various analysis applications.
  • Reusability: Description files empower researchers to understand the data structure and context, facilitating its use in diverse analytical projects.

COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming  [PDF] [Github]

COCONUT is a dataset comprising the energy consumption of video streaming across various devices and different HAS (HTTP Adaptive Streaming) players. COCONUT captures user data during MPEG-DASH video segment streaming on laptops, smartphones, and other client devices, measuring energy consumption at different stages of streaming, including segment retrieval through the network interface card, video decoding, and rendering on the device. This paper has been designated the ACM Artifacts Available badge, signifying that the COCONUT dataset is publicly accessible. COCONUT can be accessed at https://athena.itec.aau.at/coconut/.

Second International ACM Green Multimedia Systems Workshop — GMSys 2024

VEEP: Video Encoding Energy and CO2 Emission Prediction  [pdf] [slides]

VEEP is a machine learning (ML) scheme that empowers users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding.

Contributions:

  • Content-aware energy prediction:  VEEP analyzes video content to extract features impacting encoding complexity. This understanding feeds an ML model that accurately predicts the energy consumption required for encoding the video on AWS EC2 instances. (High Accuracy: Achieves an R² score of 0.96)
  • Real-time carbon footprint: VEEP goes beyond energy. It also factors in real-time carbon intensity data based on the location of the cloud instance. This allows VEEP to calculate the associated CO2 emissions for your encoding tasks at encoding time.
  • Resulting impact: By carefully selecting the type and location of cloud instances based on VEEP’s predictions, CO2 emissions can be reduced by up to 375 times. This significant reduction signifies VEEP’s potential to contribute to greener video encoding.
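The arithmetic behind the carbon footprint step is the product of the energy a task consumes and the grid’s carbon intensity at the instance’s location. The sketch below uses made-up illustrative numbers, not VEEP’s predictions:

```python
def co2_grams(energy_kwh, carbon_intensity_g_per_kwh):
    """CO2 emissions for an encoding task: energy consumed times the
    carbon intensity of the grid at the cloud instance's location."""
    return energy_kwh * carbon_intensity_g_per_kwh

# Hypothetical numbers: the same 0.2 kWh encoding job on a coal-heavy
# grid versus a hydro-dominated grid.
print(co2_grams(0.2, 800.0))  # -> 160.0 g
print(co2_grams(0.2, 20.0))   # -> 4.0 g
```

Picking the instance location with the lower carbon intensity is exactly the kind of choice VEEP’s predictions are meant to inform.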

Conclusions

This column provided an overview of the GAIA project’s research on the environmental impact of video streaming, presented at the 15th ACM Multimedia Systems Conference. The GREEM measurement tool empowers developers and researchers to measure the energy consumption and CO2 emissions of video processing. VEED provides valuable insights into energy consumption and CO2 emissions during cloud-based video encoding on AWS EC2 instances. COCONUT sheds light on energy usage during video playback on various devices and with different players, aiding in optimizing client-side video streaming. Furthermore, VEEP, a machine learning framework, takes energy efficiency a step further: it allows users to predict the energy consumption and CO2 emissions associated with cloud-based video encoding and thus to select cloud instances that minimize environmental impact. These studies can help researchers, developers, and service providers optimize video streaming for a more sustainable future. The focus on encoding and playback highlights the importance of a holistic approach considering the entire video streaming lifecycle. While these papers primarily focus on the environmental impact of video streaming, a strong connection exists between energy efficiency and QoE [8],[9],[10]. Optimizing video processing for lower energy consumption can sometimes lead to trade-offs regarding video quality. Future research could explore techniques for optimizing video processing while ensuring a consistently high QoE for viewers.

References

[1] A. Katsenou, J. Mao, and I. Mavromatis, “Energy-Rate-Quality Tradeoffs of State-of-the-Art Video Codecs.” arXiv, Oct. 02, 2022. Accessed: Oct. 06, 2022. [Online]. Available: http://arxiv.org/abs/2210.00618

[2] H. Amirpour, V. V. Menon, S. Afzal, R. Prodan, and C. Timmerer, “Optimizing video streaming for sustainability and quality: The role of preset selection in per-title encoding,” in 2023 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2023, pp. 1679–1684. Accessed: May 05, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10219577/

[3] S. Afzal, R. Prodan, C. Timmerer, “Green Video Streaming: Challenges and Opportunities – ACM SIGMM Records.” Accessed: May 05, 2024. [Online]. Available: https://records.sigmm.org/2023/01/08/green-video-streaming-challenges-and-opportunities/

[4] A. Atadoga, U. J. Umoga, O. A. Lottu, and E. O. Sodiya, “Evaluating the impact of cloud computing on accounting firms: A review of efficiency, scalability, and data security,” Glob. J. Eng. Technol. Adv., vol. 18, no. 2, pp. 065–075, Feb. 2024, doi: 10.30574/gjeta.2024.18.2.0027.

[5] B. Zeng, Y. Zhou, X. Xu, and D. Cai, “Bi-level planning approach for incorporating the demand-side flexibility of cloud data centers under electricity-carbon markets,” Appl. Energy, vol. 357, p. 122406, Mar. 2024, doi: 10.1016/j.apenergy.2023.122406.

[6] M. Law, “Energy efficiency predictions for data centers in 2023.” Accessed: May 03, 2024. [Online]. Available: https://datacentremagazine.com/articles/efficiency-to-loom-large-for-data-centre-industry-in-2023

[7] C. Yue, S. Sen, B. Wang, Y. Qin, and F. Qian, “Energy considerations for ABR video streaming to smartphones: Measurements, models and insights,” in Proceedings of the 11th ACM Multimedia Systems Conference, 2020, pp. 153–165, doi: 10.1145/3339825.3391867.

[8] G. Bingöl, A. Floris, S. Porcu, C. Timmerer, and L. Atzori, “Are Quality and Sustainability Reconcilable? A Subjective Study on Video QoE, Luminance and Resolution,” in 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), IEEE, 2023, pp. 19–24. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10178513/

[9] G. Bingöl, S. Porcu, A. Floris, and L. Atzori, “An Analysis of the Trade-Off Between Sustainability and Quality of Experience for Video Streaming,” in 2023 IEEE International Conference on Communications Workshops (ICC Workshops), IEEE, 2023, pp. 1600–1605. Accessed: May 06, 2024. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10283614/

[10] C. Herglotz, W. Robitza, A. Raake, T. Hossfeld, and A. Kaup, “Power Reduction Opportunities on End-User Devices in Quality-Steady Video Streaming.” arXiv, May 24, 2023. doi: 10.48550/arXiv.2305.15117.

Towards Immersive Digiphysical Experiences


Immersive experiences have the potential to redefine traditional forms of media engagement by intricately combining reality with imagination. Motivated by necessities, current developments, and emerging technologies, this column sets out to bridge immersive experiences in both digital and physical realities. Fitting under the umbrella term of eXtended Reality (XR), the first section describes various realizations of blending digital and physical elements to design what we refer to as immersive digiphysical experiences. We further highlight industry and research initiatives driving the design and development of such experiences, considered to be key building blocks of the futuristic ‘metaverse’. The second section outlines challenges related to assessing, modeling, and managing the Quality of Experience (QoE) of immersive digiphysical experiences and reflects upon ongoing work in the area. While potential use cases span a wide range of application domains, the third section elaborates on the specific case of conference organization, which has over the past few years shifted from fully physical, to fully virtual, and finally to attempts at hybrid organization. We believe this use case provides valuable insights into needs and promising approaches, to be demonstrated and experienced at the upcoming 16th edition of the International Conference on Quality of Multimedia Experience (QoMEX 2024) in Karlshamn, Sweden, in June 2024.

Multiple users engaged in a co-located mixed reality experience

Bridging The Digital And Physical Worlds

According to [IMeX WP, 2020], immersive media have been described as involving “multi-modal human-computer interaction where either a user is immersed inside a digital/virtual space or digital/virtual artifacts become a part of the physical world”. Spanning the so-called virtuality continuum [Milgram, 1995], immersive media experiences may involve various realizations of bridging the digital and physical worlds, such as the seamless integration of digital content with the real world (via Augmented or Mixed Reality, AR/MR), and vice versa by incorporating real objects into a virtual environment (Augmented Virtuality, AV). More recently, the term eXtended Reality (XR) (also sometimes referred to as xReality) has been used as an umbrella term for a wide range of levels of “realities”, with [Rauschnabel, 2022] proposing a distinction between AR/MR and Virtual Reality (VR) based on whether the physical environment is, at least visually, part of the user’s experience.

By seamlessly merging digital and physical elements and supporting real-time user engagement with both digital and physical components, immersive digiphysical (i.e., both digitally and physically accessible [Westerlund, 2020]) experiences have the potential to provide compelling experiences that blur the distinction between the real and virtual worlds. A key aspect is that of digital elements responding to user input or the physical environment, and the physical environment responding to interactions with digital objects. Going beyond visual or auditory stimuli alone, the incorporation of additional senses, for example via haptic feedback or olfactory elements, can contribute to multisensory engagement [Gibbs, 2022].

The rapid development of XR technologies has been recognized as a key contributor to realizing a wide range of applications built on the fusion of the digital and physical worlds [NEM WP, 2022]. In its contribution to the European XR Coalition (launched by the European Commission), the New European Media Initiative (NEM), Europe’s Technology Platform of Horizon 2020 dedicated to driving the future of digital experiences, calls for needed actions from both industry and research perspectives addressing challenges related to social and human centered XR as well as XR communication aspects [NEM XR, 2022]. One such initiative is the Horizon 2020 TRANSMIXR project [TRANSMIXR], aimed at developing a distributed XR creation environment that supports remote collaboration practices, as well as an XR media experience environment for the delivery and consumption of social immersive media experiences. The NEM initiative further identifies the need for scalable solutions to obtain plausible and convincing virtual copies of physical objects and environments, as well as solutions supporting seamless and convincing interaction between the physical and the virtual world. Among key technologies and infrastructures needed to overcome outlined challenges, the following are identified [NEM XR, 2022]: high bandwidth and low-latency energy-efficient networks; remote computing for processing and rendering deployed on cloud and edge infrastructures; tools for the creation and updating of digital twins (DT) to strengthen the link between the real and virtual worlds, integrating Internet of Things (IoT) platforms; hardware in the form of advanced displays; and various content creation tools relying on interoperable formats.

Merging the digital and physical worlds

Looking towards the future, immersive digiphysical experiences set the stage for visions of the metaverse [Wang, 2023], described as representing the evolution of the Internet towards a platform enabling immersive, persistent, and interconnected virtual environments blending digital and physical [Lee, 2021]. [Wang, 2022] see the metaverse as “created by the convergence of physically persistent virtual space and virtually enhanced physical reality”. The metaverse is further seen as a platform offering the potential to host real-time multisensory social interactions (e.g., involving sight, hearing, touch) between people communicating with each other in real-time via avatars [Hennig-Thurau, 2023]. As of 2022, the Metaverse Standards Forum is providing a venue for industry coordination, fostering the development of interoperability standards for an open and inclusive metaverse [Metaverse, 2023]. Relevant existing standards include: ISO/IEC 23005 (MPEG-V) (standardization of interfaces between the real world and the virtual world, and among virtual worlds) [ISO/IEC 23005], IEEE 2888 (definition of standardized interfaces for synchronization of cyber and physical worlds) [IEEE 2888], and MPEG-I (standards to digitally represent immersive media) [ISO/IEC 23090].

Research Challenges For The Qoe Community

Achieving widespread adoption of XR-based services providing digiphysical experiences across a broad range of application domains (e.g., education, industry and manufacturing, healthcare, engineering) inherently requires ensuring intuitive, comfortable, and positive user experiences. While research efforts towards meeting such requirements are well under way, a number of open challenges remain.

Quality of Experience (QoE) for immersive media has been defined as [IMeX WP, 2020]the degree of delight or annoyance of the user of an application or service which involves an immersive media experience. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” Furthermore, a bridge between QoE and UX has been established through the concept of Quality of User Experience (QUX), combining hedonic, eudaimonic and pragmatic aspects of QoE and UX [Egger-Lampl, 2019]. In the context of immersive communication and collaboration services, significant efforts are being invested towards understanding and optimizing the end user experience [Perez, 2022].

The White Paper [IMeX WP, 2020] ties immersion to the digital media world (“The more the system blocks out stimuli from the physical world, the more the system is considered to be immersive.”). Nevertheless, immersion as such exists in physical contexts as well, e.g., when reading a captivating book. MR, XR and AV scenarios are digiphysical in their nature. These considerations pose several challenges:

  1. Achieving intuitive and natural interactive experiences [Hennig-Thurau, 2023] when mixing realities.
  2. Developing a common understanding of MR-, XR- and AV-related challenges in digiphysical multi-modal multi-party settings.
  3. Advancing VR, AR, MR, XR and AV technologies to allow for truly digiphysical experiences.
  4. Measuring and modeling QoE, UX and QUX for immersive digiphysical services, covering overall methodology, measurement instruments, modeling approaches, test environments and application domains.
  5. Management of the networked infrastructure to support immersive digiphysical experiences with appropriate QoE, UX and QUX.
  6. Sustainability considerations in terms of environmental footprint, accessibility, equality of opportunities in various parts of the world, and cost/benefit ratio.

Challenges 1 and 2 call for an experience-based, bottom-up approach that focuses on the most important aspects. Examples include designing and evaluating different user representations [Aseeri, 2021][Viola, 2023], natural interaction techniques [Spittle, 2023], and the use of different environments by participants (AR/MR/VR) [Moslavac, 2023]. The latter has proven beneficial for challenges 3 (cf. the emergence of MR-/XR-/AV-supporting head-mounted devices such as the Microsoft HoloLens and recent pass-through versions of the Meta Quest) and 4. Finally, challenges 5 and 6 need to be carefully addressed to allow for long-term adoption and feasibility.

Challenges 1 to 4 have been addressed in standardization. For instance, ITU-T Recommendation P.1320 specifies QoE assessment procedures and metrics for the evaluation of XR telemeetings, outlining various categories of QoE influence factors and use cases [ITU-T Rec. P.1320, 2022] (adopted from the 3GPP technical report TR 26.928 on XR technology in 5G). The corresponding ITU-T Study Group 12 (Question 10) developed a taxonomy of telemeetings [ITU-T Rec. G.1092, 2023], providing a systematic classification of telemeeting systems. Ongoing joint efforts between the VQEG Immersive Media Group and ITU-T Study Group 12 are targeted towards specifying interactive test methods for subjective assessment of XR communications [ITU-T P.IXC, 2022].

The complexity of the aforementioned challenges demands a combination of fundamental work, use cases, implementations, demonstrations, and testing. One specific use case whose urgency in combining digital and physical realities has become evident in recent years is hybrid conference organization, touching in particular on the challenge of achieving intuitive and natural interactions between remote and physically present participants. We consider this use case in detail in the following section, referring to the organization of the International Conference on Quality of Multimedia Experience (QoMEX) as an example.

Immersive Communication And Collaboration: The Case Of Conference Organization

What seemed impossible and was undesirable in the past became a necessity overnight during the COVID-19 pandemic: running conferences as fully virtual events. Many research communities succeeded in adapting their ongoing conference organization so that communities could meet, present, demonstrate, and socialize online. QoMEX 2020 is one such example, whose organizers introduced a set of innovative instruments for mutual interaction and enjoyment, such as virtual Mozilla Hubs spaces for poster presentations and a music session with prerecorded contributions mixed to form a joint performance to be enjoyed virtually together. A previously unseen inventiveness emerged to make the best of the heavily travel-restricted situation, with technical approaches varying from off-the-shelf systems (such as Zoom or Teams) to custom-built applications. However, the majority of meetings during COVID times, regardless of scale and nature, were run in unnatural 2D on-screen settings. The frequently reported phenomenon of videoconference (VC) fatigue can be attributed to a set of personal, organizational, technical, and environmental factors [Döring, 2022]. Indeed, talking to one’s computer with many faces staring back, limited possibilities to move freely, technostress [Brod, 1984], and organizational mishaps made many people tired of a VC technology that was designed for a better purpose but could not come close enough to a natural real-life experience.

As COVID-19 retreated, conferences again became physical events, and communities enjoyed meeting in person again, e.g., at QoMEX 2022. However, voices were raised asking for remote participation for various reasons, such as time or budget restrictions, environmental sustainability considerations, or simply the comfort of being able to work from home. With remote participation came the challenge of bridging between in-person and remote participants, i.e., turning conferences into hybrid events [Bajpai, 2022]. However, experiences with hybrid conferences have been mixed, for both onsite and online participants: (1) onsite participants suffer from interruptions of the session flow needed to fix problems with the online participation tool; their readiness to devote effort, time, and money to participate in a future hybrid event in person might suffer from such issues, which in turn would weaken the corresponding communities; (2) online participants suffer from similar issues, where sound irregularities (echo, excessive sound volumes, etc.) are felt to be particularly disturbing, along with the feeling of not being properly included, e.g., in Q&A sessions and personal interactions. At both ends, clear signs of technostress and “us-and-them” feelings can be observed. Consequently, and despite good intentions and advice [Bajpai, 2022], a hybrid conference might miss its main purpose of bringing researchers together to present, discuss, and socialize. To avoid the above-listed issues, the post-COVID QoMEX conferences (since 2022) have avoided hybrid operation, with few exceptions.

A conference is a typical case that reveals the difficulties of bringing the physical and digital worlds together [Westerlund, 2020], at least when relying upon state-of-the-art telemeeting approaches that have not been explicitly designed for hybrid and digiphysical operation. At the recent 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing in Minneapolis, USA (CSCW 2023), one of the panel sessions focused on “Realizing Values in Hybrid Environments”. Panelists and audience shared experiences about successes and failures with hybrid events. The main takeaways were as follows: (1) there is a general lack of know-how, regardless of how much funding is allocated, and (2) there is a significant demand for research activities in the area.

Yet, there is hope, as increasingly many VR-, MR-, XR- and AV-supporting devices and applications keep emerging, enabling new kinds and representations of immersive experiences. In a conference context, the latter implies the feeling of “being there”, i.e., being integrated in the conference community, no matter where the participant is located. This calls for new ways of interacting, among others through various realities (VR/MR/XR), which need to be invented, tried, and evaluated in order to offer new and meaningful experiences in telemeeting scenarios [Viola, 2023]. Indeed, CSCW 2023 hosted a specific workshop titled “Emerging Telepresence Technologies for Hybrid Meetings: an Interactive Workshop”, during which visions, experiences, and solutions were shared and could be experienced both locally and remotely. About half of the participants were online, successfully interacting with onsite participants via various techniques.

With these challenges and opportunities in mind, the motto of QoMEX 2024 has been set as “Towards immersive digiphysical experiences.” While the conference is organized as an in-person event, a set of carefully selected hybrid activities will be offered to interested remote participants, such as (1) 360° stereoscopic streaming of the keynote speeches and demo sessions, and (2) the option to take part in so-called hybrid experience demos. The 360° stereoscopic streaming has so far been tested successfully in local, national, and transatlantic sessions (during the above-mentioned CSCW workshop) with various settings, and further fine-tuning will be done and tested before the conference. With respect to the demo session, and in addition to traditional onsite demos, this year the conference will in particular solicit hybrid experience demos that enable both onsite and remote participants to test the demo in an immersive environment. Facilities will also be provided for onsite participants to test demos from the perspectives of both a local and a remote user, enabling them to experience different roles. The organizers hope that the hybrid activities of QoMEX 2024 will trigger more research interest in these areas, along and beyond the classical lines of QoE research (performing quantitative subjective studies of QoE features and correlating them with QoE factors).

QoMEX 2024: Towards Immersive Digiphysical Experiences

Concluding Remarks

As immersive experiences extend into both digital and physical worlds and realities, there is a great space for QoE-, UX-, and QUX-related research to conquer. While the recent COVID-19 pandemic forced many users to replace physical with digital meetings, and sustainability considerations have reduced many people’s and organizations’ readiness to travel (or support travel), the shortcomings of hybrid digiphysical meetings have so far failed to persuade participants of their superiority over purely online or on-site meetings. Indeed, one promising path towards a successful integration of the physical and digital worlds consists of trying out, experiencing, reflecting, and deriving important research questions for and beyond the QoE research community. The upcoming QoMEX 2024 will be a stop along this road, with carefully selected hybrid experiences aimed at boosting research and best practice in the QoE domain towards immersive digiphysical experiences.

References

  • [Aseeri, 2021] Aseeri, S., & Interrante, V. (2021). The Influence of Avatar Representation on Interpersonal Communication in Virtual Social Environments. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2608-2617.
  • [Bajpai, 2022] Bajpai, V., et al. (2022). Recommendations for designing hybrid conferences. ACM SIGCOMM Computer Communication Review, 52(2), 63-69.
  • [Brod, 1984] Brod, C. (1984). Technostress: The Human Cost of the Computer Revolution. Basic Books; New York, NY, USA: 1984.
  • [Döring, 2022] Döring, N., Moor, K. D., Fiedler, M., Schoenenberg, K., & Raake, A. (2022). Videoconference Fatigue: A Conceptual Analysis. International Journal of Environmental Research and Public Health, 19(4), 2061.
  • [Egger-Lampl, 2019] Egger-Lampl, S., Hammer, F., & Möller, S. (2019). Towards an integrated view on QoE and UX: adding the Eudaimonic Dimension, ACM SIGMultimedia Records, 10(4):5.
  • [Gibbs, 2022] Gibbs, J. K., Gillies, M., & Pan, X. (2022). A comparison of the effects of haptic and visual feedback on presence in virtual reality. International Journal of Human-Computer Studies, 157, 102717.
  • [Hennig-Thurau, 2023] Hennig-Thurau, T., Aliman, D. N., Herting, A. M., Cziehso, G. P., Linder, M., & Kübler, R. V. (2023). Social Interactions in the Metaverse: Framework, Initial Evidence, and Research Roadmap. Journal of the Academy of Marketing Science, 51(4), 889-913.
  • [IMeX WP, 2020] Perkis, A., Timmerer, C., et al., “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), May 25, 2020. Online: https://arxiv.org/abs/2007.07032
  • [ISO/IEC 23005] ISO/IEC 23005 (MPEG-V) standards, Media Context and Control, https://mpeg.chiariglione.org/standards/mpeg-v, accessed January 21, 2024.
  • [ISO/IEC 23090] ISO/IEC 23090 (MPEG-I) standards, Coded representation of Immersive Media, https://mpeg.chiariglione.org/standards/mpeg-i, accessed January 21, 2024.
  • [IEEE 2888] IEEE 2888 standards, https://sagroups.ieee.org/2888/, accessed January 21, 2024.
  • [ITU-T Rec. G.1092, 2023] ITU-T Recommendation G.1092 – Taxonomy of telemeetings from a quality of experience perspective, Oct. 2023.
  • [ITU-T Rec. P.1320, 2022] ITU-T Recommendation P.1320 – QoE assessment of extended reality (XR) meetings, 2022.
  • [ITU-T P.IXC, 2022] ITU-T Work Item: Interactive test methods for subjective assessment of extended reality communications, under study, 2022.
  • [Lee, 2021] Lee, L. H. et al. (2021). All One Needs to Know about Metaverse: A Complete Survey on Technological Singularity, Virtual Ecosystem, and Research Agenda. arXiv preprint arXiv:2110.05352.
  • [Metaverse, 2023] Metaverse Standards Forum, https://metaverse-standards.org/
  • [Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
  • [Moslavac, 2023] Moslavac, M., Brzica, L., Drozd, L., Kušurin, N., Vlahović, S., & Skorin-Kapov, L. (2023, July). Assessment of Varied User Representations and XR Environments in Consumer-Grade XR Telemeetings. In 2023 17th International Conference on Telecommunications (ConTEL) (pp. 1-8). IEEE.
  • [Rauschnabel, 2022] Rauschnabel, P. A., Felix, R., Hinsch, C., Shahab, H., & Alt, F. (2022). What is XR? Towards a Framework for Augmented and Virtual Reality. Computers in human behavior, 133, 107289.
  • [NEM WP, 2022] New European Media (NEM), NEM: List of topics for the Work Program 2023-2024.
  • [NEM XR, 2022] New European Media (NEM), NEM contribution to the XR coalition, June 2022.
  • [Perez, 2022] Pérez, P., Gonzalez-Sosa, E., Gutiérrez, J., & García, N. (2022). Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practices for QoE Assessment. Frontiers in Signal Processing, 2, 917684.
  • [Spittle, 2023] Spittle, B., Frutos-Pascual, M., Creed, C., & Williams, I. (2023). A Review of Interaction Techniques for Immersive Environments. IEEE Transactions on Visualization and Computer Graphics, 29(9), Sept. 2023.
  • [TRANSMIXR] EU HORIZON 2020 TRANSMIXR project, Ignite the Immersive Media Sector by Enabling New Narrative Visions, https://transmixr.eu/
  • [Viola, 2023] Viola, I., Jansen, J., Subramanyam, S., Reimat, I., & Cesar, P. (2023). VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real-Time Communication. IEEE MultiMedia, 30(2).
  • [Wang 2023] Wang, H. et al. (2023). A Survey on the Metaverse: The State-of-the-Art, Technologies, Applications, and Challenges. IEEE Internet of Things Journal, 10(16).
  • [Wang, 2022] Wang, Y. et al. (2022). A Survey on Metaverse: Fundamentals, Security, and Privacy. IEEE Communications Surveys & Tutorials, 25(1).
  • [Westerlund, 2020] Westerlund, T. & Marklund, B. (2020). Community pharmacy and primary health care in Sweden – at a crossroads. Pharm Pract (Granada), 18(2): 1927.

Explainable Artificial Intelligence for Quality of Experience Modelling

Data-driven Quality of Experience (QoE) modelling using Machine Learning (ML) has arisen as a promising alternative to cumbersome and potentially biased manual QoE modelling. However, the reasoning of the majority of ML models is not explainable due to their black-box characteristics, which prevents us from gaining insights into how a model actually relates QoE influence factors to QoE. Yet these fundamental relationships are highly relevant for QoE researchers as well as service and network providers.

With the emerging field of eXplainable Artificial Intelligence (XAI) and its recent technological advances, these issues can now be resolved. As a consequence, XAI enables data-driven QoE modelling to obtain generalizable QoE models while simultaneously providing the model’s reasoning about which QoE factors are relevant and how they affect the QoE score. In this work, we showcase the feasibility of explainable data-driven QoE modelling for video streaming and web browsing, before discussing the opportunities and challenges of deploying XAI for QoE modelling.

Introduction

In order to enhance services and networks and prevent users from switching to competitors, researchers and service providers need a deep understanding of the factors that influence the Quality of Experience (QoE) [1]. However, developing an effective QoE model is a complex and costly endeavour. Typically, it requires dedicated and extensive studies, which can only cover a limited portion of the parameter space and may be influenced by the study design. These studies often generate a relatively small sample of QoE ratings from a comparatively small population, making them vulnerable to poor performance when applied to unseen data. Moreover, the process of collecting and processing data for QoE modelling is not only arduous and time-consuming, but it can also introduce biases and self-fulfilling prophecies, such as perceiving an exponential relationship when one is expected.

To overcome these challenges, data-driven QoE modelling utilizing machine learning (ML) has emerged as a promising alternative, especially in scenarios where there is a wealth of data available or where data streams can be continuously obtained. A notable example is the ITU-T standard P.1203 [2], which estimates video streaming QoE by combining manual modelling, accounting for 75% of the Mean Opinion Score (MOS) estimation, with ML-based Random Forest modelling, accounting for the remaining 25%. The inclusion of the ML component in P.1203 indicates its ability to enhance performance. However, the inner workings of P.1203’s Random Forest model, specifically how it calculates the output score, are not obvious. Moreover, the survey in [3] shows that ML-based QoE modelling is already widely used across multimedia systems, including Virtual Reality, 360-degree video, and gaming. However, these QoE models are based either on shallow learning methods, e.g., Support Vector Machines (SVM), or on deep learning methods, which lack explainability. Thus, it is difficult to understand which QoE factors are relevant and how they affect the QoE score [13], resulting in a lack of trust in data-driven QoE models and impeding their widespread adoption by researchers and providers [14].
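As a purely conceptual sketch of such a hybrid design, the following hypothetical function blends a hand-crafted core score with an ML prediction using the 75%/25% split mentioned above; the actual P.1203 algorithm is considerably more involved, so this is an illustration of the idea, not the standardized computation.

```python
def p1203_style_mos(manual_score: float, ml_score: float) -> float:
    """Illustrative blend of a hand-crafted QoE model and an ML model.

    P.1203 combines manual modelling (~75% of the MOS estimate) with a
    Random Forest component (~25%); the weights below mirror that split,
    but the function itself is a conceptual sketch only.
    """
    blended = 0.75 * manual_score + 0.25 * ml_score
    # clamp to the 1..5 MOS scale
    return max(1.0, min(5.0, blended))
```

For instance, a manual core score of 4.0 combined with an ML score of 2.0 yields a blended MOS of 3.5.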

Fortunately, recent advancements in the field of eXplainable Artificial Intelligence (XAI) [6] have paved the way for interpretable ML-based QoE models, thereby fostering trust between stakeholders and the QoE model. These advancements encompass a diverse range of XAI techniques that can be applied to existing black-box models, as well as novel and sophisticated ML models designed with interpretability in mind. Considering the use case of modelling video streaming QoE from real subjective ratings, the work in [4] evaluates the feasibility of explainable, data-driven QoE modelling and discusses the deployment of XAI for QoE research.

The utilization of XAI for QoE modelling brings several benefits. Not only does it speed up the modelling process, but it also enables the identification of the most influential QoE factors and their fundamental relationships with the Mean Opinion Score (MOS). Furthermore, it helps eliminate biases and preferences of different research teams and datasets that could inadvertently influence the model. All that is required is a carefully curated dataset with descriptive features and corresponding QoE ratings (labels), which covers the most important QoE influence factors and, in particular, also rare events, e.g., many stalling events in a session. Generating such complete datasets, however, remains an open research question and calls for data-centric AI [15]. By merging datasets from various studies, more robust and generalizable QoE models can in theory be created, although these studies need to share a common ground. Another benefit is that the models can be automatically refined over time as new QoE studies are conducted and additional data becomes available.

XAI: eXplainable Artificial Intelligence

For a comprehensive understanding of eXplainable Artificial Intelligence (XAI), a general overview can be found in [5], while a thorough survey on XAI methods and a taxonomy of XAI methods, in general, is available in [6].

XAI methods can be categorized into two main types: local and global explainability techniques. Local explainability aims to provide explanations for individual stimuli in terms of QoE factors and QoE ratings. On the other hand, global explainability focuses on offering general reasoning for how a model derives the QoE rating from the underlying QoE factors. Furthermore, XAI methods can be classified into post-hoc explainers and interpretable models.

Post-hoc explainers [6] are commonly used to explain various black-box models, such as neural networks or ensemble techniques after they have been trained. One widely utilized post-hoc explainer is SHAP values [7], which originates from game theory. SHAP values quantify the contribution of each feature to the model’s prediction by considering all possible feature subsets and learning a model for each subset. Other post-hoc explainers include LIME and Anchors, although they are limited to classification tasks.
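For small feature counts, the game-theoretic definition behind SHAP values can be evaluated by brute force. The hypothetical helper below marginalizes “absent” features by replacing them with baseline values (e.g., dataset means); real SHAP implementations use efficient approximations instead of this exponential enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction (exponential in len(x)).

    predict  : callable mapping a feature vector to a scalar (e.g. a QoE model)
    x        : the instance to explain
    baseline : reference values used for "absent" features (e.g. dataset means)
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Shapley weight of a coalition of size k
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                # marginal contribution of feature i to this coalition
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi
```

For a linear model, the Shapley value of feature i reduces to w_i * (x_i - b_i), and the values sum to the difference between the prediction for x and for the baseline, which makes the sketch easy to sanity-check.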

Interpretable models, by design, provide explanations for how the model arrives at its output. Well-known interpretable models include linear models and decision trees. Additionally, generalized additive models (GAM) are gaining recognition as interpretable models.

A GAM is a generalized linear model in which the model output is computed by summing up each of the arbitrarily transformed input features along with a bias [8]. This form enables a direct interpretation of the model: analyzing the learned shape functions and the transformed inputs makes it possible to estimate the influence of each feature. Two state-of-the-art ML-based GAMs are the Explainable Boosting Machine (EBM) [9] and the Neural Additive Model (NAM) [8]. While EBM uses decision trees to learn the shape functions and gradient boosting to improve training, NAM utilizes arbitrary neural networks to learn the functions, resulting in a neural network architecture with one sub-network per feature. EBM further extends the GAM formulation with additional pairwise feature interaction terms while maintaining explainability.
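As an illustration of the GAM form, the sketch below hand-codes two shape functions for a hypothetical video QoE model. The functions and coefficients are invented for demonstration and are not fitted to any dataset; EBM and NAM would learn such shape functions from data instead.

```python
import math

# Minimal GAM sketch: MOS = bias + f_bitrate(x1) + f_stalling(x2).
def f_bitrate(mbps):       # diminishing returns: logarithmic in bitrate
    return 0.8 * math.log(1 + mbps)

def f_stalling(seconds):   # stalling hurts, with a saturating penalty
    return -1.5 * (1 - math.exp(-seconds / 4))

def gam_predict(mbps, stalling_s, bias=2.5):
    return bias + f_bitrate(mbps) + f_stalling(stalling_s)

# Interpretability: each term IS that feature's contribution to the MOS.
contrib = {"bitrate": f_bitrate(8), "stalling": f_stalling(6)}
mos = gam_predict(8, 6)
assert abs(mos - (2.5 + sum(contrib.values()))) < 1e-9
```

Because the prediction is a plain sum, plotting each shape function over its feature range (as in Figure 1) fully describes how the model maps QoE factors to the MOS.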

Exemplary XAI-based QoE Modelling using GAMs

We demonstrate the learned predictor functions of EBM (red) and NAM (blue) on a video QoE dataset in Figure 1. All technical details about the dataset and the methodology can be found in [4]. Both models yield smooth shape functions that are easy to interpret. EBM and NAM differ only marginally, mostly in areas of low data density, where EBM achieves slightly better predictions by overfitting single data points via its feature interaction terms. This is visible, for example, for a high total stalling duration and a high number of quality switches, where EBM at some point stops its negative trend and sharply reverses it to improve predictions for extreme outliers.

Figure 1: EBM and NAM for video QoE modelling

Using the smooth predictor functions, it is easy to apply curve fitting. In the bottom-right plot of Figure 1, we fit the average-bitrate predictor function of NAM (shifted by the average MOS of the dataset to recover the original MOS scale on the y-axis, with an inverted x-axis) using exponential (IQX), logarithmic (WQL), and linear (LIN) functions. Note that this constitutes a univariate mapping of average bitrate to MOS, neglecting the other influencing factors. We observe that the predictor function follows the WQL hypothesis [10] (red) with a high R² = 0.967. This is in line with the mechanics of P.1203, where the authors of [11] showed the same logarithmic behavior for the bitrate in mode 0.
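The WQL-style fit reduces to ordinary linear regression after substituting u = log(bitrate). A self-contained sketch on synthetic data (the coefficients, bitrates, and perturbations below are made up for illustration, not values from [4]):

```python
import math

# Least-squares fit of the WQL-style model MOS = a*log(bitrate) + b,
# i.e., plain linear regression after substituting u = log(bitrate).
def fit_log_model(bitrates, mos):
    u = [math.log(x) for x in bitrates]
    n = len(u)
    mu, mm = sum(u) / n, sum(mos) / n
    a = sum((ui - mu) * (mi - mm) for ui, mi in zip(u, mos)) \
        / sum((ui - mu) ** 2 for ui in u)
    return a, mm - a * mu

# synthetic ratings following a logarithmic law plus small perturbations
br = [0.5, 1, 2, 4, 8, 16]
noise = [0.03, -0.02, 0.01, -0.03, 0.02, -0.01]
mos = [1.0 + 1.1 * math.log(x) + e for x, e in zip(br, noise)]
a, b = fit_log_model(br, mos)  # recovers roughly a = 1.1, b = 1.0
```

The same closed-form substitution trick works for the IQX (exponential) hypothesis by regressing log(MOS) instead.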

Figure 2: EBM and NAM for web QoE modelling

As the presented XAI methods are universally applicable to any QoE dataset, Figure 2 shows a similar GAM-based QoE modelling for a web QoE dataset obtained from [12]. We can see that the loading behavior in terms of ByteIndex-Page Load Time (BI-PLT) and time to last byte (TTLB) has the strongest impact on web QoE. Moreover, we see that different URLs/webpages have a different effect on the MOS, which shows that web QoE is content-dependent. In summary, GAMs yield valuable, easy-to-interpret functions that explain fundamental relationships between QoE factors and MOS. Further XAI methods can be utilized as well, as detailed in [4,5,6].

Discussion

In addition to expediting the modelling process and mitigating modelling biases, data-driven QoE modelling offers significant advantages in terms of improved accuracy and generalizability compared to manual QoE models. ML-based models are not constrained to specific classes of continuous functions typically used in manual modelling, allowing them to capture more complex relationships present in the data. However, a challenge with ML-based models is the risk of overfitting, where the model becomes overly sensitive to noise and fails to capture the underlying relationships. Overfitting can be avoided through techniques like model regularization or by collecting sufficiently large or complete datasets.

Successful implementation of data-driven QoE modelling relies on purposeful data collection. It is crucial to ensure that all (or at least the most important) QoE factors are included in the dataset, covering their full parameter range with an adequate number of samples. Controlled lab or crowdsourcing studies can define feature values easily, but budget constraints (time and cost) often limit data collection to a small set of selected feature values. Conversely, field studies can encompass a broader range of feature values observed in real-world scenarios, but they may only gather limited data samples for rare events, such as video sessions with numerous stalling events. To prevent data bias, it is essential to balance feature values, which may require purposefully generating rare events in the field. Additionally, thorough data cleaning is necessary. While it is possible to impute missing features resulting from measurement errors, doing so increases the risk of introducing bias. Hence, it is preferable to filter out missing or unusual feature values.

Moreover, adding new data and retraining an ML model is a natural and straightforward process in data-driven modelling, offering long-term advantages. In the long run, data-driven QoE models could even handle concept drift, which refers to changes in the importance of influencing factors over time, such as altered user expectations. However, QoE studies are rarely more than temporal and population-based snapshots, limiting frequent model updates. Ideally, a pipeline could be established to provide a continuous stream of features and QoE ratings, enabling online learning and ensuring the QoE models remain up to date. Although challenging for research endeavors, service providers could incorporate such QoE feedback streams into their applications.

Comparing black-box and interpretable ML models, there is a slight trade-off between performance and explainability. However, as shown in [4], it is negligible in the context of QoE modelling. In return, XAI makes it possible to fully understand the model’s decisions, identifying relevant QoE factors and their relationships to the QoE score. Nevertheless, explaining models becomes inherently more difficult as the number of input features increases. Highly correlated features and interactions may further lead to misinterpretations when using XAI, since the influence of a feature may also depend on other features. To obtain reliable and trustworthy explainable models, it is therefore crucial to exclude highly correlated features.

Finally, although we demonstrated XAI-based QoE modelling only for video streaming and web browsing, from a research perspective it is important to understand that the whole process is readily applicable to other domains like speech or gaming. Beyond research, XAI can also be highly beneficial for service and network providers implementing continuous QoE monitoring: integrating visualizations of trends like Figure 1 or Figure 2 into dashboards provides a deeper understanding of the QoE in their systems.

Conclusion

In conclusion, technological progress has made data-driven explainable QoE modelling practical to implement. As a result, it is crucial for researchers and service providers to consider adopting XAI-based QoE modelling to gain a comprehensive and broader understanding of the factors influencing QoE and their connection to users’ subjective experiences. By doing so, they can enhance services and networks in terms of QoE, effectively preventing user churn and minimizing revenue losses.

References

[1] K. Brunnström, S. A. Beker, K. De Moor, A. Dooms, S. Egger, M.-N. Garcia, T. Hossfeld, S. Jumisko-Pyykkö, C. Keimel, M.-C. Larabi et al., “Qualinet White Paper on Definitions of Quality of Experience,” 2013.

[2] W. Robitza, S. Göring, A. Raake, D. Lindegren, G. Heikkilä, J. Gustafsson, P. List, B. Feiten, U. Wüstenhagen, M.-N. Garcia et al., “HTTP Adaptive Streaming QoE Estimation with ITU-T Rec. P.1203: Open Databases and Software,” in ACM MMSys, 2018.

[3] G. Kougioumtzidis, V. Poulkov, Z. D. Zaharis, and P. I. Lazaridis, “A Survey on Multimedia Services QoE Assessment and Machine Learning-Based Prediction,” IEEE Access, 2022.

[4] N. Wehner, A. Seufert, T. Hoßfeld, and M. Seufert, “Explainable Data-Driven QoE Modelling with XAI,” QoMEX, 2023.

[5] C. Molnar, Interpretable Machine Learning, 2nd ed., 2022. Available: https://christophm.github.io/interpretable-ml-book

[6] A. B. Arrieta, N. Díaz-Rodríguez et al., “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI,” Information Fusion, 2020.

[7] S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” NIPS, 2017.

[8] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, R. Caruana, and G. E. Hinton, “Neural Additive Models: Interpretable Machine Learning with Neural Nets,” NeurIPS, 2021.

[9] H. Nori, S. Jenkins, P. Koch, and R. Caruana, “InterpretML: A Unified Framework for Machine Learning Interpretability,” arXiv preprint arXiv:1909.09223, 2019.

[10] T. Hoßfeld, R. Schatz, E. Biersack, and L. Plissonneau, “Internet Video Delivery in YouTube: From Traffic Measurements to Quality of Experience,” in Data Traffic Monitoring and Analysis, 2013.

[11] M. Seufert, N. Wehner, and P. Casas, “Studying the Impact of HAS QoE Factors on the Standardized QoE Model P.1203,” in ICDCS, 2018.

[12] D. N. da Hora, A. S. Asrese, V. Christophides, R. Teixeira, D. Rossi, “Narrowing the gap between QoS metrics and Web QoE using Above-the-fold metrics,” PAM, 2018

[13] A. Seufert, F. Wamser, D. Yarish, H. Macdonald, and T. Hoßfeld, “QoE Models in the Wild: Comparing Video QoE Models Using a Crowdsourced Data Set”, in QoMEX, 2021

[14] D. Shin, “The effects of explainability and causability on perception, trust, and acceptance: Implications for explainable AI”, in International Journal of Human-Computer Studies, 2021.

[15] D. Zha, Z. P. Bhat, K. H. Lai, F. Yang, and X. Hu, “Data-Centric AI: Perspectives and Challenges,” in SIAM International Conference on Data Mining, 2023.

Sustainability vs. Quality of Experience: Striking the Right Balance for Video Streaming

The exponential growth in internet data traffic, driven by the widespread use of video streaming applications, has resulted in increased energy consumption and carbon emissions. This outcome is primarily due to higher-resolution and higher-framerate content and the ability to watch videos on various end-devices. However, efforts to reduce energy consumption in video streaming services may have unintended consequences on users’ Quality of Experience (QoE). This column delves into the intricate relationship between QoE and energy consumption, considering the impact of different bitrates on end-devices. We also consider other factors to provide a more comprehensive understanding of whether these end-devices have a significant environmental impact. It is essential to carefully weigh the trade-offs between QoE and energy consumption to make informed decisions and develop sustainable practices in video streaming services.

Energy Consumption for Video Streaming

In the past few years, we have seen a remarkable expansion in how online content is delivered. According to Sandvine’s 2023 Global Internet Phenomena Report [1], video usage on the Internet increased by 24% in 2022 and now accounts for 65% of all Internet traffic. This surge is mainly due to the growing popularity of video streaming services. Videos have become an increasingly popular form of online content, capturing a significant portion of internet users’ attention and shaping how we consume information and entertainment online. The rising quality expectations of end-users have therefore necessitated research and implementation of video streaming management approaches that consider the Quality of Experience (QoE) [2]. The idea is to develop applications that work within the energy and resource limits of end-devices while still delivering the Quality of Service (QoS) needed for smooth video viewing and an improved QoE. Although video streaming services are advancing quickly, energy consumption remains a significant issue, raising many concerns about its impact and the urgent need to boost energy efficiency [14].

The literature identifies four main elements when analysing the energy consumption of video streaming: the data centres, the data transmission networks, the end-devices, and consumer behaviour [3]. In this regard, the authors of [4] present a comprehensive review of existing literature on the energy consumption of online video streaming services and outline potential actions that both service providers and consumers can take to promote sustainable video streaming. Their summary of possible actions for sustainable video streaming, from both the provider’s and the consumer’s perspective, covers the following segments:

  • Data center: CDN (Content Delivery Network) can be utilized to offload contents/applications to the edge from the provider’s side and choose providers that prioritize sustainability from the consumer’s side.
  • Data transmission network: Data compression/encoding algorithms from the provider’s side and no autoplay from the consumer’s side.
  • End-Device: Produce energy-efficient devices from the provider’s side and prefer small-size (mobile) devices from the consumer’s side.
  • Consumer behaviour: Reduce the number of subscribers from the provider’s side and prefer watching videos with other people rather than alone from the consumer’s side.

Finally, they noted that the end-device and consumer behaviour are the primary contributors to energy costs in the video streaming process, which motivates actions such as reducing video resolution and using smaller devices. However, such actions may have a downside, as they can negatively impact the QoE through their effect on video quality. In [5], the authors found that by sacrificing the maximum QoE and aiming for good quality instead (e.g., a MOS of 4 = Good instead of 5 = Excellent), significant energy savings can be achieved in video-conferencing services, since lower video bitrates consume less energy than higher ones, as per their logarithmic QoE model. Building on this research, the authors of [4] propose identifying an acceptable level of QoE, rather than striving for maximum QoE, as a potential solution to reduce energy consumption while still meeting consumer satisfaction. They conducted a crowdsourcing survey to gather real consumer opinions on their willingness to save energy while streaming online videos, and analysed the results to understand how willing people are to lower video streaming quality in order to achieve energy savings.

Green Video Streaming: The Trade-Off Between QoE and Energy Consumption

To provide a trade-off between QoE and energy consumption, we looked at the connection between the video bitrate at standard resolutions, electricity usage, and perceived QoE for a video streaming service on four different devices (smartphone, tablet, laptop/PC, and smart TV), as taken from [4].

The energy consumption of streaming on a device is calculated as in [6]: Q_i = t_i · (P_i + R_i · ρ), where Q_i represents the electricity consumption (in kWh) of the i-th device, t_i denotes the streaming duration (in hours per week) for the i-th device, P_i represents the power load (in kW) of the i-th device, R_i signifies the data traffic (in GB/h) for a specific bitrate, and ρ = 0.1 kWh/GB represents the electricity intensity of data traffic.
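As a sketch, the per-device energy model above can be written directly in Python. The device power, traffic rate, and streaming duration below are illustrative assumptions, not values from [4] or [6]; only ρ = 0.1 kWh/GB is given in the text.

```python
def streaming_energy_kwh(hours, device_power_kw, traffic_gb_per_h, rho=0.1):
    """Q_i = t_i * (P_i + R_i * rho), with rho = 0.1 kWh/GB as in the text."""
    return hours * (device_power_kw + traffic_gb_per_h * rho)

# illustrative (assumed) numbers: a 0.1 kW smart TV streaming 3 GB/h
# for 10 hours per week
q = streaming_energy_kwh(10, 0.1, 3.0)
# → 10 * (0.1 + 3.0 * 0.1) = 4.0 kWh per week
```

Note how the traffic term R_i · ρ grows linearly with bitrate, which is what later yields the linear kWh-vs-bitrate trend in Figure 1.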

Then, to estimate the perceived QoE based on the video bitrate, the authors employed a QoE model from [7]: QoE = a · br^b + c, where br represents the bitrate, and a, b, and c are regression coefficients calculated for specific resolutions.

After taking this into account, we can establish a link between the QoE model, energy consumption, and the perceived QoE associated with video bitrate across various end-devices. Therefore, we implemented the green QoE model of [8] to provide a trade-off between the perceived QoE and the calculated energy consumption from above: f_γ(x) = 4/(log(x’_5) − log(x_1)) · log(x) + (log(x’_5) − 5·log(x_1))/(log(x’_5) − log(x_1)). This equation is the mapping function between video bitrate and Mean Opinion Scores (MOS), anchored at the minimum bitrate x_1 corresponding to MOS 1 and the maximum bitrate x_5 corresponding to MOS 5. The factor γ, representing the greenness of a user, lowers the bitrate x’_5 = x_5/γ at which a MOS of 5 is reached.
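A small Python sketch of f_γ(x); the anchor bitrates x_1 and x_5 below are assumed values for illustration, as the real anchors are device- and dataset-specific.

```python
import math

def green_mos(x, x1, x5, gamma=1.0):
    """f_gamma(x): MOS 1 at bitrate x1, MOS 5 at x5' = x5 / gamma.
    gamma > 1 models a 'green' user satisfied at a lower bitrate."""
    x5p = x5 / gamma
    a = 4 / (math.log(x5p) - math.log(x1))
    b = (math.log(x5p) - 5 * math.log(x1)) / (math.log(x5p) - math.log(x1))
    return a * math.log(x) + b

# sanity checks against the anchor points (assumed x1 = 1, x5 = 32 Mbps)
assert abs(green_mos(1.0, 1.0, 32.0) - 1.0) < 1e-9   # MOS 1 at x1
assert abs(green_mos(32.0, 1.0, 32.0) - 5.0) < 1e-9  # MOS 5 at x5
# a green user (gamma = 2) already reaches MOS 5 at half the bitrate
assert abs(green_mos(16.0, 1.0, 32.0, gamma=2) - 5.0) < 1e-9
```

The assertions verify that the closed form indeed interpolates between MOS 1 at x_1 and MOS 5 at x’_5 = x_5/γ.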

The model focuses on the concept of a “green user,” who considers the energy consumption aspect in their overall QoE evaluations. Thus, a green user might rate their QoE slightly lower in order to reduce their carbon footprint compared to a high-quality (HQ) user (or “non-green” user) who prioritizes QoE without considering energy consumption.

The numerical results for the energy consumption (in kWh) and the MOS scores as functions of the video bitrate can be approximated with linear and logarithmic regressions, respectively. Figure 1 depicts a linear regression analysis of the relationship between energy consumption (kWh, y-axis) and bitrate (kbps, x-axis). The graph displays a straight-line trend that starts at 1.6 kWh and extends up to 3.5 kWh as the bitrate increases. The linear fitting function is kWh = f(bitrate) = a · bitrate + c, where a represents the slope and c the y-intercept of the line.

Figure 1 illustrates how energy consumption tends to increase with higher bitrates, as indicated by the positive slope of the regression line. Notably, as video bitrates increase, the electricity consumption of end-devices also increases. This can be attributed to the larger amount of data traffic generated by higher-resolution video content, which requires higher bitrates for transmission. Consequently, smart TVs are likely to consume more energy than other devices. This finding is consistent with the results obtained from the linear regression model described in [4], further validating the relationship between bitrate and energy consumption.

As illustrated in Figure 2, the relationship between MOS and video bitrate (kbps) follows a logarithmic pattern. We can therefore use a straightforward QoE model to estimate the MOS from the video bitrate, namely a logarithmic regression model MOS = f(x) = a · log(x) + c, with x representing the video bitrate in Mbps and a and c being coefficients, as provided in [9]. The MOS and video bitrate values from [4] are then applied to the green QoE model equation above, which is an extension of this logarithmic regression model [8]. This makes it possible to determine the green-user QoE model; as an example, we show the green-user QoE model for the smart TV (using γ = 2 in f_γ(x)).

Figure 2 categorizes users into two groups: those who prioritize high-quality (HQ) video regardless of energy consumption, and green users who prioritize energy efficiency while still being satisfied with slightly lower video quality. It can be observed that the MOS value reacts to changes in video quality faster on smart TVs than on other end-devices, as evident from the steeper curve in the smart TV section. The curve for tablets, on the other hand, shows that changes in bitrate have a milder impact on MOS values. This suggests that video streaming on smaller screens, such as tablets or laptops, may contribute less to the perception of quality changes. Considering that such small-screen devices also consume less energy than larger-screen devices, it may be preferable to use lower-resolution videos instead of high-resolution ones. Comparing laptops and tablets, low-resolution video streaming on the laptop resulted in lower MOS scores than on the tablet. From this, it can be inferred that the choice of end-device and user behaviour play a significant role in energy savings. Figure 2 also indicates that the MOS values for the green user of a smart TV are comparable to the MOS values of an HQ user using a laptop.

Concerning this outcome, the authors of [10] presented the results of a subjective assessment investigating how different factors, such as video resolution, luminance, and end-devices (TV, laptop, and smartphone), impact the QoE and energy consumption of video streaming services. The study found that, in certain conditions (such as dark or bright environments, low device backlight luminance, or small-screen devices), users may need to strike a balance between acceptable QoE and sustainable (green) choices, as consuming more energy (e.g., by streaming higher-quality videos) may not significantly enhance the QoE.

Therefore, Figure 3 plots the trade-off between energy consumption (kWh) and MOS for the end-devices (smart TV, laptop, and tablet), differentiating the HQ user and the green user, which yields some interesting results. First, a MOS of 4 leads to comparable energy consumption for green and HQ users; the relative differences are rather small. Aiming for the best quality (MOS 5), however, leads to significant differences. Furthermore, the device type has a significant impact on energy consumption. Even for green users, who rate lower bitrates with higher MOS scores than HQ users, the energy consumption of the smart TV is much higher than that of laptop and tablet users at any quality (i.e., bitrate). Thus, device type and user behaviour are essential to strike the right balance between QoE and energy consumption.
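To illustrate how the green-user model translates into bitrate (and hence traffic and energy) savings, the sketch below inverts the logarithmic MOS mapping to find the bitrate needed for a target MOS. The anchor bitrates are assumed values for illustration, not those of [4].

```python
import math

def bitrate_for_mos(target_mos, x1, x5, gamma=1.0):
    """Invert MOS(x) = a*log(x) + b (anchors: x1 -> MOS 1, x5/gamma -> MOS 5)."""
    x5p = x5 / gamma
    a = 4 / (math.log(x5p) - math.log(x1))
    b = 1 - a * math.log(x1)
    return math.exp((target_mos - b) / a)

x1, x5 = 1.0, 32.0  # assumed anchor bitrates in Mbps
hq = bitrate_for_mos(5, x1, x5)               # HQ user needs the full 32 Mbps
green = bitrate_for_mos(5, x1, x5, gamma=2)   # green user: 16 Mbps, half the traffic
assert abs(hq - 32.0) < 1e-6 and abs(green - 16.0) < 1e-6
```

This mirrors the observation above: at MOS 4 the gap between HQ and green users is modest, while targeting MOS 5 doubles the required bitrate for the HQ user under γ = 2.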

Discussions and Future Research

Meeting the QoE expectations of end-users is essential to fulfilling the requirements of video streaming services. As users are the primary viewers of streaming videos in most real-world scenarios, subjective QoE assessment [11] provides a direct and dependable means to evaluate the perceptual quality of video streaming. Furthermore, there is a growing need to create objective QoE assessment models, such as those in [12][13]. However, many studies have focused on investigating the QoE obtained through subjective and objective models and have overlooked the consideration of energy consumption in video streaming.

Therefore, in the previous section, we have discussed how the different elements within the video streaming ecosystem play a role in consuming energy and emitting CO2. The findings pave the way for an objective answer to the question of an appropriate video bitrate for viewing that considers both QoE and sustainability, which can be further explored in future research.

It is evident that addressing energy consumption and emissions is crucial for the future of video streaming systems, while ensuring that end-users’ QoE is not compromised poses a significant and ongoing challenge. Potential solutions that limit energy consumption while still satisfying the user include streaming videos on smaller-screen devices and watching lower-resolution videos that offer sufficient quality instead of the highest-resolution ones. This highlights the importance of user behaviour in limiting energy consumption. Additionally, trade-off models can be developed using the green QoE model (especially for the smart TV) by identifying ideal bitrate values for both energy savings and user satisfaction.

Delving deeper into the dynamics of the video streaming ecosystem, it becomes increasingly clear that energy consumption and emissions are critical concerns that must be addressed for the sustainable future of video streaming systems. The environmental impact of video streaming, particularly in terms of carbon emissions, cannot be understated. With the growing awareness of the urgent need to combat climate change, mitigating the environmental footprint of video streaming has become a pressing priority.

As video streaming technologies evolve, optimizing energy efficiency without compromising users’ QoE is a complex task. End-users, who expect seamless and high-quality video streaming experiences, should not be deprived of their QoE while energy and emissions concerns are addressed. This opens the door to an objective answer to the question of what constitutes an appropriate video bitrate for viewing that takes into account both QoE and sustainability concerns.

Future research in this area is crucial to explore innovative techniques and strategies that can effectively reduce the energy consumption and carbon emissions of video streaming systems without sacrificing the QoE. Additionally, collaborative efforts among stakeholders, including researchers, industry practitioners, policymakers, and end-users, are essential in devising sustainable video streaming solutions that consider both environmental and user experience factors [14].

In conclusion, the discussions on the relationship between energy consumption, emissions, and QoE in video streaming systems emphasize the need for continued research and innovation to achieve a sustainable balance between environmental sustainability and user satisfaction.

References

  • [1] Sandvine. The Global Internet Phenomena Report. January 2023. Retrieved April 24, 2023
  • [2] M. Seufert, S. Egger, M. Slanina, T. Zinner, T. Hoßfeld, and P. Tran-Gia, “A Survey on Quality of Experience of HTTP Adaptive Streaming,” IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 469-492, 2015, doi: 10.1109/COMST.2014.2360940.
  • [3] R. Madlener, S. Sheykhha, and W. Briglauer, “The electricity- and CO2-saving potentials offered by regulation of European video-streaming services,” Energy Policy, vol. 161, p. 112716, 2022.
  • [4] G. Bingöl, S. Porcu, A. Floris and L. Atzori, “An Analysis of the Trade-off between Sustainability,” in IEEE ICC Workshop-GreenNet, Rome, 2023.
  • [5] T. Hoßfeld, M. Varela, L. Skorin-Kapov, P. E. Heegaard, “What is the trade-off between CO2 emission and video-conferencing QoE?,” ACM SIGMM Records, 2022.
  • [6] P. Suski, J. Pohl, and V. Frick, “All you can stream: Investigating the role of user behavior for greenhouse gas intensity of video streaming,” in Proc. of the 7th Int. Conf. on ICT for Sustainability, 2020, pp. 128–138.
  • [7] M. Mu, M. Broadbent, A. Farshad, N. Hart, D. Hutchison, Q. Ni, and N. Race, “A Scalable User Fairness Model for Adaptive Video Streaming Over SDN-Assisted Future Networks,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 8, p. 2168–2184, 2016.
  • [8] T. Hossfeld, M. Varela, L. Skorin-Kapov and P. E. Heegaard, “A Greener Experience: Trade-offs between QoE and CO2 Emissions in Today’s and 6G Networks,” IEEE Communications Magazine, pp. 1-7, 2023.
  • [9] J. P. López, D. Martín, D. Jiménez and J. M. Menéndez, “Prediction and Modeling for No-Reference Video Quality Assessment Based on Machine Learning,” in 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), IEEE, pp. 56-63, Las Palmas de Gran Canaria, Spain, 2018.
  • [10] G. Bingöl, A. Floris, S. Porcu, C. Timmerer and L. Atzori, “Are Quality and Sustainability Reconcilable? A Subjective Study on Video QoE, Luminance and Resolution,” in 15th International Conference on Quality of Multimedia Experience (QoMEX), Gent, Belgium, 2023.
  • [11] G. Bingol, L. Serreli, S. Porcu, A. Floris, L. Atzori, “The Impact of Network Impairments on the QoE of WebRTC applications: A Subjective study,” in 14th International Conference on Quality of Multimedia Experience (QoMEX), Lippstadt, Germany, 2022.
  • [12] D. Z. Rodríguez, R. L. Rosa, E. C. Alfaia, J. I. Abrahão, and G. Bressan, “Video quality metric for streaming service using DASH standard,” IEEE Trans. Broadcasting, vol. 62, no. 3, pp. 628-639, Sep. 2016.
  • [13] T. Hoßfeld, M. Seufert, C. Sieber and T. Zinner, “Assessing effect sizes of influence factors towards a QoE model for HTTP adaptive streaming,” in 6th Int. Workshop Qual. Multimedia Exper. (QoMEX), Sep. 2014.
  • [14] S. Afzal, R. Prodan, C. Timmerer, “Green Video Streaming: Challenges and Opportunities.” ACM SIGMultimedia Records, Jan. 2023.

Green Video Streaming: Challenges and Opportunities

Introduction

According to the Intergovernmental Panel on Climate Change (IPCC) report of 2021 and Sustainable Development Goal (SDG) 13, “climate action”, urgent action against climate change and global greenhouse gas (GHG) emissions is needed in the next few years [1]. This urgency also applies to the energy consumption of digital technologies. Internet data traffic is responsible for more than half of digital technology’s global impact, i.e., 55% of its annual energy consumption. The Shift Project forecast [2] shows a 25% annual increase in data traffic associated with 9% more energy consumption per year, reaching 8% of all GHG emissions in 2025.

Video flows represented 80% of global data flows in 2018, and this video data volume is increasing by 80% annually [2]. This exponential increase in the use of streaming video is due to (i) improvements in Internet connections and service offerings [3], (ii) the rapid development of video entertainment (e.g., video games and cloud gaming services), (iii) the deployment of Ultra High-Definition (UHD, 4K, 8K), Virtual Reality (VR), and Augmented Reality (AR), and (iv) an increasing number of video surveillance and IoT applications [4]. Notably, video processing and streaming generate 306 million tons of CO2, which is 20% of digital technology’s total GHG emissions and nearly 1% of worldwide GHG emissions [2].

While research has shown that the carbon footprint of video streaming has been decreasing in recent years [5], there is still a high need to invest in research and development of efficient next-generation computing and communication technologies for video processing technologies. This carbon footprint reduction is due to technology efficiency trends in cloud computing (e.g., renewable power), emerging modern mobile networks (e.g., growth in Internet speed), and end-user devices (e.g., users prefer less energy-intensive mobile and tablet devices over larger PCs and laptops). However, since the demand for video streaming is growing dramatically, it raises the risk of increased energy consumption. 

Investigating energy efficiency during video streaming is essential to developing sustainable video technologies. The processes from video encoding to decoding and displaying the video on the end user’s screen require electricity, which results in CO2 emissions. Consequently, the key question becomes: “How can we improve energy efficiency for video streaming systems while maintaining an acceptable Quality of Experience (QoE)?”.

Challenges and Opportunities 

In this section, we outline challenges and opportunities to tackle the emissions associated with video streaming in (i) data centers, (ii) networks, and (iii) end-user devices [5], as presented in Figure 1.

Figure 1. Challenges and opportunities to tackle emissions for video streaming.

Data centers are responsible for the video encoding process and for storing the video content. Growing video traffic drives data-center workloads, with total power consumption estimated to exceed 1,000 TWh by 2025 [6]. Data centers are also the foremost target of regulatory initiatives: national and regional policies have been established in response to the growing number of data centers and the concern over their energy consumption [7].

  • Suitable cloud services: Selecting energy-optimized and sustainable cloud services helps reduce CO2 emissions. Recently, IT service providers have started innovating in energy-efficient hardware by designing highly efficient Tensor Processing Units and high-performance servers, and by using machine-learning approaches to optimize cooling automatically and reduce energy consumption in their data centers [8]. In addition to advances in hardware design, it is also essential to consider the software’s potential for energy-efficiency improvements [9].
  • Low-carbon cloud regions: IT service providers offer cloud computing platforms in multiple regions, delivered through a global network of data centers. Various power plants (e.g., fuel, natural gas, coal, wind, solar, and hydro) supply the electricity that runs these data centers, generating differing amounts of greenhouse gases. It is therefore essential to consider how much carbon is emitted by the power plants that generate electricity for the selected cloud region. A cloud region should thus be evaluated by its entire carbon footprint, including its source of energy production.
  • Efficient and fast transcoders (and encoders): Another essential factor is the use of efficient transcoders/encoders that can transcode/encode the video content faster and with less energy consumption, while still delivering acceptable quality to the end user [10][11][12].
  • Optimizing the video encoding parameters: There is substantial potential to reduce the overall energy consumption of video streaming by optimizing the encoding parameters, including choosing a more power-efficient codec, resolution, frame rate, and bitrate, to lower the bitrate of encoded videos without affecting quality.
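
To make the encoding trade-off above concrete, the following sketch compares two hypothetical encoder presets by the energy they need to encode one hour of content and the storage the resulting bitrate implies. All energy and bitrate figures are invented for illustration; real values must come from benchmarking.

```python
# Illustrative sketch (not measured data): comparing hypothetical encoder
# presets by encoding energy per hour of content and achieved bitrate.
# All numbers below are assumptions for demonstration purposes only.

# (encoding energy in Wh per hour of content, achieved bitrate in Mbit/s)
ENCODERS = {
    "fast_preset": {"energy_wh": 50.0,  "bitrate_mbps": 8.0},
    "slow_preset": {"energy_wh": 400.0, "bitrate_mbps": 5.0},
}

def storage_gb_per_hour(bitrate_mbps: float) -> float:
    """Convert a bitrate in Mbit/s to stored GB per hour of content."""
    return bitrate_mbps * 3600 / 8 / 1000  # Mbit/s -> MB/h -> GB/h

for name, spec in ENCODERS.items():
    gb = storage_gb_per_hour(spec["bitrate_mbps"])
    print(f"{name}: {spec['energy_wh']} Wh/h encoded, {gb:.2f} GB/h stored")
```

The sketch shows why encoder choice is a trade-off: the slower preset spends more encoding energy up front but produces a lower bitrate, which pays off in storage and delivery for popular content.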

The next component within the video streaming process is video delivery within heterogeneous networks. Two essential energy consumption factors for video delivery are the network technology used and the amount of data to be transferred.

  • Energy-efficient network technology for video streaming: The network technology used to transmit data from the data center to end-users determines energy performance, since networks’ GHG emissions vary widely [5]. A fiber-optic network is the most climate-friendly transmission technology, emitting only 2 grams of CO2 per hour of HD video streaming, while a copper cable (VDSL) generates twice as much (i.e., 4 grams of CO2 per hour). UMTS (3G) data transmission produces 90 grams of CO2 per hour, which drops to 5 grams per hour with 5G [13]. Research therefore shows that expanding fiber-optic networks and 5G transmission technology is promising for climate change mitigation [5].
  • Lower data transmission: Transmitting less data reduces energy consumption. Therefore, the amount of video data needs to be reduced without compromising video quality [2]. The video data per hour ranges from 30 MB/hr for very low resolutions to 7 GB/hr for UHD resolutions; higher data volumes require more transmission energy. Another possibility is reducing unnecessary video usage, for example by avoiding autoplay and embedded videos, which aim to maximize the quantity of content consumed. Broadcasting platforms also play a central role in how viewers consume content and, thus, in the impact on the environment [2].
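
The per-network figures quoted above [13] lend themselves to a back-of-the-envelope calculation of a streaming session’s emissions:

```python
# CO2 per hour of HD streaming by access technology, as quoted above [13]:
# fiber 2 g/h, VDSL 4 g/h, 3G (UMTS) 90 g/h, 5G 5 g/h.
CO2_G_PER_HOUR = {"fiber": 2, "vdsl": 4, "3g": 90, "5g": 5}

def session_emissions_g(network: str, hours: float) -> float:
    """CO2 emissions (grams) of an HD streaming session on a given network."""
    return CO2_G_PER_HOUR[network] * hours

# A 2-hour movie over 3G vs. fiber:
print(session_emissions_g("3g", 2))     # prints 180
print(session_emissions_g("fiber", 2))  # prints 4
```

The 45x gap between 3G and fiber for the same content illustrates why the access technology, not just the data volume, matters for green streaming.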

The last component of the video streaming process is video usage at the end-user device, including decoding and displaying the video content on the end-user devices like personal computers, laptops, tablets, phones, or television sets.

  • End-user devices: Research [3][14] shows that end-user devices and decoding hardware account for the greatest portion of energy consumption and CO2 emissions in video streaming. Thus, most reduction strategies lie in the energy efficiency of the end-user devices, for instance by improving screen display technologies or by shifting from desktops to more energy-efficient laptops, tablets, and smartphones.
  • Streaming parameters: The energy consumption of the video decoding process depends on the video streaming parameters, much as end-user QoE does. It is therefore important to select streaming parameters intelligently to jointly optimize the QoE and the power efficiency of the end-user device. Moreover, the underlying video encoding parameters also impact the decoder’s energy usage.
  • End-user device environment: A wide variety of browsers (including legacy versions), codecs, and operating systems, in addition to the hardware (e.g., CPU, display), determine the final power consumption.
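
The intelligent parameter selection mentioned above can be sketched as a tiny constrained choice: pick the configuration with the highest QoE whose device power draw fits a budget. The QoE scores and power figures below are invented for illustration; in practice both would come from measurement or models.

```python
# Hypothetical sketch: choose the streaming configuration with the highest
# (assumed) QoE score whose (assumed) device power draw stays within a budget.
# All QoE scores and power figures are invented for illustration.

CONFIGS = [
    # (label, QoE score on a 1-5 scale, device power in watts)
    ("2160p60", 4.8, 18.0),
    ("1080p60", 4.5, 11.0),
    ("1080p30", 4.3, 9.0),
    ("720p30",  3.9, 7.0),
]

def best_config(power_budget_w: float):
    """Highest-QoE configuration that fits the device power budget."""
    feasible = [c for c in CONFIGS if c[2] <= power_budget_w]
    return max(feasible, key=lambda c: c[1]) if feasible else None

print(best_config(10.0))  # -> ('1080p30', 4.3, 9.0)
```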

In this column, we argue that addressing these challenges and opportunities for green video streaming can yield insights that drive the adoption of novel, more sustainable usage patterns, reducing the overall energy consumption of video streaming without sacrificing the end-user’s QoE.

End-to-end video streaming: While we have highlighted the main factors of each video streaming component that impact energy consumption, creating a generic power consumption model requires studying and analyzing video streaming holistically across all components. Implementing a dedicated system for optimizing energy consumption may, if not done efficiently, introduce additional processing on top of regular service operations. For instance, using the most recent video codec (e.g., VVC) reduces overall traffic compared to AVC (the most widely deployed video codec to date), but its higher encoding and decoding complexity requires more energy.
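
The VVC-vs-AVC trade-off just described can be illustrated with a toy calculation. All figures below (encoding energy, per-GB network energy, decoding energy) are invented assumptions; the point is only that the net effect depends on how many viewers share one encode.

```python
# Toy end-to-end energy model: one encode per title, plus per-viewer delivery
# and decoding. All numbers are invented assumptions for illustration.

def total_wh(codec, viewers, net_wh_per_gb=2.0):
    """Total energy (Wh) for one hour of content reaching `viewers` users."""
    per_viewer = codec["gb_per_hour"] * net_wh_per_gb + codec["decode_wh"]
    return codec["encode_wh"] + per_viewer * viewers

# Legacy codec: cheap encode, high bitrate; newer codec: costly encode,
# lower bitrate, somewhat heavier decoding (hypothetical values).
AVC = {"encode_wh": 100,  "gb_per_hour": 3.0, "decode_wh": 4.0}
VVC = {"encode_wh": 1000, "gb_per_hour": 1.5, "decode_wh": 5.0}

for n in (10, 1000):
    print(n, total_wh(AVC, n), total_wh(VVC, n))
```

Under these assumptions, the legacy codec wins for a niche title (10 viewers: 200 Wh vs. 1080 Wh) while the newer codec wins for a popular one (1000 viewers: 10100 Wh vs. 9000 Wh), which is why the analysis must be end-to-end.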

Optimizing the video streaming parameters: There is substantial potential for video service providers to reduce overall energy consumption by optimizing the streaming parameters, including choosing a more power-efficient codec implementation, resolution, frame rate, and bitrate, among others.

GAIA: Intelligent Climate-Friendly Video Platform 

Recently, we started the “GAIA” project to research the aspects mentioned above. In particular, the GAIA project researches and develops a climate-friendly adaptive video streaming platform that provides (i) complete energy awareness and accountability, including energy consumption and GHG emissions along the entire delivery chain, from content creation and server-side encoding to video transmission and client-side rendering; and (ii) reduced energy consumption and GHG emissions through advanced analytics and optimizations in all phases of the video delivery chain.

Figure 2. GAIA high-level approach for the intelligent climate-friendly video platform.

As shown in Figure 2, the research considered in GAIA comprises benchmarking, energy-aware and machine learning-based modeling, optimization algorithms, monitoring, and auto-tuning.

  • Energy-aware benchmarking involves a functional requirement analysis of the leading project objectives and measurement of the energy consumed by video transcoding tasks on various heterogeneous cloud and edge resources, by video delivery, and by video decoding on end-user devices.
  • Energy-aware modelling and prediction uses the benchmarking results and the data collected from real deployments to build regression and machine learning models. The models predict the energy consumed by heterogeneous cloud and edge resources, possibly distributed across various clouds and delivery networks. We further provide energy models for video distribution over different channels and consider the relation between bitrate, codec, and video quality.
  • Energy-aware optimization and scheduling researches and develops appropriate generic algorithms according to the requirements for real-time delivery (including encoding and transmission) of video processing tasks (i.e., transcoding) deployed on heterogeneous cloud and edge infrastructures. 
  • Energy-aware monitoring and auto-tuning perform dynamic real-time energy monitoring of the different video delivery chains for improved data collection, benchmarking, modelling and optimization. 
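
As a minimal sketch of the energy-aware modelling step above, the following fits a simple linear model (energy as a function of bitrate) from benchmark samples. The sample points are invented; a real deployment would use measurements from the benchmarking phase and richer features (codec, resolution, resource type).

```python
# Hedged sketch: fit energy ~ bitrate by ordinary least squares from
# (invented) benchmark samples, as a stand-in for the regression models
# built in the energy-aware modelling phase.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# bitrate (Mbit/s) vs. measured transcoding energy (Wh) -- invented samples
bitrates = [1, 2, 4, 8]
energies = [12, 14, 18, 26]
a, b = fit_line(bitrates, energies)
print(f"energy ~= {a:.1f} * bitrate + {b:.1f}")
```

Once fitted, such a model lets the scheduler predict the energy cost of a transcoding task before placing it on a cloud or edge resource.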

GMSys 2023: First International ACM Green Multimedia Systems Workshop

Finally, we would like to use this opportunity to highlight and promote the first International ACM Green Multimedia Systems Workshop (GMSys’23). GMSys’23 takes place in Vancouver, Canada, in June 2023, co-located with ACM Multimedia Systems 2023. We anticipate a series of at least three consecutive workshops, as this topic may critically impact the innovation and development of climate-effective approaches. The workshop focuses strongly on recent developments and challenges in energy reduction for multimedia systems, and on innovations, concepts, and energy-efficient solutions from video generation to processing, delivery, and consumption. Please see the Call for Papers for further details.

Final Remarks 

In both the GAIA project and the ACM GMSys workshop, various actions and initiatives put energy-efficiency topics for video streaming at the center stage of research and development. In this column, we highlighted the major video streaming components along with their challenges and opportunities for enabling energy-efficient, sustainable video streaming, sometimes also referred to as green video streaming. A thorough understanding of the key issues and meaningful insights are essential for successful research.

References

[1] IPCC, 2021: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, In press, doi:10.1017/9781009157896.
[2] M. Efoui-Hess, Climate Crisis: the unsustainable use of online video – The practical case for digital sobriety, Technical Report, The Shift Project, July, 2019.
[3] IEA (2020), The carbon footprint of streaming video: fact-checking the headlines, IEA, Paris https://www.iea.org/commentaries/the-carbon-footprint-of-streaming-video-fact-checking-the-headlines.
[4] Cisco Annual Internet Report (2018–2023) White Paper, 2018 (updated 2020), https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.
[5] C. Fletcher, et al., Carbon impact of video streaming, Technical Report, 2021, https://s22.q4cdn.com/959853165/files/doc_events/2021/Carbon-impact-of-video-streaming.pdf.
[6] Huawei Releases Top 10 Trends of Data Center Facility in 2025, 2020, https://www.huawei.com/en/news/2020/2/huawei-top10-trends-datacenter-facility-2025.
[7] COMMISSION REGULATION (EC) No 642/2009, Official Journal of the European Union, 2009, https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:191:0042:0052:EN:PDF#:~:text=COMMISSION%20REGULATION%20(EC)%20No%20642/2009%20of%2022%20July,regard%20to%20the%20Treaty%20establishing%20the%20European%20Community.
[8] U. Hölzle, Data centers are more energy efficient than ever, Technical Report, 2020, https://blog.google/outreach-initiatives/sustainability/data-centers-energy-efficient/.
[9] Charles E. Leiserson, Neil C. Thompson, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, and Tao B. Schardl. 2020. There’s plenty of room at the Top: What will drive computer performance after Moore’s law? Science 368, 6495 (2020), eaam9744. DOI:https://doi.org/10.1126/science.aam9744
[10] M. G. Koziri, P. K. Papadopoulos, N. Tziritas, T. Loukopoulos, S. U. Khan and A. Y. Zomaya, “Efficient Cloud Provisioning for Video Transcoding: Review, Open Challenges and Future Opportunities,” in IEEE Internet Computing, vol. 22, no. 5, pp. 46-55, Sep./Oct. 2018, doi: 10.1109/MIC.2017.3301630.
[11] J. -F. Franche and S. Coulombe, “Fast H.264 to HEVC transcoder based on post-order traversal of quadtree structure,” 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 2015, pp. 477-481, doi: 10.1109/ICIP.2015.7350844.
[12] E. de la Torre, R. Rodriguez-Sanchez and J. L. Martínez, “Fast video transcoding from HEVC to VP9,” in IEEE Transactions on Consumer Electronics, vol. 61, no. 3, pp. 336-343, Aug. 2015, doi: 10.1109/TCE.2015.7298293.
[13] Federal Ministry for the Environment, Nature Conservation and Nuclear Safety, Video streaming: data transmission technology crucial for climate footprint, No. 144/20, 2020, https://www.bmuv.de/en/pressrelease/video-streaming-data-transmission-technology-crucial-for-climate-footprint/
[14] Malmodin, Jens, and Dag Lundén. 2018. “The Energy and Carbon Footprint of the Global ICT and E&M Sectors 2010–2015” Sustainability 10, no. 9: 3027. https://doi.org/10.3390/su10093027



Towards the design and evaluation of more sustainable multimedia experiences: which role can QoE research play?

In this column, we reflect on the environmental impact and broader sustainability implications of resource-demanding digital applications and services such as video streaming, VR/AR/XR and videoconferencing. We put emphasis not only on the experiences and use cases they enable but also on the “cost” of always striving for high Quality of Experience (QoE) and better user experiences. Starting by sketching the broader context, our aim is to raise awareness about the role that QoE research can play in the context of several of the United Nations’ Sustainable Development Goals (SDGs), either directly (e.g., SDG 13 “climate action”) or more indirectly (e.g., SDG 3 “good health and well-being” and SDG 12 “responsible consumption and production”).

The UN’s Sustainable Development Goals (figure taken from https://www.un.org/en/sustainable-development-goals)

The ambivalent role of digital technology

One of the latest reports from the Intergovernmental Panel on Climate Change (IPCC) confirmed the urgency of drastically reducing emissions of carbon dioxide and other human-induced greenhouse gas (GHG) emissions in the years to come (IPCC, 2021). This report, directly relevant in the context of SDG 13 “climate action”, confirmed the undeniable and negative human influence on global warming and the need for collective action. While the potential of digital technology (and ICT more broadly) for sustainable development has been on the agenda for some time, the context of the COVID-19 pandemic has made it possible to better understand a set of related opportunities and challenges.

First of all, it has been observed that long-lasting lockdowns and restrictions due to the COVID-19 pandemic and its aftermath triggered a drastic increase in internet traffic (see e.g., Feldmann et al., 2020). This holds particularly for the use of videoconferencing and video streaming services for various purposes (e.g., work meetings, conferences, remote education, and social gatherings, to name a few). At the same time, the associated drastic reduction of global air traffic and other types of traffic (e.g., road traffic), with their known environmental footprint, has had undeniable positive effects on the environment (e.g., reduced air pollution and better water quality; see e.g., Khan et al., 2020).

Despite this potential, the environmental gains enabled by digital technology and recent advances in energy efficiency are threatened by digital rebound effects due to increased energy consumption and energy demands related to ICT (Coroama et al., 2019; Lange et al., 2020). In the context of ever-increasing consumption, there has for instance been a growing focus in the literature on the negative environmental impact of unsustainable viewing practices such as binge-watching, multi-watching and media-multitasking, which have become more common in recent years (see e.g., Widdicks et al., 2019). While the overall emission factor will vary depending on the mix of energy generation technologies used and the region of the world (Preist et al., 2014), this observation also fits with other recent reports and articles, which expect the energy demands linked to digital infrastructure, digital services and their use to expand further, and the greenhouse gas emissions of ICT relative to the overall worldwide footprint to increase significantly (see e.g., Belkhir et al., 2018, Morley et al., 2018, Obringer et al., 2021).
Hence, these and other recent forecasts show a growing and potentially unsustainable carbon footprint of ICT in the medium-term future, due to, among other factors, the increasing energy demand of data centres (including, e.g., the energy needed for cooling) and the associated traffic (Preist et al., 2016).

Another set of challenges that became more apparent relates to the human mental resources and health involved, as well as environmental effects. Here there is a link to the abovementioned Sustainable Development Goals 3 (good health and well-being) and 12 (responsible consumption and production). For instance, the transition to “more sustainable” digital meetings, online conferences, and online education has also surfaced a range of challenges from a user point of view. “Zoom fatigue”, a prominent example, illustrates the need to strike the right balance between the more sustainable character of experiences enabled by technology and how these are actually experienced and perceived by users (Döring et al., 2022; Raake et al., 2022). Another example is binge-watching behavior, which can in certain cases have a positive effect on an individual’s well-being, but has also been shown to have negative effects, e.g., through feelings of guilt and goal conflicts (Granow et al., 2018) or through problematic involvement resulting in, e.g., chronic sleep issues (Flayelle et al., 2020).

From the “production” perspective, recent work has looked at the growing environmental impact of commonly used cloud-based services such as video streaming (see e.g., Chen et al., 2020, Suski et al., 2020, The Shift Project, 2021) and the underlying infrastructure of data centers, transport networks and end devices (Preist et al., 2014, Preist et al., 2016, Suski et al., 2020). As a result, the combination of technological advancements and user-centered approaches that always aim to improve the experience may have undesired environmental consequences: it stimulates ever-higher user expectations (e.g., higher video quality, increased connectivity and availability, almost zero latency) and triggers increased use and unsustainable use practices, resulting in potential rebound effects due to increased data traffic and electricity demand.

These observations have started to culminate in a plea for a shift towards a more sustainable and humanity-centered paradigm, one that considers to a much larger extent how digital consumption and increased data demand impact individuals, society and our planet (Widdicks et al., 2019, Preist et al., 2016, Hazas & Nathan, 2018). Here, it is obvious that experience, consumption behavior and energy consumption are tightly intertwined.

How does QoE research fit into this picture?

This leads to the question of where research on Quality of Experience and its underlying goals fits into this broader picture, to what extent related topics have gained attention so far, and how future research can potentially have an even larger impact.

As the COVID-19-related examples above already indicate, QoE research, through its focus on improving the experience for users in, e.g., various videoconferencing-based scenarios or immersive technology-related use cases, already plays and will continue to play a key role in enabling more sustainable practices in various domains (e.g., remote education, online conferences, and digital meetings, thus reducing unnecessary travel) and in linking up to various SDGs. A key challenge here is to enable experiences that become so natural and attractive that they may even become preferred in the future. While this is a huge and important topic, we refrain from discussing it further in this contribution, as it is already a key focus within the QoE community. Instead, in the following we first reflect on the extent to which environmental implications of multimedia services have explicitly been on the agenda of the QoE community in the past, what the focus of more recent work is, and what is not yet sufficiently addressed. Secondly, we consider a broader set of areas and concrete topics in which QoE research can be related to environmental and broader sustainability-related concerns.

Traditionally, QoE research has predominantly focused on gathering insights that can guide the optimization of technical parameters and the allocation of resources at different layers, while still ensuring a high QoE from a user point of view. A main underlying driver in this respect has traditionally been the related business perspective: optimizing QoE as a way to increase profitability and users’/customers’ willingness to pay for better quality (Wechsung, 2014). While better video compression techniques or adaptive video streaming may save resources, and thereby yield environmental gains, the latter has traditionally not been a main or explicit motivation.

There are, however, some exceptions in earlier work, where the focus was more explicitly on the link between energy consumption, energy efficiency and QoE. The study by Ickin et al. (2012), for instance, investigated QoE influence factors of mobile applications and revealed the key role of the battery in successful QoE provisioning. It has also been observed that energy modelling and saving efforts are typically geared towards the immediate benefits of end users, while less attention is paid to the digital infrastructure (Popescu, 2018). Efforts were further made to describe, analyze and model the trade-off between QoE and energy consumption (QoE perceived per user per Joule, QoEJ) (Popescu, 2018) or power consumption (QoE perceived per user per Watt, QoEW) (Zhang et al., 2013), as well as to optimize resource consumption so as to avoid sources of annoyance (see e.g., Fiedler et al., 2016). While these early efforts did not yet result in a generic end-to-end QoE-energy model that can serve as a basis for optimizations, they provide a useful foundation to build upon.
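
The QoE-per-Joule idea mentioned above can be sketched with a trivial ratio. The QoE scores and energy figures below are invented for illustration only:

```python
# Minimal sketch of the QoE-per-Joule (QoEJ) metric: perceived quality
# delivered per unit of energy spent. All numbers are invented.

def qoe_per_joule(qoe_score: float, energy_joules: float) -> float:
    """QoEJ: QoE score divided by the energy spent to deliver it."""
    return qoe_score / energy_joules

# Two hypothetical configurations delivering one hour of video:
print(qoe_per_joule(4.5, 90_000))  # high quality, 25 Wh -> 5e-05
print(qoe_per_joule(4.0, 54_000))  # lower quality, 15 Wh
```

Under these invented figures, the lower-quality configuration delivers more QoE per Joule, which is exactly the kind of trade-off such metrics are designed to expose.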

A more recent example (Hossfeld et al., 2022), in the context of video streaming services, looked into possible trade-offs between varying levels of QoE and the resulting energy consumption, which is further mapped to CO₂ emissions (taking the EU emission parameter as input, since this, as mentioned, depends on the overall mix of green and non-renewable energy sources). Their visualization model further considers parameters such as the type of device and type of network; while it is a simplification of the multitude of possible scenarios and factors, it illustrates that it is possible to identify areas where energy consumption can be reduced while ensuring an acceptable QoE.
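
The mapping from energy to CO₂ used in such trade-off analyses is a simple multiplication by an emission factor. The factor below is an assumed illustrative value, not the actual EU parameter; the real figure depends on the regional mix of green and non-renewable sources.

```python
# Minimal sketch of mapping streaming energy to CO2 emissions.
# The emission factor is an assumed illustrative value only.

ASSUMED_EMISSION_FACTOR_KG_PER_KWH = 0.25  # illustrative, not the real EU value

def co2_kg(energy_kwh: float,
           factor: float = ASSUMED_EMISSION_FACTOR_KG_PER_KWH) -> float:
    """CO2 emissions (kg) for a given energy consumption (kWh)."""
    return energy_kwh * factor

# 10 hours of streaming at an assumed 0.15 kWh/h end-to-end = 1.5 kWh:
print(co2_kg(1.5))  # -> 0.375
```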

Another recent work (Herglotz et al., 2022) jointly analyzed end-user power efficiency and QoE for video streaming, based on real-world data (i.e., YouTube streaming events). More specifically, power consumption was modelled and user-perceived QoE estimated in order to determine where optimization is possible. They found that it is, and pointed to the importance of the choice of video codec, video resolution, frame rate and bitrate in this respect.

These examples point to the potential to optimize at the “production” side; however, the focus has more recently also been extended to actual use, user expectations and the “consumption” side (Jiang et al., 2021, Lange et al., 2020, Suski et al., 2020, Elgaaied-Gambier et al., 2020). Various topics are explored in this respect, e.g., digital carbon footprint calculation at the individual level (Schien et al., 2013, Preist et al., 2014), consumer awareness and pro-environmental digital habits (Elgaaied-Gambier et al., 2020; Gnanasekaran et al., 2021), and the impact of user behavior (Suski et al., 2020). While we cannot discuss all of these in detail here, they are all based on the observation that there is a growing need to involve consumers and users in the collective challenge of reducing the impact of digital applications and services on the environment (Elgaaied-Gambier et al., 2020; Preist et al., 2016).

QoE research can play an important role here, extending the understanding of carbon footprint vs. QoE trade-offs to making users more aware of the actual “cost” of high QoE. A recent interview study with digital natives conducted by some of the co-authors of this column (Gnanasekaran et al., 2021) illustrated that many users are not aware of the environmental impact of their behavior and expectations, and that even with such insights, drastic changes in behavior cannot be expected. The lack of technological understanding, public information and social awareness about the topic were identified as important factors. It is therefore of utmost importance to trigger more awareness and help users see and understand the carbon footprint of, e.g., their use of video streaming services (Gnanasekaran et al., 2021). This perspective is currently missing in the field of QoE, and we argue that QoE research could, in collaboration with other disciplines and by integrating insights from other fields, play an important role here.

In terms of the motivation for adopting pro-environmental digital habits, Gnanasekaran et al. (2021) found that several factors contribute indirectly to this goal, including the striving for personal well-being. The results also indicate some willingness to change and make compromises (e.g., accepting a lower video quality), albeit not unconditionally: the alignment with other goals (e.g., personal well-being) and the nature of the perceived sacrifice and its impact play a key role. A key challenge for future work is therefore to identify and understand concrete mechanisms that could trigger more awareness among users about the environmental and well-being impact of their use of digital applications and services, and that can further motivate positive behavioral change (e.g., opting for use practices that limit one’s digital carbon footprint, or mindful digital consumption). By investigating the impact on QoE of more environmentally friendly viewing practices (e.g., actively promoting standard-definition video instead of HD, nudging users to switch to audio-only when a service like YouTube is used as background noise, or stimulating users to switch to the least data-demanding viewing configuration depending on the context and purpose), QoE research could help bridge the gap towards actual behavioral change.

Final reflections and challenges for future research

We have argued that research on users’ Quality of Experience and overall User Experience can be highly relevant to gaining insights that may drive the adoption of new, more sustainable usage patterns and trigger more awareness of the implications of user expectations, preferences and actual use of digital services. However, the focus on continuously improving users’ Quality of Experience may also trigger unwanted rebound effects, leading to an overall higher environmental footprint due to increased use of digital applications and services. Further, it may negatively impact users’ long-term well-being as well.

We, therefore, need to join efforts with other communities to challenge the current design paradigm from a more critical stance, partly as “it’s difficult to see the ecological impact of IT when its benefits are so blindingly bright” (Borning et al., 2020). Richer and better experiences may lead to increased, unnecessary or even excessive consumption, further increasing individuals’ environmental impact and potentially impeding long-term well-being. Open questions are, therefore: Which fields and disciplines should join forces to mitigate the above risks? And how can QoE research — directly or indirectly — contribute to the triggering of sustainable consumption patterns and the fostering of well-being?

Further, a key question is how energy efficiency can be improved for digital services such as video streaming, videoconferencing, online gaming, etc., while still ensuring an acceptable QoE. This also points to the question of which compromises can be made in trading QoE against its environmental impact (from “willingness to pay” to “willingness to sacrifice”), under which circumstances and how these compromises can be meaningfully and realistically assessed. In this respect, future work should extend the current modelling efforts to link QoE and carbon footprint, go beyond exploring what users are willing to (more passively) endure, and also investigate how users can be more actively motivated to adjust and lower their expectations and even change their behavior.

These and related topics will be on the agenda of the Dagstuhl seminar 23042 “Quality of Sustainable Experience” and the conference QoMEX 2023 “Towards sustainable and inclusive multimedia experiences”.

Conference QoMEX 2023 “Towards sustainable and inclusive multimedia experiences”

References

Belkhir, L., Elmeligi, A. (2018). “Assessing ICT global emissions footprint: Trends to 2040 & recommendations,” Journal of cleaner production, vol. 177, pp. 448–463.

Borning, A., Friedman, B., Logler, N. (2020). The ’invisible’ materiality of information technology. Communications of the ACM, 63(6), 57–64.

Chen, X., Tan, T., et al. (2020). Context-Aware and Energy-Aware Video Streaming on Smartphones. IEEE Transactions on Mobile Computing.

Coroama, V.C., Mattern, F. (2019). Digital rebound–why digitalization will not redeem us our environmental sins. In: Proceedings 6th international conference on ICT for sustainability. Lappeenranta. http://ceur-ws.org. vol. 238

Döring, N., De Moor, K., Fiedler, M., Schoenenberg, K., Raake, A. (2022). Videoconference Fatigue: A Conceptual Analysis. Int. J. Environ. Res. Public Health, 19(4), 2061 https://doi.org/10.3390/ijerph19042061

Elgaaied-Gambier, L., Bertrandias, L., Bernard, Y. (2020). Cutting the internet’s environmental footprint: An analysis of consumers’ self-attribution of responsibility. Journal of Interactive Marketing, 50, 120–135.

Feldmann, A., Gasser, O., Lichtblau, F., Pujol, E., Poese, I., Dietzel, C., Wagner, D., Wichtlhuber, M., Tapiador, J., Vallina-Rodriguez, N., Hohlfeld, O., Smaragdakis, G. (2020, October). The lockdown effect: Implications of the COVID-19 pandemic on internet traffic. In Proceedings of the ACM Internet Measurement Conference (pp. 1-18).

Fiedler, M., Popescu, A., Yao, Y. (2016), “QoE-aware sustainable throughput for energy-efficient video streaming,” in 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom)(BDCloud-SocialCom-SustainCom). pp. 493–50

Flayelle, M., Maurage, P., Di Lorenzo, K.R., Vögele, C., Gainsbury, S.M., Billieux, J. (2020). Binge-Watching: What Do we Know So Far? A First Systematic Review of the Evidence. Curr Addict Rep 7, 44–60. https://doi.org/10.1007/s40429-020-00299-8

Gnanasekaran, V., Fridtun, H. T., Hatlen, H., Langøy, M. M., Syrstad, A., Subramanian, S., & De Moor, K. (2021). Digital carbon footprint awareness among digital natives: an exploratory study. In Norsk IKT-konferanse for forskning og utdanning (No. 1, pp. 99-112).

Granow, V.C., Reinecke, L., Ziegele, M. (2018): Binge-watching and psychological well-being: media use between lack of control and perceived autonomy. Communication Research Reports 35 (5), 392–401.

Hazas, M. and Nathan, L. (Eds.)(2018). Digital Technology and Sustainability. London: Routledge.

Herglotz, C., Springer, D., Reichenbach,  M., Stabernack B. and Kaup, A. (2018). “Modeling the Energy Consumption of the HEVC Decoding Process,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 217-229, Jan. 2018, doi: 10.1109/TCSVT.2016.2598705.

Hossfeld, T., Varela, M., Skorin-Kapov, L. Heegaard, P.E. (2022). What is the trade-off between CO2 emission and videoconferencing QoE. ACM SIGMM records, https://records.sigmm.org/2022/03/31/what-is-the-trade-off-between-co2-emission-and-video-conferencing-qoe/

Ickin, S., Wac, K., Fiedler, M. and Janowski, L. (2012). “Factors influencing quality of experience of commonly used mobile applications,” IEEE Communications Magazine, vol. 50, no. 4, pp. 48–56.

IPCC, 2021: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, In press, doi:10.1017/9781009157896.

Jiang, P., Van Fan, Y., Klemes, J.J. (2021). Impacts of covid-19 on energy demand and consumption: Challenges, lessons and emerging opportunities. Applied energy, 285, 116441.

Khan, D., Shah, D. and Shah, S.S. (2020). “COVID-19 pandemic and its positive impacts on environment: an updated review,” International Journal of Environmental Science and Technology, pp. 1–10, 2020.

Lange, S., Pohl, J., Santarius, T. (2020). Digitalization and energy consumption. Does ICT reduce energy demand? Ecological Economics, 176, 106760.

Morley, J., Widdicks, K., Hazas, M. (2018). Digitalisation, energy and data demand: The impact of Internet traffic on overall and peak electricity consumption. Energy Research & Social Science, 38, 128–137.

Obringer, R., Rachunok, B., Maia-Silva, D., Arbabzadeh, M., Roshanak, N., Madani, K. (2021). The overlooked environmental footprint of increasing internet use. Resources, Conservation and Recycling, 167, 105389.

Popescu, A. (Ed.)(2018). Greening Video Distribution Networks, Springer.

Preist, C., Schien, D., Blevis, E. (2016). “Understanding and mitigating the effects of device and cloud service design decisions on the environmental footprint of digital infrastructure,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 1324–1337.

Preist, C., Schien, D., Shabajee, P. , Wood, S. and Hodgson, C. (2014). “Analyzing End-to-End Energy Consumption for Digital Services,” Computer, vol. 47, no. 5, pp. 92–95.

Raake, A., Fiedler, M., Schoenenberg, K., De Moor, K., Döring, N. (2022). Technological Factors Influencing Videoconferencing and Zoom Fatigue. arXiv:2202.01740, https://doi.org/10.48550/arXiv.2202.01740

Schien, D., Shabajee, P., Yearworth, M. and Preist, C. (2013), Modeling and Assessing Variability in Energy Consumption During the Use Stage of Online Multimedia Services. Journal of Industrial Ecology, 17: 800-813. https://doi.org/10.1111/jiec.12065

Suski, P., Pohl, J., Frick, V. (2020). All you can stream: Investigating the role of user behavior for greenhouse gas intensity of video streaming. In: Proceedings of the 7th International Conference on ICT for Sustainability. p. 128–138. ICT4S2020, Association for Computing Machinery, New York, NY, USA.

The Shift Project, Climate crisis: the unsustainable use of online video: Our new report on the environmental impact of ICT. https://theshiftproject.org/en/article/unsustainable-use-online-video/

Wechsung, I., De Moor, K. (2014). Quality of Experience Versus User Experience. In: Möller, S., Raake, A. (eds) Quality of Experience. T-Labs Series in Telecommunication Services. Springer, Cham. https://doi.org/10.1007/978-3-319-02681-7_3

Widdicks, K., Hazas, M., Bates, O., Friday, A. (2019). “Streaming, Multi-Screens and YouTube: The New (Unsustainable) Ways of Watching in the Home,” in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ser. CHI ’19. New York, NY, USA: Association for Computing Machinery, p. 1–13.

Zhang, X., Zhang, J., Huang, Y., Wang, W. (2013). “On the study of fundamental trade-offs between QoE and energy efficiency in wireless networks,” Transactions on Emerging Telecommunications Technologies, vol. 24, no. 3, pp. 259–265.

What is the trade-off between CO2 emission and video-conferencing QoE?

Users of multimedia services naturally want the highest possible Quality of Experience (QoE) when using those services. This is especially so in contexts such as video-conferencing and video streaming, which are nowadays a large part of many users’ daily lives, be it work-related Zoom calls or relaxing in front of Netflix. This has implications for the energy consumed in providing those services (think of the cloud services involved, the networks, and the users’ own devices), and therefore also for the resulting CO₂ emissions. In this column, we look at the potential trade-offs between varying levels of QoE (which, for video services, is strongly correlated with the bit rates used) and the resulting CO₂ emissions. We also look at other factors that should be taken into account when making decisions based on these calculations, in order to provide a more holistic view of the environmental impact of these types of services and of whether they have a significant impact.

Energy Consumption and CO2 Emissions for Internet Service Delivery

Understanding the footprint of Internet service delivery is a challenging task. On the one hand, the infrastructure and software components involved in the service delivery need to be known. A very fine-grained model requires knowledge of all components along the entire service delivery chain: end-user devices, fixed or mobile access network, core network, data center, and Internet service infrastructure. Furthermore, the footprint may need to consider the CO₂ emissions from producing and manufacturing the hardware components as well as the CO₂ emissions during runtime; a life cycle assessment is then necessary to obtain CO₂ emissions per year for hardware production. However, one may argue that the infrastructure is already in place, so the focus here is on the energy consumption and CO₂ emissions during runtime and delivery of the services. This is also the approach we follow to provide quantitative numbers for the energy consumption and CO₂ emissions of Internet-based video services. On the other hand, beyond the complexity of understanding and modelling the contributors to energy consumption and CO₂ emissions, quantitative numbers are needed.

To overcome this complexity, the literature typically considers key figures on the overall data traffic and service consumption times, aggregated over users and services for a longer period of time, e.g., one year. In addition, the total energy consumption of mobile operators and data centres is considered. Together with information on, e.g., the number of base station sites, this gives estimates such as the average power consumption per site or the average data traffic per base station site [Feh11]. As a result, we obtain measures such as energy per bit (Joule/bit), determining the energy efficiency of a network segment. In [Yan19], the annual energy consumption of Akamai is converted to power consumption and then divided by the maximum network traffic, which again results in the energy consumption per bit of Akamai’s data centers. Knowing the share of energy sources (nonrenewable energy, including coal, natural gas, oil, diesel, and petroleum; renewable energy, including solar, geothermal, wind, biomass, and hydropower) allows relating the energy consumption to the total CO₂ emissions. For example, the contribution from renewables exceeded 40% in Germany and Finland in 2021, while Norway has about 60% and Croatia about 36% (statistics from 2020).

A detailed model of the total energy consumption of mobile network services and applications is provided in [Yan19]. The model structure considers important factors from each network segment, from cloud to core network, mobile network, and end-user devices. Furthermore, service-specific energy consumption figures are provided. The authors found strong differences between service types and the resulting data traffic patterns; the key factors, however, are the amount of data traffic and the duration of the services. They also consider different end-to-end network topologies (user-to-data center, user-to-user via data center, direct user-to-user, and P2P communication). Their model expresses the total energy consumption as the sum of the energy consumption of the different segments:

  • Smartphone: service-specific energy depends, among other factors, on the CPU usage and the network usage (e.g., 4G) over the duration of use,
  • Base station and access network: data traffic and signalling traffic over the duration of use,
  • Wireline core network: service-specific energy consumption of a mobile service, taking into account the data traffic volume and the energy per bit,
  • Data center: the energy per bit of the data center multiplied by the data traffic volume of the mobile service.

The Shift Project [TSP19] provides a similar model which is called the “1 Byte Model”. The computation of energy consumption is transparently provided in calculation sheets and discussed by the scientific community. As a result of the discussions [Kam20a,Kam20b], an updated model was released [TSP20] clarifying a simple bit/byte conversion issue. The suggested models in [TSP20, Kam20b] finally lead to comparable numbers in terms of energy consumption and CO₂ emission. As a side remark: Transparency and reproducibility are key for developing such complex models!

The basic idea of the 1 Byte Model for computing energy consumption is to take into account the time t of Internet service usage and the overall data volume v. The time of use is directly related to the energy consumption of the display of an end-user device, but also to the allocation of network resources. The data volume to transmit through the network, and to generate or process for cloud services, additionally drives the energy consumption. The model does not differentiate between Internet services, but different services will result in different traffic volumes over the time of use. Then, for each segment i (device, network, cloud), a linear model E_i(t,v) = a_i * t + b_i * v + c_i quantifies the energy consumption; the coefficients for each segment are provided by [TSP20]. The overall energy consumption is then E_total = E_device + E_network + E_cloud.

CO₂ emission is then again a linear model of the total energy consumption (over the time of use of a service), which depends on the share of nonrenewable and renewable energies. Again, The Shift Project derives such coefficients for different countries and we finally obtain CO2 = k_country * E_total.
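Putting the two linear models together, a minimal sketch of the computation might look as follows. The segment coefficients and country factors below are illustrative placeholders, not the actual values published in [TSP20]:

```python
# Sketch of the "1 Byte Model": E_i(t, v) = a_i*t + b_i*v + c_i per segment,
# then CO2 = k_country * E_total. All coefficients are ILLUSTRATIVE
# placeholders, not the values from [TSP20].

SEGMENTS = {            # (a_i in kWh/h, b_i in kWh/GB, c_i in kWh) -- assumed units
    "device":  (0.02, 0.00, 0.0),
    "network": (0.01, 0.03, 0.0),
    "cloud":   (0.00, 0.02, 0.0),
}

K_COUNTRY = {"EU": 0.28, "US": 0.49, "CN": 0.68}  # kg CO2 per kWh, illustrative

def energy_kwh(t_hours: float, v_gb: float) -> float:
    """Total energy E_total = E_device + E_network + E_cloud."""
    return sum(a * t_hours + b * v_gb + c for a, b, c in SEGMENTS.values())

def co2_kg(t_hours: float, v_gb: float, country: str = "EU") -> float:
    """CO2 emission as a linear function of total energy consumption."""
    return K_COUNTRY[country] * energy_kwh(t_hours, v_gb)

# One participant streaming 6 h at 1.5 Mbps:
# v = 1.5 Mbit/s * 6 * 3600 s = 32,400 Mbit ~ 4.05 GB (decimal units)
print(energy_kwh(6.0, 4.05), co2_kg(6.0, 4.05, "EU"))
```

With the published coefficients substituted in, the same few lines reproduce the per-country numbers discussed below; only the constants change, not the structure.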

The Trade-off between QoE and CO2 Emissions

As a use case, we consider hosting a scientific conference online through video-conferencing services. Assume there are 200 conference participants attending the video-conferencing session. The conference lasts for one week, with 6 hours of online program per day.  The video conference software requires the following data rates for streaming the sessions (video including audio and screen sharing):

  • high-quality video: 1.0 Mbps
  • 720p HD video: 1.5 Mbps
  • 1080p HD video: 3 Mbps

However, group video calls require even higher bandwidth. To make such experiences more immersive, even higher bit rates may be necessary, for instance, when VR systems are used for attendance.

A simple QoE model may map the video bit rate of the current video session to a mean opinion score (MOS). [Lop18] provides a logarithmic regression for the MOS depending on the video bit rate x in Mbps: MOS(x) = m_1 log x + m_2
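The exact coefficients from [Lop18] are not reproduced here; as an illustration, one can back out m_1 and m_2 from the two operating points discussed in this column (a MOS of 4 at about 4.5 Mbps and a MOS of 4.75 at about 11 Mbps):

```python
import math

# Fit MOS(x) = m1*ln(x) + m2 from two operating points mentioned in this
# column: MOS 4.0 at 4.5 Mbps and MOS 4.75 at 11 Mbps. These fitted
# coefficients are illustrative; [Lop18] reports its own values.
x1, y1 = 4.5, 4.00
x2, y2 = 11.0, 4.75

m1 = (y2 - y1) / (math.log(x2) - math.log(x1))
m2 = y1 - m1 * math.log(x1)

def mos(x_mbps: float) -> float:
    """Predicted mean opinion score, clipped to the 1..5 scale."""
    return max(1.0, min(5.0, m1 * math.log(x_mbps) + m2))

print(mos(4.5), mos(11.0), mos(1.5))
```

The saturating shape of this curve is what drives the trade-off discussed next: the last fraction of a MOS point costs disproportionately many megabits per second.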

Then, we can connect the QoE model with the energy consumption and CO₂ emissions model from above in the following way. We assume a user attending the conference for time t. With a video bit rate x, the emerging data traffic is v = x*t. Those input parameters are now used in the 1 Byte Model for a particular device (laptop, smartphone), type of network (wired, wifi, mobile), and country (EU, US, China).

Figure 1 shows the trade-off between the MOS and energy consumption (left y-axis). The energy consumption is mapped to CO₂ emission by assuming the corresponding parameter for the EU and that all conference participants are connected with a laptop. It can be seen that there is a strong increase in energy consumption and CO₂ emission in order to reach the best possible QoE (the MOS ratings mean 5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad quality). A MOS of 4.75 is reached at a video bit rate of roughly 11 Mbps, whereas a MOS of 4 is already reached at 4.5 Mbps according to the logarithmic model. This logarithmic behaviour is a typical observation in QoE and is connected to the Weber-Fechner law, see [Rei10]. As a consequence, we may save significant energy and CO₂ by not providing the maximum QoE but “only” good quality (i.e., a MOS of 4).

Figure 1: Trade-off between MOS and energy consumption or CO2 emission.

Figure 2 therefore visualizes the gain from delivering the video at lower quality and lower video bit rates; the gain is computed relative to the effort required for a MOS of 5. To give a better feel for these CO₂ numbers, we express the gain in thousands of kilometers driven by car. Since the CO₂ emission depends on the share of renewable energies, we consider different countries with the parameters provided in [TSP20]. Ensuring each conference participant a MOS of 4 instead of 5 results in savings corresponding to driving approximately 40,000 kilometers by car, assuming the renewable energy share in the EU – this is the distance around the Earth! Assuming the energy share in China, the saving corresponds to more than 90,000 kilometers. Of course, you could also cover 90,000 kilometers on foot – which, at a speed of 5 km/h, would take about 2 years non-stop. Note that this large amount of CO₂ emission is calculated assuming a data rate of 15 Mbps over 5 days (6 hours per day), resulting in about 40.5 TB of data transferred to the 200 conference participants.

Figure 2: Relating the CO2 emission in different countries for achieving this MOS to the distance by travelling in a car (in thousands of kilometers).
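The 40.5 TB figure quoted above is straightforward to verify:

```python
# Back-of-the-envelope check of the conference data volume stated above:
# 15 Mbps per participant, 6 hours/day for 5 days, 200 participants.
bitrate_mbps = 15
duration_s = 6 * 3600 * 5        # seconds of streaming per participant
participants = 200

total_mbit = bitrate_mbps * duration_s * participants
total_tb = total_mbit / 8 / 1e6  # Mbit -> MByte -> TByte (decimal units)
print(total_tb)                  # prints 40.5
```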

Discussions

Raising awareness of the CO₂ emissions caused by Internet service consumption is crucial. Abstract CO₂ numbers may be difficult to grasp, but relating them to more familiar quantities helps individuals understand their impact. Of course, the numbers provided here only give an impression, since the models are very simple and do not take various facets into account. They nevertheless demonstrate the potential trade-off between the QoE of end-users and sustainability in terms of energy consumption and CO₂ emission. In fact, [Gna21] conducted qualitative interviews and found a lack of awareness of the environmental impact of digital applications and services, even among digital natives. An underlying issue is that many end-users do not understand how Internet service delivery works, which infrastructure components play a role along the end-to-end service delivery path, and so on. Hence, the environmental impact remains unclear to many users. Our aim is thus to contribute to overcoming this issue by raising awareness on the matter, starting with simplified models and visualizations.

[Gna21] also found that users indicate a certain willingness to make compromises between their digital habits and their environmental footprint. Given global climate change and increased environmental awareness among the general population, this willingness may be expected to increase further in the near future. Hence, it may be interesting for service providers to empower users to reduce their environmental footprint at the cost of lower (yet still satisfactory) quality. This would also reduce costs for operators and seems to be a win-win situation if properly implemented in Internet services and user interfaces.

Nevertheless, tremendous efforts are also currently being undertaken by Internet companies to become CO₂ neutral. For example, Netflix claims in [Netflix21] that they plan to achieve net-zero greenhouse gas emissions by the close of 2022. Similarly, economic, societal, and environmental sustainability is seen as a key driver for 6G research and development [Mat21]. However, the time horizon there is longer: a German provider, for instance, claims it will reach climate neutrality for in-house emissions by 2025 at the latest and net-zero from production to the customer by 2040 at the latest [DT21]. Given the urgency of the matter, end-users and all stakeholders along the service delivery chain can significantly contribute to speeding up the process of ultimately achieving net-zero greenhouse gas emissions.

References

  • [TSP19] The Shift Project, “Lean ICT: Towards digital sobriety,” directed by Hugues Ferreboeuf, Tech. Rep., 2019. Available online (last accessed: March 2022)
  • [Yan19] M. Yan, C. A. Chan, A. F. Gygax, J. Yan, L. Campbell, A. Nirmalathas, and C. Leckie, “Modeling the total energy consumption of mobile network services and applications,” Energies, vol. 12, no. 1, p. 184, 2019.
  • [TSP20] Maxime Efoui Hess and Jean-Noël Geist, “Did The Shift Project really overestimate the carbon footprint of online video? Our analysis of the IEA and Carbonbrief articles,” The Shift Project website, June 2020. Available online (last accessed: March 2022)
  • [Kam20a] George Kamiya, “Factcheck: What is the carbon footprint of streaming video on Netflix?,” CarbonBrief website, February 2020. Available online (last accessed: March 2022)
  • [Kam20b] George Kamiya, “The carbon footprint of streaming video: fact-checking the headlines,” IEA website, December 2020. Available online (last accessed: March 2022)
  • [Feh11] Fehske, A., Fettweis, G., Malmodin, J., & Biczok, G. (2011). The global footprint of mobile communications: The ecological and economic perspective. IEEE Communications Magazine, 49(8), 55-62.
  • [Lop18] J. P. López, D. Martín, D. Jiménez, and J. M. Menéndez, “Prediction and modeling for no-reference video quality assessment based on machine learning,” in 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), IEEE, 2018, pp. 56–63.
  • [Gna21] Gnanasekaran, V., Fridtun, H. T., Hatlen, H., Langøy, M. M., Syrstad, A., Subramanian, S., & De Moor, K. (2021, November). Digital carbon footprint awareness among digital natives: an exploratory study. In Norsk IKT-konferanse for forskning og utdanning (No. 1, pp. 99-112).
  • [Rei10] Reichl, P., Egger, S., Schatz, R., & D’Alconzo, A. (2010, May). The logarithmic nature of QoE and the role of the Weber-Fechner law in QoE assessment. In 2010 IEEE International Conference on Communications (pp. 1-5). IEEE.
  • [Netflix21] Netflix, “Environmental Social Governance 2020,” Sustainability Accounting Standards Board (SASB) Report, March 2021. Available online (last accessed: March 2022)
  • [Mat21] Matinmikko-Blue, M., Yrjölä, S., Ahokangas, P., Ojutkangas, K., & Rossi, E. (2021). 6G and the UN SDGs: Where is the Connection? Wireless Personal Communications, 121(2), 1339-1360.
  • [DT21] Hannah Schauff, “Deutsche Telekom tightens its climate targets,” January 2021. Available online (last accessed: March 2022)

Towards an updated understanding of immersive multimedia experiences

Bringing theories and measurement techniques up to date

Development of technology for immersive multimedia experiences

Immersive multimedia experiences, as the name suggests, are experiences built on media that can immerse users, through different forms of interaction, in an environment. Through various technologies and approaches, immersive media emulates a physical world by means of a digital or simulated world, with the goal of creating a sense of immersion. Users are involved in a technologically driven environment where they may actively join and participate in the experiences offered by the generated world [White Paper, 2020]. As hardware and technologies develop further, these immersive experiences improve, with an ever more advanced feeling of immersion. Immersive multimedia experiences thus go beyond merely viewing a screen and unlock greater potential. This column presents and discusses the need for an up-to-date understanding of immersive media quality. First, the development of the constructs of immersion and presence over time is outlined. Second, influencing factors of immersive media quality are introduced, and related standardisation activities are discussed. Finally, the column concludes by summarising why an updated understanding of immersive media quality is urgent.

Development of theories covering immersion and presence

One of the first definitions of presence was established by Slater and Usoh as early as 1993, who defined presence as a “sense of presence” in a virtual environment [Slater, 1993]. This is in line with other early definitions of presence and immersion. For example, Biocca defined immersion as a system property; such definitions focused on the ability of the system to provide technically accurate stimuli to users [Biocca, 1995]. As technology was only slowly becoming capable of generating stimuli that mimic the real world, this was naturally the main concern of early definitions. Quite early on, questionnaires to capture experienced immersion were introduced, such as the Igroup Presence Questionnaire (IPQ) [Schubert, 2001]. Early measurement methods likewise focused mainly on how well the representation of the real world was achieved and perceived. With maturing technology, the focus shifted towards emotions and cognitive phenomena beyond basic stimulus generation. For example, Baños and colleagues showed that experienced emotion and immersion are related to each other and also influence the sense of presence [Baños, 2004]. Newer definitions focus more on these cognitive aspects; e.g., Nilsson defines three factors that can lead to immersion: (i) technology, (ii) narratives, and (iii) challenges, where only the factor technology is non-cognitive [Nilsson, 2016]. In 2018, Slater defined the place illusion as the illusion of being in a place while knowing one is not really there. This centres on a cognitive construct, the removal of disbelief, but still attributes the creation of the illusion mainly to system factors rather than cognitive ones [Slater, 2018]. In recent years, more and more activities have been started to define how to measure immersive experiences as an overall construct.

Constructs of interest in relation to immersion and presence

This section discusses constructs and activities that are related to immersion and presence. In the beginning, subtypes of extended reality (XR) and the relation to user experience (UX) as well as quality of experience (QoE) are outlined. Afterwards, recent standardization activities related to immersive multimedia experiences are introduced and discussed.
Moreover, immersive multimedia experiences can be categorized along many different dimensions, but recently the most common distinction concerns interactivity: content can be made for multi-directional viewing, as in 360-degree videos, or presented through interactive extended reality. XR technologies can be divided into mixed reality (MR), augmented reality (AR), augmented virtuality (AV), virtual reality (VR), and everything in between [Milgram, 1995]. Across all these areas, immersive multimedia experiences have found a place on the market and are providing new solutions to challenges in research as well as in industry, with growing potential for adoption in further areas [Chuah, 2018].

When discussing immersive multimedia experiences, it is important to address user experience and the quality of immersive multimedia experiences, which can be defined following the definition of quality of experience itself [White Paper, 2012]: a measure of the delight or annoyance of a customer’s experiences with a service, where in this case the service is an immersive multimedia experience. The accompanying definitions of experience and application also carry over to immersive multimedia experiences: an experience is an individual’s stream of perception and interpretation of one or multiple events, and an application is software and/or hardware that enables usage and interaction by a user for a given purpose [White Paper, 2012].

As already mentioned, immersive media experiences have an impact in many different fields, but one where the impact of immersion and presence is particularly well investigated is gaming, along with the QoE models and optimizations that go with it. Of specific interest is the framework and standardization of subjective evaluation methods for gaming quality [ITU-T Rec. P.809, 2018]. This standard provides instructions on how to assess QoE for gaming services using two possible test paradigms, i.e., passive viewing tests and interactive tests. However, even though detailed information about environments, test set-ups, questionnaires, and game selection materials is available, it remains focused on the gaming field and on the concepts of flow and immersion in games themselves.

Together with gaming, another step in defining and standardizing the infrastructure of audiovisual services in telepresence, immersive environments, and virtual and extended reality has been taken with the definition of service scenarios of immersive live experience [ITU-T Rec. H.430.3, 2018], where live sports, entertainment, and telepresence scenarios are described. This standard describes several immersive live experience scenarios together with architectural frameworks for delivering such services, though it does not cover all possible use cases. When discussing immersive multimedia experiences, spatial audio, sometimes referred to as “immersive audio”, must also be mentioned as one of the key features of AR and VR experiences in particular [Agrawal, 2019]: in AR it can provide immersive experiences on its own, while in VR it enhances the visual information.
In order to correctly assess QoE or UX, one must be aware of all characteristics, such as user, system, content, and context, because their actual state may influence the immersive multimedia experience of the user. These characteristics are therefore defined as influencing factors (IFs), divided into Human IFs, System IFs, and Context IFs, and are standardized for virtual reality services [ITU-T Rec. G.1035, 2021]. A particularly relevant Human IF is simulator sickness, as it specifically occurs as a result of exposure to immersive XR environments. Simulator sickness, also known as cybersickness or VR/AR sickness, is visually induced motion sickness triggered by visual stimuli and caused by the sensory conflict between the vestibular and visual systems. To achieve the full potential of immersive multimedia experiences, this unwanted sensation must therefore be reduced. However, while frequent changes in immersive technology bring hardware improvements and better experiences, constant updating of requirement specifications, design, and development is needed alongside them to keep up with best practices.

Conclusion – Towards an updated understanding

Considering the development of theories, definitions, and influencing factors around the constructs of immersion and presence, one can see two different streams. First, most early theories focus strongly on the technical capabilities of systems. Second, cognitive aspects and non-technical influencing factors gain importance in newer work. In the 1990s, technology was of course not yet ready to provide a good simulation of the real world; therefore, most activities, including measurement techniques, focused on improving systems in that respect. In recent years, technology has developed rapidly, and basic simulation of a virtual environment is now possible even on mobile devices such as the Oculus Quest 2. Although the concepts of immersion and presence still apply, the definitions dealing with them need to capture today’s technology as well. Systems have meanwhile proven able to provide good real-world simulations and give users a feeling of presence and immersion. While standardization activity is already quite strong and industry-driven, research in many disciplines, such as telecommunication, still mainly relies on old questionnaires. These questionnaires mostly focus on technological, real-world-simulation constructs and are thus no longer able to differentiate products and services to an optimal extent. There are newer attempts to create measurement tools for, e.g., social aspects of immersive systems [Li, 2019; Toet, 2021]. Measurement scales aiming to capture differences in the ability of systems to create realistic simulations can no longer reliably differentiate systems, precisely because most systems now provide realistic real-world simulations.
To enhance research and industrial development in the field of immersive media, we need definitions of constructs and measurement methods that are appropriate for current technology, even if the newer measurements and definitions are not yet widely cited or used. This will lead to improved development and, in the future, better immersive media experiences.

One step towards understanding immersive multimedia experiences is reflected by QoMEX 2022. The 14th International Conference on Quality of Multimedia Experience will be held from September 5th to 7th, 2022 in Lippstadt, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, Quality of Experience (QoE), and User Experience (UX). It will contribute to excellence in developing multimedia technology towards user well-being and foster the exchange between multidisciplinary communities. One core topic is immersive experiences and technologies as well as new assessment and evaluation methods, and both topics contribute to bringing theories and measurement techniques up to date. For more details, please visit https://qomex2022.itec.aau.at.

References

[Agrawal, 2019] Agrawal, S., Simon, A., Bech, S., Bærentsen, K., Forchhammer, S. (2019). “Defining Immersion: Literature Review and Implications for Research on Immersive Audiovisual Experiences.” In Audio Engineering Society Convention 147. Audio Engineering Society.
[Biocca, 1995] Biocca, F., & Delaney, B. (1995). Immersive virtual reality technology. In F. Biocca & M. R. Levy (Eds.), Communication in the age of virtual reality (pp. 57-124). Lawrence Erlbaum Associates.
[Baños, 2004] Baños, R. M., Botella, C., Alcañiz, M., Liaño, V., Guerrero, B., & Rey, B. (2004). Immersion and emotion: their impact on the sense of presence. Cyberpsychology & behavior, 7(6), 734-741.
[Chuah, 2018] Chuah, S. H. W. (2018). Why and who will adopt extended reality technology? Literature review, synthesis, and future research agenda. Literature Review, Synthesis, and Future Research Agenda (December 13, 2018).
[ITU-T Rec. G.1035, 2021] ITU-T Recommendation G.1035 (2021). Influencing factors on quality of experience for virtual reality services, Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. H.430.3, 2018] ITU-T Recommendation H.430.3 (2018). Service scenario of immersive live experience (ILE), Int. Telecomm. Union, CH-Geneva.
[ITU-T Rec. P.809, 2018] ITU-T Recommendation P.809 (2018). Subjective evaluation methods for gaming quality, Int. Telecomm. Union, CH-Geneva.
[Li, 2019] Li, J., Kong, Y., Röggla, T., De Simone, F., Ananthanarayan, S., De Ridder, H., … & Cesar, P. (2019, May). Measuring and understanding photo sharing experiences in social Virtual Reality. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1-14).
[Milgram, 1995] Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1995, December). Augmented reality: A class of displays on the reality-virtuality continuum. In Telemanipulator and telepresence technologies (Vol. 2351, pp. 282-292). International Society for Optics and Photonics.
[Nilsson, 2016] Nilsson, N. C., Nordahl, R., & Serafin, S. (2016). Immersion revisited: a review of existing definitions of immersion and their relation to different theories of presence. Human Technology, 12(2).
[Schubert, 2001] Schubert, T., Friedmann, F., & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 10(3), 266-281.
[Slater, 1993] Slater, M., & Usoh, M. (1993). Representations systems, perceptual position, and presence in immersive virtual environments. Presence: Teleoperators & Virtual Environments, 2(3), 221-233.
[Slater, 2018] Slater, M. (2018). Immersion and the illusion of presence in virtual reality. British Journal of Psychology, 109(3), 431-433.
[Toet, 2021] Toet, A., Mioch, T., Gunkel, S. N., Niamut, O., & van Erp, J. B. (2021). Holistic Framework for Quality Assessment of Mediated Social Communication.
[White Paper, 2012] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.
[White Paper, 2020] Perkis, A., Timmerer, C., Baraković, S., Husić, J. B., Bech, S., Bosse, S., … & Zadtootaghaj, S. (2020). QUALINET white paper on definitions of immersive media experience (IMEx). arXiv preprint arXiv:2007.07032.