Explainable Artificial Intelligence for Quality of Experience Modelling

Michael Seufert and Nikolas Wehner (University of Würzburg, Germany)

Tobias Hoßfeld (University of Würzburg, Germany)
Christian Timmerer (Alpen-Adria-Universität (AAU) Klagenfurt and Bitmovin Inc., Austria)

Data-driven Quality of Experience (QoE) modelling using Machine Learning (ML) has emerged as a promising alternative to cumbersome and potentially biased manual QoE modelling. However, the reasoning of most ML models is not explainable due to their black-box characteristics, which prevents us from gaining insights into how the model actually relates QoE influence factors to QoE. Yet these fundamental relationships are highly relevant for QoE researchers as well as for service and network providers.

With the emerging field of eXplainable Artificial Intelligence (XAI) and its recent technological advances, these issues can now be resolved. As a consequence, XAI enables data-driven QoE modelling to obtain generalizable QoE models and provides us simultaneously with the model’s reasoning on which QoE factors are relevant and how they affect the QoE score. In this work, we showcase the feasibility of explainable data-driven QoE modelling for video streaming and web browsing, before we discuss the opportunities and challenges of deploying XAI for QoE modelling.


In order to enhance services and networks and prevent users from switching to competitors, researchers and service providers need a deep understanding of the factors that influence the Quality of Experience (QoE) [1]. However, developing an effective QoE model is a complex and costly endeavour. Typically, it requires dedicated and extensive studies, which can only cover a limited portion of the parameter space and may be influenced by the study design. These studies often generate a relatively small sample of QoE ratings from a comparatively small population, making them vulnerable to poor performance when applied to unseen data. Moreover, the process of collecting and processing data for QoE modelling is not only arduous and time-consuming, but it can also introduce biases and self-fulfilling prophecies, such as perceiving an exponential relationship when one is expected.

To overcome these challenges, data-driven QoE modelling utilizing machine learning (ML) has emerged as a promising alternative, especially in scenarios where there is a wealth of data available or where data streams can be continuously obtained. A notable example is the ITU-T standard P.1203 [2], which estimates video streaming QoE by combining manual modelling – accounting for 75% of the Mean Opinion Score (MOS) estimation – and ML-based Random Forest modelling – accounting for the remaining 25%. The inclusion of the ML component in P.1203 indicates its ability to enhance performance. However, the inner workings of P.1203’s Random Forest model, specifically how it calculates the output score, are not obvious. Also, the survey in [3] shows that ML-based QoE modelling in multimedia systems is already widely used, including Virtual Reality, 360-degree video, and gaming. However, the QoE models are based on shallow learning methods, e.g., Support Vector Machines (SVM), or on deep learning methods, which lack explainability. Thus, it is difficult to understand what QoE factors are relevant and how they affect the QoE score [13], resulting in a lack of trust in data-driven QoE models and impeding their widespread adoption by researchers and providers [14].

Fortunately, recent advancements in the field of eXplainable Artificial Intelligence (XAI) [6] have paved the way for interpretable ML-based QoE models, thereby fostering trust between stakeholders and the QoE model. These advancements encompass a diverse range of XAI techniques that can be applied to existing black-box models, as well as novel and sophisticated ML models designed with interpretability in mind. Considering the use case of modelling video streaming QoE from real subjective ratings, the work in [4] evaluates the feasibility of explainable, data-driven QoE modelling and discusses the deployment of XAI for QoE research.

The utilization of XAI for QoE modelling brings several benefits. Not only does it speed up the modelling process, but it also enables the identification of the most influential QoE factors and their fundamental relationships with the Mean Opinion Score (MOS). Furthermore, it helps eliminate biases and preferences from different research teams and datasets that could inadvertently influence the model. All that is required is a selective dataset with descriptive features and corresponding QoE ratings (labels), which covers the most important QoE influence factors and, in particular, also rare events, e.g., many stalling events in a session. How to generate such complete datasets remains an open research question and calls for data-centric AI [15]. By merging datasets from various studies, more robust and generalizable QoE models can theoretically be created, provided that these studies share common ground. Another benefit is that the models can be automatically refined over time as new QoE studies are conducted and additional data becomes available.

XAI: eXplainable Artificial Intelligence

A general overview of eXplainable Artificial Intelligence (XAI) can be found in [5], while a thorough survey and taxonomy of XAI methods is available in [6].

XAI methods can be categorized into two main types: local and global explainability techniques. Local explainability aims to provide explanations for individual stimuli in terms of QoE factors and QoE ratings. On the other hand, global explainability focuses on offering general reasoning for how a model derives the QoE rating from the underlying QoE factors. Furthermore, XAI methods can be classified into post-hoc explainers and interpretable models.

Post-hoc explainers [6] are commonly used to explain various black-box models, such as neural networks or ensemble techniques, after they have been trained. One widely utilized post-hoc explainer is SHAP [7], which originates from game theory. SHAP values quantify the contribution of each feature to the model’s prediction by considering all possible feature subsets and learning a model for each subset. Other post-hoc explainers include LIME and Anchors, although Anchors is limited to classification tasks.
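To make the subset enumeration behind Shapley values concrete, the following minimal sketch computes exact Shapley values for a toy QoE predictor. It is only an illustration under stated assumptions: the function `toy_qoe`, the feature names, and the baseline are hypothetical, and instead of learning a separate model for each subset, the sketch sets features outside the subset to a baseline value, which is a common simplification.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values of `predict` at `sample`, relative to `baseline`.

    Assumption: a feature that is "absent" from a subset is replaced by its
    baseline value instead of retraining a model without that feature.
    """
    names = list(sample)
    n = len(names)

    def value(subset):
        # Features in `subset` take the sample's values, all others the baseline's.
        mixed = {k: (sample[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for feat in names:
        others = [k for k in names if k != feat]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {feat}) - value(s))
        phi[feat] = total
    return phi


def toy_qoe(x):
    # Hypothetical black-box model with an interaction term (illustration only).
    return (4.5 - 0.2 * x["stalling_s"] - 0.1 * x["switches"]
            - 0.05 * x["stalling_s"] * x["switches"])


sample = {"stalling_s": 4.0, "switches": 2.0}
baseline = {"stalling_s": 0.0, "switches": 0.0}
phi = shapley_values(toy_qoe, sample, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, so the interaction effect is split between the two features. The number of subsets grows exponentially with the number of features, which is why practical tools rely on approximations.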

Interpretable models, by design, provide explanations for how the model arrives at its output. Well-known interpretable models include linear models and decision trees. Additionally, generalized additive models (GAM) are gaining recognition as interpretable models.

A GAM is a generalized linear model in which the model output is computed by summing up each of the arbitrarily transformed input features along with a bias [8]. The form of a GAM enables a direct interpretation of the model by analyzing the learned functions and the transformed inputs, which makes it possible to estimate the influence of each feature. Two state-of-the-art ML-based GAMs are the Explainable Boosting Machine (EBM) [9] and the Neural Additive Model (NAM) [8]. While EBM uses decision trees to learn the functions and gradient boosting to improve training, NAM utilizes arbitrary neural networks to learn the functions, resulting in a neural network architecture with one sub-network per feature. EBM extends GAM by also considering additional pairwise feature interaction terms while maintaining explainability.
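Written out, the additive structure described above takes the following form. The notation is an assumption chosen for illustration: g is the link function, β the bias, f_i the learned shape function of feature x_i, K the number of features, and f_{ij} the pairwise interaction terms that EBM adds.

```latex
% GAM: bias plus one learned shape function per feature
g\left(\mathbb{E}[y]\right) = \beta + \sum_{i=1}^{K} f_i(x_i)

% EBM: GAM extended by selected pairwise interaction terms
g\left(\mathbb{E}[y]\right) = \beta + \sum_{i=1}^{K} f_i(x_i) + \sum_{i < j} f_{ij}(x_i, x_j)
```

Because each f_i depends on a single feature, plotting f_i over the range of x_i shows directly how that feature shifts the predicted score.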

Exemplary XAI-based QoE Modelling using GAMs

We demonstrate the learned predictor functions for both EBM (red) and NAM (blue) on a video QoE dataset in Figure 1. All technical details about the dataset and the methodology can be found in [4]. We observe that both models provide smooth shape functions, which are easy to interpret. EBM and NAM differ only marginally and mostly in areas where the data density is low. Here, EBM outperforms NAM by overfitting on single data points using the feature interaction terms. We can see this, for example, for a high total stalling duration and a high number of quality switches, where at some point EBM stops the negative trend and strongly contrasts its previous trend to improve predictions for extreme outliers.

Figure 1: EBM and NAM for video QoE modelling

Using the smooth predictor functions, it is easy to apply curve fitting. In the bottom right plot of Figure 1, we fit the average bitrate predictor function of NAM, which was shifted by the average MOS of the dataset to obtain the original MOS scale on the y-axis, on an inverted x-axis using exponential (IQX), logarithmic (WQL), and linear functions (LIN). Note that this constitutes a univariate mapping of average bitrate to MOS, neglecting the other influencing factors. We observe that our predictor function follows the WQL hypothesis [10] (red) with a high R²=0.967. This is in line with the mechanics of P.1203, where the authors of [11] showed the same logarithmic behavior for the bitrate in mode 0.
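As an illustration of such curve fitting, the sketch below fits a logarithmic function of the form MOS = a·ln(x) + b by ordinary least squares and reports R². It is a simplified stand-in, not the procedure used in [4]: the data points are synthetic, the function name `fit_log` is hypothetical, and the axis inversion as well as the exponential and linear candidates are omitted.

```python
import math


def fit_log(x, y):
    """Least-squares fit of y = a * ln(x) + b; returns (a, b, r_squared)."""
    z = [math.log(v) for v in x]  # linearize: y is linear in ln(x)
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    a = szy / szz
    b = y_mean - a * z_mean
    ss_res = sum((yi - (a * zi + b)) ** 2 for zi, yi in zip(z, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic, noise-free points generated from a known logarithmic curve
# (hypothetical values, not taken from the dataset in [4]).
bitrate_kbps = [250, 500, 1000, 2000, 4000, 8000]
mos = [0.6 * math.log(v) - 1.5 for v in bitrate_kbps]

a, b, r2 = fit_log(bitrate_kbps, mos)
print(f"a={a:.3f}, b={b:.3f}, R^2={r2:.3f}")
```

With real predictor function values, the same R² computation can be repeated for each candidate function to compare how well the hypotheses describe the learned shape.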

Figure 2: EBM and NAM for web QoE modelling

As the presented XAI methods are universally applicable to any QoE dataset, Figure 2 shows a similar GAM-based QoE modelling for a web QoE dataset obtained from [12]. We can see that the loading behavior in terms of ByteIndex-Page Load Time (BI-PLT) and time to last byte (TTLB) has the strongest impact on web QoE. Moreover, we see that different URLs/webpages have a different effect on the MOS, which shows that web QoE is content dependent. In summary, GAMs yield valuable, easy-to-interpret functions, which explain fundamental relationships between QoE factors and MOS. Nevertheless, further XAI methods can be utilized, as detailed in [4,5,6].


In addition to expediting the modelling process and mitigating modelling biases, data-driven QoE modelling offers significant advantages in terms of improved accuracy and generalizability compared to manual QoE models. ML-based models are not constrained to specific classes of continuous functions typically used in manual modelling, allowing them to capture more complex relationships present in the data. However, a challenge with ML-based models is the risk of overfitting, where the model becomes overly sensitive to noise and fails to capture the underlying relationships. Overfitting can be avoided through techniques like model regularization or by collecting sufficiently large or complete datasets.

Successful implementation of data-driven QoE modelling relies on purposeful data collection. It is crucial to ensure that all (or at least the most important) QoE factors are included in the dataset, covering their full parameter range with an adequate number of samples. Controlled lab or crowdsourcing studies can define feature values easily, but budget constraints (time and cost) often limit data collection to a small set of selected feature values. Conversely, field studies can encompass a broader range of feature values observed in real-world scenarios, but they may only gather limited data samples for rare events, such as video sessions with numerous stalling events. To prevent data bias, it is essential to balance feature values, which may require purposefully generating rare events in the field. Additionally, thorough data cleaning is necessary. While it is possible to impute missing features resulting from measurement errors, doing so increases the risk of introducing bias. Hence, it is preferable to filter out missing or unusual feature values.
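The filtering step mentioned above could look like the following minimal sketch, which drops sessions with missing or out-of-range feature values instead of imputing them. The feature names and valid ranges are hypothetical and would have to be chosen per dataset.

```python
def clean_sessions(rows, valid_ranges):
    """Keep only rows where every listed feature is present and within range."""
    kept = []
    for row in rows:
        ok = True
        for name, (low, high) in valid_ranges.items():
            value = row.get(name)
            # Missing values and NaN both fail this check (NaN comparisons are False).
            if value is None or not (low <= value <= high):
                ok = False
                break
        if ok:
            kept.append(row)
    return kept


# Hypothetical sessions and plausibility ranges (illustration only).
sessions = [
    {"stalling_s": 0.0, "avg_bitrate_kbps": 3200.0, "mos": 4.4},
    {"stalling_s": None, "avg_bitrate_kbps": 1500.0, "mos": 3.1},   # missing value
    {"stalling_s": -5.0, "avg_bitrate_kbps": 800.0, "mos": 2.0},    # impossible value
    {"stalling_s": 12.5, "avg_bitrate_kbps": 450.0, "mos": 1.8},
]
valid_ranges = {
    "stalling_s": (0.0, 600.0),
    "avg_bitrate_kbps": (1.0, 100000.0),
    "mos": (1.0, 5.0),
}

cleaned = clean_sessions(sessions, valid_ranges)
print(len(cleaned), "of", len(sessions), "sessions kept")
```

Note that the ranges should reject only measurement errors. Rare but valid events, such as sessions with long stalling, must stay in the dataset.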

Moreover, adding new data and retraining an ML model is a natural and straightforward process in data-driven modelling, offering long-term advantages. Eventually, data-driven QoE models would be capable of handling concept drift, which refers to changes in the importance of influencing factors over time, such as altered user expectations. However, QoE studies are rarely conducted as temporal and population-based snapshots, limiting frequent model updates. Ideally, a pipeline could be established to provide a continuous stream of features and QoE ratings, enabling online learning and ensuring the QoE models remain up to date. Although challenging for research endeavors, service providers could incorporate such QoE feedback streams into their applications.

Comparing black-box and interpretable ML models, there is a slight trade-off between performance and explainability. However, as shown in [4], it should be negligible in the context of QoE modelling. In return, XAI makes it possible to fully understand the model decisions and to identify relevant QoE factors and their relationships to the QoE score. Nevertheless, it has to be considered that explaining models becomes inherently more difficult as the number of input features increases. Highly correlated features and interactions may further lead to misinterpretations when using XAI, since the influence of a feature may also depend on other features. To obtain reliable and trustworthy explainable models, it is therefore crucial to exclude highly correlated features.
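One simple way to exclude highly correlated features is a greedy filter based on the Pearson correlation coefficient, sketched below. The threshold of 0.9, the feature names, and the rule of keeping the feature that comes first are assumptions for illustration; in practice, domain knowledge should decide which of two correlated features is retained.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0  # constant column: treat as uncorrelated


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold."""
    kept = []
    for name in columns:  # insertion order decides which feature of a pair survives
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (illustration only).
features = {
    "total_stalling_s": [0, 2, 4, 6, 8],
    "num_stalls": [0, 1, 2, 3, 4],                      # perfectly correlated with the above
    "avg_bitrate_kbps": [4000, 500, 3000, 1000, 2000],
}

selected = drop_correlated(features)
print(selected)
```

A pairwise filter like this only catches linear dependencies between two features. Dependencies among three or more features would need additional checks.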

Finally, although we demonstrated XAI-based QoE modelling only for video streaming and web browsing, from a research perspective, it is important to understand that the whole process is easily applicable in other domains like speech or gaming. Apart from that, it can also be highly beneficial for providers of services and networks to use XAI when implementing continuous QoE monitoring. They could integrate visualizations of trends like those in Figure 1 or Figure 2 into dashboards, making it easy to obtain a deeper understanding of the QoE in their systems.


In conclusion, technological progress has made data-driven explainable QoE modelling ready for practical use. Researchers and service providers should therefore consider adopting XAI-based QoE modelling to gain a more comprehensive understanding of the factors influencing QoE and their connection to users’ subjective experiences. By doing so, they can enhance services and networks in terms of QoE, effectively preventing user churn and minimizing revenue losses.


[1] K. Brunnström, S. A. Beker, K. De Moor, A. Dooms, S. Egger, M.-N. Garcia, T. Hossfeld, S. Jumisko-Pyykkö, C. Keimel, M.-C. Larabi et al., “Qualinet White Paper on Definitions of Quality of Experience,” 2013.

[2] W. Robitza, S. Göring, A. Raake, D. Lindegren, G. Heikkilä, J. Gustafsson, P. List, B. Feiten, U. Wüstenhagen, M.-N. Garcia et al., “HTTP Adaptive Streaming QoE Estimation with ITU-T Rec. P.1203: Open Databases and Software,” in ACM MMSys, 2018.

[3] G. Kougioumtzidis, V. Poulkov, Z. D. Zaharis, and P. I. Lazaridis, “A Survey on Multimedia Services QoE Assessment and Machine Learning-Based Prediction,” IEEE Access, 2022.

[4] N. Wehner, A. Seufert, T. Hoßfeld, and M. Seufert, “Explainable Data-Driven QoE Modelling with XAI,” QoMEX, 2023.

[5] C. Molnar, Interpretable Machine Learning, 2nd ed., 2022. Available: https://christophm.github.io/interpretable-ml-book

[6] A. B. Arrieta, N. Díaz-Rodríguez et al., “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI,” Information Fusion, 2020.

[7] S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” NIPS, 2017.

[8] R. Agarwal, L. Melnick, N. Frosst, X. Zhang, B. Lengerich, R. Caruana, and G. E. Hinton, “Neural Additive Models: Interpretable Machine Learning with Neural Nets,” NeurIPS, 2021.

[9] H. Nori, S. Jenkins, P. Koch, and R. Caruana, “InterpretML: A Unified Framework for Machine Learning Interpretability,” arXiv preprint arXiv:1909.09223, 2019.

[10] T. Hoßfeld, R. Schatz, E. Biersack, and L. Plissonneau, “Internet Video Delivery in YouTube: From Traffic Measurements to Quality of Experience,” in Data Traffic Monitoring and Analysis, 2013.

[11] M. Seufert, N. Wehner, and P. Casas, “Studying the Impact of HAS QoE Factors on the Standardized QoE Model P.1203,” in ICDCS, 2018.

[12] D. N. da Hora, A. S. Asrese, V. Christophides, R. Teixeira, and D. Rossi, “Narrowing the gap between QoS metrics and Web QoE using Above-the-fold metrics,” PAM, 2018.

[13] A. Seufert, F. Wamser, D. Yarish, H. Macdonald, and T. Hoßfeld, “QoE Models in the Wild: Comparing Video QoE Models Using a Crowdsourced Data Set,” in QoMEX, 2021.

[14] D. Shin, “The effects of explainability and causability on perception, trust, and acceptance: Implications for explainable AI,” International Journal of Human-Computer Studies, 2021.

[15] D. Zha, Z. P. Bhat, K. H. Lai, F. Yang, and X. Hu, “Data-centric AI: Perspectives and challenges,” in SIAM International Conference on Data Mining, 2023.
