Overview of Open Dataset Sessions and Benchmarking Competitions in 2022 – Part 1 (QoMEX 2022, ODS at MMSys ’22)

Editors: Karel Fliegel (Czech Technical University in Prague, Czech Republic), Mihai Gabriel Constantin (University Politehnica of Bucharest, Romania), Maria Torres Vega (Ghent University, Belgium)

In this Dataset Column, we present a review of some of the notable events related to open datasets and benchmarking competitions in the field of multimedia. This year’s selection highlights the wide range of topics and datasets currently of interest to the community, covering special sessions on open datasets as well as competitions featuring multimedia data. While this list is not exhaustive, containing an overview of about 40 datasets, it is meant to showcase the diversity of subjects and datasets explored in the field. This year’s review follows similar efforts from the previous year (https://records.sigmm.org/2022/01/12/overview-of-open-dataset-sessions-and-benchmarking-competitions-in-2021/), highlighting the ongoing importance of open datasets and benchmarking competitions in advancing research and development in multimedia. The column is divided into three parts; in this one, we focus on QoMEX 2022 and ODS at MMSys ’22:

  • 14th International Conference on Quality of Multimedia Experience (QoMEX 2022 – https://qomex2022.itec.aau.at/). We summarize three datasets presented at this conference, addressing QoE studies on audiovisual 360° video, storytelling for quality perception, and modelling of energy consumption and streaming video QoE.
  • Open Dataset and Software Track at the 13th ACM Multimedia Systems Conference (ODS at MMSys ’22 – https://mmsys2022.ie/). We summarize nine datasets presented at the ODS track, targeting several topics, including surveillance videos from a fishing vessel (Njord), multi-codec 8K UHD videos (8K MPEG-DASH dataset), a synthetic immersive large-volume light-field (LF) plenoptic dataset (SILVR), a dataset of online news items and the related rematching task (NewsImages), video sequences characterized by various complexity categories (VCD), a QoE dataset of realistic video clips streamed over real networks, a dataset of 360° videos with subjective emotional ratings (PEM360), a free-viewpoint video dataset, and a cloud gaming dataset (CGD).

For an overview of the datasets related to MDRE at MMM 2022 and ACM MM 2022, please check the second part (http://records.sigmm.org/?p=12360), while ImageCLEF 2022 and MediaEval 2022 are addressed in the third part (http://records.sigmm.org/?p=12362).

QoMEX 2022

Three dataset papers were presented at the 14th International Conference on Quality of Multimedia Experience (QoMEX 2022), organized in Lippstadt, Germany, September 5–7, 2022 (https://qomex2022.itec.aau.at/). The complete QoMEX ’22 proceedings are available in the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/xpl/conhome/9900491/proceeding).

These datasets were presented within the Databases session, chaired by Professor Oliver Hohlfeld. The three papers contribute datasets focused on audiovisual 360° video, storytelling for quality perception, and modelling of energy consumption and streaming video QoE.

Audiovisual Database with 360° Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior and QoE Evaluation Research
Paper available at: https://ieeexplore.ieee.org/document/9900893
Robotham, T., Singla, A., Rummukainen, O., Raake, A. and Habets, E.
International Audio Laboratories Erlangen, a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits (IIS), Germany; TU Ilmenau, Germany.
Dataset available at: https://qoevave.github.io/database/

This publicly available database provides audiovisual 360° content with high-order Ambisonics audio. It consists of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680×3840 at 60 frames-per-second and with 4th-order Ambisonics audio. These 360° video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. It provides high-quality reference material with a balanced focus on auditory and visual sensory information.

The Storytime Dataset: Simulated Videotelephony Clips for Quality Perception Research
Paper available at: https://ieeexplore.ieee.org/document/9900888
Spang, R. P., Voigt-Antons, J. N. and Möller, S.
Technische Universität Berlin, Berlin, Germany; Hamm-Lippstadt University of Applied Sciences, Lippstadt, Germany.
Dataset available at: https://osf.io/cyb8w/

This is a dataset of simulated videotelephony clips to act as stimuli in quality perception research. It consists of four different stories in the German language that are told through ten consecutive parts, each about 10 seconds long. Each of these parts is available in four different quality levels, ranging from perfect to stalling. All clips (FullHD, H.264 / AAC) are actual recordings from end-user video-conference software to ensure ecological validity and realism of quality degradation. Apart from a detailed description of the methodological approach, the authors contribute the entire stimulus dataset containing 160 videos and all rating scores for each file.

Modelling of Energy Consumption and Streaming Video QoE using a Crowdsourcing Dataset
Paper available at: https://ieeexplore.ieee.org/document/9900886
Herglotz, C., Robitza, W., Kränzler, M., Kaup, A. and Raake, A.
Friedrich-Alexander-Universität, Erlangen, Germany; Audiovisual Technology Group, TU Ilmenau, Germany; AVEQ GmbH, Vienna, Austria.
Dataset available at: On request

This paper performs a first analysis of the end-user power efficiency and Quality of Experience (QoE) of a video streaming service. A crowdsourced dataset comprising 447,000 YouTube streaming events is used to estimate both power consumption and perceived quality. Power consumption is modelled based on previous work, which is extended to predict the power usage of different devices and codecs. The user-perceived QoE is estimated using a standardized model.
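To give a loose feel for this kind of modelling, the sketch below implements a hypothetical linear power model driven by pixel rate and bitrate. Both the functional form and the coefficients are assumptions for illustration only, not the paper’s fitted model, whose device- and codec-specific parameters are estimated from measurements.

```python
# Hypothetical coefficients, for illustration only. The actual model fits
# device- and codec-specific parameters from measured streaming events.
COEFFS = {
    "base_w": 1.5,              # idle/display baseline (W), assumed
    "per_megapixel_per_s": 0.010,  # decoding cost per MP/s (W), assumed
    "per_mbps": 0.08,           # network/reception cost per Mbit/s (W), assumed
}

def streaming_power_watts(width, height, fps, bitrate_mbps, coeffs=COEFFS):
    """Estimate playback power (W) from resolution, frame rate, and bitrate."""
    pixel_rate = width * height * fps / 1e6  # megapixels per second
    return (coeffs["base_w"]
            + coeffs["per_megapixel_per_s"] * pixel_rate
            + coeffs["per_mbps"] * bitrate_mbps)

# Example: 1080p30 at 5 Mbit/s vs. 4K60 at 20 Mbit/s
print(streaming_power_watts(1920, 1080, 30, 5))
print(streaming_power_watts(3840, 2160, 60, 20))
```

Applied per streaming event, such a model lets total energy be estimated as predicted power multiplied by playback duration, which can then be set against the QoE score of the same event.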

ODS at MMSys ’22

The traditional Open Dataset and Software (ODS) track was part of the 13th ACM Multimedia Systems Conference (MMSys ’22), organized in Athlone, Ireland, June 14–17, 2022 (https://mmsys2022.ie/). The complete proceedings (MMSys ’22: Proceedings of the 13th ACM Multimedia Systems Conference) are available in the ACM Digital Library (https://dl.acm.org/doi/proceedings/10.1145/3524273).

The Open Dataset and Software chairs for MMSys ’22 were Roberto Azevedo (Disney Research, Switzerland), Saba Ahsan (Nokia Technologies, Finland), and Yao Liu (Rutgers University, USA). The ODS session, comprising 14 papers, opened with pitches on Wednesday, June 15, followed by a poster session. Nine of the fourteen contributions were dataset papers. A listing of the paper titles, dataset summaries, and associated DOIs is included below.

Njord: a fishing trawler dataset
Paper available at: https://doi.org/10.1145/3524273.3532886
Nordmo, T.-A.S., Ovesen, A.B., Juliussen, B.A., Hicks, S.A., Thambawita, V., Johansen, H.D., Halvorsen, P., Riegler, M.A., Johansen, D.
UiT The Arctic University of Norway, Norway; SimulaMet, Norway; Oslo Metropolitan University, Norway.
Dataset available at: https://doi.org/10.5281/zenodo.6284673

This paper presents Njord, a dataset of surveillance videos from a commercial fishing vessel. The dataset aims to demonstrate the potential for using data from fishing vessels to detect accidents and report fish catches automatically. The authors also provide a baseline analysis of the dataset and discuss possible research questions that it could help answer.

Multi-codec ultra high definition 8K MPEG-DASH dataset
Paper available at: https://doi.org/10.1145/3524273.3532889
Taraghi, B., Amirpour, H., Timmerer, C.
Christian Doppler Laboratory Athena, Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria.
Dataset available at: http://ftp.itec.aau.at/datasets/mmsys22/

This paper presents a dataset of multimedia assets encoded with various video codecs, including AVC, HEVC, AV1, and VVC, and packaged using the MPEG-DASH format. The dataset includes resolutions up to 8K and has a maximum media duration of 322 seconds, with segment lengths of 4 and 8 seconds. It is intended to facilitate research and development of video encoding technology for streaming services.
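To give a feel for how such DASH packaging might be consumed programmatically, the sketch below parses a minimal MPD with Python’s standard library and lists the available representations. The embedded manifest is a hypothetical stand-in loosely mirroring the dataset’s structure (multiple codecs, resolutions up to 8K, 4-second segments), not the dataset’s actual MPD.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal MPD for illustration; the real dataset manifests
# are served from the dataset URL above.
MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" maxSegmentDuration="PT4S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="hevc-8k" codecs="hvc1" width="7680" height="4320" bandwidth="40000000"/>
      <Representation id="avc-1080p" codecs="avc1" width="1920" height="1080" bandwidth="5000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

DASH_NS = "{urn:mpeg:dash:schema:mpd:2011}"

def list_representations(mpd_text):
    """Return (id, codec, width, height, bandwidth) for each Representation."""
    root = ET.fromstring(mpd_text)
    return [
        (rep.get("id"), rep.get("codecs"),
         int(rep.get("width")), int(rep.get("height")),
         int(rep.get("bandwidth")))
        for rep in root.iter(DASH_NS + "Representation")
    ]

for rep in list_representations(MPD):
    print(rep)
```

A client-side rate-adaptation experiment would typically select among these representations per segment, which is exactly the scenario multi-codec datasets like this one are built to support.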

SILVR: a synthetic immersive large-volume plenoptic dataset
Paper available at: https://doi.org/10.1145/3524273.3532890
Courteaux, M., Artois, J., De Pauw, S., Lambert, P., Van Wallendael, G.
Ghent University – Imec, Oost-Vlaanderen, Zwijnaarde, Belgium.
Dataset available at: https://idlabmedia.github.io/large-lightfields-dataset/

SILVR (synthetic immersive large-volume plenoptic dataset) is a light-field (LF) image dataset allowing for six-degrees-of-freedom navigation in larger volumes while maintaining a full panoramic field of view. It includes three virtual scenes with 642 to 2,226 views each, rendered with 180° fish-eye lenses and featuring color images and depth maps. The dataset also includes multiview rendering software and a lens-reprojection tool. SILVR can be used to evaluate LF coding and rendering techniques.

NewsImages: addressing the depiction gap with an online news dataset for text-image rematching
Paper available at: https://doi.org/10.1145/3524273.3532891
Lommatzsch, A., Kille, B., Özgöbek, O., Zhou, Y., Tešić, J., Bartolomeu, C., Semedo, D., Pivovarova, L., Liang, M., Larson, M.
DAI-Labor, TU-Berlin, Berlin, Germany; NTNU, Trondheim, Norway; Texas State University, San Marcos, TX, United States; Universidade Nova de Lisboa, Lisbon, Portugal.
Dataset available at: https://multimediaeval.github.io/editions/2021/tasks/newsimages/

NewsImages is a dataset of online news items and the related task of news images rematching, which aims to study the “depiction gap” between the content of an image and the text that accompanies it. The dataset is useful for studying connections between image and text and addressing the depiction gap, including sparse data, diversity of content, and the importance of background knowledge.

VCD: Video Complexity Dataset
Paper available at: https://doi.org/10.1145/3524273.3532892
Amirpour, H., Menon, V.V., Afzal, S., Ghanbari, M., Timmerer, C.
Christian Doppler Laboratory Athena, Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria; School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom.
Dataset available at: https://ftp.itec.aau.at/datasets/video-complexity/

The Video Complexity Dataset (VCD) is a collection of 500 Ultra High Definition (UHD) resolution video sequences, characterized by spatial and temporal complexities, rate-distortion complexity, and encoding complexity with the x264 AVC/H.264 and x265 HEVC/H.265 video encoders. It is suitable for video coding applications such as video streaming, two-pass encoding, per-title encoding, and scene-cut detection. These sequences are provided at 24 frames per second (fps) and stored online in losslessly encoded 8-bit 4:2:0 format.
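Spatial and temporal complexity of the kind VCD characterizes is commonly summarized by the spatial information (SI) and temporal information (TI) measures of ITU-T Rec. P.910: the maximum over time of the per-frame standard deviation of the Sobel-filtered luma, and of the inter-frame luma difference, respectively. The NumPy-only sketch below illustrates these measures; the Sobel filter is implemented by hand, and the function names are our own.

```python
import numpy as np

def sobel_magnitude(frame):
    """Gradient magnitude of a 2-D luma frame via 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(frame, 1, mode="edge")
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            shifted = pad[i:i + h, j:j + w]
            gx += kx[i, j] * shifted
            gy += ky[i, j] * shifted
    return np.hypot(gx, gy)

def si_ti(frames):
    """SI/TI per ITU-T P.910: max over time of per-frame standard deviations.

    `frames` is a sequence of at least two 2-D luma arrays.
    """
    frames = [np.asarray(f, dtype=float) for f in frames]
    si = max(sobel_magnitude(f).std() for f in frames)
    ti = max((b - a).std() for a, b in zip(frames, frames[1:]))
    return si, ti
```

On real sequences, the luma planes would be extracted from the decoded YUV frames; high-SI/high-TI clips are the ones that stress encoders most, which is what makes such measures useful for per-title encoding and scene-cut detection studies.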

Realistic video sequences for subjective QoE analysis
Paper available at: https://doi.org/10.1145/3524273.3532894
Hodzic, K., Cosovic, M., Mrdovic, S., Quinlan, J.J., Raca, D.
Faculty of Electrical Engineering, University of Sarajevo, Bosnia and Herzegovina; School of Computer Science & Information Technology, University College Cork, Ireland.
Dataset available at: https://shorturl.at/dtISV

The DashReStreamer framework is designed to recreate adaptively streamed video in real networks to evaluate user Quality of Experience (QoE). The authors have also created a dataset of 234 realistic video clips, based on video logs collected from real mobile and wireless networks, including video logs and network bandwidth profiles. This dataset and framework will help researchers understand the impact of video QoE dynamics on multimedia streaming.

PEM360: a dataset of 360° videos with continuous physiological measurements, subjective emotional ratings and motion traces
Paper available at: https://doi.org/10.1145/3524273.3532895
Guimard, Q., Robert, F., Bauce, C., Ducreux, A., Sassatelli, L., Wu, H.-Y., Winckler, M., Gros, A.
Université Côte d’Azur, Inria, CNRS, I3S, Sophia-Antipolis, France.
Dataset available at: https://gitlab.com/PEM360/PEM360/

PEM360 is a dataset of user head movements and gaze recordings in 360° videos, along with self-reported emotional ratings and continuous physiological measurement data. It aims to understand the connection between user attention, emotions, and immersive content, and includes software tools and joint instantaneous visualization of user attention and emotion, called “emotional maps.” The entire data and code are available in a reproducible framework.

A New Free Viewpoint Video Dataset and DIBR Benchmark
Paper available at: https://doi.org/10.1145/3524273.3532897
Guo, S., Zhou, K., Hu, J., Wang, J., Xu, J., Song, L.
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China.
Dataset available at: https://github.com/sjtu-medialab/Free-Viewpoint-RGB-D-Video-Dataset

A new dynamic RGB-D video dataset for free-viewpoint video (FVV) research is presented, including 13 groups of dynamic scenes and one group of static scenes, each with 12 HD video sequences and 12 corresponding depth video sequences. An FVV synthesis benchmark based on depth image-based rendering (DIBR) is also introduced to aid the validation of data-driven methods. The dataset and benchmark aim to advance FVV synthesis with improved robustness and performance.

CGD: a cloud gaming dataset with gameplay video and network recordings
Paper available at: https://doi.org/10.1145/3524273.3532898
Slivar, I., Bacic, K., Orsolic, I., Skorin-Kapov, L., Suznjevic, M.
University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia.
Dataset available at: https://muexlab.fer.hr/muexlab/research/datasets

The cloud gaming dataset (CGD) contains 600 game streaming sessions covering 10 games of different genres, recorded with varying encoding parameters (bitrate, resolution, and frame rate) to enable evaluating the impact of these parameters on Quality of Experience (QoE). The dataset includes gameplay video recordings, network traffic traces, user input logs, and streaming performance logs; it can be used to study the relationships between network- and application-layer data for cloud gaming QoE modelling and QoE-aware network management mechanisms.
