Overview of Open Dataset Sessions and Benchmarking Competitions in 2023-2025 – Part 4 (ACM MMSys 2023, 2024, 2025)

Editors: Maria Torres Vega (KU Leuven, Belgium), Karel Fliegel (Czech Technical University in Prague, Czech Republic), Mihai Gabriel Constantin (University Politehnica of Bucharest, Romania)

In this Dataset Column, we continue the tradition of the previous three columns by reviewing some of the notable events related to open datasets and benchmarking competitions in the field of multimedia in the years 2023, 2024, and 2025. This selection highlights the wide range of topics and datasets currently of interest to the community. Some of the events covered in this review include special sessions on open datasets and competitions featuring multimedia data. This review follows similar efforts from the previous editions.

This fourth column focuses on the last three editions of ACM Multimedia Systems (MMSys), i.e., 2023, 2024, and 2025:

ACM MMSys 2023

10 dataset papers were presented at the 14th ACM Multimedia Systems Conference (ACM MMSys’23), organized in Vancouver, Canada, June 7-10, 2023 (https://2023.acmmmsys.org/). The complete ACM MMSys’23 Proceedings are available in the ACM Digital Library (https://dl.acm.org/doi/proceedings/10.1145/3587819).

  1. Rhys Cox, S., et al., VOLVQAD: An MPEG V-PCC Volumetric Video Quality Assessment Dataset (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592543; dataset available at: https://github.com/nus-vv-streams/volvqad-dataset).
    This is a volumetric video quality assessment dataset consisting of 7,680 ratings on 376 video sequences from 120 participants. The sequences are encoded with MPEG V-PCC using 4 different avatar models and 16 quality variations, and then rendered into test videos for quality assessment using 2 different background colors and 16 different quality switching patterns. 
  2. Prakash, N., et al., TotalDefMeme: A Multi-Attribute Meme dataset on Total Defence in Singapore (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592545; dataset available at: https://gitlab.com/bottle_shop/meme/TotalDefMemes). TotalDefMeme is a large-scale multi-modal and multi-attribute meme dataset that captures public sentiments toward Singapore’s Total Defence policy. Besides supporting social informatics and public policy analysis of the Total Defence policy, TotalDefMeme can also support many downstream multi-modal machine learning tasks, such as aspect-based stance classification and multi-modal meme clustering.
  3. Sun, Y., et al., A Dynamic 3D Point Cloud Dataset for Immersive Applications (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592546; dataset available on request to the authors). This dataset consists of synthetically generated objects with pre-determined motion patterns. It contains nine objects in three categories (shape, avatar, and textile) with different animation patterns.
  4. Raca, D., et al., 360 Video DASH Dataset (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592548; dataset available at: https://github.com/darijo/360-Video-DASH-Dataset). This study introduces a software tool that offers a straightforward encoding platform to simplify the encoding of DASH VR videos. In addition, it includes a dataset composed of 9 VR videos encoded with seven tiling configurations, four segment durations, and four different bitrates; a sketch enumerating such an encoding grid follows this list.
  5. Hu, K., et al., FSVVD: A Dataset of Full Scene Volumetric Video (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592551; dataset available at: https://cuhksz-inml.github.io/full_scene_volumetric_video_dataset/). This dataset focuses on the most widely used data format, point clouds, and, for the first time, releases a full-scene volumetric video dataset that includes multiple people and their daily activities interacting with external environments.
  6. Wu, Y., et al., A Dataset of Food Intake Activities Using Sensors with Heterogeneous Privacy Sensitivity Levels (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592553; dataset available on request to the authors). This dataset compiles fine-grained food intake activities using sensors of heterogeneous privacy sensitivity levels, namely a mmWave radar, an RGB camera, and a depth camera. Solutions to recognize food intake activities can be developed using this dataset, which may provide a more comprehensive picture of the accuracy and privacy trade-offs involved with heterogeneous sensors.
  7. Soares da Costa, T., et al., A Dataset for User Visual Behaviour with Multi-View Video Content (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592556; dataset available on request to the authors). This dataset, collected with a large-scale testbed, compiles head-movement tracking data obtained from 45 participants using an Intel RealSense F200 camera, with 7 video playlists, each viewed a minimum of 17 times.
  8. Wei, Y., et al., A 6DoF VR Dataset of 3D Virtual World for Privacy-Preserving Approach and Utility-Privacy Tradeoff (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592557; dataset available on request to the authors). This is a 6 degrees-of-freedom (6DoF) VR dataset of 3D virtual worlds for the investigation of privacy-preserving approaches and the utility-privacy tradeoff.
  9. Mohammed, A. et al., IDCIA: Immunocytochemistry Dataset for Cellular Image Analysis (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592558; dataset available at: https://figshare.com/articles/dataset/Dataset/21970604). This dataset is a new annotated microscopic cellular image dataset to improve the effectiveness of machine learning methods for cellular image analysis. It includes microscopic images of cells, and for each image, the cell count and the location of individual cells. The data were collected as part of an ongoing study investigating the potential of electrical stimulation to modulate stem cell differentiation and possible applications for neural repair. 
  10. Al Shoura, T., et al., SEPE Dataset: 8K Video Sequences and Images for Analysis and Development (paper available at: https://dl.acm.org/doi/10.1145/3587819.3592560; dataset available at: https://github.com/talshoura/SEPE-8K-Dataset). The SEPE 8K dataset is made of 40 different 8K (8192 x 4320) video sequences and 40 different 8K (8192 x 5464) images. The dataset is – as far as the authors know – the first to publish true 8K natural sequences; it is therefore important for the next generation of multimedia applications such as video quality assessment, super-resolution, video coding, video compression, and many more.
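
The 360 Video DASH Dataset (item 4 above) spans a full grid of encoding parameters. The sketch below shows one way to enumerate such a grid before driving an encoder; the tiling layouts, segment durations, and bitrates are illustrative placeholders, not the exact values used by the authors.

```python
from itertools import product

# Illustrative encoding grid for a tiled 360-degree DASH dataset:
# 7 tiling configurations x 4 segment durations x 4 bitrates = 112
# variants per source video. All concrete values below are placeholders,
# not the settings used by Raca et al.
TILINGS = ["1x1", "2x2", "3x3", "4x4", "5x5", "6x6", "8x8"]  # rows x cols
SEGMENT_DURATIONS_S = [1, 2, 4, 6]                           # seconds
BITRATES_KBPS = [1000, 5000, 10000, 20000]

def encoding_jobs(source):
    """Yield one job description per point of the encoding grid."""
    for tiling, seg, kbps in product(TILINGS, SEGMENT_DURATIONS_S, BITRATES_KBPS):
        yield {
            "source": source,
            "tiling": tiling,
            "segment_duration_s": seg,
            "bitrate_kbps": kbps,
            "output": f"{source}_{tiling}_{seg}s_{kbps}kbps.mpd",
        }

jobs = list(encoding_jobs("video01"))
print(len(jobs), "encoding jobs per source video")  # 112
```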

ACM MMSys 2024

14 dataset papers were presented at the 15th ACM Multimedia Systems Conference (ACM MMSys’24), organized in Bari, Italy, April 15-18, 2024 (https://2024.acmmmsys.org/). The complete ACM MMSys’24 Proceedings are available in the ACM Digital Library (https://dl.acm.org/doi/proceedings/10.1145/3625468).

  1. Malon, T., et al., Ceasefire Hierarchical Weapon Dataset (paper available at: https://dl.acm.org/doi/10.1145/3625468.3653434; dataset available on request to the authors). The Ceasefire Hierarchical Weapon Dataset, an RGB image dataset of firearms tailored for fine-grained image classification, contains 260 classes with between 25 and several hundred images per class, for a total of 40,789 images. In addition, a 4-level hierarchy (family, group, type, model) is provided and validated by forensic experts.
  2. Kassab, E.J., et al., TACDEC: Dataset for Automatic Tackle Detection in Soccer Game Videos (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652166; dataset available on request to the authors). TACDEC is a dataset of tackle events in soccer game videos. By leveraging video data from the Norwegian Eliteserien league across multiple seasons, we annotated 425 videos with 4 types of tackle events, categorized into “tackle-live”, “tackle-replay”, “tackle-live-incomplete”, and “tackle-replay-incomplete”, yielding a total of 836 event annotations. 
  3. Zhao, J., Pan, J., LENS: A LEO Satellite Network Measurement Dataset (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652170; dataset available at: https://github.com/clarkzjw/LENS). LENS is a LEO satellite network measurement dataset, collected from 13 Starlink dishes, associated with 7 Point-of-Presence (PoP) locations across 3 continents. The dataset currently consists of network latency traces from Starlink dishes with different hardware revisions, various service subscriptions and distinct sky obstruction ratios.
  4. Chen, B., et al., vRetention: A User Viewing Dataset for Popular Video Streaming Services (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652175; dataset available at: https://github.com/flowtele/vRetention). This dataset collects 229,178 audience retention curves from YouTube and Bilibili, offering a thorough view of viewer engagement and diverse watching styles. Our analysis reveals notable behavioral differences across countries, categories, and platforms.
  5. Xu, Y., et al., Panonut360: A Head and Eye Tracking Dataset for Panoramic Video (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652176; dataset available at: https://dianvrlab.github.io/Panonut360/). This dataset presents head and eye tracking data from 50 users (25 males and 25 females) watching 15 panoramic videos (mostly in 4K). The dataset provides details on the viewport and gaze attention locations of users.
  6. Linder, S.,  et al., VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652178; dataset available at: https://github.com/cd-athena/VEED-dataset). VEED is a FAIR Video Encoding Energy and CO2 Emissions Dataset for Amazon Web Services (AWS) EC2 instances. The dataset also contains the duration, CPU utilization, and cost of the encoding. 
  7. Tashtarian, F., et al., COCONUT: Content Consumption Energy Measurement Dataset for Adaptive Video Streaming (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652179; dataset available at: https://athena.itec.aau.at/coconut/). The COCONUT dataset provides a COntent COnsumption eNergy measUrement daTaset for adaptive video streaming, collected through a digital multimeter on various types of client devices, such as laptops and smartphones, streaming MPEG-DASH segments.
  8. Sarkhoosh, M. H., et al., The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652180; dataset available at: https://zenodo.org/records/10612084). SoccerSum is a novel dataset aimed at enhancing object detection and segmentation in video frames depicting the soccer pitch, using footage from the Norwegian Eliteserien league across 2021-2023. It also includes the segmentation of key pitch areas such as the penalty and goal boxes for the same frame sequences. It comprises 750 frames annotated with 10 classes for advanced analysis. 
  9. Li, G., et al., A Driver Activity Dataset with Multiple RGB-D Cameras and mmWave Radars (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652181; dataset available at: https://www.kaggle.com/datasets/guanhualee/driver-activity-dataset). This work introduces a novel dataset for fine-grained driver activities, utilizing diverse sensors such as mmWave radars, RGB, and depth cameras, each of which includes three camera angles: body, face, and hands. 
  10. Nguyen, M., et al., ComPEQ – MR: Compressed Point Cloud Dataset with Eye-tracking and Quality Assessment in Mixed Reality (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652182; dataset available at: https://ftp.itec.aau.at/datasets/ComPEQ-MR/). This dataset comprises four compressed dynamic point clouds processed by Moving Picture Experts Group (MPEG) reference tools (i.e., V-PCC and G-PCC), each with 12 distortion levels. We also conducted subjective tests to assess the quality of the compressed point clouds with different levels of distortion. Additionally, eye-tracking data for visual saliency is included in this dataset, which is necessary to predict where people look when watching 3D videos in MR experiences. We collected opinion scores and eye-tracking data from 41 participants, resulting in 2,132 responses and 164 visual attention maps in total.
  11. Barone, N., et al., APEIRON: a Multimodal Drone Dataset Bridging Perception and Network Data in Outdoor Environments (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652186; dataset available at: https://c3lab.github.io/Apeiron/). APEIRON is a rich multimodal aerial dataset that simultaneously collects perception data from a stereo camera and an event-based camera sensor, along with measurements of wireless network links obtained using an LTE module. The assembled dataset consists of both perception and network data, making it suitable for typical perception or communication applications, as well as cross-disciplinary applications that require both types of data.
  12. Baldoni, S., et al., Questset: A VR Dataset for Network and Quality of Experience Studies (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652187; dataset available at: https://researchdata.cab.unipd.it/1179/). Questset contains over 40 hours of VR traces from 70 users playing commercially available video games, and includes both traffic data for network optimization, and movement and user experience data for cybersickness analysis. Therefore, Questset represents an enabler to jointly address the main VR challenges in the near future.
  13. Jabal, A. et al., StreetLens: An In-Vehicle Video Dataset for Public Facility Monitoring in Urban Streets (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652188; dataset available on request to the authors). StreetLens is a new dataset of videos capturing urban streets with plentiful annotations for vision-based public facility monitoring. It includes four-and-a-half hours of videos recorded by smartphone cameras placed in moving vehicles in the suburbs of three different cities. 
  14. Brescia, W., et al., MilliNoise: a Millimeter-wave Radar Sparse Point Cloud Dataset in Indoor Scenarios (paper available at: https://dl.acm.org/doi/10.1145/3625468.3652189; dataset available at: https://github.com/c3lab/MilliNoise). MilliNoise is a point cloud dataset captured in indoor scenarios through a mmWave radar sensor installed on a wheeled mobile robot. Each of the 12M points in the MilliNoise dataset is accurately labeled as a true/noise point by leveraging known information about the scenes and a motion capture system that provides the ground-truth position of the moving robot. Along with the dataset, we provide researchers with tools to visualize the data and prepare it for statistical and machine learning analysis; a minimal example of using the true/noise labels follows this list.
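
Because every MilliNoise point carries a true/noise label, a natural first step is separating clean returns from radar noise, for example when preparing training data for a denoising model. Below is a minimal sketch assuming the frame is already loaded into NumPy arrays; the array layout is hypothetical, not the dataset’s actual file schema.

```python
import numpy as np

# Hypothetical frame layout: an (N, 3) array of xyz coordinates and an
# (N,) boolean array marking true points (True) vs. radar noise (False).
# MilliNoise's actual file format may differ; see the dataset's tools.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3)).astype(np.float32)
is_true_point = rng.random(1000) > 0.3

clean = points[is_true_point]    # points reflected by real objects
noise = points[~is_true_point]   # spurious radar returns

print(f"{len(clean)} clean / {len(noise)} noise points")
print(f"noise ratio: {1 - is_true_point.mean():.1%}")
```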

ACM MMSys 2025

8 dataset papers were presented at the 16th ACM Multimedia Systems Conference (ACM MMSys’25), organized in Stellenbosch, South Africa, March 30 – April 4, 2025 (https://2025.acmmmsys.org/). The complete ACM MMSys’25 Proceedings are available in the ACM Digital Library (https://dl.acm.org/doi/proceedings/10.1145/3712676).

  1. Lechelek, L. et al., eCHFD: extended Ceasefire Hierarchical Firearm Dataset (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718333; dataset available on request to the authors). This is the extended Ceasefire Hierarchical Firearm Dataset (eCHFD), a large image dataset of firearms consisting of over 93,000 images in 505 classes. It was constructed from more than 240 videos filmed at the Toulouse Forensics Laboratory (France) and further enriched with images from the existing CHFD dataset and additional downloaded images.
  2. Sarkhoosh, M. H., et al., HockeyAI: A Multi-Class Ice Hockey Dataset for Object Detection (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718335; dataset available at: https://huggingface.co/SimulaMet-HOST/HockeyAI). HockeyAI is a novel open-source dataset specifically designed for multi-class object detection in ice hockey. It includes 2,101 high-resolution frames extracted from professional games in the Swedish Hockey League (SHL), annotated in the You Only Look Once (YOLO) format; a minimal parser for this annotation format follows this list.
  3. Nguyen, M., et al., OLED-EQ: A Dataset for Assessing Video Quality and Energy Consumption in OLED TVs Across Varying Brightness Levels (paper available at: https://dl.acm.org/doi/abs/10.1145/3712676.3718337; dataset available at: https://github.com/minhkstn/OLED-EQ). The dataset comprises the energy data of four OLED TVs of different screen sizes and manufacturers while playing 176 videos spanning dark and bright content, resulting in 704 collected traces of energy consumption. It also includes subjective annotations (28 participants, resulting in 2,240 responses in total) of the quality of videos displayed on OLED TVs at reduced brightness levels.
  4. Sarkhoosh, M. H. et al., HockeyRink: A Dataset for Precise Ice Hockey Rink Keypoint Mapping and Analytics (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718338; dataset available at: https://huggingface.co/SimulaMet-HOST/HockeyRink). HockeyRink is a novel dataset comprising 56 meticulously annotated keypoints corresponding to significant landmarks on a standard hockey rink, including face-off dots, goalposts, and blue lines. 
  5. Sarkhoosh, M. H., et al., HockeyOrient: A Dataset for Ice Hockey Player Orientation Classification (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718342; dataset available at: https://huggingface.co/datasets/SimulaMet-HOST/HockeyOrient). HockeyOrient is a novel dataset for classifying the orientation of ice hockey players based on their poses. The dataset comprises 9,700 manually annotated frames, selected randomly and non-sequentially, taken from Swedish Hockey League (SHL) games during the 2023 and 2024 seasons.
  6. Li, J., et al., PCVD: A Dataset of Point Cloud Video for Dynamic Human Interaction (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718343; dataset available at: https://github.com/acmmmsys/2025-PCVD-A-Dataset-of-Point-Cloud-Video-for-Dynamic-Human-Interaction). PCVD is a point cloud video dataset captured with synchronized Azure Kinect cameras, designed to support tasks like denoising, segmentation, and motion recognition in single- and multi-person scenes. It provides high-quality depth and color data from diverse real-world scenes with human actions.
  7. Bhattacharya, A. et al., AMIS: An Audiovisual Dataset for Multimodal XR Research (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718344; dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AMIS). The Audiovisual Multimodal Interaction Suite (AMIS) is an open-source dataset and accompanying Unity-based demo implementation designed to aid research on immersive media communication and social XR environments. It features synchronized audiovisual recordings of three actors performing monologues and participating in dyadic conversations across four modalities: talking-head videos, full-body videos, volumetric avatars, and personalized animated avatars. 
  8. Ouellette, J., et al., MazeLab: A Large-Scale Dynamic Volumetric Point Cloud Video Dataset With User Behavior Traces (paper available at: https://dl.acm.org/doi/10.1145/3712676.3718345; dataset available on request to the authors). MazeLab is a dynamic volumetric video dataset comprising a feature-rich point cloud representation of a large maze environment. It captures navigation traces from 15 participants interacting with 15 distinct maze variants, categorized into seven classes designed to elicit specific behavioral characteristics such as navigation patterns, attention hotspots, and interaction dynamics.
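
HockeyAI (item 2 above) ships its annotations in the YOLO format, in which each line of a label file stores a class index and a bounding box in normalized center coordinates. Below is a minimal parser for that generic format; the class list and image dimensions for HockeyAI itself come with the dataset and are not reproduced here.

```python
from pathlib import Path

# YOLO label format: one object per line,
#   <class_id> <x_center> <y_center> <width> <height>
# with all box values normalized to [0, 1].
def load_yolo_labels(label_file: Path, img_w: int, img_h: int):
    """Return (class_id, x_min, y_min, x_max, y_max) boxes in pixels."""
    boxes = []
    for line in label_file.read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
        boxes.append((
            int(cls),
            (xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h,
        ))
    return boxes

# Example (hypothetical file name and frame size):
# boxes = load_yolo_labels(Path("frame_0001.txt"), img_w=1920, img_h=1080)
```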

Overview of Open Dataset Sessions and Benchmarking Competitions in 2023-2024 – Part 1 (QoMEX 2023 and QoMEX 2024)

In this and the following Dataset Columns, we present a review of some of the notable events related to open datasets and benchmarking competitions in the field of multimedia in the years 2023 and 2024. This selection highlights the wide range of topics and datasets currently of interest to the community. Some of the events covered in this review include special sessions on open datasets and competitions featuring multimedia data. This year’s review follows similar efforts from the previous year (https://records.sigmm.org/records-issues/acm-sigmm-records-issue-1-2023/), highlighting the ongoing importance of open datasets and benchmarking competitions in advancing research and development in multimedia. This first column focuses on the last two editions of QoMEX, i.e., 2023 and 2024:

QoMEX 2023

4 dataset full papers were presented at the 15th International Conference on Quality of Multimedia Experience (QoMEX 2023), organized in Ghent, Belgium, June 19 – 21, 2023 (https://qomex2023.itec.aau.at/). The complete QoMEX ’23 Proceedings are available in the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/xpl/conhome/10178424/proceeding).

These datasets were presented within the Datasets session, chaired by Professor Lea Skorin-Kapov. Given the scope of the conference (i.e., Quality of Multimedia Experience), these four papers present contributions focused on the impact on user perception of adaptive 2D video streaming, holographic video codecs, omnidirectional video/audio environments and multi-screen video.

PNATS-UHD-1-Long: An Open Video Quality Dataset for Long Sequences for HTTP-based Adaptive Streaming QoE Assessment
Ramachandra Rao, R. R., Borer, S., Lindero, D., Göring, S. and Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10178493   
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/PNATS-UHD-1-Long 

A collaboration between Technische Universität Ilmenau (Germany), Ericsson Research (Sweden), and Rohde & Schwarz (Switzerland).

The presented dataset consists of 3 subjective databases targeting overall quality assessment of typical HTTP-based Adaptive Streaming sessions with degradations such as quality switching, initial loading delay, and stalling events, using audiovisual content ranging between 2 and 5 minutes in duration. In addition to this, subject bias and consistency in the quality assessment of such longer-duration audiovisual contents with multiple degradations are investigated using a subject behaviour model. As part of this paper, the overall test design, subjective test results, sources, encoded audiovisual contents, and a set of analysis plots are made publicly available for further research.
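
The three degradation types named above map directly onto the per-session features that QoE models typically consume. Below is a minimal sketch of that feature extraction over a hypothetical session event log; the log schema and values are illustrative only, not the format of the published database.

```python
# Hypothetical event log of one HTTP adaptive streaming session.
# Each entry: (timestamp_s, event, value). Schema is illustrative only.
session = [
    (0.0,   "startup_delay",  3.2),   # initial loading delay in seconds
    (42.0,  "quality_switch", 720),   # switch to 720p
    (95.5,  "stall",          1.8),   # stalling event, duration in seconds
    (180.0, "quality_switch", 1080),
    (240.3, "stall",          0.6),
]

stalls = [v for _, e, v in session if e == "stall"]
features = {
    "initial_delay_s": next(v for _, e, v in session if e == "startup_delay"),
    "n_stalls": len(stalls),
    "total_stall_s": sum(stalls),
    "n_quality_switches": sum(e == "quality_switch" for _, e, _v in session),
}
print(features)
```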

Open access dataset of holographic videos for codec analysis and machine learning applications
Gilles, A., Gioia, P., Madali, N., El Rhammad, A., Morin, L.

Paper available at: https://ieeexplore.ieee.org/document/10178637 
Dataset available at: https://hologram-repository.labs.b-com.com/#/holographic-videos 

A collaboration between IRT and INSA, Rennes, France.

This is reported as the first large-scale dataset containing 18 holographic videos computed with three different resolutions and pixel pitches. By providing the color and depth images corresponding to each hologram frame, our dataset can be used in additional applications such as the validation of 3D scene geometry retrieval or deep learning-based hologram synthesis methods. Altogether, our dataset comprises 5400 pairs of RGB-D images and holograms, totaling more than 550 GB of data.

Saliency of Omnidirectional Videos with Different Audio Presentations: Analyses and Dataset
Singla, A., Robotham, T., Bhattacharya, A., Menz, W., Habets, E. and Raake, A.

Paper available at: https://ieeexplore.ieee.org/abstract/document/10178588 
Dataset available at: https://qoevave.github.io/database/docs/Saliency

A collaboration between Technische Universität Ilmenau and the International Audio Laboratories Erlangen, both in Germany.

This dataset uses a between-subjects test design to collect users’ exploration data of 360-degree videos in a free-form viewing scenario using the Varjo XR-3 Head-Mounted Display, in the presence of no audio, mono audio, and 4th-order Ambisonics audio. Saliency information was captured as head-saliency, in terms of the center of the viewport, at 50 Hz. For each item, subjects were asked to describe the scene in a short free-verbalization task. Moreover, cybersickness was assessed using the Simulator Sickness Questionnaire at the beginning and at the end of the test. The data is intended to enable the training of visual and audiovisual saliency prediction models for interactive experiences.
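
Head-rotation traces of this kind (viewport centers at 50 Hz) are commonly aggregated into an equirectangular fixation map before training saliency models. Below is a minimal sketch of that aggregation, assuming yaw/pitch angles in degrees; it is an illustration, not the authors’ processing pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def head_saliency_map(yaw_deg, pitch_deg, width=512, height=256, sigma_px=8):
    """Accumulate viewport-center samples into an equirectangular
    saliency map and smooth them with a Gaussian kernel (a common
    simplification that ignores longitude wrap-around)."""
    yaw = np.asarray(yaw_deg)      # [-180, 180), 0 = front
    pitch = np.asarray(pitch_deg)  # [-90, 90], 0 = horizon
    x = ((yaw + 180.0) / 360.0 * width).astype(int) % width
    y = ((90.0 - pitch) / 180.0 * height).clip(0, height - 1).astype(int)
    sal = np.zeros((height, width))
    np.add.at(sal, (y, x), 1.0)          # fixation counts per cell
    sal = gaussian_filter(sal, sigma_px)  # approximate foveal spread
    return sal / sal.max() if sal.max() > 0 else sal

# Example: 10 s of 50 Hz head-tracking data drifting to the right.
t = np.linspace(0, 10, 500)
smap = head_saliency_map(yaw_deg=20 * t - 100, pitch_deg=10 * np.sin(t))
print(smap.shape, smap.max())
```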

A Subjective Dataset for Multi-Screen Video Streaming Applications
Barman, N., Reznik Y. and Martini, M. G.

Paper available at: https://ieeexplore.ieee.org/document/10178645 
Dataset available at: https://github.com/NabajeetBarman/Multiscreen-Dataset 

A collaboration between Brightcove (London, UK and Seattle, USA) and Kingston University London, UK.

This paper presents a new, open-source dataset consisting of subjective ratings for various encoded video sequences of different resolutions and bitrates (quality) when viewed on three devices of varying screen sizes: TV, tablet, and mobile. Along with the subjective scores, an evaluation of some of the most well-known and commonly used open-source objective quality metrics is also presented. It is observed that the performance of the metrics varies considerably across different device types, with the recently standardized ITU-T P.1204.3 model, on average, outperforming its full-reference counterparts.
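
Performance of objective metrics against subjective scores is conventionally summarized with the Pearson (PLCC) and Spearman (SROCC) correlation coefficients. A minimal sketch with placeholder numbers:

```python
from scipy.stats import pearsonr, spearmanr

# Placeholder data: one objective metric score and one subjective MOS
# per processed video sequence (values are illustrative only).
metric_scores = [78.1, 64.3, 51.0, 88.9, 70.2, 43.5]
mos           = [4.1,  3.4,  2.6,  4.5,  3.8,  2.1]

plcc, _ = pearsonr(metric_scores, mos)    # linearity of the mapping
srocc, _ = spearmanr(metric_scores, mos)  # monotonicity / rank agreement
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```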

QoMEX 2024

5 dataset full papers were presented at the 16th International Conference on Quality of Multimedia Experience (QoMEX 2024), organized in Karlshamn, Sweden, June 18 – 20, 2024 (https://qomex2024.itec.aau.at/). The complete QoMEX ’24 Proceedings are available in the IEEE Xplore Digital Library (https://ieeexplore.ieee.org/xpl/conhome/10597667/proceeding).

These datasets were presented within the Datasets session, chaired by Dr. Mohsen Jenadeleh. Given the scope of the conference (i.e., Quality of Multimedia Experience), these five papers present contributions focused on the impact on user perception of HDR videos (UHD-1, 8K, and AV1), immersive 360° video, and light fields. The last of these contributions received the conference’s Best Paper Award.

AVT-VQDB-UHD-1-HDR: An Open Video Quality Dataset for Quality Assessment of UHD-1 HDR Videos
Ramachandra Rao, R. R., Herb, B., Helmi-Aurora, T., Ahmed, M. T, Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598284 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-HDR 

A work from Technische Universität Ilmenau, Germany.

This dataset deals with the assessment of the perceived quality of HDR videos. Firstly, a subjective test with 4K/UHD-1 HDR videos using the ACR-HR (Absolute Category Rating – Hidden Reference) method was conducted. The test consisted of a total of 195 encoded videos from 5 source videos, all with a framerate of 60 fps. In this test, the 4K/UHD-1 HDR stimuli were encoded at four different resolutions, namely 720p, 1080p, 1440p, and 2160p, using bitrates ranging between 0.5 Mbps and 40 Mbps. The results of the subjective test have been analyzed to assess the impact of factors such as resolution, bitrate, video codec, and content on the perceived video quality.
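
In the ACR-HR method, the rating each subject gives the hidden (unimpaired) reference is subtracted out, yielding a differential score DV = V(PVS) − V(REF) + 5 as defined in ITU-T P.910. A minimal sketch with illustrative ratings:

```python
import numpy as np

# ACR-HR differential scoring (ITU-T P.910): per subject,
#   DV = V(PVS) - V(REF) + 5
# where V(REF) is the rating the same subject gave the hidden reference.
# Ratings below are illustrative only.
ratings_pvs = np.array([3, 4, 3, 2, 4])  # processed video, one rating/subject
ratings_ref = np.array([5, 5, 4, 4, 5])  # hidden reference, same subjects

dv = ratings_pvs - ratings_ref + 5       # differential viewer scores
dmos = dv.mean()                         # differential mean opinion score
print(f"DV per subject: {dv}, DMOS = {dmos:.2f}")
```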

AVT-VQDB-UHD-2-HDR: An open 8K HDR source dataset for video quality research
Keller, D., Goebel, T., Sievenkees, V., Prenzel, J., Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598268 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-2-HDR 

A work from Technische Universität Ilmenau, Germany.

The AVT-VQDB-UHD-2-HDR dataset consists of 31 8K HDR video sources of 15 s each, created with the goal of accurately representing real-life footage while taking into account video coding and video quality testing challenges.

The effect of viewing distance and display peak luminance – HDR AV1 video streaming quality dataset
Hammou, D., Krasula, L., Bampis, C., Li, Z., Mantiuk, R.

Paper available at: https://ieeexplore.ieee.org/document/10598289 
Dataset available at: https://doi.org/10.17863/CAM.107964 

A collaboration between University of Cambridge (UK) and Netflix Inc. (USA).

The HDR-VDC dataset captures the quality degradation of HDR content due to AV1 coding artifacts and resolution reduction. The quality drop was measured at two viewing distances, corresponding to 60 and 120 pixels per visual degree, and two display mean luminance levels, 51 and 5.6 nits. It employs a highly sensitive pairwise comparison protocol with active sampling and comparisons across viewing distances to ensure the most accurate quality measurements possible. It is also the first publicly available dataset that measures the effect of display peak luminance and includes HDR videos encoded with AV1.
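
Pairwise comparison responses of this kind are typically scaled to a continuous quality axis with a psychometric model. The authors’ exact scaling procedure may differ; purely as an illustration, the sketch below fits a simple Bradley-Terry model to a toy win-count matrix.

```python
import numpy as np
from scipy.optimize import minimize

# Toy pairwise-comparison counts: wins[i, j] = number of times condition
# i was preferred over condition j. Values are illustrative only.
wins = np.array([
    [0, 9, 12],
    [3, 0, 10],
    [0, 2,  0],
], dtype=float)
n = wins.shape[0]

def neg_log_likelihood(free_q):
    # Bradley-Terry model: P(i preferred over j) = sigmoid(q_i - q_j).
    q = np.concatenate(([0.0], free_q))  # pin q_0 = 0 for identifiability
    diff = q[:, None] - q[None, :]       # diff[i, j] = q_i - q_j
    # -log sigmoid(d) = log(1 + exp(-d)) = logaddexp(0, -d)
    return float((wins * np.logaddexp(0.0, -diff)).sum())

res = minimize(neg_log_likelihood, x0=np.zeros(n - 1), method="BFGS")
quality = np.concatenate(([0.0], res.x))
print("relative quality scores:", np.round(quality, 2))
```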

A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications (Best Paper Award)
Zerman, E., Gond, M., Takhtardeshir, S., Olsson, R., Sjöström, M.

Paper available at: https://ieeexplore.ieee.org/document/10598264 
Dataset available at: https://zenodo.org/records/13342006 

A work from Mid Sweden University, Sundsvall, Sweden.

The Spherical Light Field Database (SLFDB) consists of a light field of 60 views captured with an omnidirectional camera in 20 scenes. To show the usefulness of the proposed database, we provide two use cases: compression and viewpoint estimation. The initial results validate that the publicly available SLFDB will benefit the scientific community.

AVT-ECoClass-VR: An open-source audiovisual 360° video and immersive CGI multi-talker dataset to evaluate cognitive performance
Fremerey, S., Breuer, C., Leist, L., Klatte, M., Fels, J., Raake, A.

Paper available at: https://ieeexplore.ieee.org/document/10598262 
Dataset available at: https://github.com/Telecommunication-Telemedia-Assessment/AVT-ECoClass-VR

A collaboration between Technische Universität Ilmenau, RWTH Aachen University, and RPTU Kaiserslautern (Germany).

This dataset includes two audiovisual scenarios (360° video and computer-generated imagery) and two implementations for dataset playback. The 360° video part of the dataset features 200 video and single-channel audio recordings of 20 speakers reading ten stories, and 20 videos of speakers in silence, resulting in a total of 220 video and 200 audio recordings. The dataset also includes one 360° background image of a real primary school classroom scene, targeting young school children for subsequent subjective tests. The second part of the dataset comprises 20 different 3D models of the speakers and a computer-generated classroom scene, along with an immersive audiovisual virtual environment implementation that can be interacted with using an HTC Vive controller.

Two Interviews with Renowned Dataset Researchers

This issue of the Dataset Column provides two interviews with the researchers responsible for novel datasets of recent years. In particular, we first interview Nacho Reimat (https://www.cwi.nl/people/nacho-reimat), the scientific programmer responsible for the CWIPC-SXR, one of the first datasets on dynamic, interactive volumetric media. Second, we interview Pierre-Etienne Martin (https://www.eva.mpg.de/comparative-cultural-psychology/staff/pierre-etienne-martin/), responsible for contributions to datasets in the area of sports and culture.  

The two interviewees were asked about their contributions to dataset research, their interests, the challenges they face, and the future. We would like to thank both Nacho and Pierre-Etienne for agreeing to contribute to our column.

Nacho Reimat, Scientific Programmer at the Distributed and Interactive Systems group at the CWI, Amsterdam, The Netherlands

Short bio: Ignacio Reimat is currently an R&D Engineer at Centrum Wiskunde & Informatica (CWI) in Amsterdam. He received the B.S. degree in Audiovisual Systems Engineering of Telecommunications from Universitat Politecnica de Catalunya in 2016 and the M.S. degree in Innovation and Research in Informatics – Computer Graphics and Virtual Reality from Universitat Politecnica de Catalunya in 2020. His current research interests are 3D graphics, volumetric capturing, 3D reconstruction, point clouds, social Virtual Reality, and real-time communications.

Could you provide a small summary of your contribution to the dataset research?

We have released the CWI Point Cloud Social XR Dataset (CWIPC-SXR) [1], a dynamic point cloud dataset that depicts humans interacting in social XR settings. In particular, using commodity hardware we captured audio-visual data (RGB + Depth + Infrared + synchronized Audio) for a total of 45 unique sequences of people performing scripted actions [2]. The screenplays for the human actors were devised so as to simulate a variety of common use cases in social XR, namely (i) education and training, (ii) healthcare, (iii) communication and social interaction, and (iv) performance and sports. Moreover, diversity in gender, age, ethnicity, materials, textures, and colours was additionally considered. As part of our release, we provide annotated raw material, the resulting point cloud sequences, and an auxiliary software toolbox to acquire, process, encode, and visualize the data, suitable for real-time applications.

Sample frames from the point cloud sequences released with the CWIPC-SXR dataset.

Why did you get interested in datasets research?

Real-time, immersive telecommunication systems are quickly becoming a reality, thanks to the advances in the acquisition, transmission, and rendering technologies. Point clouds in particular serve as a promising representation in these types of systems, offering photorealistic rendering capabilities with low complexity. Further development of transmission, coding, and quality evaluation algorithms, though, is currently hindered by the lack of publicly available datasets that represent realistic scenarios of remote communication between people in real-time. So we are trying to fill this gap. 

What is the most challenging aspect of datasets research?

In our case, because point clouds are a relatively new format, the most challenging part has been developing the technology to generate them. Our dataset is generated from several cameras, which need to be calibrated and synchronized in order to merge the views successfully. Apart from that, if you are releasing a large dataset, you also need to deal with other challenges like data hosting and maintenance but, even more importantly, finding a way to distribute the data that is suitable for different target users. Because we are not releasing just point clouds but also the raw data, there may be people interested in the raw videos, or in particular point clouds, who do not want to download the full 1.6 TB of data. And going even further, because of the novelty of the point cloud format, there is also a lack of tools to capture, play back, or modify this type of data. That’s why, together with the dataset, we also released our auxiliary toolbox of software utilities built on top of the Point Cloud Library, which allows for alignment and processing of point clouds, as well as real-time capturing, encoding, transmission, and rendering.
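
Registering the individual camera views into a common coordinate frame is precisely the alignment step such a toolbox supports. The cwipc toolbox has its own calibration workflow; the sketch below is an independent illustration of a pairwise ICP refinement using the Open3D library, with hypothetical input files.

```python
import numpy as np
import open3d as o3d

# Illustrative pairwise registration of two camera views with ICP.
# This uses Open3D rather than the cwipc toolbox itself; in practice,
# a coarse extrinsic calibration supplies the initial transform.
source = o3d.io.read_point_cloud("camera_a.ply")  # hypothetical files
target = o3d.io.read_point_cloud("camera_b.ply")

init = np.eye(4)  # coarse alignment from extrinsic calibration
result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.02,  # 2 cm search radius
    init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print("fitness:", result.fitness)

source.transform(result.transformation)  # apply refined pose in place
merged = source + target                 # fuse both views into one cloud
```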

How do you see the future of datasets research?

Open datasets are an essential part of science since they allow for comparison and reproducibility. The major problem is that creating datasets is difficult and expensive, requiring a big investment from research groups. In order to ensure that relevant datasets keep on being created, we need a push including: scientific venues for the publication and discussion of datasets (like the dataset track at the Multimedia Systems conference, which started more than a decade ago), investment from funding agencies and organizations identifying the datasets that the community will need in the future, and collaboration between labs to share the effort.

What are your future plans for your research?

We are very happy with the first version of the dataset since it provides a good starting point and was a source of learning. Still, there is room for improvement, so now that we have a full capturing system (together with the auxiliary tools), we would like to extend the dataset and refine the tools. The community still needs more datasets of volumetric video to further advance the research on alignment, post-processing, compression, delivery, and rendering. Apart from the dataset, the Distributed and Interactive Systems group (https://www.dis.cwi.nl) at CWI is working on volumetric video conferencing, developing a Social VR pipeline that enables users to communicate and interact more naturally. Recently, we deployed a solution for visiting museums remotely together with friends and family members (https://youtu.be/zzB7B6EAU9c), and next October we will start two EU-funded projects on this topic.


Pierre-Etienne Martin, Postdoctoral Researcher & Tech Development Coordinator, Max Planck Institute for Evolutionary Anthropology, Department of Comparative Cultural Psychology, Leipzig, Germany

Short Bio: Pierre-Etienne Martin is currently a Postdoctoral researcher at the Max Planck Institute. He received his M.S. degree in 2017 from the University of Bordeaux, the Pázmány Péter Catholic University, and the Autonomous University of Madrid via the Image Processing and Computer Vision Erasmus Master program. He obtained his PhD, labelled European, from the University of Bordeaux in 2020, supervised by Jenny Benois-Pineau and Renaud Péteri, on the topic of video detection and classification by means of Convolutional Neural Networks. His current research interests include, among others, Artificial Intelligence, Machine Learning, and Computer Vision.

Could you provide a small summary of your contribution to the dataset research?

In 2017, I started my PhD thesis, which focuses on movement analysis in sports. The aim of this research project, called CRISP (ComputeR vIsion for Sports Performance), is to improve the training experience of athletes. Our team decided to focus on table tennis, and it is through the collaboration with the Sports Faculty of the University of Bordeaux, STAPS, that our first contribution came to be: the TTStroke-21 dataset [3]. This dataset gathers recordings of table tennis games at high resolution and 120 frames per second. The players and annotators are both from the STAPS. The annotation platform was designed by students from the LaBRI – University of Bordeaux and the MIA laboratory of the University of La Rochelle. Coordination for recording the videos and doing the annotation was performed by my supervisors and myself.

Since 2019, TTStroke-21 has been used to propose the Sports Task at the Multimedia Evaluation benchmark – MediaEval [4]. The goal of the task is to segment and classify table tennis strokes from videos.

TTStroke-21 sample images

Since 2021, I have been at the MPI EVA institute, where I focus on elaborating datasets for the Comparative Cultural Psychology (CCP) department. The data we are working on focuses on great apes and children, which we aim to segment, identify, and track.

Why did you get interested in datasets research?

Dataset research is the field that makes the application of computer vision tools possible. In order to widen the range of applications, datasets with quality ground truth need to be offered by the scientific community. Only then can models be developed to solve the problem raised by the dataset and finally be offered to the community. This has been the goal of the interdisciplinary CRISP project: improving athlete performance through the collaboration of the sports and computer science communities.

It is also the aim of collaborative projects, such as MMLAB [5], which gathers many models and implementations trained on various datasets, in order to ease reproducibility, performance comparison and inference for applications.

What is the most challenging aspect of datasets research?

From my experience organizing the Sports task at the MediaEval workshop, the most challenging aspect of dataset research is being able to provide quality data, from acquisition to annotation, and the tools to process them: use, demonstration, and evaluation. That is why, alongside our task, we also provide a baseline which covers most of these aspects.

How do you see the future of datasets research?

I hope dataset research will converge toward a general scheme for the annotation and evaluation of datasets. I hope different datasets can be used together for training multi-task models, giving the opportunity to share knowledge and features specific to each type of dataset. Finally, quantity has been a major criterion in dataset research, but quality should be given more consideration in order to improve state-of-the-art performance while keeping research sustainable.

What are your future plans for your research?

Within the CCP department at MPI, I hope to be able to build different types of datasets that put what has been implemented in the computer vision field to best use in psychology.

Relevant references:

  1. CWIPC-SXR dataset: https://www.dis.cwi.nl/cwipc-sxr-dataset/
  2. I. Reimat, et al., “CWIPC-SXR: Point cloud dynamic human dataset for Social XR,” in Proceedings of the 12th ACM Multimedia Systems Conference (MMSys ’21), Association for Computing Machinery, New York, NY, USA, 300–306. https://doi.org/10.1145/3458305.3478452
  3. TTStroke-21: https://link.springer.com/article/10.1007/s11042-020-08917-3
  4. Media-Eval: http://www.multimediaeval.org/
  5. Open-MMLab: https://openmmlab.com/