Dataset Column: Report from the MMM 2019 Special Session on Multimedia Datasets for Repeatable Experimentation (MDRE 2019)

Special Session

Information retrieval and multimedia content access have a long history of comparative evaluation, and many of the advances in the area over the past decade can be attributed to the availability of open datasets that support comparative and repeatable experimentation. Sharing data and code to allow other researchers to replicate research results is needed in the multimedia modeling field, as it helps to improve the performance of systems and the reproducibility of published papers.

This report summarizes the special session on Multimedia Datasets for Repeatable Experimentation (MDRE 2019), organized at the 25th International Conference on MultiMedia Modeling (MMM 2019), held in January 2019 in Thessaloniki, Greece.

The intent of these special sessions is to provide a venue for releasing datasets to the multimedia community and for discussing dataset-related issues. In 2019, the session consisted of short presentations (8 minutes each) followed by questions, and a panel discussion after all the presentations, moderated by Björn Þór Jónsson. In the following, we summarize the special session, including its talks, questions, and discussions.

The special session presenters: Luca Rossetto, Cathal Gurrin and Minh-Son Dao.

Presentations

A Test Collection for Interactive Lifelog Retrieval

The session started with a presentation on A Test Collection for Interactive Lifelog Retrieval [1], given by Cathal Gurrin from Dublin City University (Ireland). The authors introduced a new test collection for interactive lifelog retrieval, consisting of multimodal data from 27 days: nearly 42 thousand images and other personal data (health and activity data; more specifically, heart rate, galvanic skin response, calorie burn, steps, blood pressure, blood glucose levels, human activity, and diet log). The authors argued that, although other lifelog datasets already exist, theirs is unique in its multimodal character and has a reasonable, easily manageable size of 27 consecutive days. Hence, it can also be used for interactive search and provides newcomers with an easy entry into the field. The dataset has already been used for the Lifelog Search Challenge (LSC) [5] in 2018, an annual competition held at the ACM International Conference on Multimedia Retrieval (ICMR).

The discussion of this work started with a question about future plans for the dataset and whether it should be extended over the years, e.g., to make the LSC more challenging. The difficulty with public lifelog datasets, however, is the conflict between releasing more content and safeguarding privacy. The contained images must be anonymized (e.g., by blurring faces and license plates), which the rules and requirements of the EU General Data Protection Regulation (GDPR) make especially important. Unfortunately, anonymizing content is a very slow process. An alternative to removing or masking content for privacy reasons would be to create artificial datasets (e.g., containing public images or only faces of people who consent to publication), but this would likely also be a non-trivial task. One interesting option could be the use of Generative Adversarial Networks (GANs) for face anonymization, for instance by replacing all faces appearing in the content with generated faces learned from a small group of people who gave their consent. Another way to preemptively mitigate privacy issues could be to wear conspicuous ‘lifelogging stickers’ during recording to make people aware of the camera, giving them the possibility to object to being filmed or to avoid being captured altogether.

SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives

The second presentation was given by Minh-Son Dao from the National Institute of Information and Communications Technology (NICT) in Japan, about SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives [2]. This dataset aims to combine environmental conditions with health-related aspects (e.g., pollution or weather data with cardio-respiratory or psychophysiological data). Its creation was motivated by the fact that people in larger Japanese cities often do not want to go outside (e.g., for sports activities) because they are concerned about pollution and its effect on their health. It would therefore be beneficial to have a map of the city annotated with pollution ratings, or a system that allows users to run related queries. The dataset contains sensor data collected along routes by a few dozen volunteers over seven days in Fukuoka, Japan. More specifically, the volunteers collected data about location, O3, NO2, PM2.5 (fine particulate matter), temperature, and humidity, in combination with heart rate, motion behavior (from a 3-axis accelerometer), relaxation level, and other personal perception data from questionnaires.

This dataset has also been used in multimedia benchmark challenges, such as the Lifelogging for Wellbeing task at MediaEval. To define the ground truth, volunteers were presented with specific use cases and annotation rules and were asked to annotate the dataset collaboratively. The collected data (the feelings of participants at different locations) was also visualized on an interactive map. Although the dataset may contain some inconsistent annotations, these are easy to filter out, since the labels of the corresponding annotators and annotator groups are included in the dataset as well.

V3C – a Research Video Collection

The third presentation was given by Luca Rossetto from the University of Basel (Switzerland) about V3C – a Research Video Collection [3]. V3C is a large-scale dataset for multimedia retrieval, consisting of nearly 30,000 videos with a total duration of about 3,800 hours. Although many other video datasets are already available (e.g., IACC.3 [6] or YFCC100M [8]), V3C is unique in its timeliness (its content is more recent than that of many other datasets and therefore more representative of current ‘videos in the wild’) and its diversity (it covers many different genres and use cases), while also being free of copyright restrictions (all contained videos were labelled with a Creative Commons license by their uploaders). The videos were collected from the video sharing platform Vimeo (hence the name ‘Vimeo Creative Commons Collection’, or V3C for short) and are representative of the video data currently found on video sharing platforms. The dataset comes with a master shot-boundary detection ground truth, as well as keyframes and additional metadata. It is partitioned into three major parts (V3C1, V3C2, and V3C3) to make it more manageable, and it will be used by the TRECVID and Video Browser Showdown (VBS) [7] evaluation campaigns for several years. Although the dataset was not built exclusively for retrieval, it is suitable for any use case that requires a large video dataset.

The shot-boundary detection that provides the master-shot reference for V3C was implemented with Cineast, an open-source system available for download. It divides every frame into a 3×3 grid and computes a color histogram for each of the 9 regions; these histograms are concatenated into a ‘regional color histogram’ feature vector, which is compared between all adjacent frames. This approach seems to work very well for hard cuts and gradual transitions, although it is not very stable for grayscale content (or flashes, etc.). The additional metadata provided with the dataset includes resolution, frame rate, uploading user, and upload date, as well as any semantic information provided by the uploader (title, description, tags, etc.).
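To make the comparison step concrete, here is a minimal Python sketch of a regional color histogram cut detector. It assumes RGB frames given as nested lists of (R, G, B) tuples, 4 quantization levels per channel, an L1 distance averaged over the 9 cells, and a fixed threshold of 0.5. These parameters and the function names are illustrative assumptions, not Cineast's actual implementation or settings. Because it is a plain adjacent-frame threshold, it only flags abrupt changes and does not model gradual transitions.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each in 0..255
Frame = Sequence[Sequence[Pixel]]     # rows of pixels


def regional_color_histogram(frame: Frame, grid: int = 3, bins: int = 4) -> List[float]:
    """Concatenate one normalized RGB histogram per cell of a grid x grid layout."""
    h, w = len(frame), len(frame[0])
    feature: List[float] = []
    for gi in range(grid):
        for gj in range(grid):
            hist = [0.0] * (bins ** 3)
            for y in range(gi * h // grid, (gi + 1) * h // grid):
                for x in range(gj * w // grid, (gj + 1) * w // grid):
                    r, g, b = frame[y][x]
                    # Quantize each channel to `bins` levels and form a joint RGB bin index.
                    idx = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
                    hist[idx] += 1
            total = sum(hist) or 1.0
            feature.extend(v / total for v in hist)  # each cell's histogram sums to 1
    return feature


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5, grid: int = 3) -> List[int]:
    """Return indices i where frame i differs strongly from frame i - 1.

    The threshold (and `bins` above) are illustrative assumptions, not Cineast's settings.
    """
    cuts: List[int] = []
    prev: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        feat = regional_color_histogram(frame, grid)
        if prev is not None:
            # Per-cell L1 distance lies in [0, 2]; averaging over cells keeps it in [0, 2].
            dist = sum(abs(a - b) for a, b in zip(feat, prev)) / (grid * grid)
            if dist > threshold:
                cuts.append(i)
        prev = feat
    return cuts
```

For example, with two all-black frames followed by two all-white frames, `detect_cuts` reports a single boundary at index 2. In practice, the threshold would likely need tuning on annotated data, and flashes or grayscale content can produce exactly the instabilities mentioned above.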

Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition

A fourth presentation had been scheduled on Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition [4], but unfortunately no author was on site to give it. This dataset contains 30-second audio samples (as well as extracted features and ground truth) from a metropolitan city (Athens, Greece), recorded over a period of about four years by 10 different people, with the aim of providing a collection of city sounds. The metadata includes geospatial coordinates, timestamp, rating, and tagging of the sound by the person who recorded it. In a baseline evaluation, the authors demonstrated that their dataset allows the soundscape quality in the city to be predicted with about 42% accuracy.

Discussion

After the presentations, Björn Þór Jónsson moderated a panel discussion in which all presenters participated.

The panel started with a discussion on dataset size: is continually increasing dataset size the only way to make challenges more difficult, or are there alternatives? Although this depends heavily on the research question at hand, it was generally agreed that there is a definite need for evaluation on large datasets, because some problems are trivial on small datasets. Moreover, datasets that are too small often introduce some kind of content bias, so that they do not fully reflect the practical situation.

For now, there seems to be no real alternative to using larger datasets, although this will clearly introduce additional challenges for data management and data processing. All presenters (and the audience too) agreed that larger datasets will also necessitate closer collaboration with other research communities, such as data science, data management and engineering, and distributed and high-performance computing, in order to manage the higher data load.

However, even though we need larger datasets, we might not yet be ready for true large scale. For example, the V3C dataset is still far from a true web-scale video search dataset; it was originally intended to be even bigger, but the TRECVID and VBS communities raised concerns about its manageability. Datasets that are too large would raise the entry barrier for newcomers so high that an evaluation benchmark might not attract enough participants. This problem could disappear in a few years (as hardware becomes cheaper, faster, and larger), but it still needs to be addressed from an organizational viewpoint.

There were comments from the audience that, instead of focusing on size alone, we should also consider the problem we want to solve. Many researchers appear to use datasets for use cases for which they were neither designed nor suited. Instead of blindly aiming for larger size, datasets could be kept small and simple for solving essential research questions, for example by truly optimizing them for the problem at hand; different evaluations would then use different datasets. However, this would lead to considerable dataset fragmentation and necessitate combining several datasets for broader or larger evaluation tasks, which has proven quite challenging in the past. For example, many health datasets are already available, and it would be interesting to take advantage of them, but the workload of integrating them into competitions is often too high in practice.

Another issue that the research community should address more intensively is how to handle personal datasets in compliance with the GDPR, since currently nobody really knows how to deal with this.

Acknowledgments

The session was organized by the authors of the report, in collaboration with Duc-Tien Dang-Nguyen (Dublin City University), Michael Riegler (Center for Digitalisation and Engineering & University of Oslo), and Luca Piras (University of Cagliari). The panel format of the special session made the discussions much more lively and interactive than those of a traditional technical session. We would like to thank the presenters and their co-authors for their excellent contributions, as well as the members of the audience who contributed greatly to the session.

References

[1] Gurrin, C., Schoeffmann, K., Joho, H., Münzer, B., Albatal, R., Hopfgartner, F., … & Dang-Nguyen, D. T. (2019, January). A test collection for interactive lifelog retrieval. In International Conference on Multimedia Modeling (pp. 312-324). Springer, Cham.
[2] Sato, T., Dao, M. S., Kuribayashi, K., & Zettsu, K. (2019, January). SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives. In International Conference on Multimedia Modeling (pp. 325-337). Springer, Cham.
[3] Rossetto, L., Schuldt, H., Awad, G., & Butt, A. A. (2019, January). V3C–A Research Video Collection. In International Conference on Multimedia Modeling (pp. 349-360). Springer, Cham.
[4] Giannakopoulos, T., Orfanidi, M., & Perantonis, S. (2019, January). Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition. In International Conference on Multimedia Modeling (pp. 338-348). Springer, Cham.
[5] Dang-Nguyen, D. T., Schoeffmann, K., & Hurst, W. (2018, June). LSE2018 Panel-Challenges of Lifelog Search and Access. In Proceedings of the 2018 ACM Workshop on The Lifelog Search Challenge (pp. 1-2). ACM.
[6] Awad, G., Butt, A., Curtis, K., Lee, Y., Fiscus, J., Godil, A., … & Kraaij, W. (2018, November). TRECVID 2018: Benchmarking video activity detection, video captioning and matching, video storytelling linking and video search.
[7] Lokoč, J., Kovalčík, G., Münzer, B., Schöffmann, K., Bailer, W., Gasser, R., … & Barthel, K. U. (2019). Interactive search or sequential browsing? A detailed analysis of the video browser showdown 2018. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(1), 29.
[8] Kalkowski, S., Schulze, C., Dengel, A., & Borth, D. (2015, October). Real-time analysis and visualization of the YFCC100M dataset. In Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions (pp. 25-30). ACM.

Introducing the new role of the Director of Diversity and Outreach


Over the last few decades, SIGMM has grown in the number and size of the conferences and workshops we organize and sponsor, as well as in our international outreach. Researchers from all over the world now participate in SIGMM and its many activities. As we grow internationally, in terms of members, conference participants, and their different backgrounds, the diversity of SIGMM is also growing. However, diversity, with all the aspects it brings to a society, does not simply happen; it needs to be supported and embraced through a cultural change of the organization and all its members.

Introducing the new role of SIGMM Director of Diversity and Outreach

In 2019, SIGMM created the new role of SIGMM Director of Diversity and Outreach, with a variety of roles and responsibilities, for an initial 3-year period. Creating this position is both a signal and an action: it lays the groundwork for future activities and is an invitation, on a more formal level, to move our work in this area beyond anecdotal activities and personal engagement. The Director of Diversity and Outreach is a voting member of the SIGMM Executive Committee (EC). The EC Chair drafted and circulated a role specification and sent a call to the community for expressions of interest in the role in Spring 2019. The appointment was confirmed by the EC in May 2019. For the inaugural term 2019-2021, Susanne Boll was elected unanimously to this role. With this new director, SIGMM is supporting and developing diversity at an institutional level.

First Initiative

As a first initiative, the SIGMM EC has adopted a “25 in 25” strategy to increase the participation of women in SIGMM and all its activities. This strategy aims to raise the participation of women in all activities and committees of SIGMM to at least 25% by 2025.

Female participation in SIGMM has been low for many years. Even though there were good initiatives over the last decades, we have failed to include a proportionate number of women researchers in the SIG, in our executive structures, and in event organization. Given that about 25% of all computer science degrees are held by women, we may well expect to find this share reflected in the number of women active within ACM's Special Interest Groups, which is not the case in SIGMM. We strongly believe that this will only change if we as SIGMM take action. This action will take place on three levels.

With the SIGMM Executive Actions, we aim at the obligatory inclusion of women in the steering committees of SIGMM. For the coming elections in 2021, we will implement a voting scheme by which the two leading positions, SIGMM Chair and SIGMM Vice Chair, will be filled by a man and a woman. For the forthcoming SIGMM officer elections, SIGMM will also put forward two candidates, one man and one woman, for the other roles, to ensure gender equality at the level of the different roles.

With the SIGMM Conference Steering Actions, the individual Steering Committees will invite female candidates for all forthcoming appointments in order to reach a share of at least 25% of their membership. All Steering Committees will publish their members online and maintain an online history of their SC membership and of the different positions on the organizing committees of their related conferences.

With the SIGMM Conference Actions, we request that all SIGMM-sponsored conferences have at least 25% representation of women across all roles of their organizing committees, which will be monitored for all forthcoming conference bids. We aim for organizing committees in which the many volunteer roles of our conferences, such as general chair, workshop chair, tutorial chair, panel chair, web chair, local chair, or proceedings chair, are filled by two individuals, one woman and one man.

The SIGMM Director of Diversity and Outreach will monitor the implementation of these rules, report annually on their state and progress within the EC and at the annual SIGMM business meeting at ACM Multimedia, and publish a report in SIGMM Records.

What’s next?

The creation of the role of SIGMM Director of Diversity and Outreach was a first step. The “25 in 25” initiative is the first set of actions, and further initiatives will follow. We are already discussing actions across SIGMM events, such as travel support, childcare, mentoring support, support for speakers, and targeted meetings. We will keep you informed through our newsletter, website, and meetings.

SIGMM understands the new role as actively pushing and developing diversity and outreach within SIGMM. The new director is here to listen and to act for greater diversity in our Special Interest Group, our activities, and our outreach to the multimedia community. All SIGMM members are strongly invited to support the activities of the Director of Diversity and Outreach and the different initiatives. The director will also seek active exchange with, and learn from, other Special Interest Groups within ACM and other societies. If you want to get involved, please join us (contact: Susanne Boll, boll@acm.org).

Dataset Column: Datasets for Online Multimedia Verification

Introduction

Online disinformation is a problem that has been attracting increased interest from researchers worldwide as the breadth and magnitude of its impact is progressively manifested and documented in a number of studies (Boididou et al., 2014; Zhou & Zafarani, 2018; Zubiaga et al., 2018). This emerging area of research is inherently multidisciplinary and there have been numerous treatments of the subject, each having a distinct perspective or theme, ranging from the predominant perspectives of media, journalism and communications (Wardle & Derakhshan, 2017) and political science (Allcott & Gentzkow, 2017) to those of network science (Lazer et al., 2018), natural language processing (Rubin et al., 2015) and signal processing, including media forensics (Zampoglou et al., 2017). Given the multimodal nature of the problem, it is no surprise that the multimedia community has taken a strong interest in the field.

From a multimedia perspective, two research problems have attracted the bulk of researchers’ attention: a) detection of content tampering and content fabrication, and b) detection of content misuse for disinformation. The first has traditionally been studied within the field of media forensics (Rocha et al., 2011), but has recently come under the spotlight with the rise of deepfake videos (Güera & Delp, 2018), i.e. videos produced by a special class of generative models capable of synthesizing highly convincing media content from scratch or from authentic seed content. The second problem concerns multimedia misuse or misappropriation, i.e. the use of media content out of its original context with the goal of spreading misinformation or false narratives (Tandoc et al., 2018).

Developing automated approaches to detect media-based disinformation relies to a great extent on the availability of relevant datasets, both for training supervised learning models and for evaluating their effectiveness. Yet developing and releasing such datasets is a challenge in itself, for a number of reasons:

  1. Identifying, curating, understanding, and annotating cases of media-based misinformation is a very effort-intensive task. More often than not, the annotation process requires careful and extensive reading of pertinent news coverage from a variety of sources similar to the journalistic practice of verification (Brandtzaeg et al., 2016).
  2. Media-based disinformation largely manifests itself on social media platforms, so relevant datasets are hard to collect and distribute. This is due to the temporary nature of social media content, the numerous technical restrictions and challenges involved in collecting content (mostly due to limited or no appropriate support in the respective APIs), and the legal and ethical issues in releasing social media-based datasets (due to the need to comply with the respective Terms of Service and any applicable data protection law).

In this column, we present two multimedia datasets that could be of value to researchers who study media-based disinformation and develop automated approaches to tackle the problem. The first, called the Fake Video Corpus (Papadopoulou et al., 2019), is a manually curated collection of 200 debunked and 180 verified videos, along with relevant annotations, accompanied by a set of 5,193 near-duplicate instances of them that were posted on popular social media platforms. The second, called FIVR-200K (Kordopatis-Zilos et al., 2019), is an automatically collected dataset of 225,960 videos, a list of 100 video queries, and manually verified annotations regarding the relation (if any) of the dataset videos to each of the queries (i.e. near-duplicate, complementary scene, same incident).

For each of the two datasets, we present the design and creation process, focusing on issues and questions regarding the relevance of the collected content, the technical means of collection, and the process of annotation, which had the dual goal of ensuring high accuracy and keeping the manual annotation cost manageable. Since each dataset is accompanied by a detailed journal article, in this column we limit our description to high-level information, emphasizing the utility and creation process in each case rather than detailed statistics, which are disclosed in the respective papers.

Following the presentation of the two datasets, we then proceed to a critical discussion, highlighting their limitations and some caveats, and delineating future steps towards high quality dataset creation for the field of multimedia-based misinformation.

Related Datasets

The complexity and challenge of the multimedia verification problem has led to the creation of numerous datasets and benchmarking efforts, each designed specifically for a particular task within this area. We can broadly classify these efforts into three areas: a) multimedia forensics, b) multimedia retrieval, and c) multimedia post classification. Datasets that focus on the text modality, e.g. Fake News Challenge, Clickbait Challenge, Hyperpartisan News Detection, RumourEval (Derczynski et al., 2017), etc., are beyond the scope of this column and are hence not included in this discussion.

Multimedia forensics: Generating high-quality multimedia forensics datasets has always been a challenge, since creating convincing forgeries is normally a manual task requiring a fair amount of skill; as a result, such datasets have generally been few and limited in scale. With respect to image splicing, our own survey (Zampoglou et al., 2017) listed a number of datasets that had been made available by that time, including our own Wild Web tampered image dataset, which consists of real-world forgeries collected from the Web, including multiple near-duplicates, making it a large and particularly challenging collection. Recently, the Realistic Tampering Dataset (Korus et al., 2017) was proposed, offering a large number of convincing forgeries for evaluation. Copy-move image forgeries, on the other hand, pose a different problem that requires specially designed datasets. Three commonly used datasets of this kind are those produced by MICC (Amerini et al., 2011), the Image Manipulation Dataset by Christlein et al. (2012), and CoMoFoD (Tralic et al., 2013). These datasets are still actively used in research.

With respect to video tampering, there has been a relative scarcity of high-quality large-scale datasets, which is understandable given the difficulty of creating convincing forgeries. The recently proposed Multimedia Forensics Challenge datasets include some large-scale sets of tampered images and videos for the evaluation of forensics algorithms. Finally, there has recently been increased interest in the automatic detection of forgeries made with the assistance of particular software, specifically face-swapping software. As the quality of face-swaps is constantly improving, detecting them is an important emerging verification task. The FaceForensics++ dataset (Rössler et al., 2019) is a very large-scale dataset containing face-swapped videos produced by a number of different algorithms (as well as untampered face videos), aimed at the evaluation of face-swap detection algorithms.

Multimedia retrieval: Several cases of multimedia verification can be considered instances of a near-duplicate retrieval task, in which the query video (the video to be verified) is run against a database of past cases/videos to check whether it has appeared before. The most popular publicly available dataset for near-duplicate video retrieval is arguably the CC_WEB_VIDEO dataset (Wu et al., 2007). It consists of 12,790 user-generated videos collected from popular video sharing websites (YouTube, Google Video, and Yahoo! Video). It is organized in 24 query sets; for each set, the most popular video was selected to serve as the query, and the remaining videos were manually annotated according to whether they were duplicates of the query. Another relevant dataset is VCDB (Jiang et al., 2014), which was compiled and annotated as a benchmark for the partial video copy detection problem and is composed of videos from popular video platforms (YouTube and Metacafe). VCDB contains two subsets of videos: a) the core, which consists of 28 discrete sets with a total of 528 videos and over 9,000 pairs of manually annotated partial copies, and b) the distractors, which consist of 100,000 videos intended to make the video copy detection problem more challenging.
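The query-against-database view of verification can be illustrated with a minimal Python sketch that ranks past videos by cosine similarity to the query and keeps those above a threshold. It assumes that each video has already been summarized by a fixed-length descriptor (how such descriptors are extracted is outside this sketch), and the function names and the 0.9 threshold are hypothetical choices; this is not the method used to build or evaluate CC_WEB_VIDEO or VCDB.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_near_duplicates(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off for "near-duplicate"
) -> List[Tuple[str, float]]:
    """Rank database videos by similarity to the query and keep those above the threshold."""
    scored = [(video_id, cosine(query, feat)) for video_id, feat in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(video_id, score) for video_id, score in scored if score >= threshold]
```

With a toy database `{"past_case_a": [1.0, 0.0, 0.0], "past_case_b": [0.9, 0.1, 0.0], "unrelated": [0.0, 1.0, 0.0]}` and the query `[1.0, 0.0, 0.0]`, the sketch returns `past_case_a` and `past_case_b`, in that order; a non-empty result would suggest that the video has circulated before.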

Multimedia post classification: A benchmark task named “Verifying Multimedia Use” (Boididou et al., 2015; Boididou et al., 2016) was organized in the context of MediaEval 2015 and 2016. The task made available a dataset of 15,629 tweets containing images and videos, each of which made a false or factual claim with respect to the shared image/video. The released tweets were posted in the context of breaking news events (e.g. Hurricane Sandy, Boston Marathon bombings) or hoaxes.

Video Verification Datasets

The Fake Video Corpus (FVC)

The Fake Video Corpus (Papadopoulou et al., 2018) is a collection of 380 user-generated videos and 5,193 near-duplicate versions of them, all collected from three online video platforms: YouTube, Facebook, and Twitter. The videos are annotated either as “verified” (“real”) or as “debunked” (“fake”), depending on whether the information they convey is accurate or misleading. Verified videos are typically user-generated takes of newsworthy events, while debunked videos include various types of misinformation, including staged content posing as UGC, real content taken out of context, or modified/tampered content (see Figure 1 for examples). The near-duplicates of each video are arranged in temporally ordered “cascades”, and each near-duplicate video is annotated with respect to its relation to the first video of the cascade (e.g. whether it reinforces or debunks the original claim). To our knowledge, the FVC is the first large-scale dataset of debunked and verified user-generated videos (UGVs). The dataset contains different kinds of metadata for its videos, including channel (user) information, video information, and community reactions (number of likes, shares, and comments) at the time of their inclusion.

  
  
Figure 1. A selection of real (top row) and fake (bottom row) videos from the Fake Video Corpus.

The initial set of 380 videos was collected and annotated using various sources, including the Context Aggregation and Analysis (CAA) service developed within the InVID project and fact-checking sites such as Snopes. To build the dataset, all videos submitted to the CAA service between November 2017 and January 2018 were gathered into an initial pool of approximately 1,600 videos, which were then manually inspected and filtered. The remaining videos were annotated as “verified” or “debunked” using established third-party sources (news articles or blog posts), leading to a final pool of 180 verified and 200 fake unique videos. Then, keyword-based search was run on the three platforms, and near-duplicate video detection was used to identify duplicates of each video within the returned results. More specifically, for each of the 380 videos, its title was reformulated in a more general form and translated into four major languages: Russian, Arabic, French, and German. The original title, the general form, and the translations were submitted as queries to YouTube, Facebook, and Twitter. Then, the near-duplicate retrieval algorithm of Kordopatis-Zilos et al. (2017) was applied to the resulting pool, and the results were manually inspected to remove erroneous matches.
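The query expansion step described above can be sketched as a small Python function that pairs every distinct query string with every platform. The generalized title and the translations are taken as inputs, because the text does not say how they were produced; the function name, parameters, and example strings are hypothetical, and no platform search API is called.

```python
from itertools import product
from typing import List, Sequence, Tuple

PLATFORMS = ("YouTube", "Facebook", "Twitter")


def build_search_plan(
    original_title: str,
    general_title: str,
    translations: Sequence[str],
    platforms: Sequence[str] = PLATFORMS,
) -> List[Tuple[str, str]]:
    """Pair every distinct query string with every platform to be searched (hypothetical helper)."""
    queries: List[str] = []
    for text in (original_title, general_title, *translations):
        text = text.strip()
        if text and text not in queries:  # skip empty strings and exact repeats
            queries.append(text)
    # Platforms form the outer loop, so all queries for YouTube come first.
    return list(product(platforms, queries))
```

With an original title, a generalized title, and four translations, the plan contains 3 × 6 = 18 (platform, query) pairs per video. Removing exact repeats simply avoids submitting the same string twice when the generalized form happens to equal the original title.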

The purpose of the dataset is twofold: i) to support the analysis of the dissemination patterns of real and fake user-generated videos (by analyzing the traits of the near-duplicate video cascades), and ii) to serve as a benchmark for the evaluation of automated video verification methods. The relatively large size of the dataset is important for both tasks. With respect to dissemination patterns, the dataset makes it possible to study the spread of the same or similar content by analyzing associations between videos that the original platform APIs do not provide, combined with the wealth of associated metadata. In parallel, the collection of 5,573 annotated “verified” or “debunked” videos (even if many are near-duplicate versions of the 380 cases) can be used for the evaluation (or even training) of verification systems based on either the visual content or the associated video metadata.

The Fine-grained Incident Video Retrieval Dataset (FIVR-200K)

The FIVR-200K dataset (Kordopatis-Zilos et al., 2019) consists of 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries (see Figure 2 for examples). It has been designed to simulate the problem of Fine-grained Incident Video Retrieval (FIVR). The objective of this problem is: given a query video, retrieve all associated videos considering several types of associations with respect to an incident of interest. FIVR contains several retrieval tasks as special cases under a single framework. In particular, we consider three types of association between videos: a) Duplicate Scene Videos (DSV), which share at least one scene (originating from the same camera) regardless of any applied transformation, b) Complementary Scene Videos (CSV), which contain part of the same spatiotemporal segment, but captured from different viewpoints, and c) Incident Scene Videos (ISV), which capture the same incident, i.e. they are spatially and temporally close, but have no overlap.

For the collection of the dataset, we first crawled Wikipedia’s Current Events page to collect a large number of major news events that occurred between 2013 and 2017 (five years). Each news event is accompanied by a topic, headline, text, date, and hyperlinks. To collect videos of the same category, we retained only news events with the topic “Armed conflicts and attacks” or “Disasters and accidents”, which left a total of 4,687 events after filtering. To gather videos around these events and build a large collection with numerous video pairs that are associated through the relations of interest (DSV, CSV and ISV), we queried the public YouTube API with the event headlines. To ensure that the collected videos capture the corresponding event, we retained only videos published within one week from the event date. This process resulted in the collection of 225,960 videos.
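
The two filters above (topic, then publication window) are simple enough to state as code. The sketch below is illustrative only: the event and search-result structures are hypothetical, and we assume the one-week window is counted forward from the event date, which the description does not pin down.

```python
from datetime import date, timedelta

KEPT_TOPICS = {"Armed conflicts and attacks", "Disasters and accidents"}
WINDOW = timedelta(days=7)


def filter_events(events):
    """Keep only events whose topic is one of the two retained categories."""
    return [e for e in events if e["topic"] in KEPT_TOPICS]


def within_window(event_date, published, window=WINDOW):
    """True if a video was published within `window` after the event date.

    Assumption: the one-week span is counted forward from the event date.
    """
    return timedelta(0) <= published - event_date <= window


def collect(event, search_results):
    """Filter one event's search results (video id -> publish date) by date."""
    return [vid for vid, pub in search_results.items()
            if within_window(event["date"], pub)]
```

Counting the window in both directions instead would only change the lower bound in `within_window`.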

  
Figure 2. A selection of query videos from the Fine-grained Incident Video Retrieval dataset.

Next, we proceeded with the selection of query videos. We set up an automated filtering and ranking process that implemented the following criteria: a) query videos should be relatively short and ideally focus on a single scene, b) queries should have many near-duplicates or same-incident videos within the dataset that are published by many different uploaders, and c) among a set of near-duplicate/same-incident videos, the one that was uploaded first should be selected as the query. This selection process was implemented with a graph-based clustering approach and resulted in the selection of 635 query videos, of which we used the top 100 (ranked by corresponding cluster size) as the final query set.
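
One way to read these selection criteria as code is sketched below. It is an assumption-laden illustration rather than the actual implementation: the unspecified graph-based clustering is stood in for by connected components (union-find) over a hypothetical list of near-duplicate/same-incident edges, criterion a) is modelled by duration only, and the thresholds `max_duration` and `min_uploaders` are made up.

```python
from collections import defaultdict


def clusters(video_ids, edges):
    """Connected components of the similarity graph, via union-find."""
    parent = {v: v for v in video_ids}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(list)
    for v in video_ids:
        groups[find(v)].append(v)
    return list(groups.values())


def select_queries(meta, edges, k=100, max_duration=300, min_uploaders=3):
    """Rank clusters by size and pick the earliest upload of each as query.

    `meta` maps video id -> dict(uploaded=..., uploader=..., duration=...).
    Thresholds are hypothetical stand-ins for the published criteria.
    """
    picks = []
    for group in clusters(list(meta), edges):
        if len({meta[v]["uploader"] for v in group}) < min_uploaders:
            continue  # criterion b): many different uploaders
        first = min(group, key=lambda v: meta[v]["uploaded"])  # criterion c)
        if meta[first]["duration"] > max_duration:
            continue  # criterion a), approximated by duration only
        picks.append((len(group), first))
    picks.sort(key=lambda p: -p[0])  # larger clusters first
    return [vid for _, vid in picks[:k]]
```

With `k=100`, the ranking step mirrors the cut from the candidate queries down to the final query set.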

For the annotation of similarity relations among videos, we followed a multi-step process, in which we presented annotators with the results of a similarity-based video retrieval system and asked them to indicate the type of relation through a drop-down list of the following labels: a) Near-Duplicate (ND), a special case where the whole video is near-duplicate to the query video, b) Duplicate Scene (DS), where only some scenes in the candidate video are near-duplicates of scenes in the query video, c) Complementary Scenes (CS), d) Incident Scene (IS), and e) Distractors (DI), i.e. irrelevant videos.
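
These labels can be turned into per-task relevance judgments and scored with average precision (AP). The sketch below assumes a nested scheme in which the duplicate-scene task counts ND and DS as relevant, the complementary-scene task adds CS, and the incident-scene task adds IS. This mapping and the task keys (`DSVR`, `CSVR`, `ISVR`) are our assumption, believed to match the FIVR evaluation setup but to be checked against Kordopatis-Zilos et al. (2019).

```python
# Assumed nested relevance: each task also accepts the labels of stricter tasks.
RELEVANT = {
    "DSVR": {"ND", "DS"},
    "CSVR": {"ND", "DS", "CS"},
    "ISVR": {"ND", "DS", "CS", "IS"},
}


def average_precision(ranking, labels, task):
    """AP of a ranked list of video ids for one query and one task.

    `labels` maps video id -> annotation label (ND, DS, CS, IS or DI);
    unlabelled videos count as irrelevant.
    """
    relevant = {v for v, lab in labels.items() if lab in RELEVANT[task]}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, vid in enumerate(ranking, start=1):
        if vid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)
```

Averaging this value over all queries gives a mean AP per task, so a single ranking can be scored under all three association types at once.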

To make sure that annotators were presented with as many potentially relevant videos as possible, we used visual-only, text-only and hybrid similarity in turn. As a result, each annotator reviewed video candidates that had very high similarity with the query video in terms of either their visual content, their text metadata (title and description), or the combination of both. Once an initial set of annotations had been produced by two independent annotators, the annotators made two further passes over the annotations to ensure consistency and accuracy.
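
Presenting candidates from several similarity signals amounts to pooling their top-ranked lists. The sketch below shows that idea as a union of top-k lists; the hybrid score as a weighted sum with weight `alpha`, the value of `k` and the function names are hypothetical choices for illustration only.

```python
def top_k(scores, k):
    """Ids of the k highest-scoring videos (ties broken by id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [vid for vid, _ in ranked[:k]]


def pool_candidates(visual, text, k=3, alpha=0.5):
    """Union of the top-k candidates under visual, text and hybrid similarity.

    `visual` and `text` map video id -> similarity to the query in [0, 1].
    The hybrid score is a hypothetical weighted sum with weight `alpha`.
    """
    ids = set(visual) | set(text)
    hybrid = {v: alpha * visual.get(v, 0.0) + (1 - alpha) * text.get(v, 0.0)
              for v in ids}
    pooled = []
    for ranking in (top_k(visual, k), top_k(text, k), top_k(hybrid, k)):
        for vid in ranking:
            if vid not in pooled:  # keep first occurrence only
                pooled.append(vid)
    return pooled
```

The hybrid list can surface videos that are only moderately similar in each modality on its own, which is the point of using it alongside the single-modality lists.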

FIVR-200K was designed to serve as a benchmark that poses real-world challenges for the problem of reverse video search. Given a query video to be verified, an analyst would want to know whether the same or a very similar version of it has already been published. In that way, the user would be able to easily debunk cases of out-of-context video use (i.e. misappropriation). Conversely, if several videos are found that depict the same scene from different viewpoints at approximately the same time, they could be considered to corroborate the video of interest.

Discussion: Limitations and Caveats

We are confident that the two video verification datasets presented in this column can be valuable resources for researchers interested in the problem of media-based disinformation and could serve both as training sets and as benchmarks for automated video verification methods. Yet, both of them suffer from certain limitations and care should be taken when using them to draw conclusions. 

A first potential issue has to do with the video selection bias arising from the particular way each of the two datasets was created. The videos of the Fake Video Corpus were selected in a mixed manner: the corpus includes a number of cases that were known to the dataset creators and their collaborators, and it was enriched with a pool of test videos that had been submitted for analysis to a publicly available video verification service. As a result, it is likely to be skewed towards viral and popular videos. Moreover, only videos for which debunking or corroborating information could be found online were included, which introduces yet another source of bias, potentially towards cases that were more newsworthy or clear-cut. In the case of the FIVR-200K dataset, videos were intentionally collected from only two categories of newsworthy events, with the goal of ending up with a relatively homogeneous collection that would be challenging for content-based retrieval. This means that certain types of content, such as political events, sports and entertainment, are very limited or entirely absent from the dataset.

A question related to the selection bias of the above datasets pertains to their relevance for multimedia verification and for real-world applications. In particular, it is not clear whether the video cases offered by the Fake Video Corpus are representative of the actual verification tasks that journalists and news editors face in their daily work. Another important question is whether these datasets offer a realistic challenge to automatic multimedia analysis approaches. In the case of FIVR-200K, it was clearly demonstrated (Kordopatis-Zilos et al., 2019) that the dataset is a much harder benchmark for near-duplicate detection methods than previous datasets such as CC_WEB_VIDEO and VCDB. Even so, we cannot safely conclude that a method which performs very well on FIVR-200K would perform equally well on a dataset of much larger scale (e.g. millions or even billions of videos).

Another issue that affects access to these datasets and the reproducibility of experimental results relates to the ephemeral nature of online video content. A considerable (and increasing) part of these video collections has been taken down (either by the creators themselves or by the video platform), which makes it impossible for researchers to access the exact video set that was originally collected. To give a better sense of the problem, 21% of the Fake Video Corpus videos and 11% of the FIVR-200K videos were no longer available online in September 2019. This issue, which affects all datasets based on online multimedia content, raises the more general question of whether online platforms such as YouTube, Facebook and Twitter could take steps to facilitate the reproducibility of social media research without violating copyright legislation or the platforms’ terms of service.

The ephemeral nature of online content is not the only factor that makes the value of multimedia datasets highly sensitive to the passage of time. Especially in the case of online disinformation, there appears to be an arms race: new machine learning methods keep getting better at detecting misleading or tampered content, but at the same time new types of misinformation emerge, which are increasingly AI-assisted. This is particularly pronounced in the case of deepfakes, where the main research paradigm is based on the concept of competition between a generator (adversary) and a detector (Goodfellow et al., 2014).
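
For reference, Goodfellow et al. (2014) formalize this competition as a two-player minimax game. The discriminator D, which plays the role of the detector, is trained to tell real samples x from generated samples G(z), while the generator G is trained to make D fail:

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Because each side is optimized against the other, any improvement in detection also supplies a training signal for better generators.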

Last but not least, one may always be concerned about potential ethical issues arising from the public release of such datasets. In our case, reasonable concerns about privacy risks, which are always relevant when dealing with social media content, are addressed by complying with the relevant Terms of Service of the source platforms and by making sure that any annotation (label) assigned to the dataset videos is accurate. Additional ethical issues pertain to the potential “dual use” of the datasets, i.e. their use by adversaries to craft better tools and techniques that make misinformation campaigns more effective. A recent pertinent case was OpenAI’s delayed release of its very powerful GPT-2 model, which sparked numerous discussions and criticism and made it clear that there is no commonly accepted practice for ensuring the reproducibility of research results (and empowering future research) while at the same time making sure that risks of misuse are eliminated.

Future work

Given the challenges of creating and releasing a large-scale dataset for multimedia verification, the main conclusions from our efforts in this direction so far are the following:

  • The field of multimedia verification is in constant motion, and therefore the concept of a static dataset may not be sufficient to capture the real-world nuances and latest challenges of the problem. Instead, new benchmarking models (e.g. open data challenges) and resources (e.g. constantly updated repositories of “fake” multimedia) appear to be more effective for empowering future research in the area.
  • The role of social media and multimedia sharing platforms (incl. YouTube, Facebook, Twitter, etc.) seems to be crucial in enabling effective collaboration between academia and industry towards addressing the real-world consequences of online misinformation. While there have been recent developments in this direction, including the announcements of new deepfake datasets by both Facebook and Alphabet’s Jigsaw, there is also doubt and scepticism about the degree of openness and transparency that such platforms are ready to offer, given the conflicts of interest inherent in their underlying business models.
  • Building a dataset that covers a highly diverse and representative set of verification cases appears to be a task that would require a community effort rather than the effort of a single organisation or group. This would help not only to distribute the massive cost and effort of dataset creation across multiple stakeholders, but also to ensure less selection bias, richer and more accurate annotation, and more solid governance.

References

Allcott, H., Gentzkow, M., “Social media and fake news in the 2016 election”, Journal of Economic Perspectives, 31(2), pp. 211–236, 2017.
Amerini, I., Ballan, L., Caldelli, R., Del Bimbo, A., Serra, G., “A SIFT-based forensic method for copy-move attack detection and transformation recovery”, IEEE Transactions on Information Forensics and Security, 6(3), pp. 1099–1110, 2011.
Boididou, C., Papadopoulos, S., Kompatsiaris, Y., Schifferes, S., Newman, N., “Challenges of computational verification in social multimedia”, In Proceedings of the 23rd ACM International Conference on World Wide Web, pp. 743–748, 2014.
Boididou, C., Andreadou, K., Papadopoulos, S., Dang-Nguyen, D.T., Boato, G., Riegler, M., Kompatsiaris, Y., “Verifying multimedia use at MediaEval 2015”, In Proceedings of MediaEval 2015, 2015.
Boididou, C., Papadopoulos, S., Dang-Nguyen, D., Boato, G., Riegler, M., Middleton, S.E., Petlund, A., Kompatsiaris, Y., “Verifying multimedia use at MediaEval 2016”, In Proceedings of MediaEval 2016, 2016.
Brandtzaeg, P.B., Lüders, M., Spangenberg, J., Rath-Wiggins, L., Følstad, A., “Emerging journalistic verification practices concerning social media”, Journalism Practice, 10(3), pp. 323–342, 2016.
Christlein, V., Riess, C., Jordan, J., Riess, C., Angelopoulou, E., “An evaluation of popular copy-move forgery detection approaches”, IEEE Transactions on Information Forensics and Security, 7(6), pp. 1841–1854, 2012.
Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Hoi, G.W.S., Zubiaga, A., “SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours”, In Proceedings of the 11th International Workshop on Semantic Evaluation, pp. 69–76, 2017.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio, Y., “Generative adversarial nets”, In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
Guan, H., Kozak, M., Robertson, E., Lee, Y., Yates, A.N., Delgado, A., Zhou, D., Kheyrkhah, T., Smith, J., Fiscus, J., “MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation”, In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops, pp. 63–72, 2019.
Güera, D., Delp, E.J., “Deepfake video detection using recurrent neural networks”, In Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 1–6, 2018.
Jiang, Y.G., Jiang, Y., Wang, J., “VCDB: A large-scale database for partial copy detection in videos”, In Proceedings of the European Conference on Computer Vision, pp. 357–371, 2014.
Kiesel, J., Mestre, M., Shukla, R., Vincent, E., Adineh, P., Corney, D., Stein, B., Potthast, M., “SemEval-2019 Task 4: Hyperpartisan news detection”, In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829–839, 2019.
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., Kompatsiaris, I., “FIVR: Fine-grained incident video retrieval”, IEEE Transactions on Multimedia, 21(10), pp. 2638–2652, 2019.
Korus, P., Huang, J., “Multi-scale analysis strategies in PRNU-based tampering localization”, IEEE Transactions on Information Forensics and Security, 12(4), pp. 809–824, 2017.
Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Schudson, M., “The science of fake news”, Science, 359(6380), pp. 1094–1096, 2018.
Papadopoulou, O., Zampoglou, M., Papadopoulos, S., Kompatsiaris, I., “A corpus of debunked and verified user-generated videos”, Online Information Review, 43(1), pp. 72–88, 2019.
Rocha, A., Scheirer, W., Boult, T., Goldenstein, S., “Vision of the unseen: Current trends and challenges in digital image and video forensics”, ACM Computing Surveys, 43(4), art. 26, 2011.
Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., Nießner, M., “FaceForensics++: Learning to detect manipulated facial images”, In Proceedings of the IEEE International Conference on Computer Vision, 2019.
Rubin, V.L., Chen, Y., Conroy, N.J., “Deception detection for news: Three types of fakes”, In Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community, art. 83, 2015.
Tandoc Jr, E.C., Lim, Z.W., Ling, R., “Defining ‘fake news’: A typology of scholarly definitions”, Digital Journalism, 6(2), pp. 137–153, 2018.
Tralic, D., Zupancic, I., Grgic, S., Grgic, M., “CoMoFoD – New database for copy-move forgery detection”, In Proceedings of the 55th International Symposium on Electronics in Marine, pp. 49–54, 2013.
Wardle, C., Derakhshan, H., “Information disorder: Toward an interdisciplinary framework for research and policy making”, Council of Europe Report, 27, 2017.
Wu, X., Hauptmann, A.G., Ngo, C.-W., “Practical elimination of near-duplicates from web video search”, In Proceedings of the 15th ACM International Conference on Multimedia, pp. 218–227, 2007.
Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., “Detecting image splicing in the wild (web)”, In Proceedings of the 2015 IEEE International Conference on Multimedia & Expo Workshops, 2015.
Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., “Large-scale evaluation of splicing localization algorithms for web images”, Multimedia Tools and Applications, 76(4), pp. 4801–4834, 2017.
Zhou, X., Zafarani, R., “Fake news: A survey of research, detection methods, and opportunities”, arXiv preprint arXiv:1812.00315, 2018.
Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., Procter, R., “Detection and resolution of rumours in social media: A survey”, ACM Computing Surveys, 51(2), art. 32, 2018.

Appendix A: Examples of videos in the Fake Video Corpus.

Real videos


US Airways Flight 1549 ditched in the Hudson River.


A group of musicians playing in an Istanbul park while bombs explode outside the stadium behind them.


A giant alligator crossing a Florida golf course.

Fake videos


“Syrian boy rescuing a girl amid gunfire” – Staged (fabricated content): The video was filmed by Norwegian filmmaker Lars Klevberg in Malta.


“Golden Eagle Snatches Kid” – Tampered: The video was created by a team of students in Montreal as part of their course on visual effects.


“Pope Francis slaps Donald Trump’s hand for touching him” – Satire/parody: The video was digitally manipulated, and was made for the late-night television show Jimmy Kimmel Live.

Appendix B: Examples of videos in the Fine-grained Incident Video Retrieval dataset.

Example 1


Query video from the American Airlines Flight 383 fire at Chicago O’Hare International Airport on October 28, 2016.


Duplicate scene video.


Complementary scene video.


Incident scene video.

Example 2


Query video from the Boston Marathon bombing on April 15, 2013.


Duplicate scene video.


Complementary scene video.


Incident scene video.

Example 3


Query video from the Las Vegas shooting on October 1, 2017.


Duplicate scene video.


Complementary scene video.


Incident scene video.

JPEG Column: 84th JPEG Meeting in Brussels, Belgium

The 84th JPEG meeting was held in Brussels, Belgium.

This meeting was characterised by significant progress in most JPEG projects and in exploratory studies. The Committee Draft of JPEG XL, the new image coding system, has been issued, giving shape to this new, effective solution for the future of image coding. For JPEG Pleno, the standard for new imaging technologies, Part 1 (Framework) and Part 2 (Light field coding) have also reached Draft International Standard status.

Moreover, exploration studies are ongoing in the domain of media blockchain and on the application of learning-based solutions to image coding (JPEG AI). Both have triggered a number of activities that provide new knowledge and open new possibilities for the use of these technologies in future JPEG standards.

The 84th JPEG meeting had the following highlights:

  • JPEG XL issues the Committee Draft
  • JPEG Pleno Parts 1 and 2 reach Draft International Standard status
  • JPEG AI defines Common Test Conditions
  • JPEG exploration studies on Media Blockchain
  • JPEG Systems – JLINK working draft
  • JPEG XS

In the following, a short description of the most significant activities is presented.

 

JPEG XL

The JPEG XL Image Coding System (ISO/IEC 18181) has completed the Committee Draft of the standard. The new coding technique allows storage of high-quality images at one-third the size of the legacy JPEG format. Moreover, JPEG XL can losslessly transcode existing JPEG images to about 80% of their original size, simplifying interoperability and accelerating wider deployment.
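
To make the two ratios concrete, the worked example below assumes a hypothetical 3 GB archive of legacy JPEG files and applies the figures quoted above (one-third for new encodes, about 80% for lossless transcoding); actual ratios will vary with content and quality settings.

```latex
S = 3\ \text{GB}, \qquad
S_{\text{new encode}} \approx \tfrac{1}{3}\,S = 1\ \text{GB}, \qquad
S_{\text{transcoded}} \approx 0.8\,S = 2.4\ \text{GB}
```

The two figures answer different questions: the one-third ratio concerns new lossy encodes at high quality, while the 80% ratio concerns converting existing JPEG files without loss, which saves less (about 0.6 GB here) but leaves the images themselves untouched.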

The JPEG XL reference software, ready for mobile and desktop deployments, will be available in Q4 2019. The current contributors have committed to releasing it publicly under a royalty-free and open source license.

 

JPEG Pleno

A significant milestone was reached during this meeting: the Draft International Standard (DIS) texts for both JPEG Pleno Part 1 (Framework) and Part 2 (Light field coding) have been completed. A draft architecture of the Reference Software (Part 4) and development plans have also been discussed and defined.

In addition, JPEG has completed an in-depth analysis of existing point cloud coding solutions and a new version of the use-cases and requirements document has been released reflecting the future role of JPEG Pleno in point cloud compression. A new set of Common Test Conditions has been released as a guideline for the testing and evaluation of point cloud coding solutions with both a best practice subjective testing protocol and a set of objective metrics.

JPEG Pleno holography activities made significant advances in the definition of use cases and requirements and in the description of Common Test Conditions. New quality assessment methodologies for holographic data, defined within a collaboration between JPEG and Qualinet, were established. Moreover, JPEG Pleno continues to collect microscopic and tomographic holographic data.

 

JPEG AI

The JPEG Committee continues to carry out exploration studies on deep learning-based image compression solutions, typically with an auto-encoder architecture. The promise that these types of codecs hold, especially in terms of coding efficiency, will be evaluated in several studies. In this meeting, a Common Test Conditions document was produced, which includes a plan for subjective and objective quality assessment experiments as well as coding pipelines for anchor and learning-based codecs. Moreover, a JPEG AI dataset was proposed and discussed, and a double stimulus impairment scale (DSIS) experiment (side-by-side) was performed with a mix of experts and non-experts in a controlled environment.
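
DSIS results are usually summarized per test condition as a mean opinion score (MOS) with a confidence interval. The sketch below assumes the common five-grade impairment scale (5 = imperceptible, 1 = very annoying) and a normal-approximation 95% interval; the actual JPEG AI procedure (scale, outlier screening, interval type) may differ, and the function names are hypothetical.

```python
import math
import statistics

# Five-grade impairment scale assumed here (5 = imperceptible ... 1 = very annoying).
SCALE = range(1, 6)


def mos_with_ci(ratings, z=1.96):
    """Mean opinion score and half-width of a normal-approximation 95% CI.

    The interval is mean +/- z * s / sqrt(n), with s the sample standard
    deviation; a Student-t quantile would be preferable for small panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


def summarize(scores_by_condition):
    """Map each codec/condition name to (MOS, CI half-width)."""
    return {name: mos_with_ci(r) for name, r in scores_by_condition.items()}
```

Overlapping intervals between an anchor and a learning-based codec would suggest that the panel could not reliably tell them apart at that rate point.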

 

JPEG exploration on Media Blockchain

Fake news, copyright violation, media forensics, privacy and security are emerging challenges in digital media. JPEG has determined that blockchain and distributed ledger technologies (DLT) have great potential as a technology component to address these challenges in transparent and trustable media transactions. However, blockchain and DLT need to be integrated closely with a widely adopted standard to ensure broad interoperability of protected images. JPEG calls for industry participation to help define use cases and requirements that will drive the standardization process. In order to clearly identify the impact of blockchain and distributed ledger technologies on JPEG standards, the committee has organised several workshops to interact with stakeholders in the domain.

The 4th public workshop on media blockchain was organized in Brussels on Tuesday the 16th of July 2019 during the 84th ISO/IEC JTC 1/SC 29/WG1 (JPEG) Meeting. The presentations and program of the workshop are available on jpeg.org.

The JPEG Committee has issued an updated version of the white paper entitled “Towards a Standardized Framework for Media Blockchain” that elaborates on the initiative, exploring relevant standardization activities, industrial needs and use cases.

To keep informed and to get involved in this activity, interested parties are invited to register for the ad hoc group’s mailing list.

 

JPEG Systems – JLINK

At the 84th meeting, IS text reviews for ISO/IEC 19566-5 JUMBF and ISO/IEC 19566-6 JPEG 360 were completed; IS publication will be forthcoming. Work began on adding functionality to JUMBF, Privacy & Security, and JPEG 360, as did initial planning towards developing software implementations of these parts of the JPEG Systems specification. Work also began on the new ISO/IEC 19566-7 Linked media images (JLINK) with the development of a working draft.

 

JPEG XS

The JPEG Committee is pleased to announce new Core Experiments and Exploration Studies on compression of raw image sensor data. The JPEG XS project aims at the standardization of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec in various markets. Video transport over professional video links (SDI, IP, Ethernet), real-time video storage in and outside of cameras, memory buffers, machine vision systems, and data compression onboard autonomous vehicles are among the targeted use cases for raw image sensor compression. This new work on raw sensor data will pave the way towards highly efficient close-to-sensor image compression workflows with JPEG XS.

 

Final Quote

“Completion of the Committee Draft of JPEG XL, the new standard for image coding is an important milestone. It is hoped that JPEG XL can become an excellent replacement of the widely used JPEG format which has been in service for more than 25 years.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission (ISO/IEC JTC 1/SC 29/WG 1), and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and JPEG XL families of imaging standards.

More information about JPEG and its work is available at www.jpeg.org.

Future JPEG meetings are planned as follows:

  • No 85, San Jose, California, U.S.A., November 2 to 8, 2019
  • No 86, Sydney, Australia, January 18 to 24, 2020

MPEG Column: 127th MPEG Meeting in Gothenburg, Sweden

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

Plenary of the 127th MPEG Meeting in Gothenburg, Sweden.

The 127th MPEG meeting concluded on July 12, 2019 in Gothenburg, Sweden with the following topics:

  • Versatile Video Coding (VVC) enters formal approval stage, experts predict 35-60% improvement over HEVC
  • Essential Video Coding (EVC) promoted to Committee Draft
  • Common Media Application Format (CMAF) 2nd edition promoted to Final Draft International Standard
  • Dynamic Adaptive Streaming over HTTP (DASH) 4th edition promoted to Final Draft International Standard
  • Carriage of Point Cloud Data Progresses to Committee Draft
  • JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition
  • Genomic information representation – WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • ISO/IEC 23005 (MPEG-V) 4th Edition – WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

The corresponding press release of the 127th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/127

Versatile Video Coding (VVC)

The Moving Picture Experts Group (MPEG) is pleased to announce that Versatile Video Coding (VVC) has progressed to Committee Draft; experts predict a 35–60% improvement over HEVC.

The development of the next major generation of video coding standard has achieved excellent progress, such that MPEG has approved the Committee Draft (CD, i.e., the text for formal balloting in the ISO/IEC approval process).

The new VVC standard will be applicable to a very broad range of applications and will also provide additional functionalities. VVC will provide a substantial improvement in coding efficiency relative to existing standards, expected to be in the range of a 35–60% bit-rate reduction relative to HEVC, although this has not yet been formally measured. Here, “relative to HEVC” means at equivalent subjective video quality, at picture resolutions such as 1080p HD or 4K or 8K UHD, for either standard dynamic range video or high dynamic range and wide color gamut content, and at quality levels appropriate for use in consumer distribution services. The focus during the development of the standard has primarily been on 10-bit 4:2:0 content, but the 4:4:4 chroma format will also be supported.
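
A bit-rate saving at equal quality, such as the 35–60% range above, is conventionally summarized as a Bjøntegaard delta rate (BD-rate): fit log-rate as a function of quality for each codec, then average the gap between the two curves over their overlapping quality range. The sketch below implements the classic cubic-polynomial variant in plain Python with PSNR as the quality measure. It is an illustration under those assumptions, not JVET's reporting tool, whose interpolation method and metrics may differ; note also that the VVC claim refers to subjective quality, for which PSNR is only a proxy.

```python
import math


def _fit_poly(xs, ys, deg=3):
    """Least-squares polynomial fit; returns (coefficients, x_shift).

    x values are centred on their mean for numerical stability, so the
    model is y ~= sum(c[i] * (x - shift)**i).
    """
    shift = sum(xs) / len(xs)
    us = [x - shift for x in xs]
    n = deg + 1
    # Normal equations a @ c = b, solved by Gaussian elimination.
    a = [[sum(u ** (i + j) for u in us) for j in range(n)] for i in range(n)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(a[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (b[r] - acc) / a[r][r]
    return coef, shift


def _integrate(coef, shift, lo, hi):
    """Definite integral of the fitted polynomial over [lo, hi]."""
    def antiderivative(x):
        u = x - shift
        return sum(c * u ** (i + 1) / (i + 1) for i, c in enumerate(coef))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bit-rate difference (%) of test vs. anchor at equal PSNR."""
    ca, sa = _fit_poly(psnr_anchor, [math.log10(r) for r in rate_anchor])
    ct, st = _fit_poly(psnr_test, [math.log10(r) for r in rate_test])
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_diff = (_integrate(ct, st, lo, hi) - _integrate(ca, sa, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For example, with synthetic points `bd_rate([1000, 2000, 4000, 8000], [32, 35, 38, 41], [600, 1200, 2400, 4800], [32, 35, 38, 41])` describes a test codec that needs 60% of the anchor's bit rate at every PSNR point, so it returns approximately -40.0; negative values mean savings.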

The VVC standard is being developed in the Joint Video Experts Team (JVET), a group established jointly by MPEG and the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. In addition to a text specification, the project also includes the development of reference software, a conformance testing suite, and a new standard ISO/IEC 23002-7 specifying supplemental enhancement information messages for coded video bitstreams. The approval process for ISO/IEC 23002-7 has also begun, with the issuance of a CD consideration ballot.

Research aspects: VVC represents the next-generation video codec, to be deployed in 2020 and beyond, and essentially the same research aspects apply as for previous generations, i.e., coding efficiency, performance/complexity, and objective/subjective evaluation. Luckily, JVET documents are freely available, including the actual standard (committee draft), software (and its description), and common test conditions. Thus, researchers utilizing these resources are able to conduct reproducible research when contributing their findings and code improvements back to the community at large.

Essential Video Coding (EVC)

MPEG-5 Essential Video Coding (EVC) promoted to Committee Draft

Interestingly, at the same meeting as VVC, MPEG promoted MPEG-5 Essential Video Coding (EVC) to Committee Draft (CD). The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics.

The MPEG-5 EVC standard includes a baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be royalty-free. Additionally, a main profile adds a small number of additional tools, each providing a significant performance gain. All main profile tools can be individually switched off or individually switched over to a corresponding baseline tool. Organizations making proposals for the main profile have agreed to publish applicable licensing terms within two years of the FDIS stage, either individually or as part of a patent pool.
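
The switching rule described above (every main-profile tool can be turned off individually or replaced by its baseline counterpart) can be modelled as a small configuration resolver. The tool names below are hypothetical placeholders, not actual EVC tool names, and the resolver is only a sketch of the idea, not part of the EVC specification or its reference software.

```python
from enum import Enum


class Mode(Enum):
    MAIN = "main"          # use the main-profile tool
    BASELINE = "baseline"  # switch over to the corresponding baseline tool
    OFF = "off"            # switch the tool off entirely


# Hypothetical main-profile tools -> baseline counterpart (None if there is none).
TOOLS = {
    "hypothetical_main_interp_filter": "baseline_interp_filter",
    "hypothetical_main_transform": "baseline_transform",
    "hypothetical_main_loop_filter": None,
}


def resolve(settings):
    """Map each main-profile tool slot to the tool actually used (or None)."""
    active = {}
    for tool, fallback in TOOLS.items():
        mode = settings.get(tool, Mode.MAIN)
        if mode is Mode.MAIN:
            active[tool] = tool
        elif mode is Mode.BASELINE:
            if fallback is None:
                raise ValueError(f"{tool} has no baseline counterpart")
            active[tool] = fallback
        else:
            active[tool] = None
    return active


def is_baseline_only(active):
    """True if no main-profile tool remains enabled."""
    return all(t is None or t not in TOOLS for t in active.values())
```

Each slot is decided independently, which is what makes per-tool experiments (and per-tool licensing decisions) possible.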

Research aspects: Similar research aspects apply to EVC. From a software engineering perspective, it could also be interesting to investigate further the mechanism for switching individual tools off or falling back to the corresponding baseline tools. Naturally, a comparison with next-generation codecs such as VVC is interesting per se. The licensing aspects themselves are probably interesting for other disciplines, but that is another story…

Common Media Application Format (CMAF)

MPEG ratified the 2nd edition of the Common Media Application Format (CMAF)

The Common Media Application Format (CMAF) enables efficient encoding, storage, and delivery of digital media content (incl. audio, video, subtitles among others), which is key to scaling operations to support the rapid growth of video streaming over the internet. The CMAF standard is the result of widespread industry adoption of an application of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices within CMAF.

The 2nd edition of CMAF adds support for a number of specifications that were a result of significant industry interest. These include:

  • Advanced Audio Coding (AAC) multi-channel;
  • MPEG-H 3D Audio;
  • MPEG-D Unified Speech and Audio Coding (USAC);
  • Scalable High Efficiency Video Coding (SHVC);
  • IMSC 1.1 (Timed Text Markup Language Profiles for Internet Media Subtitles and Captions); and
  • additional HEVC video CMAF profiles and brands.

This edition also introduces CMAF supplemental data handling as well as new structural brands for CMAF that reflect the common practice in the significant industry deployment of CMAF. Companies adopting CMAF technology will find the specifications introduced in the 2nd edition particularly useful for the further adoption and proliferation of CMAF in the market.

Research aspects: see below (DASH).

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG approves the 4th edition of Dynamic Adaptive Streaming over HTTP (DASH)

The 4th edition of MPEG-DASH comprises the following features:

  • a service description indicating how the service provider expects the service to be consumed;
  • a method to indicate the times corresponding to the production of associated media; and
  • a mechanism to signal DASH profiles and features, employed codec and format profiles, and supported protection schemes present in the Media Presentation Description (MPD).

It is expected that this edition will be published later this year. 
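As a rough illustration of where these features surface in practice, the Python sketch below reads a simplified, hand-written MPD fragment with the standard library's xml.etree.ElementTree and extracts the service description (here a target latency), producer reference times, the profile and codec signalling, and the protection schemes. The element and attribute names (ServiceDescription, Latency, ProducerReferenceTime, ContentProtection) reflect our understanding of the MPD schema and should be checked against ISO/IEC 23009-1; the values are made up and the fragment is not a validated MPD.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Simplified, hand-written MPD fragment for illustration only (not validated).
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     profiles="urn:mpeg:dash:profile:isoff-live:2011" type="dynamic">
  <ServiceDescription id="0">
    <Latency target="3000" min="2000" max="6000"/>
  </ServiceDescription>
  <Period id="p0">
    <AdaptationSet mimeType="video/mp4" codecs="avc1.64001f">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <ProducerReferenceTime id="0" wallClockTime="2019-07-12T10:00:00Z" presentationTime="0"/>
      <Representation id="v0" bandwidth="3000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def summarize_mpd(xml_text: str) -> dict:
    """Collect the MPD-level signalling corresponding to the features listed above."""
    root = ET.fromstring(xml_text)
    latency = root.find("mpd:ServiceDescription/mpd:Latency", NS)
    return {
        # service description: how the provider expects the service to be consumed
        "target_latency_ms": int(latency.get("target")) if latency is not None else None,
        # production times of associated media
        "producer_times": [(p.get("wallClockTime"), int(p.get("presentationTime")))
                           for p in root.iterfind(".//mpd:ProducerReferenceTime", NS)],
        # profile and codec signalling
        "profiles": root.get("profiles", "").split(","),
        "codecs": [a.get("codecs") for a in root.iterfind(".//mpd:AdaptationSet", NS)],
        # protection schemes present in the MPD
        "protection_schemes": [cp.get("schemeIdUri")
                               for cp in root.iterfind(".//mpd:ContentProtection", NS)],
    }


print(summarize_mpd(MPD_XML))
```

A client could compare target_latency_ms against its own buffer model, which is one way the latency-related research questions mentioned below become measurable.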

Research aspects: The 2nd edition of CMAF and the 4th edition of DASH come with a rich feature set enabling a plethora of use cases. The underlying principles remain the same, and research issues arise from updated application and service requirements with respect to content complexity, time aspects (mainly delay/latency), and quality of experience (QoE). The DASH-IF presents the Excellence in DASH Award at the ACM Multimedia Systems conference, and an overview of its academic efforts can be found here.

Carriage of Point Cloud Data

MPEG progresses the Carriage of Point Cloud Data to Committee Draft

At its 127th meeting, MPEG promoted the carriage of point cloud data to the Committee Draft stage, the first milestone of the ISO standard development process. This standard is the first to introduce support for volumetric media in the widely used ISO base media file format family of standards.

This standard supports the carriage of point cloud data comprising individually encoded video bitstreams within multiple file format tracks, in order to support the intrinsic nature of video-based point cloud compression (V-PCC). Additionally, it allows the carriage of point cloud data in a single file format track for applications requiring multiplexed content (i.e., the video bitstreams of multiple components are interleaved into one bitstream).
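The difference between the two carriage options can be sketched in Python under simplifying assumptions: each sample is tagged with a component label (hypothetical names loosely following the V-PCC occupancy, geometry, and attribute components) and a decode time. The sketch only models which samples end up in which track and in what order; it does not produce ISO base media file format boxes, and the per-time component order is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    component: str    # hypothetical labels loosely following V-PCC components
    decode_time: int  # in media timescale units
    data: bytes


def multi_track(samples: list) -> dict:
    """Multi-track layout: one track per component bitstream, each in decode order."""
    tracks = defaultdict(list)
    for s in sorted(samples, key=lambda s: s.decode_time):
        tracks[s.component].append(s)
    return dict(tracks)


def single_track(samples: list, order=("occupancy", "geometry", "attribute")) -> list:
    """Single-track layout: all components interleaved by decode time,
    then by an assumed per-time component order."""
    rank = {c: i for i, c in enumerate(order)}
    return sorted(samples, key=lambda s: (s.decode_time, rank[s.component]))


samples = [
    Sample("geometry", 0, b"g0"), Sample("attribute", 0, b"a0"), Sample("occupancy", 0, b"o0"),
    Sample("geometry", 1, b"g1"), Sample("attribute", 1, b"a1"), Sample("occupancy", 1, b"o1"),
]
print({c: [s.data for s in t] for c, t in multi_track(samples).items()})
print([s.data for s in single_track(samples)])
```

The multi-track layout lets a player fetch or skip a component independently, while the single-track layout keeps everything needed for one time instant together, which is the trade-off the two options address.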

This standard is expected to support efficient access to and delivery of portions of a point cloud object, considering that in many cases the entire point cloud object may not be visible to the user, depending on the viewing direction or the location of the point cloud object relative to other objects. The standard is currently expected to reach its final milestone by the end of 2020.

Research aspects: MPEG’s Point Cloud Compression (PCC) comes in two flavors, video-based and geometry-based, but in both cases the compressed data still needs to be packaged into file and delivery formats. MPEG’s choice here is the ISO base media file format, and the efficient carriage of point cloud data is characterized by both functionality (i.e., enabling the required use cases) and performance (such as low overhead).

MPEG-2 Systems/Transport Stream

JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition

At its 127th meeting, WG11 (MPEG), in collaboration with WG1 (JPEG), extended ISO/IEC 13818-1 (MPEG-2 Systems) to support ISO/IEC 21122 (JPEG XS), serving industries that use still image compression technologies in their broadcasting infrastructures. The specification defines a JPEG XS elementary stream header and specifies how the JPEG XS video access unit (specified in ISO/IEC 21122-1) is put into a Packetized Elementary Stream (PES). Additionally, the specification defines how the System Target Decoder (STD) model can be extended to support JPEG XS video elementary streams.
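The PES part of this can be illustrated with a minimal Python sketch that wraps one access unit in a PES packet carrying a presentation time stamp (PTS), following the generic PES header layout of ISO/IEC 13818-1. The access unit is treated as opaque bytes; the JPEG XS elementary stream header defined by the amendment is not modelled, and the default stream_id of 0xE0 is a generic video value chosen for illustration, not necessarily the one used for JPEG XS.

```python
def encode_pts(pts: int) -> bytes:
    """Pack a 33-bit PTS into the 5-byte PES field: '0010' prefix, marker bits set to 1."""
    if not 0 <= pts < (1 << 33):
        raise ValueError("PTS must fit in 33 bits")
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 0x01,  # '0010' + PTS[32..30] + marker
        (pts >> 22) & 0xFF,                          # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 0x01,          # PTS[21..15] + marker
        (pts >> 7) & 0xFF,                           # PTS[14..7]
        ((pts & 0x7F) << 1) | 0x01,                  # PTS[6..0] + marker
    ])


def decode_pts(field: bytes) -> int:
    """Inverse of encode_pts (marker bits are dropped)."""
    return ((((field[0] >> 1) & 0x07) << 30) | (field[1] << 22)
            | ((field[2] >> 1) << 15) | (field[3] << 7) | (field[4] >> 1))


def build_pes(access_unit: bytes, pts: int, stream_id: int = 0xE0) -> bytes:
    """Wrap one access unit (opaque bytes) in a PES packet carrying a PTS.

    stream_id 0xE0 is an illustrative generic video value; the JPEG XS
    elementary stream header from the amendment is NOT modelled here.
    """
    pts_field = encode_pts(pts)
    # 0x84: '10' marker bits, no scrambling, data_alignment_indicator = 1
    # 0x80: PTS_DTS_flags = '10' (PTS only), all other optional fields absent
    optional_header = bytes([0x84, 0x80, len(pts_field)]) + pts_field
    length = len(optional_header) + len(access_unit)
    if length > 0xFFFF:
        length = 0  # 0 = unbounded, permitted for video elementary streams in TS
    return (b"\x00\x00\x01" + bytes([stream_id])
            + length.to_bytes(2, "big") + optional_header + access_unit)


pes = build_pes(b"\xff\x10", pts=90000)  # PTS of 1 s on the 90 kHz system clock
print(pes.hex())
```

In a real multiplexer, such PES packets would then be split into 188-byte Transport Stream packets, and the STD buffer model mentioned above constrains how fast they may be delivered.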

Genomic information representation

WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that have to be stored, transferred, and analyzed. So far, WG 11 (MPEG) and ISO TC 276/WG 5 have addressed the representation, compression, and transport of genome sequencing data by developing the ISO/IEC 23092 standard series, also known as MPEG-G. The series provides a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for accessing sequencing data in its native compressed format.

An important element in the effective usage of sequencing data is the association of the data with the results of the analyses and the annotations generated by processing pipelines and analysts. At the moment, such association happens as a separate step, and standard, effective ways of linking sequencing data with the meta-information derived from it are not available.

At its 127th meeting, MPEG and ISO TC 276/WG 5 issued a joint Call for Proposals (CfP) addressing this problem. The call seeks submissions of technologies that can provide efficient representation and compression solutions for the processing of genomic annotation data.

Companies and organizations are invited to submit proposals in response to this call. Responses are expected to be submitted by 8 January 2020 and will be evaluated during the 129th WG 11 (MPEG) meeting. Detailed information, including how to respond to the call for proposals, the requirements that have to be considered, and the test data to be used, is reported in documents N18648, N18647, and N18649, available at the 127th meeting website (http://mpeg.chiariglione.org/meetings/127). For any further questions about the call, test conditions, required software, or test sequences, please contact Joern Ostermann, MPEG Requirements Group Chair (ostermann@tnt.uni-hannover.de), or Martin Golebiewski, Convenor of ISO TC 276/WG 5 (martin.golebiewski@h-its.org).

ISO/IEC 23005 (MPEG-V) 4th Edition

WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

At its 127th meeting, WG11 (MPEG) promoted the 4th edition of two parts of ISO/IEC 23005 (MPEG-V; Media Context and Control) to the Final Draft International Standard (FDIS) stage. The new edition of ISO/IEC 23005-1 (architecture) enables ten new use cases, which can be grouped into four categories: 3D printing, olfactory information in virtual worlds, virtual panoramic vision in cars, and adaptive sound handling. The new edition of ISO/IEC 23005-7 (conformance and reference software) is updated to reflect the changes introduced by the new tools defined in other parts of ISO/IEC 23005. More information on MPEG-V and its parts 1-7 can be found at https://mpeg.chiariglione.org/standards/mpeg-v.


Finally, we certainly found the unofficial highlight of the 127th MPEG meeting while scanning the scene in Gothenburg on Tuesday night…


Qualinet Databases: Central Resource for QoE Research – History, Current Status, and Plans

Introduction

Datasets are an enabling tool for successful technological development and innovation in numerous fields. Large-scale databases of multimedia content play a crucial role in the development and performance evaluation of multimedia technologies, most importantly in audiovisual signal processing, for example, coding, transmission, subjective/objective quality assessment, and QoE (Quality of Experience) [1]. Publicly available and widely accepted datasets are necessary for a fair comparison and validation of systems under test; they are crucial for reproducible research. A large amount of relevant multimedia content is available in the public domain, for example, via the ACM SIGMM Records Dataset Column (http://sigmm.hosting.acm.org/category/datasets-column/), the MediaEval Benchmark (http://www.multimediaeval.org/), MMSys Datasets (http://www.sigmm.org/archive/MMsys/mmsys14/index.php/mmsys-datasets.html), etc. However, the descriptions of these datasets are usually scattered, for example, across technical reports, research papers, and online resources, which makes finding the most appropriate dataset for one’s particular needs a cumbersome task.

The Qualinet Multimedia Databases Online platform is one of many efforts to provide an overview and comparison of multimedia content datasets, especially for QoE-related research, all in one place. The platform was introduced within the framework of ICT COST Action IC1003 European Network on Quality of Experience in Multimedia Systems and Services – Qualinet (http://www.qualinet.eu). The platform, abbreviated “Qualinet Databases” (http://dbq.multimediatech.cz/), is used to share information on databases with the community [3], [4]. Qualinet was supported as a COST Action between November 8, 2010, and November 7, 2014, and has continued as an independent entity with a new structure, activities, and management since 2015. The Qualinet Databases platform fulfills the initial goal of providing a rich and internationally recognized database and has been running since 2010. It is widely considered one of Qualinet’s most notable achievements.

The following paragraphs summarize Qualinet Databases, including its history, current status, and plans.

Background

A commonly recognized database of multimedia content is a crucial resource, and not only for QoE-related research. Among the first published efforts in this field are the image and video quality resources website by Stefan Winkler (https://stefan.winklerbros.net/resources.html) and related publications providing in-depth analyses of multimedia content databases [2]. Since 2010, one of the main interests of Qualinet and its Working Group 4 (WG4), entitled Databases and Validation (Leader: Christian Timmerer, Deputy Leaders: Karel Fliegel, Shelley Buchinger, Marcus Barkowsky), has been to create an even broader database with extended functionality and to take the necessary steps to make it accessible to all researchers.

Qualinet first decided to list and summarize available multimedia databases based on a literature search and feedback from the project members. As the number of databases in the list increased rapidly, handling the necessary updates became inefficient. Based on these findings, WG4 started implementing the Qualinet Databases online platform in 2011. Since then, the website has been used as Qualinet’s central resource for sharing datasets among Qualinet members and the scientific community. To the best of our knowledge, there is no other publicly available resource for QoE research that offers similar functionality. The Qualinet Databases platform is intended to provide more features than other known similar solutions such as the Consumer Digital Video Library (http://www.cdvl.org). The main difference is that Qualinet Databases acts as a hub for various scattered resources of multimedia content, especially those with accompanying data such as MOS (Mean Opinion Score), raw data from subjective experiments, eye-tracking data, and detailed descriptions of the datasets including scientific references.

The development of Qualinet DBs within the framework of COST Action IC1003 passed several milestones, which are listed in the timeline below:

  • March 2011 (1st Qualinet General Assembly (GA), Lisbon, Portugal), an initial list of multimedia databases collected and published internally for Qualinet members, creation of Web-based portal proposed,
  • September 2011 (2nd Qualinet GA, Brussels, Belgium), Qualinet DBs prototype portal introduced, development of publicly available resource initiated,
  • February 2012 (3rd Qualinet GA, Prague, Czech Republic), hosting of the Qualinet DBs platform under development at the Czech Technical University in Prague (http://dbq.multimediatech.cz/), Qualinet DBs Wiki page (http://dbq-wiki.multimediatech.cz/) introduced,
  • October 2012 (4th Qualinet GA, Zagreb, Croatia), White paper on Qualinet DBs published [3], Qualinet DBs v1.0 online platform released to the public,
  • March 2013 (5th Qualinet GA, Novi Sad, Serbia), Qualinet DBs v1.5 online platform published with extended functionality,
  • September 2013 (6th Qualinet GA, Novi Sad, Serbia), Qualinet DBs Information leaflet published, Task Force (TF) on Standardization and Dissemination established, QoMEX 2013 Dataset Track organized,
  • March 2014 (7th Qualinet GA, Berlin, Germany), ACM MMSys 2014 Dataset Track organized, liaison with Ecma International (https://www.ecma-international.org/) on possible standardization of a subset of Qualinet DBs established,
  • October 2014 (8th and final Qualinet GA and Workshop, Delft, The Netherlands), final development stage v3.00 of Qualinet DBs platform reached, code freeze.

Qualinet Databases became Qualinet’s primary resource for sharing datasets publicly with Qualinet members and, after registration, with the broader scientific community. At the final Qualinet General Assembly under the COST Action IC1003 umbrella (October 2014, Delft, The Netherlands), it was concluded, also based on numerous testimonials, that Qualinet DBs is one of the major assets created throughout the project. It was therefore decided that the sustainability of this resource must be ensured for the years to come. Since 2015, the Qualinet DBs platform has been kept running through the efforts of a newly established Task Force, TF4 Qualinet Databases (Leader: Karel Fliegel, Deputy Leaders: Lukáš Krasula, Werner Robitza). Its status and achievements are discussed regularly at Qualinet’s Annual Meetings co-located with QoMEX (International Conference on Quality of Multimedia Experience), i.e., 7th QoMEX 2015 (Costa Navarino, Greece), 8th QoMEX 2016 (Lisbon, Portugal), 9th QoMEX 2017 (Erfurt, Germany), 10th QoMEX 2018 (Sardinia, Italy), and 11th QoMEX 2019 (Berlin, Germany).

Current Status

The basic functionality of the Qualinet Databases online platform, see Figure 1, is based on the idea that registered users (Qualinet members and other interested users from the scientific community) have access through an easy-to-use Web portal providing a list of multimedia databases. Depending on their user rights, they are allowed to browse information about a particular database and possibly download the actual multimedia content from the link provided by the database owner.


Figure 1. Qualinet Databases online platform and its current interface.

Selected users, Database Owners in particular, have rights to upload or edit their records in the list of databases. Most of the multimedia databases are flagged as “Publicly Available” and are accessible to registered users outside Qualinet. Only Administrators (the Task Force leader and deputy leaders) have the right to delete records in the database. Qualinet DBs does not contain the actual multimedia content, only the access information with links to the dataset files stored on the Database Owner’s server.

The Qualinet DBs is accessible to all registered users after entering valid login data. Depending on the level of the rights assigned to the particular account, the user can browse the list of the databases with description (all registered users) and has access to the actual multimedia content via a link entered by the Database Owner. It provides the user with a powerful tool to find the multimedia database that best suits his/her needs.

In the User Settings, users can select which fields are visible in the list of databases, namely:

  • Database name, Institution, Qualinet Partner (Yes/No),
  • Link, Description (abstract), Access limitations, Publicly available (Yes/No), Copyright Agreement signed (Yes/No),
  • Citation, References, Copyright notice, Database usage tracking,
  • Content type, MOS (Yes/No), Other (Eye tracking, Sensory, …),
  • Total number of contents, SRC, HRC,
  • Subjective evaluation method (DSCQS, …), Number of ratings.

Full-text search within the selected visible fields is available. In the current version of Qualinet DBs, users can also sort the databases alphabetically by any of the visible fields.
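The search-and-sort behaviour described above can be sketched in a few lines of Python, assuming each record is a dictionary keyed by field name and that search is a case-insensitive substring match. This illustrates the described behaviour only; it is not the platform’s actual implementation, and the example records are made up.

```python
from typing import Iterable, Optional


def search_records(records: list, visible_fields: Iterable[str], query: str,
                   sort_by: Optional[str] = None) -> list:
    """Case-insensitive substring search restricted to the visible fields,
    optionally sorted alphabetically by one visible field."""
    fields = list(visible_fields)
    q = query.casefold()
    hits = [r for r in records
            if any(q in str(r.get(f, "")).casefold() for f in fields)]
    if sort_by is not None:
        if sort_by not in fields:
            raise ValueError(f"can only sort by a visible field, got {sort_by!r}")
        hits.sort(key=lambda r: str(r.get(sort_by, "")).casefold())
    return hits


# Made-up example records; field names follow the list of visible fields above.
records = [
    {"Database name": "Example HDR Set", "Institution": "Univ. X", "Content type": "image, HDR"},
    {"Database name": "Another Video Set", "Institution": "Univ. Y", "Content type": "video"},
    {"Database name": "Alpha 3D Set", "Institution": "Univ. Z", "Content type": "video, 3D"},
]
hits = search_records(records, ["Database name", "Content type"], "video", sort_by="Database name")
print([r["Database name"] for r in hits])  # ['Alpha 3D Set', 'Another Video Set']
```

Because the search only looks at visible fields, a query matching a hidden field (for example the institution, if it is not selected) returns nothing, which is the behaviour the User Settings imply.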

The list of databases allows:

  • Opening a card with details on particular database record (accessible to all users),
  • Editing database record (accessible to the database owners and administrators),
  • Deleting database record (accessible only to administrators),
  • Requesting deletion of a database record (accessible to the database owners),
  • Requesting assignment as the database owner (accessible to all users).
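The access rules in this list can be summarized as a small permission check. The Python sketch below is hypothetical: the action identifiers and the owners and is_admin parameters are illustrative names, and the platform’s actual implementation is not described in the text. “All users” is read here as all logged-in registered users.

```python
def can(action: str, user: str, owners: set, is_admin: bool = False) -> bool:
    """Hypothetical permission check mirroring the list of allowed actions."""
    is_owner = user in owners
    rules = {
        "open_card": True,                # all users
        "edit": is_owner or is_admin,     # database owners and administrators
        "delete": is_admin,               # administrators only
        "request_deletion": is_owner,     # database owners
        "request_ownership": True,        # all users
    }
    if action not in rules:
        raise ValueError(f"unknown action: {action!r}")
    return rules[action]


owners = {"alice"}  # hypothetical owner of one record
print(can("edit", "alice", owners), can("delete", "alice", owners))  # True False
print(can("delete", "admin", owners, is_admin=True))                 # True
```

Keeping the rules in one table makes it easy to check that owners can edit but must request deletion, while only administrators delete directly.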

As for the records available in Qualinet DBs, the listed multimedia databases are a crucial resource for various tasks in multimedia signal processing. Qualinet DBs focuses primarily on content related to QoE research [1], where, when designing objective quality assessment algorithms, it is necessary to perform (1) verification of the model during development, (2) validation of the model after development, and (3) benchmarking of various models.

Annotated multimedia databases contain essential ground truth, that is, test material from the subjective experiment annotated with subjective ratings. Qualinet DBs also lists other material without subjective ratings for other kinds of experiments. Qualinet DBs covers mostly image and video datasets, including special contents (e.g., 3D, HDR) and data from subjective experiments, such as subjective quality ratings or visual attention data.

A timeline with statistics on the number of records and users registered in Qualinet DBs over the years can be seen in Figure 2. Throughout Qualinet COST Action IC1003, the number of registered datasets grew from 64 in March 2011 to 201 in October 2014. The number of datasets created by Qualinet partner institutions grew from 30 in September 2011 to 83 in October 2014. The number of registered users increased from 37 in March 2013 to 222 in October 2014. Since the end of COST Action IC1003 in November 2014, the number of datasets has increased to 246 and the number of registered users to 491. The average yearly increase of approximately 56 registered users illustrates the continuing interest in, and value of, Qualinet DBs for the community.

Figure 2. Qualinet Databases statistics on the number of records and users.


Besides the Qualinet DBs online platform (http://dbq.multimediatech.cz/), additional resources are available for download via the Wiki page (http://dbq-wiki.multimediatech.cz) and the Qualinet website (http://www.qualinet.eu/). Two documents are available: (1) “QUALINET Multimedia Databases v6.5” (May 28, 2017), with a detailed description of the registered datasets, and (2) “List of QUALINET Multimedia Databases v6.5”, a searchable spreadsheet with records as of May 28, 2017.

Plans

There are indicators, especially the number of registered users, showing that Qualinet DBs is a valuable resource for the community. However, the current platform as described above has not been updated since 2014, and there are several issues to be solved, such as the burden on one institution to host and maintain the system, possible instability, an obsolete interface, issues with the Wiki page, and the lack of a file repository. Moreover, the current system requires user registration. This is very useful for usage tracking and for ensuring database privacy, but at the same time it can discourage people from using the platform and adding new datasets, and it requires the handling of personal data. There are also numerous obsolete links in Qualinet DBs; keeping them is useful for the record, but the respective databases should be archived.

A proposal for a new Qualinet DBs platform was presented at the 13th Qualinet General Meeting in June 2019 (Berlin, Germany) and was subsequently supported by the assembly. The new platform is planned to be based on a Git repository, so that the system will be open-source and text-based, and no database will be needed. The user-friendly interface is to be provided by a static website generator, and the website itself will be hosted on GitHub. A similar approach has been successfully implemented for the VQEG Software & Tools (https://vqeg.github.io/software-tools/) web portal. The main advantages of the new platform are (1) easier access (i.e., fast performance with a simple interface, no hosting fees and thus long-term sustainability, no registration necessary and thus no entry barrier), (2) a lower maintenance burden (i.e., minimal technical maintenance effort needed, easy code editing), and (3) future-proofing (i.e., databases are just text files with easy format conversion, and hosting can be done on any server).

On the other hand, the new platform will not support user registration and login, which avoids data privacy issues. Tracking of registered users will no longer be available, but database usage tracking is planned to be provided via, for example, Google Analytics. There are three levels of dataset availability in the current platform: (1) publicly available dataset, (2) information about the dataset, but data not available or available upon request, and (3) not publicly available (e.g., Qualinet members only; this level is not supported in the new platform). The migration of Qualinet DBs to the new platform is to be completed by mid-2020. The current data are to be checked and sanitized, and obsolete records moved to the archive.
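To illustrate why text-based records make such a platform easy to maintain and convert, the following Python sketch parses a record stored as plain text and renders the publishable records into a static HTML table, setting aside records at the unsupported availability level (3). The key: value file format, the field names, and the availability keywords are assumptions for illustration only; the actual file format and static site generator of the new platform are not specified here.

```python
import html

# Hypothetical plain-text record format ("key: value" per line).
RECORD = """\
name: Example Video Quality Dataset
institution: Example University
content_type: video
mos: yes
availability: public
link: https://example.org/dataset
"""

# Assumed keywords for the three availability levels named in the text.
SUPPORTED = {"public", "on_request"}  # levels (1) and (2)
UNSUPPORTED = {"members_only"}        # level (3): not supported in the new platform


def parse_record(text: str) -> dict:
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    record = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        record[key.strip()] = value.strip()
    return record


def build_site(records: list, fields=("name", "institution", "content_type", "mos")) -> tuple:
    """Render supported records as a static HTML table; return (page, skipped names)."""
    published, skipped = [], []
    for r in records:
        level = r.get("availability", "")
        if level in SUPPORTED:
            published.append(r)
        elif level in UNSUPPORTED:
            skipped.append(r.get("name", ""))
        else:
            raise ValueError(f"unknown availability level: {level!r}")
    head = "".join(f"<th>{html.escape(f)}</th>" for f in fields)
    rows = []
    for r in sorted(published, key=lambda r: r.get("name", "").casefold()):
        cells = "".join(f"<td>{html.escape(r.get(f, ''))}</td>" for f in fields)
        rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{head}</tr>{''.join(rows)}</table>", skipped


page, skipped = build_site([parse_record(RECORD),
                            {"name": "Members DB", "availability": "members_only"}])
print(page)
print(skipped)
```

In a Git-based setup, a build step of this kind could run on every commit, so no database server would be needed at any point.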

Conclusions

Broad audiovisual content with diverse characteristics, annotated with data from subjective experiments, is an enabling resource for research in multimedia signal processing, especially when QoE is considered. The availability of training and testing data is becoming even more important with the ever-increasing use of machine learning approaches. Qualinet Databases helps to facilitate reproducible research in the field and has become a valuable resource for the community.

References

  • [1] Le Callet, P., Möller, S., Perkis, A. Qualinet White Paper on Definitions of Quality of Experience, European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013. (http://www.qualinet.eu/images/stories/QoE_whitepaper_v1.2.pdf)
  • [2] Winkler, S. Analysis of public image and video databases for quality assessment, IEEE Journal of Selected Topics in Signal Processing, 6(6):616-625, 2012. (https://doi.org/10.1109/JSTSP.2012.2215007)
  • [3] Fliegel, K., Timmerer, C. (eds.) WG4 Databases White Paper v1.5: QUALINET Multimedia Database enabling QoE Evaluations and Benchmarking, Prague/Klagenfurt, Czech Republic/Austria, Version 1.5, March 2013.
  • [4] Fliegel, K., Battisti, F., Carli, M., Gelautz, M., Krasula, L., Le Callet, P., Zlokolica, V. 3D Visual Content Datasets. In: Assunção P., Gotchev A. (eds) 3D Visual Content Creation, Coding and Delivery. Signals and Communication Technology, Springer, Cham, 2019. (https://doi.org/10.1007/978-3-319-77842-6_11)

Note: Readers interested in actively contributing to the continued success of Qualinet Databases are referred to Qualinet (http://www.qualinet.eu/) and invited to join its Task Force on Qualinet Databases via the email reflector. To subscribe, please send an email to dbq.wg4.qualinet-subscribe@listes.epfl.ch. This work was partially supported by the project No. GA17-05840S “Multicriteria optimization of shift-variant imaging system models” of the Czech Science Foundation.

Report from MMSYS 2019 – by Alia Sheikh

Alia Sheikh (@alteralias) is researching immersive and interactive content. At present she is interested in the narrative language of immersive environments and how stories can best be choreographed within them.

Being part of an international academic research community and actually meeting that community are not exactly the same thing, it turns out. After attending ACM MMSys 2019, I have decided that leaving the office and actually meeting the people behind the research is very much worth doing.

This year I was invited to give an overview presentation at ACM MMSys ’19, hosted by the University of Massachusetts. The MMSys, NOSSDAV and MMVE (International Workshop on Immersive Mixed and Virtual Environment Systems) events happen back to back, in a different location each year. I was asked to talk at MMVE about some of our team’s experiments in immersive storytelling. This included our current work on light fields and my work on directing attention in, and the cinematography of, immersive environments.

To be honest it wasn’t the most convenient time to decide to catch a plane to New York and then a train to Boston for a multi-day conference, but it felt like the right time to take a break from the office and find out what the rest of the community had been working on.

Fig. 1: A picturesque scene from the wonderful University of Massachusetts Amherst campus


I arrived at Amherst the day before the conference and (along with another delegate who had taken the same bus) wandered the tranquil university grounds slightly lost before being rescued by the ever calm and cheerful Michael Zink. Michael is the chair of the MMSys organising committee and someone who later spent much of the conference introducing people with shared interests to each other – he appeared to know every delegate by name.

Once installed in my UMass hotel room, I proceeded to spend the evening on my usual pre-conference ritual: entirely rewriting my presentation.

As the timetable would have it, I was going to be the first speaker.


Fig. 2: Attendees at MMSys 2019 taking their seats


Fig. 3: Alia in full flow during our talk on day 1

I don’t actually know why I do this to myself, but there is something about turning up to the event proper that gives you a sense of what will work for that particular audience, and Michael had given me a brilliantly concise snapshot of the type of delegate that MMSys attracts – highly motivated, expert on the nuts and bolts of how to get data to where it needs to be and likely to be interested in a big picture overview of how these systems can be used to create a meaningful human connection.

Using selected examples from our research, I put together a talk on how the experience of stories in high-tech immersive environments differs from more traditional formats, and how, once the language of immersive cinematography is properly understood, we are able to create new narrative experiences that are both meaningful and emotionally rich.

The next morning I walked into an auditorium full of strangers filing in, gave my talk (I thought it went well?) and then sank happily into a plush red flip-seat chair safe in the knowledge that I was free to enjoy the rest of the event.

The next item was the keynote, easily one of the best talks I have ever experienced at a conference. Presented by Professor Nimesha Ranasinghe, it was a masterclass in taking an interesting problem (how do we transmit a full sensory experience over a network?) and presenting it in such a way as to neatly break down and explain the science (we can electrically stimulate the tongue to recreate a taste!) while never losing sight of the inherent joy in working on the kind of science you dream of as a child (therefore, electrified cutlery!).

Fig. 4: Professor Nimesha Ranasinghe during his talk on Multisensory experiences



Fig. 5: Multisensory enhanced multimedia – experiences of the future?


Fig. 6: Networking and some delicious lunch

At lunch I discovered the benefit of having presented my talk early – I made a lot of friends with people who had specific questions about our work, and got a useful heads up on work they were presenting either in the afternoon’s long papers session or the poster session.

We all spent the evening at the welcome reception on the top floor of the UMass Hotel, where we ate a huge variety of tiny, delicious cakes and got to know each other better. It was obvious that, in some cases, researchers who might collaborate remotely all year were able to use MMSys as an excellent opportunity to catch up. As a newcomer to this ACM conference, however, I have to say that I found it a very welcoming event, and I met a lot of very friendly people, many of them working on research that was entirely different to my own, but which seemed to offer an interesting insight or area of overlap.

I wasn’t surprised that I really enjoyed MMVE – virtual environments are very much my topic of interest right now. But I was delighted by how much of MMSys was entirely up my street. ACM MMSys provides a forum for researchers to present and share their latest research findings in multimedia systems, and the conference cuts across all media/data types to showcase the intersections and the interplay of approaches and solutions developed for different domains. This year, the work presented on how best to encode and transport mixed reality content, and on predicting head motion to better encode and deliver the part of a spherical panorama a viewer is likely to be looking at, was particularly interesting to me. I wondered whether comparing the predicted path of user attention with the desired path of user attention would teach us how to better control a user’s attention within a panoramic scene, or whether people’s viewing patterns were simply too variable. In the Open Datasets & Software track, I was fascinated by one particular dataset: “A Dataset of Eye Movements for the Children with Autism Spectrum Disorder”. This was a timely reminder for me that diversity within the audience needs to be catered for when designing multimedia systems, to avoid consigning sections of our audience to a substandard experience.

Of the demos, there were too many interesting ones to list, but I was hugely impressed by the demo for Multi-Sensor Capture and Network Processing for Virtual Reality Conferencing. This used cameras and Kinects to turn me into a point cloud and put a live 3D representation of my own physical body in a virtual space. It was a brilliantly simple and incredibly effective idea, and I found myself sitting next to the people responsible for it at a talk later that day, discussing ways to optimise their data compression.

Despite wearing a headset that allowed me to see the other participants, I was still able to see and therefore use my own hands in the real world – even extending to picking up and using my phone.


Fig. 7: Trying out some cool demos during a bustling demo session


Fig. 8: An example of the social media interaction from my “tweeting”

Amusingly, I found that I was (virtually) sat next to a point cloud of TNO researcher Omar Niamut, which led to my favourite Twitter exchange of the whole conference. I knew Omar from online, but we had never actually managed to meet in real life. Still, this was the most life-like digital incarnation yet!

I really should mention the Women’s and Diversity lunch event which (pleasingly) was attended by both men and women and offered some absolutely fascinating insights.

These included the value of mentors over the course of a successful academic life, how the gender pay gap is inextricably related to work-family policies, and the steps that have successfully been taken by some countries and organisations to improve work-life balance for all genders.

It was incredibly refreshing to see these topics being discussed both scientifically and openly. The conversations I had with people afterwards, as they opened up about their own experiences of work and parenthood, were among the most interesting I have ever had on the topic.

Another nice surprise: MMSys offers childcare grants for conference attendees who bring small children to the conference and require on-site childcare, or who incur extra expenses by leaving their children at home. It was very cheering to see that the Inclusion Policy did not stop at providing interesting talks but also translated into specific inclusive action.

Fig. 9: Women’s and Diversity lunch! What a wonderful initiative – well done MMSys and SIGMM

I am delighted that I made the decision to attend MMSys. I had not realised that I was feeling somewhat detached from my peers and the academic research community in general, until I was put in an environment which contained a concentrated amount of interesting research, interesting researchers and an air of collaboration and sheer good will. It is easy to get tunnel vision when you are focused on your own little area of work, but every conversation I had at the conference reminded me that research does not happen in a vacuum.

Fig. 10: A fascinating talk at the Women’s and Diversity lunch – it initiated great post-event discussions!

Fig. 11: The food truck experience – one of many wonderful social aspects to MMSys 2019

I could write a thousand more words about every interesting thing I saw or person I met at MMSys, but that would only give you my own specific experience of the conference. (I did live-tweet* a lot of the talks and demos for my own records, and those tweets can all be found here: https://twitter.com/Alteralias/status/1148546945859952640?s=20)

Fig. 12: Receiving the SIGMM Social Media Reporter Award for MMSys 2019!

Whether you were someone I sat next to at a paper session, someone I chatted with while standing in line at the food truck (one of the many sociable meal events), or someone who demoed their PhD work to me, thank you so much for sharing this event with me.

Maybe I will see you at MMSys 2020.

* P.S. It turns out that if you live-tweet an entire conference, Niall gives you a Social Media Reporter award.

An Interview with Professor Susanne Boll

Describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? 

My journey into research started with my interest in computers and computer science during my early years at school. I liked all the STEM subjects and was very good at them. I first came into contact with programming, and with the first Mac, in high school, when my physics teacher started the first basic programming course. After high school, I continued on this journey: I became a Mathematical-Technical Assistant1, went on to study computer science, and then did a PhD. I was always driven by the desire to learn more and to explore and understand more of this field.

Why were you initially attracted to Multimedia? 

Susanne Boll at the beginning of her research career in 2001

I was initially attracted to multimedia when information systems research started to look at novel methods of integrating large amounts of unstructured multimedia and different media types into structured database systems. I joined the GMD Institute for Integrated Publication and Information Systems, which was working on multimedia database systems. My PhD was on multimedia document models for representing and replaying multimedia presentations in the context of multimedia information systems. One of the most inspiring early events was a small but very nice IFIP working conference on Database Semantics – Semantic Issues in Multimedia Systems, held in New Zealand in 1999. There I met many researchers from the multimedia community, some of whom I still consider my research friends today. I stayed in the field of multimedia. However, my work always related to the applications of multimedia and the interaction with the user, so it was not surprising that I moved into the field of Human-Computer Interaction and SIGCHI, of which I am still an active member today. Over the last three decades, I have worked in the field of interactive multimedia and human-computer interaction in different application domains, from personal media to health and from mobility to Industry 4.0. To quote a much-valued friend of mine whom I recently met again: “I enjoy it when my research makes me smile”, that is, when I can see how research can be translated into applications and put to better use.

Why did I volunteer for the role of the director for diversity and outreach? 

Professor Susanne Boll in 2019

For more than three decades now, I have supported gender equality as a mentor, in different roles, in committees and institutions, by speaking up and by driving actions. Within the multimedia community, I observed that there are many individuals supporting and acting for better gender equality. However, these remained the efforts of individuals, and we as a community were not able to turn them into a collective understanding.

There were actually a few recent events related to SIGMM that made me truly sad and made me consider whether I should leave this community, which at the same time I consider my home community. Some years ago, I watched a panel in which only men were discussing the future and challenges of multimedia. Observing this was painful for me. I knew and had met each of them individually over the years, and they were interesting researchers and great mentors. But that panel made it obvious once again that we as a community had failed to be inclusive with regard to women. Why would there not be an excellent woman having her say on that panel? Why would someone organizing the panel not consider being inclusive with regard to gender? Why would the panelists, when invited, not ask who else would be on the panel and encourage this?

When I talk about gender equality these days, I almost immediately get the reaction that gender is not diversity. People say that looking at gender equality is too short-sighted and that I should care about diversity and not gender alone. So let me say clearly that I am well aware that diversity is not just gender; it is much more than that. But don’t let the perfect be the enemy of the good. My personal story starts with gender equality in STEM fields. Looking at women’s participation in SIGMM, I decided that the actions described in the “25 in 25” strategy would be a good starting point for my new role. It is just the beginning.

What are my plans for serving in this position?

Within SIGMM, we need to understand and fully embrace the different dimensions of diversity. We should not use the term as an easy cover for a multitude of aspects in which individual needs get blurred. I sometimes have the feeling that one aspect of diversity is treated as if it could be traded for another, and that the term is used as if there were a measure of whether there is “sufficient” diversity in a given setting.

As director for diversity and outreach, I will care about the richness of diversity. I want to bring the different dimensions of diversity into the multimedia community and help us understand, embrace, listen and take action for better diversity and outreach in SIGMM.


1 Mathematical-Technical Assistant (MaTA, MA or MTA for short; also: mathematical-technical software developer) is the occupational title of a recognised training occupation under the Vocational Training Act in Germany, which has existed since the mid-1960s. It was the first non-academic training occupation in data processing.


Bios

Prof. Susanne Boll: 

Susanne Boll is a full professor for Media Informatics and Multimedia Systems at the University of Oldenburg and a member of the board of the OFFIS Institute for Information Technology. OFFIS is among the top 5% of non-university research institutes in computer science in Germany. Over the last two decades, she has consistently achieved highly competitive research results in the fields of multimedia and human-computer interaction. She has actively driven these fields through many scientific research projects and the organization of highly visible events. Her scientific results have been published at competitive peer-reviewed international conferences such as Multimedia, CHI, MobileHCI, AutomotiveUI, DIS and IDC, as well as in internationally recognized journals. Her research makes competitive contributions to the fields of human-computer interaction and ubiquitous computing. Her research projects also have a strong connection to industry and application partners and address highly relevant challenges in the application fields of automation in transportation systems and health care technologies. She is an active member of the scientific community and has co-chaired and organized many international events in her field. Her teaching combines theoretical foundations with team-oriented and research-oriented practical assignments. She currently leads a highly visible international team of researchers (PhD students, research associates, postdocs and senior principal scientists).


Opinion Column: Fairness, Accountability and Transparency (in Multimedia)

The inclusiveness and transparency of automatic information processing methods is a research topic that has attracted growing interest in recent years. In the era of digitized decision-making software, where the push for artificial intelligence is happening worldwide and at different strata of the socio-economic fabric, the consequences of biased, unexplainable and opaque methods for content analysis can be dramatic.

Several initiatives have arisen to address these issues in different communities. From 2014 to 2018, the FAT/ML workshop was co-located with the International Conference on Machine Learning. This year, the FATE/CV workshop (E standing for Ethics) was co-located with the Conference on Computer Vision and Pattern Recognition (CVPR). Similarly, the FAT/MM workshop is co-located with ACM Multimedia 2019. These initiatives, and specifically the FAT/ML workshop series, led to the birth of the ACM FAT* conference. Its first edition took place in New York in 2018, its second this year in Atlanta, and its third edition will be held next year in Barcelona.

ACM FAT* is a very recent conference dedicated to bringing together a multidisciplinary community of researchers from computer science, law, social sciences and humanities to investigate and tackle issues in this emerging area. The conference is not limited to technological solutions to potential bias. It also addresses the question of whether decisions should be outsourced to data- and code-driven computing systems at all. This question is very timely given the impressive number of algorithmic systems, fueled by big data, that are being adopted in a growing number of contexts. These systems aim to filter, sort, score, recommend, personalize and shape human experience. They increasingly make or inform decisions with major impact on credit, insurance, healthcare and immigration, to cite a few key fields with inherent critical risks.

In this context, we believe that the multimedia community should direct its efforts the same way, investigating how to transform current technical tools and methodologies to derive computational models that are transparent and inclusive. Information processing is one of the fundamental pillars of multimedia. Whether data is processed for content delivery, experience or systems applications, automatic content analysis is used in every corner of our community. Typical risks of large-scale computational models include model bias and algorithmic discrimination. These risks become particularly prominent in the multimedia field, which has historically focused on user-centered technologies. This is why it is crucial to start bringing the notions of fairness, accountability and transparency into ACM Multimedia.

ACM Multimedia 2019 in Nice will benefit from two main initiatives to start engaging with the trend of Fairness, Accountability and Transparency. First, one of the workshops co-located with ACM Multimedia 2019 (as mentioned above) will deal with Fairness, Accountability and Transparency in Multimedia (FAT/MM, to be held on October 27th). The FAT/MM workshop is the first attempt to foster research efforts that address fairness, accountability and transparency issues in the multimedia field. To ensure a healthy and constructive development of the best multimedia technologies, the workshop offers a space to discuss how to develop fair, unbiased, representative and transparent multimedia models. It brings together researchers from different areas to present computational solutions to these issues.

Second, one of the two Conference Ambassadors selected by SIGMM for 2019 attended the FATE/CV workshop at CVPR earlier this year, identified a speaker who could be of great interest to the multimedia field, and invited them to FAT/MM to meet and discuss with the multimedia community. The selected paper covers topics such as age bias in datasets and the impact this could have on real-world applications, such as autonomous driving or recommendation systems.

We hope that, by organising and getting strongly involved in these two initiatives, we can raise awareness within our community and ultimately build a group of researchers interested in analysing and solving potential issues associated with fairness, accountability and transparency in multimedia.