Authors:
Bart Thomee - Google, San Bruno, CA, USA
Michael Riegler - Simula Metropolitan Center for Digital Engineering, Oslo, Norway
Francesca de Simone - CWI, Amsterdam, Netherlands
Gwendal Simon - IMT Atlantique, Rennes, France
Editors: Martha Larson and Bart Thomee
This column discusses the efforts of ACM SIGMM towards sharing and reproducibility. In addition to hosting dedicated sessions on open source software and datasets, ACM Multimedia Systems began awarding official ACM badges last year to articles that make their artifacts available. This year set a record, with 45% of the articles earning such a badge.
Without data it is impossible to put theories to the test. Moreover, without running code it is tedious at best to (re)produce and evaluate any results. Yet collecting data and writing code can be a road full of pitfalls, ranging from datasets containing copyrighted materials to algorithms containing bugs. The ideal datasets and software packages are those that are open and transparent for the world to look at, inspect, and use with few or no restrictions. Such “artifacts” make it possible to establish public consensus on their correctness or, failing that, to start a dialogue on how to fix any identified problems.
In our interconnected world, storing and sharing information has never been easier. Despite the temptation for researchers to keep datasets and software to themselves, a growing number are willing to share their resources with others. To further promote this sharing behavior, conferences, workshops, publishers, non-profit organizations, and even for-profit companies are increasingly recognizing and supporting these efforts. For example, the ACM Multimedia conference has hosted an open source software competition since 2004, and the ACM Multimedia Systems conference has included an open datasets and software track since 2011. The ACM Digital Library now also awards badges for artifacts that have been made publicly available and, optionally, reviewed and verified by members of the community. At the same time, organizations such as Zenodo and Amazon host open datasets for free. Sharing ultimately pays off: the citation statistics for ACM Multimedia Systems conferences over the past five years, for example, show that half of the 20 most cited papers shared data and code, even though such papers have so far made up only a small fraction of those published.
Good practices are increasingly being adopted. In this year’s edition of the ACM Multimedia Systems conference, 69 works (papers, demos, datasets, software) were accepted, of which 31 (45%) were awarded an ACM badge. This is a large increase compared to last year, when only 13 of 42 works (31%) received one. This growth furthers one of the core objectives of both the conference and SIGMM, namely the move towards open science. At this moment, the ACM Digital Library does not separately index which papers received a badge, making it challenging to find all papers that have one. It further appears that not many other ACM conferences are aware of the badges yet; for example, while ACM Multimedia accepted 16 open source papers in 2016 and 6 papers in 2017, none applied for a badge. This year at ACM Multimedia Systems only “artifacts available” badges have been awarded. For next year, our intention is to ensure that all dataset and software submissions receive the “artifacts evaluated” badge. This would require several committed community members to spend time working with the authors to get the artifacts running on all major platforms, accompanied by detailed documentation.
The accepted artifacts this year are diverse in nature: several submissions focus on releasing artifacts related to quality of experience of (mobile/wireless) streaming video, while others center on making datasets and tools related to images, videos, speech, sensors, and events available; in addition, there are a number of contributions in the medical domain. It is great to see such a range of interests in our community!