Report from the MMM 2020 Special Session on Multimedia Datasets for Repeatable Experimentation (MDRE 2020)

Aaron Duane is a postdoctoral researcher at the School of Computing at Dublin City University, Ireland, and an expert in human-computer interaction. His research focuses on large-scale data analytics, especially in the context of personal data and lifelogging.

Björn Þór Jónsson (http://www.itu.dk/people/bjth/) is an associate professor in the Computer Science Department at the IT University of Copenhagen, Denmark. His research focuses on interactivity and scalability of multimedia analytics and multimedia retrieval applications.

Cathal Gurrin (http://www.computing.dcu.ie/~cgurrin/) is an associate professor at the School of Computing at Dublin City University, Ireland, and a principal co-investigator at the Insight Centre for Data Analytics. His research focuses on multimedia information retrieval and personal data analytics, with special emphasis on lifelogging applications.

Klaus Schöffmann (https://www.KlausSchoeffmann.com/) is an associate professor at the Institute of Information Technology (ITEC) at Klagenfurt University. His research focuses on medical video analysis, multimedia retrieval, machine learning, and interactive video search.

Introduction

Information retrieval and multimedia content access have a long history of comparative evaluation, and many of the advances in the area over the past decade can be attributed to the availability of open datasets that support comparative and repeatable experimentation. Sharing data and code so that other researchers can replicate published results is therefore essential in the multimedia modeling field, as it helps to improve the performance of systems and the reproducibility of published papers.

This report summarizes the special session on Multimedia Datasets for Repeatable Experimentation (MDRE 2020), which was organized at the 26th International Conference on MultiMedia Modeling (MMM 2020), held in January 2020 in Daejeon, South Korea.

The intent of these special sessions is to provide a venue for releasing datasets to the multimedia community and for discussing dataset-related issues. The presentation mode in 2020 consisted of short presentations (approximately 8 minutes each), followed by a panel discussion moderated by Aaron Duane. In the following, we summarize the special session, including its talks, questions, and discussions.

Presentations

GLENDA: Gynecologic Laparoscopy Endometriosis Dataset

The session began with a presentation on ‘GLENDA: Gynecologic Laparoscopy Endometriosis Dataset’ [1], given by Andreas Leibetseder from the University of Klagenfurt. The researchers worked with experts on gynecologic laparoscopy, a type of minimally invasive surgery (MIS) that is performed via a live video feed of a patient’s abdomen, used to survey the insertion and handling of the various instruments needed to conduct medical treatments. Not only does this kind of surgical intervention facilitate a great variety of treatments, but the possibility of recording its video streams is also essential for numerous post-surgical activities, such as treatment planning, case documentation and education. Manually analyzing these surgical recordings, as is done in current practice, is usually tedious and time-consuming. To improve on this situation, more sophisticated computer vision and machine learning approaches are actively being developed. Since most of these approaches rely heavily on sample data, which is only sparsely available, especially in the medical field, the researchers published the Gynecologic Laparoscopy ENdometriosis DAtaset (GLENDA): an image dataset containing region-based annotations of a common medical condition called endometriosis.

Endometriosis is a disorder in which tissue similar to the lining of the uterus grows outside the uterus. Andreas explained that this dataset is the first of its kind and was created in collaboration with leading medical experts in the field. GLENDA contains over 25K images, about half of which are pathological, i.e., showing endometriosis, and the other half non-pathological, i.e., containing no visible endometriosis. The accompanying paper thoroughly described the data collection process and the dataset’s properties and structure, while also discussing its limitations. The authors plan to extend GLENDA continuously, including adding other relevant categories and, ultimately, lesion severities. Furthermore, they are in the process of collecting specific “endometriosis suspicion” class annotations in all categories, to capture the common situation in which even endometriosis specialists find it difficult to classify an anomaly without further inspection. This difficulty may have several causes, such as visible video artifacts. Including such challenging examples in the dataset may greatly improve the quality of endometriosis classifiers.

Kvasir-SEG: A Segmented Polyp Dataset

The second presentation was given by Debesh Jha from the Simula Research Laboratory, who introduced the work entitled ‘Kvasir-SEG: A Segmented Polyp Dataset’ [2]. Debesh explained that pixel-wise image segmentation is a highly demanding task in medical image analysis. As with GLENDA, annotated medical images with corresponding segmentation masks are difficult to find in practice. The Kvasir-SEG dataset is an open-access corpus of gastrointestinal polyp images and corresponding segmentation masks, manually annotated and verified by an experienced gastroenterologist. The researchers demonstrated the use of their dataset with both a traditional segmentation approach and a modern CNN-based deep learning approach. In addition to presenting the Kvasir-SEG dataset, Debesh discussed the two approaches to automatic polyp segmentation presented in their paper: fuzzy c-means (FCM) clustering and a ResUNet-based model. The results showed that the ResUNet model outperformed FCM clustering.

The researchers released Kvasir-SEG as an open-access dataset to the multimedia and medical research communities, in the hope that it can help evaluate and compare existing and future computer vision methods. By adding segmentation masks to the Kvasir dataset, which until then had only frame-level annotations, the authors have enabled multimedia and computer vision researchers to contribute to the field of polyp segmentation and automatic analysis of colonoscopy videos. This could boost the performance of other computer vision methods and may be an important step towards building clinically acceptable CAI methods for improved patient care.
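For readers unfamiliar with the traditional baseline mentioned above, the following is a minimal, textbook-style sketch of fuzzy c-means (FCM) clustering applied to grey-level pixel intensities. It is not the Kvasir-SEG authors’ implementation: the 1-D intensity input, the evenly spaced initial centers, the fuzzifier m = 2 and the 0.5 membership threshold used to derive a binary mask are all illustrative assumptions, and the pixel values are made up.

```python
from typing import List, Tuple


def fuzzy_c_means(
    values: List[float],
    n_clusters: int = 2,
    m: float = 2.0,
    max_iter: int = 100,
    tol: float = 1e-6,
) -> Tuple[List[float], List[List[float]]]:
    """Textbook fuzzy c-means on 1-D values such as grey-level pixel intensities.

    Returns (centers, memberships), where memberships[k][i] is the degree to
    which values[k] belongs to cluster i (each row sums to 1). Memberships
    come from the last iteration, so they match the centers to within tol.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    if n_clusters < 1 or len(values) < n_clusters:
        raise ValueError("need at least n_clusters values")

    # Deterministic initialisation (an assumption): spread centers over the value range.
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_clusters - 1) if n_clusters > 1 else 0.0
    centers = [lo + step * i for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []

    for _ in range(max_iter):
        # Membership step: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A value lying exactly on one (or more) centers belongs to them alone.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
            memberships.append(row)

        # Center step: mean of the values weighted by membership ** m.
        new_centers = []
        for i, old in enumerate(centers):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(
                sum(w * x for w, x in zip(weights, values)) / total if total > 0 else old
            )

        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, memberships


# Made-up toy example: dark background pixels vs. bright foreground pixels.
pixels = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
centers, memberships = fuzzy_c_means(pixels)
# Hard mask: a pixel is foreground if it belongs mostly to the brighter cluster.
bright = max(range(len(centers)), key=lambda i: centers[i])
mask = [row[bright] > 0.5 for row in memberships]
```

Unlike hard k-means, every pixel keeps a graded membership in each cluster, so the threshold that turns memberships into a segmentation mask can be tuned afterwards without re-clustering.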

Rethinking the Test Collection Methodology for Personal Self-Tracking Data

The third presentation was given by Cathal Gurrin from Dublin City University and was titled ‘Rethinking the Test Collection Methodology for Personal Self-Tracking Data’ [3]. Cathal argued that, although individuals gather vast volumes of personal data daily, the MMM community has largely not tackled the challenge of developing novel retrieval algorithms for this data, because of the difficulty of accessing such data in the first place. While initial efforts have taken place on a small scale, the authors conjecture that a new evaluation paradigm is required to make progress in analysing, modeling and retrieving from personal data archives. In their position paper, the researchers proposed a new model of Evaluation-as-a-Service that re-imagines the test collection methodology for personal multimedia data in order to address the many challenges of releasing such test collections.

After providing a detailed overview of prior research on the creation and use of self-tracking data for research, the authors identified issues that emerge when creating test collections of self-tracking data of the kind commonly used by shared evaluation campaigns. These include, in particular, the challenge of finding self-trackers willing to share their data, legal constraints that require expensive data preparation and cleaning before any public release, and ethical considerations. The Evaluation-as-a-Service model is a novel evaluation paradigm meant to address these challenges by enabling collaborative research on personal self-tracking data. The model relies on a central data infrastructure that guarantees full protection of the data while still allowing algorithms to operate on it. Cathal highlighted the importance of data banks in this scenario. Finally, he briefly outlined the technical aspects of setting up a shared evaluation campaign on self-tracking data.
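To make the Evaluation-as-a-Service idea more concrete, the following Python sketch shows the basic flow under stated assumptions: participants submit an algorithm instead of downloading the data, the service runs it against the protected collection, and only an aggregate score (here, mean precision at k) is returned. The class EvaluationService, the keyword_ranker baseline, the choice of metric and the toy records are all hypothetical and are not taken from the position paper. In particular, a plain Python object isolates nothing, so a real deployment would have to enforce data protection at the infrastructure level, for example by executing submissions in a sandbox.

```python
from typing import Callable, Dict, Iterable, List, Mapping

# Hypothetical interface for a submitted algorithm: given a query and the
# records it may see inside the service, return record ids ranked best-first.
Ranker = Callable[[str, Dict[str, str]], List[str]]


class EvaluationService:
    """Hypothetical toy host: data and relevance judgments never leave it."""

    def __init__(
        self,
        records: Mapping[str, str],
        queries: Iterable[str],
        qrels: Mapping[str, Iterable[str]],
        k: int = 10,
    ) -> None:
        self._records = dict(records)
        self._queries = list(queries)
        self._qrels = {q: set(ids) for q, ids in qrels.items()}
        self._k = k

    def evaluate(self, ranker: Ranker) -> float:
        """Run a submitted ranker on the protected data; return only mean P@k."""
        scores = []
        for query in self._queries:
            # Hand the ranker a copy so it cannot alter the stored collection.
            ranked = ranker(query, dict(self._records))
            relevant = self._qrels.get(query, set())
            hits = sum(1 for rid in ranked[: self._k] if rid in relevant)
            scores.append(hits / self._k)
        return sum(scores) / len(scores) if scores else 0.0


def keyword_ranker(query: str, records: Dict[str, str]) -> List[str]:
    """A hypothetical participant baseline: rank records by query-term frequency."""
    terms = query.lower().split()

    def score(rid: str) -> int:
        return sum(records[rid].lower().count(t) for t in terms)

    return sorted(records, key=score, reverse=True)


# Made-up toy collection standing in for private lifelog annotations.
service = EvaluationService(
    records={
        "img1": "coffee at the office",
        "img2": "walking the dog in the park",
        "img3": "coffee with friends",
        "img4": "dog sleeping on the sofa",
    },
    queries=["coffee", "dog"],
    qrels={"coffee": ["img1", "img3"], "dog": ["img2", "img4"]},
    k=2,
)
print(service.evaluate(keyword_ranker))  # prints 1.0; the participant sees only this score
```

The interface is deliberately one-way: the submitted function receives data only inside the service, and the participant gets back a single number rather than ranked lists or records.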

Experiences and Insights from the Collection of a Novel Multimedia EEG Dataset

The final presentation of the session was also given by Cathal Gurrin from Dublin City University, who introduced the work ‘Experiences and Insights from the Collection of a Novel Multimedia EEG Dataset’ [4]. This work described the growing interest in utilising novel signal sources such as EEG (electroencephalography) in multimedia research. When using such signals, subtle limitations are often not readily apparent without significant domain expertise. Multimedia research outputs incorporating EEG signals can fail to replicate when only minor modifications are made to an experiment, or when seemingly unimportant (or unstated) details are changed. Cathal argued that this can lead to over-optimistic or over-pessimistic views of the potential real-world utility of these signals in multimedia research activities.

In their paper, the researchers described the EEG/MM dataset and presented a summary of the experiences and knowledge gained during the preparation (and utilisation) of the dataset, which supported a collaborative neural-image labelling benchmarking task. They stated that the goal of this task was to collaboratively identify machine learning approaches that would support the use of EEG signals in areas such as image labelling and multimedia modeling or retrieval. The researchers stressed that this research is relevant to the multimedia community, as it suggests a template experimental paradigm (along with datasets and a baseline system) upon which researchers can explore multimedia image labelling using a brain-computer interface. In addition, the paper provided insights into commonly encountered issues (and useful signals) when conducting research that utilises EEG in multimedia contexts, as well as into how an EEG dataset can be used to support such a collaborative benchmarking task.

Discussion

After the presentations, Aaron Duane moderated a panel discussion in which all presenters participated, as well as Björn Þór Jónsson, who joined the panel as one of the special session chairs.

The panel began with a question about how the research community should address data anonymity in large multimedia datasets, given that, even if a dataset is isolated and anonymised, data analysis techniques can be used to reverse this process partially or completely. The panel agreed this was an important question and acknowledged that there is no simple answer. Cathal Gurrin stated that the onus is somewhat less restrictive for the datasets used in such research, because the data owners often provide their data with full knowledge of how it will be used.

As a follow-up, the questioner asked the panel about GDPR compliance in this context, and about the fact that uploaders could change their minds about allowing their data to be used in research several years after its release. The panel acknowledged that this remains an open concern and raised a further one, namely the malicious uploading of data without the owner’s consent. One solution proposed by the panel was an additional layer of security in the form of a human curator who reviews the security and privacy concerns of a dataset during its generation, as is the case with some datasets of personal data currently being released to the community.

Much of the ensuing discussion remained focused on effective privacy in datasets, especially those containing personal data, such as the data generated by lifeloggers. One audience member recalled a case where a personal dataset was publicly released and others were able to glean personal information about people who were not the original uploader of the dataset and who had not consented to their face or personal information being publicly released. Cathal and Björn acknowledged that this remains an issue but drew attention to advanced censoring techniques, such as automatic face blurring, which are rapidly maturing. Furthermore, they argued that the Evaluation-as-a-Service model proposed in Cathal’s earlier presentation could help to further alleviate some of these concerns.

Steering the conversation away from data privacy, Aaron asked Debesh and Andreas about the challenges and limitations of working directly with medical professionals to create datasets related to medical disorders. Debesh stated that there were numerous challenges, such as the medical professionals being unfamiliar with the tools used to create the datasets, and the frequent need to consult multiple medical professionals because they would often disagree. This generated significant technical and administrative overhead for the researchers, which considerably slowed their progress. Andreas reported that he and his colleagues had faced the same issues, and highlighted the importance of effective communication between the medical experts and the technical researchers.

Towards the end of the discussion, the panel considered how to encourage the release of more large-scale multimedia datasets for experimentation, and what challenges are currently associated with doing so. The panellists agreed that the process remains difficult, but that special sessions such as this one are very helpful. Papers describing multimedia datasets are also gaining increasing recognition, with many exceptional dataset papers earning hundreds of citations within the community. The panel also stated that we should be mindful of the nature of each dataset, as repeatedly releasing the same type of dataset is not beneficial and has the potential to do more harm than good.

Conclusions

The MDRE special session, in its second incarnation at MMM 2020, was organised to facilitate the publication of high-quality datasets and to foster community discussion on the methodology of dataset creation. The creation of reliable and shareable research artifacts, such as datasets with reliable ground truths, usually represents tremendous effort; effort that is rarely valued by publication venues, funding agencies or research institutions. In turn, this leads many researchers to focus on short-term research goals, with an emphasis on improving results on existing and often outdated datasets by small margins, rather than boldly venturing where no researchers have gone before. Overall, we believe that more emphasis on reliable and reproducible results would serve our community well, and the MDRE special session is a small effort towards that goal.

Acknowledgements

The session was organized by the authors of the report, in collaboration with Duc-Tien Dang-Nguyen (Dublin City University), who could not attend MMM. The panel format of the special session made the discussions much more engaging than in a traditional special session. We would like to thank the presenters and their co-authors for their excellent contributions, as well as the members of the audience who contributed greatly to the session.

References

[1] Leibetseder A., Kletz S., Schoeffmann K., Keckstein S., and Keckstein J. “GLENDA: Gynecologic Laparoscopy Endometriosis Dataset.” In: Cheng WH. et al. (eds) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962, 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_36.
[2] Jha D., Smedsrud P.H., Riegler M.A., Halvorsen P., De Lange T., Johansen D., and Johansen H.D. “Kvasir-SEG: A Segmented Polyp Dataset.” In: Cheng WH. et al. (eds) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962, 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_37.
[3] Hopfgartner F., Gurrin C., and Joho H. “Rethinking the Test Collection Methodology for Personal Self-tracking Data.” In: Cheng WH. et al. (eds) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962, 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_38.
[4] Healy G., Wang Z., Ward T., Smeaton A., and Gurrin C. “Experiences and Insights from the Collection of a Novel Multimedia EEG Dataset.” In: Cheng WH. et al. (eds) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962, 2020. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_39.