Editors: Mihai Gabriel Constantin (National University of Science and Technology Politehnica Bucharest, Romania), Karel Fliegel (Czech Technical University in Prague, Czech Republic), Maria Torres Vega (KU Leuven, Belgium)
In this final part of the Overview of Open Dataset Sessions and Benchmarking Competitions, we focus on the latest editions of some of the most popular multimedia-centric benchmarking competitions, continuing our reviews from previous years (https://records.sigmm.org/2023/01/19/overview-of-open-dataset-sessions-and-benchmarking-competitions-in-2022-part-3/). This third part of our review covers two benchmarking competitions:
- MediaEval 2023 (https://multimediaeval.github.io/editions/2023/). We present the five benchmarking tasks, which target a wide range of topics, including medical multimedia applications (Medico), multimodal understanding of smells (Musti), multimodal content in news media (NewsImages), social media video memorability (Memorability), and sports action classification (SportsVideo).
- ImageCLEF 2024 (https://www.imageclef.org/2024). This edition of ImageCLEF targets a wide range of tasks, covering four different medical-focused tasks (medical captions, Visual Question Answering, remote medicine, and GANs in medical scenarios), recommendation systems for editorials, image retrieval and generation, and pictogram generation from textual information.
For an overview of the QoMEX 2023 and QoMEX 2024 conferences, please see the first part of this column (https://records.sigmm.org/2024/09/07/overview-of-open-dataset-sessions-and-benchmarking-competitions-in-2023-2024-part-1-qomex-2023-and-qomex-2024/); for an overview of the MDRE special sessions at MMM 2023 and MMM 2024, please see the second part of this column (https://records.sigmm.org/2024/11/19/overview-of-open-dataset-sessions-and-benchmarking-competitions-in-2023-2024-part-2-mdre-at-mmm-2023-and-mmm-2024/).
MediaEval 2023
The MediaEval Multimedia Evaluation benchmark (https://multimediaeval.github.io/) offers challenges in artificial intelligence for multimedia data, engaging participants in benchmarking tasks centered around retrieval, classification, generation, analysis, and exploration of multimodal data. The latest editions of MediaEval also seek to delve deeper into understanding the data, trends, and system performance, by proposing a set of Quest for Insight (Q4I) questions and themes for each task. A column signed by the Coordination Committee of the latest MediaEval edition, outlining MediaEval’s history, impressions from the latest edition, and plans for the future, is published in the October 2024 edition of our records (https://records.sigmm.org/2024/11/15/one-benchmarking-cycle-wraps-up-and-the-next-ramps-up-news-from-the-mediaeval-multimedia-benchmark/). MediaEval 2023 (https://multimediaeval.github.io/editions/2023/) was held on 1-2 February 2024, co-located with MMM 2024 in Amsterdam, Netherlands, and the Coordination Committee was composed of Mihai Gabriel Constantin (University Politehnica of Bucharest, Romania), Steven Hicks (SimulaMet, Norway), and Martha Larson (Radboud University, Netherlands) as the main coordinator.
Medical Multimedia Task – Transparent Tracking of Spermatozoa
Paper available at: https://ceur-ws.org/Vol-3658/paper1.pdf
Vajira Thambawita, Andrea Storås, Tuan-Luc Huynh, Hai-Dang Nguyen, Minh-Triet Tran, Trung-Nghia Le, Pål Halvorsen, Michael Riegler, Steven Hicks, Thien-Phuc Tran
SimulaMet, Norway, OsloMet, Norway, University of Science, VNU-HCM, Vietnam, Vietnam National University, Ho Chi Minh City, Vietnam
Dataset available at: https://multimediaeval.github.io/editions/2023/tasks/medico/
The Medico task provides a set of spermatozoa videos, tracked with a set of frame-by-frame bounding box annotations, tasking participants with the prediction of standard sperm quality assessment measurements, specifically the motility (movement) of spermatozoa (living sperm cells).
Musti: Multimodal Understanding of Smells in Texts and Images
Paper available at: https://ceur-ws.org/Vol-3658/paper34.pdf
Ali Hürriyetoğlu, Inna Novalija, Mathias Zinnen, Vincent Christlein, Pasquale Lisena, Stefano Menini, Marieke van Erp, Raphael Troncy
KNAW Humanities Cluster, DHLab, Jožef Stefan Institute, Slovenia, Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, EURECOM, Sophia Antipolis, France, Fondazione Bruno Kessler, Trento, Italy
Dataset available at: https://multimediaeval.github.io/editions/2023/tasks/musti/
Musti is an innovative task, seeking to understand the descriptions and depictions of smells in multilingual texts (English, German, Italian, French, Slovenian) and images from the 17th to the 20th century. Participants must create systems that recognize references to smells in texts and images, connecting these references across different modalities.
NewsImages: Connecting Text and Images
Paper available at: https://ceur-ws.org/Vol-3658/paper4.pdf
Andreas Lommatzsch, Benjamin Kille, Özlem Özgöbek, Mehdi Elahi, Duc Tien Dang Nguyen
Technische Universität Berlin, Berlin, Germany, Norwegian University of Science and Technology, Trondheim, Norway, University of Bergen, Bergen, Norway.
Dataset available at: https://multimediaeval.github.io/editions/2023/tasks/newsimages/
In this edition of the NewsImages task, participants are encouraged to discover patterns and models that describe the relation between the images and texts of news articles, including article bodies and headlines.
Predicting Video Memorability
Paper available at: https://ceur-ws.org/Vol-3658/paper2.pdf
Mihai Gabriel Constantin, Claire-Hélène Demarty, Camilo Fosco, Alba García Seco de Herrera, Sebastian Halder, Graham Healy, Bogdan Ionescu, Ana Matran-Fernandez, Rukiye Savran Kiziltepe, Alan F. Smeaton, Lorin Sweeney
University Politehnica of Bucharest, Romania, InterDigital, France, Massachusetts Institute of Technology Cambridge, USA, University of Essex, UK, Dublin City University, Ireland, Karadeniz Technical University, Turkey
Dataset available at: https://multimediaeval.github.io/editions/2023/tasks/memorability/
The organizers propose a task that studies the long-term memorability of social media-like videos, providing participants with an extensive dataset of videos with memorability annotations, related information, pre-extracted state-of-the-art visual features, and electroencephalography (EEG) recordings.
SportsVideo: Fine Grained Action Classification and Position Detection in Table Tennis and Swimming Videos
Paper available at: https://ceur-ws.org/Vol-3658/paper3.pdf
Aymeric Erades, Pierre-Etienne Martin, Romain Vuillemot, Boris Mansencal, Renaud Peteri, Julien Morlier, Stefan Duffner, Jenny Benois-Pineau
Ecole Centrale de Lyon, LIRIS, France, CCP Department, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, University of Bordeaux, Labri, France, INSA Lyon, LIRIS, France
Dataset available at: https://multimediaeval.github.io/editions/2023/tasks/sportsvideo/
The organizers developed a set of six sub-tasks covering table tennis and swimming, related to athlete position detection, stroke detection, motion classification, field or table registration, sound detection in sports, and score and result extraction from visual cues.
ImageCLEF 2024
ImageCLEF (https://www.imageclef.org/) is part of the popular CLEF initiative (https://www.clef-initiative.eu/), and states as its main goal the evaluation of technologies for annotation, indexing, classification, and retrieval of multimodal data. The 2024 edition of ImageCLEF (https://www.imageclef.org/2024) was held on 9-12 September 2024, in Grenoble, France, with an Organization Committee composed of Bogdan Ionescu, Henning Müller, Ana-Maria Drăgulinescu, Ivan Eggel, and Liviu-Daniel Ștefan.
ImageCLEFmedical Caption
Paper available at: https://ceur-ws.org/Vol-3740/paper-132.pdf
Johannes Rückert, Asma Ben Abacha, Alba G. Seco de Herrera, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Benjamin Bracke, Hendrik Damm, Tabea M. G. Pakull, Cynthia Sabrina Schmidt, Henning Müller, Christoph M. Friedrich
Department of Computer Science, University of Applied Sciences and Arts Dortmund, Dortmund, Germany, Microsoft, Redmond, Washington, USA, University of Essex, UK, UNED, Spain, Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Germany, Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Germany, Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany, University of Applied Sciences Western Switzerland (HES-SO), Switzerland, University of Geneva, Switzerland
Dataset available at: https://www.imageclef.org/2024/medical/caption
The medical caption task focuses on evaluating models that detect medical concepts and automatically create captions for medical images, which can be further applied for context-based image and information retrieval purposes.
ImageCLEFmed VQA
Paper available at: https://ceur-ws.org/Vol-3740/paper-131.pdf
Steven Hicks, Andrea Storås, Pål Halvorsen, Michael Riegler, Vajira Thambawita
SimulaMet, Oslo, Norway, OsloMet - Oslo Metropolitan University, Oslo, Norway
Dataset available at: https://www.imageclef.org/2024/medical/vqa
This edition of the medical VQA task focuses on images of the gastrointestinal tract, tasking participants with using artificial intelligence to generate medical images from text input, while also exploring optimal prompts for off-the-shelf generative models, thus augmenting the datasets associated with the previous edition of this task.
ImageCLEFmed MEDIQA-MAGIC
Paper available at: https://ceur-ws.org/Vol-3740/paper-133.pdf
Wen-Wai Yim, Asma Ben Abacha, Yujuan Fu, Zhaoyi Sun, Meliha Yetisgen, Fei Xia
Microsoft Health AI, Redmond, USA, University of Washington, Seattle, USA.
Dataset available at: https://www.imageclef.org/2024/medical/mediqa
The MEDIQA task focuses on the problem of Multimodal And Generative TelemedICine (MAGIC) in the area of dermatology. Participants must develop systems that can take queries, text, clinical context, and images as input and generate appropriate medical textual responses to this input in a telemedicine setting.
ImageCLEFmed GANs
Paper available at: https://ceur-ws.org/Vol-3740/paper-130.pdf
Alexandra-Georgiana Andrei, Ahmedkhan Radzhabov, Dzmitry Karpenka, Yuri Prokopchuk, Vassili Kovalev, Bogdan Ionescu, Henning Müller
AI Multimedia Lab, National University of Science and Technology Politehnica Bucharest, Romania, Belarusian Academy of Sciences, Minsk, Belarus, University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland.
Dataset available at: https://www.imageclef.org/2024/medical/gans
This task addresses the challenges of privacy preservation in artificially generated medical images, looking for “fingerprints” of the original real-world training images in a set of artificially generated images. Such fingerprints may break patient privacy when exposed in unwanted or unforeseen circumstances.
ImageCLEFrecommending
Alexandru Stan, George Ioannidis, Bogdan Ionescu, Hugo Manguinhas
IN2 Digital Innovations, Germany, Politehnica University of Bucharest, Romania, Europeana Foundation, Netherlands
Dataset available at: https://www.imageclef.org/2024/recommending
This task identifies traditional multimedia search methods as a performance bottleneck and proposes the development of recommendation methods and systems applied to blog posts, editorials, and galleries, targeting data related to cultural heritage organizations and collections.
Image Retrieval for Arguments (part of Touché at CLEF)
Paper available at: https://ceur-ws.org/Vol-3740/paper-322.pdf
Johannes Kiesel, Çağrı Çöltekin, Maximilian Heinrich, Maik Fröbe, Milad Alshomary, Bertrand De Longueville, Tomaž Erjavec, Nicolas Handke, Matyáš Kopp, Nikola Ljubešić, Katja Meden, Nailia Mirzakhmedova, Vaidas Morkevičius, Theresa Reitis-Münstermann, Mario Scharfbillig, Nicolas Stefanovitch, Henning Wachsmuth, Martin Potthast, Benno Stein
Bauhaus-Universität Weimar, University of Tübingen, Friedrich-Schiller-Universität Jena, Leibniz University Hannover, European Commission, Joint Research Centre (JRC), Jožef Stefan Institute, Leipzig University, Charles University, Kaunas University of Technology, Arcadia Sistemi Informativi Territoriali, University of Kassel, hessian.AI, and ScaDS.AI
Dataset available at: https://www.imageclef.org/2024/image-retrieval-for-arguments
The goal for this task is the retrieval of images and data that can increase the persuasiveness of an argument, building upon the datasets of topics developed in previous editions of the Touché task.
ImageCLEF ToPicto
Cécile Macaire, Benjamin Lecouteux, Didier Schwab, Emmanuelle Esperança-Rodier
Université Grenoble Alpes, LIG, France
Dataset available at: https://www.imageclef.org/2023/topicto
Targeting conditions that cause language impairments, the ToPicto task proposes the development of automated systems that translate text or speech into pictograms, which can then be used as communication aids and tools.