Call for Multimedia Grand Challenge Solutions

Chairs of the ACM Multimedia 2013 Grand Challenge:
Neil O’Hare (Yahoo!, Spain) and
Yiannis Kompatsiaris (CERTH, Greece)

The Grand Challenges were originally published on the ACM MM 2013 conference webpages.


The Multimedia Grand Challenge presents a set of problems and issues from industry leaders, geared to engage the Multimedia research community in solving relevant, interesting and challenging questions about the industry’s 3-5 year vision for multimedia.
The Multimedia Grand Challenge was first presented as part of ACM Multimedia 2009 and has established itself as a prestigious competition in the multimedia community. This year’s conference will continue the tradition by repeating previous challenges and by introducing brand new challenges.


NHK Where is beauty? Grand Challenge

Scene Evaluation based on Aesthetic Quality

Automatically understanding viewers’ impressions of images or video sequences is a very difficult task, but an interesting theme for study, and more and more researchers have investigated it recently. To achieve automatic understanding, various elemental features or techniques need to be used in a comprehensive manner, such as the balance of color or contrast, composition, audio, object recognition, and object motion. In addition, we might have to consider not only image features but also semantic features.

The task NHK sets is “Where is Beauty?”, which aims at automatically recognizing beautiful scenes in a set of video sequences. The important point of this task is “how to evaluate beauty using an engineering approach”, which is challenging because it involves human feelings. We will provide participants with approximately 1,000 clips of raw broadcast video footage, covering various categories such as creatures, landscape, and CGI. Each clip lasts about one minute. Participants will have to evaluate the beauty of these videos automatically and rank them accordingly.

The proposed method will be evaluated on the basis of its originality and accuracy. We expect that participants will consider a diverse range of beauty, not only the balance of color but also composition, motion, audio, and other brand new features! The reliability and the diversity of the extracted beauty will be scored by using manually annotated data. In addition, if a short video composed of the highly ranked videos is submitted, it will be included in the evaluation.

More details

Technicolor – Rich Multimedia Retrieval from Input Videos Grand Challenge

Visual search that aims at retrieving copies of an image as well as information on a specific object, person or place in this image has progressed dramatically in the past few years. Thanks to modern techniques for large scale image description, indexing and matching, such an image-based information retrieval can be conducted either in a structured image database for a given topic (e.g., photos in a collection, paintings, book covers, monuments) or in an unstructured image database which is weakly labeled (e.g., via user-input tags or surrounding texts, including captions).

This Grand Challenge aims at exploring tools to push this search paradigm forward by addressing the following question: how can we search unstructured multimedia databases based on video queries? This problem is already encountered in professional environments where large semi-structured multimedia assets, such as TV/radio archives or cultural archives, are operationally managed. In these cases, resorting to trained professionals such as archivists remains the rule, both to annotate part of the database beforehand and to conduct searches. Unfortunately, this workflow does not apply to large-scale search into wildly unstructured repositories accessible on-line.

The challenge is to retrieve and organize automatically relevant multimedia documents based on an input video. In a scenario where the input video features a news story for instance, can we retrieve other videos, articles and photos about the same news story? And, when the retrieved information is voluminous, how can these multimedia documents be linked, organized and summarized for easy reference, navigation and exploitation?

More details

Yahoo! – Large-scale Flickr-tag Image Classification Grand Challenge

Image classification is one of the fundamental problems of computer vision and multimedia research. With the proliferation of the Internet, the availability of cheap digital cameras, and the ubiquity of cell-phone cameras, the amount of accessible visual content has increased astronomically. Flickr alone boasts over 5 billion images, not counting the many similar websites and the countless other images that are not published online. This explosion poses unique challenges for the classification of images.

Classification of images with a large number of classes and images has attracted several research efforts in recent years. The availability of datasets such as ImageNet, which boasts over 14 million images and over 21 thousand classes, has motivated researchers to develop classification algorithms that can deal with large quantities of data. However, most of the effort has been dedicated to building systems that can scale up when the number of classes is large. In this challenge we are interested in learning classifiers when the number of images is large. Some recent work deals with thousands of training images; in this challenge, however, we are looking at upwards of 250,000 images per class. What makes the challenge difficult is that the annotations are provided by users of Flickr and might not always be accurate. Furthermore, each class can be considered a collection of sub-classes with varied visual properties.

More details

Huawei/3DLife – 3D human reconstruction and action recognition Grand Challenge

3D human reconstruction and action recognition from multiple active and passive sensors

This challenge calls for demonstrations of methods and technologies that support real-time or near real-time 3D reconstruction of moving humans from multiple calibrated and remotely located RGB cameras and/or consumer depth cameras. Additionally, this challenge also calls for methods for human gesture/movement recognition from multimodal data. The challenge targets mainly real-time applications, such as collaborative immersive environments and inter-personal communications over the Internet or other dedicated networking environments.

To this end, we provide two data sets to support investigation of various techniques in the fields of 3D signal processing, computer graphics and pattern recognition, and enable demonstrations of various relevant technical achievements.

Consider multiple distant users who are captured in real time by their own visual capturing equipment, ranging from a single Kinect (simple user) to multiple Kinects and/or high-definition cameras (advanced users), as well as by non-visual sensors, such as Wearable Inertial Measurement Units (WIMUs) and multiple microphones. The captured data is either processed at the capture site to produce 3D reconstructions of users or directly coded and transmitted, enabling rendering of multiple users in a shared environment, where users can “meet” and “interact” with each other or the virtual environment via a set of gestures/movements.

More details

MediaMixer/VideoLectures.NET – Temporal Segmentation and Annotation Grand Challenge

Semantic VideoLectures.NET segmentation service

VideoLectures.NET mostly hosts lectures 1 to 1.5 hours long, linked with slides and enriched with metadata and additional textual content. Automatic temporal segmentation and annotation of the videos would make our video search engine more efficient and would let us offer users the ability to search for sections within a video, as well as recommend similar content. Challenge participants are therefore asked to develop tools for automatic segmentation of videos that could then be implemented in VideoLectures.NET.

More details

Microsoft: MSR – Bing Image Retrieval Grand Challenge

The Second Microsoft Research (MSR)-Bing challenge (the “Challenge”) is organized into a dual track format, one scientific and the other industrial. The two tracks share exactly the same task and timelines but have independent submission and ranking processes.

For the scientific track, we will follow exactly what the MM13 GC outlines. The papers will be submitted to MM13 and go through the review process. The accepted ones will be presented at the conference. At the conference, the authors of the accepted papers will be requested to introduce their solutions, give a quick demo, and take questions from the judges and the audience. Winners will be selected for the Multimedia Grand Challenge Award based on their presentation.

The industrial track of the Challenge will be conducted over the internet through a website maintained by Microsoft. Contestants participating in the industrial track are encouraged to take advantage of the recent advancements in the cloud computing infrastructure and public datasets and must submit their entries in the form of publicly accessible REST-based web services (further specified below). Each entry will be evaluated against a test set created by Bing on queries received at Bing Image Search in the EN-US market. Due to the global nature of the Web the queries are not necessarily limited to the English language used in the United States.

More details


Submissions should:

  • Significantly address one of the challenges posted on the web site.
  • Depict working, presentable systems or demos, using the grand challenge dataset where provided.
  • Describe why the system presents a novel and interesting solution.

Submission Guidelines

The submissions (max 4 pages) should be formatted according to ACM Multimedia formatting guidelines. Multimedia Grand Challenge reviewing is double-blind, so authors should not reveal their identity in the paper. The finalists will be selected by a committee of academic and industry representatives, based on novelty, presentation, scientific interest of the approach and, for the evaluation-based challenges, on the performance against the task.

Finalist submissions will be published in the conference proceedings, and will be presented in a special event during the ACM Multimedia 2013 conference in Barcelona, Spain. At the conference, finalists will be requested to introduce their solutions, give a quick demo, and take questions from the judges and the audience.
Winners will be selected for Multimedia Grand Challenge awards based on their presentation.

Important Dates

Challenges Announced: February 25, 2013
Paper Submission Deadline: July 1, 2013
Notification of Acceptance: July 29, 2013
Camera-Ready Submission Deadline: August 12, 2013


For any questions regarding the Grand Challenges please email the Multimedia Grand Challenge Solutions Chairs:

Neil O’Hare (Yahoo!, Spain)
Yiannis Kompatsiaris (CERTH, Greece)
