Call for Nominations: ACM TOMCCAP Nicolas D. Georganas Best Paper Award

The Editor-in-Chief of ACM TOMCCAP invites you to nominate candidates for the “ACM Transactions on Multimedia Computing, Communications and Applications Nicolas D. Georganas Best Paper Award”.

The award is given annually to the author(s) of an outstanding paper published in ACM TOMCCAP in the preceding calendar year (January 1 to December 31). The award carries a plaque as well as travel funds to the ACM MM conference, where the awardee(s) will be honored.


Nominations for the award must include the following:

  • A statement describing the technical contributions of the nominated paper and the significance of the paper. The statement should not exceed 500 words. Self-nominations are not accepted.
  • Two additional supporting statements by recognized experts in the field regarding the technical contribution of the paper and its significance to the respective field.

Only papers published in regular issues (no Special Issues) can be nominated.

Nominations will be reviewed by the Selection Committee, and the winning paper will be selected by a vote of the TOMCCAP Editorial Board.


The deadline for nominations of papers published in 2013 (Volume 9) is June 15, 2014.


Please send your nominations to the Editor-in-Chief at
If you have questions, please contact the TOMCCAP information director at

Further details can be found at

MediaEval 2014: Benchmarking Initiative for Multimedia Evaluation

MediaEval is a multimedia benchmark evaluation that offers tasks promoting research and innovation in areas related to human and social aspects of multimedia. Registration for MediaEval 2014 is now open.

We encourage participants to register before 1 May, when the first tasks release their first data. The MediaEval benchmark focuses on aspects of multimedia including and going beyond visual content, such as language, speech, music, and social factors. Participants carry out one or more of the tasks offered and submit runs to be evaluated. They then write up their results and present them at the MediaEval 2014 workshop, 16-17 October, Barcelona, Spain.

Dates: May-October

More information:

Call for Nominations for the SIGMM Technical Achievement Award 2014

for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications


This award is presented every year to a researcher who has made significant and lasting contributions to multimedia computing, communication and applications. Outstanding technical contributions through research and practice are recognized. Towards this goal, contributions are considered from academia and industry that focus on major advances in multimedia including multimedia processing, multimedia content analysis, multimedia systems, multimedia network protocols and services, and multimedia applications and interfaces. The award recognizes members of the community for long-term technical accomplishments or those who have made a notable impact through a significant technical innovation. The selection committee focuses on candidates’ contributions as judged by innovative ideas, influence in the community, and/or the technical/social impact resulting from their work. The award includes a $2000 honorarium, an award certificate of recognition, and an invitation for the recipient to present a keynote talk at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). Travel expenses to the conference will be covered by SIGMM, and a public citation for the award will be placed on the SIGMM website.


The award honorarium, the award certificate of recognition, and travel expenses to the ACM International Conference on Multimedia are fully sponsored by the SIGMM budget.


Nominations are due by May 31, 2014, with a decision made by July 30, 2014, in time to allow the above recognition and award presentation at ACM Multimedia 2014.

Nominations for the award must include:

  • A statement summarizing the candidate’s accomplishments, description of the significance of the work, and justification of the nomination (two pages maximum);
  • Curriculum Vitae of the nominee;
  • Three endorsement letters supporting the nomination including the significant contributions of the candidate. Each endorsement should be no longer than 500 words with clear specification of nominee contributions and impact on the multimedia field;
  • A concise statement (one sentence) of the achievement(s) for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are:

  • The nominee can be any member of the scientific community.
  • The nominator must be a SIGMM member.
  • No self-nomination is allowed.
  • Nominations that do not result in an award will be valid for two further years. After three years a revised nomination can be resubmitted.
  • The SIGMM elected officers as well as members of the Awards Selection Committee are not eligible.

Please submit your nomination to the award committee by email.

  • Dick Bulterman (
  • Hong-Jiang Zhang (
  • Nicu Sebe (
  • Rainer Lienhart (
  • Shih-Fu Chang (


  • 2013: Dick Bulterman (for outstanding technical contributions in multimedia authoring through research, standardization, and entrepreneurship).
  • 2012: Hong-Jiang Zhang (pioneering contributions to and leadership in media computing including content-based media analysis and retrieval, and their applications).
  • 2011: Shih-Fu Chang (for pioneering research and inspiring contributions in multimedia analysis and retrieval).
  • 2010: Ramesh Jain (for pioneering research and inspiring leadership that transformed multimedia information processing to enhance the quality of life and visionary leadership of the multimedia community).
  • 2009: Lawrence A. Rowe (for pioneering research in continuous media software systems and visionary leadership of the multimedia research community).
  • 2008: Ralf Steinmetz (for pioneering work in multimedia communications and the fundamentals of multimedia synchronization).

SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications

Award Description

This award will be presented at most once per year to a researcher whose PhD thesis has the potential of very high impact in multimedia computing, communication and applications, or gives direct evidence of such impact. A selection committee will evaluate contributions towards advances in multimedia, including multimedia processing, multimedia systems, multimedia network services, and multimedia applications and interfaces. The award recognizes members of the SIGMM community for the research contributions in their PhD theses and for the potential impact of those theses on the multimedia area. The selection committee will focus on candidates’ contributions as judged by innovative ideas and the potential impact resulting from their PhD work.

The award includes a US$500 honorarium, an award certificate of recognition, and an invitation for the recipient to receive the award at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). A public citation for the award will be placed on the SIGMM website, in the SIGMM Records e-newsletter as well as in the ACM e-newsletter.


The award honorarium, the award plaque of recognition and travel expenses to the ACM International Conference on Multimedia will be fully sponsored by the SIGMM budget.

Nomination Applications

Nominations are due by May 31, 2014, with an award decision to be made by August 30. This timing allows the recipient to prepare for the award presentation at ACM Multimedia that fall (October/November).

The initial nomination for a PhD thesis must relate to a dissertation deposited at the nominee’s Academic Institution between January and December of the year previous to the nomination. As discussed below, some dissertations may be held for up to three years by the selection committee for reconsideration. If the original thesis is not in English, a full English translation must be provided with the submission. Nominations for the award must include:

  1. PhD thesis (upload at: )
  2. A statement summarizing the candidate’s PhD thesis contributions and potential impact, and justification of the nomination (two pages maximum);
  3. Curriculum Vitae of the nominee
  4. Three endorsement letters supporting the nomination including the significant PhD thesis contributions of the candidate. Each endorsement should be no longer than 500 words with clear specification of nominee PhD thesis contributions and potential impact on the multimedia field.
  5. A concise statement (one sentence) of the PhD thesis contribution for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are:

  1. The nominee can be any member of the scientific community.
  2. The nominator must be a SIGMM member.
  3. No self-nomination is allowed.

If a particular thesis is considered to be of exceptional merit but not selected for the award in a given year, the selection committee (at its sole discretion) may elect to retain the submission for consideration in at most two following years. The candidate will be invited to resubmit his/her work in these years.

A thesis is considered to be outstanding if:

  1. Its theoretical contributions are significant and their application to multimedia is demonstrated.
  2. Its applications to multimedia are outstanding, its techniques are backed by solid theory, and there is clear demonstration that the algorithms can be applied in new domains – e.g., the algorithms must be demonstrably scalable in application in terms of robustness, convergence and complexity.

The submission process will be preceded by the call for nominations. The call for nominations will be widely publicized by the SIGMM Awards Committee and the SIGMM Executive Board between September and December of the previous year at the different SIGMM venues: during the SIGMM premier ACM Multimedia conference (at the SIGMM Business Meeting), on the SIGMM web site, via the SIGMM mailing list, and via the SIGMM e-newsletter.

Submission Process

  • Register an account at and upload one copy of the nominated PhD thesis. The nominee will receive a Paper ID after the submission.
  • The nominator must then collate other materials detailed in the previous section and upload them as supplementary materials, except the endorsement letters, which must be emailed separately as detailed below.
  • Contact your referees and ask them to send all endorsement letters to with the title: “PhD Thesis Award Endorsement Letter for [YourName]”. The web administrator will acknowledge the receipt and the submission CMT website will reflect the status of uploaded documents and endorsement letters.

It is the responsibility of the nominator to follow the process and make sure the documentation is complete. Theses with incomplete documentation will be considered invalid.

Selection Committee

The 2014 award selection committee consists of:

  • Prof. Kiyoharu Aizawa ( from University of Tokyo, Japan
  • Prof. Baochun Li ( from University of Toronto, Canada
  • Prof. K. Selcuk Candan ( from Arizona State University, USA
  • Prof. Shin’ichi Satoh ( from National Institute of Informatics, Japan
  • Dr. Daniel Gatica-Perez ( from Idiap-EPFL, Switzerland

ESSENTIA: an open source library for audio analysis

Over the last decade, audio analysis has become a field of active research in the academic and engineering worlds. It refers to the extraction of information and meaning from audio signals for analysis, classification, storage, retrieval, and synthesis, among other tasks. Related research challenges include understanding and modeling sound and music, and developing methods and technologies to process audio in order to extract acoustically and musically relevant data. Audio analysis techniques are instrumental in the development of new audio-related products and services, because these techniques enable novel ways of interacting with sound and music.

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval, released under the Affero GPLv3 license (also available under a proprietary license upon request). It contains an extensive collection of reusable algorithms that implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors that can be computed from audio. In addition, Essentia can be complemented with Gaia, a C++ library with Python bindings that allows searching in a descriptor space using different similarity measures and classifying the results of audio analysis (the same license terms apply). Gaia can be used to generate classification models that Essentia can use to compute high-level descriptions of music.

Essentia is not a framework, but rather a collection of algorithms wrapped in a library. It does not enforce common high-level logic for descriptor computation, so you are not locked into a particular way of doing things. Instead, it focuses on the robustness, performance and optimality of the provided algorithms, as well as on ease of use.
The flow of the analysis is decided and implemented by the user, while Essentia takes care of the implementation details of the algorithms being used. A number of examples are provided with the library, though they should not be considered the only correct way of doing things. The library includes Python bindings as well as a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows research experiments to be set up very rapidly. The extractors cover a number of common use cases for researchers, for example computing all available music descriptors for an audio track, extracting only spectral, rhythmic, or tonal descriptors, computing the predominant melody and beat positions, and returning the results in YAML/JSON data formats. Furthermore, the library includes a Vamp plugin for visualizing music descriptors in hosts such as Sonic Visualiser. The library is cross-platform and supports Linux, Mac OS X and Windows.

Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the out-of-the-box music descriptors and signal processing algorithms, is easily expandable and allows for both research experiments and the development of large-scale industrial applications. Essentia has been in development for more than 7 years, incorporating the work of more than 20 researchers and developers through its history. Version 2.0 marked the first release to be publicly available as free software under the AGPLv3.


Essentia currently features the following algorithms (among others):

  • Audio file input/output: ability to read and write nearly all audio file formats (wav, mp3, ogg, flac, etc.)
  • Standard signal processing blocks: FFT, DCT, frame cutter, windowing, envelope, smoothing
  • Filters (FIR & IIR): low/high/band pass, band reject, DC removal, equal loudness
  • Statistical descriptors: median, mean, variance, power means, raw and central moments, spread, kurtosis, skewness, flatness
  • Time-domain descriptors: duration, loudness, LARM, Leq, Vickers’ loudness, zero-crossing-rate, log attack time and other signal envelope descriptors
  • Spectral descriptors: Bark/Mel/ERB bands, MFCC, GFCC, LPC, spectral peaks, complexity, rolloff, contrast, HFC, inharmonicity and dissonance
  • Tonal descriptors: Pitch salience function, predominant melody and pitch, HPCP (chroma) related features, chords, key and scale, tuning frequency
  • Rhythm descriptors: beat detection, BPM, onset detection, rhythm transform, beat loudness
  • Other high-level descriptors: danceability, dynamic complexity, audio segmentation, semantic annotations based on SVM classifiers

The complete list of algorithms is available online in the official documentation.
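As a loose illustration of what such descriptors compute, here is a pure-Python sketch of two simple time-domain descriptors, the zero-crossing rate and RMS energy. This is not Essentia's implementation or API, only a minimal stand-alone example of the kind of quantity these descriptor lists refer to:

```python
import math

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:])
        if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)

def rms_energy(signal):
    """Root-mean-square energy, a crude loudness proxy."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# A toy 1 kHz sine sampled at 8 kHz: two sign changes per cycle,
# so the zero-crossing rate should be close to 2 * 1000 / 8000 = 0.25.
sine = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8000)]
print(zero_crossing_rate(sine))  # close to 0.25
print(rms_energy(sine))          # close to 1/sqrt(2) ~ 0.707
```

Real descriptors in Essentia are of course computed frame-wise on windowed audio and cover far more elaborate spectral and tonal quantities; the sketch only shows the input-signal-to-scalar shape such algorithms share.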


The main purpose of Essentia is to serve as a library of signal-processing blocks. As such, it is intended to provide as many algorithms as possible, while trying to be as little intrusive as possible. Each processing block is called an Algorithm, and it has three different types of attributes: inputs, outputs and parameters. Algorithms can be combined into more complex ones, which are also instances of the base Algorithm class and behave in the same way. An example of such a composite algorithm is presented in the figure below. It shows a composite tonal key/scale extractor, which combines the algorithms for frame cutting, windowing, spectrum computation, spectral peaks detection, chroma features (HPCP) computation and finally the algorithm for key/scale estimation from the HPCP (itself a composite algorithm).
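The composite-algorithm pattern described above can be sketched in plain Python. Everything below (class and attribute names included) is a hypothetical mimic of the inputs/outputs/parameters contract, not Essentia's actual C++ or Python API:

```python
class Algorithm:
    """Minimal mimic of Essentia's processing-block pattern: each block
    declares parameters at construction and maps named inputs to named
    outputs when computed."""
    def __init__(self, **parameters):
        self.parameters = parameters
    def compute(self, **inputs):
        raise NotImplementedError

class Windowing(Algorithm):
    def compute(self, frame):
        # Toy window: just scale the frame by a 'gain' parameter.
        gain = self.parameters.get('gain', 1.0)
        return {'windowed': [x * gain for x in frame]}

class Spectrum(Algorithm):
    def compute(self, windowed):
        # Stand-in for an FFT magnitude spectrum.
        return {'spectrum': [abs(x) for x in windowed]}

class KeyExtractor(Algorithm):
    """A composite algorithm: it chains simpler blocks internally but
    exposes the same compute(inputs) -> outputs contract as any block."""
    def __init__(self, **parameters):
        super().__init__(**parameters)
        self.windowing = Windowing(gain=parameters.get('gain', 1.0))
        self.spectrum = Spectrum()
    def compute(self, frame):
        w = self.windowing.compute(frame=frame)
        return self.spectrum.compute(**w)

extractor = KeyExtractor(gain=0.5)
print(extractor.compute(frame=[1.0, -2.0, 3.0]))
# {'spectrum': [0.5, 1.0, 1.5]}
```

The point of the design is that a composite like the key/scale extractor is indistinguishable from a primitive block to its caller, so pipelines compose freely.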

The algorithms can be used in two different modes: standard and streaming. The standard mode is imperative, while the streaming mode is declarative. The standard mode requires specifying the inputs and outputs of each algorithm and calling its processing function explicitly. If users want to run a network of connected algorithms, they need to run each algorithm manually. The advantage of this mode is that it allows very rapid prototyping (especially when the Python bindings are coupled with a scientific Python environment such as ipython, numpy, and matplotlib).

The streaming mode, on the other hand, allows the user to define a network of connected algorithms; an internal scheduler then takes care of passing data between the algorithms’ inputs and outputs and of calling the algorithms in the appropriate order. The scheduler available in Essentia is optimized for analysis tasks and does not take into account the latency of the network. For real-time applications, one could easily replace this scheduler with another one that favors latency over throughput. The advantage of this mode is that it results in simpler and safer code (the user only needs to create algorithms and connect them, so there is no room for mistakes in the execution order of the algorithms) and generally in lower memory consumption, as the data is streamed through the network instead of being loaded entirely into memory (as is usually the case when working with the standard mode). Even though most of the algorithms are available in both the standard and streaming modes, the code that implements them is not duplicated: either the streaming version of an algorithm is deduced/wrapped from its standard implementation, or vice versa.
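The scheduling idea behind the streaming mode can be sketched in a few lines of plain Python. This is a hypothetical toy, not Essentia's scheduler: nodes are connected declaratively with the `>>` operator (as in Essentia's Python bindings), and a scheduler resolves the dependency order and moves data between them:

```python
class Node:
    """One algorithm in a toy streaming network: it runs once all of
    its upstream producers have produced data."""
    def __init__(self, name, func):
        self.name, self.func = name, func
        self.upstream = []
    def __rshift__(self, other):
        # Declarative connection: a >> b feeds a's output into b.
        other.upstream.append(self)
        return other

def run(sink):
    """Toy scheduler: topologically order the nodes reachable from the
    sink, then call each one on the outputs of its upstream nodes."""
    order, seen = [], set()
    def visit(node):
        if node.name in seen:
            return
        seen.add(node.name)
        for up in node.upstream:
            visit(up)
        order.append(node)
    visit(sink)
    results = {}
    for node in order:
        inputs = [results[up.name] for up in node.upstream]
        results[node.name] = node.func(*inputs)
    return results[sink.name]

loader = Node('loader', lambda: [3.0, -4.0])
window = Node('window', lambda sig: [x * 0.5 for x in sig])
energy = Node('energy', lambda sig: sum(x * x for x in sig))
loader >> window >> energy
print(run(energy))  # 0.25 * (9 + 16) = 6.25
```

A real streaming scheduler additionally buffers data frame by frame rather than passing whole arrays, which is where the lower memory consumption mentioned above comes from.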


Essentia has served in a large number of research activities conducted at the Music Technology Group since 2006. It has been used for music classification, semantic autotagging, music similarity and recommendation, visualization of and interaction with music, sound indexing, musical instrument detection, cover detection, beat detection, and acoustic analysis of stimuli for neuroimaging studies. Essentia and Gaia have been used extensively in a number of research projects and industrial applications. As an example, both libraries are employed for large-scale indexing and content-based search of sound recordings within Freesound, a popular repository of Creative Commons licensed audio samples. In particular, Freesound uses audio-based similarity to recommend sounds similar to user queries. Dunya is a web-based software application using Essentia that lets users interact with an audio music collection through musical concepts derived from a specific musical culture, in this case Carnatic music.


Essentia can be easily used via its Python bindings. Below is a quick illustration of Essentia’s capabilities: detecting the beat positions and the predominant melody of a music track in a few lines of Python code using the standard mode:

    from essentia.standard import *

    audio = MonoLoader(filename = 'audio.mp3')()
    beats, bconfidence = BeatTrackerMultiFeature()(audio)
    print beats
    audio = EqualLoudness()(audio)
    melody, mconfidence = PredominantMelody(guessUnvoiced=True, frameSize=2048, hopSize=128)(audio)
    print melody

Another Python example, computing MFCC features using the streaming mode:

    import essentia
    from essentia.streaming import *

    loader = MonoLoader(filename = 'audio.mp3')
    frameCutter = FrameCutter(frameSize = 1024, hopSize = 512)
    w = Windowing(type = 'hann')
    spectrum = Spectrum()
    mfcc = MFCC()
    pool = essentia.Pool()

    # connect all algorithms into a network
    loader.audio >> frameCutter.signal
    frameCutter.frame >> w.frame >> spectrum.frame
    spectrum.spectrum >> mfcc.spectrum
    mfcc.mfcc >> (pool, 'mfcc')
    mfcc.bands >> (pool, 'mfcc_bands')

    # compute network
    essentia.run(loader)

    print pool['mfcc']
    print pool['mfcc_bands']

The Vamp plugin provided with Essentia makes many of its algorithms usable via the graphical interface of Sonic Visualiser; for example, onset positions can be computed for a music piece and marked in red. Interested readers are referred to the online documentation for more example applications built on top of Essentia.

Getting Essentia

Detailed information about Essentia is available online on the official web page: It contains the complete documentation for the project, compilation instructions for Debian/Ubuntu, Mac OS X and Windows, as well as precompiled packages. The source code is available at the official GitHub repository: Our current work focuses on expanding the library and its community of users, and all active Essentia users are encouraged to contribute to the library.



Most cited papers before the era of ICMR

In the early 2000s, the field of multimedia retrieval was composed of special sessions at conferences and small workshops. There were no multimedia retrieval conferences. One of the leading workshops (B. Kerherve, V. Oria and S. Satoh) was the ACM SIGMM Workshop on Multimedia Information Retrieval (MIR), which was held in conjunction with the ACM MM conference.

To have a central meeting for the scientific community, the International Conference on Image and Video Retrieval (CIVR) was founded in 2002 (J. Eakins, P. Enser, M. Graham, M.S. Lew, P. Lewis and A. Smeaton). Both meetings evolved over the next decade.  CIVR and MIR became ACM SIGMM sponsored conferences and established reputations for high quality work.

In 2010, the steering committees of both CIVR and MIR voted to combine the two conferences toward unifying the communities and establishing the ACM flagship meeting for multimedia retrieval, the ACM International Conference on Multimedia Retrieval (ICMR).  In 2013, ICMR was ranked by the Chinese Computing Federation as the #1 meeting in multimedia retrieval and the #4 meeting in the wide domain of Multimedia and Graphics.

For archival reasons, this is a summary of which papers had the most citations from ACM CIVR and ACM MIR (2008-2010), based on Google Scholar data collected on February 17-18, 2014.

Google Scholar citations were used because they have wide coverage (ACM, IEEE, Springer, Elsevier, etc.), are publicly accessible, and are increasingly accepted by researchers both for estimating paper citations and for computing the h-index.

The information below is given in the format of
Rank | Citations | Article-Information

CIVR 2008

  1. 173 – World-scale mining of objects and events from community photo collections
    Till Quack, Bastian Leibe, Luc Van Gool
  2. 81 – Analyzing Flickr groups
    Radu Andrei Negoescu, Daniel Gatica-Perez
  3. 70 – A comparison of color features for visual concept classification
    Koen E.A. van de Sande, Theo Gevers, Cees G.M. Snoek
  4. 68 – Language modeling for bag-of-visual words image categorization
    Pierre Tirilly, Vincent Claveau, Patrick Gros
  5. 46 – Multiple feature fusion by subspace learning
    Yun Fu, Liangliang Cao, Guodong Guo, Thomas S. Huang

CIVR 2009

  1. 379 – NUS-WIDE: a real-world web image database from National University of Singapore
    Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng
  2. 124 – Evaluation of GIST descriptors for web-scale image search
    Matthijs Douze, Hervé Jégou, Harsimrat Sandhawalia, Laurent Amsaleg, Cordelia Schmid
  3. 81 – Real-time bag of words, approximately
    J. R. R. Uijlings, A. W. M. Smeulders, R. J. H. Scha
  4. 57 – Dense sampling and fast encoding for 3D model retrieval using bag-of-visual features
    Takahiko Furuya, Ryutarou Ohbuchi
  5. 46 – Multilayer pLSA for multimodal image retrieval
    Rainer Lienhart, Stefan Romberg, Eva Hörster

CIVR 2010

  1. 43 – Signature Quadratic Form Distance
    Christian Beecks, Merih Seran Uysal, Thomas Seidl
  2. 41 – Feature detector and descriptor evaluation in human action recognition
    Ling Shao, Riccardo Mattivi
  3. 38 – Unsupervised multi-feature tag relevance learning for social image retrieval
    Xirong Li, Cees G. M. Snoek, Marcel Worring
  4. 29 – Co-reranking by mutual reinforcement for image search
    Ting Yao, Tao Mei, Chong-Wah Ngo
  5. Two papers were tied for 5th place in citations:

MIR 2008

  1. 285 – The MIR flickr retrieval evaluation
    Mark J. Huiskes, Michael S. Lew
  2. 203 – Outdoors augmented reality on mobile phone using loxel-based visual feature organization
    Gabriel Takacs, Vijay Chandrasekhar, Natasha Gelfand, Yingen Xiong, Wei-Chao Chen, Thanos Bismpigiannis, Radek Grzeszczuk, Kari Pulli, Bernd Girod
  3. 119 – Learning tag relevance by neighbor voting for social image retrieval
    Xirong Li, Cees G.M. Snoek, Marcel Worring
  4. 58 – Spirittagger: a geo-aware tag suggestion tool mined from flickr
    Emily Moxley, Jim Kleban, B. S. Manjunath
  5. 42 – Content-based mood classification for photos and music: a generic multi-modal classification framework and evaluation approach
    Peter Dunker, Stefanie Nowak, André Begau, Cornelia Lanz

MIR 2010

  1. 82 – New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative
    Mark J. Huiskes, Bart Thomee, Michael S. Lew
  2. 78 – How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation
    Stefanie Nowak, Stefan Rüger
  3. 45 – Exploring automatic music annotation with “acoustically-objective” tags
    Derek Tingle, Youngmoo E. Kim, Douglas Turnbull
  4. 39 – Feature selection for content-based, time-varying musical emotion regression
    Erik M. Schmidt, Douglas Turnbull, Youngmoo E. Kim
  5. 34 – ACQUINE: aesthetic quality inference engine – real-time automatic rating of photo aesthetics
    Ritendra Datta, James Z. Wang

Report from ACM Multimedia 2013

Conference/Workshop Program Highlights

ACM Multimedia 2013 was held at the CCIB (Centre de Conventions Internacional de Barcelona) from October 21st to October 25th, 2013 in Barcelona. The Art Exhibition was held for the entire duration of the conference at the FAD (Foment de les Arts i del Disseny) in the center of the city, while the workshops were held in the Universitat Pompeu Fabra – Balmes building during the first two days of the conference (Oct. 21-Oct. 22). It was the first time the conference was held in Spain, and it offered a high-quality program and a few notable innovations. Dr. Nozha Boujemaa from INRIA, France, Dr. Alejandro Jaimes from Yahoo! Labs, Spain, and Prof. Nicu Sebe from the University of Trento, Italy were the general co-chairs of the conference. Dr. Daniel Gatica-Perez from IDIAP & EPFL, Switzerland, Dr. David A. Shamma from Yahoo! Labs, USA, Prof. Marcel Worring from the University of Amsterdam, The Netherlands, and Prof. Roger Zimmermann from the National University of Singapore were the program co-chairs. The entire organization committee is listed in Appendix A.

The number of participants was 544: the main conference was attended by 476 participants, of which 425 paid and 51 were special cases (sponsors, student volunteers, etc.), and 68 participants attended workshops only. The tutorials, which were free of charge, had 312 advance registrations. The multimedia art exhibition was open to the public from Oct. 21 to Oct. 28 and was visited by more than 2,000 visitors. The total revenue of the conference was $318,151, and the surplus was $25,430.

The venue (CCIB)

Below is the list of the program components of Multimedia 2013.

  • Technical Papers: Full and Short papers
  • Keynote Talks
  • SIGMM Achievement Award Talk, Ph.D Thesis Award Talk
  • Panel
  • Brave New Ideas
  • Multimedia Grand Challenge Solutions
  • Technical Demos
  • Open Source Software Competition
  • Doctoral Symposium
  • Art Exhibition and Reception
  • Tutorials
  • Workshops
  • Awards and Banquet

Innovations made for Multimedia 2013:

In an attempt to continuously improve ACM Multimedia and ensure its vibrant role in the multimedia community, we made a number of enhancements for this year’s conference:

  • The Technical Program Committee defined twelve Technical Areas as the major foci for this year’s conference, including new Technical Areas for Music & Audio and Crowdsourcing to reflect their growing interest and promise. We also renamed some traditional Technical Areas and provided extensive descriptions of each area to help authors choose the most appropriate Technical Area for their manuscripts.
  • We introduced a new role in the organization of the conference: the author’s advocate, whose explicit role was to listen to the authors and to help them if reviews were clearly below average quality. Authors could request the advocate’s mediation after the reviews had been sent to them, and they had to clearly justify why such mediation was needed (i.e., the reviews or the meta-review were below average quality). The advocate’s task was to investigate the matter carefully and to request an additional review or a reexamination of the decision on the particular manuscript. This year, the author’s advocate was Pablo Cesar from CWI, The Netherlands.
  • We kept a couple of plenary sessions that bring singular focus to conference activities: keynotes, the Multimedia Grand Challenge competition, the Best Paper session, and the Technical Achievement Award and Best PhD Award sessions. The other technical sessions were held in parallel to allow pursuit of more specialized interests at the conference. We limited the number of parallel sessions to no more than 3 to minimize the risk of overlapping interests.
  • We used video spotlights to advertise the works to be presented. These were meant to offer all attendees an opportunity to become aware of the content of each paper, and thus to be attracted to the corresponding poster or talk.
  • Workshops and Tutorials are held on separate days from the main conference in order to reduce conflict with the regular Technical Program.
  • The Multimedia Art Exhibition featured both invited and selected artists. It was open for the duration of the conference in the satellite venue located in the center of the city.
  • Following the precedent of the last two years, Tutorials were made free for all participants.
  • Recognizing that students are the lifeblood of our next generation of multimedia thinkers, this year’s Student Travel Grant program was greatly expanded. A total of $26,000, received from SIGMM ($16,000) and NSF ($10,000), supported 35 students.
  • Finally, we decided to provide the community with open access to the proceedings in the ACM Digital Library. As such, no USB proceedings were handed out to participants, encouraging everyone to access the proceedings online.

Technical Program

Following the guidelines of the ACM Multimedia Review Committee, the conference was structured into 12 Areas, with a two-tier TPC, a double-blind review process, and a target acceptance rate of 20% for long papers and 27.7% for short papers. Based on the experience from ACM Multimedia 2012 and the responses to the “Call for Areas” that we issued to the community, we selected the following Areas:

  1. Art, Entertainment, and Culture
  2. Authoring and Collaboration
  3. Crowdsourcing
  4. Media Transport and Delivery
  5. Mobile & Multi-device
  6. Multimedia Analysis
  7. Multimedia HCI
  8. Music & Audio
  9. Search, Browsing, and Discovery
  10. Security and Forensics
  11. Social Media & Presence
  12. Systems and Middleware

The Technical Program Committee was first created by appointing Area Chairs (ACs). A total of 29 colleagues agreed to serve in this role. Each Area was represented by two ACs, with the exception of the two Areas (Multimedia Analysis and Search, Browsing, and Discovery) whose scope has traditionally attracted the largest proportion of papers and therefore required further coordination. The added topic diversity brought an increase in gender diversity among the ACs, from approximately 12% in previous years to 22% for 2013. We also made a conscious effort to bring new talent and excellence into the community and to better represent emerging trends in the field. To this end, we appointed many young and well-recognized ACs who served in this role for the first time. For each junior AC, we co-appointed a senior researcher as their co-AC to aid in their shepherding.

In a second step, the Area Chairs were responsible for appointing the TPC members (reviewers) for their areas. This was a large effort to grow the TPC base for the conference as well as to ensure that proper expertise was represented in each area. We coupled this with a hard goal of limiting the number of submissions assigned to each TPC member for review. For comparison, two years ago the average number of papers assigned to a reviewer was 9, with over 38% of the approximately 225 TPC members receiving 10 or more papers to review. With our design, a total of 398 reviewers received an average of 4.13 papers each. While we were unable to keep a hard ceiling, only 2.51% of the TPC received 10 or more papers to review, all of them TPC members who had agreed to serve in more than one area. The Area Chairs were in charge of assigning all papers for review, and each submission was reviewed double-blind by three TPC members.
Reviews and reviewer assignments of papers co-authored by Area Chairs, Program Chairs, and General Chairs were handled by Program Chairs who had no conflict of interest in each specific case. Another novelty introduced in the reviewing process was setting the paper submission deadline significantly earlier than in previous years, in order to allocate more time for reviews, rebuttals, discussions, and final decisions. Despite the reduced time given to authors, the response to the Call for Papers was enthusiastic, with a total of 235 long papers and 278 short papers going through review. The authors of long papers were asked to write a rebuttal after receiving the reviews. A new element in the reviewing process was the introduction of the Author’s Advocate, created to provide authors with an independent channel to express concerns about the quality of the reviews of their papers and to raise a flag about these reviews. All cases were brought to the attention of the corresponding Area Chair. After evaluating each case reported to him (16 reviews out of 761 long-paper reviews), the Author’s Advocate recommended in 5 cases that new reviews be generated and added to the discussion. The reviewers had a period for on-line discussion of reviews and rebuttals, after which the Area Chairs drafted a meta-review for each paper. Decisions on long and short papers were made at the TPC meeting held at the University of Amsterdam on June 11, 2013. The meeting was physically attended by one of the General Chairs, three of the Program Chairs, the Author’s Advocate, and 86% of the ACs. Many of the ACs who were unable to attend joined the discussions remotely. On the first half day of the TPC meeting, the Area Chairs worked in breakout sessions to discuss the papers that were weak accepts and weak rejects, with the exception of conflict-of-interest papers, which were handled out of band as mentioned above.
In the second half of the first day, the ACs met in a plenary session where they reviewed the clear accepts and defended the decisions on the borderline papers based on the papers themselves, the reviews, meta-reviews, on-line discussions, and authors’ rebuttal comments. In many cases, an emergency reviewer was added if there was a clear intersection with a related submission area. If a paper under discussion in the plenary session presented a conflict of interest with an Area, Program, or General Chair, that chair was excused from the room. On June 12, 2013, the Program Chairs finalized the process and the conference program in a separate meeting, arranging the sessions by thematic narratives rather than by submission area to promote cross-area conversations during the conference itself. The review process resulted in an overall acceptance rate of 20.0% for long papers and 27.7% for short papers (the distribution of submissions and the acceptance rate for each of the 12 areas is shown in the graph below). All accepted long papers were shepherded by the Area Chairs themselves or by qualified TPC members, who were in charge of verifying that the revised papers adequately addressed the concerns raised by the reviewers and the changes promised by the authors in their rebuttals. This step ensured that all accepted papers were of the highest possible quality. In addition, four papers with high review scores were nominated at the TPC meeting as candidates for the Best Paper Award. Each nominated paper had to be successfully championed and defended by the ACs from its area. The winner was announced at the Conference Banquet.

ACM Multimedia 2013 Program at a Glance

The entire program of ACM Multimedia 2013 is shown below.

Workshop session

Conference venue

Opening ceremony

Keynote presentation

Poster/Demo session

SIGMM Achievement Award Talk

Keynote Talks

Multimedia Framed
Dr. Elizabeth F. Churchill (eBay Research Labs)
Wednesday, Oct. 23, 2013

Abstract: Multimedia is the combination of several media forms. Information designers, educationalists, and artists are concerned with questions such as: Is text, audio, video, or a combination of all three the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead to make the message more effective and/or the experience more engaging? How does the setting affect perception and reception? How does framing affect people’s experience of multimedia? How is the artifact changed through interaction with audience members? In this presentation, I will talk about people’s experience of multimedia artifacts like videos. I will discuss the ways in which framing affects how we experience multimedia. Framing can be intentional: scripted creations produced with clear intent by technologists, designers, media producers, media artists, film-makers, archivists, documentarians, and architects. Framing can also be unintentional. Everyday acts of interest and consumption turn us, the viewers, into co-producers of the experiences of the multimedia artifacts we have viewed. We download, annotate, comment on, and share multimedia artifacts online. Our actions are reflected in view counts, displayed comments, and content ranking. Our actions therefore change how multimedia artifacts are interpreted and understood by others. Drawing on examples from the history of film and of performance art, from current social media research, and from research conducted with collaborators over the past 16 years, I will illustrate how content understanding is modulated by context, by the “framing” of the content.
I will consider three areas of research that are addressing the issue of framing, and that have implications for our understanding of ‘multimedia’ consumption, now and in the future: (1) the psychology and psychophysiology of multimedia as multimodal experience; (2) emerging practices with contemporary social media capture and sharing from personal devices; and (3) innovations in social media and audience analytics focused on more deeply understanding media consumption. I will conclude with some technical excitements, design/development challenges, and experiential possibilities that lie ahead.

Dr. Elizabeth Churchill is Director of Human Computer Interaction at eBay Research Labs (ERL) in San Jose, California. Formerly a Principal Research Scientist at Yahoo! Research, she founded, staffed, and managed the Internet Experiences Group. Until September 2006, she worked in the Computing Science Lab (CSL) at the Palo Alto Research Center (PARC), California. Prior to that she formed and led the Social Computing Group at FX Palo Alto Laboratory, Fuji Xerox’s research lab in Palo Alto. Originally a psychologist by training, Elizabeth has focused throughout her career on understanding people’s social and collaborative interactions in their everyday digital and physical contexts. With over 100 peer-reviewed publications and 5 edited books, she has written about topics including implicit learning, human-agent systems, mixed-initiative dialogue systems, social aspects of information seeking, digital archives and memory, and the development of emplaced media spaces. She has been a regular columnist for ACM interactions since 2008. Elizabeth has a BSc in Experimental Psychology and an MSc in Knowledge Based Systems, both from the University of Sussex, and a PhD in Cognitive Science from the University of Cambridge. In 2010, she was recognised as a Distinguished Scientist by the Association for Computing Machinery (ACM).
Elizabeth is the current Executive Vice President of ACM SIGCHI (the Special Interest Group on Computer-Human Interaction). She is a Distinguished Visiting Scholar at Stanford University’s Media X, the industry affiliate program to Stanford’s H-STAR Institute.

The Space between the Images
Leonidas J. Guibas (Stanford University)
Thursday, Oct. 24, 2013

Abstract: Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by device sensors such as smartphone cameras to immense databases of images, audio, and video stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually and foregoing a deeper analysis of the relationships between the media. Yet with more and more data there will be more and more connections and correlations, because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries, or other relations and self-relations that the data sources satisfy. This is particularly true for media of a geometric character, such as GPS traces, images, videos, 3D scans, and 3D models. In this talk we focus on the “space between the images”, that is, on expressing the relationships between different multimedia data items. We aim to make such relationships explicit, tangible, first-class objects that can themselves be analyzed, stored, and queried, irrespective of the media they originate from. We discuss mathematical and algorithmic issues of representing and computing relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the “wisdom of the collection” in performing operations on individual data sets or in map inference between them.
We will illustrate these ideas using examples from the realm of 2D images and 3D scans/shapes — but these notions are more generally applicable to the analysis of videos, graphs, acoustic data, biological data such as microarrays, homeworks in MOOCs, etc. This is an overview of joint work with multiple collaborators, as will be discussed in the talk. Prof. Leonidas Guibas obtained his Ph.D. from Stanford under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, DEC/SRC, MIT, and Stanford. He is currently the Paul Pigott Professor of Computer Science (and by courtesy, Electrical Engineering) at Stanford University. He heads the Geometric Computation group and is part of the Graphics Laboratory, the AI Laboratory, the Bio-X Program, and the Institute for Computational and Mathematical Engineering. Professor Guibas’ interests span geometric data analysis, computational geometry, geometric modeling, computer graphics, computer vision, robotics, ad hoc communication and sensor networks, and discrete algorithms. Some well-known past accomplishments include the analysis of double hashing, red-black trees, the quad-edge data structure, Voronoi-Delaunay algorithms, the Earth Mover’s distance, Kinetic Data Structures (KDS), Metropolis light transport, and the Heat-Kernel Signature. Professor Guibas is an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell award.


SIGMM Achievement Award Talk
Dick Bulterman, CWI, The Netherlands
Friday, Oct. 25, 2013

The 2013 winner of the SIGMM award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. The ACM SIGMM Technical Achievement Award is given in recognition of outstanding contributions over a researcher’s career. Prof. Bulterman has been selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing, spanning research, standardization, and entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group, as well as for his dedicated involvement in the SIGMM research community over many years. Dr. Bulterman has long been an intellectual leader in the area of temporal modeling and support for complex multimedia systems. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spinoff company Oratrix Development BV and worked as its CEO to deliver this software widely. He is currently head of the Distributed and Interactive Systems research group at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands, and a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage the development and use of time-based multimedia content.
His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard. Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 and a best paper award at the EUROITV conference.

SIGMM Ph.D. Thesis Award Talk
Xirong Li, Renmin University, China
Friday, Oct. 25, 2013

The SIGMM Ph.D. Thesis Award Committee recommended that this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications go to Dr. Xirong Li. The committee considered Dr. Li’s dissertation, titled “Content-based visual search learned from social media”, worthy of the award because it substantially extends the boundaries of content-based multimedia indexing and retrieval solutions. In particular, it provides fresh insights into the possibilities for realizing image retrieval solutions in the presence of the vast information that can be drawn from social media. The committee considered the main innovation of Dr. Li’s work to be the development of theory and algorithms answering the following challenging research questions: (a) what determines the relevance of a social tag with respect to an image, (b) how to fuse tag relevance estimators, (c) which social images are informative negative examples for concept learning, (d) how to exploit socially tagged images for visual search, and (e) how to personalize automatic image tagging with respect to a user’s preferences. The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of information collected from social media to enhance the datasets used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user. Dr. Xirong Li received the B.Sc. and M.Sc.
degrees from Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. The title of his thesis is “Content-based visual search learned from social media”. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interests are image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, a Best Paper nomination at the ACM International Conference on Multimedia Retrieval 2012, the Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.

Panel: Cross-Media Analysis and Mining
Wednesday, Oct. 23, 2013
Panelists: Mark Zhang, Alberto del Bimbo, Selcuk Candan, Alexander Hauptmann, Ramesh Jain, Alexis Joly, Yueting Zhuang

Motivation: Today there is a wealth of heterogeneous and homogeneous media data from multiple sources, such as news media websites, microblogs, mobile phones, social networking websites, and photo/video sharing websites. Integrated together, these media data represent different aspects of the real world and help document its evolution. Consequently, it is impossible to correctly conceive and appropriately understand the world without exploiting the data available from these different sources of rich multimedia content simultaneously and synergistically. Cross-media analysis and mining is a research area in the general field of multimedia content analysis which focuses on exploiting data of different modalities from multiple sources simultaneously and synergistically to discover knowledge and understand the world.
Specifically, we emphasize two essential elements in the study of cross-media analysis that help differentiate it from the rest of the research in multimedia content analysis or machine learning. The first is the simultaneous co-existence of data from two or more different data sources. This element captures the concept of “cross”, e.g., cross-modality, cross-source, and cross-space (from cyberspace to reality). Cross-modality means that heterogeneous features are obtained from data in different modalities; cross-source means that the data may be obtained across multiple sources (domains or collections); cross-space means that the virtual world (i.e., cyberspace) and the real world (i.e., reality) complement each other. The second is the leverage of different types of data across multiple sources to strengthen knowledge discovery, for example, discovering the (latent) correlation or synergy between data of different modalities across multiple sources, transferring knowledge learned in one domain (e.g., a modality or a space) to generate knowledge in another related domain, and generating a summary from the data of multiple sources. These two essential elements help promote cross-media analysis and mining as a new, emerging, and important research area in today’s multimedia research. With its emphasis on knowledge discovery, cross-media analysis differs from traditional research areas such as cross-lingual translation. On the other hand, by generally leveraging different types of data across multiple sources to strengthen knowledge discovery, cross-media analysis and mining addresses a broader range of problems than traditional research areas such as transfer learning. Overall, cross-media analysis and mining is beneficial for many applications in data mining, causal inference, machine learning, multimedia, and public security.
Like other emerging hot topics in multimedia research, cross-media analysis and mining has a number of fundamental and controversial issues that must be addressed in order to have a full and complete understanding of research in this topic. These issues include, but are not limited to: whether there exists a unified representation or modeling for the same semantic concept across different media, and if so, what that unified representation or modeling is; whether any “law” governs topic evolution and development over time across different media, and if so, what that “law” is and how it is formulated; and whether there exists a mapping between a conceptual or semantic activity in cyberspace and in the real world, and if so, what that mapping is and how it is developed and formulated.

Brave New Idea Program

Brave New Ideas papers addressed long-term research challenges, pointed to new research directions, or provided new insights or brave perspectives that pave the way to innovation. The selection process differed from that for regular papers. First, submission of a 2-page abstract was requested. A first selection was then performed; full papers were requested for the selected abstracts, which were in turn reviewed and chosen. We received 38 submissions in the first stage, and 14 were invited to submit full papers for the second reviewing stage. Finally, 6 papers were accepted, forming two sessions of oral presentations.

Multimedia Grand Challenge Solutions

We received the six challenges shown below for the Multimedia Grand Challenge Solutions Program.

  1. NHK – Where is beauty? Grand Challenge
  2. Technicolor – Rich Multimedia Retrieval from Input Videos Grand Challenge
  3. Yahoo! – Large-scale Flickr-tag Image Classification Grand Challenge
  4. Huawei/3DLife – 3D human reconstruction and action recognition Grand Challenge
  5. MediaMixer/VideoLectures.NET – Temporal Segmentation and Annotation Grand Challenge
  6. Microsoft: MSR – Bing Image Retrieval Grand Challenge

We received 34 proposals for this program, and 14 of them were accepted for presentation. To promote submissions, all presentations in this program were recognized as Multimedia Grand Challenge Finalists. A best prize and two second-best prizes were chosen and awarded. At the request of Technicolor, a Grand Challenge Multimodal Prize was also chosen and awarded.

Technical Demonstrations

We received 80 excellent technical demonstration proposals, a number in line with the demonstrations received the previous year. Three reviewers were assigned to each demo proposal, and 40 proposals were finally chosen. A best demo prize was awarded.

Open Source Software Competition

This year was the 6th edition of the Open Source Software Competition as part of the ACM Multimedia program. The goal of this competition is to praise the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year we received 16 submissions; after assigning three reviewers to each, we selected 11 for the competition. The best open source software was awarded.

Doctoral Symposium

The Doctoral Symposium was meant as a forum for mentoring graduate students. It was held in the afternoon of Oct. 25 in both oral and poster formats. We received 19 proposals for the Doctoral Symposium and accepted 13 presentations (6 oral + poster and 7 additional posters). Additionally, a Doctoral Symposium lunch was organized, at which the students had the opportunity to talk to their assigned mentors. Finally, the best Doctoral Symposium paper was awarded.

Multimedia Art Exhibition and Reception

ACM Multimedia provided a rich Multimedia Art Exhibition to stimulate artists and researchers alike to meet and discover the frontiers of multimedia artistic communication.
The Art Exhibition attracted significant work from a variety of digital artists collaborating with research institutions. We endeavored to select exhibits that achieved an interesting balance between technology and artistic intent. The techniques underpinning these artworks are relevant to several technical tracks of the conference, in particular those dealing with human-centered and interactive media. We had a satellite venue for the art exhibition, FAD (Foment de les Arts i del Disseny), located in the center of the city with very good public access. The exhibition was open from Oct. 21 to Oct. 28 and was visited by more than 2,000 visitors. A reception event was held with the artists on Oct. 23. We selected 10 artworks for the exhibition:

  1. Emotion Forecast, Maurice Benayoun (City University of Hong Kong)
  2. Critical, Anabela Costa (France)
  3. Smile-Wall, Shen-Chi Chen, He-Lin Luo, Kuan-Wen Chen, Yu-Shan Lin, Hsiao-Lun Wang, Che-Yao Chan, Kai-Chih Huang, Yi-Ping Hung (National Taiwan University)
  4. SOMA, Guillaume Faure (France)
  5. A Feast of Shadow Puppetry, Zhenzhen Hu, Min Lin, Si Liu, Jiangguo Jiang, Meng Wang, Richang Hong, Shuicheng Yan (Hefei University of Technology and NUS)
  6. Tele Echo Tube, Hill Hiroki Kobayashi, Kaoru Saito, Akio Fujiwara (University of Tokyo)
  7. 3D-Stroboscopy, Sujin Lee (Sogang University, South Korea)
  8. The Qi of Calligraphy, He-Lin Luo, Yi-Ping Hung (National Taiwan University), I-Chun Chen (Tainan National University of the Arts)
  9. Gestural Pen Animation, Sheng-Ying Pao and Kent Larson (MIT Media Lab, USA)
  10. MixPerceptions, Jose San Pedro (Telefonica Research, Spain), Aurelio San Pedro (Escola Massana, Barcelona), Juan Pablo Carrascal (UPF, Barcelona), Matylda Szmukier (Telefonica Research, Spain)

Attending the Art Exhibition

San Pedro’s Mix Perceptions


We received 14 tutorial proposals and selected 8 tutorials for the main program. All tutorials were half-day and were held on Oct. 21 and 22, in parallel with the workshops, in the Universitat Pompeu Fabra – Balmes building. Tutorials were made free for all participants, and we received 312 pre-registrations.

Tutorial 1 Foundations and Applications of Semantic Technologies for Multimedia Content
Ansgar Scherp (Uni Mannheim, Germany)
Tutorial 2 Towards Next-Generation Multimedia Recommendation Systems
Jialie Shen (SMU, Singapore)
Shuicheng Yan (NUS)
Xian-Sheng Hua (Microsoft)
Tutorial 3 Crowdsourcing for Multimedia Research
Mohammad Soleymani (Imperial College London)
Martha Larson (TU Delft)
Tutorial 4 Massive-Scale Multimedia Semantic Modeling
John R. Smith (IBM Research )
Liangliang Cao (IBM Research)
Tutorial 5 Social Interactions over Geographic-Aware Multimedia Systems
Roger Zimmermann (NUS)
Yi Yu (NUS)
Tutorial 6 Multimedia Information Retrieval: Music and Audio
Markus Schedl (JKU Linz)
Emilia Gomez (UPF)
Masataka Goto (AIST)
Tutorial 7 Blending the Physical and the Virtual in Musical Technology: From interface design to multimodal signal processing
George Tzanetakis (U Victoria, Canada)
Sidney Fels (UBC)
Michael Lyons (Ritsumeikan U, JP)
Tutorial 8 Privacy Concerns of Sharing Multimedia in Social Networks
Gerald Friedland (ICSI)


Workshops have always been an important part of the conference. Below is the list of workshops held in conjunction with ACM Multimedia 2013. We had 9 full-day workshops and 4 half-day workshops, held on Oct. 21-22 in parallel with the tutorials. Following last year’s rule, two complimentary workshop-only registrations were provided for the invited speakers of each workshop, to encourage the participation of notable speakers.

Full Day Workshops (9)

  1. 2nd International Workshop on Socially-Aware Multimedia (SAM 2013) Organizers: Pablo Cesar (CWI, NL), Matthew Cooper (FXPAL), David A. Shamma (Yahoo!), Doug Williams (BT)
  2. 4th ACM/IEEE ARTEMIS 2013 International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams Organizers: Marco Bertini (University of Florence, Italy), Anastasios Doulamis (TU Crete, Greece), Nikolaos Doulamis (Cyprus University of Technology, Cyprus), Jordi Gonzàlez (Universitat Autònoma de Barcelona, Spain), Thomas Moeslund (University of Aalborg, Denmark)
  3. 5th International Workshop on Multimedia for Cooking and Eating Activities (CEA2013) Organizer: Kiyoharu Aizawa (Univ. of Tokyo, JP)
  4. 4th International Workshop on Human Behavior Understanding (HBU 2013) Organizers: Albert Ali Salah (Boğaziçi Univ., Turkey), Hayley Hung (Delft Univ. of Technology, The Netherlands), Oya Aran (Idiap Research Institute, Switzerland), Hatice Gunes (Queen Mary Univ. of London (QMUL), UK)
  5. International ACM Workshop on Crowdsourcing for Multimedia 2013 (CrowdMM 2013) Organizers: Wei-Ta Chu (National Chung Cheng University, TW), Martha Larson (Delft University of Technology, NL), Kuan-Ta Chen (Academia Sinica, TW)
  6. First ACM MM Workshop on Multimedia Indexing and Information Retrieval for Healthcare (ACM MM MIIRH) Organizers: Jenny Benois-Pineau (University of Bordeaux 1, France), Alexia Briasouli (CERTH-ITI), Alex Hauptmann (Carnegie Mellon University, USA)
  7. Workshop on Personal Data Meets Distributed Multimedia Organizers: Vivek Singh (MIT, USA), Tat-Seng Chua (NUS), Ramesh Jain (University of California, Irvine, USA), Alex (Sandy) Pentland (MIT, USA)
  8. Workshop on Immersive Media Experiences Organizers: Teresa Chambel (University of Lisbon, Portugal), V. Michael Bove (MIT Media Lab, USA), Sharon Strover (University of Texas at Austin, USA), Paula Viana (Polytechnic of Porto and INESC TEC, Portugal), Graham Thomas (BBC, UK)
  9. Workshop on Event-based Media Integration and Processing Organizers: Fausto Giunchiglia (University of Trento, Italy), Sang “Peter” Chin (Johns Hopkins University, US), Giulia Boato (University of Trento, Italy), Bogdan Ionescu (University Politehnica of Bucharest, Romania), Yiannis Kompatsiaris (Centre for Research and Technology Hellas, Greece)

Half Day Workshops (4)

  1. ACM Multimedia Workshop on Geotagging and Its Applications Organizers: Liangliang Cao (IBM T. J. Watson Research Center, USA), Gerald Friedland (International Computer Science Institute, USA), Pascal Kelm (Technische Universität Berlin, Germany)
  2. Data-driven challenge-based workshop at ACM MM 2013 (AVEC 2013) Organizers: Björn Schuller (TUM, Germany), Michel Valstar (University of Nottingham, UK), Roddy Cowie (Queen’s University Belfast, UK), Maja Pantic (Imperial College London, UK), Jarek Krajewski (University of Wuppertal, Germany)
  3. 2nd ACM International Workshop on Multimedia Analysis for Ecological Data (MAED 2013) Organizers: Concetto Spampinato (University of Catania, Italy), Vasileios Mezaris (CERTH, Greece), Jacco van Ossenbruggen (CWI, The Netherlands)
  4. 3rd International Workshop on Interactive Multimedia on Mobile and Portable Devices (IMMPD’13) Organizers: Jiebo Luo (University of Rochester, USA), Caifeng Shan (Philips Research, The Netherlands), Ling Shao (The University of Sheffield, UK), Minoru Etoh (NTT DOCOMO, Japan)


Awards were given in almost all the programs (except short papers) during the banquet organized at the conference venue. The following awards were given:

Best Paper Award: Luoqi Liu, Hui Xu, Junliang Xing, Si Liu, Xi Zhou and Shuicheng Yan, National University of Singapore (NUS), “Wow! You Are So Beautiful Today!”

Best Student Paper Award: Hanwang Zhang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao and Tat-Seng Chua, National University of Singapore (NUS), “Attributes-augmented Semantic Hierarchy for Image Retrieval”

Grand Challenge 1st Place Award [Sponsored by Technicolor]: Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff-Abegauz and Shih-Fu Chang, Digital Video & Multimedia (DVMM) Lab, Columbia University, “Structured Exploration of Who, What, When, and Where in Heterogenous Multimedia News Sources”

Grand Challenge 2nd Place Award [Sponsored by Technicolor]: Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang and Mubarak Shah, University of Central Florida and Columbia University, “Towards a Comprehensive Computational Model for Aesthetic Assessment of Videos”

Grand Challenge 3rd Place Award [Sponsored by Technicolor]: Shannon Chen, Penye Xia and Klara Nahrstedt, UIUC, “Activity-Aware Adaptive Compression: A Morphing-Based Frame Synthesis Application in 3DTI”

Program chairs during the banquet

Award ceremony

Banquet venue

Social program

Grand Challenge Multimodal Award [Sponsored by Technicolor]: Chun-Che Wu, Kuan-Yu Chu, Yin-Hsi Kuo, Yan-Ying Chen, Wen-Yu Lee and Winston H. Hsu, National Taiwan University, Taiwan, “Search-Based Relevance Association with Auxiliary Contextual Cues”

Best Demo Award: Duong-Trung-Dung Nguyen, Mukesh Saini, Vu-Thanh Nguyen and Wei Tsang Ooi, National University of Singapore (NUS), “Jiku director: An online mobile video mashup system”

Best Doctoral Symposium Paper: Jules Francoise, Institut de Recherche et Coordination Acoustique/Musique (IRCAM), “Gesture-Sound Mapping by demonstration in Interactive Music Systems”

Best Open Source Software Award: Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, Jose Zapata and Xavier Serra (UPF), “ESSENTIA: An Audio Analysis Library for Music Information Retrieval”

Prize amounts:

Best Paper Award 500 euro
Best Student Paper Award 250 euro
Grand Challenge 1st Prize 750 euro
Grand Challenge 2nd Prize 500 euro
Grand Challenge 3rd Prize 500 euro
Grand Challenge Multimodal Prize 500 euro
Best Technical Demo Award 250 euro
Best Doctoral Symposium Paper 250 euro
Best Open Source Software Award 250 euro
Student Travel Grant (35 students) $26,000 ($10,000 NSF, $16,000 SIGMM)

Sponsors: We received incredible support from industry and funding organizations (38.5k euro). All sponsors and institutional supporters are listed in Appendix B. The sponsorship amount for each individual sponsor is as follows:

Sponsor Amount
FXPAL 5000 euro
Google 5000 euro
Huawei 5000 euro
Yahoo!Labs 5000 euro
Technicolor 4000 euro
Media Mixer 3500 euro
INRIA 3000 euro
Facebook 2000 euro
IBM 2000 euro
Telefonica 2000 euro
Microsoft 2000 euro
Total 38500 euro

The benefits for the sponsors were honorary registrations and publicity: the company logo was published on the conference website, in the Proceedings, and in the Booklet. On top of these amounts, we received $16,000 from SIGMM and $10,000 from NSF for student travel grants.

Geographical distribution of the participants

We had 544 participants at the main conference and workshops. The main conference was attended by 476 participants, of which 425 paid and 51 were special cases (sponsors, student volunteers, etc.); 68 participants attended only the workshops. The tutorials, which were free of charge, had 312 advance registrations. The country-wise distribution is shown below. As the list shows, the geographical distribution was wide, meaning that we managed to attract participants from a large number of countries.

Total # of participants: 544
USA 75 Switzerland 20
Singapore 48 Germany 20
China 45 Portugal 20
Japan 40 Taiwan 18
UK 35 Korea 15
Italy 29 Australia 15
France 28 Greece 14
Netherlands 26 Turkey 14
Spain 26 25 other countries 56


To gather opinions from the participants of ACM Multimedia 2013, we performed a post-conference survey; the results are summarized in Appendix C. Here we summarize the 10 most important issues compiled from the answers received. This effort is the first of its kind at ACM Multimedia, and we hope the tradition will be continued in the future. In our opinion, the survey results are a very good source of information for future organizers.

  1. Poster space too small
  2. Many people still want USB proceedings!!
  3. Oral topics in the same time slot overlapped too much. Need to diversify.
  4. Need to attract more multimedia niche topics. Should not become a second rate CV conference
  5. First day location hard to find. Workshop/tutorial better to be co-located with main conference
  6. Senior members of MM community should participate in paper sessions more
  7. Need to update web site program content and make it available earlier
  8. Consider offering short spotlight talks for poster papers
  9. Keep 15 mins for oral, but have them presented again in poster session for more discussion
  10. SIGMM business meeting too long. Not enough time for Q&A.


ACM Multimedia 2013 was a great success, with a large number of submissions, an excellent technical program, attractive program components, and stimulating events. As a result, we welcomed a large number of participants, in line with our initial expectations. There were a few problems (see above), but this is only natural. We greatly acknowledge those who contributed to the success of ACM Multimedia 2013. We thank the organizers of ACM Multimedia 2012 for their useful suggestions and comments, which helped us improve the organization of the 2013 edition, and for giving us the template for the conference booklet. We thank the many paper authors and proposal contributors for the various technical and program components. We thank the large number of volunteers, including the Organizing Committee and Technical Program Committee members, who worked very hard to create this year’s outstanding conference. Every aspect of the conference was also aided by the local committee members and by the hard work of Grupo Pacifico, to whom we are very grateful. We also thank the ACM staff and Sheridan Printing Company for their constant support. This success was clearly due to the integration of all their efforts.


General Co-Chairs: Alejandro (Alex) Jaimes (Yahoo Labs, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)
Technical Program Co-Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo Labs, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)
Author’s Advocate: Pablo Cesar (CWI, The Netherlands)
Multimedia Grand Challenge Co-Chairs: Yiannis Kompatsiaris (CERTH, Greece), Neil O’Hare (Yahoo Labs, Spain)
Interactive Arts Co-Chairs: Antonio Camurri (University of Genova, Italy), Marc Cavazza (Teesside University, UK)
Local Arrangement Chair: Mari-Carmen Marcos (Pompeu Fabra University, Spain)
Sponsorship Chairs: Ricardo Baeza-Yates (Yahoo Labs, Spain), Bernard Merialdo (Eurecom, France)
Panel Co-Chairs: Yong Rui (Microsoft, China), Winston Hsu (National Taiwan University, Taiwan), Michael Lew (University of Leiden, The Netherlands)
Video Program Chairs: Alexis Joly (INRIA, France), Giovanni Maria Farinella (University of Catania, Italy), Julien Champ (INRIA/LIRMM, France)
Brave New Ideas Co-Chairs: Jiebo Luo (University of Rochester, USA), Shuicheng Yan (National University of Singapore, Singapore)
Doctoral Symposium Chairs: Hayley Hung (Technical University of Delft, The Netherlands), Marco Cristani (University of Verona, Italy)
Open Source Competition Chairs: Ioannis (Yiannis) Patras (Queen Mary University, UK), Andrea Vedaldi (Oxford University, UK)
Tutorial Co-Chairs: Kiyoharu Aizawa (University of Tokyo, Japan), Lexing Xie (Australian National University, Australia)
Workshop Co-Chairs: Maja Pantic (Imperial College, UK), Vladimir Pavlovic (Rutgers University, USA)
Student Travel Grants Co-Chairs: Ramanathan Subramanian (ADSC, Singapore), Jasper Uijlings (University of Trento, Italy)
Publicity Co-Chairs: Marco Bertini (University of Florence, Italy), Ichiro Ide (Nagoya University, Japan)
Technical Demo Co-Chairs: Yi Yang (Carnegie Mellon University, USA), Xavier Anguera (Telefonica Research, Spain)
Proceedings Co-Chairs: Bogdan Ionescu (University Politehnica of Bucharest, Romania), Qi Tian (University of Texas San Antonio, USA)
Web Chair: Michele Trevisol (Web Research Group UPF & Yahoo Labs, Spain)

Appendix B. ACM MM 2013 Sponsors & Supporters

MPEG Column: 107th MPEG Meeting

— original posts here and here on the Multimedia Communication blog and the bitmovin techblog, by Christian Timmerer, AAU/bitmovin

The MPEG-2 Transport Stream (M2TS; formally Rec. ITU-T H.222.0 | ISO/IEC 13818-1) has been awarded a Technology & Engineering Emmy® Award by the National Academy of Television Arts & Sciences. It is the fourth time MPEG has received an Emmy award. M2TS is widely deployed across a broad range of application domains such as broadcast, cable TV, Internet TV (IPTV and OTT), and Blu-ray Discs. The Emmy was received during this year’s CES 2014 in Las Vegas.

Plenary during the 107th MPEG Meeting.

Other topics of the 107th MPEG meeting in San Jose include the following highlights:

  • Requirements: Call for Proposals on Screen Content jointly with ITU-T’s Video Coding Experts Group (VCEG)
  • Systems: Committee Draft for Green Metadata
  • Video: Study Text of the Committee Draft for Compact Descriptors for Visual Search (CDVS)
  • JCT-VC: Draft Amendment for HEVC Scalable Extensions (SHVC)
  • JCT-3D: Proposed Draft Amendment for HEVC 3D Extensions (3D-HEVC)
  • Audio: 3D Audio is planned to progress to CD at the 108th meeting
  • 3D Graphics: Working Draft 4.0 of Augmented Reality Application Format (ARAF) 2nd Edition

The official MPEG press release can be downloaded from the MPEG Web site. Some of the above highlighted topics will be detailed in the following and, of course, there’s an update on DASH-related matters at the end.

Call for Proposals on Screen Content

Screen content refers to content coming not from cameras but from screen/desktop sharing and collaboration, cloud computing and gaming, wirelessly connected displays, control rooms with high-resolution display walls, virtual desktop infrastructures, tablets as secondary displays, PC over IP, ultra-thin client technology, etc. Mixed content is also within the scope of this work item and may contain a mixture of camera-captured video and images with rendered computer-generated graphics, text, animation, etc.

Although this type of content was considered during the course of the HEVC standardization, recent studies in MPEG have led to the conclusion that significant further improvements in coding efficiency can be obtained by exploiting the characteristics of screen content and, thus, a Call for Proposals (CfP) is being issued for developing possible future extensions of the HEVC standard.

Companies and organizations are invited to submit proposals in response to this call, issued jointly by MPEG and ITU-T VCEG. Responses are expected by early March and will be evaluated during the 108th MPEG meeting. The timeline is as follows:

  • 2014/01/17: Final Call for Proposals
  • 2014/01/22: Availability of anchors and end of editing period for Final CfP
  • 2014/02/10: Mandatory registration deadline
    One of the contact persons (see Section 10) must be notified, and an invoice for the testing fee will be sent after registration. Additional logistic information will also be sent to proponents by this date.
  • 2014/03/05: Coded test material shall be available at the test site. By this date, the payment of the testing fee is expected to be finalized.
  • 2014/03/17: Submission of all documents and requested data associated with the proposal.
  • 2014/03/27-04/04: Evaluation of proposals at standardization meeting.
  • 2015: Final draft standard expected.

It will be interesting to see the coding efficiency of the submitted proposals compared to a pure HEVC or even AVC approach.

DEC PDP-8 at Computer History Museum during MPEG Social Event.

Committee Draft for Green Metadata

Green Metadata, formerly known as Green MPEG, shall enable energy-efficient media consumption and reached Committee Draft (CD) status at the 107th MPEG meeting. The representation formats defined within Green Metadata help reduce decoder power consumption and display power consumption. Clients may utilize such information for the adaptive selection of operating voltages or clock frequencies within their chipsets. Additionally, it may be used to set the brightness of the display backlight to save power.

Green Metadata also provides metadata for the signaling and selection of DASH representations to enable the reduction of power consumption for their encoding.

The main challenge for the adoption of this kind of technology is how to exploit these representation formats to actually achieve energy-efficient media consumption, and by how much.

What’s new on the DASH frontier?

The text of ISO/IEC 23009-1 2nd edition PDAM 1 has been approved; it may be referred to as MPEG-DASH v3 (once finalized and integrated into the second edition, possibly with further amendments and corrigenda, if applicable). This first amendment to MPEG-DASH v2 comprises accurate time synchronization between server and client for live services as well as a new profile, the ISOBMFF High Profile, which basically combines the ISOBMFF Live and ISOBMFF On-Demand profiles and adds the Xlink feature.
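As a rough sketch, the server–client time synchronization is signaled in the MPD via a UTCTiming element that tells the client where to obtain wall-clock time; the scheme URI below is one of those defined by the amendment, while the time-server URL and the rest of the MPD attributes are hypothetical:

```xml
<!-- Minimal, hypothetical MPD fragment for a live (dynamic) service.
     The UTCTiming element lets the client fetch an xs:dateTime string
     over HTTP to align its clock with the server before computing
     segment availability times. -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="dynamic"
     minimumUpdatePeriod="PT10S"
     availabilityStartTime="2014-01-01T00:00:00Z">
  <UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-xsdate:2014"
             value="https://time.example.com/now"/>
  <!-- Periods, AdaptationSets, and Representations follow as usual. -->
</MPD>
```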

Additionally, a second amendment to MPEG-DASH v2 has been started featuring Spatial Relationship Description (SRD) and DASH Client Authentication and Content Access Authorization (DAA).

Other DASH-related aspects include the following:

  • The Common Encryption for ISOBMFF has been extended with a simple pattern-based encryption mode, i.e., a new method that should simplify content encryption.
  • The CD for the carriage of timed metadata metrics of media in ISOBMFF has been approved. This allows for the signaling of quality metrics within the segments, enabling QoE-aware DASH clients.

What else? That is, some publicly available MPEG output documents… (Dates indicate availability and end of editing period, if applicable, using the following format YY/MM/DD):

  • Report of 3D-AVC Subjective Quality Assessment (14/02/28)
  • Working Draft 3 of Video Coding for Browsers (14/01/31)
  • Common Test Conditions for Proposals on VCB Enhancements (14/01/17)
  • Study Text of ISO/IEC CD 15938-13 Compact Descriptors for Visual Search (14/02/14)
  • WD 4.0 of ARAF 2nd Edition (14/02/07)
  • Text of ISO/IEC 23001-7 PDAM 1 Simple pattern-based encryption mode (14/01/31)
  • Text of ISO/IEC CD 23001-10 Carriage of Timed Metadata Metrics of Media in the ISO Base Media File Format (14/01/31)
  • Text of ISO/IEC CD 23001-11 Green Metadata (14/01/24)
  • Preliminary Draft of ISO/IEC 23008-2:2013/FDAM1 HEVC Range Extensions (14/02/28)
  • Text of ISO/IEC 23008-2:2013/DAM3 HEVC Scalable Extensions (14/01/31)
  • Preliminary Draft of ISO/IEC 23008-2:2013/FDAM2 HEVC Multiview Extensions (14/02/28)
  • Text of ISO/IEC 23008-2:2013/PDAM4 3D Extensions (14/03/14)
  • Text of ISO/IEC CD 23008-12 Image File Format (14/01/17)
  • Text of ISO/IEC 23009-1:201x DCOR 1 (14/01/24)
  • Text of ISO/IEC 23009-1:201x PDAM 1 High Profile and Availability Time Synchronization (14/01/24)
  • WD of ISO/IEC 23009-1 AMD 2 (14/01/31)
  • Requirements for an extension of HEVC for coding of screen content (14/01/17)
  • Joint Call for Proposals for coding of screen content (14/01/22)
  • Draft requirements for Higher Dynamic Range (HDR) and Wide Color Gamut (WCG) video coding for Broadcasting, OTT, and Storage Media (14/01/17)
  • Working Draft 1 of Internet Video Coding (IVC) (14/01/31)

ACM TOMM (TOMCCAP) Call for Special Issue Proposals

ACM TOMM is one of the world’s leading journals on multimedia. As in previous years, we are planning to publish a special issue in 2015. Proposals are accepted until May 1st, 2014. Each special issue is the responsibility of its guest editors. If you wish to guest edit a special issue, prepare a proposal as outlined below and send it via e-mail to the Senior Associate Editor (SAE) for Special Issue Management of TOMM, Shervin Shirmohammadi.

Call for Proposals – Special Issue
Deadline for Proposal Submission: May 1st, 2014
Notification: June 1st, 2014
Proposals should:

  • Cover a current or emerging topic in the area of multimedia computing, communications and applications;
  • Set out the importance of the special issue’s topic in that area;
  • Give a strategy for the recruitment of high quality papers;
  • Indicate a draft timeline in which the special issue could be produced (paper writing, reviewing, and submission of final copies to TOMM), assuming the proposal is accepted;
  • Include the list of proposed guest editors, their short bios, and their experience related to the special issue’s topic.

As in previous years, the special issue will be published as an online-only issue in the ACM Digital Library. This gives the guest editors greater flexibility in the review process and in the number of papers to be accepted, while still ensuring a timely publication.

The proposals will be reviewed by the SAE together with the EiC. The final decision will be made by the EiC. A notification of acceptance will be given by June 1st, 2014. Once a proposal is accepted, we will contact you to discuss the subsequent steps.

For questions please contact:

  • Shervin Shirmohammadi – Senior Associate Editor for Special Issue Management
  • Ralf Steinmetz – Editor in Chief (EiC)
  • Sebastian Schmidt – Information Director