Report from ACM SIG Heritage Workshop

What does history mean to computer scientists?” – that was the first question that popped up in my mind when I was to attend the ACM Heritage Workshop at Minneapolis few months back. And needless to say, the follow up question was “what does history mean for a multimedia systems researcher?” As a young graduate student, I had the joy of my life when my first research paper on multimedia authoring (a hot topic those days) was accepted for presentation in the first ACM Multimedia in 1993, and that conference was held along side SIGGRAPH. Thinking about that, it gives multimedia systems researchers about 25 to 30 years of history. But what a flow of topics this area has seen: from authoring to streaming to content-based retrieval to social media and human-centered multimedia, the research area has been hot as ever. So, is it the history of research topics or the researchers or both? Then, how about the venues hosting these conferences, the networking events, or the grueling TPC meetings that prepped the conference actions?

Figure 1. Picture from the venue

With only questions and no clear answers, I decided to attend the workshop with an open mind. Most SIGs (Special Interest Groups) in ACM had representation at this workshop. The workshop itself was organized by the ACM History Committee. I understood this committee, apart from the workshop, organizes several efforts to track, record, and preserve computing efforts across disciplines. This includes identifying distinguished persons (who are retired but made significant contributions to computing), coming up with a customized questionnaire for the persons, training the interviewer, recording the conversations, curating them, archiving, and providing them for public consumption. Efforts at most SIGs were mostly based on the website. They were talking about how they try to preserve conference materials such as paper proceedings (when only paper proceedings were published), meeting notes, pictures, and videos. For instance, some SIGs were talking about how they tracked and preserved ACM’s approval letter for the SIG! 

It was very interesting – and touching – to see some attendees (senior Professors) coming to the workshop with boxes of materials – papers, reports, books, etc. They were either downsizing their offices or clearing out, and did not feel like throwing the material in recycling bins! These materials were given to ACM and Babbage Institute (at University of Minnesota, Minneapolis) for possible curation and storage.

Figure 2. Galleries with collected material

ACM History committee members talked about how they can fund (at a small level) projects that target specific activities for preserving and archiving computing events and materials. ACM History Committee agreed that ACM should take more responsibility in providing technical support to web hosting – obviously, not sure whether anything tangible would result.

Over the two days at the workshop, I was getting answers to my questions: History can mean pictures and videos taken at earlier MM conferences, TPC meetings, SIGMM sponsored events and retreats. Perhaps, the earlier paper proceedings that have some additional information than what is found in the corresponding ACM Digital Library version. Interviews with different research leaders that built and promoted SIGMM.

It was clear that history meant different things to different SIGs, and as SIGMM community, we would have to arrive at our own interpretation, collect and preserve that. And that made me understand the most obvious and perhaps, the most important thing: today’s events become tomorrow’s history! No brainer, right? Preserving today’s SIGMM events will give us a richer, colorful, and more complete SIGMM history for the future generations!

For the curious ones:

ACM Heritage Workshop website is at: https://acmsigheritage.dash.umn.ed

Some of the workshop presentation materials are available at:

Socially significant music events

Social media sharing platforms (e.g., YouTube, Flickr, Instagram, and SoundCloud) have revolutionized how users access multimedia content online. Most of these platforms provide a variety of ways for the user to interact with the different types of media: images, video, music. In addition to watching or listening to the media content, users can also engage with content in different ways, e.g., like, share, tag, or comment. Social media sharing platforms have become an important resource for scientific researchers, who aim to develop new indexing and retrieval algorithms that can improve users’ access to multimedia content. As a result, enhancing the experience provided by social media sharing platforms.

Historically, the multimedia research community has focused on developing multimedia analysis algorithms that combine visual and text modalities. Less highly visible is research devoted to algorithms that exploit an audio signal as the main modality. Recently, awareness for the importance of audio has experienced a resurgence. Particularly notable is Google’s release of the AudioSet, “A large-scale dataset of manually annotated audio events” [7]. In a similar spirit, we have developed the “Socially Significant Music Event“ dataset that supports research on music events [3]. The dataset contains Electronic Dance Music (EDM) tracks with a Creative Commons license that have been collected from SoundCloud. Using this dataset, one can build machine learning algorithms to detect specific events in a given music track.

What are socially significant music events? Within a music track, listeners are able to identify certain acoustic patterns as nameable music events.  We call a music event “socially significant” if it is popular in social media circles, implying that it is readily identifiable and an important part of how listeners experience a certain music track or music genre. For example, listeners might talk about these events in their comments, suggesting that these events are important for the listeners (Figure 1).

Traditional music event detection has only tackled low-level events like music onsets [4] or music auto-tagging [810]. In our dataset, we consider events that are at a higher abstraction level than the low-level musical onsets. In auto-tagging, descriptive tags are associated with 10-second music segments. These tags generally fall into three categories: musical instruments (guitar, drums, etc.), musical genres (pop, electronic, etc.) and mood based tags (serene, intense, etc.). The types of tags are different than what we are detecting as part of this dataset. The events in our dataset have a particular temporal structure unlike the categories that are the target of auto-tagging. Additionally, we analyze the entire music track and detect start points of music events rather than short segments like auto-tagging.

There are three music events in our Socially Significant Music Event dataset: Drop, Build, and Break. These events can be considered to form the basic set of events used by the EDM producers [1, 2]. They have a certain temporal structure internal to themselves, which can be of varying complexity. Their social significance is visible from the presence of large number of timed comments related to these events on SoundCloud (Figure 1,2). The three events are popular in the social media circles with listeners often mentioning them in comments. Here, we define these events [2]:

  1. Drop: A point in the EDM track, where the full bassline is re-introduced and generally follows a recognizable build section
  2. Build: A section in the EDM track, where the intensity continuously increases and generally climaxes towards a drop
  3. Break: A section in an EDM track with a significantly thinner texture, usually marked by the removal of the bass drum

Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].

Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].


SoundCloud is an online music sharing platform that allows users to record, upload, promote and share their self-created music. SoundCloud started out as a platform for amateur musicians, but currently many leading music labels are also represented. One of the interesting features of SoundCloud is that it allows “timed comments” on the music tracks. “Timed comments” are comments, left by listeners, associated with a particular time point in the music track. Our “Socially Significant Music Events” dataset is inspired by the potential usefulness of these timed comments as ground truth for training music event detectors. Figure 2 contains an example of a timed comment: “That intense buildup tho” (timestamp 00:46). We could potentially use this as a training label to detect a build, for example. In a similar way, listeners also mention the other events in their timed comments. So, these timed comments can serve as training labels to build machine learning algorithms to detect events.

Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]

Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]

SoundCloud also provides a well-documented API [6] with interfaces to many programming languages: Python, Ruby, JavaScript etc. Through this API, one can download the music tracks (if allowed by the uploader), timed comments and also other metadata related to the track. We used this API to collect our dataset. Via the search functionality we searched for tracks uploaded during the year 2014 with a Creative Commons license, which results in a list of tracks with unique identification numbers. We looked at the timed comments of these tracks for the keywords: drop, break and build. We kept the tracks whose timed comments contained a reference to these keywords and discarded the other tracks.


The dataset contains 402 music tracks with an average duration of 4.9 minutes. Each track is accompanied by timed comments relating to Drop, Build, and Break. It is also accompanied by ground truth labels that mark the true locations of the three events within the tracks. The labels were created by a team of experts. Unlike many other publicly available music datasets that provide only metadata or short previews of music tracks  [9], we provide the entire track for research purposes. The download instructions for the dataset can be found here: [3]. All the music tracks in the dataset are distributed under the Creative Commons license. Some statistics of the dataset are provided in Table 1.  

Table 1. Statistics of the dataset: Number of events, Number of timed comments

Event Name Total number of events Number of events per track Total number of timed comments Number of timed comments per track
Drop  435  1.08  604  1.50
Build  596  1.48  609  1.51
Break  372  0.92  619  1.54

The main purpose of the dataset is to support training of detectors for the three events of interest (Drop, Build, and Break) in a given music track. These three events can be considered a case study to prove that it is possible to detect socially significant musical events, opening the way for future work on an extended inventory of events. Additionally, the dataset can be used to understand the properties of timed comments related to music events. Specifically, timed comments can be used to reduce the need for manually acquired ground truth, which is expensive and difficult to obtain.

Timed comments present an interesting research challenge: temporal noise. The timed comments and the actual events do not always coincide. The comments could be at the same position, before, or after the actual event. For example, in the below music track (Figure 3), there is a timed comment about a drop at 00:40, while the actual drop occurs only at 01:00. Because of this noisy nature, we cannot use the timed comments alone as ground truth. We need strategies to handle temporal noise in order to use timed comments for training [1].

Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].

Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].

In addition to music event detection, our “Socially Significant Music Event” dataset opens up other possibilities for research. Timed comments have the potential to improve users’ access to music and to support them in discovering new music. Specifically, timed comments mention aspects of music that are difficult to derive from the signal, and may be useful to calculate song-to-song similarity needed to improve music recommendation. The fact that the comments are related to a certain time point is important because it allows us to derive continuous information over time from a music track. Timed comments are potentially very helpful for supporting listeners in finding specific points of interest within a track, or deciding whether they want to listen to a track, since they allow users to jump-in and listen to specific moments, without listening to the track end-to-end.

State of the art

The detection of music events requires training classifiers that are able to generalize over the variability in the audio signal patterns corresponding to events. In Figure 4, we see that the build-drop combination has a characteristic pattern in the spectral representation of the music signal. The build is a sweep-like structure and is followed by the drop, which we indicate by a red vertical line. More details about the state-of-the-art features useful for music event detection and the strategies to filter the noisy timed comments can be found in our publication [1].

Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.

Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.

The evaluation metric used to measure the performance of a music event detector should be chosen according to the user scenario for that detector. For example, if the music event detector is used for non-linear access (i.e., creating jump-in points along the playbar) it is important that the detected time point of the event falls before, rather than after, the actual event.  In this case, we recommend using the “event anticipation distance” (ea_dist) as a metric. The ea_dist is amount of time that the predicted event time point precedes an actual event time point and represents the time the user would have to wait to listen to the actual event. More details about ea_dist can be found in our paper [1].

In [1], we report the implementation of a baseline music event detector that uses only timed comments as training labels. This detector attains an ea_dist of 18 seconds for a drop. We point out that from the user point of view, this level of performance could already lead to quite useful jump-in points. Note that the typical length of a build-drop combination is between 15-20 seconds. If the user is positioned 18 seconds before the drop, the build would have already started and the user knows that a drop is coming. Using an optimized combination of timed comments and manually acquired ground truth labels we are able to achieve an ea_dist of 6 seconds.


Timed comments, on their own, can be used as training labels to train detectors for socially significant events. A detector trained on timed comments performs reasonably well in applications like non-linear access, where the listener wants to jump through different events in the music track without listening to it in its entirety. We hope that the dataset will encourage researchers to explore the usefulness of timed comments for all media. Additionally, we would like to point out that our work has demonstrated that the impact of temporal noise can be overcome and that the contribution of timed comments to video event detection is worth investigating further.


Should you have any inquiries or questions about the dataset, do not hesitate to contact us via email at:


[1] K. Yadati, M. Larson, C. Liem and A. Hanjalic, “Detecting Socially Significant Music Events using Temporally Noisy Labels,” in IEEE Transactions on Multimedia. 2018.

[2] M. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music, ser. Profiles in Popular Music. Indiana University Press, 2006 






[8] H. Y. Lo, J. C. Wang, H. M. Wang and S. D. Lin, “Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval,” in IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 518-529, June 2011.





Dear Member of the SIGMM Community, welcome to the third issue of the SIGMM Records in 2013.

On the verge of ACM Multimedia 2013, we can already present the receivers of SIGMM’s yearly awards, the SIGMM Technical Achievement Award, the SIGMM Best Ph.D. Thesis Award, the TOMCCAP Nicolas D. Georganas Best Paper Award, and the TOMCCAP Best Associate Editor Award.

The TOMCCAP Special Issue on the 20th anniversary of ACM Multimedia is out in October, and you can read both the announcement, and find each of the contributions directly through the TOMCCAP Issue 9(1S) table of contents.

That SIGMM has established a strong foothold in the scientific community can also be seen by the Chinese Computing Federation’s rankings of SIGMM’s venues. Read the article to get even more motivation for submitting your papers to SIGMM’s conferences and journal.

We are also reporting from SLAM, the international workshop on Speech, Language and Audio in Multimedia. Not a SIGMM event, but certainly of interest to many SIGMMers who care about audio technology.

You find also two PhD thesis summaries, and last but most certainly not least, you find pointers to the latest issues of TOMCCAP and MMSJ, and several job announcements.

We hope that you enjoy this issue of the Records.

The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz

ACM TOMCCAP Special on 20th Anniversary of ACM Multimedia

ACM Transactions on Multimedia Computing, Communications and Applications

Special Issue: 20th Anniversary of ACM International Conference on Multimedia

A journey ‘Back to the Future’

The ACM Special Interest Group on Multimedia (SIGMM) celebrated the 20th anniversary of the establishment of its premier conference, the ACM International Conference on Multimedia (ACM Multimedia) in 2012. To commemorate this milestone, leading researchers organized and extensively contributed to the 20th anniversary celebration.

from left to right: Malcolm Slaney, Ramesh Jain, Dick Bulterman, Klara Nahrstedt, Larry Rowe and Ralf Steinmetz

The celebratory events started at ACM Multimedia 2012 in Nara Japan, with the  “Coulda, Woulda, Shoulda: 20 Years of Multimedia Opportunities” panel, organized by Klara Nahrstedt (center) and Malcolm Slaney (far left). At this panel, pioneers of the field, Ramesh Jain, Dick Bulterman, Larry Rowe and Ralf Steinmetz, from left to right shown in the image, reflected on innovations, and successful and missed opportunities in the multimedia research area.

This special issue of the ACM Transaction on Multimedia Computing, Communication and Applications (TOMCCAP) is the final event to celebrate achievements and opportunities in a variety of multimedia areas. Through peer-reviewed long articles and invited short contributions, readers will get a sense of the past, present and future of multimedia research. The evolution ranges over traditional topics such as video streaming, multimedia synchronization, multimedia authoring, content analysis, and multimedia retrieval to newer topics including music retrieval, geo-tagging context in worldwide community of photos, multi-modal humancomputer interactions and experiential media systems.

Recent years have seen an explosion of research and technologies in multimedia, beyond individual algorithms, protocols and small scale systems. The scale of multimedia innovations and deployment has exploded with unimaginable speed. Hence, as the multimedia area is growing fast, penetrating every facet of our society, this special issue fills an important need to look back at the multimedia research achievements over the past 20 years, celebrates the exciting potential, and explores new goals of the multimedia research community.

Visit to view in the DL.

TOMCCAP Nicolas D. Georganas Best Paper Award 2013

ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award

The 2013 ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award is provided to the paper Exploring interest correlation for peer-to-peer socialized video sharing (TOMCCAP vol. 8, Issue 1) by Xu Cheng and Jiangchuan Liu.

The purpose of the named award is to recognize the most significant work in ACM TOMCCAP in a given calendar year. The whole readership of ACM TOMCCAP was invited to nominate articles which were published in Volume 8 (2012). Based on the nominations the winner has been chosen by the TOMCCAP Editorial Board. The main assessment criteria have been quality, novelty, timeliness, clarity of presentation, in addition to relevance to multimedia computing, communications, and applications.

In this paper the authors examine architectures for large-scale video streaming systems exploiting social relations. To achieve this objective, a large study of YouTube traffic was conducted and a cluster analysis performed on the resulting data. Based on the observations made, a new approach for video pre-fetching based on social relations has been developed. This important work bridges the gap between social media and multimedia streaming and hence combines two extremely relevant research topics.

The award honors the founding Editor-in-Chief of TOMCCAP, Nicolas D. Georganas, for his outstanding contributions to the field of multimedia computing and his significant contributions to ACM.  He exceedingly influenced the research and the whole multimedia community.

The Editor-in-Chief Prof. Dr.-Ing. Ralf Steinmetz and the Editorial Board of ACM TOMCCAP cordially congratulate the winner. The award will be presented to the authors on October 24th 2013 at the ACM Multimedia 2013 in Barcelona, Spain and includes travel expenses for the winning authors.

Bio of Awardees:

Xu Cheng is currently a research engineer at BroadbandTV, Vancouver, Canada. He receive the Bachelor of Science from Peking University, China, in 2006, Master of Science from Simon Fraser University, Canada, in 2008, and PhD from Simon Fraser University, Canada, in 2012. His research interests included multimedia networks, social networks and overlay networks.


Jiangchuan Liu is an Associate Professor in School of Computing Science, Simon Fraser University, British Columbia, Canada. He received BEng(cum laude) from Tsinghua University in 1999 and PhD from HKUST in 2003, both in computer science. He is a co-recipient of ACM Multimedia’2012 Best Paper Award, IEEE Globecom’2011 Best Paper Award, IEEE Communications Society Best Paper Award on Multimedia Communications 2009, as well as IEEE IWQoS’08 and IEEE/ACM IWQoS’2012 Best Student Paper Awards. His research interests are in networking and multimedia. He served on the editorial boards of IEEE Transactions on Multimedia, IEEE Communications Tutorial and Surveys, and IEEE Internet of Things Journal. He will be TPC co-chair for IEEE/ACM IWQoS’2014 at Hong Kong.


TOMCCAP Best Associate Editor Award 2013

ACM Transactions on Multimedia Computing, Communications and Applications Best Associate Editor Award

Annually, the Editor-in-Chief of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) honors one member of the Editorial Board with the TOMCCAP Associate Editor of the Year Award. The purpose of the award is the distinction of excellent work for ACM TOMCCAP and hence also for the whole multimedia community in the previous year. Criteria for the award are (1.) the amount of submissions processed in time, (2.) the performance during the reviewing process and (3.) the accurate interaction with the reviewers in order to broader the awareness for the journal.

Based on the criteria mentioned above, the ACM Transactions on Multimedia Computing, Communications and Applications Associate Editor of the Year Award 2013 goes to Mohan S. Kankanhalli from the National University of Singapore.  The Editor-in-Chief Prof. Dr.-Ing. Ralf Steinmetz cordially congratulates Mohan.

Bio of Awardee:

Mohan Kankanhalli is a Professor at the Department of Computer Science of the National University of Singapore. He is also the Associate Provost for Graduate Education at NUS. Before that, he was the Vice-Dean for Academic Affairs and Graduate Studies at the NUS School of Computing during 2008-2010 and Vice-Dean for Research during 2001-2007. Mohan obtained his BTech from IIT Kharagpur and MS & PhD from the Rensselaer Polytechnic Institute.

His current research interests are in Multimedia Systems (content processing, retrieval) and Multimedia Security (surveillance and privacy). He has been awarded a S$10M grant by Singapore’s National Research Foundation to set up the Centre for “Sensor-enhanced Social Media” (

Mohan has been actively involved in the organization of many major conferences in the area of Multimedia. He was the Director of Conferences for ACM SIG Multimedia from 2009 to 2013. He is on the editorial boards of several journals including the ACM Transactions on Multimedia Computing, Communications, and Applications, Springer Multimedia Systems Journal, Pattern Recognition Journal and Multimedia Tools & Applications Journal.

SIGMM Outstanding Technical Contributions Award 2013

SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications

The 2013 winner of the prestigious ACM Special Interest Group on Multimedia (SIGMM) award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. He is currently a Research Group Head of the Distributed and Interactive Systems at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. He is also a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. His recent research concerns socially-aware multimedia, interactive television, and media analysis.

The ACM SIGMM Technical Achievement award is given in recognition of outstanding contributions over a researcher’s career. Prof. Dick Bulterman has been selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing from research through standardization to entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group as well as his dedicated involvement in the SIGMM research community for many years. The SIGMM award will be presented at the ACM International Conference on Multimedia 2013 that will be held Oct 21–25 2013 in Barcelona, Spain.

Dick Bulterman has been a long time intellectual leader in the area of temporal modeling and support for complex multimedia system. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spinoff company called Oratrix Development BV, and he worked as CEO to widely deliver this software.

Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage development and use of time-based multimedia content. His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard.

Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 and also at the EUROITV conference.

In summary, Prof. Bulterman’s accomplishments include pioneering and extraordinary contributions in multimedia authoring, media annotation, and social sharing and outstanding service to the computing community.

SIGMM PhD Thesis Award 2013

SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications 2013

The SIGMM Ph.D. Thesis Award Committee is pleased to recommend this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications to Dr. Xirong Li.

The committee considered Dr. Li’s dissertation titled “Content-based visual search learned from social media” as worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh new insights into the possibilities for realizing image retrieval solutions in the presence of vast information that can be drawn from the social media.

The committee considered the main innovation of Dr. Li’s work to be in the development of the theory and algorithms providing answers to the following challenging research questions:

  1. what determines the relevance of a social tag with respect to an image,
  2. how to fuse tag relevance estimators,
  3. which social images are the informative negative examples for concept learning,
  4. how to exploit socially tagged images for visual search and
  5. how to personalize automatic image tagging with respect to a user’s preferences.

The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of the information collected from the social media to enhance the datasets that can be used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user.

Bio of Awardee:

Dr. Xirong Li received the B.Sc. and M.Sc. degrees from the Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. The title of his thesis is “Content-based visual search learned from social media”. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interest is image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, Best Paper Nominee of the ACM International Conference on Multimedia Retrieval 2012, Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.

ACM SIGMM/TOMCCAP 2013 Award Announcements

The ACM Special Interest Group in Multimedia (SIGMM) and ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) are pleased to announce the following awards for 2013 recognizing outstanding achievements and services made in the multimedia community.

SIGMM Technical Achievement Award:
Dr. Dick Bulterman

SIGMM Best Ph.D. Thesis Award:
Dr. Xirong Li

TOMCCAP Nicolas D. Georganas Best Paper Award:
“Exploring interest correlation for peer-to-peer socialized video sharing” by Xu Cheng and Jiangchuan Liu, published in TOMCCAP vol. 8, Issue 1, 2012.

TOMCCAP Best Associate Editor Award:
Dr. Mohan S. Kankanhalli

Additional information of each award and recepient will be released in separate announcemrtns. Awards will be presented in the annual SIGMM event, ACM Multimedia Conference, held in Barcelona, Catalunya, Spain during October 23-25 2013.

ACM is the professional society of computer scientists, and SIGMM is the special interest group on multimedia. TOMCCAP is the flagship journal publication of SIGMM.

Opencast Matterhorn – Open source lecture capture and video management

Since its formation in 2007, Opencast has become a global community around academic video and its related domains. It is a valid source of inspiring ideas and a huge living community for educational multimedia content creation and usage. Matterhorn is a community-driven collaborative project to develop an end-to-end, open source solution that supports the scheduling, capture, managing, encoding, and delivery of educational audio and video content, and the engagement of users with that content. Recently (July 2013) Matterhorn 1.4 has been released after almost a year of feature planing, development and bug fixing by the community. The latest version, along with documentation, can be downloaded from the project website at Opencast Matterhorn.

Opencast Matterhorn Welcomes Climbers

The first screenshot shows a successful system installation and start in a web browser.

Opencast: A community for higher education

The Opencast community is a collaborative effort, in which individuals, higher education institutions and organizations work together to explore, develop, define and document best practices and technologies for management of audiovisual content in academia. As such, it wants to stimulate discussion and collaboration between institutions to develop and enhance the use of academic video. The community shares experiences with existing technologies as well as identifies future approaches and requirements. The community seeks broad participation in this important and dynamic domain, to allow community members to share expertise and experience and collaborate in related projects. It was initiated by the founding members of the community [2] to solve the need identified with academic institutions to run an affordable, flexible and enterprise-ready video management system. A list of current system adopters along with a detailed description can be found on the adopters page at List of Matterhorn adopters.

Matterhorn: Underlying technology

Matterhorn offers an open source reference implementation of an end-to-end enterprise lecture capture suite and a comprehensive set of flexible rich media services. It supports the typical lecture capture and video management phases: preparation, scheduling, capture, media processing, content distribution and usage. The picture below depicts the supported phases. These phases are also major differentiators in the system architecture. Additional information is available in [2].

Opencast Matterhorn phases of lecture recording

The core components are build upon a service-based architecture leveraging Java OSGI technology, which provides a standardized, component oriented environment for service coupling and cooperation [1], [2]. System administrators, lecturers or students do not need to handle Java objects, interfaces or service endpoints directly but can create and interact with system components by using fully customizable workflows (XML descriptions) for media recordings, encoding, handling and/or content distribution. Matterhorn comes with tools for administrators that allow to plan and schedule upcoming lectures as well as monitor different processes across distributed Matterhorn nodes. Feeds (ATOM/RSS) as well as a browsable media gallery enable users to customize and adapt content created with the system to local needs. Finally content player components (engage applications) are provided which allow to synchronize different streams (e.g. talking head and screen capture video or audience cameras), access content directly based on media search queries and use the media analysis capabilities for navigation purposes (e.g. slide changes).

Opencast Matterhorn welcome page

The project website provides a guided feature and demo tour, cookbook and installation sections about how to use and operate the system on a daily basis, as well as links to the community issue/feature tracker system Opencast issues.

Matterhorn2GO: Mobile connection to online learning repositories

Matterhorn 2GO is a cross-platform open source mobile front-end for recorded video material produced with the Opencast Matterhorn system [3]. It can be used on Android or iOS smartphones and tablets. The core technology used is Apache Flex. It has been released in the Google Play Store as well as in Apple’s iTunes store. Further information is available on the project website: Download / Install Matterhorn2GO. It brings lecture recordings and additional material created by Opencast Matterhorn to mobile learners worldwide. Matterhorn 2GO comes with powerful in content search capabilities based on Matterhorn’s core multimedia analysis features and is able to synchronize different content streams in one single view to fully follow the activity in the classroom. This allows users, for example, to access a certain aspect directly within numerous recorded series and/or single episodes. A user can choose between three different video view state options: a parallel view (professor and corresponding synchronized slides or screen recording), just the professor or only the lecture slides. Since most Opencast Matterhorn service endpoints offer a streaming option, a user can directly navigate to any time position in the video without waiting until it has been fully downloaded. The browse media page lists recordings from available Matterhorn installations. Students can simply follow their own eLectures but also get information about what else is being taught or presented at the local university or abroad at other learning institutes.

Stay informed and join the discussion

As an international open source community, Opencast has established several mechanisms for individuals to communicate, support each other and collaborate.

More information about communication within the community can be found at


Matterhorn and the Opencast Community can offer research initiatives a prolific environment with a multitude of partners and a technology developed to be adapted, amended or supplemented by new features, be that voice recognition, face detection, support for mobile devices, semantic connections in learning objects or (big) data mining. The final objective is to ensure that research initiatives will consider Matterhorn a focal point for their activities. The governance model of the Opencast Community and the Opencast Matterhorn project can be found online at

Acknowledgments and License

The authors would like to thank the Opencast Community and the Opencast Matterhorn developers for their support and creativity as well as the continuous efforts to create tools that can be used across campuses and learning institutes worldwide. Matterhorn is published under the Educational Community License (ECL) 2.0.


[1] Christopher A. Brooks, Markus Ketterl, Adam Hochman, Josh Holtzman, Judy Stern, Tobias Wunden, Kristofor Amundson, Greg Logan, Kenneth Lui, Adam McKenzie, Denis Meyer, Markus Moormann, Matjaz Rihtar, Ruediger Rolf, Nejc Skofic, Micah Sutton, Ruben Perez Vazquez, und Benjamin Wulff. OpenCast Matterhorn 1.1: reaching new heights. ACM Multimedia, pages 703-706. ACM, (2011)

[2] Ketterl, M, Schulte, O. A., Hochman, A. Opencast Matterhorn: A Community-Driven Open Source Software Project for Producing, Managing, and Distributing Academic Video. International Journal of Interactive Technology and Smart Education, Emerald Group Publishing Limited, Vol. 7 Iss: 3, pp.168 – 180, 2010.

[3] Markus Ketterl, Leonid Oldenburger, Oliver Vornberger. Opencast 2 Go: Mobile Connections to Multimedia Learning Repositories. In proceeding of: IADIS International Conference Mobile Learning, pages 181-188, Berlin, Germany, 2012