Dear Member of the SIGMM Community, welcome to the last issue of the SIGMM Records in 2013.

The editors of the Records have taken to a classical reporting approach, and you can read here the first of a series of interviews. In this issue, Cynthia Liem is interviewed by Mathias Lux and talks about the Phenicx project.

We have received a report from the first international competition on game-based learning applications, and also our regular column reporting from the 106th MPEG meeting that was held in Geneva. Our open source column in this issue presents libraries and tools for threading and visualizing a large video collection, a set of tools that will be useful for many in the community. Beyond that, you can also read about two PhD theses.

Among the announcements are several open positions and a long list of calls for papers. The long list of calls is the result of a policy change in SIGMM. After several years in which our two public mailing lists, sigmm@pi4.informatik.tu-mannheim.de and mm-interest@acm.org, were flooded with calls for papers, the board and online services editors have decided to change the posting policy. Both lists are now closed for public submissions of calls for papers and participation. Instead, calls must be submitted through the SIGMM Records web page, and will be distributed on the mailing lists in a weekly digest. We hope that the members of the SIG appreciate this service, and that those of us who have filtered emails for years feel that this is a more appropriate policy.

With this news, we invite you to read on in this issue of the Records.

The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz

VIREO-VH: Libraries and Tools for Threading and Visualizing a Large Video Collection


“Video Hyperlinking” refers to the creation of links connecting videos that share near-duplicate segments. Like hyperlinks in HTML documents, the video links help users navigate videos of similar content, and facilitate the mining of iconic clips (or visual memes) spread among videos. Figure 1 shows some examples of iconic clips, which can be leveraged for linking videos; the results are potentially useful for multimedia tasks such as video search, mining and analytics.

VIREO-VH [1] is open source software developed by the VIREO research team. The software provides end-to-end support for the creation of hyperlinks, including libraries and tools for threading and visualizing videos in a large collection. The major software components are: near-duplicate keyframe retrieval, partial near-duplicate localization with time alignment, and galaxy visualization. These functionalities are mostly implemented based on state-of-the-art technologies, and each of them is developed as an independent tool with flexibility in mind, such that users can substitute any of the components with their own implementation. The earlier versions of the software are LIP-VIREO and SOTU, which have been downloaded more than 3,500 times. VIREO-VH has been used internally by VIREO since 2007, and has evolved over the years based on the experience of developing various multimedia applications, such as news event evolution analysis, novelty reranking, multimedia-based question answering [2], cross-media hyperlinking [3], and social video monitoring.

Figure 1: Examples of iconic clips.


The software components include video pre-processing, bag-of-words based inverted file indexing for scalable near-duplicate keyframe search, localization of partial near-duplicate segments [4], and galaxy visualization of a video collection, as shown in Figure 2. The open source release includes over 400 methods with 22,000 lines of code.

The workflow of the open source is as follows. Given a collection of videos, the visual content is indexed based on a bag-of-words (BoW) representation. Near-duplicate keyframes are retrieved and then temporally aligned in a pairwise manner among videos. Segments of a video that are near-duplicates of segments in other videos in the collection are then hyperlinked, with the start and end times of the segments explicitly logged. The end product is a galaxy browser, where the videos are visualized as a galaxy of clusters on a Web browser, with each cluster being a group of videos that are hyperlinked directly or indirectly through transitivity propagation. User-friendly interaction is provided such that end users can zoom in and out, so they can glance at or take a close inspection of the video relationships.

Figure 2: Overview of VIREO-VH software architecture.


VIREO-VH can either be used as an end-to-end system that takes a video collection as input and outputs visual hyperlinks, or as independent functions for the development of different applications.

For content owners interested in the content-wise analysis of a video collection, VIREO-VH can be used as an end-to-end system by simply providing the location of a video collection and the output paths (Figure 3). The resulting output can then be viewed with the provided interactive interface, which gives a glimpse of the video relationships in the collection.

Figure 3: Interface for end-to-end processing of video collection.

VIREO-VH also provides libraries to grant researchers programmatic access. The libraries consist of various classes (e.g., Vocab, HE, Index, SearchEngine and CNetwork) providing different functions for vocabulary and Hamming signature training [5], keyframe indexing, near-duplicate keyframe searching, and video alignment. Users can refer to the manual for details. Furthermore, the components of VIREO-VH are developed independently to provide flexibility, so users can substitute any of the components with their own implementation. This capability is particularly useful for benchmarking the users’ own choice of algorithms. As an example, users can choose their own visual vocabulary and Hamming medians, but use the open source for building the index and retrieving near-duplicate keyframes. The following few lines of code implement a typical image retrieval system:

#include "Vocab_Gen.h"
#include "Index.h"
#include "HE.h"
#include "SearchEngine.h"
...
// train visual vocabulary using descriptors in folder "dir_desc"
// here we choose to train a hierarchical vocabulary with 1M leaf nodes
// (3 layers, 100 nodes / layer)
Vocab_Gen::genVoc("dir_desc", 100, 3);

// load pre-trained vocabulary from disk
Vocab* voc = new Vocab(100, 3, 128);
voc->loadFromDisk("vk_words/");

// Hamming Embedding training for the vocabulary
HE* he = new HE(32, 128, p_mat, 1000000, 12);
he->train(voc, "matrix", 8);

// index the descriptors with inverted file
Index::indexFiles(voc, he, "dir_desc/", ".feat", "out_dir/", 8);

// load index and conduct online search for images in "query_desc"
SearchEngine* engine = new SearchEngine(voc, he);
engine->loadIndexes("out_dir/");
engine->search_dir("query_desc", "result_file", 100);
...


We use a video collection consisting of 220 videos (around 31 hours) as an example. The collection was crawled from YouTube using the keyword “economic collapse”. Using our open source and default parameter settings, a total of 35 partial near-duplicate (ND) segments were located, resulting in 10 visual clusters (or snippets). Figure 4 shows two examples of the snippets. Based on our experiments, the precision of ND localization is as high as 0.95 and the recall is 0.66. Table 1 lists the running time for each step. The experiment was conducted on a PC with a dual-core 3.16 GHz CPU and 3 GB of RAM. In total, creating a galaxy view for 31.2 hours of video (more than 4,000 keyframes) can be completed within 2.5 hours using our open source. More details can be found in [6].

Pre-processing 75 minutes
ND Retrieval 59 minutes
Partial ND localization 8 minutes
Galaxy Visualization 55 seconds

Table 1: The running time for processing 31.2 hours of videos.

Figure 4: Examples of visual snippets mined from a collection of 220 videos. For ease of visualization, each cluster is tagged with a timeline description from Wikipedia using the techniques developed in [3].


The open source software described in this article was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 119610).


[1] http://vireo.cs.cityu.edu.hk/VIREO-VH/

[2] W. Zhang, L. Pang and C. W. Ngo. Snap-and-Ask: Answering Multimodal Question by Naming Visual Instance. ACM Multimedia, Nara, Japan, October 2012. Demo

[3] S. Tan, C. W. Ngo, H. K. Tan and L. Pang. Cross Media Hyperlinking for Search Topic Browsing. ACM Multimedia, Arizona, USA, November 2011. Demo

[4] H. K. Tan, C. W. Ngo, R. Hong and T. S. Chua. Scalable Detection of Partial Near-Duplicate Videos by Visual-Temporal Consistency. In ACM Multimedia, pages 145-154, 2009.

[5] H. Jegou, M. Douze, and C. Schmid. Improving Bag-of-Features for Large Scale Image Search. IJCV, 87(3):192-212, May 2010.

[6] L. Pang, W. Zhang and C. W. Ngo. Video Hyperlinking: Libraries and Tools for Threading and Visualizing a Large Video Collection. ACM Multimedia, Nara, Japan, Oct 2012.

MPEG Column: 106th MPEG Meeting

— original posts here and here by Christian Timmerer (AAU/bitmovin) on the Multimedia Communication blog and the bitmovin techblog

National Day Present by Austrian Airlines on my way to Geneva.

November 2013, Geneva, Switzerland. Here comes a news report from the 106th MPEG meeting in Geneva, Switzerland, which actually took place during the Austrian national day; Austrian Airlines had a nice present (see picture) for their guests.

The official press release can be found here.

In this meeting, ISO/IEC 23008-1 (i.e., MPEG-H Part 1) MPEG Media Transport (MMT) reached Final Draft International Standard (FDIS). Looking back to when this project was started with the aim to supersede the widely adopted MPEG-2 Transport Stream (M2TS) — which receives the Technology & Engineering Emmy® Award in January 2014 — the following features are now supported within MMT:

  • Self-contained multiplexing structure
  • Strict timing model
  • Reference buffer model
  • Flexible splicing of content
  • Name based access of data
  • AL-FEC (application layer forward error correction)
  • Multiple Qualities of Service within one packet flow

ITU-T Tower Building, Geneva.

Interestingly, MMT supports the carriage of MPEG-DASH segments and MPD for uni-directional environments such as broadcasting.

MPEG-H now comprises three major technologies: part 1 is about transport (MMT; at FDIS stage), part 2 deals with video coding (HEVC; at FDIS stage), and part 3 will be about audio coding, specifically 3D audio coding (still at an early stage; technical responses have been evaluated only recently). The other parts of MPEG-H relate to these three technologies.

In terms of research, it is important to determine the efficiency, overhead, and — in general — the use cases enabled by MMT. From a business point of view, it will be interesting to see whether MMT will actually supersede M2TS and how it will evolve compared or in relation to DASH.

On another topic, MPEG-7 visual reached an important milestone at this meeting. The Committee Draft (CD) for Part 13 (ISO/IEC 15938-13), entitled Compact Descriptors for Visual Search (CDVS), has been approved. This image description enables comparing and finding pictures that show similar content, e.g., the same object from different viewpoints. CDVS mainly deals with images, but MPEG has also started work on compact descriptors for video search.

The CDVS standard truly helps to reduce the semantic gap. However, research in this domain is already well developed and it is unclear whether the research community will adopt CDVS, specifically because interest in MPEG-7 descriptors has decreased lately. On the other hand, such a standard will enable interoperability among vendors and services (e.g., Google Goggles), reducing the number of proprietary formats and, hopefully, APIs. However, the most important question is whether CDVS will be adopted by industry (and research).

Finally, what about MPEG-DASH?

The 2nd edition of part 1 (MPD and segment formats) and the 1st edition of part 2 (conformance and reference software) were finalized at the 105th MPEG meeting (FDIS). Additionally, we had a public/open workshop at that meeting on session management and control for DASH. This and other new topics are further developed within so-called core experiments (CEs), of which I’d like to give a brief overview:

  • Server and Network assisted DASH Operation (SAND), the immediate result of the workshop at the 105th MPEG meeting, introduces a DASH-Aware Network Element (DANE) as depicted in the figure below. Parameters from this element — as well as from others — may support the DASH client in its operations, i.e., in downloading the “best” segments for its context. SAND parameters typically come from the network itself, whereas Parameters for enhancing delivery by DANE (PED) come from the content author.

Baseline Architecture for Server and Network assisted DASH.

  • Spatial Relationship Description is about delivering (tiled) ultra-high-resolution content towards heterogeneous clients while at the same time providing interactivity (e.g., zooming). Thus, not only the temporal but also spatial relationship of representations needs to be described.

Other CEs are related to signaling intended source and display characteristics, controlling the DASH client behavior, and DASH client authentication and content access authorization.

The outcome of these CEs is potentially interesting for future amendments. One CE, which was about including quality information within DASH, e.g., as part of an additional track within ISOBMFF or as an additional representation within the MPD, was closed at this meeting. Clients may access this quality information in advance to assist the adaptation logic in making informed decisions about which segment to download next.

Interested people may join the MPEG-DASH Ad-hoc Group (AhG; http://lists.uni-klu.ac.at/mailman/listinfo/dash) where these topics (and others) are discussed.

Finally, additional information and outcomes from the last meeting are accessible via http://mpeg.chiariglione.org/meetings/106, including documents that are publicly available (some may have an editing period).

An Interview with Cynthia Liem: The PHENICX Project

The PHENICX project is supported by the European Commission, FP7 (Seventh Framework Programme, STREP project, ICT-2011.8.2 ICT for access to cultural resources, grant agreement No 601166). The project has been running for a year now, and Cynthia Liem has been involved since the initial planning and proposal writing. Currently, she is a work package leader in the project, and part of the overall project coordination team in the role of dissemination coordinator.

Partners in the project are Universitat Pompeu Fabra, Barcelona, ES; Delft University of Technology, NL; Johannes Kepler University Linz, AT; Austrian Research Institute for Artificial Intelligence, Vienna, AT; Video Dock BV, Amsterdam, NL; Royal Concertgebouw Orchestra, Amsterdam, NL; and Escola Superior de Música de Catalunya, Barcelona, ES. More information on the project can be found at http://phenicx.upf.edu

Q: What is the goal and scope of the PHENICX project?

PHENICX is about music and concert experiences. We want to use multimedia technologies to enhance the experience of a concert and make it more interesting and accessible for broad audiences. In this, we mainly focus on classical music.

Basically, the project has two sides. First of all, there is a content analysis side, in which we analyze concert performance data in a broad sense. We do not only look at an audio stream, but also e.g. at videos, gesture information, and social commenting information from people who attended concerts. Besides multiple modalities, we also try to take into account multiple perspectives: think of multiple cameras and microphones registering an orchestra, but also of multiple types of people (a conductor, orchestra musicians, or just your personal friends) speaking about a concert. Finally, a concert really is a multilayered phenomenon, with lots of things going on at the same time in which one could be potentially interested. The particular notes being played from a score are part of a larger structural whole; and while 130 individuals may be playing at the same time in a symphony orchestra, they form sub-groups which all have a particular role in the musical narrative and instrumental mix.

On the other side, it’s about the experience, about getting and keeping users from different consumer groups engaged. This is not just targeted at live attendance scenarios in the concert hall, but also for scenarios in which people attend concerts off-site through a live stream, or want to relive a concert on-demand after its performance. While for the content analysis part, we mostly focus on signal-oriented research topics, for this experience part we strongly look into topics such as recommendation, visualization and interaction. For example, how can you make the whole multilayered aspect of music more tangible? This can for example be done with automated score-following, through more simplified visualizations, but also by contrasting a particular performance against other existing performances of the same piece.

Our mission to broaden audiences for the classical music genre can be seen as a way of cultural heritage preservation using ICT. In the end, we really hope to see digital technology affecting culture consumption in a positive way. As a concrete example, our partners Video Dock and the Royal Concertgebouw Orchestra are already working on a commercial tablet app called RCO Editions. The technologies we work on in PHENICX can really help in making the production of the app more scalable, expanding its feature set, and optimizing its user experience.

Q: Are there special organizational challenges?

In the project there are seven partners, four of them academic. The three non-academic partners are major players in different parts of the music stakeholder spectrum, but have less experience with academic projects – especially the Royal Concertgebouw Orchestra, which is involved for the first time in a large academic technology project. So in communicating and working with each other, there is always some translation needed between partners with different backgrounds and levels of project experience. This is a very interesting organizational challenge in which we always try to find an optimal balance between different stakeholders.

Another potential challenge is language. Especially in the first year, we have been running a lot of focus groups to validate use cases. But while we have grown completely accustomed to using English in our daily academic work, as soon as you wish to interact with realistic local potential users of your technology in all project partner countries, you can’t take for granted these users have full expressive command of English (the younger generation typically does, but you don’t want to only reach them). And music is a very attractive topic for general public dissemination, since it’s a concrete part in many people’s lives; but once again, to make full use of this opportunity, you may have to look beyond English. So we’re having some dedicated organizational activities on that, working to also hold some studies and get some publicity material available in local languages.

Q: What is your personal relation to the project?

Well, I wrote a significant part of the proposal, so in that sense I have a considerable relation to the project … but, at least as importantly, my musician background creates a strong personal link to this project. Having degrees in computer science and classical piano performance, I’m really interested in the interface between these two: working with music and digital data, using data technologies to improve on what you can learn and do with music – and PHENICX definitely is about this. So I’m very actively trying to use this double background for the project. It is especially useful for communication and dissemination: I can talk to people on the more musical side, many of whom do not have extensive technical backgrounds, but also to those on the more technical side, who do not always have an extensive music background.

Funnily enough, the project also affected views I had from my own musicianship. The Royal Concertgebouw Orchestra is one of the most famous orchestras in the world. If you’re a music student in Holland, you can be backstage and engage with people from many national orchestras, but only the lucky few will manage to get even in the neighborhood of this particular orchestra. Now that I have this connecting role in the project between academics and music stakeholders, and the orchestra has become a project partner, I suddenly find myself in their office quite often. I would never have expected that!

Besides that, with our work on user requirements and focus groups, I really managed to be in contact with actual audiences. In our focus groups, we asked people why they liked going to concert performances, and we frequently heard them respond that they valued feeling isolated from external influences in the concert hall, letting themselves be swept away by the music. Probably because a concert hall is a bit of a working space for me, I had totally forgotten this escapism aspect of concert attendance. So here, the project really made me aware of my own professional biases and ‘put me back on the ground’.

Q: Would you ever write an EU project proposal again?

Well, yes, I would, definitely with a consortium and project as inspiring as PHENICX. But I hope that next time I’ll have a bit more time than the three weeks in which we raced to complete the PHENICX proposal. 😉

Curriculum Vitae:

Cynthia Liem obtained her BSc and MSc degrees in Media and Knowledge Engineering (Computer Science) with honors at Delft University of Technology, The Netherlands, and currently is a PhD student at the Multimedia Information Retrieval Lab of the same university, working under the supervision of Prof. Alan Hanjalic. Besides, she holds Bachelor and Master of Music degrees in classical piano performance from the Royal Conservatoire in The Hague. Her research interests are strongly motivated by her background in both engineering and music and concentrate around multimedia content analysis for the music information retrieval domain.

From this background, she has been very active in getting music on the multimedia research agenda, particularly at the ACM Multimedia Conference, where she first initiated and served as the main organizer of the ACM MIRUM workshop (2011, 2012). This led to her becoming a co-chair of a dedicated ‘Music & Audio’ area at ACM MM 2013, and currently the more broadened ‘Music, Speech, and Audio Processing in Multimedia’ area for ACM MM 2014. She also was a main initiator of the EU FP7 PHENICX project (2013 – 2016), in which she now serves as work package leader and dissemination coordinator.

She is the recipient of several international scholarships and awards, including the Lucent Global Science Scholarship in 2005, the Google Anita Borg Scholarship in 2008, the Google European Doctoral Fellowship in Multimedia in 2010 (which partially supports her PhD research work), and the UfD Best PhD Candidate Award at Delft University of Technology in 2012. Besides her ongoing academic and musical activities, Cynthia has interned at Bell Labs Europe Netherlands, Philips Research, Google UK and Google Research, Mountain View, USA.

The interviewer, Mathias Lux, is an Associate Professor at the Institute for Information Technology (ITEC) at Klagenfurt University, where he has been since 2006. He received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology. Before joining Klagenfurt University, he worked in industry on web-based applications, as a junior researcher at a research center for knowledge-based applications, and as a research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology. In research, he is working on user intentions in multimedia retrieval and production, visual information retrieval, and serious games. In his scientific career he has (co-)authored more than 60 scientific publications, has served in multiple program committees and as a reviewer for international conferences, journals, and magazines, and has organized several scientific events. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRE for visual information retrieval.

A report from the First International Competition on Game-Based Learning Applications

The European Conference on Game Based Learning is an academic conference that has been held annually at various European universities since 2006. For the first time this year, the Programme Committee, together with Segan (Serious Games Network, https://www.facebook.com/groups/segan), decided to launch a competition at the conference for the best educational game. The aims of the competition were:

  • To provide an opportunity for educational game designers and creators to participate in the conference and demonstrate their game design and development skills in an international competition;
  • To provide an opportunity for GBL creators to peer-assess and peer-evaluate their games;
  • To provide ECGBL attendees with engaging and best-practice games that showcase exemplary applications of GBL.

In the first instance, prospective participants were asked to submit a 1000-word extended abstract giving an overview of the game itself, how it is positioned in terms of related work, and what its unique educational contribution is. We received 56 applications, which were reduced to 22 finalists who were invited to the conference to present their games. Four judges, in two teams, assessed the games based on a comprehensive set of criteria, including sections on learning outcomes, usability and socio-cultural aspects. A shortlist of 6 games was then revisited by all the judges during an open demonstration session in which conference participants were also welcome to participate. First, second and third place awards were given, and two Highly Commended certificates were presented. The top three games were quite different in terms of target audience and format.

Third place

In third place was an app-based early learning game called Lipa Eggs developed by Ian Hook and Roman Hodek from Lipa Learning in the Czech Republic. This game was designed to help pre-school children with colour mixing and recognition and was delivered via a tablet. The gameplay takes the form of a graduated learning system which first allows children to develop the skills to play the game and then develops the learning process to encourage players to find new solutions. More information about the game can be found at http://www.lipalearning.com/game/lipa-eggs

Second place

In second place was a non-digital game called ChemNerd developed by Jakob Thomas Holm from Sterskov Efterskole (a secondary school in Denmark specializing in game-based learning). This game was designed to help teach the periodic table to secondary school students and was presented as a multi-level card game. The game uses competition and face-to-face interaction between students to teach them complicated chemical theory over six phases, beginning with a memory challenge and ending with a practical experiment. A video illustrating the game can be seen at http://youtu.be/XD6BPrJyxlc


First place

The winner was a computer game called Mystery of Taiga River developed by Sasha Barab and Anna Arici from Arizona State University in the USA. The game aimed to teach ecological studies to secondary school students and was presented as a game-based immersive world where students become investigative reporters who have to investigate, learn and apply scientific concepts to solve applied problems in a virtual park and restore the health of the dying fish. A video of the game can be seen at http://gamesandimpact.org/taiga_river

Both competitors and conference participants said that they had enjoyed the opportunity to see applied educational game development from around the world, and the intention is to make this an annual competition associated with the European Conference on Game-Based Learning (ECGBL). The 2014 conference will be held in Berlin on 30-31 October, and the call for games is now open. Details can be found here: http://academic-conferences.org/ecgbl/ecgbl2014/ecgbl14-call-papers.htm

ACM MM 2013 awards

Best Paper Award
Luoqi Liu, Hui Xu, Junliang Xing, Si Liu, Xi Zhou and Shuicheng Yan
Wow! You Are So Beautiful Today!


Best Student Paper Award
Hanwang Zhang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao and
Tat-Seng Chua
Attribute-augmented Semantic Hierarchy: Towards Bridging Semantic Gap and
Intention Gap in Image Retrieval


Grand Challenge 1st Place Award
Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff-Abegauz and
Shih-Fu Chang
Structured Exploration of Who, What, When, and Where in Heterogenous
Multimedia News Sources


Grand Challenge 2nd Place Award
Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu,
Shih-Fu Chang, Mubarak Shah
Towards a Comprehensive Computational Model for Aesthetic Assessment of


Grand Challenge 3rd Place Award
Shannon Chen, Penye Xia, and Klara Nahrstedt
Activity-Aware Adaptive Compression: A Morphing-Based Frame Synthesis
Application in 3DTI


Grand Challenge Multimodal Award
Chun-Che Wu, Kuan-Yu Chu, Yin-Hsi Kuo, Yan-Ying Chen, Wen-Yu Lee, Winston
H. Hsu
Search-Based Relevance Association with Auxiliary Contextual Cues


Best Demo Award
Duong-Trung-Dung Nguyen, Mukesh Saini, Vu-Thanh Nguyen, Wei Tsang Ooi
Jiku director: An online mobile video mashup system


Best Doctoral Symposium Paper
Jules Francoise
Gesture-Sound Mapping by demonstration in Interactive Music Systems


Best Open Source Software Award
Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera,
Oscar Mayor, Gerard Roma, Justin Salamon, Jose Zapata, Xavier Serra
ESSENTIA: An Audio Analysis Library for Music Information Retrieval