JPEG Column: 105th JPEG Meeting in Berlin, Germany

JPEG Trust becomes an International Standard

The 105th JPEG meeting was held in Berlin, Germany, from October 6 to 11, 2024. During this JPEG meeting, JPEG Trust was sent for publication as an International Standard. This is a major achievement in providing standardized tools to effectively fight against the proliferation of fake media and disinformation while restoring confidence in multimedia information.

In addition, the JPEG Committee also sent for publication the JPEG Pleno Holography standard, which is the first standardized solution for holographic content coding. This type of content might be represented by huge amounts of information, and efficient compression is needed to enable reliable and effective applications.

The following sections summarize the main highlights of the 105th JPEG meeting:

105th JPEG Meeting, held in Berlin, Germany.
  • JPEG Trust
  • JPEG Pleno

JPEG Trust

In an important milestone, the first part of JPEG Trust, the “Core Foundation” (ISO/IEC IS 21617-1) International Standard, has now been approved by the international ISO committee and is being published. This standard addresses the problem of dis- and misinformation and provides leadership in global interoperable media asset authenticity. JPEG Trust defines a framework for establishing trust in digital media.

Users of social media are challenged to assess the trustworthiness of the media they encounter, and agencies that depend on the authenticity of media assets must be concerned with mistaking fake media for real, with risks of real-world consequences. JPEG Trust provides a proactive approach to trust management. It is built upon and extends the Coalition for Content Provenance and Authenticity (C2PA) engine. The first part defines the JPEG Trust framework and provides building blocks for more elaborate use cases via its three main pillars:

  • Annotating provenance – linking media assets together with their associated provenance annotations in a tamper-evident manner
  • Extracting and evaluating Trust Indicators – specifying how to extract an extensive array of Trust Indicators from any given media asset for evaluation
  • Handling privacy and security concerns – providing protection for sensitive information based on the provision of JPEG Privacy and Security (ISO/IEC 19566-4)

Trust in digital media is context-dependent. JPEG Trust does NOT explicitly define trustworthiness but rather provides a framework and tools for proactively establishing trust in accordance with the trust conditions needed. The JPEG Trust framework outlined in the core foundation enables individuals, organizations, and governing institutions to identify specific conditions for trustworthiness, expressed in Trust Profiles, to evaluate relevant Trust Indicators according to the requirements for their specific usage scenarios. The resulting evaluation can be expressed in a Trust Report to make the information easily accessed and understood by end users.

JPEG Trust has an ambitious schedule of future work, including evolving and extending the core foundation into related topics of media tokenization and media asset watermarking, and assembling a library of common Trust Profile requirements.

JPEG Pleno

The JPEG Pleno Holography activity reached a major milestone with the FDIS of ISO/IEC 21794-5 being accepted and the International Standard being under preparation by ISO. This is a major achievement for this activity and is the result of the dedicated work of the JPEG Committee over a number of years. The JPEG Pleno Holography activity continues with the development of a White Paper on JPEG Pleno Holography to be released at the 106th JPEG meeting and planning for a workshop for future standardization on holography intended to be conducted in November or December 2024.

The JPEG Pleno Light Field activity focused on the 2nd edition of ISO/IEC 21794-2 (“Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding”) which will integrate AMD1 of ISO/IEC 21794-2 (“Profiles and levels for JPEG Pleno Light Field Coding”) and include the specification of the third coding mode entitled Slanted 4D Transform Mode and the associated profile.

Following the Call for Contributions on Subjective Light Field Quality Assessment and as a result of the collaborative process, the JPEG Pleno Light Field activity is also preparing standardization activities for subjective and objective quality assessment of light fields. At the 105th JPEG meeting, collaborative subjective results on light field quality assessments were presented and discussed. The results will guide the subjective quality assessment standardization process, which has issued its fourth Working Draft.

The JPEG Pleno Point Cloud activity released a White Paper on JPEG Pleno Learning-based Point Cloud Coding. This document outlines the context, motivation, and scope of the upcoming Part 6 of ISO/IEC 21794 scheduled for publication in early 2025, as well as giving the basis of the new technology, use cases, performance, and future activities. This activity focuses on a new exploration study into the latent space optimization for the current Verification Model.


JPEG AI

At the 105th meeting, the JPEG AI activity primarily concentrated on advancing Part 2 (Profiling), Part 3 (Reference Software), and Part 4 (Conformance). Part 4 moved forward to the Committee Draft (CD) stage, while Parts 2 and 3 are anticipated to reach DIS at the next meeting. The conformance CD outlines different types of conformance: 1) strict conformance for decoded residuals; 2) soft conformance for decoded feature tensors, allowing minor deviations; and 3) soft conformance for decoded images, ensuring that image quality remains comparable to or better than the quality offered by the reference model. For decoded images, two types of soft conformance were introduced based on device capabilities. Discussions on Part 2 examined memory requirements for various JPEG AI VM codec configurations. Additionally, three core experiments were established during this meeting, focusing on JPEG AI subjective assessment, integerization, and the study of profiles and levels.
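The three conformance types listed above can be illustrated with a small sketch. This is not the normative procedure from the conformance CD: the function names, the tolerance, the quality scores, and the margin are all hypothetical, chosen only to show the difference between an exact check and a tolerance-based one.

```python
def strict_conformance(decoded_residuals, reference_residuals):
    """Strict check (assumed here to mean an exact match of every value)."""
    return list(decoded_residuals) == list(reference_residuals)


def soft_tensor_conformance(decoded, reference, max_abs_error):
    """Soft check: every value may deviate by at most max_abs_error.

    max_abs_error is a hypothetical tolerance, not a value from the standard.
    """
    if len(decoded) != len(reference):
        return False
    return all(abs(d - r) <= max_abs_error for d, r in zip(decoded, reference))


def soft_image_conformance(decoded_quality, reference_quality, margin=0.0):
    """Soft check on image quality: comparable to or better than the reference.

    Both quality scores and the margin are hypothetical placeholders for
    whatever quality measure a conformance test would actually use.
    """
    return decoded_quality >= reference_quality - margin


residuals_ok = strict_conformance([3, -1, 0], [3, -1, 0])
tensor_ok = soft_tensor_conformance([1.0, 2.25], [1.0, 2.0], max_abs_error=0.5)
image_ok = soft_image_conformance(decoded_quality=41.0, reference_quality=41.5, margin=1.0)
```

In this sketch, a decoder passes the strict check only when its residuals are identical to the reference, while the two soft checks accept small deviations.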


JPEG XE

The JPEG XE activity is currently focused on handling the open Final Call for Proposals on lossless coding of events. This activity revolves around a new and emerging image modality created by event-based visual sensors. JPEG XE is about the creation and development of a standard to represent events in an efficient way, allowing interoperability between sensing, storage, and processing, targeting machine vision and other relevant applications. The Final Call for Proposals ends in March 2025 and aims to receive relevant coding tools that will serve as a basis for a JPEG XE standard. The JPEG Committee is also preparing discussions on lossy coding of events and how to evaluate such lossy coding technologies in the future. The JPEG Committee invites those interested in the JPEG XE activity to consider the public documents. The Ad-hoc Group on event-based vision was re-established to continue work towards the 106th JPEG meeting. To stay informed about this activity, please join the event-based vision Ad-hoc Group mailing list.


JPEG AIC

Part 3 of JPEG AIC (AIC-3) advanced to the Committee Draft (CD) stage during the 105th JPEG meeting. AIC-3 defines a methodology for subjective assessment of the visual quality of high-fidelity images. Based on two test protocols, Boosted Triplet Comparisons and Plain Triplet Comparisons, it reconstructs a fine-grained quality scale in JND (Just Noticeable Difference) units. According to the defined work plan, JPEG AIC-3 is expected to advance to the Draft International Standard (DIS) stage by April 2025 and become an International Standard (IS) by October 2026. During this meeting, the JPEG Committee also focused on the upcoming Part 4 of JPEG AIC, which addresses the objective quality assessment of high-fidelity images.


JPEG DNA

JPEG DNA is an initiative aimed at developing a standard capable of representing bi-level, continuous-tone grey-scale, continuous-tone colour, or multichannel digital samples in a format using nucleotide sequences to support DNA storage. The JPEG DNA Verification Model (VM) was created during the 102nd JPEG meeting based on the performance assessments and descriptive analyses of the solutions submitted to the Call for Proposals, published at the 99th JPEG meeting. Several core experiments are continuously conducted to validate and improve this VM, leading to the creation of the first Working Draft of JPEG DNA during the 103rd JPEG meeting. At the 105th JPEG meeting, the committee created a New Work Item Proposal for JPEG DNA to make it an official ISO work item. The proposal stated that JPEG DNA would be a multi-part standard: Part 1 (Core Coding System), Part 2 (Profiles and Levels), Part 3 (Reference Software), and Part 4 (Conformance). The committee aims to reach the IS stage for Part 1 by April 2026.


JPEG XS

The third editions of JPEG XS Part 1 – Core coding tools, Part 2 – Profiles and buffer models, and Part 3 – Transport and container formats have now been published and made available by ISO. The JPEG Committee is finalizing the third edition of the remaining two parts of the JPEG XS standards suite, Part 4 – Conformance testing and Part 5 – Reference software. The FDIS of Part 4 was issued for ballot at this meeting. Part 5 is still at the Committee Draft stage, and the DIS is planned for the next JPEG meeting. The reference software has a feature-complete decoder fully compliant with the 3rd edition. Work on the TDC profile encoder is ongoing.


JPEG XL

A third edition of JPEG XL Part 2 (File Format) will be initiated to add an embedding syntax for ISO 21496 gain maps, which can be used to represent a custom local tone mapping and have artistic control over the SDR rendition of an HDR image coded with JPEG XL. Work on hardware and software implementations continues, including a new Rust implementation.

Final Quote

“In its commitment to tackle dis/misinformation and to manage provenance, authorship, and ownership of multimedia information, the JPEG Committee has reached a major milestone by publishing the first ever ISO/IEC endorsed specifications for bringing back trust into multimedia. The committee will continue developing additional enhancements to JPEG Trust. New parts of the standard are under development to define a set of additional tools to further enhance interoperable trust mechanisms in multimedia.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Report from ACM SIG Heritage Workshop

“What does history mean to computer scientists?” – that was the first question that popped into my mind when I was about to attend the ACM Heritage Workshop in Minneapolis a few months back. And needless to say, the follow-up question was “what does history mean for a multimedia systems researcher?” As a young graduate student, I had the joy of my life when my first research paper on multimedia authoring (a hot topic in those days) was accepted for presentation at the first ACM Multimedia in 1993, a conference that was held alongside SIGGRAPH. Thinking about that, it gives multimedia systems researchers about 25 to 30 years of history. But what a flow of topics this area has seen: from authoring to streaming to content-based retrieval to social media and human-centered multimedia, the research area has been as hot as ever. So, is it the history of research topics or the researchers or both? Then, how about the venues hosting these conferences, the networking events, or the grueling TPC meetings that prepped the conference actions?

Figure 1. Picture from the venue

With only questions and no clear answers, I decided to attend the workshop with an open mind. Most SIGs (Special Interest Groups) in ACM had representation at this workshop. The workshop itself was organized by the ACM History Committee. I understood this committee, apart from the workshop, organizes several efforts to track, record, and preserve computing efforts across disciplines. This includes identifying distinguished persons (who are retired but made significant contributions to computing), coming up with a customized questionnaire for the persons, training the interviewer, recording the conversations, curating them, archiving, and providing them for public consumption. Efforts at most SIGs were mostly based on the website. They were talking about how they try to preserve conference materials such as paper proceedings (when only paper proceedings were published), meeting notes, pictures, and videos. For instance, some SIGs were talking about how they tracked and preserved ACM’s approval letter for the SIG! 

It was very interesting – and touching – to see some attendees (senior professors) coming to the workshop with boxes of materials – papers, reports, books, etc. They were either downsizing their offices or clearing them out, and did not feel like throwing the material into recycling bins! These materials were given to ACM and the Babbage Institute (at the University of Minnesota, Minneapolis) for possible curation and storage.

Figure 2. Galleries with collected material

ACM History Committee members talked about how they can fund (at a small level) projects that target specific activities for preserving and archiving computing events and materials. The ACM History Committee agreed that ACM should take more responsibility for providing technical support for web hosting, although it is not clear whether anything tangible will result.

Over the two days at the workshop, I was getting answers to my questions: history can mean pictures and videos taken at earlier MM conferences, TPC meetings, and SIGMM-sponsored events and retreats. It can perhaps mean the earlier paper proceedings, which contain some information not found in the corresponding ACM Digital Library versions. It can also mean interviews with the research leaders who built and promoted SIGMM.

It was clear that history meant different things to different SIGs, and as the SIGMM community, we would have to arrive at our own interpretation, and then collect and preserve accordingly. And that made me understand the most obvious and perhaps the most important thing: today’s events become tomorrow’s history! No-brainer, right? Preserving today’s SIGMM events will give us a richer, more colorful, and more complete SIGMM history for future generations!

For the curious ones:

ACM Heritage Workshop website is at: https://acmsigheritage.dash.umn.ed

Some of the workshop presentation materials are available at:

Socially significant music events

Social media sharing platforms (e.g., YouTube, Flickr, Instagram, and SoundCloud) have revolutionized how users access multimedia content online. Most of these platforms provide a variety of ways for the user to interact with the different types of media: images, video, music. In addition to watching or listening to the media content, users can also engage with content in different ways, e.g., like, share, tag, or comment. Social media sharing platforms have become an important resource for scientific researchers, who aim to develop new indexing and retrieval algorithms that can improve users’ access to multimedia content, thereby enhancing the experience provided by these platforms.

Historically, the multimedia research community has focused on developing multimedia analysis algorithms that combine visual and text modalities. Less visible is research devoted to algorithms that exploit the audio signal as the main modality. Recently, awareness of the importance of audio has experienced a resurgence. Particularly notable is Google’s release of AudioSet, “A large-scale dataset of manually annotated audio events” [7]. In a similar spirit, we have developed the “Socially Significant Music Event” dataset that supports research on music events [3]. The dataset contains Electronic Dance Music (EDM) tracks with a Creative Commons license that have been collected from SoundCloud. Using this dataset, one can build machine learning algorithms to detect specific events in a given music track.

What are socially significant music events? Within a music track, listeners are able to identify certain acoustic patterns as nameable music events.  We call a music event “socially significant” if it is popular in social media circles, implying that it is readily identifiable and an important part of how listeners experience a certain music track or music genre. For example, listeners might talk about these events in their comments, suggesting that these events are important for the listeners (Figure 1).

Traditional music event detection has only tackled low-level events like music onsets [4] or music auto-tagging [8, 10]. In our dataset, we consider events that are at a higher abstraction level than low-level musical onsets. In auto-tagging, descriptive tags are associated with 10-second music segments. These tags generally fall into three categories: musical instruments (guitar, drums, etc.), musical genres (pop, electronic, etc.) and mood-based tags (serene, intense, etc.). These tags differ from the events we detect in this dataset: our events have a particular temporal structure, unlike the categories that are the target of auto-tagging. Additionally, we analyze the entire music track and detect the start points of music events, rather than labelling short segments as auto-tagging does.

There are three music events in our Socially Significant Music Event dataset: Drop, Build, and Break. These events can be considered to form the basic set of events used by EDM producers [1, 2]. They have a certain temporal structure internal to themselves, which can be of varying complexity. Their social significance is visible from the presence of a large number of timed comments related to these events on SoundCloud (Figures 1 and 2). The three events are popular in social media circles, with listeners often mentioning them in comments. Here, we define these events [2]:

  1. Drop: A point in the EDM track, where the full bassline is re-introduced and generally follows a recognizable build section
  2. Build: A section in the EDM track, where the intensity continuously increases and generally climaxes towards a drop
  3. Break: A section in an EDM track with a significantly thinner texture, usually marked by the removal of the bass drum

Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].



SoundCloud is an online music sharing platform that allows users to record, upload, promote and share their self-created music. SoundCloud started out as a platform for amateur musicians, but currently many leading music labels are also represented. One of the interesting features of SoundCloud is that it allows “timed comments” on the music tracks. “Timed comments” are comments, left by listeners, associated with a particular time point in the music track. Our “Socially Significant Music Events” dataset is inspired by the potential usefulness of these timed comments as ground truth for training music event detectors. Figure 2 contains an example of a timed comment: “That intense buildup tho” (timestamp 00:46). We could potentially use this as a training label to detect a build, for example. In a similar way, listeners also mention the other events in their timed comments. So, these timed comments can serve as training labels to build machine learning algorithms to detect events.

Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]


SoundCloud also provides a well-documented API [6] with interfaces to many programming languages: Python, Ruby, JavaScript etc. Through this API, one can download the music tracks (if allowed by the uploader), timed comments and also other metadata related to the track. We used this API to collect our dataset. Via the search functionality we searched for tracks uploaded during the year 2014 with a Creative Commons license, which results in a list of tracks with unique identification numbers. We looked at the timed comments of these tracks for the keywords: drop, break and build. We kept the tracks whose timed comments contained a reference to these keywords and discarded the other tracks.
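The keyword-based filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' collection script: it does not call the SoundCloud API at all, and it assumes the timed comments have already been downloaded into a plain dictionary. The data layout, the function names, and the choice to match simple word variants such as "buildup" are all hypothetical.

```python
import re

# Matches the three keywords named in the text, plus simple variants such as
# "drops" or "buildup" (matching variants is an assumption of this sketch).
KEYWORD_PATTERN = re.compile(r"\b(drop|break|build)\w*", re.IGNORECASE)


def events_mentioned(comment_text):
    """Return the set of event keywords mentioned in one comment."""
    return {m.group(1).lower() for m in KEYWORD_PATTERN.finditer(comment_text)}


def filter_tracks(tracks):
    """Keep tracks with at least one timed comment that mentions a keyword.

    tracks maps a track id to a list of (timestamp in seconds, comment text).
    The result maps each kept track id to sorted (timestamp, keyword) pairs.
    """
    kept = {}
    for track_id, comments in tracks.items():
        labelled = []
        for timestamp_s, text in comments:
            for event in events_mentioned(text):
                labelled.append((timestamp_s, event))
        if labelled:
            kept[track_id] = sorted(labelled)
    return kept


# Hypothetical example data; the first comment is the one quoted in the text.
example_tracks = {
    101: [(46, "That intense buildup tho"), (60, "this DROP is insane")],
    102: [(12, "nice track"), (90, "love it")],
}
filtered = filter_tracks(example_tracks)
```

A simple prefix match like this also accepts unrelated words such as "breakfast", which is one source of the label noise that any use of timed comments has to tolerate.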


The dataset contains 402 music tracks with an average duration of 4.9 minutes. Each track is accompanied by timed comments relating to Drop, Build, and Break. It is also accompanied by ground truth labels that mark the true locations of the three events within the tracks. The labels were created by a team of experts. Unlike many other publicly available music datasets that provide only metadata or short previews of music tracks [9], we provide the entire track for research purposes. The download instructions for the dataset can be found in [3]. All the music tracks in the dataset are distributed under a Creative Commons license. Some statistics of the dataset are provided in Table 1.

Table 1. Statistics of the dataset: Number of events, Number of timed comments

Event Name | Total number of events | Number of events per track | Total number of timed comments | Number of timed comments per track
Drop | 435 | 1.08 | 604 | 1.50
Build | 596 | 1.48 | 609 | 1.51
Break | 372 | 0.92 | 619 | 1.54

The main purpose of the dataset is to support training of detectors for the three events of interest (Drop, Build, and Break) in a given music track. These three events can be considered a case study to prove that it is possible to detect socially significant musical events, opening the way for future work on an extended inventory of events. Additionally, the dataset can be used to understand the properties of timed comments related to music events. Specifically, timed comments can be used to reduce the need for manually acquired ground truth, which is expensive and difficult to obtain.

Timed comments present an interesting research challenge: temporal noise. The timed comments and the actual events do not always coincide. A comment could be at the same position as the actual event, before it, or after it. For example, in the music track below (Figure 3), there is a timed comment about a drop at 00:40, while the actual drop occurs only at 01:00. Because of this noisy nature, we cannot use the timed comments alone as ground truth. We need strategies to handle temporal noise in order to use timed comments for training [1].

Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].
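The 00:40 versus 01:00 example can be expressed as a signed offset between a timed comment and the nearest ground truth event. The sketch below is only an illustration of how such temporal noise might be measured; the helper names are hypothetical, and the second event time (03:00) is made up so that there is more than one candidate event.

```python
def to_seconds(timestamp):
    """Convert an 'mm:ss' timestamp, as shown in timed comments, to seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def nearest_event_offset(comment_time_s, event_times_s):
    """Signed offset in seconds between a comment and the nearest event.

    Negative values mean the comment precedes the event, positive values
    mean it follows the event, and zero means they coincide.
    """
    nearest = min(event_times_s, key=lambda t: abs(comment_time_s - t))
    return comment_time_s - nearest


comment_time = to_seconds("00:40")                      # timed comment about a drop
drop_times = [to_seconds("01:00"), to_seconds("03:00")]  # 03:00 is hypothetical
offset = nearest_event_offset(comment_time, drop_times)
```

Here the offset is negative because the comment appears 20 seconds before the drop it refers to.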


In addition to music event detection, our “Socially Significant Music Event” dataset opens up other possibilities for research. Timed comments have the potential to improve users’ access to music and to support them in discovering new music. Specifically, timed comments mention aspects of music that are difficult to derive from the signal, and may be useful to calculate song-to-song similarity needed to improve music recommendation. The fact that the comments are related to a certain time point is important because it allows us to derive continuous information over time from a music track. Timed comments are potentially very helpful for supporting listeners in finding specific points of interest within a track, or deciding whether they want to listen to a track, since they allow users to jump-in and listen to specific moments, without listening to the track end-to-end.

State of the art

The detection of music events requires training classifiers that are able to generalize over the variability in the audio signal patterns corresponding to events. In Figure 4, we see that the build-drop combination has a characteristic pattern in the spectral representation of the music signal. The build is a sweep-like structure and is followed by the drop, which we indicate by a red vertical line. More details about the state-of-the-art features useful for music event detection and the strategies to filter the noisy timed comments can be found in our publication [1].

Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.
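To make the sweep-like structure concrete, the sketch below synthesizes a rising frequency sweep as a stand-in for a build and tracks the strongest frequency bin in successive frames. It is a toy illustration only: it uses a naive DFT from the standard library instead of a real audio toolkit, the sample rate, frame size, hop, and sweep range are hypothetical, and it does not reproduce the features used in [1].

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz (hypothetical)
FRAME_SIZE = 256    # samples per analysis frame
HOP = 800           # samples between frame starts


def rising_sweep(f_start, f_end, duration_s):
    """Synthesize a linear frequency sweep from f_start to f_end (Hz)."""
    n = int(SAMPLE_RATE * duration_s)
    signal = []
    for i in range(n):
        t = i / SAMPLE_RATE
        phase = 2 * math.pi * (f_start * t + (f_end - f_start) * t * t / (2 * duration_s))
        signal.append(math.sin(phase))
    return signal


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the lower half of the frequency bins."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def dominant_bins(signal):
    """Index of the strongest frequency bin in each analysis frame."""
    peaks = []
    for start in range(0, len(signal) - FRAME_SIZE + 1, HOP):
        spectrum = magnitude_spectrum(signal[start:start + FRAME_SIZE])
        peaks.append(spectrum.index(max(spectrum)))
    return peaks


# One second sweeping from 200 Hz to 2000 Hz; the peak bin rises frame by frame.
peaks = dominant_bins(rising_sweep(200.0, 2000.0, 1.0))
```

Plotting the full spectrum of each frame over time, rather than only the peak bin, gives the kind of spectral picture in which a build appears as a rising diagonal.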


The evaluation metric used to measure the performance of a music event detector should be chosen according to the user scenario for that detector. For example, if the music event detector is used for non-linear access (i.e., creating jump-in points along the playbar), it is important that the detected time point of the event falls before, rather than after, the actual event. In this case, we recommend using the “event anticipation distance” (ea_dist) as a metric. The ea_dist is the amount of time by which the predicted event time point precedes the actual event time point; it represents the time the user would have to wait to hear the actual event. More details about ea_dist can be found in our paper [1].

In [1], we report the implementation of a baseline music event detector that uses only timed comments as training labels. This detector attains an ea_dist of 18 seconds for a drop. We point out that, from the user's point of view, this level of performance could already lead to quite useful jump-in points. Note that the typical length of a build-drop combination is between 15 and 20 seconds. If the user is positioned 18 seconds before the drop, the build will already have started and the user knows that a drop is coming. Using an optimized combination of timed comments and manually acquired ground truth labels, we are able to achieve an ea_dist of 6 seconds.
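The event anticipation distance can be sketched in a few lines. This follows only the verbal description given here, not the exact formulation in [1]: returning None for a prediction that falls after the actual event and averaging over the remaining predictions are both assumptions of this sketch, and the time points in the example are hypothetical.

```python
def ea_dist(predicted_s, actual_s):
    """Seconds by which a predicted time point precedes the actual event.

    Returns None when the prediction falls after the event, because the
    listener would then have missed it (an assumption of this sketch).
    """
    if predicted_s > actual_s:
        return None
    return actual_s - predicted_s


def mean_ea_dist(pairs):
    """Average ea_dist over (predicted, actual) pairs, ignoring late predictions."""
    distances = [d for d in (ea_dist(p, a) for p, a in pairs) if d is not None]
    return sum(distances) / len(distances) if distances else None


# Hypothetical time points in seconds: a drop at 60 s predicted at 42 s means
# that the listener waits 18 seconds before hearing it.
wait = ea_dist(42, 60)
average_wait = mean_ea_dist([(42, 60), (54, 60), (70, 60)])
```

In the example, the third prediction lands after the drop and is therefore left out of the average.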


Timed comments, on their own, can be used as training labels to train detectors for socially significant events. A detector trained on timed comments performs reasonably well in applications like non-linear access, where the listener wants to jump through different events in the music track without listening to it in its entirety. We hope that the dataset will encourage researchers to explore the usefulness of timed comments for all media. Additionally, we would like to point out that our work has demonstrated that the impact of temporal noise can be overcome and that the contribution of timed comments to video event detection is worth investigating further.


Should you have any inquiries or questions about the dataset, do not hesitate to contact us via email at:


[1] K. Yadati, M. Larson, C. Liem and A. Hanjalic, “Detecting Socially Significant Music Events using Temporally Noisy Labels,” in IEEE Transactions on Multimedia, 2018.

[2] M. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music, ser. Profiles in Popular Music. Indiana University Press, 2006.






[8] H. Y. Lo, J. C. Wang, H. M. Wang and S. D. Lin, “Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval,” in IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 518-529, June 2011.





Dear Member of the SIGMM Community, welcome to the third issue of the SIGMM Records in 2013.

On the verge of ACM Multimedia 2013, we can already present the recipients of SIGMM’s yearly awards: the SIGMM Technical Achievement Award, the SIGMM Best Ph.D. Thesis Award, the TOMCCAP Nicolas D. Georganas Best Paper Award, and the TOMCCAP Best Associate Editor Award.

The TOMCCAP Special Issue on the 20th anniversary of ACM Multimedia is out in October, and you can read both the announcement, and find each of the contributions directly through the TOMCCAP Issue 9(1S) table of contents.

That SIGMM has established a strong foothold in the scientific community can also be seen by the Chinese Computing Federation’s rankings of SIGMM’s venues. Read the article to get even more motivation for submitting your papers to SIGMM’s conferences and journal.

We are also reporting from SLAM, the international workshop on Speech, Language and Audio in Multimedia. Not a SIGMM event, but certainly of interest to many SIGMMers who care about audio technology.

You will also find two PhD thesis summaries and, last but most certainly not least, pointers to the latest issues of TOMCCAP and MMSJ, as well as several job announcements.

We hope that you enjoy this issue of the Records.

The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz

ACM TOMCCAP Special on 20th Anniversary of ACM Multimedia

ACM Transactions on Multimedia Computing, Communications and Applications

Special Issue: 20th Anniversary of ACM International Conference on Multimedia

A journey ‘Back to the Future’

The ACM Special Interest Group on Multimedia (SIGMM) celebrated the 20th anniversary of the establishment of its premier conference, the ACM International Conference on Multimedia (ACM Multimedia) in 2012. To commemorate this milestone, leading researchers organized and extensively contributed to the 20th anniversary celebration.

from left to right: Malcolm Slaney, Ramesh Jain, Dick Bulterman, Klara Nahrstedt, Larry Rowe and Ralf Steinmetz

The celebratory events started at ACM Multimedia 2012 in Nara, Japan, with the “Coulda, Woulda, Shoulda: 20 Years of Multimedia Opportunities” panel, organized by Klara Nahrstedt (center) and Malcolm Slaney (far left). At this panel, pioneers of the field, Ramesh Jain, Dick Bulterman, Larry Rowe and Ralf Steinmetz (shown from left to right in the image), reflected on innovations as well as on successful and missed opportunities in the multimedia research area.

This special issue of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) is the final event to celebrate achievements and opportunities in a variety of multimedia areas. Through peer-reviewed long articles and invited short contributions, readers will get a sense of the past, present and future of multimedia research. The evolution ranges over traditional topics such as video streaming, multimedia synchronization, multimedia authoring, content analysis, and multimedia retrieval to newer topics including music retrieval, geo-tagging context in a worldwide community of photos, multi-modal human-computer interaction and experiential media systems.

Recent years have seen an explosion of research and technologies in multimedia, beyond individual algorithms, protocols and small scale systems. The scale of multimedia innovations and deployment has exploded with unimaginable speed. Hence, as the multimedia area is growing fast, penetrating every facet of our society, this special issue fills an important need to look back at the multimedia research achievements over the past 20 years, celebrates the exciting potential, and explores new goals of the multimedia research community.

Visit to view in the DL.

TOMCCAP Nicolas D. Georganas Best Paper Award 2013

ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award

The 2013 ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award is awarded to the paper “Exploring interest correlation for peer-to-peer socialized video sharing” (TOMCCAP vol. 8, issue 1) by Xu Cheng and Jiangchuan Liu.

The purpose of the named award is to recognize the most significant work in ACM TOMCCAP in a given calendar year. The whole readership of ACM TOMCCAP was invited to nominate articles which were published in Volume 8 (2012). Based on the nominations the winner has been chosen by the TOMCCAP Editorial Board. The main assessment criteria have been quality, novelty, timeliness, clarity of presentation, in addition to relevance to multimedia computing, communications, and applications.

In this paper the authors examine architectures for large-scale video streaming systems exploiting social relations. To achieve this objective, a large study of YouTube traffic was conducted and a cluster analysis performed on the resulting data. Based on the observations made, a new approach for video pre-fetching based on social relations has been developed. This important work bridges the gap between social media and multimedia streaming and hence combines two extremely relevant research topics.

The award honors the founding Editor-in-Chief of TOMCCAP, Nicolas D. Georganas, for his outstanding contributions to the field of multimedia computing and his significant contributions to ACM. He profoundly influenced multimedia research and the whole multimedia community.

The Editor-in-Chief, Prof. Dr.-Ing. Ralf Steinmetz, and the Editorial Board of ACM TOMCCAP cordially congratulate the winners. The award will be presented to the authors on October 24th, 2013 at ACM Multimedia 2013 in Barcelona, Spain, and includes travel expenses for the winning authors.

Bio of Awardees:

Xu Cheng is currently a research engineer at BroadbandTV, Vancouver, Canada. He received a Bachelor of Science from Peking University, China, in 2006, a Master of Science from Simon Fraser University, Canada, in 2008, and a PhD from Simon Fraser University, Canada, in 2012. His research interests include multimedia networks, social networks and overlay networks.


Jiangchuan Liu is an Associate Professor in the School of Computing Science, Simon Fraser University, British Columbia, Canada. He received a BEng (cum laude) from Tsinghua University in 1999 and a PhD from HKUST in 2003, both in computer science. He is a co-recipient of the ACM Multimedia’2012 Best Paper Award, the IEEE Globecom’2011 Best Paper Award, the IEEE Communications Society Best Paper Award on Multimedia Communications 2009, as well as the IEEE IWQoS’08 and IEEE/ACM IWQoS’2012 Best Student Paper Awards. His research interests are in networking and multimedia. He has served on the editorial boards of IEEE Transactions on Multimedia, IEEE Communications Surveys and Tutorials, and IEEE Internet of Things Journal. He will be TPC co-chair for IEEE/ACM IWQoS’2014 in Hong Kong.


TOMCCAP Best Associate Editor Award 2013

ACM Transactions on Multimedia Computing, Communications and Applications Best Associate Editor Award

Annually, the Editor-in-Chief of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) honors one member of the Editorial Board with the TOMCCAP Associate Editor of the Year Award. The award recognizes excellent work for ACM TOMCCAP, and hence for the whole multimedia community, in the previous year. The criteria for the award are (1) the number of submissions processed on time, (2) performance during the reviewing process and (3) careful interaction with reviewers in order to broaden awareness of the journal.

Based on the criteria mentioned above, the ACM Transactions on Multimedia Computing, Communications and Applications Associate Editor of the Year Award 2013 goes to Mohan S. Kankanhalli from the National University of Singapore. The Editor-in-Chief, Prof. Dr.-Ing. Ralf Steinmetz, cordially congratulates Mohan.

Bio of Awardee:

Mohan Kankanhalli is a Professor at the Department of Computer Science of the National University of Singapore. He is also the Associate Provost for Graduate Education at NUS. Before that, he was the Vice-Dean for Academic Affairs and Graduate Studies at the NUS School of Computing during 2008-2010 and Vice-Dean for Research during 2001-2007. Mohan obtained his BTech from IIT Kharagpur and MS & PhD from the Rensselaer Polytechnic Institute.

His current research interests are in Multimedia Systems (content processing, retrieval) and Multimedia Security (surveillance and privacy). He has been awarded a S$10M grant by Singapore’s National Research Foundation to set up the Centre for “Sensor-enhanced Social Media”.

Mohan has been actively involved in the organization of many major conferences in the area of Multimedia. He was the Director of Conferences for ACM SIG Multimedia from 2009 to 2013. He is on the editorial boards of several journals including the ACM Transactions on Multimedia Computing, Communications, and Applications, Springer Multimedia Systems Journal, Pattern Recognition Journal and Multimedia Tools & Applications Journal.

SIGMM Outstanding Technical Contributions Award 2013

SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications

The 2013 winner of the prestigious ACM Special Interest Group on Multimedia (SIGMM) award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. He currently heads the Distributed and Interactive Systems research group at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. He is also a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. His recent research concerns socially-aware multimedia, interactive television, and media analysis.

The ACM SIGMM Technical Achievement award is given in recognition of outstanding contributions over a researcher’s career. Prof. Dick Bulterman has been selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing from research through standardization to entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group as well as his dedicated involvement in the SIGMM research community for many years. The SIGMM award will be presented at the ACM International Conference on Multimedia 2013 that will be held Oct 21–25 2013 in Barcelona, Spain.

Dick Bulterman has been a long-time intellectual leader in the area of temporal modeling and support for complex multimedia systems. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spinoff company Oratrix Development BV, where he served as CEO to bring this software to a wide audience.

Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage development and use of time-based multimedia content. His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard.

Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 and also at the EUROITV conference.

In summary, Prof. Bulterman’s accomplishments include pioneering and extraordinary contributions in multimedia authoring, media annotation, and social sharing and outstanding service to the computing community.

SIGMM PhD Thesis Award 2013

SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications 2013

The SIGMM Ph.D. Thesis Award Committee is pleased to recommend this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications to Dr. Xirong Li.

The committee considered Dr. Li’s dissertation, titled “Content-based visual search learned from social media”, worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh insights into the possibilities for realizing image retrieval solutions in the presence of the vast information that can be drawn from social media.

The committee considered the main innovation of Dr. Li’s work to be in the development of the theory and algorithms providing answers to the following challenging research questions:

  1. what determines the relevance of a social tag with respect to an image,
  2. how to fuse tag relevance estimators,
  3. which social images are the informative negative examples for concept learning,
  4. how to exploit socially tagged images for visual search and
  5. how to personalize automatic image tagging with respect to a user’s preferences.

The significance of the developed theory and algorithms lies in their power to enable effective and efficient use of information collected from social media, both to enhance the datasets used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user.

Bio of Awardee:

Dr. Xirong Li received the B.Sc. and M.Sc. degrees from Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. The title of his thesis is “Content-based visual search learned from social media”. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interests are image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, a Best Paper nomination at the ACM International Conference on Multimedia Retrieval 2012, the Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.

ACM SIGMM/TOMCCAP 2013 Award Announcements

The ACM Special Interest Group on Multimedia (SIGMM) and ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) are pleased to announce the following awards for 2013, recognizing outstanding achievements and services in the multimedia community.

SIGMM Technical Achievement Award:
Dr. Dick Bulterman

SIGMM Best Ph.D. Thesis Award:
Dr. Xirong Li

TOMCCAP Nicolas D. Georganas Best Paper Award:
“Exploring interest correlation for peer-to-peer socialized video sharing” by Xu Cheng and Jiangchuan Liu, published in TOMCCAP vol. 8, Issue 1, 2012.

TOMCCAP Best Associate Editor Award:
Dr. Mohan S. Kankanhalli

Additional information on each award and recipient will be released in separate announcements. Awards will be presented at the annual SIGMM event, the ACM Multimedia Conference, held in Barcelona, Catalunya, Spain during October 23-25 2013.

ACM is the professional society of computer scientists, and SIGMM is the special interest group on multimedia. TOMCCAP is the flagship journal publication of SIGMM.