The 16th ACM Multimedia Systems Conference (with the associated workshops NOSSDAV 2025 and MMVE 2025) was held from March 31st to April 4th, 2025, in Stellenbosch, South Africa. By choosing this location, the steering committee marked a milestone for SIGMM: MMSys became the very first SIGMM conference to take place on the African continent. This aligns perfectly with SIGMM's ongoing mission to build an inclusive and globally representative multimedia systems community.
The MMSys conference brings together researchers in multimedia systems to showcase and exchange their cutting-edge research findings. Once again, there were technical talks spanning various multimedia domains and inspiring keynote presentations.
Recognising the importance of in‑person exchange—especially for early‑career researchers—SIGMM once again funded Student Travel Grants. This support enabled a group of doctoral students to attend the conference, present their work and start building their international peer networks. In this column, the recipients of the travel grants share their experiences at MMSys 2025.
Guodong Chen – PhD student, Northeastern University, USA
What an incredible experience attending ACM MMSys 2025 in South Africa! Huge thanks to SIGMM for the travel grant that made this possible.
It was an honour to present our paper, “TVMC: Time-Varying Mesh Compression Using Volume-Tracked Reference Meshes”, and I’m so happy that it received the Best Reproducible Paper Award!
MMSys is not that huge, but it’s truly great. It’s exceptionally well-organized, and what impressed me the most was the openness and enthusiasm of the community. Everyone is eager to communicate, exchange ideas, and dive deep into cutting-edge multimedia systems research. I made many new friends and discovered exciting overlaps between my research and the work of other groups. I believe many collaborations are on the way and that, to me, is the true mark of a successful conference.
Besides the conference, South Africa was amazing: don't miss the wonderful wines of Stellenbosch or the unforgettable experience of a safari tour.
Lea Brzica – PhD student, University of Zagreb, Croatia
Attending MMSys’25 in Stellenbosch, South Africa was an unforgettable and inspiring experience. As a new PhD student and early-career researcher, this was not only my first in-person conference but also my first time presenting. I was honoured to share my work, “Analysis of User Experience and Task Performance in a Multi-User Cross-Reality Virtual Object Manipulation Task,” and excited to see genuine interest from other attendees. Beyond the workshop and technical sessions, I thoroughly enjoyed the keynotes and panel discussions. The poster sessions and demos were great opportunities to explore new ideas and engage with people from all over the world. One of the most meaningful aspects of the conference was the opportunity to meet fellow PhD students and researchers face-to-face. The coffee breaks and social activities created a welcoming atmosphere that made it easy to form new connections.
I am truly grateful to SIGMM for supporting my participation. The travel grant helped alleviate the financial burden of international travel and made this experience possible. I’m already hoping for the chance to come back and be part of it all over again!
My time at MMSys 2025 was an incredibly rewarding experience. It was great meeting so many interesting and passionate people in the field, and the reception was both enthusiastic and exceptionally well organized. I want to sincerely thank SIGMM for the travel grant, as their support made it possible for me to attend and present my work. South Africa was an amazing destination, and the entire experience was both professionally and personally unforgettable. MMSys was also the perfect environment for networking, offering countless opportunities to connect with researchers and industry experts. It was truly exciting to see so much interest in my work and to engage in meaningful conversations with others in the multimedia systems community.
JPEG assesses responses to its Call for Proposals on Lossless Coding of Visual Events
The 107th JPEG meeting was held in Brussels, Belgium, from April 12 to 18, 2025. During this meeting, the JPEG Committee assessed the responses to its call for proposals on JPEG XE, an International Standard for lossless coding of visual events. JPEG XE is being developed under the auspices of three major standardisation organisations: ISO, IEC, and ITU. It will be the first codec developed by the JPEG committee targeting lossless representation and coding of visual events.
The JPEG Committee is also working on various standardisation projects, such as JPEG AI, which uses learning technology to achieve high compression, JPEG Trust, which sets standards to combat fake media and misinformation while rebuilding trust in multimedia, and JPEG DNA, which represents digital images using DNA sequences for long-term storage.
The following sections summarise the main highlights of the 107th JPEG meeting:
JPEG XE
JPEG AI
JPEG Trust
JPEG AIC
JPEG Pleno
JPEG DNA
JPEG XS
JPEG RF
JPEG XE
This initiative focuses on a new imaging modality produced by event-based visual sensors. This effort aims to establish a standard that efficiently represents and codes events, thereby enhancing interoperability in sensing, storage, and processing for machine vision and related applications.
As a response to the JPEG XE Final Call for Proposals on lossless coding of events, the JPEG Committee received five innovative proposals for consideration. Their evaluation indicated that two among them meet the stringent requirements of the constrained case, where resources, power, and complexity are severely limited. The remaining three proposals can cater to the unconstrained case. During the 107th JPEG meeting, the JPEG Committee launched a series of Core Experiments to define a path forward based on the received proposals as a starting point for the development of the JPEG XE standard.
To streamline the standardisation process, the JPEG Committee will proceed with the JPEG XE initiative in three distinct phases. Phase 1 will concentrate on lossless coding for the constrained case, while Phase 2 will address the unconstrained case. Both phases will commence simultaneously, although Phase 1 will follow a faster timeline to enable timely publication of the first edition of the standard. The JPEG Committee recognises the urgent industry demand for a standardised solution for the constrained case and aims to produce a Committee Draft as early as July 2025. The third phase will focus on lossy compression of event sequences; discussions and preparations for this phase will be initiated soon.
In a significant collaborative effort between ISO/IEC JTC 1/SC 29/WG1 and ITU-T SG21, the JPEG Committee will proceed to specify a joint JPEG XE standard. This partnership will ensure that JPEG XE becomes a shared standard under ISO, IEC, and ITU-T, reflecting their mutual commitment to developing standards for event-based systems.
Additionally, the JPEG Committee is actively exploring lossy coding of visual events and investigating future evaluation methods for such advanced technologies. Stakeholders interested in JPEG XE are encouraged to access the public documents available at jpeg.org. Moreover, a joint Ad-hoc Group on event-based vision has been formed between ITU-T Q7/21 and ISO/IEC JTC1 SC29/WG1, paving the way for continued collaboration leading up to the 108th JPEG meeting.
JPEG AI
At the 107th JPEG meeting, JPEG AI discussions focused on conformance (JPEG AI Part 4), which has now advanced to the Draft International Standard (DIS) stage. The specification defines three conformance points, namely the decoded residual tensor, the decoded latent space tensor (also referred to as feature space), and the decoded image. Strict conformance for the residual tensor is evaluated immediately after entropy decoding, while soft conformance for the latent space tensor is assessed after tensor decoding. Decoded image conformance is measured after converting the image to the output picture format, but before any post-processing filters are applied. For the decoded image, two types have been defined: conformance Type A, which implies low tolerance, and conformance Type B, which allows moderate tolerance.
During the 107th JPEG meeting, the results of several subjective quality assessment experiments were also presented and discussed, using different methodologies and test conditions, from low to very high qualities, and including both SDR and HDR images. These evaluations have shown that JPEG AI is highly competitive and, in many cases, outperforms existing state-of-the-art codecs such as VVC Intra, AVIF, and JPEG XL. A demonstration of a JPEG AI encoder running on a Huawei Mate50 Pro smartphone with a Qualcomm Snapdragon 8+ Gen1 chipset was also presented. This implementation supports tiling, high-resolution (4K) images, and the base profile at level 20. Finally, the implementation status of all mandatory and desirable JPEG AI requirements was discussed, assessing whether each requirement had been fully met, partially addressed, or remained unaddressed. This helped to clarify the current maturity of the standard and identify areas for further refinement.
JPEG Trust
Building on the publication of JPEG Trust (ISO/IEC 21617) Part 1 – Core Foundation in January 2025, the JPEG Committee approved a Draft International Standard (DIS) for a 2nd edition of Part 1 – Core Foundation during the 107th JPEG meeting. This Part 1 – Core Foundation 2nd edition incorporates the signalling of identity and intellectual property rights to address three particular challenges:
achieving transparency through the signalling of content provenance,
identifying content that has been generated either by humans, machines or AI systems, and
enabling interoperability, for example, by standardising machine-readable terms of use of intellectual property, especially AI-related rights reservations.
Additionally, the JPEG Committee is currently developing Part 2 – Trust Profiles Catalogue. Part 2 provides a catalogue of trust profile snippets that can be used either on their own or in combination for the purpose of constructing trust profiles, which can then be used for assessing the trustworthiness of media assets in given usage scenarios. The Trust Profiles Catalogue also defines a collection of conformance points, which enables interoperability across usage scenarios through the use of associated trust profiles.
The Committee continues to develop JPEG Trust Part 3 – Media asset watermarking to build out additional requirements for identified use cases, including the emerging need to identify AIGC content.
Finally, during the 107th meeting, the JPEG Committee initiated Part 4 – Reference software, which will provide reference implementations of JPEG Trust to which implementers can refer when developing trust solutions based on the JPEG Trust framework.
JPEG AIC
The JPEG AIC Part 3 standard (ISO/IEC CD 29170-3) has received a revised title: “Information technology — JPEG AIC Assessment of image coding — Part 3: Subjective quality assessment of high-fidelity images”. At the 107th JPEG meeting, the results of the last Core Experiments for the standard and the comments on its Committee Draft were addressed. The draft text was thoroughly revised and clarified, and has now advanced to the Draft International Standard (DIS) stage.
Furthermore, Part 4 of JPEG AIC addresses objective quality metrics, likewise for high-fidelity images. At the 107th JPEG meeting, the technical details regarding anchor metrics, as well as the testing and evaluation of proposed methods, were discussed and finalised. The results have been compiled in the document “Common Test Conditions on Objective Image Quality Assessment”, available on the JPEG website. Moreover, the corresponding Final Call for Proposals on Objective Image Quality Assessment (AIC-4) has been issued; proposals are expected at the end of summer 2025. The first Working Draft for Objective Image Quality Assessment (AIC-4) is planned for April 2026.
JPEG Pleno
The JPEG Pleno Light Field activity discussed the Disposition of Comments Report (DoCR) for the submitted Committee Draft (CD) of the 2nd edition of ISO/IEC 21794-2 (“Plenoptic image coding system (JPEG Pleno) Part 2: Light field coding”). This 2nd edition integrates AMD1 of ISO/IEC 21794-2 (“Profiles and levels for JPEG Pleno Light Field Coding”) and includes the specification of a third coding mode, entitled Slanted 4D Transform Mode, and its associated profile. It is expected that this new edition will advance to the Draft International Standard (DIS) stage at the 108th JPEG meeting.
Software tools have been created and tested for inclusion as Common Test Condition Tools in a reference software implementation of the standardised technologies within the JPEG Pleno framework, including JPEG Pleno Part 2 (ISO/IEC 21794-2).
In the framework of the ongoing standardisation effort on quality assessment methodologies for light fields, significant progress was achieved during the 107th JPEG meeting. The JPEG Committee finalised the Committee Draft (CD) of the forthcoming standard ISO/IEC 21794-7 entitled JPEG Pleno Quality Assessment – Light Fields, representing an important step toward the establishment of reliable tools for evaluating the perceptual quality of light fields. This CD incorporates recent refinements to the subjective light field assessment framework and integrates insights from the latest core experiments.
The Committee also approved the Final Call for Proposals (CfP) on Objective Metrics for JPEG Pleno Quality Assessment – Light Fields. This initiative invites proposals of novel objective metrics capable of accurately predicting perceived quality of compressed light field content. The detailed submission timeline and required proposal components are outlined in the released final CfP document. To support this process, updated versions of the Use Cases and Requirements (v6.0) and Common Test Conditions (v2.0) related to this CfP were reviewed and made available. Moreover, several task forces have been established to address key proposal elements, including dataset preparation, codec configuration, objective metric evaluation, and the subjective experiments.
At this meeting, ISO/IEC 21794-6 (“Plenoptic image coding system (JPEG Pleno) Part 6: Learning-based point cloud coding”) progressed to the balloting of the Final Draft International Standard (FDIS) stage. Balloting will end on the 12th of June 2025 with the publication of the International Standard expected for August 2025.
The JPEG Committee held a workshop on Future Challenges in Compression of Holograms for XR Applications on April 16th, covering major applications from holographic cameras to holographic displays. A second workshop, on Future Challenges in Compression of Holograms for Metrology Applications, is planned for July.
JPEG DNA
The JPEG Committee continues to develop JPEG DNA, an ambitious initiative to standardize the representation of digital images using DNA sequences for long-term storage. Following a Call for Proposals launched at its 99th JPEG meeting, a Verification Model was established during the 102nd JPEG meeting, then refined through core experiments that led to the first Working Draft at the 103rd JPEG meeting.
New JPEG DNA logo.
At its 105th JPEG meeting, JPEG DNA was officially approved as a new ISO/IEC project (ISO/IEC 25508), structured into four parts: Core Coding System, Profiles and Levels, Reference Software, and Conformance. The Committee Draft (CD) of Part 1 was produced at the 106th JPEG meeting.
During the 107th JPEG meeting, the JPEG Committee reviewed the comments received on the CD of JPEG DNA standard and prepared a Disposition of Comments Report (DoCR). The goal remains to reach International Standard (IS) status for Part 1 by April 2026.
On this occasion, the official JPEG DNA logo was also unveiled, marking a new milestone in the visibility and identity of the project.
JPEG XS
The development of the third edition of the JPEG XS standard is nearing its final stages, marking significant progress for the standardisation of high-performance video coding. Notably, Part 4, focusing on conformance testing, has been officially accepted by ISO and IEC for publication. Meanwhile, Part 5, which provides the reference software, is presently at the Draft International Standard (DIS) ballot stage.
In a move that underscores the commitment to accessibility and innovation in media technology, both Part 4 and Part 5 will be made publicly available as free standards. This decision is expected to facilitate widespread adoption and integration of JPEG XS in relevant industries and applications.
Looking to the future, the JPEG Committee is exploring enhancements to the JPEG XS standard, particularly in supporting a master-proxy stream feature. This feature enables a high-fidelity master video stream to be accompanied by a lower-resolution proxy stream, ensuring minimal overhead. Such functionalities are crucial in optimising broadcast and content production workflows.
JPEG RF
The JPEG RF activity issued the proceedings of the Joint JPEG/MPEG Workshop on Radiance Fields, which was held on the 31st of January and featured world-renowned speakers discussing Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) from the perspectives of academia, industry, and standardisation groups. Video recordings and all related material were made publicly available on the JPEG website. Moreover, an improved version of the JPEG RF State of the Art and Challenges document was proposed, including an updated review of coding techniques for radiance fields as well as newly identified use cases and requirements. The group also defined an exploration study to investigate protocols for subjective and objective quality assessment, which are considered crucial for advancing this activity towards a coding standard for radiance fields.
Final Quote
“A cost-effective and interoperable event-based vision ecosystem requires an efficient coding standard. The JPEG Committee embraces this new challenge by initiating a new standardisation project to achieve this objective,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.
“What does history mean to computer scientists?” That was the first question that popped into my mind when I was preparing to attend the ACM Heritage Workshop in Minneapolis a few months back. And needless to say, the follow-up question was “what does history mean for a multimedia systems researcher?” As a young graduate student, I had the joy of my life when my first research paper on multimedia authoring (a hot topic in those days) was accepted for presentation at the first ACM Multimedia in 1993, a conference held alongside SIGGRAPH. Counting from there, multimedia systems researchers have about 25 to 30 years of history. But what a flow of topics this area has seen: from authoring to streaming to content-based retrieval to social media and human-centered multimedia, the research area has been as hot as ever. So, is it the history of research topics, of the researchers, or of both? And what about the venues hosting these conferences, the networking events, or the grueling TPC meetings that prepped the conference actions?
Figure 1. Picture from the venue
With only questions and no clear answers, I decided to attend the workshop with an open mind. Most SIGs (Special Interest Groups) in ACM had representation at this workshop, which was organized by the ACM History Committee. I understood that this committee, apart from running the workshop, organizes several efforts to track, record, and preserve computing efforts across disciplines. This includes identifying distinguished persons (who are retired but made significant contributions to computing), preparing a customized questionnaire for each person, training the interviewer, recording the conversations, curating them, archiving them, and making them available for public consumption. Efforts at most SIGs were mostly centered on their websites. They talked about how they try to preserve conference materials such as paper proceedings (from the days when only paper proceedings were published), meeting notes, pictures, and videos. For instance, some SIGs described how they tracked down and preserved ACM's approval letter for the SIG!
It was very interesting – and touching – to see some attendees (senior Professors) coming to the workshop with boxes of materials – papers, reports, books, etc. They were either downsizing their offices or clearing out, and did not feel like throwing the material in recycling bins! These materials were given to ACM and Babbage Institute (at University of Minnesota, Minneapolis) for possible curation and storage.
Figure 2. Galleries with collected material
ACM History Committee members talked about how they can fund (at a small level) projects that target specific activities for preserving and archiving computing events and materials. The committee agreed that ACM should take more responsibility in providing technical support for web hosting; obviously, I am not sure whether anything tangible will result.
Over the two days at the workshop, I was getting answers to my questions: History can mean pictures and videos taken at earlier MM conferences, TPC meetings, SIGMM sponsored events and retreats. Perhaps, the earlier paper proceedings that have some additional information than what is found in the corresponding ACM Digital Library version. Interviews with different research leaders that built and promoted SIGMM.
It was clear that history meant different things to different SIGs, and that as the SIGMM community, we would have to arrive at our own interpretation, then collect and preserve accordingly. And that made me understand the most obvious and perhaps the most important thing: today's events become tomorrow's history! No brainer, right? Preserving today's SIGMM events will give us a richer, more colorful, and more complete SIGMM history for future generations!
Social media sharing platforms (e.g., YouTube, Flickr, Instagram, and SoundCloud) have revolutionized how users access multimedia content online. Most of these platforms provide a variety of ways for the user to interact with the different types of media: images, video, music. In addition to watching or listening to the media content, users can also engage with it in different ways, e.g., like, share, tag, or comment. Social media sharing platforms have become an important resource for scientific researchers, who aim to develop new indexing and retrieval algorithms that can improve users' access to multimedia content, thereby enhancing the experience these platforms provide.
Historically, the multimedia research community has focused on developing multimedia analysis algorithms that combine visual and text modalities. Less visible is research devoted to algorithms that exploit the audio signal as the main modality. Recently, awareness of the importance of audio has experienced a resurgence. Particularly notable is Google's release of AudioSet, “A large-scale dataset of manually annotated audio events” [7]. In a similar spirit, we have developed the “Socially Significant Music Event” dataset that supports research on music events [3]. The dataset contains Electronic Dance Music (EDM) tracks with a Creative Commons license that have been collected from SoundCloud. Using this dataset, one can build machine learning algorithms to detect specific events in a given music track.
What are socially significant music events? Within a music track, listeners are able to identify certain acoustic patterns as nameable music events. We call a music event “socially significant” if it is popular in social media circles, implying that it is readily identifiable and an important part of how listeners experience a certain music track or music genre. For example, listeners might talk about these events in their comments, suggesting that these events are important for the listeners (Figure 1).
Traditional music event detection has only tackled low-level events like music onsets [4] or music auto-tagging [8, 10]. In our dataset, we consider events at a higher abstraction level than low-level musical onsets. In auto-tagging, descriptive tags are associated with 10-second music segments. These tags generally fall into three categories: musical instruments (guitar, drums, etc.), musical genres (pop, electronic, etc.), and mood-based tags (serene, intense, etc.). These types of tags are different from what we detect in this dataset: the events in our dataset have a particular temporal structure, unlike the categories targeted by auto-tagging. Additionally, we analyze the entire music track and detect the start points of music events, rather than labeling short segments as in auto-tagging.
There are three music events in our Socially Significant Music Event dataset: Drop, Build, and Break. These events can be considered to form the basic set of events used by EDM producers [1, 2]. They have a certain temporal structure internal to themselves, which can be of varying complexity. Their social significance is visible from the large number of timed comments related to these events on SoundCloud (Figures 1 and 2), with listeners often mentioning them explicitly in their comments. Here, we define these events [2]:
Drop: A point in the EDM track, where the full bassline is re-introduced and generally follows a recognizable build section
Build: A section in the EDM track, where the intensity continuously increases and generally climaxes towards a drop
Break: A section in an EDM track with a significantly thinner texture, usually marked by the removal of the bass drum
Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].
SoundCloud
SoundCloud is an online music sharing platform that allows users to record, upload, promote and share their self-created music. SoundCloud started out as a platform for amateur musicians, but currently many leading music labels are also represented. One of the interesting features of SoundCloud is that it allows “timed comments” on the music tracks. “Timed comments” are comments, left by listeners, associated with a particular time point in the music track. Our “Socially Significant Music Events” dataset is inspired by the potential usefulness of these timed comments as ground truth for training music event detectors. Figure 2 contains an example of a timed comment: “That intense buildup tho” (timestamp 00:46). We could potentially use this as a training label to detect a build, for example. In a similar way, listeners also mention the other events in their timed comments. So, these timed comments can serve as training labels to build machine learning algorithms to detect events.
Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]
SoundCloud also provides a well-documented API [6] with interfaces to many programming languages: Python, Ruby, JavaScript, etc. Through this API, one can download the music tracks (if allowed by the uploader), the timed comments, and other metadata related to each track. We used this API to collect our dataset. Via the search functionality, we searched for tracks uploaded during the year 2014 with a Creative Commons license, which resulted in a list of tracks with unique identification numbers. We then looked at the timed comments of these tracks for the keywords drop, break, and build, kept the tracks whose timed comments contained a reference to these keywords, and discarded the rest. A sketch of this collection procedure is shown below.
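The following is a minimal, hypothetical sketch of that collection procedure, not the released crawler. It assumes the legacy v1-style SoundCloud HTTP endpoints and parameter names (e.g., license and created_at[from]) as they were documented around the time the dataset was created; the placeholder CLIENT_ID, the keyword list, and the helper names are ours, and the current API (which requires OAuth for many calls) may differ.

```python
# Hypothetical sketch: search SoundCloud for Creative Commons EDM tracks
# uploaded in 2014 and keep those whose timed comments mention "drop",
# "build", or "break". Endpoint paths and parameter names are assumptions
# based on the legacy SoundCloud v1 API.
import requests

API = "https://api.soundcloud.com"
CLIENT_ID = "YOUR_CLIENT_ID"   # placeholder: register an app to obtain one
KEYWORDS = ("drop", "build", "break")

def search_cc_tracks(query, limit=50):
    """Search for Creative Commons tracks uploaded during 2014."""
    params = {
        "client_id": CLIENT_ID,
        "q": query,
        "license": "cc-by",                        # Creative Commons filter (assumed value)
        "created_at[from]": "2014-01-01 00:00:00",
        "created_at[to]": "2014-12-31 23:59:59",
        "limit": limit,
    }
    return requests.get(f"{API}/tracks", params=params).json()

def timed_comments(track_id):
    """Fetch the timed comments of a track; each comment carries a
    `timestamp` in milliseconds and a free-text `body`."""
    return requests.get(
        f"{API}/tracks/{track_id}/comments", params={"client_id": CLIENT_ID}
    ).json()

def mentions_event(comments):
    """True if any timed comment mentions one of the three event keywords."""
    return any(
        kw in (c.get("body") or "").lower() for c in comments for kw in KEYWORDS
    )

if __name__ == "__main__":
    kept = [
        t["id"] for t in search_cc_tracks("EDM")
        if mentions_event(timed_comments(t["id"]))
    ]
    print(f"Kept {len(kept)} candidate tracks")
```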
Dataset
The dataset contains 402 music tracks with an average duration of 4.9 minutes. Each track is accompanied by timed comments relating to Drop, Build, and Break. It is also accompanied by ground truth labels that mark the true locations of the three events within the tracks. The labels were created by a team of experts. Unlike many other publicly available music datasets that provide only metadata or short previews of music tracks [9], we provide the entire track for research purposes. The download instructions for the dataset can be found here: [3]. All the music tracks in the dataset are distributed under the Creative Commons license. Some statistics of the dataset are provided in Table 1.
Table 1. Statistics of the dataset: Number of events, Number of timed comments
Event Name | Total number of events | Number of events per track | Total number of timed comments | Number of timed comments per track
Drop  | 435 | 1.08 | 604 | 1.50
Build | 596 | 1.48 | 609 | 1.51
Break | 372 | 0.92 | 619 | 1.54
The main purpose of the dataset is to support training of detectors for the three events of interest (Drop, Build, and Break) in a given music track. These three events can be considered a case study to prove that it is possible to detect socially significant musical events, opening the way for future work on an extended inventory of events. Additionally, the dataset can be used to understand the properties of timed comments related to music events. Specifically, timed comments can be used to reduce the need for manually acquired ground truth, which is expensive and difficult to obtain.
Timed comments present an interesting research challenge: temporal noise. The timed comments and the actual events do not always coincide. A comment could be placed at the same position as, before, or after the actual event. For example, in the music track below (Figure 3), there is a timed comment about a drop at 00:40, while the actual drop occurs only at 01:00. Because of this noisy nature, we cannot use the timed comments alone as ground truth; we need strategies to handle temporal noise in order to use timed comments for training [1] (a simple illustration is sketched after Figure 3).
Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].
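As a simple, hypothetical illustration of such a strategy (the filtering approaches actually used in [1] are more involved), one could snap each keyword-bearing timed comment to the nearest candidate event boundary, e.g., obtained from a coarse audio segmentation, and discard comments that fall too far from any candidate:

```python
def snap_comments_to_candidates(comment_times, candidate_times, tolerance=30.0):
    """Map noisy comment timestamps (seconds) onto candidate event boundaries.

    Comments farther than `tolerance` seconds from every candidate are treated
    as noise and dropped. The tolerance value here is illustrative only.
    """
    snapped = []
    for t in comment_times:
        nearest = min(candidate_times, key=lambda c: abs(c - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            snapped.append(nearest)
    return snapped

# The drop comment at 00:40 from Figure 3 is snapped to the candidate
# boundary at 01:00, recovering the true event location.
print(snap_comments_to_candidates([40.0], [60.0, 150.0]))   # -> [60.0]
```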
In addition to music event detection, our “Socially Significant Music Event” dataset opens up other possibilities for research. Timed comments have the potential to improve users' access to music and to support them in discovering new music. Specifically, timed comments mention aspects of music that are difficult to derive from the signal, and may be useful for calculating the song-to-song similarity needed to improve music recommendation. The fact that the comments are related to a certain time point is important because it allows us to derive continuous information over time from a music track. Timed comments are potentially very helpful for supporting listeners in finding specific points of interest within a track, or in deciding whether they want to listen to a track, since they allow users to jump in and listen to specific moments without playing the track end-to-end.
State of the art
The detection of music events requires training classifiers that are able to generalize over the variability in the audio signal patterns corresponding to events. In Figure 4, we see that the build-drop combination has a characteristic pattern in the spectral representation of the music signal: the build is a sweep-like structure and is followed by the drop, which we indicate with a red vertical line (a sketch for producing such a spectral representation is given after Figure 4). More details about the state-of-the-art features useful for music event detection and the strategies to filter the noisy timed comments can be found in our publication [1].
Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.
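The following is a minimal sketch, not the feature extraction used in [1], showing how a spectral representation similar to Figure 4 can be produced with the librosa library; the file name and the annotated drop time are placeholders.

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a track (placeholder path) and compute a log-mel spectrogram.
y, sr = librosa.load("edm_track.mp3", sr=22050)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Plot the spectrogram and mark the annotated drop time with a red line,
# mirroring the visualization style of Figure 4.
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel")
plt.axvline(x=60.0, color="red", linewidth=2)   # placeholder drop time (seconds)
plt.title("Log-mel spectrogram around a build-drop combination")
plt.tight_layout()
plt.show()
```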
The evaluation metric used to measure the performance of a music event detector should be chosen according to the user scenario for that detector. For example, if the music event detector is used for non-linear access (i.e., creating jump-in points along the playbar), it is important that the detected time point of the event falls before, rather than after, the actual event. In this case, we recommend using the “event anticipation distance” (ea_dist) as a metric. The ea_dist is the amount of time by which the predicted event time point precedes the actual event time point, and represents the time the user would have to wait to listen to the actual event. More details about ea_dist can be found in our paper [1].
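A minimal, hypothetical sketch of this metric follows; it pairs each predicted time point with the first actual event at or after it, whereas the exact matching protocol used in [1] may differ.

```python
def event_anticipation_distance(predicted_times, actual_times):
    """Mean waiting time (seconds) between a predicted event time point and
    the first actual event at or after it. Predictions with no following
    actual event are ignored in this simplified sketch."""
    waits = []
    for p in predicted_times:
        following = [a for a in actual_times if a >= p]
        if following:
            waits.append(min(following) - p)
    return sum(waits) / len(waits) if waits else float("nan")

# A drop predicted at 42 s against an annotated drop at 60 s gives an
# ea_dist of 18 s: a listener who jumps in at the predicted point waits
# 18 seconds for the actual drop.
print(event_anticipation_distance([42.0], [60.0]))   # -> 18.0
```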
In [1], we report the implementation of a baseline music event detector that uses only timed comments as training labels. This detector attains an ea_dist of 18 seconds for a drop. We point out that, from the user's point of view, this level of performance could already lead to quite useful jump-in points. Note that the typical length of a build-drop combination is between 15 and 20 seconds; if the user is positioned 18 seconds before the drop, the build will have already started and the user knows that a drop is coming. Using an optimized combination of timed comments and manually acquired ground truth labels, we are able to achieve an ea_dist of 6 seconds.
Conclusion
Timed comments, on their own, can be used as training labels to train detectors for socially significant events. A detector trained on timed comments performs reasonably well in applications like non-linear access, where the listener wants to jump through different events in the music track without listening to it in its entirety. We hope that the dataset will encourage researchers to explore the usefulness of timed comments for all media. Additionally, we would like to point out that our work has demonstrated that the impact of temporal noise can be overcome and that the contribution of timed comments to video event detection is worth investigating further.
Contact
Should you have any inquiries or questions about the dataset, do not hesitate to contact us via email at: n.k.yadati@tudelft.nl
References
[1] K. Yadati, M. Larson, C. Liem and A. Hanjalic, “Detecting Socially Significant Music Events using Temporally Noisy Labels,” in IEEE Transactions on Multimedia. 2018. http://ieeexplore.ieee.org/document/8279544/
[2] M. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music, ser. Profiles in Popular Music. Indiana University Press, 2006
[8] H. Y. Lo, J. C. Wang, H. M. Wang and S. D. Lin, “Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval,” in IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 518-529, June 2011. http://ieeexplore.ieee.org/document/5733421/
Dear Member of the SIGMM Community, welcome to the third issue of the SIGMM Records in 2013.
On the verge of ACM Multimedia 2013, we can already present the recipients of SIGMM's yearly awards: the SIGMM Technical Achievement Award, the SIGMM Best Ph.D. Thesis Award, the TOMCCAP Nicolas D. Georganas Best Paper Award, and the TOMCCAP Best Associate Editor Award.
The TOMCCAP Special Issue on the 20th anniversary of ACM Multimedia is out in October, and you can read both the announcement, and find each of the contributions directly through the TOMCCAP Issue 9(1S) table of contents.
That SIGMM has established a strong foothold in the scientific community can also be seen in the Chinese Computing Federation's rankings of SIGMM's venues. Read the article to get even more motivation for submitting your papers to SIGMM's conferences and journal.
We are also reporting from SLAM, the international workshop on Speech, Language and Audio in Multimedia. Not a SIGMM event, but certainly of interest to many SIGMMers who care about audio technology.
You will also find two PhD thesis summaries and, last but certainly not least, pointers to the latest issues of TOMCCAP and MMSJ, as well as several job announcements.
We hope that you enjoy this issue of the Records.
The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz
ACM Transactions on Multimedia Computing, Communications and Applications
Special Issue: 20th Anniversary of ACM International Conference on Multimedia
A journey ‘Back to the Future’
The ACM Special Interest Group on Multimedia (SIGMM) celebrated the 20th anniversary of the establishment of its premier conference, the ACM International Conference on Multimedia (ACM Multimedia) in 2012. To commemorate this milestone, leading researchers organized and extensively contributed to the 20th anniversary celebration.
from left to right: Malcolm Slaney, Ramesh Jain, Dick Bulterman, Klara Nahrstedt, Larry Rowe and Ralf Steinmetz
The celebratory events started at ACM Multimedia 2012 in Nara, Japan, with the “Coulda, Woulda, Shoulda: 20 Years of Multimedia Opportunities” panel, organized by Klara Nahrstedt (center) and Malcolm Slaney (far left). At this panel, pioneers of the field Ramesh Jain, Dick Bulterman, Larry Rowe, and Ralf Steinmetz, shown from left to right in the image, reflected on innovations, and on successful and missed opportunities in the multimedia research area.
This special issue of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) is the final event celebrating achievements and opportunities in a variety of multimedia areas. Through peer-reviewed long articles and invited short contributions, readers will get a sense of the past, present and future of multimedia research. The evolution ranges from traditional topics such as video streaming, multimedia synchronization, multimedia authoring, content analysis, and multimedia retrieval to newer topics including music retrieval, geo-tagging context in a worldwide community of photos, multi-modal human-computer interaction, and experiential media systems.
Recent years have seen an explosion of research and technologies in multimedia, beyond individual algorithms, protocols and small-scale systems. The scale of multimedia innovation and deployment has grown with unimaginable speed. Hence, as the multimedia area grows fast, penetrating every facet of our society, this special issue fills an important need: to look back at the multimedia research achievements of the past 20 years, celebrate the exciting potential, and explore new goals of the multimedia research community.
ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award
The 2013 ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award goes to the paper “Exploring interest correlation for peer-to-peer socialized video sharing” (TOMCCAP, Vol. 8, Issue 1) by Xu Cheng and Jiangchuan Liu.
The purpose of the award is to recognize the most significant work published in ACM TOMCCAP in a given calendar year. The whole readership of ACM TOMCCAP was invited to nominate articles published in Volume 8 (2012). Based on the nominations, the winner was chosen by the TOMCCAP Editorial Board. The main assessment criteria were quality, novelty, timeliness, and clarity of presentation, in addition to relevance to multimedia computing, communications, and applications.
In this paper the authors examine architectures for large-scale video streaming systems exploiting social relations. To achieve this objective, a large study of YouTube traffic was conducted and a cluster analysis performed on the resulting data. Based on the observations made, a new approach for video pre-fetching based on social relations has been developed. This important work bridges the gap between social media and multimedia streaming and hence combines two extremely relevant research topics.
The award honors the founding Editor-in-Chief of TOMCCAP, Nicolas D. Georganas, for his outstanding contributions to the field of multimedia computing and his significant contributions to ACM. He profoundly influenced the research field and the whole multimedia community.
The Editor-in-Chief, Prof. Dr.-Ing. Ralf Steinmetz, and the Editorial Board of ACM TOMCCAP cordially congratulate the winners. The award will be presented to the authors on October 24th, 2013 at ACM Multimedia 2013 in Barcelona, Spain, and includes travel expenses for the winning authors.
Bio of Awardees:
Xu Cheng is currently a research engineer at BroadbandTV, Vancouver, Canada. He received his Bachelor of Science from Peking University, China, in 2006, his Master of Science from Simon Fraser University, Canada, in 2008, and his PhD from Simon Fraser University in 2012. His research interests include multimedia networks, social networks, and overlay networks.
Jiangchuan Liu is an Associate Professor in the School of Computing Science, Simon Fraser University, British Columbia, Canada. He received his BEng (cum laude) from Tsinghua University in 1999 and his PhD from HKUST in 2003, both in computer science. He is a co-recipient of the ACM Multimedia 2012 Best Paper Award, the IEEE Globecom 2011 Best Paper Award, the IEEE Communications Society Best Paper Award on Multimedia Communications 2009, as well as the IEEE IWQoS 2008 and IEEE/ACM IWQoS 2012 Best Student Paper Awards. His research interests are in networking and multimedia. He has served on the editorial boards of IEEE Transactions on Multimedia, IEEE Communications Surveys and Tutorials, and IEEE Internet of Things Journal. He will be TPC co-chair for IEEE/ACM IWQoS 2014 in Hong Kong.
ACM Transactions on Multimedia Computing, Communications and Applications Best Associate Editor Award
Annually, the Editor-in-Chief of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) honors one member of the Editorial Board with the TOMCCAP Associate Editor of the Year Award. The purpose of the award is to recognize excellent work for ACM TOMCCAP, and hence for the whole multimedia community, in the previous year. The criteria for the award are (1) the number of submissions processed on time, (2) the performance during the reviewing process, and (3) the accuracy of interaction with the reviewers in order to broaden awareness of the journal.
Based on the criteria mentioned above, the ACM Transactions on Multimedia Computing, Communications and Applications Associate Editor of the Year Award 2013 goes to Mohan S. Kankanhalli from the National University of Singapore. The Editor-in-Chief Prof. Dr.-Ing. Ralf Steinmetz cordially congratulates Mohan.
Bio of Awardee:
Mohan Kankanhalli is a Professor at the Department of Computer Science of the National University of Singapore. He is also the Associate Provost for Graduate Education at NUS. Before that, he was the Vice-Dean for Academic Affairs and Graduate Studies at the NUS School of Computing during 2008-2010 and Vice-Dean for Research during 2001-2007. Mohan obtained his BTech from IIT Kharagpur and MS & PhD from the Rensselaer Polytechnic Institute.
His current research interests are in Multimedia Systems (content processing, retrieval) and Multimedia Security (surveillance and privacy). He has been awarded a S$10M grant by Singapore’s National Research Foundation to set up the Centre for “Sensor-enhanced Social Media” (sesame.comp.nus.edu.sg).
Mohan has been actively involved in the organization of many major conferences in the area of Multimedia. He was the Director of Conferences for ACM SIG Multimedia from 2009 to 2013. He is on the editorial boards of several journals including the ACM Transactions on Multimedia Computing, Communications, and Applications, Springer Multimedia Systems Journal, Pattern Recognition Journal and Multimedia Tools & Applications Journal.
SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications
The 2013 winner of the prestigious ACM Special Interest Group on Multimedia (SIGMM) award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. He currently heads the Distributed and Interactive Systems group at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. He is also a Full Professor of Computer Science at Vrije Universiteit Amsterdam. His research interests are multimedia authoring and document processing. His recent research concerns socially-aware multimedia, interactive television, and media analysis.
The ACM SIGMM Technical Achievement award is given in recognition of outstanding contributions over a researcher’s career. Prof. Dick Bulterman has been selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing from research through standardization to entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group as well as his dedicated involvement in the SIGMM research community for many years. The SIGMM award will be presented at the ACM International Conference on Multimedia 2013 that will be held Oct 21–25 2013 in Barcelona, Spain.
Dick Bulterman has been a long-time intellectual leader in the area of temporal modeling and support for complex multimedia systems. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spin-off company Oratrix Development BV and worked as its CEO to deliver this software widely.
Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage development and use of time-based multimedia content. His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard.
Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 and also at the EUROITV conference.
In summary, Prof. Bulterman’s accomplishments include pioneering and extraordinary contributions in multimedia authoring, media annotation, and social sharing and outstanding service to the computing community.
SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications 2013
The SIGMM Ph.D. Thesis Award Committee is pleased to recommend this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications to Dr. Xirong Li.
The committee considered Dr. Li’s dissertation titled “Content-based visual search learned from social media” as worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh new insights into the possibilities for realizing image retrieval solutions in the presence of vast information that can be drawn from the social media.
The committee considered the main innovation of Dr. Li’s work to be in the development of the theory and algorithms providing answers to the following challenging research questions:
what determines the relevance of a social tag with respect to an image,
how to fuse tag relevance estimators,
which social images are the informative negative examples for concept learning,
how to exploit socially tagged images for visual search and
how to personalize automatic image tagging with respect to a user’s preferences.
The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of the information collected from the social media to enhance the datasets that can be used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user.
Bio of Awardee:
Dr. Xirong Li received the B.Sc. and M.Sc. degrees from Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. The title of his thesis is “Content-based visual search learned from social media”. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interests are image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, a Best Paper Nomination at the ACM International Conference on Multimedia Retrieval 2012, the Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.