Mario Montagud – Page 3 – ACM SIGMM Records

Towards Data Driven Conferences Within SIGMM

By Mario Montagud | May 29, 2018 - 01:47 |October 13, 2018 0318, Feature

There is no doubt that research in our field has become more data driven. And while the term data science might suggest there is some science without data, it is data the feeds our systems, trains our networks, and tests our results. Recently we have seen examples of a few conferences experimenting with their review process to gain new insights. For those of us in the Multimedia (MM) community, it is not an entirely new thing. In 2013, I led an effort along with my TPC Co-Chairs to look at the past several years of conferences and examine the reviewing system, the process, the scores, and the reviewer load. Additionally, we ran surveys to the authors (accepted and rejected) and the reviewers and ACs to gauge how we did. This was presented at the business meeting in Barcelona along with the suggestion that this practice continues. While this was met with great praise, it was never repeated.

Fast forward to 2017, I found myself asking the same questions about the MM review process which went through several changes (such as the late addition of the “Thematic Workshops” as well as an explicit COI—for papers from the Chairs —which we stated in 2013 could have adverse effects). And, just like before, I requested data from the Director of Conferences and SIGMM chair so I could run an analysis. There are a few things to note about the 2017 data.

Some reviews were contained in attachments which were unavailable.
Rebuttals were not present (some chairs allowed them, some did not).
The conference was divided into a “Regular” set of tracks and a “COI” track for anyone who was on the PC and submitted a paper.
The Call for Papers was a mixture of “Papers and Posters”.

The conference reports:

Finally, the Program Committee accepted 189 out of 684 submissions, yielding an acceptance rate of 27.63 percent. Among the 189 accepted full papers, 49 are selected to give oral presentations on the conference, while the rest are arranged to present to conference attendees in a poster format.

Track	Reviewed	Accepted	Rate
Main Paper	684	189	27.63%
Thematic Workshop	495	64	12.93%

In 2017, in a departure from previous years, the chairs decided to invite roughly 9% of the accepted papers for an oral presentation with the remaining accepts delegated to a larger poster session. During the review process, to be inclusive, a decision was made to invite some of the rejected papers to a non-archival Thematic workshop where their work could be presented as posters in a non-archival format such that the article can be published elsewhere at a future date. The published rates for these Thematic workshop was 64/495 or roughly 13% of the rejected papers. To dive in further, first, we compute the accepted orals and posters against the total submitted. Second, amongst the rejected papers, we compute the percent of rejects that were invited to a Thematic Workshop. However in the dataset there were 113 papers invited for Thematic Workshops; 49 of these did not make it into the program as the authors refused the automatic enrollment invitation.

	Normal	COI	Normal Rate	COI Rate
Oral	41	8	7.03%	7.92%
Poster	123	17	21.1%	16.83%
Workshop	79	34	18.85%	44.74%
Reject	339	42	58.15%	41.58%

Comparing the Regular and COI tracks, we find the scores to be correlated (p<0.003) if the workshops are treated as rejects. Including the workshops into the calculation shows no correlation (p<0.093). To further examine this, we plotted the percent decision by area and track.

Decision Percent Rates by Track and Area

While one must remember the numbers by volume are smaller in the COI track, some inflation will be seen here. Again, by percentage, you can see Novel Topics – Privacy and Experience – Novel Interactions have a higher oral accept rate while Understanding Vision & Deep Learning and Experience Perceptual pulled in higher Thematic Workshop rates.

No real change was seen in the score distribution across the tracks and areas (as seen here in the following jitter plots).

For the review lengths, the average size by character was 1452 with an IQR of 1231. Some reviews skewed longer in the Regular track but they still are outliers for the most part. The load averaged around 4 papers per reviewer with some normal exception. The people depicted with more than 10 papers were TCP members or ACs.

Overall, there is some difference but still a correlation between the COI and Regular tracks and the average number of papers per reviewer was kept to a manageable number. The score distributions roughly seems similar with the exception of the IQR but this is likely more product of the COI track being smaller. For the Thematic Workshops thereʼs an inflation in the accept rate for the COI track: accepting at 18.85% for the regular submissions but 44.74% for the COI. This was dampened by authors rejecting the Thematic Workshop invitation. Of the 79 Regular Workshop invitations and 34 COI invitations, only 50 regular and 14 COI were in the final program. So the final accept rates for what was actually at the conference became 11.93% for Regular Thematic workshop submissions and 18.42% for COI.

So where do we go from here?

Removal of a COI track. A COI track comes and goes in ACM MM and it seems its removal is at the top of the list. Modern conference management software (EasyChair, PCS, CMT, etc.) handles conflicts extremely well already.

TCP and ACs must monitor reviews. Next, while quantity is not related to quality, a short review length might be an early indicator of poor quality. TCP Chairs and ACs should monitor these reviews because a review of 959 characters is not particularly informative despite it being positive or negative (in fact this paragraph is almost as long as the average review). While some might believe trapping that error is the job of the authors and the Author Advocate (and hence the authors who need to invoke the Advocate), it is the job of the ACs and the TPC to ensure review quality and make sure the Advocate never gets invoked (as we presented the role back when we invented it in 2014).

CMS Systems Need To Support Us. There is no shortage of Conference Management Systems (CMS); none of them are data-driven. Why do I have to export a set of CSVs from a conference system then write R scripts to see there are short reviews? Yelp and TripAdvisor give me guidance on how long my review should be, how is it that a review for ACM MM can be two short sentences?

Provide upfront submission information. The Thematic Workshops were introduced late into the submission process and came as a surprise to many authors. While some innovation in the Technical program is a good idea, the decline rate showed it was undesirable. Some a priori communication with the community might give insights into what experiments we should try and what to avoid. Which falls into the final point.

We need a New Role. And while the SIGMM EC has committed to looking back at past conferences, we should continue this practice routinely. Conferences or ideally the SIGMM EC should create a “Data Health and Metrics” role (or assign this to the Director of Conferences) to continue to oversee the TPC as well as issue post-review and post-conference surveys to learn how we did at each step and ensure we can move forward and grow our community. However, if done right, it will be considerable work and should likely be its own role.

To get started, the SIGMM Executive Committee is working on obtaining past MM conference datasets to further track the history of the conference in a data-forward method. Hopefully youʼll hear more at the ACM MM Business Meeting and Town Hall in Seoul; SIGMM is looking to hear more from the community.

Socially significant music events

By Mario Montagud | March 7, 2018 - 21:46 |March 23, 2018 0118, Datasets Column, Feature, September 2013

Leave a comment

Social media sharing platforms (e.g., YouTube, Flickr, Instagram, and SoundCloud) have revolutionized how users access multimedia content online. Most of these platforms provide a variety of ways for the user to interact with the different types of media: images, video, music. In addition to watching or listening to the media content, users can also engage with content in different ways, e.g., like, share, tag, or comment. Social media sharing platforms have become an important resource for scientific researchers, who aim to develop new indexing and retrieval algorithms that can improve users’ access to multimedia content. As a result, enhancing the experience provided by social media sharing platforms.

Historically, the multimedia research community has focused on developing multimedia analysis algorithms that combine visual and text modalities. Less highly visible is research devoted to algorithms that exploit an audio signal as the main modality. Recently, awareness for the importance of audio has experienced a resurgence. Particularly notable is Google’s release of the AudioSet, “A large-scale dataset of manually annotated audio events” [7]. In a similar spirit, we have developed the “Socially Significant Music Event“ dataset that supports research on music events [3]. The dataset contains Electronic Dance Music (EDM) tracks with a Creative Commons license that have been collected from SoundCloud. Using this dataset, one can build machine learning algorithms to detect specific events in a given music track.

What are socially significant music events? Within a music track, listeners are able to identify certain acoustic patterns as nameable music events. We call a music event “socially significant” if it is popular in social media circles, implying that it is readily identifiable and an important part of how listeners experience a certain music track or music genre. For example, listeners might talk about these events in their comments, suggesting that these events are important for the listeners (Figure 1).

Traditional music event detection has only tackled low-level events like music onsets [4] or music auto-tagging [8, 10]. In our dataset, we consider events that are at a higher abstraction level than the low-level musical onsets. In auto-tagging, descriptive tags are associated with 10-second music segments. These tags generally fall into three categories: musical instruments (guitar, drums, etc.), musical genres (pop, electronic, etc.) and mood based tags (serene, intense, etc.). The types of tags are different than what we are detecting as part of this dataset. The events in our dataset have a particular temporal structure unlike the categories that are the target of auto-tagging. Additionally, we analyze the entire music track and detect start points of music events rather than short segments like auto-tagging.

There are three music events in our Socially Significant Music Event dataset: Drop, Build, and Break. These events can be considered to form the basic set of events used by the EDM producers [1, 2]. They have a certain temporal structure internal to themselves, which can be of varying complexity. Their social significance is visible from the presence of large number of timed comments related to these events on SoundCloud (Figure 1,2). The three events are popular in the social media circles with listeners often mentioning them in comments. Here, we define these events [2]:

Drop: A point in the EDM track, where the full bassline is re-introduced and generally follows a recognizable build section
Build: A section in the EDM track, where the intensity continuously increases and generally climaxes towards a drop
Break: A section in an EDM track with a significantly thinner texture, usually marked by the removal of the bass drum

Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].

SoundCloud

SoundCloud is an online music sharing platform that allows users to record, upload, promote and share their self-created music. SoundCloud started out as a platform for amateur musicians, but currently many leading music labels are also represented. One of the interesting features of SoundCloud is that it allows “timed comments” on the music tracks. “Timed comments” are comments, left by listeners, associated with a particular time point in the music track. Our “Socially Significant Music Events” dataset is inspired by the potential usefulness of these timed comments as ground truth for training music event detectors. Figure 2 contains an example of a timed comment: “That intense buildup tho” (timestamp 00:46). We could potentially use this as a training label to detect a build, for example. In a similar way, listeners also mention the other events in their timed comments. So, these timed comments can serve as training labels to build machine learning algorithms to detect events.

Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]

SoundCloud also provides a well-documented API [6] with interfaces to many programming languages: Python, Ruby, JavaScript etc. Through this API, one can download the music tracks (if allowed by the uploader), timed comments and also other metadata related to the track. We used this API to collect our dataset. Via the search functionality we searched for tracks uploaded during the year 2014 with a Creative Commons license, which results in a list of tracks with unique identification numbers. We looked at the timed comments of these tracks for the keywords: drop, break and build. We kept the tracks whose timed comments contained a reference to these keywords and discarded the other tracks.

Dataset

The dataset contains 402 music tracks with an average duration of 4.9 minutes. Each track is accompanied by timed comments relating to Drop, Build, and Break. It is also accompanied by ground truth labels that mark the true locations of the three events within the tracks. The labels were created by a team of experts. Unlike many other publicly available music datasets that provide only metadata or short previews of music tracks [9], we provide the entire track for research purposes. The download instructions for the dataset can be found here: [3]. All the music tracks in the dataset are distributed under the Creative Commons license. Some statistics of the dataset are provided in Table 1.

Table 1. Statistics of the dataset: Number of events, Number of timed comments

Event Name	Total number of events	Number of events per track	Total number of timed comments	Number of timed comments per track
Drop	435	1.08	604	1.50
Build	596	1.48	609	1.51
Break	372	0.92	619	1.54

The main purpose of the dataset is to support training of detectors for the three events of interest (Drop, Build, and Break) in a given music track. These three events can be considered a case study to prove that it is possible to detect socially significant musical events, opening the way for future work on an extended inventory of events. Additionally, the dataset can be used to understand the properties of timed comments related to music events. Specifically, timed comments can be used to reduce the need for manually acquired ground truth, which is expensive and difficult to obtain.

Timed comments present an interesting research challenge: temporal noise. The timed comments and the actual events do not always coincide. The comments could be at the same position, before, or after the actual event. For example, in the below music track (Figure 3), there is a timed comment about a drop at 00:40, while the actual drop occurs only at 01:00. Because of this noisy nature, we cannot use the timed comments alone as ground truth. We need strategies to handle temporal noise in order to use timed comments for training [1].

Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].

In addition to music event detection, our “Socially Significant Music Event” dataset opens up other possibilities for research. Timed comments have the potential to improve users’ access to music and to support them in discovering new music. Specifically, timed comments mention aspects of music that are difficult to derive from the signal, and may be useful to calculate song-to-song similarity needed to improve music recommendation. The fact that the comments are related to a certain time point is important because it allows us to derive continuous information over time from a music track. Timed comments are potentially very helpful for supporting listeners in finding specific points of interest within a track, or deciding whether they want to listen to a track, since they allow users to jump-in and listen to specific moments, without listening to the track end-to-end.

State of the art

The detection of music events requires training classifiers that are able to generalize over the variability in the audio signal patterns corresponding to events. In Figure 4, we see that the build-drop combination has a characteristic pattern in the spectral representation of the music signal. The build is a sweep-like structure and is followed by the drop, which we indicate by a red vertical line. More details about the state-of-the-art features useful for music event detection and the strategies to filter the noisy timed comments can be found in our publication [1].

Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.

The evaluation metric used to measure the performance of a music event detector should be chosen according to the user scenario for that detector. For example, if the music event detector is used for non-linear access (i.e., creating jump-in points along the playbar) it is important that the detected time point of the event falls before, rather than after, the actual event. In this case, we recommend using the “event anticipation distance” (ea_dist) as a metric. The ea_dist is amount of time that the predicted event time point precedes an actual event time point and represents the time the user would have to wait to listen to the actual event. More details about ea_dist can be found in our paper [1].

In [1], we report the implementation of a baseline music event detector that uses only timed comments as training labels. This detector attains an ea_dist of 18 seconds for a drop. We point out that from the user point of view, this level of performance could already lead to quite useful jump-in points. Note that the typical length of a build-drop combination is between 15-20 seconds. If the user is positioned 18 seconds before the drop, the build would have already started and the user knows that a drop is coming. Using an optimized combination of timed comments and manually acquired ground truth labels we are able to achieve an ea_dist of 6 seconds.

Conclusion

Timed comments, on their own, can be used as training labels to train detectors for socially significant events. A detector trained on timed comments performs reasonably well in applications like non-linear access, where the listener wants to jump through different events in the music track without listening to it in its entirety. We hope that the dataset will encourage researchers to explore the usefulness of timed comments for all media. Additionally, we would like to point out that our work has demonstrated that the impact of temporal noise can be overcome and that the contribution of timed comments to video event detection is worth investigating further.

Contact

Should you have any inquiries or questions about the dataset, do not hesitate to contact us via email at: n.k.yadati@tudelft.nl

References

[1] K. Yadati, M. Larson, C. Liem and A. Hanjalic, “Detecting Socially Significant Music Events using Temporally Noisy Labels,” in IEEE Transactions on Multimedia. 2018. http://ieeexplore.ieee.org/document/8279544/

[2] M. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music, ser. Profiles in Popular Music. Indiana University Press, 2006

[3] http://osf.io/eydxk

[4] http://www.music-ir.org/mirex/wiki/2017:Audio_Onset_Detection

[5] https://developers.soundcloud.com/docs/api/guide

[6] https://developers.soundcloud.com/docs/api/guide

[7] https://research.google.com/audioset/

[8] H. Y. Lo, J. C. Wang, H. M. Wang and S. D. Lin, “Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval,” in IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 518-529, June 2011. http://ieeexplore.ieee.org/document/5733421/

[9] http://majorminer.org/info/intro

[10] http://www.music-ir.org/mirex/wiki/2016:Audio_Tag_Classification

[11] https://soundcloud.com/spinninrecords/ummet-ozcan-lose-control-original-mix

Report from ACM Multimedia 2017 – by Benoit Huet

By Mario Montagud | February 16, 2018 - 21:18 |February 16, 2018 0118, Conference Report, Feature

Leave a comment

Best #SIGMM Social Media Reporter Award! Me? Really??

This was my reaction after being informed by the SIGMM Social Media Editors that I was one of the two recipients following ACM Multimedia 2017! #ACMMM What a wonderful idea this is to encourage our community to communicate, both internally and to other related communities, about our events, our key research results and all the wonderful things the multimedia community stands for! I have always been surprised by how limited social media engagement is within the multimedia community. Your initiative has all my support! Let’s disseminate our research interest and activities on social media! @SIGMM #Motivated

The SIGMM flagship conference took place on October 23-27 at the Computer History Museum in Mountain View California, USA. For its 25th edition, the organizing committee had prepared an attractive program cleverly mixing expected classics (i.e. Best Paper session, Grand Challenges, Open Source software competition, etc…) and brand new sessions (such as Fast Forward and Thematic Workshops, Business Idea Venture, and the Novel Topics Track). In this last edition, the conference adopted a single paper length, removing the boundary between long and short papers. The TPC Co-Chairs and Area Chairs had the responsibility of directing accepted papers to either an oral session or a thematic workshop.

Thematic workshops took the form of poster presentations. Presenters were asked to provide a short video briefly motivating their work with the intention of making them available online for reference after the conference (possibly with a link to the full paper and the poster!). However, this did not come through as publication permissions were not cleared out in time, but the idea is interesting and should be taken into account for future editions. Fast forward (or Thematic workshop pitches) are short targeted presentations aimed at attracting the audience to the Thematic Workshop where the papers are presented (in the form of posters in this case). While such short presentations allow conference attendees to efficiently identify which poster are relevant to them, it is crucial for presenters to be well prepared and concentrate on highlighting one key research idea, as time is very limited. It also gives more exposure to poster. I would be in favor of keeping such sessions for future ACM Multimedia editions.

The 25th edition of ACM MM wasn’t short of keynotes. No less than 6 industry keynotes had punctuated each of the conference half day. The first keynote by Achin Bhowmik from Starkey focused on Audio as a mean to “Enhancing and Augmenting Human Perception with Artificial Intelligence”. Bill Dally from NVidia presented “Efficient Methods and Hardware for Deep Learning”, in short why we all need GPUs! “Building Multi-Modal Interfaces for Smartphones” was the topic presented by Injong Rhee (Samsung Electronics), Scott Silver (YouTube) discussed the difficulties in “Bringing a Billion Hours to Life” (referring to the vast quantities of videos uploaded and viewed on the sharing platform, and the long tail). Ed. Chang from HTC presented “DeepQ: Advancing Healthcare Through AI and VR” and demonstrated how healthcare is and will benefit from AR, VR and AI. Danny Lange from Unity Technologies highlighted how important machine learning and deep learning are in the game industry in ”Bringing Gaming, VR, and AR to Life with Deep Learning”. Personally, I would have preferred a mix of industry/academic keynotes as I found some of the keynotes not targeting an audience of computer scientists.

Arnold W. M. Smeulders received the SIGMM Technical Achievement Award for his outstanding and pioneering contribution defining and bridging the semantic gap in content based image retrieval (his lecture is here: https://youtu.be/n8kLxKNjQ0A). His talk was sharp, enlightening and very well received by the audience.

The @sigmm rising star award went to Dr Liangliang Cao for his contribution to large-scale multimedia recognition and social media mining.

The conference was noticeably flavored with trendy topics such as AI, Human augmenting technologies, Virtual and Augmented Reality, and Machine (Deep) Learning, as can be noticed from the various works rewarded.

The Best Paper award was given to Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, Heng Tao Shen for their work on “Adversarial Cross-Modal Retrieval“.

Yuan Tian, Suraj Raghuraman, Thiru Annaswamy, Aleksander Borresen, Klara Nahrstedt, Balakrishnan Prabhakaran received the Best Student Paper award for the paper “H-TIME: Haptic-enabled Tele-Immersive Musculoskeletal Examination“.

The Best demo award went to “NexGenTV: Providing Real-Time Insight during Political Debates in a Second Screen Application” by Olfa Ben Ahmed, Gabriel Sargent, Florian Garnier, Benoit Huet, Vincent Claveau, Laurence Couturier, Raphaël Troncy, Guillaume Gravier, Philémon Bouzy and Fabrice Leménorel.

The Best Open source software award was received by Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo for “TensorLayer: A Versatile Library for Efficient Deep Learning Development“.

The Best Grand Challenge Video Captioning Paper award went to “Knowing Yourself: Improving Video Caption via In-depth Recap“, by Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann.

The Best Grand Challenge Social Media Prediction Paper award went to Chih-Chung Hsu, Ying-Chin Lee, Ping-En Lu, Shian-Shin Lu, Hsiao-Ting Lai, Chihg-Chu Huang,Chun Wang, Yang-Jiun Lin, Weng-Tai Su for “Social Media Prediction Based on Residual Learning and Random Forest“.

Finally, the Best Brave New Idea Paper award was conferred to John R Smith, Dhiraj Joshi, Benoit Huet, Winston Hsu and Zef Cota for the paper “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation“.

A few years back, the multimedia community was concerned with the lack of truly multimedia publications. In my opinion, those days are behind us. The technical program has evolved into a richer and broader one, let’s keep the momentum!

The location was a wonderful opportunity for many of the attendees to take a stroll down memory lane and see computers and devices (VT100, PC, etc…) from the past thanks to the complementary entrance to the museum exhibitions. The “isolated” location of the conference venue meant going out for lunch breaks was out of the question given the duration of the lunch break. As a solution, the organizers catered buffet lunches. This resulted in the majority of the attendees interacting and mixing over the lunch break while eating. This could be an effective way to better integrate new participants and strengthen the community. Both the welcome reception and the banquet were held successfully within Computer Museum. Both events offer yet another opportunity for new connections to be made and for further interaction between attendees. Indeed, the atmosphere of both occasions was relaxed, lively and joyful.

All in all, ACM MM 2017 was another successful edition of our flagship conference, many thanks to the entire organizing team and see you all in Seoul for ACM MM 2018 http://www.acmmm.org/2018/ and follow @sigmm on Twitter!

Report from ACM Multimedia 2017 – by Conor Keighrey

By Mario Montagud | February 4, 2018 - 22:36 |February 16, 2018 0118, Conference Report, Event Report, Feature

Leave a comment

My name is Conor Keighrey, I’m a PhD. candidate at the Athlone Institute Technology in Athlone, Co. Westmeath, Ireland. The focus of my research is to understand the key influencing factors that affect Quality of Experience (QoE) in emerging immersive multimedia experiences, with a specific focus on applications in the speech and language therapy domain. I am funded for this research by the Irish Research Council Government of Ireland Postgraduate Scholarship Programme. I’m delighted to have been asked to present this report to the SIGMM community as a result of my social media activity at ACM Multimedia Conference.

Launched in 1993, the ACM Multimedia (ACMMM) Conference held its 25^th anniversary event in the Mountain View, California. The conference was located in the heart of Silicon Valley, at the inspirational Computer History Museum.

Under five focal themes, the conference called for multimedia papers which focused on topics relating to multimedia: Experience, Systems and Applications, Understanding, Novel Topics, and Engagement.

Keynote addresses were delivered by high-profile industry leading experts from the field of multimedia. These talks provided insight into the active development from the following experts:

Achin Bhowmik (CTO & EVP, Starkey, USA)
Bill Dally (Senior Vice President and Chief Scientist, NVidia, USA)
Injong Rhee (CTO & EVP, Samsung Electronics, Korea)
Edward Y. Chang (President, HTC, Taiwan)
Scott Silver (Vice President, Google, USA)
Danny Lange (Vice President, Unity Technologies, USA)

Some keynote highlights include Bill Dally’s talk on “Efficient Methods and Hardware for Deep Learning”. Bill provided insight into the work NVidia are doing with neural networks, the hardware which drives them, and the techniques the company are using to make them more efficient. He also highlighted how AI should not be thought of as a mechanism which replaces, but empower humans, thus allowing us to explore more intellectual activities.

Danny Lange of Unity Technologies discussed the application of the Unity game engine to create scenarios in which machine learning models can be trained. His presentation entitled “Bringing Gaming, VR, and AR to Life with Deep Learning” described the capture of data for self-driving cars to prepare for unexpected occurrences in the real world (e.g. pedestrians activity or other cars behaving in unpredicted ways).

A number of the Keynotes were captured by FXPAL (an ACMMM Platinum Sponsor) and are available here.

With an acceptance rate of 27.63% (684 reviewed, 189 accepted), the main track at ACMMM showcased a diverse collection of research from academic institutes around the globe. An abundance of work was presented in the ever-expanding area of deep/machine learning, virtual/augmented/mixed realities, and the traditional multimedia field.

The importance of gender equality and diversity with respect to advancing careers of women in STEM has never been greater. Sponsored by SIGMM, the Women/Diversity in MM lunch took place on the first day of ACMMM. Speakers such as Prof. Noel O’Conner discussed the significance of initiatives such as Athena SWAN (Scientific Women’s Academic Network) within Dublin City University (DCU). Katherine Breeden (Pictured left), an Assistant Professor in the Department of Computer Science at Harvey Mudd College (HMC), presented a fantastic talk on gender balance at HMC. Katherine’s discussion highlighted the key changes which have occurred resulting in more women than men graduating with a degree in computer science at the college.

Other highlights from day 1 include a paper presented at the Experience 2 (Perceptual, Affect, and Interaction) session, chaired by Susanne Boll (University of Oldenburg). Researchers from the National University of Singapore presented the results of a multisensory virtual cocktail (Vocktail) experience which was well received.

Through the stimulation of 3 sensory modalities, Vocktails aim to create virtual flavor, and augment taste experiences through a customizable interactive drinking utensil. Controlled by a mobile device, participants of the study experienced augmented taste (electrical stimulation of the tongue), smell (micro air-pumps), and visual (RGB light projected onto the liquid) stimulus as they used the system. For more information, check out their paper entitled “Vocktail: A Virtual Cocktail for Pairing Digital Taste, Smell, and Color Sensations” on the ACM Digital Library.

Day 3 of the conference included a session entitled Brave New Ideas. The session presented a fantastic variety of work which focused on the use of multimedia technologies to enhance or create intelligent systems. Demonstrating AI as an assistive tool and winning the Best Brave New Idea Paper award, a paper entitled “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation” (ACM Digital Library) describes the first-ever human machine collaboration for creating a real movie trailer. Through multi-modal semantic extraction, inclusive of audio-visual, scene analysis, and a statistical approach, key moments which characterize horror films were defined. As a result of this, the AI selected 10 scenes from a feature length film which were further developed alongside a professional film maker to finalize an exciting movie trailer. Officially released by 20^th Century Fox, the complete AI trailer for the horror movie “Morgan” can be viewed here.

A new addition to the last ACMMM edition year has been the inclusion of thematic workshops. Four individual workshops (as outlined below) provided opportunity for papers which could not be accommodated within the main track to be presented to the multimedia research community. A total of 495 papers were reviewed from which 64 were accepted (12.93%). Authors of accepted papers presented their work via on-stage thematic workshop pitches, which were followed by poster presentations on Monday the 23^rd and Friday the 27^th. The workshop themes were as follows:

Experience (Organised by Wanmin Wu)
Systems and Applications (Organised by Roger Zimmermann & He Ma)
Engagement (Organised by Jianchao Yang)
Understanding (Organised by Qi Tian)

Presented as part of the thematic workshop pitches, one of the most fascinating demos at the conference was a body of work carried out by Audrey Ziwei Hu (University of Toronto). Her paper entitled “Liquid Jets as Logic-Computing Fluid-User-Interfaces” describes a fluid (water) user interface which is presented as a logic-computing device. Water jets form a medium for tactile interaction and control to create a musical instrument known as a hydraulophone.

Steve Mann (Pictured left) from Stanford University, who is regarded as “The Father of Wearable Computing”, provided a fantastic live demonstration of the device. The full paper can be found on the ACM Digital Library, and a live demo can be seen here.

In large scale events such ACMMM, the importance of social media reporting/interaction has never been greater. More than 250 social media interactions (tweets, retweets, and likes) were monitored using the #SIGMM and #ACMMM hashtags, as outlined by the SIGMM Records prior to the event. Descriptive (and multimedia enhanced) social media reports provide a chance for those who encounter an unavoidable schedule overlap, and an opportunity to gather some insight into alternative works presented at the conference.

From my own perspective (as a PhD. student), the most important aspect of social media interaction is that reports often serve as a conversational piece. Developing a social presence throughout the many coffee breaks and social events during the conference is key to the success of building a network of contacts within any community. As a newcomer this can often be a daunting task, recognition of other social media reporters offers the perfect ice-breaker, providing opportunity to discuss and inform each other of the on-going work within the multimedia community. As a result of my own online reporting, I was recognized numerous times throughout the conference. Staying active on social media often leads to the development of a research audience, and social media presence among peers. Engaging in such an audience is key to the success of those who wish to follow a path in academia/research.

Building on my own personal experience, continued attendance to SIGMM conferences (irrespective of paper submission) has so many advantages. While the predominant role of a conference is to disseminate work, the informative aspect of attending such events is often overlooked. The area of multimedia research is moving at a fast pace, and thus having the opportunity to engage directly with researchers in your field of expertise is of upmost importance. Attendance to ACMMM and other SIGMM conferences, such ACM Multimedia Systems, has inspired me to explore alternative methodologies within my own respective research. Without a doubt, continued attendance will inspire my research as I move forward.

ACM Multimedia ‘18 (October 22^nd – 26^th) – The diverse landscape of modern skyscrapers mixed with traditional Buddhist temples, and palaces that is Seoul, South Korea, will be host to the 26^th Annual ACMMM. The 2018 event will without a doubt present a variety of work from the multimedia research community. Regular paper abstracts are due on the 30^th of March (Full manuscripts are due on the 8^th of April). For more information on next year’s ACM Multimedia conference check out the following link: http://www.acmmm.org/2018

ACM Fellows in the SIGMM Community

By Mario Montagud | January 25, 2018 - 21:22 |March 24, 2018 0118, Award, Feature

Leave a comment

Multimedia can be defined as the seamless integration of digital technologies in ways which provide for an enriched experience for users as we create and consume information with high fidelity. Behind that definition lies a host of enabling digital technologies to allow us create, capture, store, analyse, index, locate, transmit and present information. But when did multimedia, as we now know it, start? Was it the ideas of Vannevar Bush and Memex, or Ted Nelson and Xanadu or the development of Apple computers in the mid 1980s or maybe the emergence of the web which enables distribution of multimedia?

Certainly by the early 1990s and definitely by 1993 when SIGMM was founded, multimedia was established and recognised as a mainstream activity within computing. Over the intervening two and a half decades we’ve seen tremendous progress, incredible developments and a wholesale adoption of our technologies right across our society. All this has been achieved partly on the backs of innovations by many eminent scientists and technologists who are leaders within our SIGMM community.

We recently saw two of our SIGMM community elevated to the grade of ACM Fellow, joining the 52 other new ACM Fellows in the class of 2017. Our congratulations go to Yong Rui and to Shih-Fu Chang for their elevation to that grade. Yong had a lovely interview for SIGMM on the significance of this honour for him as a researcher, and for us all in SIGMM, which is available at http://sigmm.org/news/interview-dr-yong-rui-acm-fellow and its worth reflecting on some of our other SIGMM family who have been similarly honoured in the past.

While checking SIGMM membership is an easy thing to do (though its a bit more difficult to check back throughout our membership history) it is a bit arbitrary to define who is and who is not part of our SIGMM “family”. To me its somebody who is, or has been, an active participant or organiser of our events, or a contributor to our field. Our SIGMM family includes those I would associate with SIGMM rather than any other SIG, and with ACM rather than with any other society.

In the class of new ACM Fellows for 2017 Shih-Fu Chang is elevated “for contributions to large-scale multimedia content recognition and multimedia information retrieval”. Shih-Fu is my predecessor as SIGMM chair and still serves on the SIGMM Executive as well as maintaining a hugely impressive research output. He won the SIGMM Outstanding Technical Achievement Award in 2011.

– Yong Rui was also elevated to ACM Fellow in 2017 “for contributions to image, video and multimedia analysis, understanding and retrieval”. Yong is a long-time supporter of SIGMM Conferences as well as a regular attendee and major contributor to our field.

– Wen Gao of Peking University is vice president of the National Natural Science Foundation of China and was a co-chair of ACM Multimedia in 2009. He is also on the advisory board of ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) and was elevated to ACM Fellow in 2013 “for contributions to video technology, and for leadership to advance computing in China”.

– Zhengyou Zhang was also elevated in 2013 “for contributions to computer vision and multimedia” and continues to serve the SIGMM community, most recently as Best Papers chair at MM 2017.

– Klara Nahrstedt (class of 2012) was elevated “for contributions to quality-of-service management for distributed multimedia systems” and has served as SIGMM Chair prior to Shih-Fu. In 2014 Klara won the SIGMM Technical Achievement Award and until last year she also served on the SIGMM Executive Committee.

– Joe Konstan (class of 2008) was elevated “for contributions to human-computer interaction” and he also won the ACM Software System Award in 2010. Joe was the ACM MM 2000 TPC Chair and was on the SIGMM Executive Committee from 1999 to 2007.

– HongJiang Zhang (class of 2007) was elevated to Fellow “for contributions to content-based analysis and retrieval of multimedia”. HongJiang also won the 2012 SIGMM Outstanding Technical Achievement Award and he has a huge publications output with a Google Scholar h-index of 120.

– Ramesh Jain (class of 2003) was elevated “for contributions to computer vision and multimedia information systems”. Ramesh remains one of the most prolific authors in our field and a regular, almost omnipresent, attendee at our major SIGMM conferences. In 2010 Ramesh won the SIGMM Outstanding Technical Achievement Award.

– Ralf Steinmetz (class of 2001) was elevated for “pioneering work in multimedia communications and education, including fundamental contributions in perceivable Quality of Service for multimedia systems derived from multimedia synchronization, and for multimedia education”. Ralf is also the winner of the inaugural ACM SIGMM Technical Achievement Award, presented in 2008 and between 2009 and 2015 he served as Editor-in-Chief of the ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), formerly known as TOMCCAP.

– Larry Rowe (class of 1998) was elevated “for seminal contributions to programming languages, relational database technology, user interfaces and multimedia systems”. Larry is a past chair of SIGMM (1998-2003) and in 2009 he received the SIGMM Technical Achievement Award.

– P. Venkat Rangan was elevated to ACM Fellow in 1998. At the recent ACM MM Conference in 2017, we had a short presentation on the first ACM MM Conference in 1993 and Venkat’s efforts in organising that first MM was acknowledged in that presentation. Venkat’s ACM Fellowship citation says that he “founded one of the foremost centers for research in multimedia, in which area he is an inventor of fundamental techniques with global impact”.

What is interesting to note about these awardees is the broad range of areas in which their contributions are grounded, covering the “traditional” areas of multimedia. These range from quality of service delivery across networks to analysis of content and from user interfaces and interaction to progress in fundamental computer vision. This reflects the broad range of research areas covered by the SIGMM community, which has been part of our DNA since SIGMM was founded.

Our ACM Fellows are a varied and talented group of individuals, each richly deserving of their award and their only single unifying theme is broad multimedia, and that’s one of our distinguishing features. In some SIGs like SIGARCH (computer architecture), SIGCSE (computer science education) or SIGIR (information retrieval), there’s a focus on a narrow topic or a challenge while in other SIGs like SIGCHI (computer human interaction), SIGMOD (management of data) or SIGAI (artificial intelligence), there are a broad range of research areas. SIGMM sits with those areas where our application and impact is broad.

The ACM Fellow awards started nearly 25 years ago. Further details can be found at https://awards.acm.org/award-nominations and a link to each of the awards can be found at https://awards.acm.org/fellows/award-winners

Diversity and Credibility for Social Images and Image Retrieval

By Mario Montagud | December 21, 2017 - 22:22 |January 8, 2018 0417, Datasets Column, Feature

Leave a comment

Social media has established itself as an inextricable component of today’s society. Images make up a large proportion of items shared on social media [1]. The popularity of social image sharing has contributed to the popularity of the Retrieving Diverse Social Images task at the MediaEval Benchmarking Initiative for Multimedia Evaluationa [2]. Since its introduction in 2013, the task has attracted a large participation and has published a set of datasets of outstanding value to the multimedia research community.

The task, and the datasets it has released, target a novel facet of multimedia retrieval, namely the search result diversification of social images. The task is defined as follows: Given a large number of images, retrieved by a social media image search engine, find those that are not only relevant to the query, but also provide a diverse view of the topic/topics behind the query (see an example in Figure 1). The features and methods needed to address the task successfully are complex and span different research areas (image processing, text processing, machine learning). For this reason, when creating the collections used in the Retrieving Diverse Social Images Tasks, we also created a set of baseline features. The features are released with the datasets. In this way, task participants who have expertise in one particular research area may focus on that area and still participate in the full evaluation.

Figure 1: Example of retrieval and diversification results for query “Pingxi Sky Lantern Festival” (results are truncated to the first 14 images for better visualization): (top images) Flickr initial retrieval results; (bottom images) diversification achieved with the approach from the TUW team (best approach at MediaEval 2015).

The collections

Before describing the individual collections, it needs to be noted that all data consist of redistributable Creative Commons Flickr and Wikipedia content and are freely available for download (follow the instructions here [3]). Although the task ran also in 2017, we focus in the following on the datasets already released, namely: Div400, Div150Cred, Div150Multi and Div150Adhoc (corresponding to the 2013-2016 evaluation campaigns). Each of the four datasets available so far covers different aspects of the diversification challenge, either from the perspective of the task/use-case addressed, or from the data that can be used to address the task. Table 1 gives an overview of the four datasets that we describe in more detail over the next four subsections. Each of the datasets is divided into a development set and a test set. Although the division of development and test data is arbitrary, for comparability of results and full reproducibility, users of the collections are advised to maintain the separation when performing their experiments.

Table 1: Dataset statistics (devset – development data, testset – testing data, credibilityset – data for estimating user tagging credibility, single (s) – single topic queries, multi (m) – multi-topic queries, ++ – enhanced/updated content, POI – location point of interest, events – events and states associated with locations, general – general purpose ad-hoc topics).

Div400

In 2013, the task started with a narrowly defined use-case scenario, where a tourist, upon deciding to visit a particular location, reads the corresponding Wikipedia page and desires to see a diverse set of images from that location. Queries here might be “Big Ben in London” or “Palazzo delle Albere in Italy”. For each such query, we know the GPS coordinates, the name, and the Wikipedia page, including an example image of the destination. As a search pool, we consider the top 150 photos obtained from Flickr using the name as a search query. These photos come with some metadata (photo ID, title, description, tags, geotagging information, date when the photo was taken, owner’s name, number of times the photo has been displayed, URL in Flickr, license type, number of comments on the photo) [4].

In addition to providing the raw data, the collection also contains visual and text features of the data, such that researchers who are only interested in one of the two, can use the other without investing additional time in generating a baseline set of features.

As visual descriptors, for each of the images in the collection, we provide:

Global color naming histogram
Global histogram of oriented gradients
Global color moments on HSV
Global Locally Binary Patterns on gray scale
Global Color Structure Descriptor
Global statistics on gray level Run Length Matrix (Short Run Emphasis, Long Run Emphasis, Gray-Level Non-uniformity, Run Length Non-uniformity, Run Percentage, Low Gray-Level Run Emphasis, High Gray-Level Run Emphasis, Short Run Low Gray-Level Emphasis, Short Run High Gray-Level Emphasis, Long Run Low Gray-Level Emphasis, Long Run High Gray-Level Emphasis)
Local spatial pyramid representations (3×3) of each of the previous descriptors

As textual descriptors we provide the classic Term Frequency (TF_t,d – the number of occurrences of term t in document d) and Document Frequency (DF_t – the number of documents containing term t). Note that the datasets are not limited to a single notion of document. The most direct definition of a “document” is an image that can be either retrieved or not retrieved. However, it is easily conceivable that the relative frequency of a term in the set of images corresponding to one topic, or the set of images corresponding to one user might also be of interest in ranking the importance of a result to a query. Therefore, the collection also contains statistics that take a document to be a topic, as well as a user. All these are provided both as CSV files, as well as Lucene Index files. The former can be used as part of a custom weighting scheme, while the latter can be deployed directly in a Lucene/Solr search engine to obtain results based on the text without further effort.

Div150Cred

The tourism use case also underlies Div150Cred, but a component addressing the concept of user tagging credibility is added. The idea here is that not all users tag their photos in a manner that is useful for retrieval and, for this reason, it makes sense to consider, in addition to the visual and text descriptors also used in Div400, another feature set – a user credibility feature. Each of the 153 topics (30 in the development set and 123 in the test set) comes therefore, in addition to the visual and text features of each image, with a value indicating the credibility of the user. This value is estimated automatically based on a set of features, so in addition to the retrieval development and test sets, DIV150Cred also contains a credibility set, used by us to generate the credibility of each user, and which can be used by any interested researcher to generate better credibility estimators.

The credibility set contains images for approximately 300 locations from 685 users (a total of 3.6 million images). For each user there is a manually assigned credibility score as well as an automatically estimated one, based on the following features:

Visual score – learned predictor of a user’s consistent and relevant tagging behavior
Face proportion
Tag specificity
Location similarity
Photo count
Unique tags
Upload frequency
Bulk proportion

For each of these, the intuition behind it and the actual calculation is detailed in the collection report [5].

Div150Multi

Div150Multi adds another twist to the task of the search engine and its tourism use-case. Now, the topics are not simply points of interest, but rather a combination of a main concept and a qualifier, namely multi-topic queries about location specific events, location aspects or general activities (e.g., “Oktoberfest in Munich”, “Bucharest in winter”). In terms of features however, the collection builds on the existing ones used in Div400 and Div150Cred, but adds to the pool of resources the researchers have at their disposal. In terms of credibility, in addition to the 8 features listed above, we now also have:

Mean Photo Views
Mean Title Word Counts
Mean Tags per Photo
Mean Image Tag Clarity

Again, for details on the intuition and formulas behind these, the collection report [6] is the reference material.

A new set of descriptors has been now made available, based on convolutional neural networks.

CNN generic: a descriptor based on the reference convolutional (CNN) neural network model provided along with the Caffe framework [7]. This model is trained with the 1,000 ImageNet classes used during the ImageNet challenge. The descriptors are extracted from the last fully connected layer of the network (named fc7).
CNN adapted: These features were also computed using the Caffe framework, with the reference model architecture but using images of 1,000 landmarks instead of ImageNet classes. We collected approximately 1,200 Web images for each landmark and fed them directly to Caffe for training [8]. Similar to CNN generic, the descriptors were extracted from the last fully connected layer of the network (i.e., fc7).

Div150AdHoc

For this dataset, the definition of relevance was expanded from previous years, with the introduction of even more challenging multi-topic queries unrelated to POIs. These queries address the diversification problem for a general ad-hoc image retrieval system, where general-purpose multi-topic queries are used for retrieving the images (e.g., “animals at Zoo”, “flying planes on blue sky”, “hotel corridor”). The Div150Adhoc collection includes most of the previously described credibility descriptors, but drops faceProportion and location-Similarity, as they were no longer relevant for the new retrieval scenario. Also, the visualScore descriptor was updated in order to keep up with the latest advancements on CNN descriptors. Consequently, when training individual visual models, the Overfeat visual descriptor is replaced by the representation produced by the last fully connected layer of the network [9]. Full details are available in the collection report [10].

Ground-truth and state-of-the-art

Each of the above collections comes with an associated ground-truth, created by human assessors. As the focus is on both relevance and diversity, the ground truth and the metrics used reflect it: Precision at cutoff (primarily P@20) is used for relevance, and Cluster Recall at cutoff (primarily CR@20) is used for diversity.

Figure 2 shows an overview of the results obtained by participants in the evaluation campaigns over the period 2013-2016, and serves as a baseline for future experiments on these collections. Results presented here are on the test set alone. The reader may find more information about the methods in the MediaEval proceedings, which are listed on the Retrieving Diverse Social Images yearly task pages on the MediaEval website (http://multimediaeval.org/).

Figure 2. Evolution of the diversification performance (boxplots — the interquartile range (IQR), i.e. where the 50% of the values are; the line within the box = median; the tails = 1.5*IQR; the points outside (+) = outliers) for the different datasets in terms of precision (P) and cluster recall (CR) at different cut-off values. Flickr baseline represents the initial Flickr retrieval result for the corresponding dataset.

Conclusions

The Retrieving Diverse Social Image task datasets, as their name indicates, address the problem of retrieving images taking into account both the need to diversify the results presented to the user, as well as the potential lack of credibility of the users in their tagging behavior. They are based on already state-of-the-art retrieval technology (i.e., the Flickr retrieval system), which makes it possible to focus on the challenge of image diversification. Moreover, the data sets are not limited to images, but rather also include rich social information. The credibility component, represented by the credibility subsets of the last three collections, is unique to this set of benchmark datasets.

Acknowledgments

The Retrieving Diverse Social Image task datasets were made possible by the effort of a large team of people over an extended period of time. The contributions of the authors were essential. Further, we would like to acknowledge the multiple team members who have contributed to annotating the images and making the MediaEval Task possible. Please see the yearly Retrieving Diverse Social Images task pages on the MediaEval website (http://multimediaeval.org/).

Contact

Should you have any inquires or questions about the datasets, don’t hesitate to contact us via email at: bionescu at imag dot pub dot ro.

References

[1] http://contentmarketinginstitute.com/2015/11/visual-content-strategy/ (last visited 2017-11-29).

[2] http://www.multimediaeval.org/

[3] http://www.campus.pub.ro/lab7/bionescu/publications.html#datasets

[4] http://campus.pub.ro/lab7/bionescu/Div400.html

[5] http://campus.pub.ro/lab7/bionescu/Div150Cred.html

[6] http://campus.pub.ro/lab7/bionescu/Div150Multi.html

[7] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding” in ACM International Conference on Multimedia, 2014, pp. 675–678.

[8] E. Spyromitros-Xioufis, S. Papadopoulos, A. L. Ginsca, A. Popescu, Y. Kompatsiaris, and I. Vlahavas, “Improving diversity in image search via supervised relevance scoring” in ACM International Conference on Multimedia Retrieval, 2015, pp. 323–330.

[9] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, “Return of the devil in the details: Delving deep into convolutional nets” arXiv preprint arXiv:1405.3531, 2014.

[10] http://campus.pub.ro/lab7/bionescu/Div150Adhoc.html

How Do Ideas Flow around SIGMM Conferences?

By Mario Montagud | December 8, 2017 - 21:29 |December 23, 2017 0417, Feature

Leave a comment

The ACM Multimedia conference just celebrated its quarter century in October 2017. This is a great opportunity to reflect on the intellectual influence of the conference, and the SIGMM community in general.

The progress on big scholarly data allows us to make this task analytical. I download a data dump from Microsoft Academic Graph (MAG) in February 2016. I find all papers from ACM Multimedia (MM), the SIGMM flagship conference — there are 4,346 publication entries from 1993 to 2015. I then search the entire MAG for: (1) any paper that appears in the reference list of these MM papers – 35,829 entries across 1,560 publication venues (including both journals and conferences), with an average of 8.24 per paper; (2) any paper that cites any of these MM papers – 46826 citations from 1694 publication venues, with an average of 10.77 citations per paper.

This data allows us to profile the incoming (references) and outgoing (citations) influence in the community in detail. In this article, we highlight two questions below.

Where are the intellectual influences of the SIGMM community coming from, and going to?

If you have been publishing in, and going to SIGMM conference(s) for a while, you may wonder where the ideas presented today would have its influence after 5, 10, 20 years? You may also wonder if the ideas cross over to other fields and disciplines, and which stay and flourish within the SIGMM community. You may also wonder whether the influence flow has changed since you entered the community, 3, 5, 10, or 20+ years ago.

If you are new to SIGMM, you may wonder what this community’s intellectual heritage is. For new students or researchers who recently entered this area, you may wonder what other publication venues are you likely to find work relevant to multimedia.

Figure 1. The citation flow for ACM Multimedia (1993-2015). Summary of incoming vs outgoing citations to the top 25 venues in either direction. Node colors: ratio of citations (outgoing ideas, red) vs references (incoming ideas, blue). Node sizes: amount of total citation+references in either direction. Thickness of blue edges are scaled by the number of references going to a given venue; thickness of red edges are scaled by the number of citations coming from a given venue. Nodes are sorted left-to-right by the ratio of incoming vs outgoing citations to this conference.

A summary of this information is found in the “citation flower” graph above, summarising the incoming and outgoing influence since the inception of ACM MM (1993-2015).

On the right of the “citation flower” we can see venues that have had more influence in MM than otherwise, these include computer vision and pattern recognition (CVPR, ICCV, ECCV, T-PAMI, IJCV), machine learning (NIPS, JMLR, ICML), networking and systems (INFOCOM), information retrieval (SIGIR), human-computer interaction (CHI) as well as related journals (IEEE Multimedia). The diversity of incoming influence is part of the SIGMM identity, as the community has always been a place where ideas from disparate areas meet and generate interesting solutions to problems as well as generating new challenges. As indicated by the break down over time (on a separate page), the incoming influence of CVPR is increasing, and that of IEEE Trans. Circuits Systems on Video Technology is decreasing — this is consistent with video encoding technology maturing over the last two decades, and computer vision being fast-evolving currently.

On the left of the “citation flower”, we can see that ACM MM has been a major influencer for a variety of multimedia venues — from conferences (ICME, MIR, ICMR, CIVR) to journals (Multimedia Tools and Applications, IEEE Trans. Multimedia), to journals in related areas (IEEE Trans. On Knowledge Discovery and Engineering).

How many papers are remembered in the collective memory of the academic community and for how long?

Or, as a heated post-conference beer conversation may put it: are 80% of the papers forgotten in 2 years? Spoiler alert: no, for most conferences we looked at; but about 20% tend not be cited at all.

Figure 2. Fraction of ACM MM papers that are cited at least once more than X years after they are published, with a linear regression overlay.

In Figure 2, we see a typical linear decline of the fraction of papers being cited. For example, 53% of papers have at least one citation after being published for 10 years. There are multiple factors that affect the shape of this citation survival graph, such as the size of this research community, the turnover rate of ideas (fast-moving or slow-moving), the perceived quality of publications, and others. See here for a number of different survival curves in different research communities.

What about the newer, and more specialised SIGMM conferences?

Figure 3 and Figure 4 show the citation flowers for ICMR and MMSys, both conferences have had five years of publication data in MAG. We can see that both conferences are well-embedded among the SIGMM and related venues (ACM Multimedia, IEEE Trans. Multimedia), both have strong influence from the computer vision community including T-PAMI, CVPR and ICCV. The sub-community specific influences are coming from WWW, ICML NIPS for ICMR; and INFOCOM, SIGMETRICS, SIGMAR for MMSys. In terms of out-going influence, MMSys influences venues in networking (ICC, CoNEXT), and ICMR influences Information Science and MMSys.

Figure 3. The citation flow for ICMR (2011-2015). See Figure 1 caption for the meaning of node/edge colors and sizes.

Figure 4. The citation flow for MMSys (2011-2015). See Figure 1 caption for the meaning of node/edge colors and sizes.

Overall, this case study shows the truly multi-disciplinary nature of SIGMM, the community should continue the tradition of fusing ideas and strive to increase its influence in other communities.

I hope you find these analyzes and observations somewhat useful, and I would love to hear comments and suggestions from the community. Of course, the data is not perfect, and there is a lot more to do. The project overview page [1] contains details about data processing and several known issues, software for this analysis and visualisation are also released publicly [2].

Acknowledgements

I thank Alan Smeaton and Pablo Cesar for encouraging this post and many helpful editing suggestions. I also thank Microsoft Academic for making data available.

References

[1] Visualizing Citation Patterns of Computer Science Conferences, Lexing Xie, Aug 2016, http://cm.cecs.anu.edu.au/post/citation_vis/

[2] Repository for analyzing citation flow https://github.com/lexingxie/academic-graph

Practical Guide to Using the YFCC100M and MMCOMMONS on a Budget

By Mario Montagud | October 9, 2017 - 16:55 |December 26, 2017 0317, Feature

Leave a comment

The Yahoo-Flickr Creative Commons 100 Million (YFCC100M), the largest freely usable multimedia dataset to have been released so far, is widely used by students, researchers and engineers on topics in multimedia that range from computer vision to machine learning. However, its sheer volume, one of the traits that make the dataset unique and valuable, can pose a barrier to those who do not have access to powerful computing resources. In this article, we introduce useful information and tools to boost the usability and accessibility of the YFCC100M, including the supplemental material provided by the Multimedia Commons (MMCOMMONS) community. In particular, we provide a practical guide on how to set up a feasible and cost effective research and development environment locally or in the cloud that can access the data without having to download it first.

YFCC100M: The Largest Multimodal Public Multimedia Dataset

Datasets are unarguably one of the most important components of multimedia research. In recent years there was a growing demand for a dataset that was not specifically biased or targeted towards certain topics, sufficiently large, truly multimodal, and freely usable without licensing issues.

The YFCC100M dataset was created to meet these needs and overcome many of the issues affecting existing multimedia datasets. It is, so far, the largest publicly and freely available multimedia collection of metadata representing about 99.2 million photos and 0.8 million videos, all of which were uploaded to Flickr between 2004 and 2014. Metadata included in the dataset are, for example, title, description, tags, geo-tag, uploader information, capture device information, URL to the original item. Additional information was later released in the form of expansion packs to supplement the dataset, namely autotags (presence of visual concepts, such as people, animals, objects, events, architecture, and scenery), Exif metadata, and human-readable place labels. All items in the dataset were published under one of the Creative Commons commercial or noncommercial licenses, whereby approximately 31.8% of the dataset is marked for commercial use and 17.3% has the most liberal license that only requires attribution to the photographer. For academic purposes, the entire dataset can be used freely, which enables fair comparisons and reproducibility of published research works.

Two articles from the people who created the dataset, YFCC100M: The New Data in Multimedia Research and Ins and Outs of the YFCC100M give more detail about the the motivation, collection process, and interesting characteristics and statistics about the dataset. Since its initial release in 2014, the YFCC100M quickly gained popularity and is widely used in the research community. As of September 2017, the dataset had been requested over 1400 times and cited over 300 times in research publications with topics ranging in multimedia from computer vision to machine learning. Specific topics include, but are not limited to, image and video search, tag prediction, captioning, learning word embeddings, travel routing, event detection, and geolocation prediction. Demos that use the YFCC100M can be found here.

Figure 1. Overview diagram of YFCC100M and Multimedia Commons.

MMCOMMONS: Making YFCC100M More Useful and Accessible

Out of the many things that the YFCC100M offers, its sheer volume is what makes it especially valuable, but it is also what makes the dataset not so trivial to work with. The metadata alone spans 100 million lines of text and is 45GB in size, not including the expansion packs. To work with the images and/or videos of YFCC100M, they need to be downloaded first using the individual URLs contained in the metadata. Aside from the time required to download all 100 million items, which would further occupy 18TB of disk space, the main problem is that a growing number of images and videos is becoming unavailable due to the natural lifecycle of digital items, where people occasionally delete what they have shared online. In addition, the time alone to process and analyze images and videos is generally infeasible for students and scientists in small research groups who do not have access to high performance computing resources.

These issues were noted upon the creation of the dataset and the MMCOMMONS community was formed to coordinate efforts for making the YFCC100M more useful and accessible to all, and to persist the contents of the dataset over time. To that end, MMCOMMONS provides an online repository that holds supplemental material to the dataset, which can be mounted and used to directly process the dataset in the cloud. The images and videos included in the YFCC100M can be accessed and even downloaded freely from an AWS S3 bucket, which was made possible courtesy of the Amazon Public Dataset program. Note that a tiny percentage of images and videos are missing from the bucket, as they already had disappeared when organizers started the download process right after the YFCC100M was published. This notwithstanding, the images and videos hosted in the bucket still serve as a useful snapshot that researchers can use to ensure proper reproduction of and comparison with their work. Also included in the Multimedia Commons repository are visual and aural features extracted from the image and video content. The MMCOMMONS website provides a detailed description of conventional features and deep features, which include HybridNet, VGG and VLAD. These CNN features can be a good starting point for those who would like to jump right into using the dataset for their research or application.

The Multimedia Commons has been supporting multimedia researchers by generating annotations (see the YLI Media Event Detection and MediaEval Placing tasks), developing tools, as well as organizing competitions and workshops for ideas exchange and collaboration.

Setting up a Research Environment for YFCC100M and MMCOMMONS

Even with pre-extracted features available, to do meaningful research one still needs a lot of computing power to process the large amount of YFCC100M and MMCOMMONS data. We would like to lower the barrier of entry for students and scientists who don’t have access to dedicated high-performance resources. In the following we describe how one can easily set up a research environment for handling the large collection. We introduce how Apache MXNet, Amazon EC2 Spot Instance and AWS S3 can be used to create a research development environment that can handle the data in a cost-efficient way, as well as other ways to use it more efficiently.

1) Use a subset of dataset

It is not necessary to work with the entire dataset just because you can. Depending on the use case, it may make more sense to use a well-chosen subset. For instance, the YLI-GEO and YLI-MED subsets released by the MMCOMMONS can be useful for geolocation and multimedia event detection tasks, respectively. For other needs, the data can be filtered to generate a customized subset.

The YFCC100M Dataset Browser is a web-based tool you can use to search the dataset by keyword. It provides an interactive visualization with statistics that helps to better understand the search results. You can generate a list file (.csv) of the items that match the search query, which you can then use to fetch the images and/or videos afterwards. The limitations of this browser are that it only supports keyword search on the tags and that it only accepts ASCII text as valid input, as opposed to UNICODE for queries using non-Roman characters. Also, queries can take up to a few seconds to return results.

A more flexible way to search the collection with lower latency is to set up your own Apache Solr server and indexing (a subset of) the metadata. For instance, the autotags metadata can be indexed to search for images that have visual concepts of interest. A step-by-step guide to setting up a Solr server environment with the dataset can be found here. You can write Solr queries in most programming languages by using one of the Solr wrappers.

2) Work directly with data from AWS S3

Apache MXNet, a deep learning framework you can run locally on your workstation, allows training with S3 data. Most training and inference modules in MXNet accept data iterators that can read data from and write data to a local drive as well as AWS S3.

The MMCOMMONS provides a data iterator for YFCC100M images, stored as a RecordIO file, so you can process the images in the cloud without ever having to download them to your computer. If you are working with a subset that is sufficiently large, you can further filter it to generate a custom RecordIO file that suits your needs. Since the images stored in the RecordIO file are already resized and saved compactly, generating a RecordIO from an existing RecordIO file by filtering on-the-fly is more time and space efficient than downloading all images first and creating a RecordIO file from scratch. However, if you are using a subset that is relatively small, it is recommended to download just those images you need from S3 and then create a RecordIO file locally, as that will considerably speed up processing the data.

While one would generally set up Apache MXNet to run locally, you should note that the I/O latency of using S3 data can be greatly reduced if you would set it up to run on an Amazon EC2 instance in the same region as where the S3 data is stored (namely, us-west-2, Oregon), see Figure 2. Instructions for setting up a deep learning environment on Amazon EC2 can be found here.

Figure 2. The diagram shows a cost-efficient setup with a Spot Instance in the same region (us-west-2) as the S3 buckets that houses YFCC100M and MMCOMMONS images/videos and RecordIO files. Data in the S3 buckets can be accessed in a same way from researcher’s computer; the only downside with this is the longer latency for retrieving data from S3. Note that there are several Yahoo! Webscope buckets (I3set1-I3setN) that hold a copy of the YFCC100M, but you only can access it using the path you were assigned after requesting the dataset.

3) Save cost by using Amazon EC2 Spot Instances

Cloud computing has become considerably cheaper in recent years. However, the price for using a GPU instance to process the YFCC100M and MMCOMMONS can still be quite expensive. For instance, Amazon EC2’s on-demand p2.xlarge instance (with a NVIDIA TESLA K80 GPU and 12GB RAM) costs 0.9 USD per hour in the us-west-2 region. This would cost approximately $650 (€540) a month if used full-time.

One way to reduce the cost is to set up a persistent Spot Instance environment. If you request an EC2 Spot Instance, you can use the instance as long as its market price is below your maximum bidding price. If the market price goes beyond your maximum bid, the instance gets terminated after a two minutes warning. To deal with such frequent interruptions it is important to store your intermediate results often to persistent storage space, such as AWS S3 or AWS EFS. The market price of the EC2 instance fluctuates, see Figure 3, so there is no guarantee as to how much you can save or how long you have to wait for your final results to be ready. But if you are willing to experiment with pricing, in our case we were able to reduce the costs by 75% during the period January-April 2017.

Figure 3. You can check the current and past market price of different EC2 instance types from the Spot Instance Pricing History panel.

4) Apply for academic AWS credits

Consider applying for the AWS Cloud Credits for Research Program to receive AWS credits to run your research in the cloud. In fact, thanks to the grant we were able to release LocationNet, a pre-trained geolocation model that used all geotagged YFCC100M images.

Conclusion

YFCC100M is at the moment the largest multimedia dataset released to the public, but its sheer volume poses a high barrier to actually use it. To boost the usability and accessibility of the dataset, the MMCOMMONS community provides an additional AWS S3 repository with tools, features, and annotations to facilitate creating a feasible research development environment for those with fewer resources at their disposal. In this column, we provided a guide on how a subset of the dataset can be created for specific scenarios, how the hosted YFCC100M and MMCOMMONS data on S3 can be used directly for training a model with Apache MXNet, and finally how Spot Instances and academic AWS credits can make running experiments cheaper or even free.

Join the Multimedia Commons Community

Please let us know if you’re interested in contributing to the MMCOMMONS. This is a collaborative effort among research groups at several institutions (see below). We welcome contributions of annotations, features, and tools around the YFCC100M dataset, and may potentially be able to host them on AWS. What are you working on?

Contact: multimedia-commons@icsi.berkeley.edu.
Website: www.multimediacommons.org

See the this page for information about how to help out.

Acknowledgements:

This dataset would not have been possible without the effort of many people, especially those at Yahoo, Lawrence Livermore National Laboratory, International Computer Science Institute, Amazon, ISTI-CNR, and ITI-CERTH.

Opinion Column: Tracks, Reviews and Preliminary Works

By Mario Montagud | October 4, 2017 - 14:06 |November 29, 2019 0317, Feature, Opinion: Opinion Column

Leave a comment

In a nutshell, the community agreed that: we need more transparent communication and homogeneous rules across thematic areas; we need more useful rebuttals; there is no need for conflict of interest tracks; large conferences must protect preliminary and emergent research works. Solutions were suggested to improve these points.

Welcome to the first edition of the SIGMM Community Discussion Column!

As promised in our introductory edition, this column will report highlights and lowlights of online discussion threads among the members of the Multimedia community (see our Facebook MM Community Discussion group).

After an initial poll, this quarter the community chose to discuss about the reviewing process and structure of the SIGMM-sponsored conferences. We organized the discussion around 3 main sub-topics: importance of tracks, structure of reviewing process, and value of preliminary works. We collected more than 50 contributions from the members of the Facebook MM Community Discussion group. Therefore, the following synthesis represents only these contributions. We encourage everyone to participate in the upcoming discussions, so that this column becomes more and more representative of the entire community.

Communication, Coordination and Transparency. All participants agreed that more vertical (from chairs to authors) and horizontal (in between area chairs or technical program chairs) communication could improve the quality of both papers and reviews in SIGMM-sponsored conferences. For example, lack of transparency and communication regarding procedures might deal to uneven rules and deadlines across tracks.

Tracks. How should conference thematic areas be coordinated? The community’s view can be summarized into 3 main perspectives:

Rule Homogeneity. The majority of participants agreed that big conferences should have thematic areas, and that tracks should be jointly coordinated by a technical program committee. Tracks are extremely important, but in order for the conference to give an individual, unified message, as opposed to “multi-conferences”, the same review and selection process should apply to all tracks. Moreover, hosting a face to face global TPC meetings is key for a solid, homogeneous conference program.
Non-uniform Selection Process to Help Emerging Areas. A substantial number of participants pointed out that one role of the track system is to help emerging subcommunities: thematic areas ensure a balanced programme with representation from less explored topics (for example, music retrieval or arts and multimedia). Under this perspective, while the reviewing process should be the same for all tracks, the selection phase could be non-uniform. “Mathematically applying a percentage rate per area” does not help selecting the actually high-quality papers across tracks: with a uniformly applied low acceptance rate rule, minor tracks might have one or two papers accepted only, despite the high quality of the submissions.
Abolish Tracks. A minority of participants agreed that, similar to big conferences such as CVPR, tracks should be completely abolished. A rigid track-based structure makes it somehow difficult for authors to choose the right track where to submit; moreover, reviewers and area chairs are often experts in more than one area. These issues could be addressed by a flexible structure where papers are assigned to area chairs and reviewers based on the topic.

Reviewing process How do we want the reviewing process to be? Here is the view of the community on four main points: rebuttal, reviewing instructions, conflict of interest, and reviewers assignment.

Rebuttal: important, but we need to increase impact. The majority of participants agreed that rebuttal is helpful to increase review quality and to grant authors more room for discussion. However, it was pointed out that sometimes the rebuttal process is slightly overlooked by both reviewers and area chairs, thus decreasing the potential impact of the rebuttal phase. It was suggested that, in order to raise awareness on rebuttal’s value, SIGMM could publish statistics on the number of reviewers who changed their opinion after rebuttal. Moreover, proposed improvements on the rebuttal process included: (1) more time allocated for reviewers to have a discussion regarding the quality of the papers; (2) a post-rebuttal feedback where reviewers respond to authors’ rebuttal (to promote reviewers-authors discussion and increase awareness on both sides) and (3) a closer supervision of the area chairs.
Reviewing Guidelines: complex, but they might help preliminary works. Do reviewing guidelines help reviewers writing better reviews? For most participants, giving instructions to reviewers appear to be somehow impractical, as reviewers do not necessarily read or follow the guidelines. A more feasible solution is to insert weak instructions through specific questions in the reviewing form (e.g. “could you rate the novelty of the paper?”). However, it was also pointed out that written rules could help area chairs justify a rejection of a bad review. Also, although reviewing instructions might change from track to track, general written rules regarding “what is a good paper” could help the reviewers understand what to accept. For example, clarification is needed on the depth of acceptable research works, and on how preliminary works should be evaluated, given the absence of a short paper track.
Brave New Idea Track: ensuring scientific advancement. Few participants expressed their opinion regarding this track hosting novel, controversial research ideas. They remarked the importance of such a track to ensure scientific advancement, and it was suggested that, in the future, this track could host exploratory works (former short papers), as preliminary research works are crucial to make a conference exciting.
Conflict of Interest (COI) Track: perhaps we should abolish it. Participants almost unanimously agreed that a COI track is needed only when the conference management system is not able to handle conflicts on its own. It was suggested that, if that is not the case, a COI track might actually have a antithetical effect (is the COI track acceptance rate for ACM MM higher this year?).
Choosing Reviewers: A Semi-Automated Process. The aim of the reviewers assignment procedure is to give the right papers to the right reviewers. How to make this procedure successful? Some participants supported the “fully manual assignment” option, where area chairs directly nominate reviewers for their own track. Others proposed to have a “fully automatic assignment”, based on an automated matching system such as the Toronto Paper Matching System (TPMS). A discussion followed, and eventually most participants agreed on a semi-automated process, having first the TPMS surfacing a relevant pool of reviewers (independent of tracks) and then area chairs manually intervening. Manual inspection of area chairs is crucial for inter-disciplinary papers needing reviews from experts from different areas.

Finally, during the discussion, few observations and questions regarding the future of the community arouse. For example: how to steer the direction of the conference, given the increase in number of AI-related papers? How to support diversity of topics, and encourage papers in novel fields (e.g. arts and music) beyond the legacy (traditional multimedia topics)? Given the wide interest on such issues, we will include these discussion topics in our next pre-discussion poll. To participate in the next discussion, please visit and subscribe to the Facebook MM Community Discussion group, and raise your voice!

Xavier Alameda-Pineda and Miriam Redi.

Impact of the New @sigmm Records

By Mario Montagud | September 20, 2017 - 06:15 |December 26, 2017 0317, Editorial, EiC column

Leave a comment

The SIGMM Records have renewed, with the ambition of continue being a useful resource for the multimedia community. The intention is to provide a forum for (open) discussion and to become a primary source of information (and of inspiration!).

The new team (http://sigmm.hosting.acm.org/impressum/) has committed to lead the Records in the coming years, gathering relevant contributions in the following main clusters:

Open science: with columns on open source software, datasets and benchmarks, and standards.
Opinion: with columns on multi-disciplinary topics, interviews to outstanding members of our community, and viewpoints of the community.
Information: current affairs including call for papers, summaries of PhD theses, open positions, and reports from our social media leaders.

The team has also revitalized the presence of SIGMM on Social Media. SIGMM accounts on Facebook and Twitter have been created for disseminating relevant news, events and contributions for the SIGMM community. Moreover, a new award has been approved: the Best Social Media Reporters from each SIGMM conference will get a free registration to one of the SIGMM conferences within a period of one year. The award criteria are specified at http://sigmm.hosting.acm.org/2017/05/20/awarding-the-best-social-media-reporters/

The following paragraphs detail the impact of all these new activities in terms of increased number of visitors and visits to the Records website (Figure 1), and broaden reach. All the statistics presented below started to be collected since the publication of the June issue (July 29th 2017).

Figure 1. Number of visitors and visits since the publication of the June issue

Visitors and Visits to the Records website

The daily number of visitors ranges approximately between 100 and 400. It has been noticed that this variation is strongly influenced by the publication of Social Media posts promoting contents published on the website. In the first month (since July 29^th, one day after the publication of the issue), more than 13000 visitors were registered, and more than 20000 visitors have been registered until now (see Table 1 for detailed statistics). The number of visits to the different posts and pages of the website accumulates up to more than 100000. The top 5 countries with highest number of visitors are also listed in Table 2. Likewise, the top 3 posts with highest impact, in terms of number of visits and of Social Media shares (via the Social Media icons recently added in the posts and pages of the website) are listed in Table 3. As an example, the daily number of visits to the main page of the June issue is provided in Figure 2, with a total number of 224 visits since its publication.

Finally, the top 3 referring sites (i.e., external websites from which visitors have clicked an URL to access the Records website) are Facebook (>700 references), Google (>300 references) and Twitter (>100 references). So, it seems that Social Media is helping to increase the impact of the Records. More than 30 users have accessed the Records website through the SIGMM website (sigmm.org) as well.

Table 1. Number of visitors and visits to the SIGMM Records website

Period	Visitors
Day	~100-400
Week	~2000-3000
Month	~8000-13000
Total (Since July 29th)	20012 (102855 visits)

Table 2. Top 5 countries in terms of number of visitors

Rank	Country	Visitors
1	China	3339
2	United States	2634
3	India	1368
4	Germany	972
5	Brazil	731

Table 3. Top 3 posts on the Records website with highest impact

Post	Date	Visits	Shares
Interview to Prof. Ramesh Jain	29/08/2017	619	103
Interview to Suranga Nanayakkara	13/09/2017	376	15
Standards Column: JPEG and MPEG	28/7/2017	273	44

Figure 1. Visits to the main page of the June issue since its publication (199 visits)

Figure 2. Visits to the main page of the June issue since its publication (199 visits)

Impact of the Social Media channels

The use of Social Media includes a Facebook page and a Twitter account (@sigmm). The number of followers is still not high (27 followers in Facebook, 88 followers in Twitter), which is natural with recently created channels. However, the impact of the posts on these platforms, in terms of reach, likes and shares is noteworthy. Tables 4 and 5 lists the top 3 Facebook posts and tweets, respectively, with highest impact up to now.

Table 4. Top 3 Facebook posts with highest impact

Post	Date	Reach (users)	Likes	Shares
>10K visitors in 3 weeks	21/08/2017	1347	7	4
Interview to Suranga Nanayakkara	13/09/2017	1297	89	3
Interview to Prof. Ramesh Jain	30/08/2017	645	28	4

Table 5. Top 3 tweets with highest impact

Post	Date	Likes	Retweets
Announcing the publication of the June issue	28/07/2017	7	9
Announcing the availability of the official @sigmm account	8/09/2017	8	9
Social Media Reporter Award: Report from ICMR 2017	11/09/2017	5	8

Awarded Social Media Reporters

The Social Media co-chairs, with the approval of the SIGMM Executive Committee, have already started the processes of selecting the Best Social Media Reporters from the latest SIGMM conferences. In particular, the winners have been Miriam Redi from ICMR 2017 (her post-summary of the conference is available at: http://sigmm.hosting.acm.org/2017/09/02/report-from-icmr-2017/) and Christian Timmerer for MMSYS 2017 (his post-summary of the conference is available at: http://sigmm.hosting.acm.org/2017/10/02/report-from-acm-mmsys-2017/). Congratulations!

The Editorial Team would like to take this opportunity to thank all the SIGMM members who use Social Media channels to share relevant news and information from the SIGMM community. We are convinced it is a very important service for the community.

We will keep pushing to improve the Records and extend their impact!

The Editorial Team.