First Combined ACM SIGMM Strategic Workshop and Summer School in Stellenbosch, South Africa

The first combined ACM SIGMM Strategic Workshop and Summer School will be held in Stellenbosch, South Africa, at the beginning of July 2020.

Rooiplein

First ACM Multimedia Strategic Workshop

The first Multimedia Strategic Workshop follows the successful series of workshops in areas such as information retrieval. The field of multimedia has continued to evolve and develop: collections of images, sounds and videos have become larger, computers have become more powerful, broadband and mobile Internet are widely supported, complex interactive searches can be done on personal computers or mobile devices, and so on. In addition, as large business enterprises find new ways to leverage the data they collect from users, the gap between the types of research conducted in industry and academia has widened, creating tensions over “repeatability” and “public data” in publications. These changes in environment and attitude mean that the time has come for the field to reassess its assumptions, goals, objectives and methodologies. The goal is to bring together researchers in the field to discuss long-term challenges and opportunities within the field.

The participants of the Multimedia Strategic Workshop will be active researchers in the field of Multimedia. The strategic workshop will give these researchers the opportunity to explore long-term issues in the multimedia field, to recognise the challenges on the horizon, to reach consensus on key issues and to describe them in a resulting report that will be made available to the multimedia research community. The report will stimulate debate, provide research directions to both researchers and graduate students, and also provide funding agencies with data that can be used to coordinate the support for research.

The workshop will be held at the Wallenberg Research Centre at the Stellenbosch Institute for Advanced Study (STIAS). STIAS provides venues and state-of-the-art equipment for up to 300 conference guests at a time, as well as breakaway rooms.

The First ACM Multimedia Summer School on Multimedia

The motivation for the proposed summer school is to build on the success of the Deep Learning Indaba, but to focus on the application of machine learning to the field of Multimedia. We want delegates to be exposed to current research challenges in Multimedia. A secondary goal is to establish and grow the community of African researchers in the field of Multimedia, and to stimulate scientific research and collaboration between African researchers and the international community. The exact topics covered during the summer school will be decided later together with the instructors, but will reflect the current research trends in Multimedia.

The Strategic Workshop will be followed by the Summer School on Multimedia. Having the first summer school co-located with the Strategic Workshop will help to recruit the best possible instructors for the summer school. 

The Summer School on Multimedia will be held at the Faculty of Engineering at Stellenbosch University, which is one of South Africa’s major producers of top-quality engineers. The faculty was established in 1944 and is housed in a large complex of buildings with modern facilities, including lecture halls and electronic classrooms.

Stellenbosch is a university town in South Africa’s Western Cape province. It’s surrounded by the vineyards of the Cape Winelands and the mountainous nature reserves of Jonkershoek and Simonsberg. The town’s oak-shaded streets are lined with cafes, boutiques and art galleries. Cape Dutch architecture gives a sense of South Africa’s Dutch colonial history, as do the Village Museum’s period houses and gardens.

For more information about both events, please refer to the events’ web site (africanmultimedia.acm.org) or contact the organizers:

MPEG Column: 125th MPEG Meeting in Marrakesh, Morocco

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:

  • Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
  • 3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
  • MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
  • ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
  • MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description

The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as media processing increasingly requires offloading complex operations to the cloud/network in order to keep receiver hardware simple and power consumption low. Therefore, the NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (plus support for 5G core and edge computing) and (ii) a workflow manager to enable composition of multiple media processing tasks (i.e., processing incoming media and metadata from a media source and producing processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard has now reached Committee Draft (CD) stage and the final milestone is targeted for early 2020.
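To make the workflow-manager idea a little more tangible, below is a minimal sketch of how a provider-side workflow description and a toy manager composing tasks from source to sink might look. All field names, URLs and the run_workflow helper are hypothetical illustrations only and do not follow the normative NBMP Workflow Description Document schema.

```python
# Illustrative sketch only: field names and structure are hypothetical and do
# not follow the normative NBMP Workflow Description Document schema.
workflow = {
    "source": {"protocol": "rtmp", "url": "rtmp://ingest.example.com/live/stream1"},
    "tasks": [
        {"name": "transcode", "params": {"codec": "hevc", "bitrates_kbps": [1500, 3000, 6000]}},
        {"name": "package", "params": {"format": "dash", "segment_duration_s": 4}},
    ],
    "sink": {"protocol": "https", "url": "https://cdn.example.com/live/stream1/"},
}

def run_workflow(wf: dict) -> str:
    """Toy 'workflow manager': chains the media processing tasks from the
    media source to the media sink and returns a textual summary."""
    chain = f"pull({wf['source']['url']})"
    for task in wf["tasks"]:
        chain = f"{task['name']}({chain}, {task['params']})"
    return f"push({chain}) -> {wf['sink']['url']}"

if __name__ == "__main__":
    print(run_workflow(workflow))
```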

In particular, a standard like NBMP might come in handy in the context of 5G in combination with mobile edge computing (MEC), which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context, conditions and needs, hopefully leading to better quality and user experience.

For further research aspects, please see one of my previous posts.

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on a new video coding standard that will have two profiles: (i) a royalty-free baseline profile and (ii) a main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there is a need to support AVC, HEVC, VP9, and AV1; soon VVC will become important as well. In other words, we already have a multi-codec environment to support, and one might argue that one more codec is probably not a big issue. The main benefit of EVC will be its royalty-free baseline profile, but with AV1 such a codec is already available, and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will also be interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).
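As a simple illustration of what such objective comparisons involve, the sketch below computes PSNR for a pair of frames using its standard definition (10·log10(MAX²/MSE)); SSIM and VMAF require dedicated implementations (e.g., Netflix’s open-source VMAF tool) and are not shown. The frame data here is synthetic and purely illustrative.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between a reference and a distorted frame."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)

# Synthetic 8-bit luma frames; in practice these would be decoded video frames.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(1080, 1920), dtype=np.uint8)
noise = rng.integers(-3, 4, size=ref.shape)
dist = np.clip(ref.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print(f"PSNR: {psnr(ref, dist):.2f} dB")
```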

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as the basis for many file formats (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such has received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files are available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.

JPEG Column: 82nd JPEG Meeting in Lisbon, Portugal

The 82nd JPEG meeting was held in Lisbon, Portugal. Highlights of the meeting are progress on JPEG XL, JPEG XS, HTJ2K, JPEG Pleno, JPEG Systems and JPEG reference software.

JPEG has been the most common representation format of digital images for more than 25 years. Other image representation formats have been standardised by the JPEG committee, such as JPEG 2000 or, more recently, JPEG XS. Furthermore, JPEG has been extended with new functionalities like HDR or alpha plane coding with the JPEG XT standard, and more recently with a reference software. Other solutions have also been proposed by different players, with limited success. The JPEG committee decided it is time to create a new work item, named JPEG XL, that aims to develop an image coding standard with increased quality and flexibility combined with better compression efficiency. The evaluation of the call for proposals responses has already confirmed the industry interest, and development of core experiments has now begun. Several functionalities will be considered, like support for lossless transcoding of images represented with the JPEG standard.

A 2nd workshop on media blockchain technologies was held in Lisbon, collocated with the JPEG meeting. Touradj Ebrahimi and Frederik Temmermans opened the workshop with presentations on relevant JPEG activities such as JPEG Privacy and Security. Thereafter, Zekeriya Erkin made a presentation on blockchain, distributed trust and privacy, and Carlos Serrão presented an overview of the ISO/TC 307 standardization work on blockchain and distributed ledger technologies. The workshop concluded with a panel discussion chaired by Fernando Pereira where the interoperability of blockchain and media technologies was discussed. A 3rd workshop is planned during the 83rd meeting to be held in Geneva, Switzerland on March 20th, 2019.

The 82nd JPEG meeting had the following highlights:

  • The new working item JPEG XL
  • JPEG Pleno
  • JPEG XS
  • HTJ2K
  • JPEG Systems – JUMBF & JPEG 360
  • JPEG reference software

 

The following summarizes various highlights during JPEG’s Lisbon meeting. As always, JPEG welcomes participation from industry and academia in all its standards activities.

JPEG XL

The JPEG Committee launched JPEG XL with the aim of developing a standard for image coding that offers substantially better compression efficiency when compared to existing image formats, along with features desirable for web distribution and efficient compression of high quality images. Subjective tests conducted by two independent research laboratories were presented at the 82nd meeting in Lisbon and indicate promising results that compare favorably with state of the art codecs.

Development software for the JPEG XL verification model is currently being implemented. A series of experiments has also been defined for improving this model; the experiments address new functionalities such as lossless coding and progressive decoding.

JPEG Pleno

The JPEG Committee has three activities in JPEG Pleno: Light Field, Point Cloud, and Holographic image coding.

At the Lisbon meeting, Part 2 of JPEG Pleno Light Field was refined and a Committee Draft (CD) text was prepared. A new round of core experiments targets improved subaperture image prediction quality and scalability functionality.

JPEG Pleno Holography will be hosting a workshop on March 19th, 2019 during the 83rd JPEG meeting in Geneva. The purpose of this workshop is to provide insights into the status of holographic applications such as holographic microscopy and tomography, displays and printing, and to assess their impact on the planned standardization specification. This workshop invites participation from both industry and academia experts. Information on the workshop can be found at https://jpeg.org/items/20190228_pleno_holography_workshop_geneva_announcement.html

JPEG XS

The JPEG Committee is pleased to announce a new milestone of the JPEG XS project, with the Profiles and Buffer Models (JPEG XS ISO/IEC 21122 Part 2) submitted to ISO for immediate publication as International Standard.

This project aims at standardization of a visually lossless low-latency and lightweight compression scheme that can be used as a mezzanine codec within any AV market. Among the targeted use cases are video transport over professional video links (SDI, IP, Ethernet), real-time video storage, memory buffers, omnidirectional video capture and rendering, and sensor compression (for example in cameras and in the automotive industry). The Core Coding System allows for visually lossless quality at moderate compression rates, scalable end-to-end latency ranging from less than a line to a few lines of the image, and low complexity real time implementations in ASIC, FPGA, CPU and GPU. The new part “Profiles and Buffer Models” defines different coding tools subsets addressing specific application fields and use cases. For more information, interested parties are invited to read the JPEG White paper on JPEG XS that has been recently published on the JPEG website (https://jpeg.org).

 HTJ2K

The JPEG Committee continues its work on ISO/IEC 15444-15 High-Throughput JPEG 2000 (HTJ2K) with the development of conformance codestreams and reference software, improving interoperability and reducing obstacles to implementation.

The HTJ2K block coding algorithm has demonstrated an average tenfold increase in encoding and decoding throughput compared to the block coding algorithm currently defined by JPEG 2000 Part 1. This increase in throughput results in an average coding efficiency loss of 10% or less in comparison to the most efficient modes of the block coding algorithm in JPEG 2000 Part 1, and enables mathematically lossless transcoding to-and-from JPEG 2000 Part 1 codestreams.

JPEG Systems – JUMBF & JPEG 360

At the 82nd JPEG meeting, the Committee DIS ballots were completed, comments reviewed, and the standard progressed towards FDIS text for upcoming ballots on “JPEG Universal Metadata Box Format (JUMBF)” as ISO/IEC 19566-5, and “JPEG 360” as ISO/IEC 19566-6. Investigations continued to generalize the framework to other applications relying on JPEG (ISO/IEC 10918 | ITU-T.81), and JPEG Pleno Light Field.

JPEG reference software

With the JPEG Reference Software reaching FDIS stage, the JPEG Committee reaches an important milestone by extending its specifications with a new part containing reference software. With its FDIS release, two implementations will become the official reference for the most successful standard of the JPEG Committee: the fast and widely deployed libjpeg-turbo code, along with a complete implementation of JPEG coming from the Committee itself that also covers coding modes previously known only to a few experts.

 

Final Quote

“One of the strengths of the JPEG Committee has been in its ability to identify important trends in imaging technologies and their impact on products and services. I am delighted to see that this effort still continues and the Committee remains attentive to the future.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG Committee nominally meets four times a year, in different world locations. The 82nd JPEG Meeting was held on 19-25 January 2019, in Lisbon, Portugal. The next 83rd JPEG Meeting will be held on 16-22 March 2019, in Geneva, Switzerland.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans (pr@jpeg.org) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on http://jpeg-news-list.jpeg.org.  

Future JPEG meetings are planned as follows:

  • No 83, Geneva, Switzerland, March 16 to 22, 2019
  • No 84, Brussels, Belgium, July 13 to 19, 2019

 

Solving Complex Issues through Immersive Narratives — Does QoE Play a Role?

Introduction

A transdisciplinary dialogue and innovative research, including technical and artistic research as well as the digital humanities, are necessary to solve complex issues. We need to support and produce creative practices, and engage in a critical reflection about the social and ethical dimensions of our current technology developments. At the core is an understanding that no single discipline, technology, or field can produce knowledge capable of addressing the complexities and crises of the contemporary world. Moreover, we see the arts and humanities as critical tools for understanding this hyper-complex, mediated, and fragmented global reality. As a use case, we will consider the complexity of extreme weather events, natural disasters and the failure of climate change mitigation and adaptation, which are the risks with the highest likelihood of occurrence and largest global impact (World Economic Forum, 2017). Through our project, World of Wild Waters (WoWW), we are using immersive narratives and gamification to create a simpler holistic understanding of the cause and effect of natural hazards by creating immersive user experiences based on real data, realistic scenarios and simulations. The objective is to increase societal preparedness for a multitude of stakeholders. Quality of Experience (QoE) modeling and assessment of immersive media experiences are at the heart of the expected impact of the narratives, where we would expect active participation, engagement and change to play a key role [1].

Here, we present our views of immersion and presence in light of Quality of Experience (QoE). We will discuss the technical and creative considerations needed for QoE modeling and assessment of immersive media experiences. Finally, we will provide some reflections on QoE being an important building block in immersive narratives in general, and especially towards considering Extended Realities (XR) as an instantiation of Digital storytelling.

But what is Immersion and an Immersive Media Experience?

Immersion and immersive media experiences are commonly used terms in industry and academia today to describe new digital media. However, there is a gap between the two worlds in how the term is defined, which can lead to confusion. This gap needs to be filled for XR to become a success and finally reach the masses, rather than simply vanish as it has done so many times before since the invention of VR in 1962 by Morton Heilig (The Sensorama, or «Experience Theatre»). Immersion, thus far, can be plainly put as submersion in a medium (representational, fictional or simulated). It refers to a sense of belief, or the suspension of disbelief, while describing the experience/event of being surrounded by an environment (artificial, mental, etc.). This view is contrasted by a data-oriented view often used by technophiles, who regard immersion as a technological feat that ensures multimodal sensory input to the user [2]. This is the objective description, which views immersion as a quantifiable property afforded or offered by the system (computer and head-mounted display (HMD), in this case).

Developing immersion along these lines risks favoring the typology of spatial immersion while alienating the rest (phenomenological, narrative, tactical, pleasure, etc.). This can be seen in recent VR applications that push high-fidelity, low-latency, and precision-tracking products aiming to simulate the exactitude of sensorial information (visual, auditory, haptic) available in the real world to make the experience as ‘real’ as possible – a sense of realness that is not necessarily immersive [3].

Another closely related phenomenon is that of presence, shortened from its original 1980s form, telepresence [3]. It is a core phenomenon for immersive technologies, describing an engagement via technology in which one feels present, as oneself, even though physically removed. This definition was later appropriated for simulated/virtual environments, where it was described as a “feeling of being transported” into the synthetic/artificial space of a simulated environment. It is for this reason that presence, a subjective sensation, is most often associated with spatial immersion. A renewed interest in presence research has invited fresh insights into conceptualizing presence.

Based on the technical or system approach towards immersion, we can refer to immersive media experiences through the definitions given in Figure 1.


Figure 1. Definitions of current immersive media experiences

Much of the media considered today still consists of audio and visual presentations, but now enriched by new functionality such as 360° view, 3D, and interactivity. The ultimate goal is to create immersive media experiences by digitally creating real-world presence using available media technology and optimizing the experience as perceived by the participant [4].

Immersive Narratives for Solving Complex issues

The optimized immersive experience can be used in various domains to help solve complex issues through narration or gamification. Through World of Wild Waters (WoWW) we aim to focus on immersive narration and gamification of natural hazards. The project focuses on the implications of immersive storytelling for disaster management by depicting extreme weather events and natural disasters. Immersive media experiences can present XR solutions for natural hazards by simulating real-time data and providing people with a hands-on experience of how it feels to face an unexpected disaster. Immersive narratives can be used to allow people to be better prepared by experiencing the effects of different emergency scenarios while in a safe environment. However, QoE modeling and assessment for serious immersive narratives is a challenge, and one needs to carefully combine immersion, media technology and end-user experiences to solve such complex issues.

Does QoE Play a Role?

The current state of the art (SOTA) in immersive narratives, from a technology point of view, is the implementation of virtual experiences through Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR), commonly referred to as eXtended Reality (XR). Discussing the SOTA of XR is challenging as it exists across a large number of companies and sectors in the form of fragmented, domain-specific products and services, and is changing from quarter to quarter. The definitions of immersion and presence differ; however, it is important to raise awareness of the generic building blocks in order to start a discussion on the way to move forward. The most important building blocks are the use of digital storytelling in the creation of the experience and the quality of the final experience as perceived by the participants.

XR relies heavily on immersive narratives, stories where the experience surrounds you, providing a sense of realness as well as a sense of being there. Following Mel Slater’s platform for VR [5], immersion consists of three parts:

  1. the concrete technical system for production,
  2. the illusions we are addressing and
  3. the resulting experience as interpreted by the participant.

The illusions part of XR plays on providing a sense of being in a different place, which through high-quality media makes us perceive that this is really happening (plausibility). Providing a high-quality experience eventually makes us feel like participants in the story (agency). Finally, by feeling that we are really participating in the experience, we get body ownership in this place. To be able to achieve these high-quality future media technology experiences we need new work processes and workflows for immersive experiences, requiring a vibrant connection between artists, innovators and technologists utilizing creative narratives and interactivity. To validate their quality and usefulness, and ultimately their business success, we need to focus on research and innovation within quality modeling and assessment, making it possible for creators to iteratively improve the performance of their XR experiences.

A transdisciplinary approach to immersive media experiences amplifies the relevance of content. Current QoE models predominantly treat content as a system influence factor, which allows for evaluations limited to its format, i.e., its nature (e.g., image, sound, motion, speech, etc.) and type (e.g., analog or digital). Such a definition seems insufficient given how important content is to the overall perceptual quality of such media. With these technologies becoming mainstream, there is a global push for engaging content. Successful XR applications require strong content to generate, and retain, interest. One-time adventures, such as rollercoaster rides, are now deal breakers. With maturing technologies, users too have matured; as the novelty factor of such media diminishes, so does the initial preoccupation with interactivity and simulations. Immersive experiences must rely on content for a lasting impression.

However, the social impact of this media-saturated reality is yet to be completely understood. QoE modeling and assessment, as well as business models, are evolving as more and more experiences are used commercially. However, there is still a lot of work to be done in the legal, ethical, political, health and cultural domains.

Conclusion

Immersive media experiences make a significant impact on the use and experience of new digital media through new and innovative approaches. These services are capable of establishing advanced, transferable and sustainable best practices, specifically in art and technology, for playful and liveable human-centered experiences that solve complex problems. Further, the ubiquity of such media is changing our understanding of mediums, as they form liveable environments that envelop our lives as a whole. The effects of these experiences are challenging our traditional concepts of liveability, which is why it is imperative for us to approach them as a paradigmatic shift in the civilizational project. The path taken should merge work on the technical aspects (systems) with the creative considerations (content).

Reference and Bibliography Entries

[1] Le Callet, P., Möller, S. and Perkis, A., 2013. Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003). Version 1.2. Mar-2013. [URL]

[2] Perrin, A.F.N.M., Xu, H., Kroupi, E., Řeřábek, M. and Ebrahimi, T., 2015, October. Multimodal dataset for assessment of quality of experience in immersive multimedia. In Proceedings of the 23rd ACM international conference on Multimedia (pp. 1007-1010). ACM. [URL]

[3] Normand, V., Babski, C., Benford, S., Bullock, A., Carion, S., Chrysanthou, Y., Farcet, N., Frécon, E., Harvey, J., Kuijpers, N. and Magnenat-Thalmann, N., 1999. The COVEN project: Exploring applicative, technical, and usage dimensions of collaborative virtual environments. Presence: Teleoperators & Virtual Environments, 8(2), pp.218-236. [URL]

[4] A. Perkis and A. Hameed, “Immersive media experiences – what do we need to move forward?,” SMPTE 2018, Westin Bonaventure Hotel & Suites, Los Angeles, California, 2018, pp. 1-12.
doi: 10.5594/M001846

[5] M. Slater and M. V. Sanchez-Vives, “Enhancing Our Lives with Immersive Virtual Reality”, Frontiers in Robotics and AI, 2016. frontiersin.org

Note from the Editors:

Quality of Experience (QoE) in the context of immersive media applications and services is gaining momentum as such apps/services become available. Thus, it requires a deep, integrated understanding of all involved aspects and corresponding scientific evaluations of the various dimensions (including but not limited to reproducibility). Therefore, the interested reader is referred to QUALINET and QoMEX, specifically QoMEX 2019, which play a key role in this exciting application domain.

Report from ACM ICMR 2018 – by Cathal Gurrin

 

Multimedia computing, indexing, and retrieval continue to be one of the most exciting and fastest-growing research areas in the field of multimedia technology. ACM ICMR is the premier international conference that brings together experts and practitioners in the field each year. The eighth ACM International Conference on Multimedia Retrieval (ACM ICMR 2018) took place from June 11th to 14th, 2018 in Yokohama, Japan’s second most populous city. ACM ICMR 2018 featured a diverse range of activities including keynote talks, demonstrations, special sessions and related workshops, a panel, a doctoral symposium, industrial talks and tutorials, alongside regular conference papers in oral and poster sessions. The full ICMR 2018 schedule can be found on the ICMR 2018 website <http://www.icmr2018.org/>.

The organisers of ACM ICMR 2018 placed a large emphasis on generating a high-quality programme; in 2018, ICMR received 179 submissions to the main conference, with 21 accepted for oral presentation and 23 for poster presentation. A number of key themes emerged from the published papers at the conference: deep neural networks for content annotation; multimodal event detection and summarisation; novel multimedia applications; multimodal indexing and retrieval; and video retrieval from regular and social media sources. In addition, a strong emphasis on the user (in terms of end-user applications and user-predictive models) was noticeable throughout the ICMR 2018 programme. Indeed, the user theme was central to many of the components of the conference, from the panel discussion to the keynotes, workshops and special sessions.

One of the most memorable elements of ICMR 2018 was a panel discussion on the ‘Top Five Problems in Multimedia Retrieval’ <http://www.icmr2018.org/program_panel.html>. The panel was composed of leading figures in the multimedia retrieval space: Tat-Seng Chua (National University of Singapore), Michael Houle (National Institute of Informatics), Ramesh Jain (University of California, Irvine), Nicu Sebe (University of Trento) and Rainer Lienhart (University of Augsburg). An engaging panel discussion was facilitated by Chong-Wah Ngo (City University of Hong Kong) and Vincent Oria (New Jersey Institute of Technology). The common theme was that multimedia retrieval is a hard challenge and that there are a number of fundamental topics in which we need to make progress, including bridging the semantic and user gaps, improving approaches to multimodal content fusion, neural network learning, and addressing the challenge of processing at scale and the so-called “curse of dimensionality”.

ICMR 2018 included two excellent keynote talks <http://www.icmr2018.org/program_keynote.html>. Firstly, Kohji Mitani, the Deputy Director of Science & Technology Research Laboratories at NHK (Japan Broadcasting Corporation), explained the ongoing evolution of broadcast technology and the efforts underway to create new (connected) broadcast services that can provide viewing experiences never before imagined and user experiences more attuned to daily life. The second keynote, from Shunji Yamanaka of The University of Tokyo, discussed his experience of prototyping new user technologies and highlighted the importance of prototyping as a process that bridges an ever increasing gap between advanced technological solutions and societal users. During this entertaining and inspiring talk many prototypes developed in Yamanaka’s lab were introduced and the related vision explained to an eager audience.
Three workshops were accepted for ACM ICMR 2018, covering the fields of lifelogging, art and real-estate technologies. Interestingly, all three workshops focused on domain-specific applications in three emerging fields for multimedia analytics, all related to users and the user experience. The “LSC2018 – Lifelog Search Challenge” workshop <http://lsc.dcu.ie/2018/> was a novel and highly entertaining workshop modelled on the successful Video Browser Showdown series of participation workshops at the annual MMM conference. LSC was a participation workshop, which means that the participants wrote a paper describing a prototype interactive retrieval system for multimodal lifelog data, which was then evaluated during a live interactive search challenge at the workshop. Six prototype systems took part in the search challenge in front of an audience that reached fifty conference attendees. This was a popular and exciting workshop and could become a regular feature at future ICMR conferences. The second workshop was the MM-Art & ACM workshop <http://www.attractiveness-computing.org/mmart_acm2018/index.html>, which merged two existing workshops, the International Workshop on Multimedia Artworks Analysis (MMArt) and the International Workshop on Attractiveness Computing in Multimedia (ACM). The aim of the joint workshop was to enlarge the scope of discussion issues and inspire more works in related fields. The papers at the workshop focused on the creation, editing and retrieval of art-related multimedia content. The third workshop was RETech 2018 <https://sites.google.com/view/multimedia-for-retech/>, the first international workshop on multimedia for real estate tech. In recent years there has been a huge uptake of multimedia processing and retrieval technologies in this domain, but many challenges remain, such as quality, cost, sensitivity, diversity, and attractiveness to users of content.

In addition, ICMR 2018 included three tutorials <http://www.icmr2018.org/program_tutorial.html> on topical areas for the multimedia retrieval community. The first was ‘Objects, Relationships and Context in Visual Data’ by Hanwang Zhang and Qianru Sun. The second was ‘Recommendation Technologies for Multimedia Content’ by Xiangnan He, Hanwang Zhang and Tat-Seng Chua, and the final tutorial was ‘Multimedia Content Understanding by Learning from very few Examples’ by Guo-Jun Qi. All tutorials were well received and feedback was very good.

Other aspects of note from ICMR 2018 were a doctoral symposium that attracted five authors and a dedicated industrial session with four industrial talks highlighting the multimedia retrieval challenges faced by industry. It was interesting to hear from the industrial talks how the analytics and retrieval technologies developed over years and presented at venues such as ICMR are actually being deployed in real-world user applications by large organisations such as NEC and Hitachi. It is always a good idea to listen to the real-world applications of the research carried out by our community. The best paper session at ICMR 2018 had four top-ranked works covering multimodal, audio and text retrieval. The best paper award went to ‘Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval’ by Niluthpol Mithun, Juncheng Li, Florian Metze and Amit Roy-Chowdhury.
The Best Multi-Modal Paper Award winner was ‘Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing’ by Kevin Joslyn, Kai Li and Kien Hua. In addition, there were awards for best poster, ‘PatternNet: Visual Pattern Mining with Deep Neural Network’ by Hongzhi Li, Joseph Ellis, Lei Zhang and Shih-Fu Chang, and best demo, ‘Dynamic construction and manipulation of hierarchical quartic image graphs’ by Nico Hezel and Kai Uwe Barthel. Finally, although often overlooked, six reviewers were commended for their outstanding reviews: Liqiang Nie, John Kender, Yasushi Makihara, Pascal Mettes, Jianquan Liu, and Yusuke Matsui. As with some other ACM-sponsored conferences, ACM ICMR 2018 included an award for the most active social media commentator, which is how I ended up writing this report. There were a number of active social media commentators at ICMR 2018, each of whom provided a valuable commentary on the proceedings and added to the historical archive.

Of course, the social side of a conference can be as important as the science. ICMR 2018 included two main social events, a welcome reception and the conference banquet. The welcome reception took place at the Fisherman’s Market, an Asian and ethnic dining experience with a wide selection of Japanese food available. The Conference Banquet took place in the Hotel New Grand, which was built in 1927 and has a long history of attracting famous guests. The venue is famed for the quality of the food and the spectacular panoramic views of the port of Yokohama. As with the rest of the conference, the banquet food was top-class with more than one of the attendees commenting that the Japanese beef on offer was the best they had ever tasted.

ICMR 2018 was an exciting and excellently organised conference and it is important to acknowledge the efforts of the general co-chairs: Kiyoharu Aizawa (The Univ. of Tokyo), Michael Lew (Leiden Univ.) and Shin’ichi Satoh (National Inst. of Informatics). They were ably assisted by the TPC co-chairs, Benoit Huet (Eurecom), Qi Tian (Univ. of Texas at San Antonio) and Keiji Yanai (The Univ. of Electro-Comm), who coordinated the reviews from a 111-person program committee in a double-blind manner, with an average of 3.8 reviews being prepared for every paper. ICMR 2019 will take place in Ottawa, Canada in June 2019 and ICMR 2020 will take place in Dublin, Ireland in June 2020. I hope to see you all there and continuing the tradition of excellent ICMR conferences.

The Lifelog Search Challenge Workshop attracted six teams for a real-time public interactive search competition.


Shunji Yamanaka about to begin his keynote talk on Prototyping



Kiyoharu Aizawa and Shin’ichi Satoh, two of the ICMR 2018 General co-Chairs welcoming attendees to the ICMR 2018 Banquet at the historical Hotel New Grand.

ACM Multimedia 2019 and Reproducibility in Multimedia Research

In the first months of the new calendar year, multimedia researchers are traditionally hard at work on their ACM Multimedia submissions. (This year the submission deadline is 1 April.) Questions of reproducibility, including those of data set availability and release, are at the forefront of everyone’s mind. In this edition of SIGMM Records, the editors of the “Data Sets and Benchmarks” column have teamed up with two intersecting groups, the Reproducibility Chairs and the General Chairs of ACM Multimedia 2019, to bring you a column about reproducibility in multimedia research and the connection between reproducible research and publicly available data sets. The column highlights the activities of SIGMM towards implementing ACM paper badging. ACM MMSys has pushed our community forward on reproducibility and pioneered the use of ACM badging [1]. We are proud that in 2019 the newly established Reproducibility track will introduce badging at ACM Multimedia.

Complete information on Reproducibility at ACM Multimedia is available at:  https://project.inria.fr/acmmmreproducibility/

The importance of reproducibility

Researchers intuitively understand the importance of reproducibility. Too often, however, it is explained superficially, with statements such as, “If you don’t pay attention to reproducibility, your paper will be rejected”. The essence of the matter lies deeper: reproducibility is important because of its role in making scientific progress possible.

What is this role exactly? The reason that we do research is to contribute to the totality of knowledge at the disposal of humankind. If we think of this knowledge as a building, i.e. a sort of edifice, the role of reproducibility is to provide the strength and stability that makes it possible to build continually upwards. Without reproducibility, there would simply be no way of creating new knowledge.

ACM provides a helpful characterization of reproducibility: “An experimental result is not fully established unless it can be independently reproduced” [2]. In short, a result that is obtainable only once is not actually a result.

Reproducibility and scientific rigor are often mentioned in the same breath. Rigorous research provides systematic and sufficient evidence for its contributions. For example, in an experimental paper, the experiments must be properly designed and the conclusions of the paper must be directly supported by the experimental findings. Rigor involves careful analysis, interpretation, and reporting of the research results. Attention to reproducibility can be considered a part of rigor.

When we commit ourselves to reproducible research, we also commit ourselves to making sure that the research community has what it needs to reproduce our work. This means releasing the data that we use, and also releasing implementations of our algorithms. Devoting time and effort to reproducible research is an important way in which we support Open Science, the movement to make research resources and research results openly accessible to society.

Repeatability vs. Replicability vs. Reproducibility

We frequently use the word “reproducibility” in an informal way that includes three individual concepts, which actually have distinct formal uses: “repeatability”, “replicability” and “reproducibility”. Again, we can turn to ACM for definitions [2]. All three concepts express the idea that research results must be invariant with respect to changes in the conditions under which they were obtained.

Specifically, “repeatability” means that the same research team can achieve the same result using the same setup and resources. “Replicability” means that the original team can pass the setup and resources to a different research team, and that this second team can also achieve the same result. “Reproducibility” (here, used in the formal sense) means that a different team can achieve the same result using a different setup and different resources. Note the connection to scientific rigor: obtaining the same result multiple times via a process that lacks rigor is meaningless.

When we write a research paper paying attention to reproducibility, it means that we are confident we would obtain the same results again within our own research team, that the paper includes a detailed description of how we achieved the result (and is accompanied by code or other resources), and that we are convinced that other researchers would reach the same conclusions using a comparable, but not identical, setup and resources.

Reproducibility at ACM Multimedia 2019

ACM Multimedia 2019 promotes reproducibility in two ways: First, as usual, reproducibility is one of the review criteria considered by the reviewers (https://www.acmmm.org/2019/reviewer-guidelines/). It is critical that authors describe their approach clearly and completely, and do not omit any details of their implementation or evaluation. Authors should release their data and also provide experimental results on publicly available data. Finally, increasingly, we are seeing authors who include a link to their code or other resources associated with the paper. Releasing resources should be considered a best practice.

The second way that ACM Multimedia 2019 promotes reproducibility is the new Reproducibility Track. Full information is available on the ACM Multimedia Reproducibility website [3]. The purpose of the track is to ensure that authors receive recognition for the effort they have dedicated to making their research reproducible, and also to assign ACM badges to their papers. Next, we summarize the concept of ACM badges, then we will return to discuss the Reproducibility Track in more detail.

ACM Paper badging

Here, we provide a short summary of the information on badging available on the ACM website at [2]. ACM introduced a system of badges in order to help push forward the processes by which papers are reviewed. The goal is to move the attention given to reproducibility to a new level, beyond the level achieved during traditional reviews. Badges seek to motivate authors to use practices leading to better replicability, with the idea that replicability will in turn lead to reproducibility.

In order to understand the badge system, it is helpful to know that ACM badges are divided into two categories: “Artifacts Evaluated” and “Results Evaluated”. ACM defines artifacts as digital objects that are created for the purpose of, or as a result of, carrying out research. Artifacts include implementation code as well as scripts used to run experiments, analyze results, or generate plots. Critically, they also include the data sets that were used in the experiment. The different “Artifacts Evaluated” badges reflect the level of care that authors put into making the artifacts available, including how far they go beyond the minimal functionality necessary and how well the artifacts are documented.

There are two “Results Evaluated” badges: the “Results Replicated” badge, which results from a replicability review, and the “Results Reproduced” badge, which results from a full reproducibility review, in which the referees have succeeded in reproducing the results of the paper using only the descriptions of the authors, and without any of the authors’ artifacts. ACM Multimedia adopts the ACM idea that replicability leads to full reproducibility, and for this reason chooses to focus in its first year on the “Results Replicated” badge. Next we turn to a discussion of the ACM Multimedia 2019 Reproducibility Track and how it implements the “Results Replicated” badge.

Badging ACM MM 2019

Authors of main-conference papers appearing at ACM Multimedia 2018 or 2017 are eligible to make a submission to the Reproducibility Track of ACM Multimedia 2019. The submission has two components: an archive containing the resources needed to replicate the paper, and a short companion paper that contains a description of the experiments that were carried out in the original paper and implemented in the archive. The submissions undergo a formal reproducibility review, and submissions that pass receive a “Results Replicated” badge, which is added to the original paper in the ACM Digital Library. The companion paper appears in the proceedings of ACM Multimedia 2019 (also with a badge) and is presented at the conference as a poster.

ACM defines the badges, but the choice of which badges to award, and how to implement the review process that leads to the badge, is left to the individual conferences. The consequence is that the design and implementation of the ACM Multimedia Reproducibility Track requires a number of important decisions as well as careful implementation.

A key consideration when designing the ACM Multimedia Reproducibility Track was the work of the reproducibility reviewers. These reviewers carry out tasks that go beyond those of main-conference reviewers, since they must use the authors’ artifacts to replicate their results. The track is designed such that the reproducibility reviewers are deeply involved in the process. Because the companion paper is submitted a year after the original paper, reproducibility reviewers have plenty of time to dive into the code and work together with the authors. During this intensive process, the reviewers extend the originally submitted companion paper with a description of the review process and become authors on the final version of the companion paper.

The ACM Multimedia Reproducibility Track is expected to run similarly in years beyond 2019. The experience gained in 2019 will allow future years to tweak the process in small ways if it proves necessary, and also to expand to other ACM badges.

The visibility of badged papers is important for ACM Multimedia. Visibility incentivizes the authors who submit work to the conference to apply best practices in reproducibility. Practically, the visibility of badges also allows researchers to quickly identify work that they can build on. If a paper presenting new research results has a badge, researchers can immediately understand that this paper would be straightforward to use as a baseline, or that they can build confidently on the paper results without encountering ambiguities, technical issues, or other time-consuming frustrations.

The link between reproducibility and multimedia data sets

The link between Reproducibility and Multimedia Data Sets has been pointed out before, for example, in the theme chosen by the ACM Multimedia 2016 MMCommons workshop, “Datasets, Evaluation, and Reproducibility” [4]. One of the goals of this workshop was to discuss how data challenges and benchmarking tasks can catalyze the reproducibility of algorithms and methods.

Researchers who dedicate time and effort to creating and publishing data sets are making a valuable contribution to research. In order to compare the effectiveness of two algorithms, all other aspects of the evaluation must be controlled, including the data set that is used. Making data sets publicly available supports the systematic comparison of algorithms that is necessary to demonstrate that new algorithms are capable of outperforming the state of the art.

Considering the definitions of “replicability” and “reproducibility” introduced above, additional observations can be made about the importance of multimedia data sets. Creating and publishing data sets supports replicability. In order to replicate a research result, the same resources as used in the original experiments, including the data set, must be available to research teams beyond the one who originally carried out the research.

Creating and publishing data sets also supports reproducibility (in the formal sense of the word defined above). In order to reproduce research results, however, it is necessary that there is more than one data set available that is suitable for carrying out evaluation of a particular approach or algorithm. Critically, the definition of reproducibility involves using different resources than were used in the original work. As the multimedia community continues to move from replication to reproduction, it is essential that a large number of data sets are created and published, in order to ensure that multiple data sets are available to assess the reproducibility of research results.

Acknowledgements

Thank you to people whose hard work is making reproducibility at ACM Multimedia happen: This includes the 2019 TPC Chairs, main-conference ACs and reviewers, as well as the Reproducibility reviewers. If you would like to volunteer to be a reproducibility committee member in this or future years, please contact the Reproducibility Chairs at MM19-Repro@sigmm.org

[1] Simon, Gwendal. Reproducibility in ACM MMSys Conference. Blogpost, 9 May 2017 http://peerdal.blogspot.com/2017/05/reproducibility-in-acm-mmsys-conference.html Accessed 9 March 2019.

[2] ACM, Artifact Review and Badging, Reviewed April 2018,  https://www.acm.org/publications/policies/artifact-review-badging Accessed 9 March 2019.

[3] ACM MM Reproducibility: Information on Reproducibility at ACM Multimedia https://project.inria.fr/acmmmreproducibility/ Accessed 9 March 2019.

[4] Bart Thomee, Damian Borth, and Julia Bernd. 2016. Multimedia COMMONS Workshop 2016 (MMCommons’16): Datasets, Evaluation, and Reproducibility. In Proceedings of the 24th ACM international conference on Multimedia (MM ’16). ACM, New York, NY, USA, 1485-1486.

Gender Diversity in SIGMM: We’ll Just Leave This Here As Well


1. Introduction and Background

SIGMM is the Association for Computing Machinery’s (ACM) Special Interest Group (SIG) in Multimedia, one of 36 SIGs in the ACM family.  ACM itself was founded in 1947 and is the world’s largest educational and scientific society for computing, uniting computing educators, researchers and professionals. With almost 100,000 members worldwide, ACM is a strong force in the computing world and is dedicated to advancing the art, science, engineering, and application of information technology.

SIGMM has been operating for nearly 30 years and sponsors 5, soon to be 6, major international conferences each year as well as dozens of workshops and an ACM Transactions journal. SIGMM sponsors several Excellence and Achievement Awards each year, including awards for Technical Achievement, Rising Star, Outstanding PhD Thesis, TOMM Best Paper, and Best TOMM Associate Editor. SIGMM funds student travel scholarships to almost all our conferences, with nearly 50 such student travel grants at the flagship MULTIMEDIA conference in Seoul, Korea, in 2018. SIGMM has two active chapters, one in the Bay Area of San Francisco and one in China. It has a very active online presence, with social media reporters at our conferences, a regular SIGMM Records newsletter, and a weekly news digest. At our flagship conference, SIGMM sponsors women and diversity lunches, doctoral symposiums, and a newcomers’ welcome breakfast. SIGMM also funds special initiatives based on suggestions/proposals from the community, as well as a newly-launched conference ambassador program to reach out to other ACM SIGs for collaborations across our conferences.

It is generally accepted that SIGMM has a diversity and inclusion problem which exists at all levels, but we have now realized this and have started to take action.  In September 2017 ACM SIGARCH produced the first of a series of articles on gender diversity in the field of Computer Architecture. SIGARCH members looked at their numbers of representation of women in SIGARCH conferences over the previous 2 years and produced the first of a set of reports entitled “Gender Diversity in Computer Architecture: We’re Just Going to Leave This Here”.


This report generated much online debate and commentary, including at the ACM SIG Governing Board (SGB) meetings in 2017 and in 2018.

At a SIGMM Executive Committee meeting in Mountain View, California in October 2017, SIGMM agreed to replicate the SIGARCH study to examine and measure the (lack of) gender diversity at SIGMM-sponsored conferences. We issued a call offering funding support to do this, but there were no takers, so I did this myself, from within my own research lab.

2. Baselines for Performance Comparison

Before jumping into the numbers it is worth establishing a baseline to measure against. As an industry-wide figure, 17-24% of Computer Science undergraduates at US R1 institutions are female, as are 17% of those in technical roles at large high-tech companies that report diversity. I also looked at the female representation within some of the other ACM SIGs. While we must accept that inclusiveness and diversity are not just about gender but also about race, ethnicity, nationality, and even institution, we don’t have data on these other aspects, so I focus just on gender diversity.

So how does SIGMM compare to other SIGs? Let’s look at SIG memberships using data provided by ACM.

The best (most balanced, or least imbalanced) SIGs are CSE (Computer Science Education) with 25% female and Computer Human Interaction (CHI), also with 25% female among those declaring a gender, though CHI is probably better because it has a greater percentage of undeclared gender and thus a lower proportion of males. The worst (most imbalanced, or least balanced) SIGs are PLAN (Programming Languages) with 4% female and OPS (Operating Systems) with 5% female.


The figures for SIGMM show 9% female membership, with 17% unknown or not declared, which means that among the declared members the female share is just below 11%. Among the other SIGs this makes us closest to AI (Artificial Intelligence) and IR (Information Retrieval), though SIGIR has a larger number of members with gender undeclared.
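For readers who want to check the adjustment, a minimal sketch of the arithmetic behind “just below 11%” (using the figures quoted above) is given below.

```python
# Adjusting a membership share for undeclared genders (figures from the text).
female_share = 0.09      # 9% of all SIGMM members identify as female
undeclared_share = 0.17  # 17% of SIGMM members did not declare a gender

female_among_declared = female_share / (1.0 - undeclared_share)
print(f"{female_among_declared:.1%}")  # ~10.8%, i.e. "just below 11%"
```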

fig3

Measuring this against overall ACM membership, we find that ACM members are 68% male, 12% female and 20% undeclared. This makes SIGMM quite mid-table compared to other SIGs, but we are all doing badly and we all have an imbalance. Interestingly, the MULTIMEDIA Conference in 2018 in Seoul, Korea had 81% male, 18% female and 1% other/undeclared attendees, slightly better than our membership ratio but still not good.

3. Gender Balance at SIGMM Conferences

We [1] carried out a desk study for the 3 major SIGMM conferences, namely MULTIMEDIA with an average attendance of almost 800, the International Conference on Multimedia Retrieval (ICMR) with 230 attendees at the last conference, and Multimedia Systems (MMSys) with about 130 attendees. For each of the last 5 years we trawled through the conference websites, extracting the names and affiliations of the organising committees, the technical program committees and the invited keynote speakers.  We did likewise for the SIGMM award winners. This required us to determine the gender of over 2,700 people, although there were duplicates, since the same people can recur on program committees across multiple years and multiple conferences. Some names were easy, like “John” and “Susanne”, but these were few; for the others we searched the web. If we were still searching after 5 minutes, we gave up. [2]

[1] This work was carried out by Agata Wolski, a summer intern student, and me, during summer 2018.

[2] The data gathered from this activity is available on request from alan.smeaton@dcu.ie

The figures for each of these annual conferences, covering a 5-year period for MULTIMEDIA, a 4-year period for ICMR and a 3-year period for MMSys, are shown in the following sequence of charts, first as percentages and then as raw numbers, for each conference.
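As a minimal sketch of how raw committee counts of the kind behind these charts are turned into percentages (the record format and helper function below are illustrative assumptions, not the actual analysis scripts used in the study):

```python
from collections import Counter

# Each record is (conference, year, role, gender) as assigned during the desk study.
# The entries below are made-up placeholders, purely to show the tallying step.
records = [
    ("MULTIMEDIA", 2018, "TPC", "F"),
    ("MULTIMEDIA", 2018, "TPC", "M"),
    ("MULTIMEDIA", 2018, "Keynote", "M"),
    ("ICMR", 2018, "Organising committee", "F"),
]

def gender_breakdown(records, conference, year):
    """Return the raw gender counts and the percentage female for one conference edition."""
    counts = Counter(g for c, y, _, g in records if c == conference and y == year)
    total = sum(counts.values())
    pct_female = 100.0 * counts.get("F", 0) / total if total else 0.0
    return counts, pct_female

counts, pct_female = gender_breakdown(records, "MULTIMEDIA", 2018)
print(counts, f"{pct_female:.1f}% female")
```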

fig4

fig5

fig6

fig7

fig8

fig9

So what do the figures mean in comparison to each other and to our baseline?

The results tell us the following:

  • Almost all the percentages for female participation in the organisation of SIGMM conferences are above the SIGMM membership figure of 9%, which is really closer to 11% when discounting those SIGMM members with gender unassigned. Yet we know that female SIGMM membership is already much smaller than the 17% female representation in technology companies and the almost 18% female ACM membership when discounting unassigned genders.
  • Even if we were to use the 17% to 18% figures as our baseline, female participation in SIGMM conference organisation falls below that baseline, meaning our female SIGMM members are not appearing in organisational and committee roles at the rate that our membership figures, pro rata, would indicate they should (see the short calculation after this list).
  • While each of our conferences falls below these pro rata figures, none of the three conferences is markedly worse than the others.
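To make the pro rata argument concrete, here is a small illustrative calculation; the committee size used is a made-up example, not a figure from the study. For a given baseline share, it shows how many committee members we would expect to be female if participation matched that baseline.

```python
# Expected number of female committee members under different baseline shares.
# The committee size of 40 is an illustrative assumption, not data from the study.
committee_size = 40
baselines = {
    "SIGMM declared members": 0.11,
    "Technology companies": 0.17,
    "ACM declared members": 0.18,
}

for name, share in baselines.items():
    expected = committee_size * share
    print(f"{name}: expect ~{expected:.1f} of {committee_size} committee members to be female")
```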

4. Initiatives Elsewhere to Redress Gender Imbalance

I then examined some of the actions carried out elsewhere that SIGMM could implement, starting with other ACM SIGs.  There I found that some of the other SIGs do some of the following:

  • Women and diversity events at conferences (breakfasts or lunches, like SIGMM does)
  • Women-only networking pre-conference meals at conferences
  • Women-only technical programme events like N2Women
  • Formation of a mentoring group (using Slack) for informal mentoring
  • Highlighting the roles and achievements of women on social media and in newsletters
  • Childcare and companion travel grants for conference attendance

I then looked more broadly at other initiatives and found the following:

  • gender quotas
  • accelerator programs like Athena SWAN
  • female-only events like workshops
  • reports like this one, which act as spotlights

When we put these all together there are three recurring themes which appear across various initiatives:

  1. Networking .. encouraging us to be part of a smaller group within a larger group. This reflects a natural human tendency to be tribal: we like to belong to groups, starting with our family, but also the people we have lunch with, go to yoga classes with, or go on holidays with. We each have multiple, sometimes non-overlapping, groups or tribes that we like to be part of, and one such group is the network of women/minorities that forms as a result of some of these activities.
  2. Peer-to-peer buddying .. again there is a natural human trait whereby older siblings (sisters) tend to help younger ones, from when we are very young and right throughout life.  The buddying activity reflects this and gives a form of satisfaction to the older or more senior buddy, as well as practical benefit to the younger or more junior buddy.
  3. Role models .. there are several initiatives which try to promote role models as the kinds of people we can aspire to be.  More often than not, it is the very successful people and the high flyers who are put into these role-model positions, whereas in practice not everyone actually wants to be a high flyer.  For many people success in their lives means something different, something less lofty and aspirational, and when we see high-flying successful people promoted as role models our reaction can be the opposite: we can reject them because we don’t want to be in their league, and as a result we can feel depressed and regard ourselves as under-achievers, thus defeating the purpose of having role models in the first place.

5. SIGMM Women’s / Diversity Lunch at MULTIMEDIA 2018

At the ACM MULTIMEDIA Conference in Seoul, Korea in October 2018 SIGMM once again organised a women’s / diversity lunch and about 60 people attended, mostly women.

fig10

At the event I gave a high-level overview of the statistics presented earlier in this report, and then we held a moderated discussion, using PadLet to gather feedback from the audience. PadLet is an online bulletin board for displaying information (text, images or links) which can be contributed anonymously by an audience. Attendees at the lunch scanned a QR code on their smartphones, which opened a browser and allowed them to post comments to the big screen in response to the topic being discussed during the meeting.

The first topic discussed was “What brings you to the MULTIMEDIA Conference?”

  • The answers (anonymous comments) posted included that many attendees were there to present papers or posters, many wanted to network and share ideas or to help build the community of like-minded researchers, and some were attending in order to meet old friends .. the usual reasons for attending a conference.

For the second topic we asked “What excites you about multimedia as a topic, how did you get into the area?”

  • The answers included the interaction between computer vision and language, the novel applications around multimodality, the multidisciplinary nature and the practical nature of the subject, and the diversity of topics and the people attending.

The third topic was “What is more/less important for you … networking, role models or peer buddies?”

  • From the answers to this, networking was almost universally identified as the most important, and as a follow-on from that, interacting with peers.

Finally we asked “Do you know of an initiative that works, or that you would like to see at SIGMM event(s)?”

  • A variety of suggestions were put forward including holding hackathons, funding undergraduate students from local schools to attend the conference, an ACM award for women only, ring-fenced funding for supporting women only, training for reviewing, and a lot of people wanted mentoring and mentor matching.

6. SIGMM Initiatives

So what will we do in SIGMM?

  • We will continue to encourage networking at SIGMM-sponsored conferences. We will fund lunches like the ones at the MULTIMEDIA Conference. We also started a newcomers’ breakfast at the MULTIMEDIA Conference in 2018 and we will continue with this.
  • We will ensure that all our conference delegates can attend all conference events at all SIGMM conferences without extra fees. This was a SIGMM policy identified in a review of SIGMM conferences some years ago, but it has slipped.
  • We will not force but we will facilitate peer-to-peer buddying through the networking events at our conferences and through this we will indirectly help you identify your own role models.
  • We will appoint a diversity coordinator to oversee the women / diversity activities across our SIGMM events and this appointee will be a full member of the SIGMM Executive Committee.
  • We will offer an opportunity for all members of our SIGMM community attending our sponsored conferences, as part of their conference registration, to indicate their availability and interest in taking on an organisational role in SIGMM activities, including conference organisation and/or reviewing. This will give us a reserve of people whose expertise and services we can draw on, and we can do so in a way which promotes diversity.

These may appear small-scale and relatively minor, because we are not getting to the roots of what causes the bias and we are not yet inducing change to counter those causes. However, these are positive steps in the right direction, and we will now have gender and other bias issues permanently on our radar.

Report from the SIGMM Emerging Leaders Symposium 2018

The idea of a symposium to bring together the bright new talent within the SIGMM community and to hear their views on some topics within the area and on the future of Multimedia was first mooted in 2014 by Shih-Fu Chang, then SIGMM Chair. That led to the “Rising Stars Symposium” at the MULTIMEDIA Conference in 2015, where 12 invited speakers made presentations on their work as a satellite event to the main conference. After each presentation a respondent, typically an experienced member of the SIGMM community, gave a response or personal interpretation of the presentation. The format worked well and was very thought-provoking, though some people felt that a shorter event which could be more integrated into the conference might work better.

For the next year, 2016, the event was run a second time with 6 invited speakers and was indeed more integrated into the main conference. The event skipped a year in 2017, but was brought back for the MULTIMEDIA Conference in 2018 and this time, rather than invite speakers we decided to have an open call with nominations, to make selection for the symposium a competitive process. We also decided to rename the event from Rising Stars Symposium, and call it the “SIGMM Emerging Leaders Symposium”, to avoid confusion with the “SIGMM Rising Star Award”, which is completely different and is awarded annually.

In July 2018 we issued a call for applications to the “Third SIGMM Emerging Leaders Symposium, 2018” which was to be held at the annual MULTIMEDIA Conference in Seoul, Korea, in October 2018. Applications were received and were evaluated by a panel consisting of the following people, and we thank them for volunteering and for their support in doing this.

Werner Bailer, Joanneum Research
Guillaume Gravier, IRISA
Frank Hopfgartner, Sheffield University
Hayley Hung, Delft University, (a previous awardee)
Marta Mrak, BBC

Based on the assessment panel recommendations, 4 speakers were included in the Symposium, namely:

Hanwang Zhang, Nanyang Technological University, Singapore
Michael Riegler, Simula, Norway
Jia Jia, Tsinghua University, China
Liqiang Nie, Shandong University, China

The Symposium took place on the last day of the main conference and was chaired by Gerald Friedland, SIGMM Conference Director.

image1

Towards X Visual Reasoning

By Hanwang Zhang (Nanyang Technological University, Singapore)

For decades, we have been interested in detecting objects and classifying them into a fixed vocabulary. With the maturity of these “low-level” vision solutions, we hunger for a “higher-level” representation of the visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making. In particular, we wish for an “X” reasoning, where X means eXplainable and eXplicit. In this talk, I first reviewed a brief history of symbolism and connectionism, which have alternately driven the development of AI over the past decades. In particular, though deep neural networks, the prevailing incarnation of connectionism, have shown impressive super-human performance in various tasks, they still lag behind us in high-level reasoning. Therefore, I propose a marriage between symbolism and connectionism that takes their complementary advantages: the proposed X visual reasoning. Second, I introduced the two building blocks of X visual reasoning: visual knowledge acquisition by scene graph detection, and X neural modules applied to that knowledge for reasoning. For scene graph detection, I introduced our recent progress on reinforcement learning of scene dynamics, which can help to generate coherent scene graphs that respect visual context. For X neural modules, I discussed our most recent work on module design, algorithms, and applications in various visual reasoning tasks such as visual Q&A, natural language grounding, and image captioning. Finally, I outlined some future directions towards X visual reasoning, such as using meta-learning and deep reinforcement learning for more dynamic and efficient X neural module compositions.
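As a rough, illustrative sketch (not code from the talk) of the scene graph representation mentioned above: objects detected in an image become nodes, and pairwise relations become labelled edges. The class and example below are invented purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """A toy scene-graph container: detected objects plus labelled relations between them."""
    objects: list = field(default_factory=list)      # e.g. "person", "horse", "beach"
    relations: list = field(default_factory=list)    # (subject index, predicate, object index)

    def add_relation(self, subj: str, predicate: str, obj: str) -> None:
        # Register any unseen objects, then store the relation as an index triple.
        for name in (subj, obj):
            if name not in self.objects:
                self.objects.append(name)
        self.relations.append((self.objects.index(subj), predicate, self.objects.index(obj)))

# An image of "a person riding a horse on the beach" might yield:
g = SceneGraph()
g.add_relation("person", "riding", "horse")
g.add_relation("horse", "on", "beach")
print(g.objects, g.relations)
```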

Professor Ramesh Jain mentioned that truly X reasoning should consider the potential human-computer interaction that may change or divert a current reasoning path. This is crucial because human intelligence can respond reasonably to interruptions and incoming evidence.

We can position X visual reasoning within the recent trend of neural-symbolic unification, which is gradually becoming our consensus route towards general AI. The “neural” is good at representation learning and model training, and the “symbolic” is good at knowledge reasoning and model explanation. One should bear in mind that future multimedia systems should take the complementary advantages of the “neural-symbolic”.

BioMedia – The Important Role of Multimedia Research for Healthcare

by Michael Riegler (SimulaMet & University of Oslo, Norway)

With the recent rise of machine learning, the analysis of medical data has become a hot topic. Nevertheless, the analysis is still often restricted to specific types of images, such as those from radiology or CT scans. However, vast amounts of multimedia data are continuously collected, both within healthcare systems and by users themselves with devices such as cameras, sensors and mobile phones.

In this talk I focused on the potential of multimedia data and applications to improve healthcare systems. First, the focus was on the various types of data. A person’s health is reflected in many data sources such as images, videos, text and sensors. Medical data can also be divided into data with hard and soft ground truth. Hard ground truth means that there are procedures that verify certain labels of the given data (for example, a biopsy report for a cancerous tissue sample). Soft ground truth is data that was labelled by medical experts without verification of the outcome. Different data types also come with different levels of sensitivity: for example, activity data from sensors have a low chance of identifying the patient, whereas speech, social media and GPS come with a higher chance of identification. Finally, it is important to take context into account, and results should be explainable and reproducible. This was followed by a discussion about the importance of multimodal data fusion and context-aware analysis, supported by three example use cases: mental health, artificial reproduction and colonoscopy.

I also discussed the importance of involving medical experts and patients as users. Medical experts and patients are two different user groups, with different needs and requirements. One common requirement for both groups is the need for an explanation of how decisions were made. In addition, medical experts are mainly interested in support for their daily tasks, but are not very interested in, for example, huge amounts of sensor data from patients, because it increases their workload; they prefer interacting with the patients rather than with the data. Patients, on the other hand, usually prefer to collect a lot of data and to be informed about their current status, but are more concerned about their privacy. They also usually want medical experts to take as much data as possible into account when making their assessments.

Professor Susanne Boll mentioned that it is important to find out what is needed to make automatic analysis accepted by hospitals, and who takes responsibility for decisions made by automatic systems. Understandability and reproducibility of methods were mentioned as an important first step.

The most relevant messages of the talk are that the multimedia community has the diverse skills needed to address several challenges related to medicine. Furthermore, it is important to focus on explainable and reproducible methods.

Mental Health Computing via Harvesting Social Media Data

By Jia Jia, Tsinghua University, China

Nowadays, with the rapid pace of life, mental health is receiving widespread attention. Common symptoms like stress, or clinical disorders like depression, are quite harmful, and thus it is of vital significance to detect mental health problems before they lead to severe consequences. Professional diagnostic criteria like the International Classification of Diseases (ICD-10 [1]) and the Diagnostic and Statistical Manual of Mental Disorders (DSM [2]) define distinguishing behaviours in daily life that help in diagnosing disorders. However, traditional interventions based on face-to-face interviews or self-report questionnaires are expensive and lag behind the onset of problems. A potential antipathy towards consulting psychiatrists exacerbates these problems.

Social media platforms, like Twitter and Weibo, have become increasingly prevalent places for users to express themselves and interact with friends. The user-generated content (UGC) shared on such platforms can help us to better understand the real-life state and emotions of users in a timely manner, making the analysis of users’ mental wellness feasible. Building on these observations, research efforts have also been devoted to the early detection of mental health problems.

In this talk, I focused on the timely detection of mental wellness, concentrating on two typical mental health problems: stress and depression. Starting with binary user-level detection, I expanded the research by considering the trigger and the severity of the mental problems, involving different social media platforms that are popular in different cultures. I presented my recent progress from three perspectives:

  1. Through self-reported sentence pattern matching (a small illustrative sketch of such patterns follows below), I constructed a series of large-scale, well-labelled datasets in the field of online mental health analysis;
  2. Based on previous psychological research, I extracted multiple groups of discriminating features for detection and presented several multi-modal models targeting different contexts. I conducted extensive experiments with my models, demonstrating significantly better performance compared to state-of-the-art methods; and
  3. I investigated in detail the contribution of each feature, of online behaviours, and even of cultural differences across contexts. I managed to reveal behaviours not covered in traditional psychological criteria, and provided new perspectives and insights for current and future research.

My developed mental health care applications were also demonstrated in the end.
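As a hedged illustration of what self-reported sentence pattern matching can look like in practice (the patterns, posts and helper function below are invented for illustration and are not the ones used in the study):

```python
import re

# Toy self-report patterns for harvesting candidate posts (illustrative only).
SELF_REPORT_PATTERNS = [
    re.compile(r"\bI (was|have been|got) diagnosed with depression\b", re.IGNORECASE),
    re.compile(r"\bI('m| am) (so|really|very) stressed\b", re.IGNORECASE),
]

def is_self_report(post: str) -> bool:
    """Return True if a post matches any of the self-report patterns."""
    return any(p.search(post) for p in SELF_REPORT_PATTERNS)

posts = [
    "I was diagnosed with depression last year and started therapy.",
    "Lovely weather in Seoul today!",
]
candidates = [p for p in posts if is_self_report(p)]
print(candidates)  # only the first post matches
```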

Dr. B. Prabhakaran noted that understanding mental health is a difficult problem, even for trained doctors, and that we will need to work with psychiatrists sooner rather than later. Thanks to his valuable comments regarding possible future directions, I envisage using augmented / mixed reality to create different immersive “controlled” scenarios in which human behaviour can be studied. I am considering, for example, creating stressful situations (such as exams, or missing a flight) to better understand depression. For depression especially, I plan to incorporate EEG sensor data into my studies.

[1] https://www.who.int/classifications/icd/en/

[2] https://www.psychiatry.org/psychiatrists/practice/dsm

Towards Micro-Video Understanding

By Liqiang Nie, Shandong University, China

We are living in an era of ever-dwindling attention spans. To feed our hunger for quick content, bite-sized videos embracing the philosophy of “shorter-is-better” are becoming popular with the rise of micro-video sharing services. Typical services include Vine, Snapchat, Viddy, and Kwai. Micro-videos have spread like wildfire and are taking over the content and social media marketing space, by virtue of their brevity, authenticity, communicability, and low cost. Micro-videos can benefit many commercial applications, such as brand building. Despite their value, the analysis and modelling of micro-videos is non-trivial, for the following reasons:

  1. micro-videos are short in length and of low quality;
  2. they can be described by multiple heterogeneous channels, spanning from social, visual, and acoustic to textual modalities;
  3. they are organized into a hierarchical ontology in terms of semantic venues; and
  4. there are no available benchmark datasets of micro-videos.

In my talk, I introduced some shallow and deep learning models for micro-video understanding that are worth studying and have proven effective:

  1. Popularity Prediction. Among the large volume of micro-videos, only a small portion will be widely viewed by users, while most will gain little attention. Obviously, if we can identify the hot and popular micro-videos in advance, it will benefit many applications, like online marketing and network reservation;
  2. Venue Category Estimation. In a random sample of over 2 million Vine videos, I found that only 1.22% of the videos are associated with venue information. Including location information about the videos can benefit many aspects, such as footprint recording, personalised applications, and other location-based services; it is thus highly desirable to infer the missing geographic cues;
  3. Low-quality sound. As the quality of the acoustic signal is usually relatively low, simply integrating acoustic features with visual and textual features often leads to suboptimal results, or even adversely degrades the overall quality.

In the future, I may try some other meaningful tasks, such as micro-video captioning or tagging and the detection of unsuitable content. Many micro-videos are annotated with erroneous words, i.e. the topic tags or descriptions are not well correlated with the content, and this negatively influences other applications, such as textual query search. It is also common for users to upload violent and erotic videos. At present, the detection and alerting tasks rely mainly on labour-intensive inspection. I plan to create systems that automatically detect erotic and violent content.

During the presentation, the audience asked about the datasets used in my work. In my previous work, all the videos came from Vine, but this service has since closed. The audience wondered how I will build datasets in the future. As there are many other micro-video sites, such as Kwai and Instagram, I can obtain sufficient data from them to support my further research.

Opinion Column: Survey on ACM Multimedia

For this edition of the Opinion Column, coinciding with ACM Multimedia 2018, we launched a short community survey on the perception of the conference. We prepared the survey together with senior members of the community, as well as the organizers of ACM Multimedia 2019. You can find the full survey here.

image1_opinion

Overall, we collected 52 responses. The participant sample was slightly skewed towards more senior members of the community: around 70% described themselves as full, associate or assistant professors. Almost 20% were research scientists from industry. Half of the participants were long-term contributors to the conference, having attended more than 6 editions of ACM MM; however, only around a quarter of the participants had attended the last edition of MM in Seoul, Korea.

First, we asked participants to describe what ACM Multimedia means to them, using 3 words. We aggregated the responses in the word cloud below, where bigger words correspond to higher frequencies. Most participants associated MM with prestigious and high-quality content, and with a high diversity of topics and modalities. While recognizing its prestige, some respondents expressed interest in a modernization of the MM focus.

image2_opinion

Next, we asked respondents “What brings you to ACM Multimedia?”, and provided a set of pre-defined options including “presenting my research”, “networking”, “community building”, “ACM MM is at the core of my scientific interests” and “other” (free text). One in five participants selected all options as relevant to their motivation for attending Multimedia. The large majority of participants (65%) declared that they attend ACM Multimedia to present research and to network. By inspecting the free-text answers in the “other” option, we found that some people were interested in specific tracks, and that others see MM as a good opportunity to showcase research to their graduate students.

The next question was about paper submission: we wanted to characterize what pushes researchers to submit to ACM Multimedia. We prepared 3 different statements capturing different dimensions of analysis, and asked participants to rate them on a 5-point scale, from “Strongly disagree” (1) to “Strongly agree” (5).

The distribution of agreement for each statement is shown in the plot below. Participants tended neither to agree nor disagree that Multimedia is the only possible venue for their papers (average agreement score 2.9); they generally disagreed with the statement “I consider ACM Multimedia mostly to resubmit papers rejected from other venues” (average score 2.0), and strongly agreed with the idea of MM as a premier conference (average score 4.2).

image3_opinion

One of the goals of this survey was to help the future Program Chairs of MM 2019 understand the extent to which participants agree with the reviewers’ guidelines that will be introduced in the next edition of the conference. To this end, we invited respondents to express their agreement with a fundamental point of these guidelines: “Remember that the problem [..] is expected to involve more than a single modality, or [..] how people interpret and use multimedia. Papers that address a single modality only and also fail to contribute new knowledge on human use of multimedia must be rejected as out of scope for the conference”.  Around 60% agreed or strongly agreed with this statement, while slightly more than 25% disagreed or strongly disagreed. The remaining 15% had no opinion about the statement.

We also asked participants to share any further comments regarding this last question or ACM MM in general. People generally approved of the introduction of these reviewing guidelines, and of the emphasis on multiple modalities and on human perception and applications of multimedia. Some suggested that, given the re-focusing implied by these new reviewing guidelines, the instructions should be made more specific, i.e. chairs should clarify the definition of “involve”: how multimodal should a paper be?

Others encouraged the organizers to clarify even further the broader scope of ACM Multimedia, defining its position with respect to other multimedia conferences (MMSys, MMM) as well as to computer vision conferences such as CVPR/ECCV (and to avoid overlapping conference dates).

Some comments proposed rating papers based on their impact on the community, and on the level of innovation even within a single modality, as forcing multiple modalities could “alienate” community members.

Beyond reviewing guidelines, a major theme emerging from the free-text comments was diversity in ACM Multimedia. Several participants called for more geographic diversity among participants and paper authors. Some also noted that more turnover in the organizing committees should be encouraged. Finally, most participants raised the need for more balance in MM topics: while most accepted papers fall under the general umbrella of “Multimedia Content Understanding”, MM should in future encourage more papers about systems, arts, and other emerging topics.

With this bottom-up survey analysis, we aimed to give voice to the major themes that the multimedia community cares about, and we hope to continue doing so in future editions of this column. We would like to thank all the researchers and community members who contributed by shaping and filling in this survey, allowing us to get a broader picture of the community’s perception of ACM MM!

An interview with Géraldine Morin

Please describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

My journey into research was not such a linear path (or ‘straight path’ as some French institutions put it, a criterion for them when hiring)… I started out convinced that I wanted to be a high school math teacher. Since I was accepted into a Math and CS engineering school after a competitive exam, I agreed to study there, working in parallel towards a pure math degree.
The first year, I did manage to follow both curricula (taking two math exams in September), but it was quite a challenge, and in the second year I gave up on the math degree to keep following the engineering curriculum.
I finished with a master’s degree in applied Math (back then fully included in the engineering curriculum) and really enjoyed working on the Master’s thesis (I did my internship in Kaiserslautern, Germany), so I decided to apply for a Ph.D. grant.
I made it into the Ph.D. program in Grenoble and liked my Ph.D. topic in geometric modelling but had a hard time with my advisor there.
So after two years I decided to give up (and passed a motorcycle driving licence), and went on to teach Math in high school for a year (also passing the teacher examination). Encouraged by my former German Master’s thesis advisor, I then applied for a Ph.D. program at Rice University in the US to work with Ron Goldman, a researcher whose work and papers I really liked. I got the position and really enjoyed doing research there.
After a wedding, a kid, and finishing the Ph.D. (in that order), I moved to Germany to live with my husband and found a one-year postdoc position in Berlin. I then applied to Toulouse, where I have stayed since. In Toulouse, I was hired into a Computer Vision research group, where a subgroup of people was tackling problems in multimedia, and they offered me the chance to be the 3D person of their team 🙂

I learned that a career, or research path, is really shaped by the people you meet on your way, for good or bad. Perseverance in something you enjoy is certainly necessary, and not staying in a context that does not fit you is also important! I am glad I did start again after giving up at first, but I also do not regret my choice to give up.

Research topics and areas are important, and a good match with your close collaborators is also very relevant to me. I really enjoy the multimedia community for that reason. The people are open-minded, curious and very encouraging… At multimedia conferences I always feel that my research is valued and relevant to the field (in the other communities, CG or CV, I sometimes get a remark like, ‘oh well, I guess you are not really doing C{G|V}’ …). Multimedia also has a good balance between theory and practice, and that’s fun!

Visit in Chicago during my Ph.D. in the US.


 

Tell us more about the vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?

I have just taken on responsibility for a department, while we are changing the curriculum. This involves a lot of organisational and administrative work, but it also forces me to have a broader vision of how the field of computer science is evolving and of what is important to teach. Interestingly, we are preparing our students for jobs that do not exist yet! This new challenge also makes me realise how important it is to keep time for research, and how much open-mindedness I get from my research activity.

Can you profile your current research, its challenges, opportunities, and implications?

As I mentioned before, my current challenge is to be able to keep on being active in research. I am following two paths: the first is in geometric modeling, trying to bridge the gap between my current interest in skeleton-based models and two hot topics, 3D printing and machine learning.
The second is to continue working in multimedia, on distributing 3D content in a scalable way.
Concerning my involvement, I am also currently co-heading the French geometric modeling group, and I very much enjoy promoting our research community and contributing to keeping it active and recognised.

How would you describe the role of women especially in the field of multimedia?

I participated in my first Women in MM meeting at ACM, and very much appreciated it. I have to admit I was not really interested in women-targeted activities before I took part in my first women’s workshop (WiSH – Women in SHape) in 2013, which brought groups of women together to collaborate for one week… That was a great experience, and it made me realise that, despite the fact that I really enjoy working with my (almost all male) colleagues, it was also fun and very inspiring to work in groups of women. Moreover, having been questioned by younger colleagues about whether a woman can have both a family and a faculty job, I now think that my good experience as a faculty member and mother of 3 should be shared when needed.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

My first contributions were in a quite theoretical field: during my Ph.D. I proposed using analytic functions in a geometric modeling context. That raised some convergence questions, which I managed to settle with proofs.
Later, I really enjoyed working with collaborators. Proposing a shared topic with my colleague Romulus, who worked on streaming, we started to work on 3D streaming in 2006; that led us to collaborate with Wei Tsang Ooi from the National University of Singapore, and for more than 12 years now we have been advancing innovative solutions for the distribution of 3D content, with me working on adapted 3D models and them on system solutions… bringing in new colleagues along the way. We also won the best paper award at ACM MM 2008 for my Ph.D. student’s paper (I am very proud of that, despite the fact that I could not attend the conference; I gave birth between submission and conference ;).

Over your distinguished career, what are your top lessons you want to share with the audience?

A very simple one: enjoy what you do, and work will be fun!
For me, I am amazed that thinking over new ideas always remains so exciting 🙂

What is the best joke you know? 🙂

Hard one!

Jogging in the morning to N Seoul Tower for sunrise, ACM-MM 2018.


 

If you were conducting this interview, what questions would you ask, and then what would be your answers?

I have heard there are very detailed studies, especially in the US, about differences between male and female behaviour.
It seems that being aware of these helps. For example, women tend to judge themselves more harshly than men do…
(that’s not really a question and answer, more a remark :p )

Another try:
Q: What would make you feel confident/helps you get over challenges ?
A: I think I lack self-confidence, and I always ask for a lot of feedback from colleagues (for example for dry runs).
If I get good feedback, it boosts my confidence; if I get worse feedback, it helps me improve… I win both ways 🙂

 


Bios

Assoc. Prof. Géraldine Morin: 

I am an Associate Professor (Maître de conférences) at ENSEEIHT, one of the schools of the Institut National Polytechnique de Toulouse within the Université de Toulouse, and I carry out my research at IRIT (UMR CNRS 5505). Before settling in Toulouse I was in Grenoble, where I graduated from ENSIMAG (engineering degree) and from the Université Joseph Fourier (D.E.A. in applied mathematics), along with a bachelor’s degree in pure mathematics which I followed in parallel with my first year of engineering school. I then did a Ph.D. in Geometric Modelling in the United States at Rice University (“Analytic Functions for Computer Aided Geometric Design”) under the supervision of Ron Goldman. After that, I did a one-year postdoc in computational geometry at the Freie Universität Berlin.