Report on QoMEX 2019: QoE and User Experience in Times of Machine Learning, 5G and Immersive Technologies

QoMEX 2019 was held from 5 to 7 June 2019 in Berlin, with Sebastian Möller (TU Berlin and DFKI) and Sebastian Egger-Lampl (AIT Vienna) as general chairs. In Berlin, the annual conference celebrated its 10th anniversary, the first edition having taken place in 2009 in San Diego. That first edition focused on classic voice and video services. Among the fundamental questions back then were how to measure and quantify quality from the user’s point of view in order to improve such services. Answers to these questions were also presented and discussed at QoMEX 2019, where technical developments and innovations in voice and video quality were considered. The scope has, however, broadened significantly over the last decade: interactive applications, games and immersive technologies, which require new methods for the subjective assessment of perceived quality of service and QoE, were addressed. With a focus on 5G and its implications for QoE, the influence of communication networks and network conditions on the transmission of data and the provisioning of services was also examined. In this sense, QoMEX 2019 looked at both classic multimedia applications such as voice, audio and video and interactive and immersive services: gaming QoE, virtual realities such as VR exergames, augmented realities such as smart shopping, 360° video, point clouds, Web QoE, text QoE, perception of medical ultrasound videos by radiologists, QoE of visually impaired users with appropriately adapted videos, QoE in smart home environments, etc.

In addition to this application-oriented perspective, methodological approaches and fundamental models of QoE were also discussed during QoMEX 2019. While suitable methods for carrying out user studies and assessing quality remain core topics of QoMEX, advanced statistical methods and machine learning (ML) techniques emerged as another focus topic at this year’s QoMEX. The applicability, performance and accuracy of, e.g., neural networks and deep learning approaches were studied for a wide variety of QoE models and in several domains: video quality in games, image quality and compression methods, quality metrics for high-dynamic-range (HDR) images, instantaneous QoE for adaptive video streaming over the Internet and in wireless networks, speech quality metrics, and ML-based voice quality improvement. Research questions addressed at QoMEX 2019 included the impact of crowdsourcing study design on the outcomes and the reliability of crowdsourcing, for example in assessing voice quality. In addition to such data-driven approaches, fundamental theoretical work on QoE and its quantification in systems, as well as fundamental relationships and modelling approaches, was presented.

The TPC chairs were Lynne Baillie (HWU Edinburgh), Tobias Hoßfeld (Univ. Würzburg), Katrien De Moor (NTNU Trondheim) and Raimund Schatz (AIT Vienna). In total, the program included 11 sessions on the above topics, 6 of which were special sessions on dedicated topics organized via an open call. A total of 82 full papers were submitted, out of which 35 were accepted (acceptance rate: 43%). Out of the 77 short papers submitted, 33 were accepted and presented in two dedicated poster sessions. The QoMEX 2019 Best Paper Award went to Dominik Keller, Tamara Seybold, Janto Skowronek and Alexander Raake for “Assessing Texture Dimensions and Video Quality in Motion Pictures using Sensory Evaluation Techniques”. The Best Student Paper Award went to Alexandre De Masi and Katarzyna Wac for “Predicting Quality of Experience of Popular Mobile Applications in a Living Lab Study”.

The keynote speakers addressed several timely topics. Irina Cotanis gave an inspiring talk on QoE in 5G, addressing both the emerging challenges and services in 5G and the question of how to measure quality and QoE in these networks. Katrien De Moor highlighted the similarities and differences between QoE and User Experience (UX), tracing the evolution of the two terms and their current status. She discussed an integrated view of QoE and UX and how the two concepts may develop in the future. In particular, she posed the question of how the two communities could empower each other and what would be needed to bring them together in the future. The final day of QoMEX 2019 began with the keynote of artist Martina Menegon, who presented some of her art projects based on VR technology.

Additional activities and events within QoMEX 2019 comprised the following. (1) In the speed PhD mentoring session organized by Sebastian Möller and Saman Zadtootaghaj, participating doctoral students could apply for a short mentoring session (10 minutes per mentor) with various researchers from industry and academia in order to ask technical or general questions. (2) In a session organized by Sebastian Egger-Lampl, the best works of the last 5 years of the concurrent TVX conference and QoMEX were presented to show the similarities and differences between the QoE and UX communities. This was followed by a panel discussion. (3) A 3-minute madness session, organized by Raimund Schatz and Tobias Hoßfeld, featured short presentations of “crazy” new ideas in a stimulating atmosphere. The intention of this session was to playfully encourage the QoMEX community to generate unconventional new ideas and approaches and to provide a forum for mutual creative inspiration.

The next edition, QoMEX 2020, will be held from May 26th to 28th, 2020, in Athlone, Ireland. More information: http://qomex2020.ie/

Report from MMSYS 2019 – by Alia Sheikh

Alia Sheikh (@alteralias) is researching immersive and interactive content. At present she is interested in the narrative language of immersive environments and how stories can best be choreographed within them.

Being part of an international academic research community and actually meeting said international research community are not, it turns out, exactly the same thing. After attending the ACM MMSys conference this year, I have decided that leaving the office and actually meeting the people behind the research is well worth doing.

This year I was invited to give an overview presentation at ACM MMSys ’19, which was being hosted at the University of Massachusetts. The MMSys, NOSSDAV and MMVE (International Workshop on Immersive Mixed and Virtual Environment Systems) conferences happen back to back, in a different location each year. I was asked to talk about some of our team’s experiments in immersive storytelling at MMVE. This included our current work on lightfields and my work on directing attention in, and the cinematography of, immersive environments.

To be honest it wasn’t the most convenient time to decide to catch a plane to New York and then a train to Boston for a multi-day conference, but it felt like the right time to take a break from the office and find out what the rest of the community had been working on.

Fig. 1: A picturesque scene from the wonderful University of Massachusetts Amherst campus

I arrived at Amherst the day before the conference and (along with another delegate who had taken the same bus) wandered the tranquil university grounds slightly lost before being rescued by the ever calm and cheerful Michael Zink. Michael is the chair of the MMSys organising committee and someone who later spent much of the conference introducing people with shared interests to each other – he appeared to know every delegate by name.

Once installed in my UMass hotel room, I proceeded to spend the evening on my usual pre-conference ritual: entirely rewriting my presentation.

As the timetable would have it, I was going to be the first speaker.

Fig. 2: Attendees at MMSys 2019 taking their seats

Fig. 3: Alia in full flow during our talk on day 1

I don’t actually know why I do this to myself, but there is something about turning up to the event proper that gives you a sense of what will work for that particular audience, and Michael had given me a brilliantly concise snapshot of the type of delegate that MMSys attracts: highly motivated, expert in the nuts and bolts of how to get data to where it needs to be, and likely to be interested in a big-picture overview of how these systems can be used to create a meaningful human connection.

Using selected examples from our research, I put together a talk on how the experience of stories in high-tech immersive environments differs from more traditional formats, and on how, once the language of immersive cinematography is properly understood, we can create new narrative experiences that are both meaningful and emotionally rich.

The next morning I walked into an auditorium full of strangers filing in, gave my talk (I thought it went well?) and then sank happily into a plush red flip-seat chair safe in the knowledge that I was free to enjoy the rest of the event.

The next item was the keynote, and easily one of the best talks I have ever experienced at a conference. Presented by Professor Nimesha Ranasinghe, it was a masterclass in taking an interesting problem (how do we transmit a full sensory experience over a network?) and presenting it in such a way as to neatly break down and explain the science (we can electrically stimulate the tongue to recreate a taste!) while never losing sight of the inherent joy in working on the kind of science you dream of as a child (therefore, electrified cutlery!).

Fig. 4: Professor Nimesha Ranasinghe during his talk on Multisensory experiences

Fig. 5: Multisensory enhanced multimedia – experiences of the future?

Fig. 6: Networking and some delicious lunch

At lunch I discovered the benefit of having presented my talk early: I made a lot of friends among people who had specific questions about our work, and got a useful heads-up on the work they were presenting, either in the afternoon’s long-papers session or in the poster session.

We all spent the evening at the welcome reception on the top floor of the UMass Hotel, where we ate a huge variety of tiny, delicious cakes and got to know each other better. It was obvious that, in some cases, researchers who collaborate remotely all year were using MMSys as an excellent opportunity to catch up. As a newcomer to this ACM conference, however, I found it a very welcoming event, and I met a lot of very friendly people, many of them working on research entirely different from my own but offering an interesting insight or area of overlap.

I wasn’t surprised that I really enjoyed MMVE – virtual environments are very much my topic of interest right now. But I was delighted by how much of MMSys was entirely up my street. ACM MMSys provides a forum for researchers to present and share their latest research findings in multimedia systems, and the conference cuts across all media/data types to showcase the intersections and the interplay of approaches and solutions developed for different domains. This year, the work presented on how best to encode and transport mixed reality content, as well as on predicting head motion to better encode and deliver the part of a spherical panorama a viewer is likely to be looking at, was particularly interesting to me. I wondered whether comparing the predicted path of user attention to the desired path of user attention would teach us how to better control a user’s attention within a panoramic scene, or whether people’s viewing patterns are simply too variable. In the Open Datasets & Software track, I was fascinated by one particular dataset: “A Dataset of Eye Movements for the Children with Autism Spectrum Disorder”. This was a timely reminder for me that diversity within the audience needs to be catered for when designing multimedia systems, to avoid consigning sections of our audience to a substandard experience.

Of the demos, there were too many interesting ones to list, but I was hugely impressed by the demo for Multi-Sensor Capture and Network Processing for Virtual Reality Conferencing. This used cameras and Kinects to turn me into a point cloud and put a live 3D representation of my own physical body in a virtual space. A brilliantly simple and incredibly effective idea – and I found myself sitting next to the people responsible for it at a talk later that day, discussing ways to optimise their data compression.

Despite wearing a headset that allowed me to see the other participants, I was still able to see and therefore use my own hands in the real world – even extending to picking up and using my phone.

Fig. 7: Trying out some cool demos during a bustling demo session

Fig. 8: An example of the social media interaction from my “tweeting”

Amusingly, I found that I was (virtually) sat next to a point-cloud of TNO researcher Omar Niamut which led to my favourite twitter exchange of the whole conference. I knew Omar from online, but we had never actually managed to meet in real life. Still, this was the most life-like digital incarnation yet!

I really should mention the Women’s and Diversity lunch event which (pleasingly) was attended by both men and women and offered some absolutely fascinating insights.

These included: the value of mentors over the course of a successful academic life, how the gender pay gap is inextricably related to work-family policies, and steps that have successfully been taken by some countries and organisations to improve work-life balance for all genders.

It was incredibly refreshing to see these topics being discussed both scientifically and openly. The conversations I had with people afterwards as they opened up about their own experiences of work and parenthood, were among the most interesting I have ever had on the topic.

Another nice surprise – MMSys offers childcare grants available for conference attendees who are bringing small children to the conference and require on-site childcare or who incur extra expenses in leaving their children at home. It was very cheering to see that the Inclusion Policy did not stop at simply providing interesting talks, but also translated into specific inclusive action.

Fig. 9: Women’s and Diversity lunch! What a wonderful initiative – well done MMSys and SIGMM

I am delighted that I made the decision to attend MMSys. I had not realised that I was feeling somewhat detached from my peers and the academic research community in general, until I was put in an environment which contained a concentrated amount of interesting research, interesting researchers and an air of collaboration and sheer good will. It is easy to get tunnel vision when you are focused on your own little area of work, but every conversation I had at the conference reminded me that research does not happen in a vacuum.

Fig. 10: A fascinating talk at the Women’s and Diversity lunch – it initiated great post event discussions!

Fig. 11: The food truck experience – one of many wonderful social aspects to MMSys 2019

I could write a thousand more words about every interesting thing I saw or person I met at MMSys, but that would only give you my own specific experience of the conference. (I did live tweet* a lot of the talks and demos just for my own records and that can all be found here: https://twitter.com/Alteralias/status/1148546945859952640?s=20)

Fig. 12: Receiving the SIGMM Social Media Reporter Award for MMSys 2019!

Whether you were someone I was sitting next to at a paper session, a person I spoke to standing next to in line at the food truck (one of the many sociable meal events) or someone who demoed their PhD work to me, thank you so much for sharing this event with me.

Maybe I will see you at MMSys 2020.

* P.S. It turns out that if you live-tweet an entire conference, Niall gives you a Social Media Reporter award.

Report from QoE-Management 2019

The 3rd International Workshop on Quality of Experience Management (QoE-Management 2019) was a successful full-day event held on February 18, 2019 in Paris, France, where it was co-located with the 22nd Conference on Innovation in Clouds, Internet and Networks (ICIN). After the success of the previous QoE-Management workshops, the third edition was again endorsed by the QoE and Networking Initiative (http://qoe.community). It was organized by workshop co-chairs Michael Seufert (AIT, Austrian Institute of Technology, Austria, who is now at University of Würzburg, Germany), Lea Skorin-Kapov (University of Zagreb, Croatia) and Luigi Atzori (University of Cagliari, Italy). The workshop attracted 24 full paper and 3 short paper submissions. The Technical Program Committee consisted of 33 experts in the field of QoE management, who provided at least three reviews per submitted paper. Eventually, 12 full papers and 1 short paper were accepted for publication, giving an acceptance rate of 48%.

On the day of the workshop, the co-chairs welcomed 30 participants. The workshop started with a keynote given by Martín Varela (callstats.io, Finland) who elaborated on “Some things we might have missed along the way”. He presented open technical and business-related research challenges for the QoE Management community, which he supported with examples from his current research on the QoE monitoring of WebRTC video conferencing. Afterwards, the first two technical sessions focused on video streaming. Susanna Schwarzmann (TU Berlin, Germany) presented a discrete time analysis approach to compute QoE-relevant metrics for adaptive video streaming. Michael Seufert (AIT Austrian Institute of Technology, Austria) reported the results of an empirical comparison, which did not find any differences in the QoE between QUIC- and TCP-based video streaming for naïve end users. Anika Schwind (University of Würzburg, Germany) discussed the impact of virtualization on video streaming behavior in measurement studies. Maria Torres Vega (Ghent University, Belgium) presented a probabilistic approach for QoE assessment based on user’s gaze in 360° video streams with head mounted displays. Finally, Tatsuya Otoshi (Osaka University, Japan) outlined how quantum decision making-based recommendation methods for adaptive video streaming could be implemented.

The next session was centered around machine learning-based quality prediction. Pedro Casas (AIT Austrian Institute of Technology) presented a stream-based machine learning approach for detecting stalling in real-time from encrypted video traffic. Simone Porcu (University of Cagliari, Italy) reported on the results of a study investigating the potential of predicting QoE from facial expressions and gaze direction for video streaming services. Belmoukadam Othmane (Cote D’Azur University & INRIA Sophia Antipolis, France) introduced ACQUA, which is a lightweight platform for network monitoring and QoE forecasting from mobile devices. After the lunch break, Dario Rossi (Huawei, France) gave the second keynote, entitled “Human in the QoE loop (aka the Wolf in Sheep’s clothing)”. He used the main leitmotiv of Web browsing and showed relevant practical examples to discuss the challenges towards QoE-driven network management and data-driven QoE models based on machine learning.

The following technical session was focused on resource allocation. Tobias Hoßfeld (University of Würzburg, Germany) elaborated on the interplay between QoE, user behavior and system blocking in QoE management. Lea Skorin-Kapov (University of Zagreb, Croatia) presented studies on QoE-aware resource allocation for multiple cloud gaming users sharing a bottleneck link. Quality monitoring was the topic of the last technical session. Tomas Boros (Slovak University of Technology, Slovakia) reported how video streaming QoE could be improved by 5G network orchestration. Alessandro Floris (University of Cagliari, Italy) talked about the value of influence factors data for QoE-aware management. Finally, Antoine Saverimoutou (Orange, France) presented WebView, a measurement platform for web browsing QoE. The workshop co-chairs closed the day with a short recap and thanked all speakers and participants, who joined in the fruitful discussions. To summarize, the third edition of the QoE Management workshop proved to be very successful, as it brought together researchers from both academia and industry to discuss emerging concepts and challenges related to managing QoE for network services. As the workshop has proven to foster active collaborations in the research community over the past years, a fourth edition is planned in 2020.

We would like to thank all the authors, reviewers, and attendants for their precious contributions towards the successful organization of the workshop!

Michael Seufert, Lea Skorin-Kapov, Luigi Atzori
QoE-Management 2019 Workshop Co-Chairs

Report from ACM MM 2018 – by Ana García del Molino

Seoul, what a beautiful place to host the premier conference on multimedia! Living in never-ending summer Singapore, I fell in love with the autumn colours of this city. The 26th edition of the ACM International Conference on Multimedia was held on October 22-26 of 2018 at the Lotte Hotel in Seoul, South Korea. It packed a full program including a very diverse range of workshops and tutorials, oral and poster presentations, art exhibits, interactive demos, competitions, industrial booths, and plenty of networking opportunities.

For me, this edition was a special one. About to graduate, with my thesis half written, I was presenting two papers. So of course, I was both nervous and excited. I had to fly to Seoul a few days ahead just to prepare myself! I was so motivated that I somehow managed to get myself a Best Social Media Reporter Award (who would have thought… me! A reporter!).

So, enough with the intro. Let’s get to the juice. What happened in Seoul between the 22nd and 26th of October 2018?

The first and last days of the conference were dedicated to workshops and tutorials, a mix of deep learning themes and social applications of multimedia. The sessions included tutorials like “Interactive Video Search: Where is the User in the Age of Deep Learning?”, which discussed the importance of the user in the collection of datasets, in evaluation, and in interactive search, as opposed to using deep learning to solve challenges with big labelled datasets. In “Deep Learning Interpretation”, Jitao Sang presented the main multimedia problems that can’t be addressed using deep learning. On the other hand, new and important trends related to social media (analysis of information diffusion and contagion, user activities and networking, prediction of real-world events, etc.) were discussed in the tutorial “Social and Political Event Analysis using Rich Media”. The workshops were mainly user-centred, with special interest in affective computing and emotion analysis and its use for multimedia (EE-USAD, ASMMC – MMAC 2018, AVEC 2018).

The conference kick-started with a wonderful keynote by Marianna Obrist. With “Don’t just Look – Smell, Taste, and Feel the Interaction”, she showed us how to bring art into 4D by using technology, driving us through a full sensory experience that let us see, hear, and almost touch and smell. Ernest Edmonds also delved into how to mix art and multimedia in “What has art got to do with it?”, but this time the other way around: what can multimedia research learn from the artists? Three industry speakers completed the keynote program. Xian-Sheng Hua from Alibaba Group shared their efforts towards visual intelligence in “Challenges and Practices of Large-Scale Visual Intelligence in the Real-World”. Gary Geunbae Lee shared Samsung’s AI user experience strategy in “Living with Artificial Intelligence Technology in Connected Devices around Us”. And Bowen Zhou presented JD.com’s brand-new concept of Retail as a Service in “Transforming Retailing Experiences with Artificial Intelligence”.

This year’s program included 209 full papers, from a total of 757 submissions. 64 papers were allocated 15-minute oral presentations, while the others got a 90-second spotlight slot in the fast-forward sessions. The poster sessions and the oral sessions ran at the same time. While this was an inconvenience for poster presenters, who had to either leave their poster to attend the oral sessions or miss them, the coffee breaks took place at the same location as the posters, so that was a win-win: chit-chat while having cookies and fruit? I’m in! In terms of content, half of the submissions went to only two areas: Multimedia and Vision, and Deep Learning for Multimedia. But who am I to judge, when I had two of those myself! Many members of the community noted that the conference is becoming more and more about deep learning, and less multimodal. To compensate, the workshops, tutorials and demos were mostly pure multimedia.

The challenges, competitions, art exhibits and demos happened in the afternoons, so at times it was hard to choose where to head: so many interesting things happening all around the place! The art exhibit had some really cool interactive art installations, such as “Cellular Music”, which created music from visual motion. Among the demos, I found particularly interesting AniDance, an LSTM-based algorithm that made 3D models dance to the given music; SoniControl, an ultrasonic firewall for NFC protection; MusicMapp, a platform to augment how we experience music; and The Influence Map project, which explores who has influenced each scientist, and whom they most influenced over their career.

Regarding diversity, I feel there is still a long way to go. Being in Asia, it makes sense that almost half of the attendees came from China. However, the submission numbers speak for themselves: less than 20% of submissions came from outside Asia, with just one submission from Africa (that’s 0.13%!). Diversity is not only about gender, folks! I feel more efforts are needed to facilitate the integration of more collectives into the multimedia community. One step at a time.

The next edition will take place at the NICE ACROPOLIS Convention Center in Nice, France, from 21 to 25 October 2019. The ACM reproducibility badge system will be implemented for the first time at this 27th edition, so we may be seeing many more open-sourced projects. I am so looking forward to this!

First Combined ACM SIGMM Strategic Workshop and Summer School in Stellenbosch, South Africa

The first combined ACM SIGMM Strategic Workshop and Summer School will be held in Stellenbosch, South Africa, at the beginning of July 2020.

First ACM Multimedia Strategic Workshop

The first Multimedia Strategic Workshop follows the successful series of workshops in areas such as information retrieval. The field of multimedia has continued to evolve and develop: collections of images, sounds and videos have become larger, computers have become more powerful, broadband and mobile Internet are widely supported, complex interactive searches can be done on personal computers or mobile devices, and so on. In addition, as large business enterprises find new ways to leverage the data they collect from users, the gap between the types of research conducted in industry and academia has widened, creating tensions over “repeatability” and “public data” in publications. These changes in environment and attitude mean that the time has come for the field to reassess its assumptions, goals, objectives and methodologies. The goal of the workshop is to bring together researchers to discuss the field’s long-term challenges and opportunities.

The participants of the Multimedia Strategic Workshop will be active researchers in the field of multimedia. The workshop will give them the opportunity to explore long-term issues in the field, to recognise the challenges on the horizon, to reach consensus on key issues, and to describe them in a resulting report that will be made available to the multimedia research community. The report will stimulate debate, provide research directions to both researchers and graduate students, and provide funding agencies with data that can be used to coordinate the support for research.

The workshop will be held at the Wallenberg Research Centre at the Stellenbosch Institute for Advanced Study (STIAS). STIAS provides venues and state-of-the-art equipment for up to 300 conference guests at a time, as well as breakaway rooms.

The First ACM Multimedia Summer School on Multimedia

The motivation for the summer school is to build on the success of the Deep Learning Indaba, but with a focus on the application of machine learning to the field of multimedia. We want delegates to be exposed to current research challenges in multimedia. A secondary goal is to establish and grow the community of African researchers in the field, and to stimulate scientific research and collaboration between African researchers and the international community. The exact topics covered during the summer school will be decided later together with the instructors, but will reflect current research trends in multimedia.

The Strategic Workshop will be followed by the Summer School on Multimedia. Having the first summer school co-located with the Strategic Workshop will help to recruit the best possible instructors for the summer school. 

The Summer School on Multimedia will be held at the Faculty of Engineering at Stellenbosch University, one of South Africa’s major producers of top-quality engineers. The faculty was established in 1944 and is housed in a large complex of buildings with modern facilities, including lecture halls and electronic classrooms.

Stellenbosch is a university town in South Africa’s Western Cape province. It’s surrounded by the vineyards of the Cape Winelands and the mountainous nature reserves of Jonkershoek and Simonsberg. The town’s oak-shaded streets are lined with cafes, boutiques and art galleries. Cape Dutch architecture gives a sense of South Africa’s Dutch colonial history, as do the Village Museum’s period houses and gardens.

For more information about both events, please refer to the events’ web site (africanmultimedia.acm.org) or contact the organizers:

Report from ACM ICMR 2018 – by Cathal Gurrin

Multimedia computing, indexing, and retrieval continue to be among the most exciting and fastest-growing research areas in the field of multimedia technology. ACM ICMR is the premier international conference that brings together experts and practitioners in the field each year. The eighth ACM International Conference on Multimedia Retrieval (ACM ICMR 2018) took place from June 11th to 14th, 2018 in Yokohama, Japan’s second most populous city. ACM ICMR 2018 featured a diverse range of activities including keynote talks, demonstrations, special sessions and related workshops, a panel, a doctoral symposium, industrial talks and tutorials, alongside regular conference papers in oral and poster sessions. The full ICMR 2018 schedule can be found on the ICMR 2018 website <http://www.icmr2018.org/>. The organisers of ACM ICMR 2018 placed a large emphasis on generating a high-quality programme; in 2018, ICMR received 179 submissions to the main conference, with 21 accepted for oral presentation and 23 for poster presentation. A number of key themes emerged from the published papers at the conference: deep neural networks for content annotation; multimodal event detection and summarisation; novel multimedia applications; multimodal indexing and retrieval; and video retrieval from regular & social media sources. In addition, a strong emphasis on the user (in terms of end-user applications and user-predictive models) was noticeable throughout the ICMR 2018 programme. Indeed, the user theme was central to many of the components of the conference, from the panel discussion to the keynotes, workshops and special sessions. One of the most memorable elements of ICMR 2018 was a panel discussion on the ‘Top Five Problems in Multimedia Retrieval’ <http://www.icmr2018.org/program_panel.html>.
The panel was composed of leading figures in the multimedia retrieval space: Tat-Seng Chua (National University of Singapore), Michael Houle (National Institute of Informatics), Ramesh Jain (University of California, Irvine), Nicu Sebe (University of Trento) and Rainer Lienhart (University of Augsburg). An engaging panel discussion was facilitated by Chong-Wah Ngo (City University of Hong Kong) and Vincent Oria (New Jersey Institute of Technology). The common theme was that multimedia retrieval is a hard challenge and that there are a number of fundamental topics in which we need to make progress, including bridging the semantic and user gaps, improving approaches to multimodal content fusion, neural network learning, and addressing the challenges of processing at scale and the so-called “curse of dimensionality”. ICMR 2018 included two excellent keynote talks <http://www.icmr2018.org/program_keynote.html>. Firstly, Kohji Mitani, the Deputy Director of the Science & Technology Research Laboratories of NHK (Japan Broadcasting Corporation), described the ongoing evolution of broadcast technology and the efforts underway to create new (connected) broadcast services that can provide viewing experiences never before imagined and user experiences more attuned to daily life. The second keynote, from Shunji Yamanaka of The University of Tokyo, discussed his experience of prototyping new user technologies and highlighted the importance of prototyping as a process that bridges an ever-increasing gap between advanced technological solutions and societal users. During this entertaining and inspiring talk many prototypes developed in Yamanaka’s lab were introduced and the related vision explained to an eager audience. Three workshops were accepted for ACM ICMR 2018, covering the fields of lifelogging, art and real-estate technologies. 
Interestingly, all three workshops focused on domain-specific applications in three emerging fields for multimedia analytics, all related to users and the user experience. The “LSC2018 – Lifelog Search Challenge” workshop <http://lsc.dcu.ie/2018/> was a novel and highly entertaining workshop modelled on the successful Video Browser Showdown series of participation workshops at the annual MMM conference. LSC was a participation workshop, which means that each group of participants wrote a paper describing a prototype interactive retrieval system for multimodal lifelog data; the systems were then evaluated in a live interactive search challenge during the workshop. Six prototype systems took part in the search challenge in front of an audience that reached fifty conference attendees. This was a popular and exciting workshop and could become a regular feature at future ICMR conferences. The second workshop was the MM-Art & ACM workshop <http://www.attractiveness-computing.org/mmart_acm2018/index.html>, a joint workshop that merged two existing workshops, the International Workshop on Multimedia Artworks Analysis (MMArt) and the International Workshop on Attractiveness Computing in Multimedia (ACM). The aim of the joint workshop was to enlarge the scope of the issues discussed and to inspire more work in related fields. The papers at the workshop focused on the creation, editing and retrieval of art-related multimedia content. The third workshop was RETech 2018 <https://sites.google.com/view/multimedia-for-retech/>, the first international workshop on multimedia for real estate tech. In recent years there has been a huge uptake of multimedia processing and retrieval technologies in this domain, but many challenges remain, such as quality, cost, sensitivity, diversity, and the attractiveness of content to users. 
In addition, ICMR 2018 included three tutorials <http://www.icmr2018.org/program_tutorial.html> on topical areas for the multimedia retrieval community. The first was ‘Objects, Relationships and Context in Visual Data’ by Hanwang Zhang and Qianru Sun. The second was ‘Recommendation Technologies for Multimedia Content’ by Xiangnan He, Hanwang Zhang and Tat-Seng Chua, and the final tutorial was ‘Multimedia Content Understanding by Learning from very few Examples’ by Guo-Jun Qi. All tutorials were well received and the feedback was very good. Other aspects of note from ICMR 2018 were a doctoral symposium that attracted five authors and a dedicated industrial session with four industrial talks highlighting the multimedia retrieval challenges faced by industry. It was interesting to hear from the industrial talks how the analytics and retrieval technologies developed over the years and presented at venues such as ICMR are actually being deployed in real-world user applications by large organisations such as NEC and Hitachi. It is always a good idea to listen to the real-world applications of the research carried out by our community. The best paper session at ICMR 2018 featured four top-ranked works covering multimodal, audio and text retrieval. The best paper award went to ‘Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval’ by Niluthpol Mithun, Juncheng Li, Florian Metze and Amit Roy-Chowdhury. The Best Multi-Modal Paper Award went to ‘Cross-Modal Retrieval Using Deep De-correlated Subspace Ranking Hashing’ by Kevin Joslyn, Kai Li and Kien Hua. In addition, there were awards for best poster, ‘PatternNet: Visual Pattern Mining with Deep Neural Network’ by Hongzhi Li, Joseph Ellis, Lei Zhang and Shih-Fu Chang, and best demo, ‘Dynamic construction and manipulation of hierarchical quartic image graphs’ by Nico Hezel and Kai Uwe Barthel. 
Finally, although such contributions are often overlooked, six reviewers were commended for their outstanding reviews: Liqiang Nie, John Kender, Yasushi Makihara, Pascal Mettes, Jianquan Liu, and Yusuke Matsui. As with some other ACM-sponsored conferences, ACM ICMR 2018 included an award for the most active social media commentator, which is how I ended up writing this report. There were a number of active social media commentators at ICMR 2018, each of whom provided a valuable commentary on the proceedings and added to the historical archive.

Of course, the social side of a conference can be as important as the science. ICMR 2018 included two main social events, a welcome reception and the conference banquet. The welcome reception took place at the Fisherman’s Market, an Asian and ethnic dining experience with a wide selection of Japanese food available. The Conference Banquet took place in the Hotel New Grand, which was built in 1927 and has a long history of attracting famous guests. The venue is famed for the quality of the food and the spectacular panoramic views of the port of Yokohama. As with the rest of the conference, the banquet food was top-class with more than one of the attendees commenting that the Japanese beef on offer was the best they had ever tasted.

ICMR 2018 was an exciting and excellently organised conference, and it is important to acknowledge the efforts of the general co-chairs: Kiyoharu Aizawa (The Univ. of Tokyo), Michael Lew (Leiden Univ.) and Shin’ichi Satoh (National Inst. of Informatics). They were ably assisted by the TPC co-chairs, Benoit Huet (Eurecom), Qi Tian (Univ. of Texas at San Antonio) and Keiji Yanai (The Univ. of Electro-Communications), who coordinated the reviews from a 111-person program committee in a double-blind manner, with an average of 3.8 reviews prepared for every paper. ICMR 2019 will take place in Ottawa, Canada in June 2019 and ICMR 2020 will take place in Dublin, Ireland in June 2020. I hope to see you all there, continuing the tradition of excellent ICMR conferences.

The Lifelog Search Challenge Workshop attracted six teams for a real-time public interactive search competition.


Shunji Yamanaka about to begin his keynote talk on Prototyping


Kiyoharu Aizawa and Shin'ichi Satoh, two of the ICMR 2018 General co-Chairs welcoming attendees to the ICMR 2018 Banquet at the historical Hotel New Grand.


SISAP 2018: 11th International Conference on Similarity Search and Applications

The International Conference on Similarity Search and Applications (SISAP) is an annual forum for researchers and application developers in the area of similarity data management. It aims at the technological problems shared by numerous application domains, such as data mining, information retrieval, multimedia, computer vision, pattern recognition, computational biology, geography, biometrics, machine learning, and many others that make use of similarity search as a necessary supporting service.

From its roots as a regional workshop in metric indexing, SISAP has expanded to become the only international conference entirely devoted to the issues surrounding the theory, design, analysis, practice, and application of content-based and feature-based similarity search. The SISAP initiative has also created a repository serving the similarity search community, for the exchange of examples of real-world applications, the source code for similarity indexes, and experimental testbeds and benchmark data sets (http://www.sisap.org). The proceedings of SISAP are published by Springer as a volume in the Lecture Notes in Computer Science (LNCS) series.

The 2018 edition of SISAP was held at the Universidad de Ingeniería y Tecnología (UTEC), in one of the oldest neighborhoods of Lima, in a modern building only recently inaugurated. The conference was held back-to-back (with a shared session) with the International Symposium on String Processing and Information Retrieval (SPIRE), an independent symposium whose scope partly overlaps with SISAP’s. The organization was smooth, with a strong technical program assembled by two co-chairs and sixty program committee members. Each paper was reviewed by at least three referees. The program was completed by three invited speakers of high caliber.

During this 11th edition of SISAP, the first invited speaker was Hanan Samet (http://www.cs.umd.edu/~hjs/) from the University of Maryland, a pioneer in the similarity search field with several books published on the subject. Professor Samet presented a state-of-the-art news search system that uses the geographical location of the user to deliver more accurate results. The second invited speaker was Alistair Moffat (https://people.eng.unimelb.edu.au/ammoffat/) from the University of Melbourne, who delivered a talk about a novel technique for building compressed indexes using Asymmetric Numeral Systems (ANS). ANS is a curious case of a scientific breakthrough not published in a peer-reviewed venue: although it is available only as an arXiv technical report, its adoption in industry has been widespread, from Google and Facebook to Amazon. The third keynote talk was delivered in the shared session with SPIRE by Moshe Vardi (https://www.cs.rice.edu/~vardi/) of Rice University, a most celebrated editor of Communications of the ACM. Professor Vardi’s talk was an eye-opening discussion of jobs conquered by machines and the prospects for accepting technological change in everyday life. In the same shared session, a keynote presentation for SPIRE was given by Nataša Przulj (http://www0.cs.ucl.ac.uk/staff/natasa/) of University College London, concerning molecular networks and the challenges researchers face in developing a better understanding of them. It is worth noting that roughly 10% of the SPIRE participants were inspired to attend the SISAP technical program.

As is usually the case, SISAP 2018 included a program with papers exploring various similarity-aware data analysis and processing problems from multiple perspectives. The papers presented at the conference in 2018 studied the role of similarity processing in the context of metric search, visual search, nearest neighbor queries, clustering, outlier detection, and graph analysis. Some of the papers had a theoretical emphasis, while others had a systems perspective, presenting experimental evaluations against state-of-the-art methods. An interesting event at the 2018 conference, as at the two previous editions, was a poster session that included all accepted papers. This component of the conference generated many lively interactions between presenters and attendees, who could not only learn more about the presented techniques but also identify potential topics for future collaboration.

A shortlist for the Best Paper Award was created from those conference papers nominated by at least one of their three reviewers. An award committee of three researchers ranked the shortlisted papers, and the final ranking was decided using a Borda count. The Best Paper Award was presented during the conference dinner. In a tradition that began with the 2009 conference in Prague, extended versions of the top-ranked papers were invited for a special issue of the Information Systems journal.
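The Borda count used by the award committee is a simple positional voting rule: each committee member ranks the shortlisted papers, a paper ranked i-th out of n receives n - 1 - i points, and points are summed across members. A minimal sketch (the paper names and rankings below are made up for illustration, not the actual committee data):

```python
from collections import defaultdict

def borda(rankings):
    """Each ranking lists candidates from best to worst.
    A candidate ranked i-th among n receives n - 1 - i points."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, paper in enumerate(ranking):
            scores[paper] += n - 1 - i
    # Highest total score first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Three hypothetical committee members ranking three shortlisted papers.
committee = [
    ["paper_A", "paper_B", "paper_C"],
    ["paper_B", "paper_A", "paper_C"],
    ["paper_A", "paper_C", "paper_B"],
]
print(borda(committee))  # paper_A wins with 5 points
```

One appeal of the rule is that it rewards broad support: a paper ranked second by everyone can beat a polarizing paper with a few first-place votes.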

The venue and the location of SISAP 2018 deserve a special mention. In addition to the excellent conference facilities at UTEC, we had many student volunteers who were ready to help ensure that the logistical aspects of the conference ran smoothly. Lima was a superb location for the conference. Our conference dinner was held at the Huaca Pucllana Restaurant, located on the site of amazing archaeological remains within the city itself. We also had many opportunities to enjoy excellently-prepared traditional Peruvian food and drink. Before and after the conference, many participants chose to visit Machu Picchu, voted as one of the New Seven Wonders of the World.

SISAP 2018 demonstrated that the SISAP community has a strong, stable kernel of researchers who are active in the field of similarity search and committed to fostering the growth of the community. Organizing SISAP is a smooth experience thanks to the support of the Steering Committee and dedicated participants.

SISAP 2019 will be organized in Newark (NJ, USA) by Professor Vincent Oria (NJIT). This attractive location in the New York City metropolitan area will allow for easy and convenient travel to and from the conference. One of the major challenges of the SISAP conference series is to continue to raise its profile in the landscape of scientific events related to information indexing, database and search systems.

Figure 1. The conference dinner at Pachacamac ruins


Figure 2. After the very interesting technical sessions, we ended the conference with an excursion to Lima downtown


Figure 3. Keynote by Vardi


Report from the SIGMM Emerging Leaders Symposium 2018

The idea of a symposium to bring together the bright new talent within the SIGMM community and to hear their views on some topics within the area and on the future of multimedia was first mooted in 2014 by Shih-Fu Chang, then SIGMM Chair. That led to the “Rising Stars Symposium” at the MULTIMEDIA Conference in 2015, where 12 invited speakers presented their work as a satellite event to the main conference. After each presentation a respondent, typically an experienced member of the SIGMM community, gave a response or personal interpretation of the presentation. The format worked well and was very thought-provoking, though some people felt that a shorter event, more integrated into the conference, might work better.

For the next year, 2016, the event was run a second time with six invited speakers and was indeed more integrated into the main conference. The event skipped a year in 2017, but was brought back for the MULTIMEDIA Conference in 2018, and this time, rather than inviting speakers, we decided to have an open call with nominations, making selection for the symposium a competitive process. We also decided to rename the event from the Rising Stars Symposium to the “SIGMM Emerging Leaders Symposium”, to avoid confusion with the “SIGMM Rising Star Award”, which is completely different and is awarded annually.

In July 2018 we issued a call for applications to the “Third SIGMM Emerging Leaders Symposium, 2018”, which was held at the annual MULTIMEDIA Conference in Seoul, Korea, in October 2018. The applications received were evaluated by a panel consisting of the following people, whom we thank for volunteering and for their support:

Werner Bailer, Joanneum Research
Guillaume Gravier, IRISA
Frank Hopfgartner, Sheffield University
Hayley Hung, Delft University (a previous awardee)
Marta Mrak, BBC

Based on the assessment panel’s recommendations, four speakers were included in the Symposium, namely:

Hanwang Zhang, Nanyang Technological University, Singapore
Michael Riegler, Simula, Norway
Jia Jia, Tsinghua University, China
Liqiang Nie, Shandong University, China

The Symposium took place on the last day of the main conference and was chaired by Gerald Friedland, SIGMM Conference Director.


Towards X Visual Reasoning

by Hanwang Zhang (Nanyang Technological University, Singapore)

For decades, we have been interested in detecting objects and classifying them into a fixed lexicon. With the maturity of these “low-level” vision solutions, we hunger for a “higher-level” representation of visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making. In particular, we wish for “X” reasoning, where X means eXplainable and eXplicit. In this talk, I first reviewed a brief history of symbolism and connectionism, which have alternately driven the development of AI over the past decades. In particular, though deep neural networks (the prevailing incarnation of connectionism) have shown impressive super-human performance in various tasks, they still lag behind us in high-level reasoning. Therefore, I propose a marriage between symbolism and connectionism that takes their complementary advantages: the proposed X visual reasoning. Second, I introduced the two building blocks of X visual reasoning: visual knowledge acquisition by scene graph detection, and X neural modules applied on the knowledge for reasoning. For scene graph detection, I introduced our recent progress on reinforcement learning of scene dynamics, which helps to generate coherent scene graphs that respect visual context. For X neural modules, I discussed our most recent work on module design, algorithms, and applications in various visual reasoning tasks such as visual Q&A, natural language grounding, and image captioning. Finally, I outlined some future directions towards X visual reasoning, such as using meta-learning and deep reinforcement learning for more dynamic and efficient X neural module composition.

Professor Ramesh Jain remarked that truly X reasoning should consider the potential for human-computer interaction to change or divert a current reasoning path. This is crucial because human intelligence can reasonably respond to interruptions and incoming evidence.

We can position X visual reasoning in the recent trend of neural-symbolic unification, which is gradually becoming a consensus path towards general AI. The “neural” is good at representation learning and model training, while the “symbolic” is good at knowledge reasoning and model explanation. One should bear in mind that future multimedia systems should take the complementary advantages of both.

BioMedia – The Important Role of Multimedia Research for Healthcare

by Michael Riegler (SimulaMet & University of Oslo, Norway)

With the recent rise of machine learning, the analysis of medical data has become a hot topic. Nevertheless, the analysis is still often restricted to particular types of images, such as those from radiology or CT scans. However, vast amounts of multimedia data are continuously collected, both within healthcare systems and by users themselves with devices such as cameras, sensors and mobile phones.

In this talk I focused on the potential of multimedia data and applications to improve healthcare systems. First, the various data sources were discussed. A person’s health is reflected in many data sources such as images, videos, text and sensors. Medical data can also be divided into data with hard and soft ground truth. Hard ground truth means that a procedure verifies the labels of the given data (for example, a biopsy report for a cancerous tissue sample); soft ground truth is data labeled by medical experts without verification of the outcome. Different data types also come with different levels of privacy risk: for example, activity data from sensors are unlikely to identify the patient, whereas speech, social media and GPS data come with a higher chance of identification. Finally, it is important to take context into account, and results should be explainable and reproducible. This was followed by a discussion of the importance of multimodal data fusion and context-aware analysis, supported by three example use cases: mental health, artificial reproduction and colonoscopy.

I also discussed the importance of involving medical experts and patients as users. Medical experts and patients are two different user groups with different needs and requirements. One common requirement for both groups is the need for an explanation of how decisions were taken. In addition, medical experts are mainly interested in support for their daily tasks, but are not very interested in, for example, huge amounts of sensor data from patients, because of the increased workload; they prefer interacting with patients over interacting with data. Patients, on the other hand, usually want to collect a lot of data and be informed about their current status, but are more concerned about their privacy. They also usually want medical experts to take as much data as possible into account when making their assessments.

Professor Susanne Boll noted that it is important to find out what is needed for automatic analysis to be accepted by hospitals, and who takes responsibility for decisions made by automatic systems. Understandability and reproducibility of methods were mentioned as an important first step.

The most relevant messages of the talk are that the multimedia community has the diverse skills needed to address several challenges related to medicine, and that it is important to focus on explainable and reproducible methods.

Mental Health Computing via Harvesting Social Media Data

by Jia Jia (Tsinghua University, China)

Nowadays, with the rapid pace of life, mental health is receiving widespread attention. Common symptoms like stress, and clinical disorders like depression, are quite harmful, so it is of vital importance to detect mental health problems before they lead to severe consequences. Professional diagnostic criteria like the International Classification of Diseases (ICD-10 [1]) and the Diagnostic and Statistical Manual of Mental Disorders (DSM [2]) define distinguishing behaviors in daily life that help diagnose disorders. However, traditional interventions based on face-to-face interviews or self-report questionnaires are expensive and lag behind the onset of symptoms. A potential antipathy towards consulting psychiatrists exacerbates these problems.

Social media platforms, like Twitter and Weibo, have become increasingly prevalent ways for users to express themselves and interact with friends. The user-generated content (UGC) shared on such platforms may help us to better understand users’ real-life state and emotions in a timely manner, making the analysis of users’ mental wellness feasible. Building on these observations, research efforts have been devoted to the early detection of mental problems.

In this talk, I focused on the timely detection of mental wellness problems, concentrating on two typical ones: stress and depression. Starting with binary user-level detection, I expanded the research by considering the trigger and the severity of the mental problems, involving different social media platforms that are popular in different cultures. I presented my recent progress from three perspectives:

  1. Through self-reported sentence pattern matching, I constructed a series of large-scale well-labeled datasets in the field of online mental health analysis;
  2. Based on previous psychological research, I extracted multiple groups of discriminating features for detection and presented several multi-modal models targeting different contexts. I conducted extensive experiments with my models, demonstrating significantly better performance compared to state-of-the-art methods; and
  3. I investigated in detail the contribution of each feature, of online behaviors, and even of cultural differences in different contexts. I managed to reveal behaviors not covered by traditional psychological criteria, and provided new perspectives and insights for current and future research.

The mental health care applications I developed were also demonstrated at the end.

Dr. B. Prabhakaran pointed out that understanding mental health is a difficult problem, even for trained doctors, and that we will need to work with psychiatrists sooner rather than later. Thanks to his valuable comments regarding possible future directions, I envisage using augmented/mixed reality to create different immersive “controlled” scenarios in which human behavior can be studied. I am considering, for example, creating stressful situations (such as exams or missing a flight) to better understand depression. For depression especially, I plan to incorporate EEG sensor data in my studies.

[1] https://www.who.int/classifications/icd/en/

[2] https://www.psychiatry.org/psychiatrists/practice/dsm

Towards Micro-Video Understanding

by Liqiang Nie (Shandong University, China)

We are living in an era of ever-dwindling attention spans. To feed our hunger for quick content, bite-sized videos embracing the philosophy of “shorter is better” are becoming popular with the rise of micro-video sharing services. Typical services include Vine, Snapchat, Viddy, and Kwai. Micro-videos have spread like wildfire and are taking over the content and social media marketing space, by virtue of their brevity, authenticity, communicability, and low cost. Micro-videos can benefit many commercial applications, such as brand building. Despite their value, the analysis and modeling of micro-videos is non-trivial for the following reasons:

  1. micro-videos are short in length and of low quality;
  2. they can be described by multiple heterogeneous channels, spanning from social, visual, and acoustic to textual modalities;
  3. they are organized into a hierarchical ontology in terms of semantic venues; and
  4. there is no available benchmark dataset for micro-videos.

In my talk, I introduced some shallow and deep learning models for micro-video understanding that are worth studying and have proven effective:

  1. Popularity prediction. Among the large volume of micro-videos, only a small portion will be widely viewed by users, while most gain little attention. Obviously, if we can identify hot and popular micro-videos in advance, it will benefit many applications, like online marketing and network resource reservation;
  2. Venue category estimation. In a random sample of over 2 million Vine videos, I found that only 1.22% of the videos are associated with venue information. Location information about videos can benefit many applications, such as footprint recording, personalization, and other location-based services; it is thus highly desirable to infer the missing geographic cues;
  3. Low-quality sound. As the quality of the acoustic signal is usually relatively low, simply integrating acoustic features with visual and textual features often leads to suboptimal results, or even adversely degrades overall performance.

In the future, I may pursue other meaningful tasks such as micro-video captioning or tagging and the detection of unsuitable content. Many micro-videos are annotated with erroneous words, i.e., topic tags or descriptions that are not well correlated with the content, and this negatively influences other applications such as textual query search. It is also common for users to upload violent and erotic videos. At present, detection and alerting mainly rely on labor-intensive inspection. I plan to create systems that automatically detect erotic and violent content.

During the presentation, the audience asked about the datasets used in my work. In my previous work, all the videos came from Vine, but this service has been shut down, and the audience wondered how I will build datasets in the future. As there are many other micro-video sites, such as Kwai and Instagram, I can obtain sufficient data from them to support my further research.

Interview with Dr. Magda El Zarki and Dr. De-Yu Chen: winners of the Best MMSys’18 Workshop paper award

Abstract

The ACM Multimedia Systems conference (MMSys’18) was recently held in Amsterdam from 12-15 June 2018. The conference brings together researchers in multimedia systems. Four workshops were co-located with MMSys, namely PV’18, NOSSDAV’18, MMVE’18, and NetGames’18. In this column we interview Magda El Zarki and De-Yu Chen, the authors of the best workshop paper, entitled “Improving the Quality of 3D Immersive Interactive Cloud-Based Services Over Unreliable Network”, which was presented at MMVE’18.

Introduction

The ACM Multimedia Systems Conference (MMSys) (mmsys2018.org) was held from 12-15 June in Amsterdam, The Netherlands. The MMSys conference provides a forum for researchers to present and share their latest research findings in multimedia systems. MMSys is a venue for researchers who explore complete multimedia systems that provide a new kind of multimedia experience or whose overall performance improves on the state-of-the-art. This touches aspects of many hot topics including, but not limited to: adaptive streaming, games, virtual reality, augmented reality, mixed reality, 3D video, Ultra-HD, HDR, immersive systems, plenoptics, 360° video, multimedia IoT, multi- and many-core, GPGPUs, mobile multimedia and 5G, wearable multimedia, P2P, cloud-based multimedia, cyber-physical systems, multi-sensory experiences, smart cities, and QoE.

Four workshops were co-located with MMSys in Amsterdam in June 2018. The paper titled “Improving the Quality of 3D Immersive Interactive Cloud-Based Services Over Unreliable Network” by De-Yu Chen and Magda El Zarki from the University of California, Irvine was awarded the Comcast Best Workshop Paper Award for MMSys 2018, chosen from among papers from the following workshops: 

  • MMVE’18 (10th International Workshop on Immersive Mixed and Virtual Environment Systems)
  • NetGames’18 (16th Annual Workshop on Network and Systems Support for Games)
  • NOSSDAV’18 (28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video)
  • PV’18 (23rd Packet Video Workshop)

We approached the authors of the best workshop paper to learn about the research leading up to their paper. 

Could you please give a short summary of the paper that won the MMSys 2018 best workshop paper award?

In this paper we discussed our approach to an adaptive 3D cloud gaming framework. We utilized a collaborative rendering technique to generate part of the content on the client, so that the network bandwidth required for streaming the content can be reduced. We also made use of progressive meshes, so the system can dynamically adapt to changing performance requirements and resource availability, including network bandwidth and computing capacity. We conducted experiments focused on the system’s performance under unreliable network connections, e.g., when packets can be lost. Our experimental results show that the proposed framework is more resilient under such conditions, which indicates that the approach has potential advantages, especially for mobile applications.
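The adaptation idea above can be sketched roughly as follows: a progressive mesh consists of a base mesh plus refinement layers, so the client can pick how many layers to stream given the measured bandwidth. This is only an illustrative sketch; the function, layer sizes and numbers are hypothetical and not taken from the paper:

```python
def pick_refinement_level(level_bits_per_frame, fps, bandwidth_bps):
    """Return the highest number of progressive-mesh layers whose
    cumulative bitrate (bits/frame * frames/s) fits the bandwidth budget."""
    cumulative_bits = 0
    chosen = 0
    for level, bits in enumerate(level_bits_per_frame, start=1):
        cumulative_bits += bits
        if cumulative_bits * fps <= bandwidth_bps:
            chosen = level  # this many layers still fit
        else:
            break
    return chosen

# Hypothetical base mesh plus three refinement layers, sizes in bits/frame.
layers = [40_000, 30_000, 30_000, 50_000]
print(pick_refinement_level(layers, 30, 3_000_000))  # -> 3
```

A real system would re-run such a decision as bandwidth estimates change, trading mesh detail against the client's rendering capacity as the paper describes.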

Does the work presented in the paper form part of some bigger research question / research project? If so, could you perhaps give some detail about the broader research that is being conducted?

A more complete discussion of the proposed framework can be found in our technical report, Improving the Quality and Efficiency of 3D Immersive Interactive Cloud Based Services by Providing an Adaptive Application Framework for Better Service Provisioning, where we discussed the performance trade-offs between video quality, network bandwidth, and local computation on the client. In this report, we also tried to tackle network latency issues by utilizing the 3D image warping technique. In another paper, Impact of information buffering on a flexible cloud gaming system, we further explored the potential performance improvement of our latency reduction approach when more information can be cached and processed.

We received many valuable suggestions and identified a few important future directions. Unfortunately, De-Yu has graduated and decided to pursue a career in industry, so he is unlikely to be able to continue working on this project in the near future.

Where do you see the impact of your research? What do you hope to accomplish?

Cloud gaming is an up-and-coming area. Major players like Microsoft and NVIDIA have already launched their own projects. However, it seems to me that there is not yet a solution good enough to be widely accepted by users. By providing an alternative approach, we wanted to demonstrate that there are still many unsolved issues and research opportunities, and hopefully inspire further work in this area.

Describe your journey into multimedia research. Why were you initially attracted to multimedia?

De-Yu: My research interest in cloud gaming systems dates back to 2013, when I worked as a research assistant at Academia Sinica, Taiwan. When I first joined Dr. Kuan-Ta Chen’s lab, my background was in parallel and distributed computing. I joined the lab for a project aimed at providing a tool to help developers do load balancing in massively multiplayer online video games. Later on, I had the opportunity to participate in the lab’s other project, GamingAnywhere, which aimed to build the world’s first open-source cloud gaming system. Being an enthusiastic gamer myself, having the opportunity to work on such a project was a really enjoyable and valuable experience. That experience became the main reason for continuing to work in this area. 

Magda El Zarki: I have worked in multimedia research since the 1980s, when my PhD project involved the transmission of data, voice and video over a LAN. It was named MAGNET and was one of the first integrated LANs developed for multimedia transmission. My work continued in that direction with the transmission of video over IP. In conjunction with several PhD students over the past 20–30 years, I have developed several tools for the study of video transmission over IP (MPEGTool) and hold several patents related to video over wireless networks. All the work focused on improving the quality of the video via pre- and post-processing of the signal.

Can you profile your current research, its challenges, opportunities, and implications?

There are quite a few challenges in our research. First of all, our approach is an intrusive method: we need to modify the source code of the interactive applications, e.g. games, to apply it. We found it very hard to find a suitable open source game whose source code is neat, clean and easy to modify. Developing our own fully functioning game is not a reasonable approach, alas, due to the complexity involved. We ended up building a 3D virtual environment walkthrough application to demonstrate our idea. Most reviewers have expressed concerns about synchronization issues in a real interactive game, where there may be AI-controlled objects, non-deterministic processes, or even objects controlled by other players. We agree with the reviewers that this is a very important issue, but currently it is very hard for us to address it with our limited resources. Most other research work in this area faces similar problems to ours – the lack of a viable open source game for researchers to modify. As a result, researchers are forced to build their own prototype applications for performance evaluation purposes. This brings about another challenge: it is very hard to fairly compare the performance of different approaches given that we all use different applications for testing. However, these difficulties can also be seen as opportunities. There are still many unsolved problems. Some of them may require a lot of time, effort, and resources, but even a little progress can mean a lot, since cloud gaming is an area that is gaining more and more attention from industry as a way to distribute games over many platforms.

“3D immersive and interactive services” seems to encompass both massive multi-user online games as well as augmented and virtual reality. What do you see as important problems for these fields? How can multimedia researchers help to address these problems?

When it comes to gaming or similar interactive applications, it all comes down to the user experience. In the case of cloud gaming, there are many performance metrics that can affect user experience, and identifying which of them matter most to users is one of the important problems. In my opinion, interactive latency is the most difficult problem to solve among all performance metrics. There is no trivial way to reduce network latency unless you are willing to pay the cost of large bandwidth pipes. Edge computing may effectively reduce network latency, but it comes with a high deployment cost.

As large companies start developing their own systems, it is getting harder and harder for independent researchers with limited funding and resources to make major contributions in this area. Still, we believe there are a couple of ways in which independent researchers can make a difference. First, we can limit the scope of the research by simplifying the system, focusing on just one or a few features or components. Unlike corporations, independent researchers usually do not have the resources to build a fully functional system, but we also do not have the obligation to deliver one. That actually enables us to try out some interesting but not so realistic ideas. Second, we can be open to collaboration. Unlike corporations, which need to keep their projects confidential, we have more freedom to share what we are doing and potentially get more feedback from others. To sum up, I believe that in an area that has already attracted a lot of interest from industry, researchers should try to find something that companies cannot or are not willing to do, instead of trying to compete with them.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

The real question is: is cloud gaming viable? It seems to make economic sense to offer it as companies try to reach a broader and more remote audience. However, computing costs are cheaper than bandwidth costs, so maybe throwing computing power at the problem makes more sense – make more powerful end devices that can handle the computing load of a complex game and only use the network for player interactivity.

Biographies of MMSys’18 Best Workshop Paper Authors

Prof Magda El Zarki (Professor, University of California, Irvine):

Magda El Zarki

Prof. El Zarki’s lab focuses on multimedia transmission over the Internet. The work consists of both theoretical studies and practical implementations to test the algorithms and new mechanisms to improve quality of service on the user device. Both wireline and wireless networks and all types of video and audio media are considered. Recent work has shifted to networked games and massively multi user virtual environments (MMUVE). Focus is mostly on studying the quality of experience of players in applications where precision and time constraints are a major concern for game playability. A new effort also focuses on the development of games and virtual experiences in the arena of education and digital heritage.

De-Yu Chen (PhD candidate, University of California, Irvine):

De-Yu Chen

De-Yu Chen is a PhD candidate at UC Irvine. He received his M.S. in Computer Science from National Taiwan University in 2009, and his B.B.A. in Business Administration from National Taiwan University in 2006. His research interests include multimedia systems, computer graphics, big data analytics and visualization, parallel and distributed computing, and cloud computing. His most recent research project focuses on improving the quality and flexibility of cloud gaming systems.

Report from ACM MMSYS 2018 – by Gwendal Simon

While I was attending the MMSys conference (last June in Amsterdam), I tweeted about my personal highlights of the conference, in the hope of sharing them with those who did not have the opportunity to attend. Fortunately, I was chosen as the “Best Social Media Reporter” of the conference, a new award given by the ACM SIGMM chapter to promote sharing among researchers on social networks. To celebrate this award, here is a more complete report on the conference!

When I first heard that this year’s edition of MMSys would be attended by around 200 people, I was a bit concerned whether the event would maintain its signature atmosphere. It was not long before I realized that, fortunately, it would. The core group of researchers who were instrumental in the take-off of the conference in the early 2010s is still present, and these scientists remain sincerely happy to meet new researchers, to chat about the latest trends in the fast-evolving world of online multimedia, and to make sure everybody feels comfortable talking with each other.

mmsys_1

I attended my first MMSys in 2012 in North Carolina. Although I did not even submit a paper to MMSys’12, I decided to attend because the short welcoming text on the website was astonishingly aligned with my own feelings about the academic research world. I rarely read the usually boring and unpassionate conference welcoming texts, but the day I took the time to read this particular MMSys text changed my research career. Before 2012, I felt like one lost researcher among thousands of others, whose only motivation was to publish more papers, whatever the cost. I used to publish sometimes in networking venues, sometimes in system venues, sometimes in multimedia venues… My output was quite inconsistent, and my experiences attending conferences were not especially exciting.

The MMSys community matches my expectations for several reasons:

  • The size of a typical MMSys conference is human-scale: when you meet someone on the first day, you’ll surely meet them again the next day.
  • Informal chat groups are diverse. I have the feeling that anybody can feel comfortable enough to chat with any other attendee regardless of gender, nationality, and seniority.
  • A responsible vision of what an academic event should be. The community is not into showing off in luxury resorts, but rather promotes decently cheap conferences in standard places while maximizing fun and interactions. This sometimes comes at the cost of organizing the conference in the facilities of the university (which necessarily means much more work for organizers and volunteers), but social events have never been neglected.
  • People share a set of “values” in their research activities.

This last point is of course the most significant aspect of MMSys. The main idea behind this conference is that multimedia services are not only multimedia but also networks, systems, and experiences. This commitment to a holistic vision of multimedia systems has at least two consequences. First, the typical contributions discussed at this conference have both theoretical and experimental parts, and, to be accepted, papers have to find the right balance between both sides of the problem. It is definitely challenging, but it brings passionate researchers to the conference. Second, the line between industry and academia is very porous. As a matter of fact, many core researchers of MMSys are either (past or current) employees of corporate research centers or involved in standards groups and industrial forums. The presence of people involved in the design of real products nurtures the academic debates.

While MMSys grows significantly, year after year, I was curious to see whether these “values” would remain. Fortunately, they do. The growing reputation has not changed the spirit.

mmsys_2

The 2018 edition of the MMSys conference was held on the campus of CWI, near downtown Amsterdam. Thanks to the impressive efforts of all volunteers and local organizers, the event went smoothly in the modern facilities near the Amsterdam University. As can be expected for a conference in the Netherlands, especially in June, biking was obviously the best way to commute every morning from anywhere in Amsterdam.

mmsys_3

The program contained a fairly high number of inspiring talks, which altogether reflected the “style” of MMSys: a mix of entertaining, technological, industry-oriented talks discussing the state of the art and beyond. The two main conference keynotes were given by stellar researchers (who unsurprisingly have bright careers in both academia and industry) on the two hottest topics of the conference. First, Philip Chou (8i Labs) introduced holograms. Phil kind of lives in the future, somewhere five years later than now, and from there he was kind enough to give us a glimpse of the anticipatory technologies that will be developed between our now and his. Undoubtedly everybody will remember his flash-forwarding talk. Then Nuria Oliver (Vodafone) discussed the opportunities of combining IoT and multimedia in a talk that was powerful and energizing. The conference also featured so-called overview talks, in which expert researchers present the state of the art in areas that have been especially under the spotlight in the past months. The topics this year were 360-degree videos, 5G networks, and per-title video encoding, with experts from Tiledmedia, Netflix, Huawei and the University of Illinois. With such a program, MMSys attendees had the opportunity to catch up on everything they may have missed during the past couple of years.

mmsys_4

mmsys_5

The MMSys conference also has a long history of commitment to open source and demonstrations. This year’s conference was a peak, with an astonishing 45% of papers awarded a reproducibility badge, which means that the authors of these papers agreed to share their datasets and code, and to make sure that their work can be reproduced by other researchers. I am not aware of any other conference reaching such a ratio of reproducible papers. MMSys is all about sharing, and this reproducibility ratio demonstrates that MMSys researchers see their peers as cooperating researchers rather than competitors.

 

mmsys_6

My personal highlights go to two papers. The first is a work by researchers from UT Dallas and Mobiweb. It shows a novel, efficient approach to generating human models (skeletal poses) with a regular Kinect. This paper is a sign that augmented reality and virtual reality will soon be populated by user-generated content: not only synthesized 3D models but also digital captures of real humans. The road toward easy integration of avatars in multimedia scenes is paved, and this work is a good example of it. The second work I would like to highlight in this column is by researchers from Université Côte d’Azur. The paper deals with head movements in 360-degree videos, but instead of trying to predict movements, the authors propose to edit the content to guide user attention so that head movements are reduced. The approach, which is validated by a real prototype and shared source code, comes from a multi-disciplinary collaboration with designers, engineers, and human interaction experts. Such multi-disciplinary work is also largely encouraged at MMSys conferences.

mmsys_7b

Finally, MMSys is also a full event with several associated workshops. This year, Packet Video (PV) was held with MMSys for the very first time, and it was successful with regard to the number of people who attended. Fortunately, PV did not interfere with NOSSDAV, which is still the main venue for high-quality, innovative and provocative studies. In comparison, both MMVE and NetGames were less crowded, but the discussion in these events was intense and lively, as can be expected when so many experts sit in the same room. That is the purpose of workshops, isn’t it?

mmsys_8

A very last word on the social events. The social events of the 2018 edition lived up to the reputation of MMSys: original and friendly. But I won’t say more about them: what happens at MMSys social events stays at MMSys.

mmsys_9

The 2019 edition of MMSys will be held on the East Coast of the US, hosted by the University of Massachusetts Amherst. The multimedia community is at a very exciting time in its history. The attention of researchers is shifting from video delivery to immersion, experience, and attention. More than ever, multimedia systems should be studied from multiple interplaying perspectives (network, computation, interfaces). MMSys is thus a perfect place to discuss research challenges and to present breakthrough proposals.

[1] This means that I also had my share of rejected papers at MMSys and affiliated workshops. Reviewer #3, whoever you are, you ruined my life (for a couple of hours).