An interview with Associate Professor Hugo L. Hammer

Hugo as a Ph.D. student, at the beginning of his research career.

Describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

From an early age, I had the ability to focus and work individually and loved to develop new systems for all sorts of things, which probably was quite annoying for those around me. It turns out that these abilities to focus, be curious, and develop new systems are what drive my research today. When I started as a student in mathematics and statistics at the Norwegian University of Science and Technology (NTNU), I didn’t think of research as an alternative and was determined to find a job in industry. Throughout my studies, I realized how little mathematics and statistics I had actually learned, which is why I decided to become a Ph.D. student. I expected to find a job in industry after the Ph.D. period but ended up loving research, and that is why I am where I am today.

As a statistician, I have worked a lot with spatial and spatio-temporal data, such as geophysical observations. Such observations have striking similarities to multimedia content, such as images and videos. I have become very interested in machine learning methods used to process and make decisions from multimedia content and the potential for applying such methods to other domains, such as geophysics. I also love working as a statistician within this field. A crucial part of my research is to try to combine methods from machine learning and statistics in new and exciting ways.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish, and how will you bring this about? 

In my current position as an associate professor, I do both teaching and research. Teaching and research challenge me in different ways. I continuously try to develop and improve my teaching. I especially focus on how to do high quality, yet resource-efficient, teaching. I have, for example, worked a lot on how to activate students and improve learning when being a single teacher for hundreds of students.

Can you profile your current research, its challenges, opportunities, and implications?

My current research can roughly be divided into three directions. The first direction is about methods for real-time information processing and decision making, for example, from sensory information or video streams. The second direction is about developing new machine learning models and methods, taking advantage of my background in statistics, as mentioned above. The third direction is the more applied use of machine learning methods on real-life multimedia data, in particular, medical data. Directions two and three go hand in hand. Having a background in statistics and working more and more with multimedia data is more of an opportunity than a challenge.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

I am proud of the research we have done on real-time information processing and decision making. The methods we developed are simple but still demonstrate state-of-the-art performance. In 2020, we plan to develop software packages to make the methods readily available and hopefully useful for many. We saw the potential of using machine learning, and in particular deep learning, on geophysical data and problems quite early, and we are now able to operate at the forefront of this research. I’m also proud of our externally funded research projects and, for sure, our rejected research proposals.

Over your distinguished career, what are your top lessons you want to share with the audience?

Here is a lesson from my personal experience. I think it is easy to depend on or have too much respect for other researchers early in the career. Research is of course all about collaboration, but still, for me, it was useful early in the career to create a small research project where I did every step of the process myself (shaping ideas, collecting data, running simulation, writing, finding suitable publishing channels, revisions, etc.). It was hard work, but for sure, it made me a better and more independent researcher.

What is the best joke you know?

Daddy, what are clouds made of?

Linux servers, mostly.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

One suggestion: What do you like to do in your spare time?

Research, right? 🙂 Working every day at an office, I try to find time for physical activity in my spare time. I love to run, bike, or go skiing in Nordmarka (a forest near Oslo, Norway) or in the mountains on the weekends.

A recent photo of Hugo.

Bio: Hugo L. Hammer is an associate professor in statistics at Oslo Metropolitan University. His main research interests are computational statistics, probabilistic forecasting, real-time analytics, and machine learning.

An interview with Professor Roger Zimmermann

Roger at the start of his career.

Please describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I had an interest in technology early on, though my path to becoming an academic has not been very direct. In high school, I really enjoyed tinkering with electronics, taking radios apart, and learning about digital circuits. My goal was to work in this field, and after high school, I did an apprenticeship with Brown, Boveri & Cie. (BBC), which sometime later became Asea Brown Boveri (ABB). The apprentices were assigned to different company locations, and I was lucky enough to be sent to BBC’s Forschungszentrum (Research Center). The labs, the researchers, and the cutting-edge equipment and projects there left a deep impression on me. Beyond electronics, I really liked microprocessors, computers and how they could be flexibly programmed with software. I decided that I wanted to pursue further studies and I subsequently enrolled in the Höhere Technische Lehranstalt (HTL) Brugg-Windisch in their Informatik program (the HTL program has since changed and the building where I studied is now part of the campus Windisch of the Fachhochschule Nordwestschweiz). With my fresh HTL degree in hand, I started to work for an engineering company and over the next years, I got the chance to work on some fascinating projects. After five years, I got an itch to study for a Master’s degree and I ended up in California. One of the professors (who became my advisor) encouraged me to go for a Ph.D., and I took him up on his offer to support me. His group worked at the intersection of databases and multimedia. It really fascinated me and we ended up building one of the early streaming media servers. What I still find fascinating about multimedia today is how it brings together many fundamental computer science areas such as networking, graphics, operating system support, signal processing, etc. I also like that multimedia is used by people to express their creativity, humanity and artistic aspirations – it is not only about technology.

My personal lessons looking back are that sometimes you may not know where your journey will take you, but make sure you enjoy and learn from the path to get there.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

I currently work broadly in two areas, namely streaming media systems and data analytics. At this point, one of the main enjoyments I get is from working with my research group and colleagues from around the world. On the technical side, it is fun if somebody is actually using what we develop. On the human side of things, it is great to see when my students and former students are doing well in various parts of the globe.

Can you profile your current research, its challenges, opportunities, and implications?

In my research group, I have two main themes and those are media systems and multimedia data analytics. In the first cluster, we look at media streaming on the Internet. The main technology in use today is Dynamic Adaptive Streaming over HTTP, also called DASH. Some interesting challenges are in the area of enabling very low latency in live streaming, which is of interest to many large Internet companies. Going forward, I see 5G networks as an interesting challenge. Most people are excited about the very high bandwidth that 5G can offer (in the best case), but I believe one of the major challenges will be the very high variability of 5G networks when a device is moving. On the multimedia, and especially spatial, data analytics side, I am part of a new lab between NUS and the ridesharing company Grab. There is a tremendous amount of data generated (e.g., GPS trajectories) that allow novel data-driven applications such as generating accurate road maps in regions where this information is not readily available or the inference of semantic attributes of roads (e.g., no right turn allowed). The fusion of multiple data types such as trajectories, images, maps, etc., will allow for some exciting new applications.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

One of the areas where my group made innovative contributions was georeferenced mobile video: combining videos with their geo-spatial properties led to a lot of interesting developments. We started with this at just about the same time as the first iPhone came out, and the idea of utilizing all the sensors in a phone in combination with its video was really novel. Nowadays, sensor fusion is common and is used in many machine-learning applications and I am sure there will be even greater breakthroughs in the future. Another area where I have been working for decades is media streaming and this whole industry has changed from proprietary networks to the Internet. There have been many people working in this area, but I believe that our own contributions have helped to transform this field.

Over your distinguished career, what are the top lessons you want to share with the audience?

My path to becoming an academic has not been as direct as for some other people. But one of the key things that I have enjoyed along the way was to work with many outstandingly talented and bright people from all around the world. I hope that humanity will keep working together based on facts and science to solve some of the big challenges that are coming our way.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

One issue that concerns me is the apparent trend to not trust facts anymore. So a possible question could be: Do you see a danger when people easily distribute and believe in “alternate facts”?

My answer would be: I definitely see this as a considerable concern for the future. While there may be some technical solutions to combat fake news, etc., it is also increasingly important that people are well educated and think critically, especially in a world where fake information may look very persuasive.

What is the best joke you know?

I like many of the weird, but strangely funny comments on life and baseball from Yogi Berra. He was born Lawrence Peter Berra and was a US baseball legend. Two examples:

“When you come to a fork in the road, take it.”

“You should always go to other people’s funerals. Otherwise, they won’t come to yours.”


A current image of Roger.

Short bio:

Roger Zimmermann is an Associate Professor at the School of Computing at the National University of Singapore (NUS). He is also Deputy Director with the Smart Systems Institute (SSI) at NUS. From 2010 to 2016 he co-directed the Centre of Social Media Innovations for Communities (COSMIC), a research institute funded by the National Research Foundation (NRF) of Singapore. Prior to joining NUS he held the positions of Research Area Director with the Integrated Media Systems Center (IMSC) and Research Assistant Professor at the University of Southern California (USC). He earned his M.S. and Ph.D. degrees from the Viterbi School of Engineering at the University of Southern California.

Multidisciplinary Column: Conferences as Career and Community Catalysts

A little over 10 years ago, I chose to pursue a PhD. This meant I chose a professional life in which research publications and their uptake would be seen as major evidence of achievement. For those working in computer science, the major dissemination platforms for such publications are conferences.

Given my dual background in music and computer science, it was logical that my main interests were in topics that connected both these worlds. As a consequence, I hoped to become part of the Music Information Retrieval community. The International Society for Music Information Retrieval (ISMIR) therefore seemed the professional community to target, and the annual ISMIR conference the most logical place to present my work.

In terms of its education and research, my department at TU Delft had track records and agendas in visual and social multimedia content analysis, but not particularly in music. Considering methodology and philosophy, I did think a lot of the work at the department was compatible with what I tried to do in music. Furthermore, as I still was in training in a selective major at the conservatoire, I was not in a good position to geographically move to any other institute that would have a more established Music Information Retrieval track record. So I inquired whether I could stay in Delft for pursuing my PhD.

The answer was somewhat complicated. There was no funding for a PhD position in Music Information Retrieval, and there were no strategic plans to change that. At the same time, the people who had supervised me as a student (in particular, my thesis supervisor Alan Hanjalic) saw promise in me, and would like to keep working with me. Ultimately, I got a one-year contract in which my main task was to try acquiring funding and international community backing to pursue a Music Information Retrieval PhD in a multimedia group.

At the start of that year, I got to attend my first ISMIR conference, where I presented a paper based on my master’s thesis. In a previous column for the SIGMM Records, I already discussed my experiences at that moment: how debuting alone at a conference was intimidating, but how I was lucky that senior members of the community proactively made sure I was introduced to other attendees. Frans Wiering, the senior member who looked after me in particular at that moment, was general chair of the upcoming ISMIR, which would take place in Utrecht, in my home country. Frans was quick to invite me to serve as a student volunteer, which was very good news for me. As my year would be filled with grant-writing, I did not yet have a sufficiently stable infrastructure around me to be able to truly do research, so submitting to the next ISMIR was out of reach. But this way, I could still attend the conference, and would even have an excuse to keep mingling with all the attendees, as we volunteers would be the first people to answer any participant questions regarding logistics.

Getting funding turned out to be a true challenge. In 2009, digital music consumption was not yet as large as it is today, and many potential data-providing partners were reluctant to collaborate. Of course, it also did not help my cause that I still was a complete nobody. Finally, when working on music, one faces an interesting paradox. On the one hand, many people, regardless of their backgrounds, identify with music, up to the point that they personally deeply care about it. As such, working on music makes for a good conversation starter, in which people are always happy to share their personal experiences. On the other hand, this makes music a commonplace topic, which risks it being shoved aside as ‘less serious’. Even though technically, the problems we are working on are framed in very similar ways as they may be in neighboring domains such as vision (and the research challenges are at least as hard, if not harder, due to subjective human factors being an integral part of the problem), common criticisms we receive are that music is fun but does not save lives, and does not deal with areas of major economic impact, nor easily measurable societal impact. So while we never have any problems legitimizing our work in public outreach, in grant-writing, we always need to put extra effort into justifying why our work is more than a fun hobby, and sufficiently relevant to merit serious funding.

After several collaboration rejections, and after the one proposal I did manage to set up was rejected despite good review scores, I was very lucky that at the very end of my grant-writing year, I managed to secure PhD funding through a Google Doctoral Fellowship (now PhD Fellowship). For this, I needed to get a research mentor, although my Google contacts weren’t so sure who would be appropriate for this role, as they were not aware of anyone working on music in the company at that stage.

Several weeks later, I was volunteering at ISMIR in Utrecht. That was where I found out that Douglas Eck had just moved from academia to industry, to work on music research at Google. And that was how I got my research mentor, with several extremely useful interning experiences at the company as a consequence.

When Emilia Gómez, the 2018-2019 president of the ISMIR society, invited me to become general co-chair of ISMIR’s 20th anniversary edition with her, and host the event in Delft, this was my chance to give back. Now I had general chair powers, and as the society was quite open to discussing any innovations, I could try realizing the conference of my dreams.

As described in my previous column, the inclusive spirit of ISMIR has always been quite elaborate, including mentoring programs spearheaded by our Women in MIR movement, an explicit focus on multidisciplinarity over exclusivity, and on being medium-sized but single-track. For the past two years, all our accepted papers have been presented through a 4-minute talk and a poster, such that all the works get equal visibility. This year, we chose not to do themed sessions but to randomize the paper order, such that authors on related topics would not be presenting their posters at the same time. As a side effect, this would also nudge attendees towards learning about everything that got accepted, beyond the topics of their specializations. This is something I have always seen the ISMIR community being enthusiastic about, while I had very different experiences at (more prestigious) larger-sized conferences. In many cases, their larger size led to many parallel tracks with fragmented audiences, while any plenary program elements were so massive that it was hard to engage with anyone you did not happen to know already, or incidentally happened to stand or sit next to.

We made sure we offered more than paper presentations. For the keynotes, we invited speakers from neighboring fields and disciplines, and encouraged them to give some critical perspectives on our field. We engaged with a local school in an outreach program. Before the conference, we held workshops, including the Women in MIR prototyping workshop, so people would already get to know one another; we had a dedicated Newcomer Initiatives chair to make sure no one felt lost, and the socials were set up such that people could really mingle. With many people in music also happening to be active music players, we offered both formal and informal options to jam together, so that week, several cafes in Delft faced more live music than we would normally see.

But while I was preparing for this conference, one of my strongest experiences was that I kept being haunted by these memories of the past: that being able to join this community (and an academic career at all) had been a really close call, that really was catalyzed by me having been able to join the conferences, and having met supportive seniors, while I was still an early-stage student without a full research embedding.

So one of the ISMIR 2019 achievements I am most proud of was that we extended our financial support programs, enabled by the ISMIR board and sponsorship funds. Beyond the existing grants for student authors and female participants, we added a third ‘community grant’ category, meant for individuals who would like to attend ISMIR, but who had not been in a position to actively participate in the conference at this stage. Reading through the motivation letters for this grant made me realize that my experiences were not as much of a freak case as I had thought, and that colleagues have been facing similar challenges.

I am deeply grateful that these grants enabled us to bring more people to ISMIR: young professionals in between positions; students in other disciplines seeking to collaborate more closely on music topics; students who found themselves the only people in their labs working on music, as the labs faced other strategic priorities; but also seniors who used to be members of our field, but who had gradually drifted out after entering a vicious circle of not getting music projects funded, then having to do more teaching on other topics, and then taking hits on their research output and profile. It was a wonderful experience seeing all of them actively mingling with the community, and hearing how being at ISMIR indeed had been personally impactful for them.

For my student volunteers, I especially targeted local and national students who were not yet at the PhD level, such that they could experience our academic atmosphere. Here as well, I saw the positive impact of the ISMIR spirit; several of these students (of whom I am not even the thesis supervisor…) made friends with international colleagues, and are even trying to collaborate on music information research with them in their free time today.

Hopefully, this story can help inspire colleagues who are seeking to make their conference cultures more inclusive and impactful. With this, I do want to add a warning that endeavors like this will not come for free, but demand considerable extra work and advocacy. Many of our proposed innovations initially faced pushback in some form, as these were not how things normally were done, and they required financial and human resources that would not normally be accounted for. But I am very grateful that we followed through, and extremely proud of what we achieved in the end. My great thanks go to the ISMIR society, my fellow ISMIR 2019 organizers and our sponsors for their trust and support.

All ISMIR 2019 presentations have been recorded, and are available through this link. The accepted (open access) papers with supplementary material are available via this page. Photos of the socials are available here.


About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber. Every other edition, we will feature an interview with a researcher performing multidisciplinary work, or a column of our own hand. For this edition, we feature a column by Cynthia C. S. Liem.

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests consider search and recommendation for music and multimedia, with special interest in making people discover new interests, as well as questions of interpretability and validity. She initiated, co-coordinated and participated in various (inter)national collaborative research projects on the accessibility of content which would not trivially be retrieved, both in the music/cultural heritage world, as well as in social sciences applications, e.g. collaborating with organizational psychologists. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach. In 2018, she was Researcher-in-Residence at the National Library of The Netherlands, and in 2019, she served as general co-chair of the ISMIR conference.

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Dataset Column: Report from the MMM 2019 Special Session on Multimedia Datasets for Repeatable Experimentation (MDRE 2019)

Special Session

Information retrieval and multimedia content access have a long history of comparative evaluation, and many of the advances in the area over the past decade can be attributed to the availability of open datasets that support comparative and repeatable experimentation. Sharing data and code to allow other researchers to replicate research results is needed in the multimedia modeling field, as it helps to improve the performance of systems and the reproducibility of published papers.

This report summarizes the special session on Multimedia Datasets for Repeatable Experimentation (MDRE 2019), organized at the 25th International Conference on MultiMedia Modeling (MMM 2019), which was held in January 2019 in Thessaloniki, Greece.

The intent of these special sessions is to be a venue for releasing datasets to the multimedia community and discussing dataset related issues. The presentation mode in 2019 was to have short presentations (8 minutes) with some questions, and an additional panel discussion after all the presentations, which was moderated by Björn Þór Jónsson. In the following we summarize the special session, including its talks, questions, and discussions.

The special session presenters: Luca Rossetto, Cathal Gurrin and Minh-Son Dao.

Presentations

A Test Collection for Interactive Lifelog Retrieval

The session started with a presentation about A Test Collection for Interactive Lifelog Retrieval [1], given by Cathal Gurrin from Dublin City University (Ireland). In their work, the authors introduced a new test collection for interactive lifelog retrieval, which consists of multi-modal data from 27 days, comprising nearly 42 thousand images and other personal data (health and activity data; more specifically, heart rate, galvanic skin response, calorie burn, steps, blood pressure, blood glucose levels, human activity, and diet log). The authors argued that, although other lifelog datasets already exist, their dataset is unique in terms of the multi-modal character, and has a reasonable and easily manageable size of 27 consecutive days. Hence, it can also be used for interactive search and provides newcomers with an easy entry into the field. The published dataset has already been used for the Lifelog Search Challenge (LSC) [5] in 2018, which is an annual competition run at the ACM International Conference on Multimedia Retrieval (ICMR).

The discussion about this work started with a question about the plans for the dataset and whether it should be extended over the years, e.g. to increase the challenge of participating in the LSC. However, the problem with public lifelog datasets is the fact that there is a conflict between releasing more content and safeguarding privacy. There is a strong need to anonymize the contained images (e.g. blurring faces and license plates), where the rules and requirements of the EU GDPR regulations make this especially important. However, anonymizing content unfortunately is a very slow process. An alternative to removing and/or masking actual content from the dataset for privacy reasons would be to create artificial datasets (e.g. containing public images or only faces from people who consent to publish), but this would likely also be a non-trivial task. One interesting aspect could be the use of Generative Adversarial Networks (GANs) for the anonymization of faces, for instance by replacing all faces appearing in the content with generated faces learned from a small group of people who gave their consent. Another way to preemptively mitigate the privacy issues could be to wear conspicuous ‘lifelogging stickers’ during recording to make people aware of the presence of the camera, which would give them the possibility to object to being filmed or to avoid being captured altogether.

SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives

The second presentation was given by Minh-Son Dao from the National Institute of Information and Communications Technology (NICT) in Japan about SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives [2]. This is a dataset that aims at combining the conditions of the environment with health-related aspects (e.g., pollution or weather data with cardio-respiratory or psychophysiological data). The creation of the dataset was motivated by the fact that people in larger cities in Japan very often do not want to go out (e.g., for some sports activities), because they are very concerned about pollution and its effect on their health. So it would be beneficial to have a map of the city with assigned pollution ratings, or a system that allows users to perform related queries. Their dataset contains sensor data collected on routes by a few dozen volunteers over seven days in Fukuoka, Japan. More specifically, they collected data about the location, O3, NO2, PM2.5 (particulates), temperature, and humidity in combination with heart rate, motion behavior (from a 3-axis accelerometer), relaxation level, and other personal perception data from questionnaires.

This dataset has also been used for multimedia benchmark challenges, such as the Lifelogging for Wellbeing task at MediaEval. In order to define the ground truth, volunteers were presented with specific use cases and annotation rules, and were asked to collaboratively annotate the dataset. The collected data (the feelings of participants at different locations) was also visualized using an interactive map. Although the dataset may have some inconsistent annotations, it is easy to filter them out since labels of corresponding annotators and annotator groups are contained in the dataset as well.

V3C – a Research Video Collection

The third presentation was given by Luca Rossetto from the University of Basel (Switzerland) about V3C – a Research Video Collection [3]. This is a large-scale dataset for multimedia retrieval, consisting of nearly 30,000 videos with an overall duration of about 3,800 hours. Although many other video datasets are available already (e.g., IACC.3 [6], or YFCC100M [8]), the V3C dataset is unique in the aspects of timeliness (more recent content than many other datasets and therefore more representative content for current ‘videos in the wild’) and diversity (represents many different genres or use cases), while also having no copyright restrictions (all contained videos were labelled with a Creative Commons license by their uploaders). The videos have been collected from the video sharing platform Vimeo (hence the name ‘Vimeo Creative Commons Collection’ or V3C in short) and represent video data currently used on video sharing platforms. The dataset comes together with a master shot-boundary detection ground truth, as well as keyframes and additional metadata. It is partitioned into three major parts (V3C1, V3C2, and V3C3) to make it more manageable, and it will be used by the TRECVID and the Video Browser Showdown (VBS) evaluation campaigns for several years. Although the dataset was not specifically built for retrieval, it is suitable for any use case that requires a larger video dataset.

The shot-boundary detection used to provide the master-shot reference for the V3C dataset was implemented using Cineast, which is an open source software available for download. It divides every frame into a 3×3 grid and computes color histograms for all 9 areas, which are then concatenated into a ‘regional color histogram’ feature vector that is compared between all adjacent frames. This seems to work very well for hard cuts and gradual transitions, although for grayscale content (and flashlights etc.) it is not very stable. The additional metadata provided with the dataset includes information about resolution, frame rate, uploading user and the upload date, as well as any semantic information provided by the uploader (title, description, tags, etc.). 
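The regional color histogram comparison described above can be sketched roughly as follows. This is not Cineast's actual code: the number of histogram bins, the normalised L1 distance, the threshold value, and the function names `regional_color_histogram` and `detect_hard_cuts` are all assumptions made for illustration, and the sketch covers hard cuts only, not gradual transitions.

```python
import numpy as np


def regional_color_histogram(frame, grid=3, bins=8):
    """Concatenate per-cell colour histograms of an H x W x 3 uint8 frame.

    The bin count is an illustrative assumption, not a Cineast setting.
    """
    height, width, _ = frame.shape
    row_edges = np.linspace(0, height, grid + 1, dtype=int)
    col_edges = np.linspace(0, width, grid + 1, dtype=int)
    features = []
    for r in range(grid):
        for c in range(grid):
            cell = frame[row_edges[r]:row_edges[r + 1],
                         col_edges[c]:col_edges[c + 1]]
            for channel in range(3):
                hist, _ = np.histogram(cell[..., channel],
                                       bins=bins, range=(0, 256))
                features.append(hist.astype(float))
    vector = np.concatenate(features)
    # Normalise so that the whole feature vector sums to 1.
    return vector / vector.sum()


def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i where a cut is assumed between frame i-1 and frame i.

    The distance is half the L1 difference of adjacent feature vectors,
    which lies in [0, 1]; the threshold is a hypothetical value.
    """
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = regional_color_histogram(frame)
        if previous is not None:
            distance = 0.5 * np.abs(current - previous).sum()
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts
```

With the default 3×3 grid, three colour channels and 8 bins, each frame yields a 216-dimensional feature vector. A real detector would additionally need some way of tracking slow changes over several frames in order to catch gradual transitions.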

Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition

Originally a fourth presentation was scheduled about Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition [4], but unfortunately no author was on site to give the presentation. This dataset contains audio samples with a duration of 30 seconds (as well as extracted features and ground truth) from a metropolitan city (Athens, Greece), which were recorded over a period of about four years by 10 different persons with the aim of providing a collection of city sounds. The metadata includes geospatial coordinates, timestamp, rating, and tagging of the sound by the recording person. The authors demonstrated in a baseline evaluation that their dataset makes it possible to predict the soundscape quality in the city with about 42% accuracy.

Discussion

After the presentations, Björn Þór Jónsson moderated a panel discussion in which all presenters participated.

The panel started with a discussion on the size of datasets: whether the only way to make challenges more difficult is to keep increasing the dataset size, or whether there are alternatives to this. Although this heavily depends on the research question one would like to solve, it was generally agreed that there is a definite need for evaluation with large datasets, because for small datasets some problems are trivial. Moreover, datasets that are too small often introduce some kind of content bias, so that they do not fully reflect the practical situation.

For now, it seems there is no real alternative to using larger datasets, although it is clear that this will introduce additional challenges/hurdles for data management and data processing. All presenters (and the audience too) agreed that introducing larger datasets will also necessitate closer collaboration with other research communities (in fields such as data science, data management/engineering, and distributed and high-performance computing) in order to manage the higher data load.

However, even though we need larger datasets, we might not be ready yet to really go for true large-scale. For example, the V3C dataset is still far away from a true web-scale video search dataset; it originally was intended to be even bigger, but there were concerns from the TRECVID and VBS communities about the manageability. Datasets that are too large would set the entrance barrier for newcomers so high that an evaluation benchmark may not attract enough participants―a problem that could possibly disappear in a few years (as hardware becomes cheaper and faster/larger), but still needs to be addressed from an organizational viewpoint. 

There were remarks from the audience that instead of focusing on size alone, we should also consider the problem we want to solve. It appears many researchers use datasets for use cases for which they were not designed and to which they are not suited. Instead of blindly going for larger size, datasets could be kept small and simple for solving essential research questions, for example by truly optimizing them for the problem to be solved; different evaluations would then use different datasets. However, this would lead to considerable dataset fragmentation and necessitate combining several datasets for broader/larger evaluation tasks, which has been shown to be quite challenging in the past. For example, there are already a lot of health datasets available, and it would be interesting to benefit from them, but the workload for integrating them into competitions is often too high in practice.

Another issue that should be addressed more intensively by the research community is to figure out the situation for personal datasets that are compliant with GDPR regulations, since currently nobody really knows how to deal with this.

Acknowledgments

The session was organized by the authors of the report, in collaboration with Duc-Tien Dang-Nguyen (Dublin City University), Michael Riegler (Center for Digitalisation and Engineering & University of Oslo), and Luca Piras (University of Cagliari). The panel format of the special session made the discussions much more lively and interactive than that of a traditional technical session. We would like to thank the presenters and their co-authors for their excellent contributions, as well as the members of the audience who contributed greatly to the session.

References

[1] Gurrin, C., Schoeffmann, K., Joho, H., Munzer, B., Albatal, R., Hopfgartner, F., … & Dang-Nguyen, D. T. (2019, January). A test collection for interactive lifelog retrieval. In International Conference on Multimedia Modeling (pp. 312-324). Springer, Cham.
[2] Sato, T., Dao, M. S., Kuribayashi, K., & Zettsu, K. (2019, January). SEPHLA: Challenges and Opportunities Within Environment-Personal Health Archives. In International Conference on Multimedia Modeling (pp. 325-337). Springer, Cham.
[3] Rossetto, L., Schuldt, H., Awad, G., & Butt, A. A. (2019, January). V3C–A Research Video Collection. In International Conference on Multimedia Modeling (pp. 349-360). Springer, Cham.
[4] Giannakopoulos, T., Orfanidi, M., & Perantonis, S. (2019, January). Athens Urban Soundscape (ATHUS): A Dataset for Urban Soundscape Quality Recognition. In International Conference on Multimedia Modeling (pp. 338-348). Springer, Cham.
[5] Dang-Nguyen, D. T., Schoeffmann, K., & Hurst, W. (2018, June). LSE2018 Panel-Challenges of Lifelog Search and Access. In Proceedings of the 2018 ACM Workshop on The Lifelog Search Challenge (pp. 1-2). ACM.
[6] Awad, G., Butt, A., Curtis, K., Lee, Y., Fiscus, J., Godil, A., … & Kraaij, W. (2018, November). Trecvid 2018: Benchmarking video activity detection, video captioning and matching, video storytelling linking and video search.
[7] Lokoč, J., Kovalčík, G., Münzer, B., Schöffmann, K., Bailer, W., Gasser, R., … & Barthel, K. U. (2019). Interactive search or sequential browsing? a detailed analysis of the video browser showdown 2018. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(1), 29.
[8] Kalkowski, S., Schulze, C., Dengel, A., & Borth, D. (2015, October). Real-time analysis and visualization of the YFCC100M dataset. In Proceedings of the 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions (pp. 25-30). ACM.

Introducing the new role of the Director of Diversity and Outreach

Over the last few decades SIGMM has grown with regard to the number and size of conferences and workshops we organize and sponsor, and we have grown with regard to our international outreach. Researchers from all over the world now participate in SIGMM and its many activities. In the same way in which we grow internationally with regard to members, and with regard to the participants attending our conferences and their different backgrounds, the diversity of SIGMM is also growing. However, we can observe that diversity, and all the aspects it brings to a society, is not something that simply happens; it needs to be supported and embraced through a cultural change of the organization and all its members.

Introducing the new role of SIGMM Director of Diversity and Outreach

In 2019, SIGMM created the new role of SIGMM Director of Diversity and Outreach with a variety of roles and responsibilities, for an initial 3-year period. Creation of this position is a sign and an action to establish future activities and an invitation on a more formal level to move our work in this area beyond anecdotal activities and personal engagement. The Director of Diversity and Outreach will be a voting member of the SIGMM Executive Committee. The EC Chair drafted and circulated a role specification and sent a call to the community for expressions of interest in the role in Spring 2019. The appointment was confirmed by the EC in May 2019. For the inaugural appointment 2019-2021, Susanne Boll has been elected unanimously for this role. With this new Director of Diversity and Outreach, who is a voting member of the SIGMM Executive Committee, SIGMM is supporting and developing diversity on an institutional level.

First Initiative

As a first initiative, the SIGMM EC has decided on a “25 in 25” strategy to increase the participation of women in SIGMM. This strategy aims at increasing the participation of women in all activities and committees of SIGMM to at least 25% by 2025.

It can be observed that female participation in SIGMM has been low over many years. Even though there were good initiatives over the last decades, we have failed to include a proportionate number of women researchers in the SIG and in our executive structures and event organization. As we observe that about 25% of all computer science degrees are held by women, we may well expect ACM to find these numbers reflected in the number of women active within its Special Interest Groups, which is not the case in SIGMM. We strongly believe that this will only change if we as SIGMM take action. This action will take place on three levels.

With the SIGMM Executive Actions we aim at an obligatory inclusion of women in the steering committees of SIGMM. For the coming elections in 2021, we will implement a voting scheme by which the two leading chair positions, SIGMM Chair and SIGMM Vice Chair, will be filled by a man and a woman. For the forthcoming SIGMM officer elections, SIGMM will also fill other candidate roles with two individuals, one man and one woman to ensure gender equality on the level of the different roles. 

With the SIGMM Conference Steering Actions for all forthcoming appointments to the individual Steering Committees, the Steering Committees will invite female candidates in order to reach at least a 25% share of their memberships. All Steering Committees will have their members online and maintain a history of their SC and the different positions on the organizing committee of their related conferences online.

With the SIGMM Conference Actions we request that all SIGMM-sponsored conferences have at least 25% representation of women in all roles of their organizing committee which will be observed for all forthcoming bids for conferences.  We aim at organising committees in which the many volunteer roles for our conferences, such as general chair, workshop chair, tutorial chair, panel chair, web chair, local chair, or proceedings chair could be filled by two individuals, one woman and one man.  

The SIGMM Director of Diversity and Outreach will observe the implementation of these rules and report on the state and progress annually within the EC, at the annual SIGMM business meeting at ACM Multimedia and publish a report in SIGMM Records.

What’s next?

The creation of the role of the SIGMM Director of Diversity and Outreach was a first step. The “25 in 25” initiative is the first set of actions, and further initiatives will follow. We are already in discussion about actions across SIGMM events such as travel support, childcare, mentoring support, support for speakers and targeted meetings. We will keep you informed through our regular newsletter, website, and meetings.

SIGMM understands the new role as actively pushing and developing diversity and outreach within SIGMM. The new director is here to listen and to act for greater diversity in our Special Interest Group, our activities and our outreach to the multimedia community. All SIGMM members are strongly invited to support the activities of the Director of Diversity and Outreach and the different initiatives. The director will also actively seek exchange with, and learn from, other Special Interest Groups within ACM and other societies. If you want to get involved please join us (contact: Susanne Boll boll@acm.org).

Dataset Column: Datasets for Online Multimedia Verification

Introduction

Online disinformation is a problem that has been attracting increased interest by researchers worldwide as the breadth and magnitude of its impact is progressively manifested and documented in a number of studies (Boididou et al., 2014; Zhou & Zafarani, 2018; Zubiaga et al., 2018). This emerging area of research is inherently multidisciplinary and there have been numerous treatments of the subject, each having a distinct perspective or theme, ranging from the predominant perspectives of media, journalism and communications (Wardle & Derakhshan, 2017) and political science (Allcott & Gentzkow, 2017) to those of network science (Lazer et al., 2018), natural language processing (Rubin et al., 2015) and signal processing, including media forensics (Zampoglou et al., 2017). Given the multimodal nature of the problem, it is no surprise that the multimedia community has taken a strong interest in the field.

From a multimedia perspective, two research problems have attracted the bulk of researchers’ attention: a) detection of content tampering and content fabrication, and b) detection of content misuse for disinformation. The first was traditionally studied within the field of media forensics (Rocha et al., 2011), but has recently been under the spotlight as a result of the rise of deepfake videos (Güera & Delp, 2018), i.e. videos produced by a special class of generative models that are capable of synthesizing highly convincing media content from scratch or based on some authentic seed content. The second concerns multimedia misuse or misappropriation, i.e. the use of media content out of its original context with the goal of spreading misinformation or false narratives (Tandoc et al., 2018).

Developing automated approaches to detect media-based disinformation relies to a great extent on the availability of relevant datasets, both for training supervised learning models and for evaluating their effectiveness. Yet, developing and releasing such datasets is a challenge in itself for a number of reasons:

  1. Identifying, curating, understanding, and annotating cases of media-based misinformation is a very effort-intensive task. More often than not, the annotation process requires careful and extensive reading of pertinent news coverage from a variety of sources similar to the journalistic practice of verification (Brandtzaeg et al., 2016).
  2. Media-based disinformation is largely manifested in social media platforms and relevant datasets are therefore hard to collect and distribute due to the temporary nature of social media content and the numerous technical restrictions and challenges involved in collecting content (mostly due to limitations or complete lack of appropriate support by the respective APIs), as well as the legal and ethical issues in releasing social media-based datasets (due to the need to comply with the respective Terms of Service and any applicable data protection law).

In this column, we present two multimedia datasets that could be of value to researchers who study media-based disinformation and develop automated approaches to tackle the problem. The first, called Fake Video Corpus (Papadopoulou et al., 2019) is a manually curated collection of 200 debunked and 180 verified videos, along with relevant annotations, accompanied by a set of 5,193 near-duplicate instances of them that were posted on popular social media platforms. The second, called FIVR-200K (Kordopatis-Zilos et al., 2019), is an automatically collected dataset of 225,960 videos, a list of 100 video queries and manually verified annotations regarding the relation (if any) of the dataset videos to each of the queries (i.e. near-duplicate, complementary scene, same incident).

For each of the two datasets, we present the design and creation process, focusing on issues and questions regarding the relevance of the collected content, the technical means of collection, and the process of annotation, which had the dual goal of ensuring high accuracy and keeping the manual annotation cost manageable. Given that each dataset is accompanied by a detailed journal article, in this column we limit our description to high-level information, emphasizing the utility and creation process in each case rather than detailed statistics, which are disclosed in the respective papers.

Following the presentation of the two datasets, we then proceed to a critical discussion, highlighting their limitations and some caveats, and delineating future steps towards high quality dataset creation for the field of multimedia-based misinformation.

Related Datasets

The complexity and challenge of the multimedia verification problem has led to the creation of numerous datasets and benchmarking efforts, each designed specifically for a particular task within this area. We can broadly classify these efforts into three areas: a) multimedia forensics, b) multimedia retrieval, and c) multimedia post classification. Datasets that are focused on the text modality, e.g. Fake News Challenge, Clickbait Challenge, Hyperpartisan News Detection, RumourEval (Derczynski et al., 2017), etc. are beyond the scope of this post and are hence not included in this discussion.

Multimedia forensics: Generating high-quality multimedia forensics datasets has always been a challenge, since creating convincing forgeries is normally a manual task requiring a fair amount of skill, and as a result such datasets have generally been few and limited in scale. With respect to image splicing, our own survey (Zampoglou et al., 2017) listed a number of datasets that had been made available by that point, including our own Wild Web tampered image dataset, which consists of real-world forgeries that have been collected from the Web, including multiple near-duplicates, making it a large and particularly challenging collection. Recently, the Realistic Tampering Dataset (Korus et al., 2017) was proposed, offering a large number of convincing forgeries for evaluation. On the other hand, copy-move image forgeries pose a different problem that requires specially designed datasets. Three such commonly used datasets are those produced by MICC (Amerini et al., 2011), the Image Manipulation Dataset (Christlein et al., 2012), and CoMoFoD (Tralic et al., 2013). These datasets are still actively used in research.

With respect to video tampering, there has been a relative scarcity of high-quality large-scale datasets, which is understandable given the difficulty of creating convincing forgeries. The recently proposed Multimedia Forensics Challenge datasets include some large-scale sets of tampered images and videos for the evaluation of forensics algorithms. Finally, there has recently been increased interest in the automatic detection of forgeries made with the assistance of particular software, and specifically face-swapping software. As the quality of produced face-swaps is constantly improving, detecting face-swaps is an important emerging verification task. The FaceForensics++ dataset (Rössler et al., 2019) is a very large-scale dataset containing face-swapped videos (and untampered face videos) from a number of different algorithms, intended for the evaluation of face-swap detection algorithms.

Multimedia retrieval: Several cases of multimedia verification can be considered to be an instance of a near-duplicate retrieval task, in which the query video (video to be verified) is run against a database of past cases/videos to check whether it has appeared before. The most popular publicly available dataset for near-duplicate video retrieval is arguably the CC_WEB_VIDEO dataset (Wu et al., 2007). This consists of 12,790 user-generated videos collected from popular video sharing websites (YouTube, Google Video, and Yahoo! Video). It is organized in 24 query sets, for each of which the most popular video was selected to serve as query, and the rest of the videos were manually annotated based on their near-duplicate relation to the query. Another relevant dataset is VCDB (Jiang et al., 2014), which was compiled and annotated as a benchmark for the partial video copy detection problem and is composed of videos from popular video platforms (YouTube and Metacafe). VCDB contains two subsets of videos: a) the core, which consists of 28 discrete sets of videos with a total of 528 videos and over 9,000 pairs of manually annotated partial copies, and b) the distractors, which consist of 100,000 videos whose purpose is to make the video copy detection problem more challenging.

Multimedia post classification: A benchmark task under the name “Verifying Multimedia Use” (Boididou et al., 2015; Boididou et al., 2016) was organized in the context of MediaEval 2015 and 2016. The task made available a dataset of 15,629 tweets containing images and videos, each of which made a false or factual claim with respect to the shared image/video. The released tweets were posted in the context of breaking news events (e.g. Hurricane Sandy, Boston Marathon bombings) or hoaxes.

Video Verification Datasets

The Fake Video Corpus (FVC)

The Fake Video Corpus (Papadopoulou et al., 2018) is a collection of 380 user-generated videos and 5,193 near-duplicate versions of them, all collected from three online video platforms: YouTube, Facebook, and Twitter. The videos are annotated either as “verified” (“real”) or as “debunked” (“fake”) depending on whether the information they convey is accurate or misleading. Verified videos are typically user-generated takes of newsworthy events, while debunked videos include various types of misinformation, including staged content posing as UGC, real content taken out of context, or modified/tampered content (see Figure 1 for examples). The near-duplicates of each video are arranged in temporally ordered “cascades”, and each near-duplicate video is annotated with respect to its relation to the first video of the cascade (e.g. whether it is reinforcing or debunking the original claim). The FVC is the first, to our knowledge, large-scale dataset of debunked and verified user-generated videos (UGVs). The dataset contains different kinds of metadata for its videos, including channel (user) information, video information, and community reactions (number of likes, shares and comments) at the time of their inclusion.

  
  
Figure 1. A selection of real (top row) and fake (bottom row) videos from the Fake Video Corpus.

The initial set of 380 videos was collected and annotated using various sources including the Context Aggregation and Analysis (CAA) service developed within the InVID project and fact-checking sites such as Snopes. To build the dataset, all videos submitted to the CAA service between November 2017 and January 2018 were collected in an initial pool of approximately 1,600 videos, which were then manually inspected and filtered. The remaining videos were annotated as “verified” or “debunked” using established third-party sources (news articles or blog posts), leading to the final pool of 180 verified and 200 fake unique videos. Then, keyword-based search was run on the three platforms, and near-duplicate video detection was used to identify the video duplicates within the returned results. More specifically, for each of the 380 videos, its title was reformulated in a more general form, and translated into four major languages: Russian, Arabic, French, and German. The original title, the general form and the translations were submitted as queries to YouTube, Facebook, and Twitter. Then, the near-duplicate retrieval algorithm of Kordopatis-Zilos et al. (2017) was used on the resulting pool, and the results were manually inspected to remove erroneous matches.

The purpose of the dataset is twofold: i) to be used for the analysis of the dissemination patterns of real and fake user-generated videos (by analyzing the traits of the near-duplicate video cascades), and ii) to serve as a benchmark for the evaluation of automated video verification methods. The relatively large size of the dataset is important for both of these tasks. With respect to the study of dissemination patterns, the dataset provides the opportunity to study the dissemination of the same or similar content by analyzing associations between videos not provided by the original platform APIs, combined with the wealth of associated metadata. In parallel, a collection of 5,573 videos annotated as “verified” or “debunked” (the 380 cases plus their 5,193 near-duplicate versions) can be used for the evaluation, or even training, of verification systems, based either on visual content or on the associated video metadata.

The Fine-grained Incident Video Retrieval Dataset (FIVR-200K)

The FIVR-200K dataset (Kordopatis-Zilos et al., 2019) consists of 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries (see Figure 2 for examples). It has been designed to simulate the problem of Fine-grained Incident Video Retrieval (FIVR). The objective of this problem is: given a query video, retrieve all associated videos considering several types of associations with respect to an incident of interest. FIVR contains several retrieval tasks as special cases under a single framework. In particular, we consider three types of association between videos: a) Duplicate Scene Videos (DSV), which share at least one scene (originating from the same camera) regardless of any applied transformation, b) Complementary Scene Videos (CSV), which contain part of the same spatiotemporal segment, but captured from different viewpoints, and c) Incident Scene Videos (ISV), which capture the same incident, i.e. they are spatially and temporally close, but have no overlap.

For the collection of the dataset, we first crawled Wikipedia’s Current Event page to collect a large number of major news events that occurred between 2013 and 2017 (five years). Each news event is accompanied by a topic, headline, text, date, and hyperlinks. To collect videos of the same category, we retained only news events with the topic “Armed conflicts and attacks” or “Disasters and accidents”. This ultimately led to a total of 4,687 events after filtering. To gather videos around these events and build a large collection with numerous video pairs that are associated through the relations of interest (DSV, CSV and ISV), we queried the public YouTube API with the event headlines. To ensure that the collected videos capture the corresponding event, we retained only the videos published within a timespan of one week from the event date. This process resulted in the collection of 225,960 videos.
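The two filtering steps described above (keeping only two event topics, then keeping only videos published within one week of the event date) can be sketched as follows. The `NewsEvent` record, its field names and the helper functions are hypothetical, and the sketch assumes that the one-week window starts on the event date; the text does not say whether earlier uploads were also accepted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Topics retained in the text above.
KEPT_TOPICS = {"Armed conflicts and attacks", "Disasters and accidents"}


@dataclass(frozen=True)
class NewsEvent:
    """Hypothetical record for one crawled news event."""
    topic: str
    headline: str
    event_date: date


def keep_event(event):
    """Keep only events whose topic is one of the two retained categories."""
    return event.topic in KEPT_TOPICS


def published_within_window(event, published, window=timedelta(days=7)):
    """Assumed rule: published on the event date or up to one week later."""
    return timedelta(0) <= published - event.event_date <= window


def filter_videos(event, videos):
    """videos: iterable of (video_id, publication date) pairs."""
    return [video_id for video_id, published in videos
            if published_within_window(event, published)]
```

In such a pipeline, the headline of each retained event would be sent to the video search API as a query, and `filter_videos` would be applied to the returned results.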

  
Figure 2. A selection of query videos from the Fine-grained Incident Video Retrieval dataset.

Next, we proceeded with the selection of query videos. We set up an automated filtering and ranking process that implemented the following criteria: a) query videos should be relatively short and ideally focus on a single scene, b) queries should have many near-duplicates or same-incident videos within the dataset that are published by many different uploaders, c) among a set of near-duplicate/same-incident videos, the one that was uploaded first should be selected as query. This selection process was implemented based on a graph-based clustering approach and resulted in the selection of 635 query videos, of which we used the top 100 (ranked by corresponding cluster size) as the final query set.
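A simplified sketch of such a selection step is given below. The text only states that a graph-based clustering approach was used, so the use of connected components as clusters is an assumption, as are the function name `select_queries` and its inputs. The sketch covers criterion c) and the ranking by cluster size; the duration and uploader-diversity criteria are left out.

```python
from collections import defaultdict


def select_queries(upload_times, related_pairs, top_k):
    """Pick the earliest-uploaded video from each of the top_k largest clusters.

    upload_times: dict mapping video id to a comparable upload time.
    related_pairs: iterable of (id, id) pairs judged near-duplicate or
        same-incident by some similarity measure (assumed to be given).
    Clusters are taken to be connected components of the resulting graph,
    which is an illustrative assumption.
    """
    neighbours = defaultdict(set)
    for a, b in related_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    clusters = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        seen.add(start)
        stack = [start]
        cluster = []
        while stack:
            node = stack.pop()
            cluster.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(cluster)

    # Rank clusters by size, largest first, and keep the top_k.
    clusters.sort(key=len, reverse=True)
    return [min(cluster, key=lambda video: upload_times[video])
            for cluster in clusters[:top_k]]
```

Ranking by cluster size mirrors the way the final 100 queries were chosen from the 635 candidates, although the actual clustering method may have produced different groupings than connected components would.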

For the annotation of similarity relations among videos, we followed a multi-step process, in which we presented annotators with the results of a similarity-based video retrieval system and asked them to indicate the type of relation through a drop-down list of the following labels: a) Near-Duplicate (ND), a special case where the whole video is near-duplicate to the query video, b) Duplicate Scene (DS), where only some scenes in the candidate video are near-duplicates of scenes in the query video, c) Complementary Scenes (CS), d) Incident Scene (IS), and e) Distractors (DI), i.e. irrelevant videos.

To make sure that annotators were presented with as many potentially relevant videos as possible, we used visual-only, text-only and hybrid similarity in turn. As a result, each annotator reviewed video candidates that had very high similarity with the query video in terms of their visual content, their text metadata (title and description), or a combination of the two. Once an initial set of annotations had been produced by two independent annotators, the annotators went through the annotations twice more to ensure consistency and accuracy.

FIVR-200K was designed to serve as a benchmark that poses real-world challenges for the problem of reverse video search. Given a query video to be verified, the analyst would want to know whether the same or a very similar version of it has already been published. In that way, the user would be able to easily debunk cases of out-of-context video use (i.e. misappropriation) and on the other hand, if several videos are found that depict the same scene from different viewpoints at approximately the same time, then they could be considered to corroborate the video of interest.

Discussion: Limitations and Caveats

We are confident that the two video verification datasets presented in this column can be valuable resources for researchers interested in the problem of media-based disinformation and could serve both as training sets and as benchmarks for automated video verification methods. Yet, both of them suffer from certain limitations and care should be taken when using them to draw conclusions. 

A first potential issue has to do with the video selection bias arising from the particular way that each of the two datasets was created. The videos of the Fake Video Corpus were selected in a mixed manner: the selection tried to include a number of cases that were known to the dataset creators and their collaborators, and was also enriched by a pool of test videos that were submitted for analysis to a publicly available video verification service. As a result, it is likely to be more focused on viral and popular videos. Also, only videos for which debunking or corroborating information was found online were included, which introduces yet another source of bias, potentially towards cases that were more newsworthy or clear cut. In the case of the FIVR-200K dataset, videos were intentionally collected from two categories of newsworthy events with the goal of ending up with a relatively homogeneous collection, which would be challenging in terms of content-based retrieval. This means that certain types of content, such as political events, sports and entertainment, are very limited or not present at all in the dataset.

A question that is related to the selection bias of the above datasets pertains to their relevance for multimedia verification and for real-world applications. In particular, it is not clear whether the video cases offered by the Fake Video Corpus are representative of actual verification tasks that journalists and news editors face in their daily work. Another important question is whether these datasets offer a realistic challenge to automatic multimedia analysis approaches. In the case of FIVR-200K, it was clearly demonstrated (Kordopatis-Zilos et al., 2019) that the dataset is a much harder benchmark for near-duplicate detection methods compared to previous datasets such as CC_WEB_VIDEO and VCDB. Even so, we cannot safely conclude that a method that performs very well on FIVR-200K would perform equally well on a dataset of much larger scale (e.g. millions or even billions of videos).

Another issue that affects the access to these datasets and the reproducibility of experimental results relates to the ephemeral nature of online video content. A considerable (and increasing) part of these video collections is taken down (either by their own creators or by the video platform), which makes it impossible for researchers to gain access to the exact video set that was originally collected. To give a better sense of the problem, 21% of the Fake Video Corpus and 11% of the FIVR-200K videos were not available online in September 2019. This issue, which affects all datasets that are based on online multimedia content, raises the more general question of whether there are steps that can be taken by online platforms such as YouTube, Facebook and Twitter that could facilitate the reproducibility of social media research without violating copyright legislation or the platforms’ terms of service.
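To make the availability figures above concrete, here is a minimal sketch of how such a share could be computed from the outcome of an availability check. The function name and the example status map are hypothetical; the actual procedure used to check the two datasets is not described here.

```python
def unavailable_share(status_by_video):
    """Return the percentage of videos that are no longer available.

    status_by_video maps a video identifier to True (still online)
    or False (taken down). The identifiers used below are made up.
    """
    if not status_by_video:
        raise ValueError("empty collection")
    missing = sum(1 for online in status_by_video.values() if not online)
    return 100.0 * missing / len(status_by_video)


# Hypothetical check result for a five-video collection.
example = {"vid_a": True, "vid_b": False, "vid_c": True, "vid_d": True, "vid_e": True}
print(f"{unavailable_share(example):.0f}% unavailable")
```

Repeating such a check at regular intervals would show how quickly a collection decays over time.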

The ephemeral nature of online content is not the only factor that renders the value of multimedia datasets very sensitive to the passing of time. Especially in the case of online disinformation, there seems to be an arms race, where new machine learning methods constantly get better at detecting misleading or tampered content, but at the same time new types of misinformation emerge, which are increasingly AI-assisted. This is particularly pronounced in the case of deepfakes, where the main research paradigm is based on the concept of competition between a generator (adversary) and a detector (Goodfellow et al., 2014).

Last but not least, one may always be concerned about the potential ethical issues arising when publicly releasing such datasets. In our case, reasonable concerns for privacy risks, which are always relevant when dealing with social media content, are addressed by complying with the relevant Terms of Service of the source platforms and by making sure that any annotation (label) assigned to the dataset videos is accurate. Additional ethical issues pertain to the potential “dual use” of the datasets, i.e. their use by adversaries to craft better tools and techniques to make misinformation campaigns more effective. A recent pertinent case was OpenAI’s delayed release of their very powerful GPT-2 model, which sparked numerous discussions and criticism, and made clear that there is no commonly accepted practice for ensuring reproducibility of research results (and empowering future research) while at the same time making sure that risks of misuse are eliminated.

Future work

Given the challenges of creating and releasing a large-scale dataset for multimedia verification, the main conclusions from our efforts towards this direction so far are the following:

  • The field of multimedia verification is in constant motion and therefore the concept of a static dataset may not be sufficient to capture the real-world nuances and latest challenges of the problem. Instead, new benchmarking models, e.g. in the form of open data challenges, and resources, e.g. a constantly updated repository of “fake” multimedia, appear to be more effective for empowering future research in the area.
  • The role of social media and multimedia sharing platforms (incl. YouTube, Facebook, Twitter, etc.) seems to be crucial in enabling effective collaboration between academia and industry towards addressing the real-world consequences of online misinformation. While there have been recent developments in this direction, including the announcements by both Facebook and Alphabet’s Jigsaw of new deepfake datasets, there is also doubt and scepticism about the degree of openness and transparency that such platforms are ready to offer, given the conflicts of interest that are inherent in the underlying business model.
  • Building a dataset that is fit for a highly diverse and representative set of verification cases appears to be a task that would require a community effort instead of effort from a single organisation or group. This would not only help towards distributing the massive dataset creation cost and effort to multiple stakeholders, but also towards ensuring less selection bias, richer and more accurate annotation and more solid governance.

References

Allcott, H., Gentzkow, M., “Social media and fake news in the 2016 election”, Journal of economic perspectives, 31(2), pp. 211–36, 2017.
Amerini, I., Ballan, L., Caldelli, R., Del Bimbo, A., Serra, G., “A SIFT-based forensic method for copy-move attack detection and transformation recovery”, IEEE Transactions on Information Forensics and Security, 6(3), pp. 1099–1110, 2011.
Boididou, C., Papadopoulos, S., Kompatsiaris, Y., Schifferes, S., Newman, N., “Challenges of computational verification in social multimedia”, In Proceedings of the 23rd ACM International Conference on World Wide Web, pp. 743–748, 2014.
Boididou, C., Andreadou, K., Papadopoulos, S., Dang-Nguyen, D.T., Boato, G., Riegler, M., Kompatsiaris, Y., “Verifying multimedia use at MediaEval 2015”. In Proceedings of MediaEval 2015, 2015.
Boididou C., Papadopoulos S., Dang-Nguyen D., Boato G., Riegler M., Middleton S.E., Petlund A., Kompatsiaris Y., “Verifying multimedia use at MediaEval 2016”. In Proceedings of MediaEval 2016, 2016.
Brandtzaeg, P.B., Lüders, M., Spangenberg, J., Rath-Wiggins, L., Følstad, A., “Emerging journalistic verification practices concerning social media”. Journalism Practice, 10(3), pp. 323–342, 2016.
Christlein V., Riess C., Jordan J., Riess C., Angelopoulou, E., “An evaluation of popular copy-move forgery detection approaches”. IEEE Transactions on Information Forensics & Security, 7(6), pp. 1841–1854, 2012.
Derczynski, L., Bontcheva, K., Liakata, M., Procter, R., Hoi, G.W.S., Zubiaga, A., “Semeval-2017 Task 8: Rumoureval: determining rumour veracity and support for rumours”, Proceedings of the 11th International Workshop on Semantic Evaluation, pp. 69–76, 2017.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio, Y., “Generative adversarial nets”. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
Guan, H., Kozak, M., Robertson, E., Lee, Y., Yates, A.N., Delgado, A., Zhou, D., Kheyrkhah, T., Smith, J., Fiscus, J., “MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation”, In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops, pp. 63–72, 2019.
Güera, D., Delp, E.J., “Deepfake video detection using recurrent neural networks”, In Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 1–6, 2018.
Jiang, Y. G., Jiang, Y., Wang, J., “VCDB: A large-scale database for partial copy detection in videos”. In Proceedings of the European Conference on Computer Vision, pp. 357–371, 2014.
Kiesel, J., Mestre, M., Shukla, R., Vincent, E., Adineh, P., Corney, D., Stein, B., Potthast, M., “Semeval-2019 Task 4: Hyperpartisan news detection”. In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829–839, 2019.
Kordopatis-Zilos, G., Papadopoulos, S., Patras, I., Kompatsiaris, I., “FIVR: Fine-grained incident video retrieval”. IEEE Transactions on Multimedia, 21(10), pp. 2638–2652, 2019.
Korus, P., Huang, J., “Multi-scale analysis strategies in PRNU-based tampering localization”, IEEE Transactions on Information Forensics & Security, 12(4), pp. 809–824, 2017.
Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Schudson, M., “The science of fake news”, Science, 359(6380), pp. 1094–1096, 2018.
Papadopoulou, O., Zampoglou, M., Papadopoulos, S., Kompatsiaris, I., “A corpus of debunked and verified user-generated videos”. Online Information Review, 43(1), pp. 72–88, 2019.
Rocha, A., Scheirer, W., Boult, T., Goldenstein, S., “Vision of the unseen: Current trends and challenges in digital image and video forensics”, ACM Computing Surveys, 43(4), art. 26, 2011.
Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., Nießner, M., “Faceforensics++: Learning to detect manipulated facial images”, In Proceedings of the IEEE International Conference on Computer Vision, 2019.
Rubin, V.L., Chen, Y., Conroy, N.J., “Deception detection for news: Three types of fakes”, In Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community, art. 83, 2015.
Tandoc Jr, E.C., Lim, Z.W., Ling, R. “Defining “fake news”: A typology of scholarly definitions”, Digital journalism, 6(2), pp. 137–153, 2018.
Tralic, D., Zupancic I., Grgic S., Grgic M., “CoMoFoD – New database for copy-move forgery detection”. In Proceedings of the 55th International Symposium on Electronics in Marine, pp. 49–54, 2013.
Wardle, C., Derakhshan, H., “Information disorder: Toward an interdisciplinary framework for research and policy making”, Council of Europe Report, 27, 2017.
Wu, X., Hauptmann, A.G., Ngo, C.-W., “Practical elimination of near-duplicates from web video search”, In Proceedings of the 15th ACM International Conference on Multimedia, pp. 218–227, 2007.
Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., “Detecting image splicing in the wild (web)”, In Proceedings of the 2015 IEEE International Conference on Multimedia & Expo Workshops, 2015.
Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., “Large-scale evaluation of splicing localization algorithms for web images”, Multimedia Tools and Applications, 76(4), pp. 4801–4834, 2017.
Zhou, X., Zafarani, R., “Fake news: A survey of research, detection methods, and opportunities”. arXiv preprint arXiv:1812.00315, 2018.
Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., Procter, R., “Detection and resolution of rumours in social media: A survey”, ACM Computing Surveys, 51(2), art. 32, 2018.

Appendix A: Examples of videos in the Fake Video Corpus.

Real videos


US Airways Flight 1549 ditched in the Hudson River.


A group of musicians playing in an Istanbul park while bombs explode outside the stadium behind them.


A giant alligator crossing a Florida golf course.

Fake videos


“Syrian boy rescuing a girl amid gunfire” – Staged (fabricated content): The video was filmed by Norwegian Lars Klevberg in Malta.


“Golden Eagle Snatches Kid” – Tampered: The video was created by a team of students in Montreal as part of their course on visual effects.


“Pope Francis slaps Donald Trump’s hand for touching him” – Satire/parody: The video was digitally manipulated, and was made for the late-night television show Jimmy Kimmel Live.

Appendix B: Examples of videos in the Fine-grained Incident Video Retrieval dataset.

Example 1


Query video from the American Airlines Flight 383 fire at Chicago O’Hare International Airport on October 28, 2016.


Duplicate scene video.


Complementary scene video.


Incident scene video.

Example 2


Query video from the Boston Marathon bombing on April 15, 2013.


Duplicate scene video.


Complementary scene video.


Incident scene video.

Example 3


Query video from the Las Vegas shooting on October 1, 2017.


Duplicate scene video.


Complementary scene video.


Incident scene video.

JPEG Column: 84th JPEG Meeting in Brussels, Belgium

The 84th JPEG meeting was held in Brussels, Belgium.

This meeting was characterised by significant progress in most JPEG projects and in exploratory studies. JPEG XL, the new image coding system, has issued the Committee Draft, giving shape to this new effective solution for the future of image coding. Part 1 (Framework) and Part 2 (Light field coding) of JPEG Pleno, the standard for new imaging technologies, have also reached Draft International Standard status.

Moreover, exploration studies are ongoing in the domain of media blockchain and on the application of learning solutions for image coding (JPEG AI). Both have triggered a number of activities providing new knowledge and opening new possibilities for the use of these technologies in future JPEG standards.

The 84th JPEG meeting had the following highlights:

  • JPEG XL issues the Committee Draft
  • JPEG Pleno Part 1 and 2 reach Draft International Standard status
  • JPEG AI defines Common Test Conditions
  • JPEG exploration studies on Media Blockchain
  • JPEG Systems – JLINK working draft
  • JPEG XS

In the following, a short description of the most significant activities is presented.

 

JPEG XL

The JPEG XL Image Coding System (ISO/IEC 18181) has completed the Committee Draft of the standard. The new coding technique allows storage of high-quality images at one-third the size of the legacy JPEG format. Moreover, JPEG XL can losslessly transcode existing JPEG images to about 80% of their original size, simplifying interoperability and accelerating wider deployment.
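As a quick back-of-the-envelope illustration of the two ratios quoted above, the sketch below estimates file sizes for a given legacy JPEG. The function name and the example size are hypothetical, and the ratios are the approximate figures from the text, not measurements.

```python
def jpeg_xl_size_estimates(legacy_jpeg_bytes):
    """Estimate JPEG XL sizes from the approximate ratios quoted in the text."""
    return {
        # Fresh encode at comparable quality: about one-third of legacy JPEG.
        "new_encode": legacy_jpeg_bytes // 3,
        # Lossless transcode of an existing JPEG: about 80% of its size.
        "lossless_transcode": legacy_jpeg_bytes * 80 // 100,
    }


# Hypothetical 3 MB legacy JPEG file.
estimates = jpeg_xl_size_estimates(3_000_000)
for name, size in estimates.items():
    print(f"{name}: {size} bytes")
```

Actual savings depend on image content and encoder settings, so these numbers should be read as rough expectations only.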

The JPEG XL reference software, ready for mobile and desktop deployments, will be available in Q4 2019. The current contributors have committed to releasing it publicly under a royalty-free and open source license.

 

JPEG Pleno

A significant milestone has been reached during this meeting: the Draft International Standards (DIS) for both JPEG Pleno Part 1 (Framework) and Part 2 (Light field coding) have been completed. A draft architecture of the Reference Software (Part 4) and development plans have also been discussed and defined.

In addition, JPEG has completed an in-depth analysis of existing point cloud coding solutions and a new version of the use-cases and requirements document has been released reflecting the future role of JPEG Pleno in point cloud compression. A new set of Common Test Conditions has been released as a guideline for the testing and evaluation of point cloud coding solutions with both a best practice subjective testing protocol and a set of objective metrics.

JPEG Pleno holography activities made significant advances in the definition of use cases and requirements and in the description of Common Test Conditions. New quality assessment methodologies for holographic data, defined in the framework of a collaboration between JPEG and Qualinet, were established. Moreover, JPEG Pleno continues collecting microscopic and tomographic holographic data.

 

JPEG AI

The JPEG Committee continues to carry out exploration studies with deep learning-based image compression solutions, typically with an auto-encoder architecture. The promise that these types of codecs hold, especially in terms of coding efficiency, will be evaluated with several studies. In this meeting, a Common Test Conditions document was produced, which includes a plan for subjective and objective quality assessment experiments as well as coding pipelines for anchor and learning-based codecs. Moreover, a JPEG AI dataset was proposed and discussed, and a double stimulus impairment scale experiment (side-by-side) was performed with a mix of experts and non-experts in a controlled environment.

 

JPEG exploration on Media Blockchain

Fake news, copyright violation, media forensics, privacy and security are emerging challenges in digital media. JPEG has determined that blockchain and distributed ledger technologies (DLT) have great potential as a technology component to address these challenges in transparent and trustable media transactions. However, blockchain and DLT need to be integrated closely with a widely adopted standard to ensure broad interoperability of protected images. JPEG calls for industry participation to help define use cases and requirements that will drive the standardization process. In order to clearly identify the impact of blockchain and distributed ledger technologies on JPEG standards, the committee has organised several workshops to interact with stakeholders in the domain.

The 4th public workshop on media blockchain was organized in Brussels on Tuesday the 16th of July 2019 during the 84th ISO/IEC JTC 1/SC 29/WG1 (JPEG) Meeting. The presentations and program of the workshop are available on jpeg.org.

The JPEG Committee has issued an updated version of the white paper entitled “Towards a Standardized Framework for Media Blockchain” that elaborates on the initiative, exploring relevant standardization activities, industrial needs and use cases.

To keep informed and to get involved in this activity, interested parties are invited to register to the ad hoc group’s mailing list.

 

JPEG Systems – JLINK

At the 84th meeting, IS text reviews for ISO/IEC 19566-5 JUMBF and ISO/IEC 19566-6 JPEG 360 were completed; IS publication will be forthcoming. Work began on adding functionality to JUMBF, Privacy & Security, and JPEG 360, and on initial planning towards developing software implementations of these parts of the JPEG Systems specification. Work also began on the new ISO/IEC 19566-7 Linked media images (JLINK) with development of a working draft.

 

JPEG XS

The JPEG Committee is pleased to announce new Core Experiments and Exploration Studies on compression of raw image sensor data. The JPEG XS project aims at the standardization of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec in various markets. Video transport over professional video links (SDI, IP, Ethernet), real-time video storage in and outside of cameras, memory buffers, machine vision systems, and data compression onboard autonomous vehicles are among the targeted use cases for raw image sensor compression. This new work on raw sensor data will pave the way towards highly efficient close-to-sensor image compression workflows with JPEG XS.

 

Final Quote

“Completion of the Committee Draft of JPEG XL, the new standard for image coding is an important milestone. It is hoped that JPEG XL can become an excellent replacement of the widely used JPEG format which has been in service for more than 25 years.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and JPEG XL families of imaging standards.

More information about JPEG and its work is available at www.jpeg.org.

Future JPEG meetings are planned as follows:

  • No 85, San Jose, California, U.S.A., November 2 to 8, 2019
  • No 86, Sydney, Australia, January 18 to 24, 2020

MPEG Column: 127th MPEG Meeting in Gothenburg, Sweden

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

Plenary of the 127th MPEG Meeting in Gothenburg, Sweden.

The 127th MPEG meeting concluded on July 12, 2019 in Gothenburg, Sweden with the following topics:

  • Versatile Video Coding (VVC) enters formal approval stage; experts predict 35–60% improvement over HEVC
  • Essential Video Coding (EVC) promoted to Committee Draft
  • Common Media Application Format (CMAF) 2nd edition promoted to Final Draft International Standard
  • Dynamic Adaptive Streaming over HTTP (DASH) 4th edition promoted to Final Draft International Standard
  • Carriage of Point Cloud Data progresses to Committee Draft
  • JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition
  • Genomic information representation – WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • ISO/IEC 23005 (MPEG-V) 4th Edition – WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

The corresponding press release of the 127th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/127

Versatile Video Coding (VVC)

The Moving Picture Experts Group (MPEG) is pleased to announce that Versatile Video Coding (VVC) progresses to Committee Draft; experts predict a 35–60% improvement over HEVC.

The development of the next major generation of video coding standard has achieved excellent progress, such that MPEG has approved the Committee Draft (CD, i.e., the text for formal balloting in the ISO/IEC approval process).

The new VVC standard will be applicable to a very broad range of applications and it will also provide additional functionalities. VVC will provide a substantial improvement in coding efficiency relative to existing standards: a bit rate reduction in the range of 35–60% relative to HEVC is expected, although it has not yet been formally measured. Here, “relative to HEVC” means at equivalent subjective video quality, at picture resolutions such as 1080p HD or 4K or 8K UHD, for either standard dynamic range video or high dynamic range and wide color gamut content, and at levels of quality appropriate for use in consumer distribution services. The focus during the development of the standard has primarily been on 10-bit 4:2:0 content, and 4:4:4 chroma format will also be supported.
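To put the expected range into numbers, the following sketch converts an HEVC bit rate into the corresponding VVC bit rate range at equivalent subjective quality. The function name and the 10 Mbit/s example are hypothetical; the 35% and 60% bounds are the not yet formally measured predictions quoted above.

```python
def vvc_bitrate_range(hevc_kbps, min_saving_pct=35, max_saving_pct=60):
    """Return (lowest, highest) expected VVC bit rate in kbit/s.

    The defaults mirror the predicted 35-60% bit rate reduction
    relative to HEVC; they are expectations, not measurements.
    """
    lowest = hevc_kbps * (100 - max_saving_pct) / 100
    highest = hevc_kbps * (100 - min_saving_pct) / 100
    return lowest, highest


# Hypothetical HEVC stream at 10 Mbit/s.
low, high = vvc_bitrate_range(10_000)
print(f"Expected VVC bit rate: {low:.0f} to {high:.0f} kbit/s")
```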

The VVC standard is being developed in the Joint Video Experts Team (JVET), a group established jointly by MPEG and the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. In addition to a text specification, the project also includes the development of reference software, a conformance testing suite, and a new standard ISO/IEC 23002-7 specifying supplemental enhancement information messages for coded video bitstreams. The approval process for ISO/IEC 23002-7 has also begun, with the issuance of a CD consideration ballot.

Research aspects: VVC represents the next generation video codec to be deployed in 2020+ and basically the same research aspects apply as for previous generations, i.e., coding efficiency, performance/complexity, and objective/subjective evaluation. Luckily, JVET documents are freely available including the actual standard (committee draft), software (and its description), and common test conditions. Thus, researchers utilizing these resources are able to conduct reproducible research when contributing their findings and code improvements back to the community at large.

Essential Video Coding (EVC)

MPEG-5 Essential Video Coding (EVC) promoted to Committee Draft

Interestingly, at the same meeting as VVC, MPEG promoted MPEG-5 Essential Video Coding (EVC) to Committee Draft (CD). The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics.

The MPEG-5 EVC standard includes a baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be royalty-free. Additionally, a main profile adds a small number of additional tools, each providing significant performance gain. All main profile tools are capable of being individually switched off or individually switched over to a corresponding baseline tool. Organizations making proposals for the main profile have agreed to publish applicable licensing terms within two years of FDIS stage, either individually or as part of a patent pool.
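The per-tool switching described above can be pictured as a small configuration structure. The sketch below is purely illustrative: the tool names are placeholders, and neither the names nor the functions come from the EVC specification or its reference software.

```python
from enum import Enum


class ToolMode(Enum):
    MAIN = "main"          # use the main profile tool
    BASELINE = "baseline"  # switch over to the corresponding baseline tool
    OFF = "off"            # switch the tool off


def build_tool_config(main_tools, overrides=None):
    """Return a per-tool mode map; tools without an override stay in MAIN mode."""
    overrides = overrides or {}
    unknown = set(overrides) - set(main_tools)
    if unknown:
        raise KeyError(f"unknown tools: {sorted(unknown)}")
    return {tool: overrides.get(tool, ToolMode.MAIN) for tool in main_tools}


# Placeholder tool names (hypothetical, for illustration only).
config = build_tool_config(
    ["tool_a", "tool_b", "tool_c"],
    {"tool_b": ToolMode.BASELINE, "tool_c": ToolMode.OFF},
)
for tool, mode in config.items():
    print(tool, mode.value)
```

Keeping the decision per tool, rather than per profile, is what would let an implementer drop or replace a single tool without giving up the rest of the main profile.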

Research aspects: Similar research aspects can be described for EVC, and from a software engineering perspective it could also be interesting to further investigate this switching mechanism of individual tools and/or the fallback option to baseline tools. Naturally, a comparison with next generation codecs such as VVC is interesting per se. The licensing aspects themselves are probably interesting for other disciplines, but that is another story…

Common Media Application Format (CMAF)

MPEG ratified the 2nd edition of the Common Media Application Format (CMAF)

The Common Media Application Format (CMAF) enables efficient encoding, storage, and delivery of digital media content (incl. audio, video, subtitles among others), which is key to scaling operations to support the rapid growth of video streaming over the internet. The CMAF standard is the result of widespread industry adoption of an application of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices within CMAF.

The 2nd edition of CMAF adds support for a number of specifications that were a result of significant industry interest. Those include

  • Advanced Audio Coding (AAC) multi-channel;
  • MPEG-H 3D Audio;
  • MPEG-D Unified Speech and Audio Coding (USAC);
  • Scalable High Efficiency Video Coding (SHVC);
  • IMSC 1.1 (Timed Text Markup Language Profiles for Internet Media Subtitles and Captions); and
  • additional HEVC video CMAF profiles and brands.

This edition also introduces CMAF supplemental data handling as well as new structural brands for CMAF that reflect the common practice of the significant deployment of CMAF in industry. Companies adopting CMAF technology will find the specifications introduced in the 2nd Edition particularly useful for further adoption and proliferation of CMAF in the market.

Research aspects: see below (DASH).

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG approves the 4th edition of Dynamic Adaptive Streaming over HTTP (DASH)

The 4th edition of MPEG-DASH comprises the following features:

  • a service description in which the service provider indicates how the service is expected to be consumed;
  • a method to indicate the times corresponding to the production of associated media; and
  • a mechanism to signal DASH profiles and features, employed codec and format profiles, and supported protection schemes present in the Media Presentation Description (MPD).
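To illustrate the kind of information the MPD-level signalling listed above is about, the sketch below reads profiles, codecs, and protection schemes from a tiny hand-written MPD using long-standing MPD attributes and elements (`profiles`, `codecs`, `ContentProtection`). It does not show the new 4th edition signalling, whose syntax is not given here, and the example manifest and function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Minimal, hand-written manifest for illustration only.
EXAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     profiles="urn:mpeg:dash:profile:isoff-live:2011"
     type="static" mediaPresentationDuration="PT30S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"/>
      <Representation id="v1" codecs="avc1.64001f" bandwidth="1000000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def summarize_mpd(mpd_text):
    """Return (profiles, codecs, protection scheme URIs) found in an MPD."""
    root = ET.fromstring(mpd_text)
    profiles = [p.strip() for p in root.get("profiles", "").split(",") if p.strip()]
    codecs = sorted(
        {rep.get("codecs") for rep in root.iterfind(".//mpd:Representation", NS)
         if rep.get("codecs")}
    )
    schemes = sorted(
        {cp.get("schemeIdUri") for cp in root.iterfind(".//mpd:ContentProtection", NS)
         if cp.get("schemeIdUri")}
    )
    return profiles, codecs, schemes


profiles, codecs, schemes = summarize_mpd(EXAMPLE_MPD)
print(profiles, codecs, schemes)
```

A client could use such a summary to decide early on whether it is able to play a presentation at all.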

It is expected that this edition will be published later this year. 

Research aspects: The 2nd edition of CMAF and the 4th edition of DASH come with a rich feature set enabling a plethora of use cases. The underlying principles are still the same, and research issues arise from updated application and service requirements with respect to content complexity, time aspects (mainly delay/latency), and quality of experience (QoE). DASH-IF presents the Excellence in DASH Award at the ACM Multimedia Systems conference, and an overview of its academic efforts can be found here.

Carriage of Point Cloud Data

MPEG progresses the Carriage of Point Cloud Data to Committee Draft

At its 127th meeting, MPEG promoted the carriage of point cloud data to the Committee Draft stage, the first milestone of the ISO standard development process. This standard is the first one introducing the support of volumetric media in the widely used ISO base media file format family of standards.

This standard supports the carriage of point cloud data comprising individually encoded video bitstreams within multiple file format tracks in order to support the intrinsic nature of the video-based point cloud compression (V-PCC). Additionally, it also allows the carriage of point cloud data in one file format track for applications requiring multiplexed content (i.e., the video bitstream of multiple components is interleaved into one bitstream).

This standard is expected to support efficient access and delivery of some portions of a point cloud object, considering that in many cases the entire point cloud object may not be visible to the user, depending on the viewing direction or the location of the point cloud object relative to other objects. It is currently expected that the standard will reach its final milestone by the end of 2020.

Research aspects: MPEG’s Point Cloud Compression (PCC) comes in two flavors, video-based and geometry-based, but both still need to be packaged into file and delivery formats. MPEG’s choice here is the ISO base media file format, and the efficient carriage of point cloud data is characterized by both functionality (i.e., enabling the required use cases) and performance (such as low overhead).

MPEG-2 Systems/Transport Stream

JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition

At its 127th meeting, WG11 (MPEG) has extended ISO/IEC 13818-1 (MPEG-2 Systems) – in collaboration with WG1 (JPEG) – to support ISO/IEC 21122 (JPEG XS) in order to support industries using still image compression technologies for broadcasting infrastructures. The specification defines a JPEG XS elementary stream header and specifies how the JPEG XS video access unit (specified in ISO/IEC 21122-1) is put into a Packetized Elementary Stream (PES). Additionally, the specification also defines how the System Target Decoder (STD) model can be extended to support JPEG XS video elementary streams.
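As a rough illustration of the PES layer mentioned above, the sketch below reads the fixed six-byte start of a PES packet (start code prefix, stream_id, PES_packet_length) as defined in MPEG-2 Systems. It does not implement the JPEG XS elementary stream header, whose layout is not given here; the function name and the sample bytes are illustrative assumptions.

```python
import struct

PES_START_CODE_PREFIX = b"\x00\x00\x01"


def parse_pes_start(data):
    """Parse the first six bytes of a PES packet.

    Returns (stream_id, pes_packet_length). Only the generic fixed fields
    are read; codec-specific headers (such as the JPEG XS one) are not.
    """
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[:3] != PES_START_CODE_PREFIX:
        raise ValueError("missing PES start code prefix")
    stream_id, pes_packet_length = struct.unpack(">BH", data[3:6])
    return stream_id, pes_packet_length


# Sample bytes are made up: stream_id 0xE0 is a generic MPEG video stream id,
# not necessarily the one used for JPEG XS.
sample = b"\x00\x00\x01\xe0\x00\x10" + bytes(16)
stream_id, length = parse_pes_start(sample)
print(hex(stream_id), length)
```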

Genomic information representation

WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that have to be stored, transferred and analyzed. So far WG 11 (MPEG) and ISO TC 276/WG 5 have addressed the representation, compression and transport of genome sequencing data by developing the ISO/IEC 23092 standard series also known as MPEG-G. They provide a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for the access of sequencing data in the native compressed format.

An important element in the effective usage of sequencing data is the association of the data with the results of the analysis and annotations that are generated by processing pipelines and analysts. At the moment such association happens as a separate step; standard and effective ways of linking data and meta information derived from sequencing data are not available.

At its 127th meeting, MPEG and ISO TC 276/WG 5 issued a joint Call for Proposals (CfP) addressing this problem. The call seeks submissions of technologies that can provide efficient representation and compression solutions for the processing of genomic annotation data.

Companies and organizations are invited to submit proposals in response to this call. Responses are expected to be submitted by 8 January 2020 and will be evaluated during the 129th WG 11 (MPEG) meeting. Detailed information, including how to respond to the call for proposals, the requirements that have to be considered, and the test data to be used, is reported in the documents N18648, N18647, and N18649 available at the 127th meeting website (http://mpeg.chiariglione.org/meetings/127). For any further questions about the call, test conditions, required software or test sequences please contact: Joern Ostermann, MPEG Requirements Group Chair (ostermann@tnt.uni-hannover.de) or Martin Golebiewski, Convenor ISO TC 276/WG 5 (martin.golebiewski@h-its.org).

ISO/IEC 23005 (MPEG-V) 4th Edition

WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

At its 127th meeting, WG11 (MPEG) promoted the 4th edition of two parts of the ISO/IEC 23005 (MPEG-V; Media Context and Control) standards to the Final Draft International Standard (FDIS) stage. The new edition of ISO/IEC 23005-1 (architecture) enables ten new use cases, which can be grouped into four categories: 3D printing, olfactory information in virtual worlds, virtual panoramic vision in cars, and adaptive sound handling. The new edition of ISO/IEC 23005-7 (conformance and reference software) is updated to reflect the changes made by the introduction of new tools defined in other parts of ISO/IEC 23005. More information on MPEG-V and its parts 1-7 can be found at https://mpeg.chiariglione.org/standards/mpeg-v.


Finally, the unofficial highlight of the 127th MPEG meeting we certainly found while scanning the scene in Gothenburg on Tuesday night…

MPEG127_Metallica

Qualinet Databases: Central Resource for QoE Research – History, Current Status, and Plans

Introduction

Datasets are an enabling tool for successful technological development and innovation in numerous fields. Large-scale databases of multimedia content play a crucial role in the development and performance evaluation of multimedia technologies, most importantly in audiovisual signal processing, for example coding, transmission, subjective/objective quality assessment, and QoE (Quality of Experience) [1]. Publicly available and widely accepted datasets are necessary for a fair comparison and validation of systems under test; they are crucial for reproducible research. In the public domain, large amounts of relevant multimedia content are available, for example, via the ACM SIGMM Records Dataset Column (http://sigmm.hosting.acm.org/category/datasets-column/), the MediaEval Benchmark (http://www.multimediaeval.org/), MMSys Datasets (http://www.sigmm.org/archive/MMsys/mmsys14/index.php/mmsys-datasets.html), etc. However, the descriptions of these datasets are usually scattered – for example in technical reports, research papers, and online resources – and finding the most appropriate dataset for a particular need is a cumbersome task.

The Qualinet Multimedia Databases Online platform is one of many efforts to provide an overview and comparison of multimedia content datasets, especially for QoE-related research, all in one place. The platform was introduced in the frame of ICT COST Action IC1003 European Network on Quality of Experience in Multimedia Systems and Services – Qualinet (http://www.qualinet.eu). The platform, abbreviated “Qualinet Databases” (http://dbq.multimediatech.cz/), is used to share information on databases with the community [3], [4]. Qualinet was supported as a COST Action between November 8, 2010, and November 7, 2014. It has continued as an independent entity with a new structure, activities, and management since 2015. The Qualinet Databases platform fulfills the initial goal of providing a rich and internationally recognized database and has been running since 2010. It is widely considered one of Qualinet’s most notable achievements.

The following paragraphs summarize Qualinet Databases, including its history, current status, and plans.

Background

A commonly recognized database for multimedia content is a crucial resource required not only for QoE-related research. Among the first published efforts in this field are the image and video quality resources website by Stefan Winkler (https://stefan.winklerbros.net/resources.html) and related publications providing in-depth analysis of multimedia content databases [2]. Since 2010, one of the main interests of Qualinet and its Working Group 4 (WG4) entitled Databases and Validation (Leader: Christian Timmerer, Deputy Leaders: Karel Fliegel, Shelley Buchinger, Marcus Barkowsky) has been to create an even broader database with extended functionality and to take the necessary steps to make it accessible to all researchers.

Qualinet first decided to list and summarize available multimedia databases based on a literature search and feedback from the project members. As the number of databases in the list was rapidly increasing, the handling of the necessary updates became inefficient. Based on these findings, WG4 started the implementation of the Qualinet Databases online platform in 2011. Since then, the website has been used as Qualinet’s central resource for sharing the datasets among Qualinet members and the scientific community. To the best of our knowledge, there is no other publicly available resource for QoE research that offers similar functionality. The Qualinet Databases platform is intended to provide more features than other known similar solutions such as the Consumer Digital Video Library (http://www.cdvl.org). The main difference lies in the fact that Qualinet Databases acts as a hub to various scattered resources of multimedia content, especially those with accompanying data, such as MOS (Mean Opinion Score), raw data from subjective experiments, eye-tracking data, and detailed descriptions of the datasets including scientific references.

In the development of Qualinet DBs within the frame of COST Action IC1003, there are several milestones, which are listed in the timeline below:

  • March 2011 (1st Qualinet General Assembly (GA), Lisbon, Portugal), an initial list of multimedia databases collected and published internally for Qualinet members, creation of Web-based portal proposed,
  • September 2011 (2nd Qualinet GA, Brussels, Belgium), Qualinet DBs prototype portal introduced, development of publicly available resource initiated,
  • February 2012 (3rd Qualinet GA, Prague, Czech Republic), hosting of the Qualinet DBs platform under development at the Czech Technical University in Prague (http://dbq.multimediatech.cz/), Qualinet DBs Wiki page (http://dbq-wiki.multimediatech.cz/) introduced,
  • October 2012 (4th Qualinet GA, Zagreb, Croatia), White paper on Qualinet DBs published [3], Qualinet DBs v1.0 online platform released to the public,
  • March 2013 (5th Qualinet GA, Novi Sad, Serbia), Qualinet DBs v1.5 online platform published with extended functionality,
  • September 2013 (6th Qualinet GA, Novi Sad, Serbia), Qualinet DBs Information leaflet published, Task Force (TF) on Standardization and Dissemination established, QoMEX 2013 Dataset Track organized,
  • March 2014 (7th Qualinet GA, Berlin, Germany), ACM MMSys 2014 Dataset Track organized, liaison with Ecma International (https://www.ecma-international.org/) on possible standardization of Qualinet DBs subset established,
  • October 2014 (8th Final Qualinet GA and Workshop, Delft, The Netherlands), final development stage v3.00 of Qualinet DBs platform reached, code freeze.

Qualinet Databases became Qualinet’s primary resource for sharing datasets with Qualinet members and, after registration, also with the broad scientific community. At the final Qualinet General Assembly under the COST Action IC1003 umbrella (October 2014, Delft, The Netherlands) it was concluded – also based on numerous testimonials – that Qualinet DBs is one of the major assets created throughout the project. Thus it was decided that the sustainability of this resource must be ensured for the years to come. Since 2015 the Qualinet DBs platform has been kept running through the effort of a newly established Task Force, TF4 Qualinet Databases (Leader: Karel Fliegel, Deputy Leaders: Lukáš Krasula, Werner Robitza). The status and achievements are discussed regularly at Qualinet’s Annual Meetings collocated with QoMEX (International Conference on Quality of Multimedia Experience), i.e., 7th QoMEX 2015 (Costa Navarino, Greece), 8th QoMEX 2016 (Lisbon, Portugal), 9th QoMEX 2017 (Erfurt, Germany), 10th QoMEX 2018 (Sardinia, Italy), and 11th QoMEX 2019 (Berlin, Germany).

Current Status

The basic functionality of the Qualinet Databases online platform, see Figure 1, is based on the idea that registered users (Qualinet members and other interested users from the scientific community) have access through an easy-to-use Web portal providing a list of multimedia databases. Based on their user rights, they are allowed to browse information about the particular database and eventually download the actual multimedia content from the link provided by the database owner.

Figure 1. Qualinet Databases online platform and its current interface.

Selected users – Database Owners in particular – have rights to upload or edit their records in the list of databases. Most of the multimedia databases have a flag of “Publicly Available” and are accessible to the registered users outside Qualinet. Only Administrators (Task Force leader and deputy leaders) have the right to delete records in the database. Qualinet DBs does not contain the actual multimedia content but only the access information with provided links to the dataset files saved at the server of the Database Owner.

Qualinet DBs is accessible to all registered users after entering valid login data. Depending on the level of the rights assigned to the particular account, the user can browse the list of the databases with descriptions (all registered users) and has access to the actual multimedia content via a link entered by the Database Owner. It provides users with a powerful tool to find the multimedia database that best suits their needs.

In the User Settings, users can select which fields are visible in the list of databases, namely:

  • Database name, Institution, Qualinet Partner (Yes/No),
  • Link, Description (abstract), Access limitations, Publicly available (Yes/No), Copyright Agreement signed (Yes/No),
  • Citation, References, Copyright notice, Database usage tracking,
  • Content type, MOS (Yes/No), Other (Eye tracking, Sensory, …),
  • Total number of contents, SRC, HRC,
  • Subjective evaluation method (DSCQS, …), Number of ratings.

Full-text search within the selected visible fields is available. In the current version of Qualinet DBs, users can sort databases alphabetically based on the visible fields or use the search field as described above.
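The search and sort behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DatabaseRecord` class, its reduced set of fields, and the sample records are hypothetical and do not reflect the actual implementation or contents of Qualinet DBs.

```python
from dataclasses import dataclass, asdict


@dataclass
class DatabaseRecord:
    # Hypothetical subset of the visible fields listed above.
    name: str
    institution: str
    description: str
    content_type: str
    has_mos: bool


def search(records, query, visible_fields):
    """Return records where any visible field contains the query (case-insensitive)."""
    needle = query.lower()
    hits = []
    for record in records:
        values = asdict(record)
        if any(needle in str(values[field]).lower() for field in visible_fields):
            hits.append(record)
    return hits


def sort_by(records, field):
    """Sort records alphabetically by one visible field."""
    return sorted(records, key=lambda r: str(getattr(r, field)).lower())


# Made-up sample records, for illustration only.
records = [
    DatabaseRecord("Video Set B", "Example University", "HDR video with ratings", "video", True),
    DatabaseRecord("Image Set A", "Sample Institute", "Compressed images", "image", True),
    DatabaseRecord("Gaze Set C", "Example University", "Eye-tracking on video clips", "video", False),
]

# Search only in the fields the user has chosen to display, then sort by name.
hits = search(records, "video", visible_fields=["name", "content_type"])
sorted_names = [r.name for r in sort_by(hits, "name")]
print(sorted_names)
```

Restricting the search to the visible fields mirrors the behaviour described in the text: a record whose hidden description mentions the query is not returned unless that field is displayed.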

The list of databases allows:

  • Opening a card with details on particular database record (accessible to all users),
  • Editing database record (accessible to the database owners and administrators),
  • Deleting database record (accessible only to administrators),
  • Requesting deletion of a database record (accessible to the database owners),
  • Requesting assignment as the database owner (accessible to all users).
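The access rules in the list above amount to a small role-to-action mapping. The following Python sketch shows one way to express it; the role and action names are hypothetical labels chosen for illustration, and the sketch ignores the detail that a database owner would only edit their own records.

```python
from enum import Enum


class Role(Enum):
    USER = "registered user"
    OWNER = "database owner"
    ADMIN = "administrator"


# Actions from the list above mapped to the roles allowed to perform them.
# The action names are assumptions made for this sketch.
PERMISSIONS = {
    "open_details": {Role.USER, Role.OWNER, Role.ADMIN},
    "edit_record": {Role.OWNER, Role.ADMIN},
    "delete_record": {Role.ADMIN},
    "request_deletion": {Role.OWNER},
    "request_ownership": {Role.USER, Role.OWNER, Role.ADMIN},
}


def is_allowed(role, action):
    """Return True if the given role may perform the action; unknown actions are denied."""
    return role in PERMISSIONS.get(action, set())


print(is_allowed(Role.OWNER, "delete_record"))
print(is_allowed(Role.ADMIN, "delete_record"))
```

Denying unknown actions by default keeps the sketch conservative: a mistyped action name cannot accidentally grant access.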

As for the records available in Qualinet DBs, the listed multimedia databases are a crucial resource for various tasks in multimedia signal processing. Qualinet DBs is focused primarily on content related to QoE research [1], where, while designing objective quality assessment algorithms, it is necessary to perform (1) Verification of the model during development, (2) Validation of the model after development, and (3) Benchmarking of various models.

Annotated multimedia databases contain essential ground truth, that is, test material from the subjective experiment annotated with subjective ratings. Qualinet DBs also lists other material without subjective ratings for other kinds of experiments. Qualinet DBs covers mostly image and video datasets, including special contents (e.g., 3D, HDR) and data from subjective experiments, such as subjective quality ratings or visual attention data.
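To illustrate how annotated ground truth is typically used for benchmarking, the sketch below ranks two objective quality models by the Pearson linear correlation between their predictions and the MOS values. The choice of Pearson correlation as the criterion is an assumption made here (it is a common one in the field, not something the platform prescribes), and the MOS values, model names, and scores are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def benchmark(mos, model_scores):
    """Rank models by correlation of their predictions with MOS, best first."""
    results = {name: pearson(scores, mos) for name, scores in model_scores.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Illustrative numbers only: MOS per test item and predictions of two hypothetical models.
mos = [1.2, 2.1, 2.9, 3.8, 4.6]
model_scores = {
    "model_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "model_b": [3.0, 1.0, 4.0, 2.0, 5.0],
}

ranking = benchmark(mos, model_scores)
for name, plcc in ranking:
    print(f"{name}: PLCC = {plcc:.3f}")
```

In practice a benchmark would usually report further criteria, such as rank-order correlation and prediction error, and would use far more test items than this toy example.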

A timeline with statistics on the number of records and users registered in Qualinet DBs throughout the years can be seen in Figure 2. Throughout Qualinet COST Action IC1003 the number of registered datasets grew from 64 in March 2011 to 201 in October 2014. The number of datasets created by the Qualinet partner institutions grew from 30 in September 2011 to 83 in October 2014. The number of registered users increased from 37 in March 2013 to 222 in October 2014. After the end of COST Action IC1003 in November 2014 the number of datasets increased to 246 and the number of registered users to 491. The average yearly increase of registered users is approximately 56 users, which illustrates continuous interest and value of Qualinet DBs for the community.

Figure 2. Qualinet Databases statistics on the number of records and users.

Besides the Qualinet DBs online platform (http://dbq.multimediatech.cz/), there are also additional resources available for download via the Wiki page (http://dbq-wiki.multimediatech.cz) and Qualinet website (http://www.qualinet.eu/). Two documents are available: (1) “QUALINET Multimedia Databases v6.5” (May 28, 2017) with a detailed description of registered datasets, and (2) “List of QUALINET Multimedia Databases v6.5” in a searchable spreadsheet with records as of May 28, 2017.

Plans

There are indicators – especially the number of registered users – showing that Qualinet DBs is a valuable resource for the community. However, the current platform as described above has not been updated since 2014, and there are several issues to be solved, such as the burden on one institution to host and maintain the system, possible instability and an obsolete interface, issues with the Wiki page, and the lack of a file repository. Moreover, in the current system, user registration is required. It is a very useful feature for usage tracking and for ensuring database privacy, but at the same time, it can put some people off from using the platform and adding new datasets, and it requires handling of personal data. There are also numerous obsolete links in Qualinet DBs; keeping them is useful for the record, but the respective databases should be archived.

A proposal for a new platform for Qualinet DBs was presented at the 13th Qualinet General Meeting in June 2019 (Berlin, Germany) and was subsequently supported by the assembly. The new platform is planned to be based on a Git repository so that the system will be open-source and text-based, and no database will be needed. The user-friendly interface is to be provided by a static website generator; the website itself will be hosted on GitHub. A similar approach has been successfully implemented for the VQEG Software & Tools (https://vqeg.github.io/software-tools/) web portal. Among the main advantages of the new platform are (1) easier access (i.e., fast performance with a simple interface, no hosting fees and thus long-term sustainability, no registration necessary and thus no entry barrier), (2) lower maintenance burden (i.e., minimal technical maintenance effort needed, easy code editing), and (3) future-proofness (i.e., databases are just text files with easy format conversion, and hosting can be done on any server).
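The idea of storing each dataset record as a plain text file and generating a static page from those files can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the JSON file layout, the field names (`name`, `institution`, `link`), and the helper functions are all hypothetical, and a real deployment would presumably rely on an existing static website generator rather than hand-written HTML.

```python
import json
import tempfile
from html import escape
from pathlib import Path


def load_records(folder):
    """Read one dataset record per JSON file (hypothetical layout) and sort by name."""
    records = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.json"))
    ]
    return sorted(records, key=lambda r: r["name"].lower())


def render_index(records):
    """Render a minimal static HTML list of the records."""
    items = "\n".join(
        f'  <li><a href="{escape(r["link"], quote=True)}">{escape(r["name"])}</a> '
        f'({escape(r["institution"])})</li>'
        for r in records
    )
    return f"<ul>\n{items}\n</ul>\n"


# Demo: a temporary folder stands in for a checkout of the Git repository.
with tempfile.TemporaryDirectory() as repo:
    sample = {
        "name": "Example Video Set",
        "institution": "Example University",
        "link": "https://example.org/dataset",
    }
    (Path(repo) / "example-video-set.json").write_text(json.dumps(sample), encoding="utf-8")
    page = render_index(load_records(repo))

print(page)
```

Because the records are plain text under version control, adding or correcting a dataset becomes an ordinary file edit, which is the low maintenance burden the proposal aims for.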

On the other hand, the new platform will not support user registration and login, which helps to prevent data privacy issues. Tracking of registered users will no longer be available, but database usage tracking is planned to be provided via, for example, Google Analytics. There are three levels of dataset availability in the current platform: (1) Publicly available dataset, (2) Information about the dataset is available, but the data are not available or are available only upon request, and (3) Not publicly available (e.g., Qualinet members only; not supported in the new platform). The migration of Qualinet DBs to the new platform is to be completed by mid-2020. Current data are to be checked and sanitized, and obsolete records moved to the archive.

Conclusions

Broad audiovisual content with diverse characteristics, annotated with data from subjective experiments, is an enabling resource for research in multimedia signal processing, especially when QoE is considered. The availability of training and testing data becomes even more important nowadays, with the ever-increasing utilization of machine learning approaches. Qualinet Databases helps to facilitate reproducible research in the field and has become a valuable resource for the community.

References

  • [1] Le Callet, P., Möller, S., Perkis, A. Qualinet White Paper on Definitions of Quality of Experience, European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013. (http://www.qualinet.eu/images/stories/QoE_whitepaper_v1.2.pdf)
  • [2] Winkler, S. Analysis of public image and video databases for quality assessment, IEEE Journal of Selected Topics in Signal Processing, 6(6):616-625, 2012. (https://doi.org/10.1109/JSTSP.2012.2215007)
  • [3] Fliegel, K., Timmerer, C. (eds.) WG4 Databases White Paper v1.5: QUALINET Multimedia Database enabling QoE Evaluations and Benchmarking, Prague/Klagenfurt, Czech Republic/Austria, Version 1.5, March 2013.
  • [4] Fliegel, K., Battisti, F., Carli, M., Gelautz, M., Krasula, L., Le Callet, P., Zlokolica, V. 3D Visual Content Datasets. In: Assunção P., Gotchev A. (eds) 3D Visual Content Creation, Coding and Delivery. Signals and Communication Technology, Springer, Cham, 2019. (https://doi.org/10.1007/978-3-319-77842-6_11)

Note: Readers interested in actively contributing to the continued success of Qualinet Databases are referred to Qualinet (http://www.qualinet.eu/) and invited to join its Task Force on Qualinet Databases via email reflector. To subscribe, please send an email to (dbq.wg4.qualinet-subscribe@listes.epfl.ch). This work was partially supported by the project No. GA17-05840S “Multicriteria optimization of shift-variant imaging system models” of the Czech Science Foundation.

Report from MMSYS 2019 – by Alia Sheikh

Alia Sheikh (@alteralias) is researching immersive and interactive content. At present she is interested in the narrative language of immersive environments and how stories can best be choreographed within them.

Being part of an international academic research community and actually meeting said international research community are not exactly the same thing, it turns out. After attending the 2019 ACM MMSys conference, I have decided that leaving the office and actually meeting the people behind the research is very worth doing.

This year I was invited to give an overview presentation at ACM MMSys ’19, which was being hosted at the University of Massachusetts. The MMSys, NOSSDAV and MMVE (International Workshop on Immersive Mixed and Virtual Environment Systems) conferences happen back to back, in a different location each year. I was asked to talk about some of our team’s experiments in immersive storytelling at MMVE. This included our current work on lightfields and my work on directing attention in, and the cinematography of, immersive environments.

To be honest it wasn’t the most convenient time to decide to catch a plane to New York and then a train to Boston for a multi-day conference, but it felt like the right time to take a break from the office and find out what the rest of the community had been working on.

Fig. 1: A picturesque scene from the wonderful University of Massachusetts Amherst campus

I arrived at Amherst the day before the conference and (along with another delegate who had taken the same bus) wandered the tranquil university grounds slightly lost before being rescued by the ever calm and cheerful Michael Zink. Michael is the chair of the MMSys organising committee and someone who later spent much of the conference introducing people with shared interests to each other – he appeared to know every delegate by name.

Once installed in my UMass hotel room, I proceeded to spend the evening on my usual pre-conference ritual: entirely rewriting my presentation.

As the timetable would have it, I was going to be the first speaker.

Fig. 2: Attendees at MMSys 2019 taking their seats

Fig. 3: Alia in full flow during our talk on day 1

I don’t actually know why I do this to myself, but there is something about turning up to the event proper that gives you a sense of what will work for that particular audience, and Michael had given me a brilliantly concise snapshot of the type of delegate that MMSys attracts – highly motivated, expert on the nuts and bolts of how to get data to where it needs to be and likely to be interested in a big picture overview of how these systems can be used to create a meaningful human connection.

Using selected examples from our research, I put together a talk on how the experience of stories in high tech immersive environments differs from more traditional formats, but, once the language of immersive cinematography is properly understood, we find that we are able to create new narrative experiences that are both meaningful and emotionally rich.

The next morning I walked into an auditorium full of strangers filing in, gave my talk (I thought it went well?) and then sank happily into a plush red flip-seat chair safe in the knowledge that I was free to enjoy the rest of the event.

The next item was the keynote and easily one of the best talks I have ever experienced at a conference. Presented by Professor Nimesha Ranasinghe, it was a masterclass in taking an interesting problem (how do we transmit a full sensory experience over a network?) and presenting it in such a way as to neatly break down and explain the science (we can electrically stimulate the tongue to recreate a taste!) while never losing sight of the inherent joy in working on the kind of science you dream of as a child (therefore electrified cutlery!).

Fig. 4: Professor Nimesha Ranasinghe during his talk on Multisensory experiences

Fig. 5: Multisensory enhanced multimedia – experiences of the future?

Fig. 6: Networking and some delicious lunch

At lunch I discovered the benefit of having presented my talk early – I made a lot of friends with people who had specific questions about our work, and got a useful heads up on work they were presenting either in the afternoon’s long papers session or the poster session.

We all spent the evening at the welcome reception on the top floor of the UMass Hotel, where we ate a huge variety of tiny, delicious cakes and got to know each other better. It was obvious that in some cases, researchers who might collaborate remotely all year were able to use MMSys as an excellent opportunity to catch up. As a newcomer to this ACM conference, however, I have to say that I found it a very welcoming event, and I met a lot of very friendly people, many of them working on research that was entirely different to my own, but which seemed to offer an interesting insight or area of overlap.

I wasn’t surprised that I really enjoyed MMVE – virtual environments are very much my topic of interest right now. But I was delighted by how much of MMSys was entirely up my street. ACM MMSys provides a forum for researchers to present and share their latest research findings in multimedia systems, and the conference cuts across all media/data types to showcase the intersections and the interplay of approaches and solutions developed for different domains. This year, the work presented on how to best encode and transport mixed reality content, as well as predict head motion to better encode and deliver the part of a spherical panorama a viewer was likely to be looking at, was particularly interesting to me. I wondered whether comparing the predicted path of user attention to the desired path of user attention would teach us how to better control a user’s attention within a panoramic scene, or whether people’s viewing patterns were simply too variable. In the Open Datasets & Software track, I was fascinated by one particular dataset: “A Dataset of Eye Movements for the Children with Autism Spectrum Disorder”. This was a timely reminder for me that diversity within the audience needs to be catered for when designing multimedia systems, to avoid consigning sections of our audience to a substandard experience.

Of the demos, there were too many interesting ones to list, but I was hugely impressed by the demo for Multi-Sensor Capture and Network Processing for Virtual Reality Conferencing. This used cameras and Kinects to turn me into a point cloud and put a live 3D representation of my own physical body in a virtual space. It was a brilliantly simple and incredibly effective idea, and I found myself sitting next to the people responsible for it at a talk later that day and discussing ways to optimise their data compression.

Despite wearing a headset that allowed me to see the other participants, I was still able to see and therefore use my own hands in the real world – even extending to picking up and using my phone.

Fig. 7: Trying out some cool demos during a bustling demo session

Fig. 8: An example of the social media interaction from my “tweeting”

Amusingly, I found that I was (virtually) sat next to a point cloud of TNO researcher Omar Niamut, which led to my favourite Twitter exchange of the whole conference. I knew Omar from online, but we had never actually managed to meet in real life. Still, this was the most life-like digital incarnation yet!

I really should mention the Women’s and Diversity lunch event which (pleasingly) was attended by both men and women and offered some absolutely fascinating insights.

These included: the value of mentors over the course of a successful academic life, how a gender pay gap is inextricably related to work-family policies, and steps that have successfully been taken by some countries and organisations to improve work-life balance for all genders.

It was incredibly refreshing to see these topics being discussed both scientifically and openly. The conversations I had with people afterwards, as they opened up about their own experiences of work and parenthood, were among the most interesting I have ever had on the topic.

Another nice surprise – MMSys offers childcare grants available for conference attendees who are bringing small children to the conference and require on-site childcare or who incur extra expenses in leaving their children at home. It was very cheering to see that the Inclusion Policy did not stop at simply providing interesting talks, but also translated into specific inclusive action.

Fig. 9: Women’s and Diversity lunch! What a wonderful initiative – well done MMSys and SIGMM

I am delighted that I made the decision to attend MMSys. I had not realised that I was feeling somewhat detached from my peers and the academic research community in general, until I was put in an environment which contained a concentrated amount of interesting research, interesting researchers and an air of collaboration and sheer good will. It is easy to get tunnel vision when you are focused on your own little area of work, but every conversation I had at the conference reminded me that research does not happen in a vacuum.

Fig. 10: A fascinating talk at the Women’s and Diversity lunch – it initiated great post event discussions!

Fig. 11: The food truck experience – one of many wonderful social aspects to MMSys 2019

I could write a thousand more words about every interesting thing I saw or person I met at MMSys, but that would only give you my own specific experience of the conference. (I did live tweet* a lot of the talks and demos just for my own records and that can all be found here: https://twitter.com/Alteralias/status/1148546945859952640?s=20)

Fig. 12: Receiving the SIGMM Social Media Reporter Award for MMSys 2019!

Whether you were someone I was sitting next to at a paper session, a person I spoke to while standing in line at the food truck (one of the many sociable meal events) or someone who demoed their PhD work to me, thank you so much for sharing this event with me.

Maybe I will see you at MMSys 2020.

* P.S. It turns out that if you live-tweet an entire conference, Niall gives you a Social Media Reporter award.