An interview with Judith Redi

Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Dr. Judith Redi

Dr. Judith Redi

My path to multimedia was, let’s say, non-linear. I grew up in the Italian educational system, which up until university, is somewhat biased towards social sciences and humanities. My family was not one of engineers/scientists either, and never really encouraged me to look at the technical side of things. Basically, I was on a science-free educational diet until university. On the other hand, my hometown used to host the headquarters of Olivetti (may remember fancy typewriters and early personal computers?). This meant that at a very young age I had a PC at home and at school, and could use it (as a “user” on the other side of the systems we develop; I had no clue about programming).

When the time came to choose a major at university, I decided to turn the tables, a bit as a provocative action towards my previous education/mind-set, and a bit because I was fascinated by the perspective of being able to design and build future technologies. So, I picked computer engineering, perhaps inspired by my hometown technological legacy. I immediately got fascinated by artificial intelligence, and its potential to make machines more human-like (I still tell all my bachelor students that they should have a picture of Turing on their desk or above their bed). I specialized in machine learning and applied it to cryptanalysis within my master thesis. I won a scholarship to continue that research line in a PhD project at the University of Genoa. And then Philips came along, and multimedia with it.

At the time (2007), Philips was still manufacturing displays, and to stay ahead of the competition, they had to make sure their products would deliver to users the highest possible visual quality. They had algorithms to enhance image quality, but needed a system able to understand how much enhancement was needed, and of which type (sharpening? De-noising?), based on the analysis on the incoming video signal. They wanted to try a machine-learning approach to this issue, and referred to my group for collaboration. I picked up the project immediately: the goal was to model human vision (or at least the processes underlying visual quality perception), which implied not only developing new intelligent systems at the intersection between Signal Processing and Machine Learning, but also to learn more about the users of these systems, their perception and cognition. It was the fact that it would allow me to adopt a user-centred approach, closing the loop back to my social science-oriented education, that made multimedia so attractive to me. So, I left cyber-security, embraced Multimedia, and never left since.

One Philips internship, a best PhD thesis award and a Postdoc later, I am still fascinated by this duality. Much has changed in multimedia delivery, with the shift from linear TV to on-demand content consumption, video streaming accounting for 70% of the internet traffic nowadays, and the advent of Ultra High Definition solutions. User expectations in terms of Quality of Experience (QoE) increase by the day, and they are not only affected by the amount of disruptions (due to encoding, untrustworthy transmissions, rendering inaccuracies) in the delivered video, but also relate to content semantics and popularity, user affective state, environment and social context. The role of these factors on QoE is yet to be understood, let alone modelled. This is what I am working on at TU Delft, and is a long term plan, so I guess I won’t be leaving multimedia any time soon.

I’d say it’s too early for me to draw “foundational lessons” worth sharing from my journey. I guess there are a few things, though, that I figured out along the years, and that may be worthwhile mentioning:

  1. Seemingly reckless choices may be the best decisions you have ever made. Change is scary, but can pay off big time.

  2. Luck exists but hard work is a much safer bet

  3. Keep having fun doing your research. If you’re not having fun anymore, see point (1).

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

As a researcher, I have been devoting most of my efforts to understanding multimedia experiences and steer their optimization (or improvement) towards a higher user satisfaction (with the delivery system). On the longer term, I want broaden this scope, to make an even bigger impact on people’s life: I want to go beyond quality of experience and multimedia enjoyment, and target the optimization (or at least improvement) of users’ well-being.

For the past four years, I have been working with Philips Research on an Ambient Assisted Living system able to (1) sense the mood of a user in a room and (2) adapt the lighting in the room to alleviate negative moods (e.g., sadness, or anxiety), when sensed. We were able to show that the system can successfully counter negative moods in elderly users (see our recent PLoS One publication if you are interested), without the need of human intervention. The thing is, negative affective states are experienced by elderly (but by younger people too, according to recent findings) quite often, and most times, a fellow human (relative, friend, caretaker) is not available to comfort the person. My vision is to build systems that, based on the unobtrusive sensing of users’ affective states, can act upon the detection of negative states and relieve the user just as a human would do.

I want to design “empathic technology”, able to provide empathic care, whenever human care is not within reach. Challenges are multiple here. First, (long-term) affective states (such as mood, which is more constant and subtle than emotion) are to be sensed. (Wearable) sensors, cameras, or also interaction with mobile devices and social media can provide relevant information here. Empathic care can then be conveyed through ambient intelligence solutions, but also by creative industries products, ranging from gaming to intelligent clothing, to, of course, Multimedia technology (think about empathic recommender systems, or videotelephony systems that are optimized to maximize the affective charge of the communication). This type of work is highly multidisciplinary (involving multimedia systems, affective computing, embedded systems and sensors, HCI and certainly psychology), and the low-hanging fruits are not many. But I’d like this to be my contribution to make the world a better place, and I am ready to take up the challenge.

Can you profile your current research, its challenges, opportunities, and implications?

Internet-based video fruition has been reality for a while, yet it is constantly growing. Cisco’s forecasts see video delivery to account for 79% of the overall internet consumer traffic by 2018 (this is equivalent to one million minutes of video crossing IP networks every second). As the media fruition grows, so do user expectations in terms of Quality of Experinece (see the recent Conviva reports!). And, future multimedia will have to be optimized for multiple, more immersive (plenoptic, HDRi, ultra-high definition) devices, both fixed and mobile. Moore’s law and broadband speed alone won’t do the job. Resources and delivery mechanisms have to be optimized on a more application- and user-specific basis. To do so, it will be essential to be able to measure (unobtrusively) the extent to which the user deems the video experience to be of a high quality.

In this context, my work aims to (1) understand the perceptual, cognitive and affective processes underlying user appreciation for multimedia experiences, and (2) model these processes in order to automatically assess the delivered QoE, and, when applicable, enhance it. It is important here to bear in mind that multimedia quality of experience cannot be considered to depend solely on the presence (absence) of visual/auditory impairments introduced by technology limitations (e.g., packet loss errors or blocking artifacts from compression). Although that’s been the most common approach to QoE assessment and optimization, it is not sufficient anymore. The appearance of social media and internet-based delivery has challenged the way media are consumed: we don’t deal with passive observers anymore, but with users that select specific videos, to be delivered on specific devices, in any type of context. Elements such as semantics, user personality, preferences and intent, and socio- cultural context of fruition come into play, that have never been investigated (let alone modelled) for delivery optimization. My research focuses on integrating these elements in QoE estimation, to enable effective, personalized optimization.

The challenges are countless: user and context characteristics have to be quantified and modelled, to be then integrated with the video content analysis to deliver a final quality assessment, representing the experience as it would be perceived by that user, in that context, given that specific video. Before that, which user and context factors impact QoE is to be determined (to date, there is not even agreement on a taxonomy of these factors). Adaptive streaming protocols make it possible to implement user- and context- aware delivery strategies, the willingness of users to share personal data publicly can lead to more accurate user models, and especially crowdsourcing and crowdsensing can support the systematic study of the influence that context and user factors have on the overall QoE.

How would you describe the role of women especially in the field of multimedia?

Just like for their male colleagues (would you ask them to describe the role of men in multimedia?), the role of women in multimedia is:

  1. to push the boundaries of science, knowledge and practice in the field, doing amazing research that will make the world a better place
  2. to train new generations of brilliant engineers and scientists that will keep doing amazing research to make the world an even better place and
  3. serve the community as professionals and leaders to steer the future amazing research that will go on making the wold better and better.

I’d say the first two points are covered. The third, instead, may be implemented a bit better in practice, as there is a general lack of representativeness of women at a leadership level. The reasons for this are countless. They go from the lack of incoming talent (traditionally girls are not attracted to STEM subjects, perhaps for socio-cultural reasons), to the so-called leaking pipeline, which sees talented women leaving demanding yet rewarding careers too early, to an underlying presence of the impostor syndrome, that sometimes prevents women from putting their name forward for given roles. The solution is not necessarily in quotas (although I understand the reasoning behind the need for quotas, I think they are actually making women’s life more difficult – there is an underlying feeling that “women have it all easy these days” that makes work relationships more suspicious and ends up making women have to work three times as hard to show that they actually deserve what they accomplished), but rather in coaching and dedicated sponsorship of talent since the early stages.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

The methods that I developed for subjective image quality assessment have been adopted within Philips research and their evolution to video quality assessment is now under evaluation of the Video Quality Experts Group to be advised as an alternative methodology to the standard ACR and paired comparison. The research that I carried out on the suitability of crowdsourcing for subjective QoE testing and adaptation of traditional lab-based experimental designs to crowdtesting is now included in the Qualinet white paper on Best practices for crowdsourced QoE, and has helped in better understanding the potential of this tool for QoE research (and the risks involved in its use). This research is also currently feeding new ITU-T recommendations on the subject. The models that I developed for objective QoE estimation have been published in top journals and pose the basis for a more encompassing and personalized QoE optimization.

Over your distinguished career, what are your top lessons you want to share with the audience?

Again, I am not sure whether I am yet in the position of giving advice and/or sharing lessons, but here are a couple of things:

  1. Be patient and long-sighted. Going for research that pays off on the short term is very appealing, especially when you are challenged with job insecurity (been there, done that). But it is not a sustainable strategy, you can’t make the world a better place with your research if you don’t have a long term vision, where all the pieces fit together towards a final goal. And on the long term, it’s not fun either.

  2. Be generous. Science is supposed to move forward as a collaborative effort. That’s why we talk about a “scientific community”. Be generous in sharing your knowledge and work (open access, datasets, code). Be generous in providing feedback, to your peers (be constructive in your reviews!) and to students. Be generous in helping out fellow scientists and early stage researchers. True, it is horribly time consuming. But it is rewarding, and makes our community tighter and stronger.

For girls, watch Sheryl Sandberg’s TED talk, do participate to the Grace Hopper Celebration of Women in Computing, don’t be afraid to come to the ACMMM women’s lunches, they are a lot of fun. Actually, these are good tips for boys too.

For the rest just watch The last lecture of Randy Pausch because he said it all already and much better than I could ever do.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

Q: Why should one attend the ACMMM women’s lunch?

A: If you are a female junior member of the community, do attend because it will give you the opportunity to chat with senior women who have been around for a while, and can tell you all about how they got where they are (most precious advice, trust me). If you are a female senior member of the community, do attend because you could meet some young, talented researcher that needs some good tips from you, and you should not keep all your valuable advice for yourself :). If you are a male member of the community, you should attend because we really need to initiate some constructive dialogue on how to deal with the problem of low female representation in the community (because it is a problem, see next question). Being this a community problem (and not a problem of females only), we need all members of the community to discuss it.

Q: Why do we need more women in Multimedia?

A: Read this or this, or just check the Wikipedia page on women in STEM.

MPEG Column: Press release for the 114th MPEG meeting

Screen Content Coding Makes HEVC the Flexible Standard for Any Video Source

San Diego, USA − The 114th MPEG meeting was held in San Diego, CA, USA, from 22 – 26 February 2016

Powerful new HEVC tools improve compression of text, graphics, and animation

The 114th MPEG meeting marked the completion of the Screen Content Coding (SCC) extensions to HEVC – the High Efficiency Video Coding standard. This powerful set of tools augments the compression capabilities of HEVC to make it the flexible standard for virtually any type of video source content that is commonly encountered in our daily lives.

Screen content is video containing a significant proportion of rendered (moving or static) graphics, text, or animation rather than, or in addition to, camera-captured video scenes. The new SCC extensions of HEVC greatly improve the compression of such content. Example applications include wireless displays, news and other television content with text and graphics overlays, remote computer desktop access, and real-time screen sharing for video chat and video conferencing.

The technical development of the SCC extensions was performed by the MPEG and VCEG video coding joint team JCT-VC, following a joint Call for Proposals issued in February 2014.

CfP issued for technologies to orchestrate capture and consumption of media across multiple devices

At its 114th meeting, MPEG issued a Call for Proposals (CfP) for Media Orchestration. The CfP seeks submissions of technologies that will facilitate the orchestration of devices and media, both in time (advanced synchronization, e.g. across multiple devices) and space, where the media may come from multiple capturing devices and may be consumed by multiple rendering devices. An example application includes coordination of consumer CE devices to record a live event. The CfP for Media Orchestration can be found at

User Description framework helps recommendation engines deliver better choices

At the 114th meeting, MPEG has completed a standards framework (in ISO/IEC 21000-22) to facilitate the narrowing of big data searches to help recommendation engines deliver better, personalized, and relevant choices to users. Understanding the personal preferences of a user, and the context within which that user

Source: Status: Subject: Date:

is interacting with a given application, facilitates the ability of that application to better respond to individual user requests. Having that information provided in a standard and interoperable format enables application providers to more broadly scale their services to interoperate with other applications providers. Enter MPEG User Description (MPEG-UD). The aim of MPEG User Description is to ensure interoperability among recommendation services, which take into account the user and his/her context when generating recommendations for the user. With MPEG-UD, applications can utilize standard descriptors for users (user descriptor), the context in which the user is operating (context descriptor), recommendations (recommendation descriptor), and a description of a specific recommendation service that could be eventually consumed by the user (service descriptor).

Publish/Subscribe Application Format is finalized

The Publish/Subscribe Application Format (PSAF, ISO/IEC 23000-16) has reached the final milestone of FDIS at this MPEG meeting. The PSAF enables a communication paradigm where publishers do not communicate information directly to intended subscribers but instead rely on a service that mediates the relationship between senders and receivers. In this paradigm, Publishers create and store Resources and their descriptions, and send Publications; Subscribers send Subscriptions. Match Service Providers (MSP) receive and match Subscriptions with Publications and, when a Match has been found, send Notifications to users listed in Publications and Subscriptions. This paradigm is enabled by three other MPEG technologies which have also reached their final milestone: Contract Expression Language (CEL), Media Contract Ontology (MCO) and User Description (UD). A PSAF Notification is expressed as a set of UD Recommendations.

CEL is a language to express contract regarding a digital license, the complete business agreements between the parties. MCO is an ontology to represent contracts dealing with rights on multimedia assets and intellectual property protected content in general. A specific vocabulary is defined in a model extension to represent the most common rights and constraints in the audiovisual context. PSAF contracts between Publishers or Subscribers and MSPs are expressed in CEL or MCO.

Augmented Reality Application Format reaches FDIS status

At the 114th MPEG meeting, the 2nd edition of ARAF, MPEG’s Application Format for Augmented Reality (ISO/IEC 23000-13) has reached FDIS status and will be soon published as an International Standard. The MPEG ARAF enables augmentation of the real world with synthetic media objects by combining multiple existing MPEG standards within a single specific application format addressing certain industry needs. In particular, ARAF comprises three components referred to as scene, sensor/actuator, and media. The target applications include geolocation-based services, image-based object detection and tracking, audio recognition and synchronization, mixed and augmented reality games and real-virtual interactive scenarios.

Genome compression progresses toward standardization

At its 114th meeting, MPEG has progressed its exploration of genome compression toward formal standardization. The 114th meeting included a seminar to collect additional perspectives on genome data standardization, and a review of technologies that had been submitted in response to a Call for Evidence. The purpose of that CfE, which had been previously issued at the 113th meeting, was to assess whether new technologies could achieve better performance in terms of compression efficiency compared with currently used formats.

In all, 22 tools were evaluated. The results demonstrate that by integrating a multiple of these tools, it is possible to improve the compression of up to 27% with respect to the best state-of-the-art tool. With this evidence, MPEG has issued a Draft Call for Proposals (CfP) on Genomic Information Representation and Compression. The Draft CfP targets technologies for compressing raw and aligned genomic data and metadata for efficient storage and analysis.

As demonstrated by the results of the Call for Evidence, improved lossless compression of genomic data beyond the current state-of-the-art tools is achievable by combining and further developing them. The call also addresses lossy compression of the metadata which make up the dominant volume of the resulting compressed data. The Draft CfP seeks lossy compression technologies that can provide higher compression performance without affecting the accuracy of analysis application results. Responses to the Genomic Information Representation and Compression CfP will be evaluated prior to the 116th MPEG meeting in October 2016 (in Chengdu, China). An ad hoc group, co-chaired by Martin Golobiewski, convenor of Working Group 5 of ISO TC 276 (i.e. the ISO committee for Biotechnology) and Dr. Marco Mattavelli (of MPEG) will coordinate the receipt and pre-analysis of submissions received in response to the call. Detailed results to the CfE and the presentations shown during the seminar will soon be available as MPEG documents N16137 and N16147 at:

MPEG evaluates results to CfP for Compact Descriptors for Video Analysis

MPEG has received responses from three consortia to its Call for Proposals (CfP) on Compact Descriptors for Video Analysis (CDVA). This CfP addresses compact (i.e., compressed) video description technologies for search and retrieval applications, i.e. for content matching in video sequences. Visual content matching includes matching of views of large and small objects and scenes, that is robust to partial occlusions as well as changes in vantage point, camera parameters, and lighting conditions. The objects of interest include those that are planar or non-planar, rigid or partially rigid, and textured or partially textured. CDVA aims to enable efficient and interoperable design of video analysis applications in large databases, for example broadcasters’ archives or videos available on the Internet. It is envisioned that CDVA will provide a complementary set of tools to the suite of existing MPEG standards, such as the MPEG-7 Compact Descriptors for Visual Search (CDVS). Evaluation showed that sufficient technology was received such that a standardization effort is started. The final standard is expected to be ready in 2018.

Workshop on 5G/ Beyond UHD Media

A workshop on 5G/ Beyond UHD Media was held on February 24th, 2016 during the 114th MPEG meeting. The workshop was organized to acquire relevant information about the context in which MPEG technology related to video, virtual reality and the Internet of Things will be operating in the future, and to review the status of mobile technologies with the goal of guiding future codec standardization activity.

Dr. James Kempf of Ericsson reported on the challenges that Internet of Things devices face in a mobile environment. Dr. Ian Harvey of FOX discussed content creation for Virtual Reality applications. Dr. Kent Walker of Qualcomm promoted the value of unbundling technologies and creating relevant enablers. Dr. Jongmin Lee of SK Telecom explained challenges and opportunities in Next Generation Mobile Multimedia Services. Dr. Sudhir Dixit of Wireless World Research Forum reported on the next generation mobile 5G network and Its Challenges in Support of UHD Media. Emmanuel Thomas of TNO showed trends in 5G and future media consumption using media orchestration as an example. Dr. Charlie Zhang of Samsung Research America focused in his presentation on the 5G Key Technologies and Recent Advances.

Verification test complete for Scalable HEVC and MV-HEVC

MPEG has completed verification tests of SHVC, the scalable form of HEVC. These tests confirm the major savings that can be achieved by Scalable HEVC’s nested layers of data from which subsets can be extracted and used on their own to provide smaller coded streams. These smaller subsets can still be decoded with good video quality, as contrasted with the need to otherwise send separate “simulcast” coded video streams or add an intermediate “transcoding” process that would add substantial delay and complexity to the system.

The verification tests for SHVC showed that scalable HEVC coding can save an average of 40–60% in bit rate for the same quality as with simulcast coding, depending on the particular scalability scenario. SHVC includes capabilities for using a “base layer” with additional layers of enhancement data that improve the video picture resolution, the video picture fidelity, the range of representable colors, or the dynamic range of displayed brightness. Aside from a small amount of intermediate processing, each enhancement layer can be decoded by applying the same decoding process that is used for the original non-scalable version of HEVC. This compatibility that has been retained for the core of the decoding process will reduce the effort needed by industry to support the new scalable scheme.

Further verification tests were also conducted on MV-HEVC, where the Multiview Main Profile exploits the redundancy between different camera views using the same layering concept as scalable HEVC, with the same property of each view-specific layer being decodable by the ordinary HEVC decoding process. The results demonstrate that for the case of stereo (two views) video, a data rate reduction of 30% when compared to simulcast (independent HEVC coding of the views), and more than 50% when compared to the multi-view version of AVC (which is known as MVC), can be achieved for the same video quality.

Exploring new Capabilities in Video Compression Technology

Three years after finishing the first version of the HEVC standard, this MPEG meeting marked the first full meeting of a new partnership to identify advances in video compression technology. At its previous meeting, MPEG and ITU-T’s VCEG had agreed to join together to explore new technology possibilities for video coding that lie beyond the capabilities of the HEVC standard and its current extensions. The new partnership is known as the Joint Video Exploration Team (JVET), and the team is working to explore both incremental and fundamentally different video coding technology that shows promise to potentially become the next generation in video coding standardization. The JVET formation follows MPEG’s workshops and requirements-gathering efforts that have confirmed that video data demands are continuing to grow and are projected to remain a major source of stress on network traffic – even as additional improvements in broadband speeds arise in the years to come. The groundwork laid at the previous meeting for the JVET effort has already borne fruit. The team has developed a Joint Exploration Model (JEM) for simulation experiments in the area, and initial tests of the first JEM version have shown a potential compression improvement over HEVC by combining a variety of new techniques. Given sufficient further progress and evidence of practicality, it is highly likely that a new Call for Evidence or Call for Proposals will be issued in 2016 or 2017 toward converting this initial JVET exploration into a formal project for an improved video compression standard.

How to contact MPEG, learn more, and find other MPEG facts

To learn about MPEG basics, discover how to participate in the committee, or find out more about the array of technologies developed or currently under development by MPEG, visit MPEG’s home page at There you will find information publicly available from MPEG experts past and present including tutorials, white papers, vision documents, and requirements under consideration for new standards efforts. You can also find useful information in many public documents by using the search window].

Examples of tutorials that can be found on the MPEG homepage include tutorials for: High Efficiency Video Coding, Advanced Audio Coding, Universal Speech and Audio Coding, and DASH to name a few. A rich repository of white papers can also be found and continues to grow. You can find these papers and tutorials for many of MPEG’s standards freely available. Press releases from previous MPEG meetings are also available. Journalists that wish to receive MPEG Press Releases by email should contact Dr. Christian Timmerer at

Further Information

Future MPEG meetings are planned as follows:

No. 115, Geneva, CH, 30 – 03 May – June 2016
No. 116, Chengdu, CN, 17 – 21 October 2016
No. 117, Geneva, CH, 16 – 20 January, 2017
No. 118, Hobart, AU, 03 – 07 April, 2017



New ACM TOMM Policy

As a new policy of ACM TOMM, we are planning to publish three Special Issues per year, starting from 2017. We therefore invite highly qualified scientists to submit proposals for 2017 ACM TOMM Special Issues. Each Special Issue is the responsibility of the Guest Editors.

Proposals are accepted until May, 15th 2016. They should be prepared according to the instructions outlined below, and sent by e-mail to the Senior Associate Editor for Special Issue Management, Shervin Shirmohammadi ( and the Editor in Chief of ACM TOMM Alberto del Bimbo (

Please see for details.

SIGMM Technical Achievement Award — Call for nominations

SIGMM Technical Achievement Award

for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications


This award is presented every year to a researcher who has made significant and lasting contributions to multimedia computing, communication and applications. Outstanding technical contributions through research and practice are recognized. Towards this goal, contributions are considered from academia and industry that focus on major advances in multimedia including multimedia processing, multimedia content analysis, multimedia systems, multimedia network protocols and services, and multimedia applications and interfaces. The award recognizes members of the community for long-term technical accomplishments or those who have made a notable impact through a significant technical innovation. The selection committee focuses on candidates’ contributions as judged by innovative ideas, influence in the community, and/or the technical/social impact resulting from their work. The award includes a $2000 honorarium , an award certificate of recognition, and an invitation for the recipient to present a keynote talk at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). Travel expenses to the conference will be covered by SIGMM, and a public citation for the award will be placed on the SIGMM website.


The award honorarium, the award certificate of recognition and travel expenses to the ACM International Conference on Multimedia is fully sponsored by the SIGMM budget.


Nominations are solicited by*May 31, 2016*with a decision made by*July 30 2016*, in time to allow the above recognition and award presentation at ACM Multimedia 2016. Nominations for the award must include:

  • A statement summarizing the candidate’s accomplishments, description of the significance of the work and justification of the nomination (two pages maximum);
  • Curriculum Vitae of the nominee;
  • Three endorsement letters supporting the nomination including the significant contributions of the candidate. Each endorsement should be no longer than 500 words with clear specification of the nominee’s contributions and impact on the multimedia field;
  • A concise statement (one sentence) of the achievement(s) for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are: The nominee can be any member of the scientific community.

  • The nominator must be a SIGMM member.
  • No self-nomination is allowed.
  • Nominations that do not result in an award will be valid for two further years. After three years a revised nomination can be resubmitted.
  • The SIGMM elected officers as well as members of the Awards Selection Committee are not eligible.

Please submit your nomination to the award committee by email.



  • 2015: Tat-Seng Chua (for pioneering contributions to multimedia, text and social media processing).
  • 2014: Klara Nahrstedt (for pioneering contributions in Quality of Service for MM systems and networking and for visionary leadership of the MM community).
  • 2013: Dick Bulterman (for outstanding technical contributions in multimedia authoring through research, standardization, and entrepreneurship).
  • 2012: Hong-Jiang Zhang (for pioneering contributions to and leadership in media computing including content-based media analysis and retrieval, and their applications).
  • 2011: Shi-Fu Chang (for pioneering research and inspiring contributions in multimedia analysis and retrieval).
  • 2010: Ramesh Jain (for pioneering research and inspiring leadership that transformed multimedia information processing to enhance the quality of life and visionary leadership of the multimedia community).
  • 2009: Lawrence A. Rowe (for pioneering research in continuous media software systems and visionary leadership of the multimedia research community).
  • 2008: Ralf Steinmetz (for pioneering work in multimedia communications and the fundamentals of multimedia synchronization).

SIGMM Rising Star Award — Call for nominations

SIGMM Rising Star Award


Since 2014, ACM SIGMM presents a “Rising Star” Award annually, recognizing a young researcher – an individual either no older than 35 or within 7 years of PhD – who has made outstanding research contributions to the field of multimedia computing, communication and applications during this early part of his or her career. Depth, impact, and novelty of the researcher’s contributions will be key criteria upon which the Rising Star award committee will evaluate the nominees. Also of particular interest are strong research contributions made independently from the nominee’s PhD advisor. The award includes a $1000 honorarium, an award certificate of recognition, and an invitation for the recipient to present a keynote talk at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). Travel expenses to the conference will be covered by SIGMM, and a public citation for the award will be placed on the SIGMM website.


The award honorarium, the award certificate of recognition and travel expenses to the ACM International Conference on Multimedia is fully sponsored by the SIGMM budget.


Nominations are solicited byJune 15, 2016with decision made by July 30 2016, in time to allow the above recognition and award presentation at ACM Multimedia 2016. The nomination rules are:

  • A nominee must be either 35 years of age or younger as of December 31 of the year in which the award would be made, or at most 7 years have passed since his/her PhD degree as of December 31 of the year in which the award would be made.
  • The nominee can be any member of the scientific community.
  • The nominator must be a SIGMM member.
  • No self-nomination is allowed.
  • Nominations that do not result in an award will remain in consideration for up to two years if the candidate still meets the criteria with regard to age or PhD award (i.e. no older than 35 or within 7 years of PhD). Afterwards, a new nomination must be submitted.
  • The SIGMM elected officers as well as members of the Awards Selection Committee are not eligible.

Material to be included in the nomination:

  1. Curriculum Vitae, including publications, of nominee.
  2. A letter from the nominator (maximum two pages) documenting the nominee’s research accomplishments as well as justifying the nomination, the significance of the work, and the nominee’s role in the work.
  3. A maximum of 3 endorsement letters of recommendation from others which identify the rationale for the nomination and by what means the recommender knows of the nominee’s work.
  4. A concise statement (one sentence) of the achievement(s) for which the award is being given. This statement will appear on the award certificate and on the website.

Please submit your nomination to the award committee by email.

SIGMM Rising Star Award Committee (2016)

  • Klara Nahrstedt ( <>)
  • Dick Bulterman ( <>)
  • Tat-Seng Chua ( <>)
  • Susanne Boll ( <>)
  • Nicu Sebe ( <>)
  • Shih-Fu Chang ( <>)
  • Rainer Lienhart ( <>) (CHAIR)

SIGMM PhD Thesis Award — Call for nominations

SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications

Award Description

This award will be presented at most once per year to a researcher whose PhD thesis has the potential of very high impact in multimedia computing, communication and applications, or gives direct evidence of such impact. A selection committee will evaluate contributions towards advances in multimedia including multimedia processing, multimedia systems, multimedia network services, multimedia applications and interfaces. The award will recognize members of the SIGMM community and their research contributions in their PhD theses as well as the potential of impact of their PhD theses in multimedia area. The selection committee will focus on candidates’ contributions as judged by innovative ideas and potential impact resulting from their PhD work. The award includes a US$500 honorarium, an award certificate of recognition, and an invitation for the recipient to receive the award at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). A public citation for the award will be placed on the SIGMM website, in the SIGMM Records e-newsletter as well as in the ACM e-newsletter.


The award honorarium, the award plaque of recognition and travel expenses to the ACM International Conference on Multimedia will be fully sponsored by the SIGMM budget.

Nomination Applications

Nominations will be solicited by the 31^st May 2016with an award decision to be made by August 30. This timing will allow a recipient to prepare for an award presentation at ACM Multimedia in that Fall (October/November). The initial nomination for a PhD thesis must relate to a dissertation deposited at the nominee’s Academic Institution between January and December of the year previous to the nomination. As discussed below, /some dissertations may be held for up to three years by the selection committee for reconsideration/. If the original thesis is not in English, a full English translation must be provided with the submission. Nominations for the award must include:

  1. PhD thesis (upload at: )
  2. A statement summarizing the candidate’s PhD thesis contributions and potential impact, and justification of the nomination (two pages maximum);
  3. Curriculum Vitae of the nominee
  4. Three endorsement letters supporting the nomination including the significant PhD thesis contributions of the candidate. Each endorsement should be no longer than 500 words with clear specification of nominee PhD thesis contributions and potential impact on the multimedia field.
  5. A concise statement (one sentence) of the PhD thesis contribution for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are:

  1. The nominee can be any member of the scientific community.
  2. The nominator must be a SIGMM member.
  3. No self-nomination is allowed.

If a particular thesis is considered to be of exceptional merit but not selected for the award in a given year, the selection committee (at its sole discretion) may elect to retain the submission for consideration in at most two following years. The candidate will be invited to resubmit his/her work in these years. A thesis is considered to be outstanding if:

  1. Theoretical contributions are significant and application to multimedia is demonstrated.
  2. Applications to multimedia is outstanding, techniques are backed by solid theory with clear demonstration that algorithms can be applied in new domains –  e.g., algorithms must be demonstrably scalable in application in terms of robustness, convergence and complexity.

The submission process of nominations will be preceded by the call for nominations. The call of nominations will be widely publicized by the SIGMM awards committee and by the SIGMM Executive Board at the different SIGMM venues, such as during the SIGMM premier ACM Multimedia conference (at the SIGMM Business Meeting) on the SIGMM web site, via SIGMM mailing list, and via SIGMM e-newsletter between September and December of the previous year.

Submission Process

  • Register an account at <>  and upload one copy of the nominated PhD thesis. The nominee will receive a Paper ID after the submission.
  • The nominator must then collate other materials detailed in the previous section and upload them as supplementary materials, /*except*/**/the endorsement letters, which must be emailed separately as detailed below/.
  • Contact your referees and ask them to send all endorsement letters to <> with the title: “PhD Thesis Award Endorsement Letter for [YourName]”. The web administrator will acknowledge the receipt and the submission CMT website will reflect the status of uploaded documents and endorsement letters.

It is the responsibility of the nominator to follow the process and make sure documentation is complete. Thesis with incomplete documentation will be considered invalid.

Chair of Selection Committee

Prof. Roger Zimmermann ( from National University of Singapore, Singapore

The Menpo Project


logoThe Menpo Project [1] is a BSD-licensed set of tools and software designed to provide an end-to-end pipeline for collection and annotation of image and 3D mesh data. In particular, the Menpo Project provides tools for annotating images and meshes with a sparse set of fiducial markers that we refer to as landmarks. For example, Figure 1 shows an example of a face image that has been annotated with 68 2D landmarks. These landmarks are useful in a variety of areas in Computer Vision and Machine Learning including object detection, deformable modelling and tracking. The Menpo Project aims to enable researchers, practitioners and students to easily annotate new data sources and to investigate existing datasets. Of most interest to the Computer Vision is the fact that The Menpo Project contains completely open source implementations of a number of state-of-the-art algorithms for face detection and deformable model building.

Figure 1. A facial image annotated wih 68 sparse landmarks.

Figure 1. A facial image annotated wih 68 sparse landmarks.

In the Menpo Project, we are actively developing and contributing to the state-of-the-art in deformable modelling [2], [3], [4], [5]. Characteristic examples of widely used state-of-the-art deformable model algorithms are Active Appearance Models [6],[7], Constrained Local Models [8], [9] and Supervised Descent Method [10]. However, there is still a noteworthy lack of high quality open source software in this area. Most existing packages are encrypted, compiled, non-maintained, partly documented, badly structured or difficult to modify. This makes them unsuitable for adoption in cutting edge scientific research. Consequently, research becomes even more difficult since performing a fair comparison between existing methods is, in most cases, infeasible. For this reason, we believe the Menpo Project represents an important contribution towards open science in the area of deformable modelling. We also believe it is important for deformable modelling to move beyond the established area of facial annotations and to extend to a wide variety of deformable object classes. We hope Menpo can accelerate this progress by providing all of our tools completely free and permissively licensed.

Project Structure

The core functionality provided by the Menpo Project revolves around a powerful and flexible cross-platform framework written in Python. This framework has a number of subpackages, all of which rely on a core package called menpo. The specialised subpackages are all based on top of menpo and provide state-of-the-art Computer Vision algorithms in a variety of areas (menpofit, menpodetect, menpo3d, menpowidgets).

  • menpo – This is a general purpose package that is designed from the ground up to make importing, manipulating and visualising image and mesh data as simple as possible. In particular, we focus on data that has been annotated with a set of sparse landmarks. This form of data is common within the fields of Machine Learning and Computer Vision and is a prerequisite for constructing deformable models. All menpo core types are Landmarkable and visualising these landmarks is a primary concern of the menpo library. Since landmarks are first class citizens within menpo, it makes tasks like masking images, cropping images within the bounds of a set of landmarks, spatially transforming landmarks, extracting patches around landmarks and aligning images simple. The menpo package has been downloaded more than 3000 times and we believe it is useful to a broad range of computer scientists.
  • menpofit – This package provides all the necessary tools for training and fitting a large variety of state-of-the-art deformable models under a unified framework. The methods can be roughly split in three categories:

    1. Generative Models: This category includes implementations of all variants of the Lucas-Kanade alignment algorithm [6], [11], [2], Active Appearance Models [7], [12], [13], [2], [3] and other generative models [14], [4], [5].
    2. Discriminative Models: The models of this category are Constrained Local Models [8] and other closely related techniques [9].
    3. Regression-based Techniques: This category includes the commonly-used Supervised Descent Method [10] and other state-of-the-art techniques [15], [16], [17].

    The menpofit package has been downloaded more than 1000 times.

  • menpodetect – This package contains methodologies for performing generic object detection in terms of a bounding box. Herein, we do not attempt to implement novel techniques, but instead wrap existing projects so that they integrate natively with menpo. The current wrapped libraries are DLib, OpenCV, Pico and ffld2.

  • menpo3d – Provides useful tools for importing, visualising and transforming 3D data. menpo3d also provides a simple OpenGL rasteriser for generating depth maps from mesh data.

  • menpowidgets – Package that includes Jupyter widgets for ‘fancy’ visualization of menpo objects. It provides user friendly, aesthetically pleasing, interactive widgets for visualising images, pointclouds, landmarks, trained models and fitting results.

The Menpo Project is primarily written in Python. The use of Python was motivated by its free availability on all platforms, unlike its major competitor in Computer Vision, Matlab. We believe this is important for reproducible open science. Python provides a flexible environment for performing research, and recent innovations such as the Jupyter notebook have made it incredibly simple to provide documentation via examples. The vast majority of the execution time in Menpo is actually spent in highly efficient numerical libraries and bespoke C++ code, allowing us to achieve sufficient performance for real time facial point tracking whilst not compromising on the flexibility that the Menpo Project offers.

Note the Menpo Project has benefited enormously from the wealth of scientific software available with the Python ecosystem! The Menpo Project borrows from the best of the scientific software community wherever possible (e.g. scikit-learn, matplotlib, scikit-image, PIL, VLFeat, Conda) and the Menpo team have contributed patches back to many of these projects.

Getting Started

We, as the Menpo team, are firm believers in making installation as simple as possible. The Menpo Project is designed to provide a suite of tools to solve a complex problem and therefore has a complex set of 3rd party library dependencies. The default Python packing environment does not make this an easy task. Therefore, we evangelise the use of the Conda ecosystem. In our website, we provide detailed step-by-step instructions on how to install Conda and then Menpo on all platforms (Windows, OS X, Linux) (please see Once the conda environment has been set up, installing each of the various Menpo libraries can be done with a single command, as:

$ source activate menpo
(menpo) $ conda install -c menpo menpofit
(menpo) $ conda install -c menpo menpo3d
(menpo) $ conda install -c menpo menpodetect

As part of the project, we maintain a set of Jupyter notebooks that help illustrate how Menpo should be used. The notebooks for each of the core Menpo libraries are kept inside their own repositories on our Github page, i.e. menpo/menpo-notebooks, menpo/menpofit-notebooks and menpo/menpo3d-notebooks. If you wish to view the static output of the notebooks, feel free to browse them online following these links: menpo, menpofit and menpo3d. This gives a great way to passively read the notebooks without needing a full Python environment. Note that these copies of the notebook are tied to the latest development release of our packages and contain only static output and thus cannot be run directly – to execute them you need to download them, install Menpo, and open the notebook in Jupyter.

Usage Example

Let us present a simple example that illustrates how easy it is to manipulate data and train deformable models using Menpo. In this example, we use annotated data to train an Active Appearance Model (AAM) for faces. This procedure involves four steps:

  1. Loading annotated training images
  2. Training a model
  3. Selecting a fitting algorithm
  4. Fitting the model to a test image

Firstly, we will load a set of images along with their annotations and visualize them using a widget. In order to save memory, we will crop the images and convert them to greyscale. For an example set of images, feel free to download the images and annotatons provided by [18] from here. Assuming that all the image and PTS annotation files are located in /path/to/images, this can be easily done as:

import as mio
from menpowidgets import visualize_images

images = []
for i in mio.import_images('/path/to/images', verbose=True):
    i = i.crop_to_landmarks_proportion(0.1)
    if i.n_channels == 3:
        i = i.as_greyscale()

visualize_images(images) # widget for visualising the images and their landmarks

An example of the visualize_images widget is shown in Figure 2.

Figure 2. Visualising images inside Menpo is highly customizable (within a Jupyter notebook)

Figure 2. Visualising images inside Menpo is highly customizable (within a Jupyter notebook)

The second step involves training the Active Appearance Model (AAM) and visualising using an interactive widget. Note that we use Image Gradients Orientations [13], [11] features to help improve the performance of the generic AAM we are constructing. An example of the output of the widget is shown in Figure 3.

from menpofit.aam import HolisticAAM
from menpo.feature import igo

aam = HolisticAAM(images, holistic_features=igo, verbose=True)

print(aam) # print information regarding the model
aam.view_aam_widget() # visualize aam with an interactive widget

Figure 3. Many of the base Menpo classes provide visualisation widgets that allow simple data exploration of the created models. For example, this widget shows the joint texture and shape model of the previously created AAM.

Figure 3. Many of the base Menpo classes provide visualisation widgets that allow simple data exploration of the created models. For example, this widget shows the joint texture and shape model of the previously created AAM.

Next, we need to create a Fitter object for which we specify the Lucas-Kanade algorithm to be used, as well as the number of shape and appearance PCA components.

from menpofit.aam import LucasKanadeAAMFitter

fitter = LucasKanadeAAMFitter(aam, n_shape=[5, 15], n_appearance=0.6)

Assuming that we have a test_image and an initial bounding_box, the fitting can be executed and visualized with a simple command as:

from menpowidgets import visualize_fitting_result

fitting_result = fitter.fit_from_bb(test_image, bounding_box)
visualize_fitting_result(fitting_result) # interactive widget to inspect a fitting result

An example of the visualize_fitting_result widget is shown in Figure 4.

Now we are ready to fit the AAM to a set of test_images. The fitting process needs to be initialized with a bounding box, which we retrieve using the DLib face detector that is provided by menpodetect. Assuming that we have imported the test_images in the same way as shown in the first step, the fitting is as simple as:

from menpodetect import load_dlib_frontal_face_detector

detector = load_dlib_frontal_face_detector() # load face detector

fitting_resutls = []
for i, img in enumerate(test_images):
    # detect face's bounding box(es)
    bboxes = detector(img)

    # if at least one bbox is returned
    if bboxes:
        # groundtruth shape is ONLY useful for error calculation
        groundtruth_shape = img.landmarks['PTS'].lms
        # fit
        fitting_result = fitter.fit_from_bb(img, bounding_box=bboxes[0],

visualize_fitting_result(fitting_results) # visualize all fitting results

Figure 4. Once fitting is complete, Menpo provides a customizable widget that shows the progress of fitting a particular image.

Figure 4. Once fitting is complete, Menpo provides a customizable widget that shows the progress of fitting a particular image.

Web Based Landmarker

URL: is a web application for annotating 2D and 3D data, initially developed by the Menpo Team and then heavily modernised by Charles Lirsac. It has no dependencies beyond a modern web browser and is designed to be simple and intuitive to use. It has several exciting features such as Dropbox support, snap mode (Figure 6) and easy integration with the core types provided by the Menpo Project. Apart from the Dropbox mode, it also supports a server mode, in which the annotations and assets themselves are served to the client from a separate server component which is run by the user. This allows researches to benefit from the web-based nature of the tool without having to compromise privacy or security. The server utilises Menpo to import assets and save out annotations. An example screenshot is given in Figure 5.

The application is designed in such a way to allow for efficient manual annotation. The user can also annotate any object class and define their own template of landmark labels. Most importantly, the decentralisation of the landmarking software means that researchers can recruit annotators by simply directing them to the website. We strongly believe that this is a great advantage that can aid towards acquiring large databases of correctly annotated images for various object classes. In the near future, the tool will support a semi-assisted annotation procedure, for which Menpo will be used to provide initial estimations of the correct points for the images and meshes of interest.

Figure 5. The landmarker provides a number of methods of importing assets, including from Dropbox and a custom Menpo server.

Figure 5. The landmarker provides a number of methods of importing assets, including from Dropbox and a custom Menpo server.

Figure 6. The landmarker provides an intuitive snap mode that enables the user to efficiently edit a set of existing landmarks.

Figure 6. The landmarker provides an intuitive snap mode that enables the user to efficiently edit a set of existing landmarks.


Conclusion and Future Work

The research field of rigid and non-rigid object alignment lacks of high-quality open source software packages. Most researchers release code that is not easily re-usable, which further makes it difficult to compare existing techniques in a fair and unified way. Menpo aims to fill this gap and give solutions to these problems. We put a lot of effort on making Menpo a solid platform from which researchers of any level can benefit. Note that Menpo is a rapidly changing set of software packages that attempts to keep track of the recent advances in the field. In the future, we aim to add even more state-of-the-art techniques and increase our support for 3D deformable models [19]. Finally, we plan to develop a separate benchmark package that will standarize the way comparisons between various methods are performed.

Note that by the time this article was released, the versions of the Menpo packages were as follows:

Package Version
menpo 0.6.02
menpofit 0.3.02
menpo3d 0.2.0
menpodetect 0.3.02
menpowidgets 0.1.0 0.2.1

If you have any questions regarding Menpo, please let us know on the menpo-users mailing list.


[1] J. Alabort-i-Medina, E. Antonakos, J. Booth, P. Snape, and S. Zafeiriou, “Menpo: A comprehensive platform for parametric image alignment and visual deformable models,” in Proceedings Of The ACM International Conference On Multimedia, 2014, pp. 679–682.

[2] E. Antonakos, J. Alabort-i-Medina, G. Tzimiropoulos, and S. Zafeiriou, “Feature-based lucas-kanade and active appearance models,” Image Processing, IEEE Transactions on, 2015.

[3] J. Alabort-i-Medina and S. Zafeiriou, “Bayesian active appearance models,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 3438–3445.

[4] J. Alabort-i-Medina and S. Zafeiriou, “Unifying holistic and parts-based deformable model fitting,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 3679–3688.

[5] E. Antonakos, J. Alabort-i-Medina, and S. Zafeiriou, “Active pictorial structures,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 5435–5444.

[6] S. Baker and I. Matthews, “Lucas-kanade 20 years on: A unifying framework,” International Journal of Computer Vision, vol. 56, no. 3, pp. 221–255, 2004.

[7] I. Matthews and S. Baker, “Active appearance models revisited,” International Journal of Computer Vision, vol. 60, no. 2, pp. 135–164, 2004.

[8] J. M. Saragih, S. Lucey, and J. F. Cohn, “Deformable model fitting by regularized landmark mean-shift,” International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[9] A. Asthana, S. Zafeiriou, G. Tzimiropoulos, S. Cheng, and M. Pantic, “From pixels to response maps: Discriminative image filtering for face alignment in the wild,” 2015.

[10] X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” in Computer Vision And Pattern Recognition (CVPR), 2013 IEEE Conference On, 2013, pp. 532–539.

[11] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “Robust and efficient parametric face alignment,” in Computer Vision (ICCV), 2011 IEEE International Conference On, 2011, pp. 1847–1854.

[12] G. Papandreou and P. Maragos, “Adaptive and constrained algorithms for inverse compositional active appearance model fitting,” in Computer Vision And Pattern Recognition (CVPR), 2008 IEEE Conference On, 2008, pp. 1–8.

[13] G. Tzimiropoulos, J. Alabort-i-Medina, S. Zafeiriou, and M. Pantic, “Active orientation models for face alignment in-the-wild,” Information Forensics and Security, IEEE Transactions on, vol. 9, no. 12, pp. 2024–2034, 2014.

[14] G. Tzimiropoulos and M. Pantic, “Gauss-newton deformable part models for face alignment in-the-wild,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1851–1858.

[15] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, “Incremental face alignment in the wild,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1859–1866.

[16] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1867–1874.

[17] G. Tzimiropoulos, “Project-out cascaded regression with an application to face alignment,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 3659–3667.

[18] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “300 faces in-the-wild challenge: The first facial landmark localization challenge,” in Computer Vision Workshops (ICCVW), 2013 IEEE International Conference On, 2013, pp. 397–403.

[19] V. Blanz and T. Vetter, “A morphable model for the synthesis of 3D faces,” in Proceedings Of The 26th Annual Conference On Computer Graphics And Interactive Techniques, 1999, pp. 187–194.

  1. Alphabetical author order signifies equal contribution

  2. Currently unreleased – the next released versions of menpo, menpofit and menpodetect will reflect these version numbers. All samples were written using the current development versions.

MediaEval 2016 Multimedia Benchmark: Call for Feedback and Participation

Each year, the Benchmarking Initiative for Multimedia Evaluation (MediaEval) offers challenges to the multimedia research community in the form of shared tasks. MediaEval tasks place their focus on the human and social aspects of multimedia. We are interested in how multimedia content can be used to produce knowledge and to create algorithms that support people in their daily lives. Many tasks are related to how people understand multimedia content, how they react to it, and how they use it. We emphasize the “multi” in multimedia: speech, audio, music, visual content, tags, users, and context. MediaEval attracts researchers with backgrounds in diverse areas, including multimedia content analysis, information retrieval, speech technology, computer vision, music information retrieval, social computing, recommender systems.


WSICC has established itself as a truly interactive workshop at EuroITV’13, TVX’14, and TVX’15 with three successful editions. The fourth edition of the WSICC workshop aims to bring together researchers and practitioners working on novel approaches for interactive multimedia content consumption. New technologies, devices, media formats, and consumption paradigms are emerging that allow for new types of interactivity. Examples include multi-panoramic video and object-based audio, increasingly available in live scenarios with content feeds from a multitude of sources. All these recent advances have an impact on different aspects related to interactive content consumption, which the workshop categorizes into Enabling Technologies, Content, User Experience, and User Interaction.

Report from the MMM Special Session Perspectives on Multimedia Analytics

This report summarizes the presentations and discussions of the special session entitled “Perspectives on Multimedia Analytics” at MMM 2016, which was held in Miami, Florida on January 6, 2016. The special session consisted of four brief paper presentations, followed by a panel discussion with questions from the audience. The session was organized by Björn Þór Jónsson and Cathal Gurrin, and chaired and moderated by Klaus Schoeffmann. The goal of this report is to record the conclusions of the special session, in the hope that it may serve members of our community who are interested in Multimedia Analytics.


Alan Smeaton opens the discussion.  From the left: Klaus Schoeffmann (moderator), Alan Smeaton, Björn Þór Jónsson, Guillaume Gravier and Graham Healy.

Alan Smeaton opens the discussion. From the left: Klaus Shoefmann (moderator), Alan Smeaton, Björn Þór Jónsson, Guillaume Gravier and Graham Healy.

Firstly, Alan Smeaton presented an analysis of time-series-based recognition of semantic concepts [1]. He argued that while concept recognition in visual multimedia is typically based on simple concepts, there is a need to recognise semantic concepts which have a temporal as- pect corresponding to activities or complex events. Furthermore, he argued that while various results are reported in the literature, there are research questions which remain unanswered, such as: “What concept detection accuracies are satisfactory for higher-level recognition?” and “Can recognition methods perform equally well across various concept detection performances?” Results suggested that, although improving concept detection accuracies can en- hance the recognition of time series based concepts, concept detection does not need to be very accurate in order to characterize the dynamic evolution of time series if appropriate methods are used. In other words, even if semantic concept detectors still have low accuracy, it makes a lot of sense to apply them to temporally adjacent shots/frames in video in order to detect semantic events from them.

Secondly, Björn Þór Jónsson presented ten research questions for scalable multimedia analytics [2]. He argued that the scale and complexity of multimedia collections is ever increasing, as is the desire to harvest useful insight from the collections. To optimally support the complex quest for insight, multimedia analytics has emerged as a new research area that combines concepts and techniques from multimedia analysis and visual analytics into a single framework. Björn argued further, however, that state-of-the-art database management solutions are not yet designed for multimedia analytics workloads, and that research is therefore required into scalable multimedia analytics, built on the three underlying pillars of visual analytics, multimedia analysis and database management. Björn then proposed ten specific research questions to address in this area.

Third, Guillaume Gravier presented a study of the needs and expectations of media professionals for multimedia analytics solutions [3]. The goal of the work was to help clarifying what multimedia analytics encompasses by studying users expectations. They focused on a very specific family of applications for search and navigation of broadcast and social news content. Using extensive conversations with media professionals, using mock-up interfaces and human-centered design methodology, they analyze the perceived usefulness of a number of functionalities leveraging existing or upcoming technology. Based on the results, Guillaume proposed a defintion of research directions for (multi)media analytics.

Graham Healy gives the final presentation of the session. Sitting, from the left: Klaus Shoefmann (moderator), Alan Smeaton, Björn Þór Jónsson, and Guillaume Gravier.

Graham Healy gives the final presentation of the session. Sitting, from the left: Klaus Schoeffmann (moderator), Alan Smeaton, Björn Þór Jónsson, and Guillaume Gravier.

Finally, Graham Healy presented an analysis of human annotation quality using neural signals such as electroencephalography (EEG) [4]. They explored how neurophysiological signals correlate to attention and perception, in order to better understand the image-annotation task. Results indicated potential issues with “how well” a person manually annotates images and variability across annotators. They propose that such issues may arise in part as a result of subjectively interpretable instructions that may fail to elicit similar labelling behaviours and decision thresholds across participants. In particular, they found instances where an individual’s annotations differed from a group consensus, even though their EEG signals indicated that they were likely in consensus. Finally, Graham discussed the potential implications of the work for annotation tasks and crowd-sourcing in the future.


Firstly, a question was asked about a definition for multi- media analytics, and its relationship to multimedia analysis. Björn proposed the following definition of the main goal of scalable multimedia analytics: “. . . to produce the processes, techniques and tools to allow many diverse users to efficiently and effectively analyze large and dynamic multimedia collections over a long period of time to gain insight and knowledge” [2]. Guillaume, on the other hand, proposed that multimedia analytics could be defined as: “. . . the process of organizing multimedia data collections and providing tools to extract knowledge, gain insight and help make decisions by interacting with the data” [3]. Finally, Alan also added that in contrast with multimedia analysis, the multimedia analytics user is involved in every stage of the whole process: from media production/capturing, over data inspection, filtering, and structuring, up to the final consumption, visualization, and usage of the media data.

Clearly, all three definitions are largely in agreement, as they focus on the insight and knowledge gained through interaction with data. In addition, the first definition includes scalability aspects, such as the number of analysts and the duration of analysis, The speakers agreed that multimedia analysis was mostly concerned with the automatic analysis of media content, while media interaction is definitely an important aspect to consider in multimedia analytics. Björn even proposed that users might be more satisfied with a system that takes a few iterations of user interaction to reach a conclusion, than with a system that takes a somewhat shorter time to reach the same conclusions without any interaction. Guillaume stressed that their work had demonstrated the importance of working with the professional users to get their requirements early. When asked, Graham agreed that using neural sen- sors could potentially become a weapon in the analyst’s arsenal, helping the analyst to understand what the brain finds interesting.

A question was asked about potential application areas for multimedia analytics. There was general agreement that many and diverse areas could benefit from multimedia analytics techniques. Alan listed a number of application areas, such as: on-line education, lifelogging, surveillance and forensics, medicine and biomedicine, and so on; in fact he struggled to find an area that could not be affected. There was also agreement that many multimedia analytics application areas would need to involve very large quantities of data. As an example, the recent YFCC 100M collection has nearly 100 million images and around 800 thousand videos; yet compared to web-scale collections it is still very small.

A further thread of discussion centered on where to focus research efforts. The works described by Björn and Guillaume already propose some long-term research questions and directions. Based on his experience, Alan proposed that work on improving the quality of a particular concept detector from 95% to 96%, for example, would not have any significant impact, while work on improving the higher-level detection to use more (and more varied) information would be much more productive. Alan was asked in continuation whether researchers working on concept detection should rather focus on more general concepts with higher recall but often low precision (e.g., beach, car, food, etc.) or more specific concepts with low recall but typically higher precision (e.g., Nascar racing tyre, Shushi, United Airlines plane etc.). He answered that none should be particularly preferred but we need to continue work for both types of concepts.

Finally, some questions were posed to the participants about details of their respective works; however these will not be reported here.


Overall, the conclusion of the discussion is that multimedia analytics should be a very fruitful research area in the future, with diverse application in many areas and for many users. While the finer-grained conclusions of the discussion that we have described above were perhaps not revolutionary, we nevertheless felt it would be a service to the community to write them down in this short report.

The panel format of the special session made the discussion much more lively and interactive than that of a traditional technical session. We would like to thank the presenters and their co-authors for their excellent contributions. The session chairs would also particularly like to thank the moderator, Klaus Schoeffmann, for his contribution to the session, as a good panel moderator is very important for the success of the session.


[1] Peng Wang, Lifeng Sun, Shiqiang Yang, and Alan Smeaton. What are the limits to time series based recognition of semantic concepts? In Proc. MMM, Miami, FL, USA, 2016.
[2] Björn Þór Jónsson, Marcel Worring, Jan Zahálka, Stevan Rudinac, and Laurent Amsaleg. Ten research questions for scalable multimedia analytics. In Proc. MMM, Miami, FL, USA, 2016.
[3] Guillaume Gravier, Martin Ragot, Laurent Amsaleg, Rémi Bois, Grégoire Jadi, Éric Jamet, Laura Monceaux, and Pascale Sébillot. Shaping-up multimedia analytics: Needs and expectations of media professionals. In Proc. MMM, Miami, FL, USA, 2016.
[4] Graham Healy, Cathal Gurrin, and Alan Smeaton. Informed perspectives on human annotation using neural signals. In Proc. MMM, Miami, FL, USA, 2016.