MPEG Column: 116th MPEG Meeting

Chengdu, China – The 116th MPEG meeting was held in Chengdu, China, from 17 – 21 October 2016.

MPEG Workshop on 5-Year Roadmap Successfully Held in Chengdu

At its 116th meeting, MPEG successfully organised a workshop on its 5-year standardisation roadmap. Various industry representatives presented their views and reflected on the need for standards for new services and applications, specifically in the area of immersive media. The results of the workshop (roadmap, presentations) and the planned phases for the standardisation of “immersive media” are available at http://mpeg.chiariglione.org/. A follow-up workshop will be held on 18 January 2017 in Geneva, co-located with the 117th MPEG meeting. The workshop is open to all interested parties and free of charge. Details on the program and registration will be available at http://mpeg.chiariglione.org/.

Summary of the “Survey on Virtual Reality”

At its 115th meeting, MPEG established an ad-hoc group on virtual reality which conducted a survey on virtual reality with relevant stakeholders in this domain. The feedback from this survey was provided as input to the 116th MPEG meeting, where the results were evaluated. Based on these results, MPEG aligned its standardisation timeline with the expected deployment timelines for 360-degree video and virtual reality services. An initial specification for 360-degree video and virtual reality services will be ready by the end of 2017 and is referred to as the Omnidirectional Media Application Format (OMAF; MPEG-A Part 20, ISO/IEC 23000-20). A standard addressing audio and video coding for 6 degrees of freedom, where users can freely move around, is on MPEG’s 5-year roadmap. The summary of the survey on virtual reality is available at http://mpeg.chiariglione.org/.

MPEG and ISO/TC 276/WG 5 have collected and evaluated the answers to the Genomic Information Compression and Storage joint Call for Proposals

At its 115th meeting, MPEG issued a Call for Proposals (CfP) for Genomic Information Compression and Storage in conjunction with the working group for standardisation of data processing and integration of the ISO Technical Committee for biotechnology standards (ISO/TC 276/WG 5). The call sought submissions of technologies that can provide efficient compression of genomic data and metadata for storage and processing applications. During the 116th MPEG meeting, the responses to this CfP, comprising twelve distinct technologies, were collected and evaluated by a joint ad-hoc group of both working groups. An initial assessment of the performance of the best eleven solutions in the different categories reported compression factors ranging from 8 to 58, depending on the class of data.

The twelve submitted technologies show consistent improvements over the results assessed in response to the Call for Evidence in February 2016. Further improvements of the technologies under consideration are expected from the first phase of core experiments defined at the 116th MPEG meeting. The open core-experiment process planned for the next 12 months will comprise multiple, independent, directly comparable and rigorous experiments, performed by independent entities, to determine the specific merit of each technology and how the technologies can be integrated into a single solution for standardisation. The core-experiment process will consider the submitted technologies as well as new solutions within the scope of each specific core experiment. The final inclusion of submitted technologies in the standard will be based on the experimental comparison of performance, as well as on the validation of requirements and the inclusion of essential metadata describing the context of the sequence data, and will be reached by consensus within and across both committees.

Call for Proposals: Internet of Media Things and Wearables (IoMT&W)

At its 116th meeting, MPEG issued a Call for Proposals (CfP) for Internet of Media Things and Wearables (see http://mpeg.chiariglione.org/), motivated by the understanding that more than half of major new business processes and systems will incorporate some element of the Internet of Things (IoT) by 2020. The CfP therefore seeks submissions of protocols and data representations enabling the dynamic discovery of media things and media wearables. A standard in this space will facilitate the large-scale deployment of complex media systems that can exchange data in an interoperable way between media things and media wearables.

MPEG-DASH Amendment with Media Presentation Description Chaining and Pre-Selection of Adaptation Sets

At the 116th MPEG meeting, a new amendment for MPEG-DASH reached the final stage of Final Draft Amendment (ISO/IEC 23009-1:2014 FDAM 4). This amendment includes several technologies useful for industry practices of adaptive media presentation delivery. For example, the media presentation description (MPD) can be daisy-chained to simplify the implementation of pre-roll ads in cases of targeted dynamic advertising for live linear services. Additionally, the amendment enables support for pre-selection in order to signal suitable combinations of audio elements that are offered in different adaptation sets. As several amendments and corrigenda have been produced, this amendment will be published as part of the 3rd edition of ISO/IEC 23009-1, together with the amendments and corrigenda approved after the 2nd edition.
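To make the MPD chaining mechanism more tangible, here is a minimal Python sketch that attaches a chaining descriptor to a pre-roll ad MPD. Treat it as a sketch only: the use of a SupplementalProperty descriptor and the scheme URI below reflect one reading of the amendment, and the URLs are invented.

```python
# Sketch only: signal the "next" MPD (e.g., the main presentation after
# a pre-roll ad) via a chaining descriptor. The scheme URI below is an
# assumption based on the amendment, not copied from the standard.
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"
ET.register_namespace("", MPD_NS)

def add_mpd_chain(mpd: ET.Element, next_mpd_url: str) -> None:
    """Append a descriptor telling the client which MPD to play next."""
    ET.SubElement(mpd, f"{{{MPD_NS}}}SupplementalProperty", {
        "schemeIdUri": "urn:mpeg:dash:mpd-chaining:2016",  # assumed URI
        "value": next_mpd_url,
    })

ad_mpd = ET.Element(f"{{{MPD_NS}}}MPD", {"type": "static"})
add_mpd_chain(ad_mpd, "https://example.com/main/manifest.mpd")
print(ET.tostring(ad_mpd, encoding="unicode"))
```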

How to contact MPEG, learn more, and find other MPEG facts

To learn about MPEG basics, discover how to participate in the committee, or find out more about the array of technologies developed or currently under development by MPEG, visit MPEG’s home page at http://mpeg.chiariglione.org. There you will find information publicly available from MPEG experts past and present including tutorials, white papers, vision documents, and requirements under consideration for new standards efforts. You can also find useful information in many public documents by using the search window.

Examples of tutorials that can be found on the MPEG homepage include tutorials for: High Efficiency Video Coding, Advanced Audio Coding, Universal Speech and Audio Coding, and DASH to name a few. A rich repository of white papers can also be found and continues to grow. You can find these papers and tutorials for many of MPEG’s standards freely available. Press releases from previous MPEG meetings are also available. Journalists who wish to receive MPEG Press Releases by email should contact Dr. Christian Timmerer at christian.timmerer@itec.uni-klu.ac.at or christian.timmerer@bitmovin.com.

Further Information

Future MPEG meetings are planned as follows:
No. 117, Geneva, CH, 16 – 20 January, 2017
No. 118, Hobart, AU, 03 – 07 April, 2017
No. 119, Torino, IT, 17 – 21 July, 2017
No. 120, Macau, CN, 23 – 27 October 2017

For further information about MPEG, please contact:
Dr. Leonardo Chiariglione (Convenor of MPEG, Italy)
Via Borgionera, 103
10040 Villar Dora (TO), Italy
Tel: +39 011 935 04 61
leonardo@chiariglione.org

or

Priv.-Doz. Dr. Christian Timmerer
Alpen-Adria-Universität Klagenfurt | Bitmovin Inc.
9020 Klagenfurt am Wörthersee, Austria, Europe
Tel: +43 463 2700 3621
Email: christian.timmerer@itec.aau.at | christian.timmerer@bitmovin.com

Call for Task Proposals: Multimedia Evaluation 2017

MediaEval 2017 Multimedia Evaluation Benchmark

Call for Task Proposals

Proposal Deadline: 3 December 2016

MediaEval is a benchmarking initiative dedicated to developing and evaluating new algorithms and technologies for multimedia retrieval, access and exploration. It offers tasks to the research community that are related to human and social aspects of multimedia. MediaEval emphasizes the ‘multi’ in multimedia and seeks tasks involving multiple modalities, e.g., audio, visual, textual, and/or contextual.

MediaEval is now calling for proposals for tasks to run in the 2017 benchmarking season. The proposal consists of a description of the motivation for the task and challenges that task participants must address. It provides information on the data and evaluation methodology to be used. The proposal also includes a statement of how the task is related to MediaEval (i.e., its human or social component), and how it extends the state of the art in an area related to multimedia indexing, search or other technologies that support users in accessing multimedia collections.

For more detailed information about the content of the task proposal, please see:
http://www.multimediaeval.org/files/mediaeval2017_taskproposals.html

Task proposal deadline: 3 December 2016

Task proposals are chosen on the basis of their feasibility, their match with the topical focus of MediaEval, and also according to the outcome of a survey circulated to the wider multimedia research community.

The MediaEval 2017 Workshop will be held 13-15 September 2017 in Dublin, Ireland, co-located with CLEF 2017 (http://clef2017.clef-initiative.eu).

For more information about MediaEval see http://multimediaeval.org or contact Martha Larson at m.a.larson@tudelft.nl.

SIGMM Award for Outstanding Ph.D. Thesis in Multimedia Computing, Communications and Applications 2016

The ACM Special Interest Group on Multimedia (SIGMM) is pleased to present the 2016 SIGMM Outstanding Ph.D. Thesis Award to Dr. Christoph Kofler. The award committee considers Dr. Kofler’s dissertation, entitled “User Intent in Online Video Search”, worthy of the recognition, as the thesis is the first to consider a user’s intent in multimedia search, yielding significantly improved results in satisfying the information need of the user. The work has high originality and is expected to have significant impact, especially in boosting the search performance for multimedia data.

Dr. Kofler’s thesis systematically explores a user’s video search intent that is behind a user’s information need in three steps: (1) analyzing a real-world transaction log produced by a large video search engine to understand why searches fail, (2) understanding the possible intents of users behind video search and uploads, and (3) designing an intent-aware video search result optimization approach that re-ranks initial video search results so as to yield the highest potential to satisfy the users’ search intent.
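As a rough illustration of step (3), intent-aware re-ranking can be thought of as blending each result’s original relevance score with a score for how well it matches the inferred intent. The following toy Python sketch is purely illustrative; the scores, weights, and function names are invented and not taken from the thesis.

```python
# Toy intent-aware re-ranking: blend relevance with an intent-match
# score. All numbers and names are hypothetical.
def rerank(results, intent_scores, alpha=0.7):
    """results: list of (video_id, relevance); intent_scores: id -> 0..1."""
    return sorted(
        results,
        key=lambda r: alpha * r[1] + (1 - alpha) * intent_scores.get(r[0], 0.0),
        reverse=True,
    )

hits = [("v1", 0.9), ("v2", 0.8), ("v3", 0.6)]
intent = {"v2": 0.9, "v3": 0.8}   # e.g., inferred "learn how-to" intent
print(rerank(hits, intent))       # v2 now ranks first
```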

The effectiveness of the framework developed in the thesis has been convincingly demonstrated by a thorough range of experiments. The subject of the thesis is highly topical, and the framework makes groundbreaking contributions to our understanding and knowledge in the areas of users’ information seeking, user intent, user satisfaction, and multimedia search engine usability. The publications related to the thesis clearly demonstrate the impact of this work across several research disciplines including multimedia, the web, and information retrieval. Overall, the committee recognizes that the thesis has significant impact and makes considerable contributions to the multimedia community.

Bio of Awardee:

Dr. Christoph Kofler is a software engineer and data scientist at Bloomberg L.P., NY, USA. He holds a Ph.D. degree from Delft University of Technology, The Netherlands, and M.Sc. and B.Sc. degrees from Klagenfurt University, Austria – all in Computer Science. His research interests include the broad fields of multimedia and text-based information retrieval, with a focus on search intent inference and its applications for search result optimization throughout the entire search engine pipeline (indexing, ranking, query formulation). In addition to “what” a user is looking for using search, Dr. Kofler is particularly interested in the “why” component behind the search and in the related opportunities for improving the efficiency and effectiveness of information retrieval systems. Dr. Kofler has co-authored more than 20 scientific publications, predominantly at venues such as ACM Multimedia, IEEE Transactions on Multimedia, and ACM Computing Surveys. He has been a task co-organizer of the MediaEval Benchmark initiative. He received the Grand Challenge Best Presentation Award at ACM Multimedia and a Best Paper nomination at the European Conference on Information Retrieval. Dr. Kofler is a recipient of the Google Doctoral Fellowship in Information Retrieval (Video Search). He has held positions at Microsoft Research, Beijing, China; Columbia University, NY, USA; and Google, NY, USA.

The award committee is pleased to present an honorable mention to Dr. Varun Singh for the thesis entitled “Protocols and Algorithms for Adaptive Multimedia Systems”. The thesis develops and presents congestion control algorithms and signaling protocols that are used in interactive multimedia communications. The committee is impressed by the thorough theoretical and experimental depth of the thesis. Also remarkable are Dr. Singh’s efforts to shepherd his work to real-world adoption, which have led him to author four RFCs and several standards-track documents in the IETF and have resulted in the incorporation of his work in the production versions of the Chrome and Firefox web browsers. His work has thus already achieved impact in the multimedia community.

Bio of Awardee:

Dr. Varun Singh received his Master’s degree in Electrical Engineering from Helsinki University of Technology, Finland, in 2009, and his Ph.D. degree from Aalto University, Finland, in 2015. His research has led him to make important contributions to different standardization organizations: 3GPP (2008 – 2010), IETF (since 2010), and W3C (since 2014). He is the co-author of the WebRTC Statistics API. Beyond this, his research work led him to found and become CEO of callstats.io, a startup which analyses and optimizes the quality of multimedia in real-time communication (currently, WebRTC).

ACM TOMM Special Issues and Special Sections

The ACM TOMM journal has launched a new two-year program of SPECIAL ISSUES and SPECIAL SECTIONS on strategic and emerging topics in Multimedia research. Each Special Issue will also include an extended survey paper on the subject of the issue, prepared by the Guest Editors. This survey will help to highlight trends and research paths and will position the contributed papers appropriately.

In May, we received 11 proposals and selected 4 proposals for Special Issues and 2 proposals for Special Sections, based on the timeliness and relevance of the topics and the qualification of the proponents:

SPECIAL ISSUES (8 papers each)

  • “Deep Learning for Mobile Multimedia”
    for publication in April ’17. Submission deadline: Oct 15, 2016
  • “Delay-Sensitive Video Computing in the Cloud”
    for publication in July ’17. Submission deadline: Nov 30, 2016
  • “Representation, Analysis and Recognition of 3D Human”
    for publication in Nov ’17. Submission deadline: Jan 15, 2017
  • “QoE Management for Multimedia Services”
    for publication in April ’18. Submission deadline: May 15, 2017

SPECIAL SECTIONS (4 papers each)

  • “Multimedia Computing and Applications of Socio-Affective Behaviors in the Wild”
    for publication in May ’17. Submission deadline: Oct 31, 2016
  • “Multimedia Understanding via Multimodal Analytics”
    for publication in May ’17. Submission deadline: Oct 31, 2016

You can visit the ACM TOMM home page at http://tomm.acm.org, news section, for more detailed information. We look forward to your valuable contributions to this initiative.

ACM SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications

The 2016 winner of the prestigious ACM Special Interest Group on Multimedia (SIGMM) award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Alberto del Bimbo. The award is given in recognition of his outstanding, pioneering and continued research contributions in the areas of multimedia processing, multimedia content analysis, and multimedia applications, his leadership in multimedia education, and his outstanding and continued service to the community.

Prof. del Bimbo was among the very few who pioneered research in image and video content-based retrieval in the late 1980s. Since that time, for over 25 years, he has been among the most visionary and influential researchers in this field, in Europe and worldwide. His research has influenced several generations of researchers who are now active in some of the most important research centers worldwide. Over the years, he has made significant innovative research contributions.

In the early times of the discipline he explored all the modalities for retrieval by visual similarity of images and video. In his early paper Visual Image Retrieval by Elastic Matching of User Sketches, published in IEEE Trans. on Pattern Analysis and Machine Intelligence in 1997, he presented one of the first and top-performing methods for image retrieval by shape similarity from users’ sketches. He also published, in IEEE Trans. on Pattern Analysis and Machine Intelligence and IEEE Trans. on Multimedia, his original research on representations of spatial relationships between image regions based on spatial logic. This ground-breaking research was accompanied by the definition of efficient index structures to permit retrieval from large datasets. He was one of the first to address this large-dataset aspect, which has now become very important for the research community.

Since the early 2000s, with the advancement of 3D imaging technologies and the availability of a new generation of acquisition devices capable of capturing the geometry of 3D objects in three-dimensional physical space, Prof. del Bimbo and his team initiated research in 3D content-based retrieval, which has now become increasingly popular in mainstream research. Again, he was among the very first researchers to initiate this line of work. In particular, he focused on 3D face recognition, extending the weighted walkthrough representation of spatial relationships between image regions to model the 3D relationships between facial stripes. His solution, 3D Face Recognition Using Iso-geodesic Stripes, scored the best performance at the SHREC Shape Retrieval Contest in 2008 and was published in IEEE Trans. on Pattern Analysis and Machine Intelligence in 2010. At CVPR’15 he presented a novel idea for representing 3D textured mesh manifolds using Local Binary Patterns, which is highly effective for 3D face retrieval. This was the first attempt to combine 3D geometry and photometric texture into a single unified representation. In 2016 he co-authored a forward-looking survey on content-based image retrieval in the context of social image platforms, which appeared in ACM Computing Surveys. It includes an extensive treatise of image tag assignment, refinement and tag-based retrieval, and explores the differences between traditional image retrieval and retrieval with socially generated images.

One very important aspect of his contribution to the community is Professor del Bimbo’s educational impact during his career. He was the author of the monograph Visual Information Retrieval, published by Morgan Kaufmann in 1999, which became one of the most cited and influential books from the early years of image and video content-based retrieval. Many young researchers have used this book as the main reference in their studies, and their careers have been shaped by the ideas discussed in it. Being the first and sole book on that subject in the early times of the discipline, it played a key role in developing content-based retrieval from a research niche into a largely populated field of research and in making it central to Multimedia research.

Professor del Bimbo has an extraordinary and long-lasting track record of service to the scientific community over the last 20 years. As General Chair he organized two of the most successful conferences in Multimedia, namely IEEE ICMCS’99, the Int’l Conf. on Multimedia Computing and Systems (now renamed IEEE ICME), and ACM MULTIMEDIA’10. The quality and success of these conferences were highly influential in attracting new young researchers to the field and forming the present research community. Since 2016, he has been the Editor-in-Chief of ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM).

Announcement of ACM SIGMM Rising Star Award 2016

The ACM Special Interest Group on Multimedia (SIGMM) is pleased to present this year’s Rising Star Award in multimedia computing, communications and applications to Dr. Bart Thomee for his significant contributions in the areas of geo-multimedia computing, media evaluation, and open research datasets. The ACM SIGMM Rising Star Award recognizes a young researcher who has made outstanding research contributions to the field of multimedia computing, communication and applications during the early part of his or her career.

Dr. Bart Thomee received his Ph.D. from Leiden University in 2010. In his thesis, he focused on multimedia search and exploration, specifically targeting artificial imagination and duplicate detection. On the topic of artificial imagination, he aimed to more rapidly understand the user’s search intent by generating imagery that resembles the ideal image the user is looking for. Using the synthesized images as queries, instead of existing images from the database, boosted the relevance of the image results by up to 23%. On the topic of duplicate detection, he designed descriptors to compactly represent web-scale image collections and to accurately detect transformed versions of the same image. This work led to an Outstanding Paper Citation at the ACM Conference on Multimedia Information Retrieval 2008.

In 2011, he joined Yahoo Labs, where his interests grew into geographic computing in Multimedia. He began characterizing spatiotemporal regions from labeled (e.g. tagged) georeferenced media, for which he devised a technique based on scale-space theory that could process billions of georeferenced labels in a matter of hours. This work was published at WWW 2013 and became a reference example at Yahoo for how to disambiguate multi-language and multi-meaning labels from media with noisy annotations.

He also started to use an overlooked piece of information that is found in most camera phone images: compass information. He developed a technique to accurately pinpoint the locations and surface areas of landmarks, solely based on the positions and orientations of the photos taken of them, which may have been captured hundreds of yards to miles away.

Dr. Thomee’s recent work on the YFCC100M dataset has had an important impact on the multimedia and SIGMM research community. This dataset is realistic in size and structure, able to fuel and change the landscape of research in Multimedia. What started as an initiative to release a geo-referenced Flickr dataset quickly grew once Dr. Thomee saw the broader impact, and he worked rapidly to scale up its size. He had to push the limits of openness without violating licensing terms, copyright, or privacy. He worked closely with many lawyers to overturn the default, restrictive terms of use, making the dataset also available to non-academics all over the world. He coordinated and led the efforts to share the data and the effort horizontally with ICSI, LLNL, and Amazon Open Data. The dataset was highlighted in the February 2016 issue of the Communications of the ACM (CACM), has been requested over 1200 times in just a few months, and has been cited many times since launch. Dr. Thomee has continued by releasing expansion packs to the YFCC100M. This dataset is expected to significantly impact Multimedia research in the years to come.

Dr. Thomee has also been an exemplary community member of the Multimedia community. For example, he organized the ImageCLEF photo annotation task (2012-2013) and MediaEval placing task (2013-2016) as well as designed the ACM Grand Challenge on Event Summarization (2015) and on Tag & Caption Prediction (2016).

In summary, Dr. Bart Thomee receives the 2016 ACM SIGMM Rising Star Award for his significant contributions in the areas of geo-multimedia computing, media evaluation, and open datasets for research.

MPEG Column: 115th MPEG Meeting

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The 115th MPEG meeting was held in Geneva, Switzerland, and its press release highlights the following aspects:

  • MPEG issues Genomic Information Compression and Storage joint Call for Proposals in conjunction with ISO/TC 276/WG 5
  • Plug-in free decoding of 3D objects within Web browsers
  • MPEG-H 3D Audio AMD 3 reaches FDAM status
  • Common Media Application Format for Dynamic Adaptive Streaming Applications
  • 4th edition of AVC/HEVC file format

In this blog post, however, I will cover topics specifically relevant for adaptive media streaming, namely:

  • Recent developments in MPEG-DASH
  • Common media application format (CMAF)
  • MPEG-VR (virtual reality)
  • The MPEG roadmap/vision for the future.

MPEG-DASH Server and Network assisted DASH (SAND): ISO/IEC 23009-5

Part 5 of MPEG-DASH, referred to as SAND – server and network-assisted DASH – has reached FDIS stage. This work item started some time ago, at a public MPEG workshop during the 105th MPEG meeting in Vienna. The goal of this part of MPEG-DASH is to enhance the delivery of DASH content by introducing messages between DASH clients and network elements, or between various network elements, for the purpose of improving the efficiency of streaming sessions by providing information about real-time operational characteristics of networks, servers, proxies, caches, and CDNs, as well as DASH clients’ performance and status. In particular, it defines the following:

  1. The SAND architecture which identifies the SAND network elements and the nature of SAND messages exchanged among them.
  2. The semantics of SAND messages exchanged between the network elements present in the SAND architecture.
  3. An encoding scheme for the SAND messages.
  4. The minimum requirements to implement a SAND message delivery protocol.

The way this information is to be utilized is deliberately not defined within the standard and is left open for (industry) competition (or for other standards developing organizations). In any case, there’s plenty of room for research activities around the topic of SAND, specifically:

  • A main issue is the evaluation of MPEG-DASH SAND in terms of qualitative and quantitative improvements with respect to QoS/QoE. Some papers are already available and have been published at ACM MMSys 2016.
  • Another topic of interest includes an analysis regarding scalability and possible overhead; in other words, I’m wondering whether it’s worth using SAND to improve DASH.
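To give a feel for the message exchange SAND enables, here is a small Python sketch of a DASH client posting a status message to a DASH-aware network element (DANE). The XML element names, the endpoint, and the media type are illustrative assumptions rather than the normative SAND syntax.

```python
# Hypothetical sketch: a DASH client reports its measured throughput to
# a DANE via HTTP POST. Element names and the media type are assumed.
import urllib.request
import xml.etree.ElementTree as ET

def send_sand_status(dane_url: str, avg_throughput_bps: int) -> int:
    envelope = ET.Element("SANDMessage")             # assumed envelope
    metrics = ET.SubElement(envelope, "Metrics")
    ET.SubElement(metrics, "AverageThroughput").text = str(avg_throughput_bps)
    body = ET.tostring(envelope, encoding="utf-8")
    req = urllib.request.Request(
        dane_url,
        data=body,
        headers={"Content-Type": "application/sand+xml"},  # assumed type
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# e.g., send_sand_status("https://dane.example.com/sand", 4_200_000)
```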

MPEG-DASH with Server Push and WebSockets: ISO/IEC 23009-6

Part 6 of MPEG-DASH has reached DIS stage and deals with server push and WebSockets, i.e., it specifies the carriage of MPEG-DASH media presentations over full-duplex HTTP-compatible protocols, particularly HTTP/2 and WebSocket. The specification comes with a set of generic definitions for which bindings are defined, allowing its usage in various formats. Currently, the specification supports HTTP/2 and WebSocket.

For the former, it is required to define the push policy as an HTTP header extension, whereas the latter requires the definition of a DASH subprotocol. Luckily, these are the preferred extension mechanisms for both HTTP/2 and WebSocket and, thus, interoperability is provided. Whether or not the industry will adopt these extensions cannot be answered right now, but I would recommend keeping an eye on this, and there are certainly multiple research topics worth exploring in the future.

An interesting aspect for the research community would be to quantify the utility of using push methods within dynamic adaptive environments in terms of QoE and start-up delay. Some papers provide preliminary answers but a comprehensive evaluation is missing.
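As a thought experiment on what a WebSocket-based DASH session might look like, the following Python sketch uses the third-party websockets package. The subprotocol name and the request framing are hypothetical placeholders; Part 6 defines its own binding, which is not reproduced here.

```python
# Sketch only: fetch DASH segments over a WebSocket connection.
# "dash-ws" and the "GET <url>" framing are hypothetical.
import asyncio
import websockets  # pip install websockets

async def fetch_segments(ws_url: str, mpd_url: str, n_segments: int):
    segments = []
    # Negotiate the (assumed) DASH subprotocol during the handshake.
    async with websockets.connect(ws_url, subprotocols=["dash-ws"]) as ws:
        await ws.send(f"GET {mpd_url}")       # announce the presentation
        for _ in range(n_segments):
            segments.append(await ws.recv())  # server pushes segments
    return segments

# asyncio.run(fetch_segments("wss://cdn.example.com/dash", "bbb.mpd", 5))
```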

To conclude the recent MPEG-DASH developments, the DASH-IF recently established the Excellence in DASH Award at ACM MMSys’16 and the winners are presented here (including some of the recent developments described in this blog post).

Common Media Application Format (CMAF): ISO/IEC 23000-19

The goal of CMAF is to enable application consortia to reference a single MPEG specification (i.e., a “common media format”) that would allow a single media encoding to be used across many applications and devices. Therefore, CMAF defines the encoding and packaging of segmented media objects for delivery and decoding on end-user devices in adaptive multimedia presentations. This sounds very familiar and reminds us a bit of what the DASH-IF is doing with their interoperability points. One of the goals of CMAF is to integrate HLS in MPEG-DASH, which is backed up by this WWDC video where Apple announces the support of fragmented MP4 in HLS. The streaming of this announcement is only available in Safari and through the WWDC app, but Bitmovin has shown that it also works on Mac and on iOS 10 and above, and for PC users in all recent browser versions including Edge, Firefox, Chrome, and (of course) Safari.

MPEG Virtual Reality

Virtual reality is becoming a hot topic across the industry (and also academia), which has also reached standards developing organizations like MPEG. Therefore, MPEG established an ad-hoc group (with an email reflector) to develop a roadmap required for MPEG-VR. Others have also started working on this, like DVB, DASH-IF, and QUALINET (and maybe many others: W3C, 3GPP). In any case, it shows that there’s a massive interest in this topic, and Bitmovin has already shown what can be done in this area within today’s Web environments. Obviously, adaptive streaming is an important aspect for VR applications, including many research questions to be addressed in the (near) future. A first step towards a concrete solution is the Omnidirectional Media Application Format (OMAF), which is currently at working draft stage (details to be provided in a future blog post).

The research aspects cover a wide range of activities including – but not limited to – content capturing, content representation, streaming/network optimization, consumption, and QoE.

MPEG roadmap/vision

At its 115th meeting, MPEG published a document that lays out its medium-term strategic standardization roadmap. The goal of this document is to collect feedback from anyone in professional and B2B industries dealing with media, specifically but not limited to broadcasting, content and service provision, media equipment manufacturing, and the telecommunication industry. The roadmap is depicted below and further described in the document available here. Please note that “360 AV” in the figure below also refers to VR, but unfortunately this is not (yet) reflected in the figure. However, it points out the aspects to be addressed by MPEG in the future which are relevant for both industry and academia.

[Figure: MPEG standardisation roadmap]

The next MPEG meeting will be held in Chengdu, October 17-21, 2016.

An interview with Judith Redi

Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Dr. Judith Redi

My path to multimedia was, let’s say, non-linear. I grew up in the Italian educational system, which, up until university, is somewhat biased towards social sciences and humanities. My family was not one of engineers/scientists either, and never really encouraged me to look at the technical side of things. Basically, I was on a science-free educational diet until university. On the other hand, my hometown used to host the headquarters of Olivetti (you may remember their fancy typewriters and early personal computers). This meant that at a very young age I had a PC at home and at school, and could use it (as a “user” on the other side of the systems we develop; I had no clue about programming).

When the time came to choose a major at university, I decided to turn the tables, a bit as a provocative action towards my previous education/mind-set, and a bit because I was fascinated by the prospect of being able to design and build future technologies. So, I picked computer engineering, perhaps inspired by my hometown’s technological legacy. I immediately became fascinated by artificial intelligence and its potential to make machines more human-like (I still tell all my bachelor students that they should have a picture of Turing on their desk or above their bed). I specialized in machine learning and applied it to cryptanalysis in my master thesis. I won a scholarship to continue that research line in a PhD project at the University of Genoa. And then Philips came along, and multimedia with it.

At the time (2007), Philips was still manufacturing displays, and to stay ahead of the competition, they had to make sure their products would deliver to users the highest possible visual quality. They had algorithms to enhance image quality, but needed a system able to understand how much enhancement was needed, and of which type (sharpening? de-noising?), based on the analysis of the incoming video signal. They wanted to try a machine-learning approach to this issue, and turned to my group for collaboration. I picked up the project immediately: the goal was to model human vision (or at least the processes underlying visual quality perception), which implied not only developing new intelligent systems at the intersection of Signal Processing and Machine Learning, but also learning more about the users of these systems, their perception and cognition. It was the fact that it would allow me to adopt a user-centred approach, closing the loop back to my social science-oriented education, that made multimedia so attractive to me. So, I left cyber-security, embraced Multimedia, and have never left since.

One Philips internship, a best PhD thesis award and a Postdoc later, I am still fascinated by this duality. Much has changed in multimedia delivery, with the shift from linear TV to on-demand content consumption, video streaming accounting for 70% of internet traffic nowadays, and the advent of Ultra High Definition solutions. User expectations in terms of Quality of Experience (QoE) increase by the day, and they are not only affected by the amount of disruptions (due to encoding, untrustworthy transmissions, rendering inaccuracies) in the delivered video, but also relate to content semantics and popularity, user affective state, environment and social context. The role of these factors in QoE is yet to be understood, let alone modelled. This is what I am working on at TU Delft, and it is a long-term plan, so I guess I won’t be leaving multimedia any time soon.

I’d say it’s too early for me to draw “foundational lessons” worth sharing from my journey. I guess there are a few things, though, that I figured out along the years, and that may be worthwhile mentioning:

  1. Seemingly reckless choices may be the best decisions you have ever made. Change is scary, but can pay off big time.

  2. Luck exists, but hard work is a much safer bet.

  3. Keep having fun doing your research. If you’re not having fun anymore, see point (1).

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

As a researcher, I have been devoting most of my efforts to understanding multimedia experiences and steering their optimization (or improvement) towards higher user satisfaction (with the delivery system). In the longer term, I want to broaden this scope, to make an even bigger impact on people’s lives: I want to go beyond quality of experience and multimedia enjoyment, and target the optimization (or at least improvement) of users’ well-being.

For the past four years, I have been working with Philips Research on an Ambient Assisted Living system able to (1) sense the mood of a user in a room and (2) adapt the lighting in the room to alleviate negative moods (e.g., sadness, or anxiety), when sensed. We were able to show that the system can successfully counter negative moods in elderly users (see our recent PLoS One publication if you are interested), without the need for human intervention. The thing is, negative affective states are experienced by the elderly (but by younger people too, according to recent findings) quite often, and most times a fellow human (relative, friend, caretaker) is not available to comfort the person. My vision is to build systems that, based on the unobtrusive sensing of users’ affective states, can act upon the detection of negative states and relieve the user just as a human would do.

I want to design “empathic technology”, able to provide empathic care, whenever human care is not within reach. Challenges are multiple here. First, (long-term) affective states (such as mood, which is more constant and subtle than emotion) are to be sensed. (Wearable) sensors, cameras, or also interaction with mobile devices and social media can provide relevant information here. Empathic care can then be conveyed through ambient intelligence solutions, but also by creative industries products, ranging from gaming to intelligent clothing, to, of course, Multimedia technology (think about empathic recommender systems, or videotelephony systems that are optimized to maximize the affective charge of the communication). This type of work is highly multidisciplinary (involving multimedia systems, affective computing, embedded systems and sensors, HCI and certainly psychology), and the low-hanging fruits are not many. But I’d like this to be my contribution to make the world a better place, and I am ready to take up the challenge.

Can you profile your current research, its challenges, opportunities, and implications?

Internet-based video fruition has been a reality for a while, yet it is constantly growing. Cisco’s forecasts see video delivery accounting for 79% of the overall internet consumer traffic by 2018 (this is equivalent to one million minutes of video crossing IP networks every second). As media fruition grows, so do user expectations in terms of Quality of Experience (see the recent Conviva reports!). And future multimedia will have to be optimized for multiple, more immersive (plenoptic, HDRi, ultra-high definition) devices, both fixed and mobile. Moore’s law and broadband speed alone won’t do the job. Resources and delivery mechanisms have to be optimized on a more application- and user-specific basis. To do so, it will be essential to be able to measure (unobtrusively) the extent to which the user deems the video experience to be of a high quality.

In this context, my work aims to (1) understand the perceptual, cognitive and affective processes underlying user appreciation for multimedia experiences, and (2) model these processes in order to automatically assess the delivered QoE and, when applicable, enhance it. It is important to bear in mind that multimedia quality of experience cannot be considered to depend solely on the presence (or absence) of visual/auditory impairments introduced by technology limitations (e.g., packet-loss errors or blocking artifacts from compression). Although that has been the most common approach to QoE assessment and optimization, it is not sufficient anymore. The appearance of social media and internet-based delivery has challenged the way media are consumed: we don’t deal with passive observers anymore, but with users that select specific videos, to be delivered on specific devices, in any type of context. Elements such as semantics, user personality, preferences and intent, and the socio-cultural context of fruition come into play that have never been investigated (let alone modelled) for delivery optimization. My research focuses on integrating these elements in QoE estimation, to enable effective, personalized optimization.

The challenges are countless: user and context characteristics have to be quantified and modelled, and then integrated with the video content analysis to deliver a final quality assessment, representing the experience as it would be perceived by that user, in that context, given that specific video. Before that, it has to be determined which user and context factors impact QoE (to date, there is not even agreement on a taxonomy of these factors). Adaptive streaming protocols make it possible to implement user- and context-aware delivery strategies, the willingness of users to share personal data publicly can lead to more accurate user models, and especially crowdsourcing and crowdsensing can support the systematic study of the influence that context and user factors have on the overall QoE.
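To make the idea tangible, here is a toy Python model that combines a technical factor, a user factor, and a context factor into a single QoE estimate on a 1–5 mean-opinion-score scale. All features and weights are invented for illustration; a real model would be trained on subjective ratings.

```python
# Toy QoE predictor mixing signal, user, and context factors.
# Weights are made up; real systems learn them from subjective data.
from dataclasses import dataclass

@dataclass
class ViewingSample:
    bitrate_mbps: float      # technical/signal factor
    stall_events: int        # delivery impairment
    content_interest: float  # user factor in [0, 1]
    on_mobile: bool          # context factor

def predict_qoe(s: ViewingSample) -> float:
    score = 2.0
    score += 0.4 * min(s.bitrate_mbps, 6.0)  # quality gain saturates
    score -= 0.7 * s.stall_events            # stalls hurt disproportionately
    score += 1.0 * s.content_interest        # semantics/interest matter
    score -= 0.3 if s.on_mobile else 0.0     # small-screen context
    return max(1.0, min(5.0, score))         # clamp to MOS range

print(predict_qoe(ViewingSample(4.0, 1, 0.8, True)))  # ≈ 3.4
```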

How would you describe the role of women especially in the field of multimedia?

Just like for their male colleagues (would you ask them to describe the role of men in multimedia?), the role of women in multimedia is:

  1. to push the boundaries of science, knowledge and practice in the field, doing amazing research that will make the world a better place,
  2. to train new generations of brilliant engineers and scientists that will keep doing amazing research to make the world an even better place, and
  3. to serve the community as professionals and leaders to steer the future amazing research that will go on making the world better and better.

I’d say the first two points are covered. The third, instead, could be implemented a bit better in practice, as there is a general lack of representation of women at the leadership level. The reasons for this are countless. They go from the lack of incoming talent (traditionally girls are not attracted to STEM subjects, perhaps for socio-cultural reasons), to the so-called leaking pipeline, which sees talented women leaving demanding yet rewarding careers too early, to an underlying presence of the impostor syndrome, which sometimes prevents women from putting their name forward for given roles. The solution is not necessarily in quotas (although I understand the reasoning behind the need for quotas, I think they actually make women’s lives more difficult – there is an underlying feeling that “women have it all easy these days” that makes work relationships more suspicious and ends up making women work three times as hard to show that they actually deserve what they have accomplished), but rather in coaching and dedicated sponsorship of talent from the early stages.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

The methods that I developed for subjective image quality assessment have been adopted within Philips Research, and their evolution to video quality assessment is now under evaluation by the Video Quality Experts Group to be recommended as an alternative methodology to the standard ACR and paired-comparison methods. The research that I carried out on the suitability of crowdsourcing for subjective QoE testing, and on the adaptation of traditional lab-based experimental designs to crowdtesting, is now included in the Qualinet white paper on best practices for crowdsourced QoE, and has helped in better understanding the potential of this tool for QoE research (and the risks involved in its use). This research is also currently feeding new ITU-T recommendations on the subject. The models that I developed for objective QoE estimation have been published in top journals and lay the basis for a more encompassing and personalized QoE optimization.

Over your distinguished career, what are your top lessons you want to share with the audience?

Again, I am not sure whether I am yet in the position of giving advice and/or sharing lessons, but here are a couple of things:

  1. Be patient and far-sighted. Going for research that pays off in the short term is very appealing, especially when you are challenged with job insecurity (been there, done that). But it is not a sustainable strategy: you can’t make the world a better place with your research if you don’t have a long-term vision, where all the pieces fit together towards a final goal. And in the long term, it’s not fun either.

  2. Be generous. Science is supposed to move forward as a collaborative effort. That’s why we talk about a “scientific community”. Be generous in sharing your knowledge and work (open access, datasets, code). Be generous in providing feedback, to your peers (be constructive in your reviews!) and to students. Be generous in helping out fellow scientists and early stage researchers. True, it is horribly time consuming. But it is rewarding, and makes our community tighter and stronger.

For girls: watch Sheryl Sandberg’s TED talk, do participate in the Grace Hopper Celebration of Women in Computing, and don’t be afraid to come to the ACMMM women’s lunches – they are a lot of fun. Actually, these are good tips for boys too.

For the rest, just watch The Last Lecture by Randy Pausch, because he said it all already, and much better than I could ever do.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

Q: Why should one attend the ACMMM women’s lunch?

A: If you are a female junior member of the community, do attend because it will give you the opportunity to chat with senior women who have been around for a while and can tell you all about how they got where they are (most precious advice, trust me). If you are a female senior member of the community, do attend because you could meet some young, talented researcher who needs some good tips from you, and you should not keep all your valuable advice for yourself :). If you are a male member of the community, you should attend because we really need to initiate a constructive dialogue on how to deal with the problem of low female representation in the community (because it is a problem, see next question). This being a community problem (and not a problem of females only), we need all members of the community to discuss it.

Q: Why do we need more women in Multimedia?

A: Read this or this, or just check the Wikipedia page on women in STEM.

MPEG Column: Press release for the 114th MPEG meeting

Screen Content Coding Makes HEVC the Flexible Standard for Any Video Source

San Diego, USA − The 114th MPEG meeting was held in San Diego, CA, USA, from 22 – 26 February 2016.

Powerful new HEVC tools improve compression of text, graphics, and animation

The 114th MPEG meeting marked the completion of the Screen Content Coding (SCC) extensions to HEVC – the High Efficiency Video Coding standard. This powerful set of tools augments the compression capabilities of HEVC to make it the flexible standard for virtually any type of video source content that is commonly encountered in our daily lives.

Screen content is video containing a significant proportion of rendered (moving or static) graphics, text, or animation rather than, or in addition to, camera-captured video scenes. The new SCC extensions of HEVC greatly improve the compression of such content. Example applications include wireless displays, news and other television content with text and graphics overlays, remote computer desktop access, and real-time screen sharing for video chat and video conferencing.

The technical development of the SCC extensions was performed by the MPEG and VCEG video coding joint team JCT-VC, following a joint Call for Proposals issued in February 2014.

CfP issued for technologies to orchestrate capture and consumption of media across multiple devices

At its 114th meeting, MPEG issued a Call for Proposals (CfP) for Media Orchestration. The CfP seeks submissions of technologies that will facilitate the orchestration of devices and media, both in time (advanced synchronization, e.g. across multiple devices) and in space, where the media may come from multiple capturing devices and may be consumed by multiple rendering devices. An example application is the coordination of consumer electronics devices to record a live event. The CfP for Media Orchestration can be found at http://mpeg.chiariglione.org/meetings/114.

User Description framework helps recommendation engines deliver better choices

At the 114th meeting, MPEG completed a standards framework (in ISO/IEC 21000-22) to facilitate the narrowing of big-data searches to help recommendation engines deliver better, personalized, and relevant choices to users. Understanding the personal preferences of a user, and the context within which that user is interacting with a given application, facilitates the ability of that application to better respond to individual user requests. Having that information provided in a standard and interoperable format enables application providers to more broadly scale their services to interoperate with other application providers. Enter MPEG User Description (MPEG-UD). The aim of MPEG User Description is to ensure interoperability among recommendation services, which take into account the user and his/her context when generating recommendations for the user. With MPEG-UD, applications can utilize standard descriptors for users (user descriptor), the context in which the user is operating (context descriptor), recommendations (recommendation descriptor), and a description of a specific recommendation service that could eventually be consumed by the user (service descriptor).

Publish/Subscribe Application Format is finalized

The Publish/Subscribe Application Format (PSAF, ISO/IEC 23000-16) reached the final milestone of FDIS at this MPEG meeting. The PSAF enables a communication paradigm where publishers do not communicate information directly to intended subscribers but instead rely on a service that mediates the relationship between senders and receivers. In this paradigm, Publishers create and store Resources and their descriptions, and send Publications; Subscribers send Subscriptions. Match Service Providers (MSPs) receive and match Subscriptions with Publications and, when a Match has been found, send Notifications to the users listed in Publications and Subscriptions. This paradigm is enabled by three other MPEG technologies which have also reached their final milestone: the Contract Expression Language (CEL), the Media Contract Ontology (MCO) and User Description (UD). A PSAF Notification is expressed as a set of UD Recommendations.

CEL is a language to express contracts regarding digital licenses, i.e., the complete business agreements between the parties. MCO is an ontology to represent contracts dealing with rights on multimedia assets and intellectual-property-protected content in general. A specific vocabulary is defined in a model extension to represent the most common rights and constraints in the audiovisual context. PSAF contracts between Publishers or Subscribers and MSPs are expressed in CEL or MCO.
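The mediated paradigm is easy to picture with a small sketch. The toy Python Match Service Provider below matches Subscriptions against Publications on overlapping keywords; the class names and the matching rule are illustrative, not part of the PSAF specification.

```python
# Toy MSP: publishers and subscribers never talk directly; the MSP
# matches them and sends Notifications. The matching rule is invented.
from dataclasses import dataclass

@dataclass
class Publication:
    publisher: str
    keywords: set

@dataclass
class Subscription:
    subscriber: str
    interests: set

class MatchServiceProvider:
    def __init__(self):
        self.publications, self.subscriptions = [], []

    def publish(self, pub: Publication):
        self.publications.append(pub)
        self._match(new_pub=pub)

    def subscribe(self, sub: Subscription):
        self.subscriptions.append(sub)
        self._match(new_sub=sub)

    def _match(self, new_pub=None, new_sub=None):
        # Only check pairs involving the newly arrived item.
        pubs = [new_pub] if new_pub else self.publications
        subs = [new_sub] if new_sub else self.subscriptions
        for pub in pubs:
            for sub in subs:
                if pub.keywords & sub.interests:   # a Match is found
                    print(f"Notify {sub.subscriber} and {pub.publisher}: "
                          f"{pub.keywords & sub.interests}")

msp = MatchServiceProvider()
msp.publish(Publication("alice", {"hevc", "tutorial"}))
msp.subscribe(Subscription("bob", {"hevc"}))   # -> Notification
```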

Augmented Reality Application Format reaches FDIS status

At the 114th MPEG meeting, the 2nd edition of ARAF, MPEG’s Application Format for Augmented Reality (ISO/IEC 23000-13), reached FDIS status and will soon be published as an International Standard. The MPEG ARAF enables augmentation of the real world with synthetic media objects by combining multiple existing MPEG standards within a single specific application format addressing certain industry needs. In particular, ARAF comprises three components referred to as scene, sensor/actuator, and media. The target applications include geolocation-based services, image-based object detection and tracking, audio recognition and synchronization, mixed and augmented reality games, and real-virtual interactive scenarios.

Genome compression progresses toward standardization

At its 114th meeting, MPEG progressed its exploration of genome compression toward formal standardization. The 114th meeting included a seminar to collect additional perspectives on genome data standardization, and a review of technologies that had been submitted in response to a Call for Evidence (CfE). The purpose of that CfE, which had been issued at the 113th meeting, was to assess whether new technologies could achieve better performance in terms of compression efficiency compared with currently used formats.

In all, 22 tools were evaluated. The results demonstrate that, by integrating multiple of these tools, it is possible to improve compression by up to 27% with respect to the best state-of-the-art tool. With this evidence, MPEG has issued a Draft Call for Proposals (CfP) on Genomic Information Representation and Compression. The Draft CfP targets technologies for compressing raw and aligned genomic data and metadata for efficient storage and analysis.

As demonstrated by the results of the Call for Evidence, improved lossless compression of genomic data beyond the current state-of-the-art tools is achievable by combining and further developing them. The call also addresses lossy compression of the metadata, which makes up the dominant volume of the resulting compressed data. The Draft CfP seeks lossy compression technologies that can provide higher compression performance without affecting the accuracy of analysis application results. Responses to the Genomic Information Representation and Compression CfP will be evaluated prior to the 116th MPEG meeting in October 2016 (in Chengdu, China). An ad hoc group, co-chaired by Martin Golobiewski, convenor of Working Group 5 of ISO/TC 276 (i.e. the ISO committee for biotechnology), and Dr. Marco Mattavelli (of MPEG), will coordinate the receipt and pre-analysis of submissions received in response to the call. Detailed results of the CfE and the presentations shown during the seminar will soon be available as MPEG documents N16137 and N16147 at http://mpeg.chiariglione.org/meetings/114.

MPEG evaluates results to CfP for Compact Descriptors for Video Analysis

MPEG has received responses from three consortia to its Call for Proposals (CfP) on Compact Descriptors for Video Analysis (CDVA). This CfP addresses compact (i.e., compressed) video description technologies for search and retrieval applications, i.e. for content matching in video sequences. Visual content matching includes matching of views of large and small objects and scenes, which is robust to partial occlusions as well as changes in vantage point, camera parameters, and lighting conditions. The objects of interest include those that are planar or non-planar, rigid or partially rigid, and textured or partially textured. CDVA aims to enable efficient and interoperable design of video analysis applications for large databases, for example broadcasters’ archives or videos available on the Internet. It is envisioned that CDVA will provide a complementary set of tools to the suite of existing MPEG standards, such as the MPEG-7 Compact Descriptors for Visual Search (CDVS). The evaluation showed that sufficient technology was received, such that a standardization effort has been started. The final standard is expected to be ready in 2018.

Workshop on 5G/Beyond UHD Media

A workshop on 5G/Beyond UHD Media was held on February 24th, 2016, during the 114th MPEG meeting. The workshop was organized to acquire relevant information about the context in which MPEG technology related to video, virtual reality and the Internet of Things will be operating in the future, and to review the status of mobile technologies with the goal of guiding future codec standardization activity.

Dr. James Kempf of Ericsson reported on the challenges that Internet of Things devices face in a mobile environment. Dr. Ian Harvey of FOX discussed content creation for Virtual Reality applications. Dr. Kent Walker of Qualcomm promoted the value of unbundling technologies and creating relevant enablers. Dr. Jongmin Lee of SK Telecom explained challenges and opportunities in next-generation mobile multimedia services. Dr. Sudhir Dixit of the Wireless World Research Forum reported on the next-generation mobile 5G network and its challenges in supporting UHD media. Emmanuel Thomas of TNO showed trends in 5G and future media consumption, using media orchestration as an example. Dr. Charlie Zhang of Samsung Research America focused his presentation on 5G key technologies and recent advances.

Verification test complete for Scalable HEVC and MV-HEVC

MPEG has completed verification tests of SHVC, the scalable form of HEVC. These tests confirm the major savings that can be achieved by Scalable HEVC’s nested layers of data from which subsets can be extracted and used on their own to provide smaller coded streams. These smaller subsets can still be decoded with good video quality, as contrasted with the need to otherwise send separate “simulcast” coded video streams or add an intermediate “transcoding” process that would add substantial delay and complexity to the system.

The verification tests for SHVC showed that scalable HEVC coding can save an average of 40–60% in bit rate for the same quality as with simulcast coding, depending on the particular scalability scenario. SHVC includes capabilities for using a “base layer” with additional layers of enhancement data that improve the video picture resolution, the video picture fidelity, the range of representable colors, or the dynamic range of displayed brightness. Aside from a small amount of intermediate processing, each enhancement layer can be decoded by applying the same decoding process that is used for the original non-scalable version of HEVC. This compatibility that has been retained for the core of the decoding process will reduce the effort needed by industry to support the new scalable scheme.

Further verification tests were also conducted on MV-HEVC, where the Multiview Main Profile exploits the redundancy between different camera views using the same layering concept as scalable HEVC, with the same property of each view-specific layer being decodable by the ordinary HEVC decoding process. The results demonstrate that for the case of stereo (two views) video, a data rate reduction of 30% when compared to simulcast (independent HEVC coding of the views), and more than 50% when compared to the multi-view version of AVC (which is known as MVC), can be achieved for the same video quality.
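For intuition, the percentages above are plain bit-rate comparisons at equal quality. A back-of-the-envelope Python example with hypothetical bitrates (not measured data from the verification tests):

```python
# Illustration of the reported savings arithmetic with made-up numbers.
def savings_pct(reference_kbps: float, tested_kbps: float) -> float:
    """Bit-rate reduction of `tested` relative to `reference`, in %."""
    return 100.0 * (1.0 - tested_kbps / reference_kbps)

simulcast = 2 * 5000   # two independently HEVC-coded views, kbps
mv_hevc = 7000         # hypothetical joint MV-HEVC stream, kbps
print(f"{savings_pct(simulcast, mv_hevc):.0f}% vs simulcast")  # -> 30%
```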

Exploring new Capabilities in Video Compression Technology

Three years after finishing the first version of the HEVC standard, this MPEG meeting marked the first full meeting of a new partnership to identify advances in video compression technology. At its previous meeting, MPEG and ITU-T’s VCEG had agreed to join together to explore new technology possibilities for video coding that lie beyond the capabilities of the HEVC standard and its current extensions. The new partnership is known as the Joint Video Exploration Team (JVET), and the team is working to explore both incremental and fundamentally different video coding technology that shows promise to potentially become the next generation in video coding standardization. The JVET formation follows MPEG’s workshops and requirements-gathering efforts that have confirmed that video data demands are continuing to grow and are projected to remain a major source of stress on network traffic – even as additional improvements in broadband speeds arise in the years to come. The groundwork laid at the previous meeting for the JVET effort has already borne fruit. The team has developed a Joint Exploration Model (JEM) for simulation experiments in the area, and initial tests of the first JEM version have shown a potential compression improvement over HEVC by combining a variety of new techniques. Given sufficient further progress and evidence of practicality, it is highly likely that a new Call for Evidence or Call for Proposals will be issued in 2016 or 2017 toward converting this initial JVET exploration into a formal project for an improved video compression standard.

How to contact MPEG, learn more, and find other MPEG facts

To learn about MPEG basics, discover how to participate in the committee, or find out more about the array of technologies developed or currently under development by MPEG, visit MPEG’s home page at http://mpeg.chiariglione.org. There you will find information publicly available from MPEG experts past and present including tutorials, white papers, vision documents, and requirements under consideration for new standards efforts. You can also find useful information in many public documents by using the search window.

Examples of tutorials that can be found on the MPEG homepage include tutorials for: High Efficiency Video Coding, Advanced Audio Coding, Universal Speech and Audio Coding, and DASH to name a few. A rich repository of white papers can also be found and continues to grow. You can find these papers and tutorials for many of MPEG’s standards freely available. Press releases from previous MPEG meetings are also available. Journalists who wish to receive MPEG Press Releases by email should contact Dr. Christian Timmerer at Christian.timmerer@itec.uni-klu.ac.at.

Further Information

Future MPEG meetings are planned as follows:

No. 115, Geneva, CH, 30 May – 03 June 2016
No. 116, Chengdu, CN, 17 – 21 October 2016
No. 117, Geneva, CH, 16 – 20 January, 2017
No. 118, Hobart, AU, 03 – 07 April, 2017

New ACM TOMM Policy

As a new policy of ACM TOMM, we are planning to publish three Special Issues per year, starting from 2017. We therefore invite highly qualified scientists to submit proposals for 2017 ACM TOMM Special Issues. Each Special Issue is the responsibility of the Guest Editors.

Proposals are accepted until May 15th, 2016. They should be prepared according to the instructions outlined below, and sent by e-mail to the Senior Associate Editor for Special Issue Management, Shervin Shirmohammadi (shervin@ieee.org), and to the Editor-in-Chief of ACM TOMM, Alberto del Bimbo (eic.tomm@gmail.com).

Please see http://tomm.acm.org/TOMM_2017_SI_CFP.pdf for details.