Predicting the Emotional Impact of Movies

Affective video content analysis aims at the automatic recognition of emotions elicited by videos. It has a large number of applications, including mood-based personalized content recommendation [1], video indexing [2], and efficient movie visualization and browsing [3]. Beyond the analysis of existing video material, affective computing techniques can also be used to generate new content, e.g., for movie summarization [4] or personalized soundtrack recommendation to make user-generated videos more attractive [5]. Affective techniques can furthermore be used to enhance user engagement with advertising content by optimizing the way ads are inserted inside videos [6].

While major progress has been achieved in computer vision for visual object detection, high-level concept recognition, and scene understanding, a natural further step is the modeling and recognition of affective concepts. This has recently received increasing interest from research communities, e.g., computer vision and machine learning, with an overall goal of endowing computers with human-like perception capabilities.

Efficient training and benchmarking of computational models, however, require a large and diverse collection of data annotated with ground truth, which is often difficult to collect, particularly in the field of affective computing. To address this issue, we created the LIRIS-ACCEDE dataset. In contrast to most existing datasets, which contain few video resources and have limited accessibility due to copyright constraints, LIRIS-ACCEDE consists of videos with a large content diversity annotated along emotional dimensions. The annotations are made according to the expected emotion of a video, which is the emotion that the majority of the audience feels in response to the same content. All videos are shared under Creative Commons licenses and can thus be freely distributed without copyright issues. The dataset (videos, annotations, features and protocols) is publicly available, and it is currently composed of a total of six collections.


Credits and license information: (a) Cloudland, LateNite Films, shared under CC BY 3.0 Unported license at http://vimeo.com/17105083, (b) Origami, ESMA MOVIES, shared under CC BY 3.0 Unported license at http://vimeo.com/52560308, (c) Payload, Stu Willis, shared under CC BY 3.0 Unported license at http://vimeo.com/50509389, (d) The room of Franz Kafka, Fred. L’Epee, shared under CC BY-NC-SA 3.0 Unported license at http://vimeo.com/14482569, (e) Spaceman, Jono Schaferkotter & Before North, shared under CC BY-NC 3.0 Unported license at http://vodo.net/spaceman.

Dataset & Collections

The LIRIS-ACCEDE dataset is composed of movies and excerpts from movies under Creative Commons licenses, which enables the dataset to be publicly shared. The set contains 160 professionally made and amateur movies, covering genres such as horror, comedy, drama and action. Languages are mainly English, with a small set of Italian, Spanish, French and others subtitled in English. This set has been used to create the six collections that make up the dataset. The two collections originally proposed are the Discrete LIRIS-ACCEDE collection, which contains short excerpts of movies, and the Continuous LIRIS-ACCEDE collection, which comprises long movies. Moreover, since 2015, the set has been used for tasks related to affect and emotion at the MediaEval Benchmarking Initiative for Multimedia Evaluation [7], where each year it was enriched with new data, features and annotations. The dataset thus also includes four additional collections dedicated to these tasks.

The movies are available together with emotional annotations. When dealing with emotional video content analysis, the goal is to automatically recognize emotions elicited by videos. In this context, three types of emotions can be considered: intended, induced and expected emotions [8]. The intended emotion is the emotion that the filmmaker wants to induce in the viewers. The induced emotion is the emotion that a viewer feels in response to the movie. The expected emotion is the emotion that the majority of the audience feels in response to the same content. While the induced emotion is subjective and context-dependent, the expected emotion can be considered objective, as it reflects the more-or-less unanimous response of a general audience to a given stimulus [8]. Thus, the LIRIS-ACCEDE dataset focuses on the expected emotion. The representation of emotions we are considering is the dimensional one, based on valence and arousal. Valence is defined on a continuous scale from most negative to most positive emotions, while arousal is defined continuously from calmest to most active emotions [9]. Moreover, violence annotations were provided in the MediaEval 2015 Affective Impact of Movies collection, while fear annotations were provided in the MediaEval 2016 and 2017 Emotional Impact of Movies collections.
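For illustration, discrete emotion categories are often situated in this two-dimensional space. A minimal sketch, with approximate coordinates assumed here purely for illustration (they are not values from the dataset):

    # Approximate positions of discrete emotions in the valence-arousal
    # plane (both axes in [-1, 1]); values are illustrative only.
    emotion_va = {
        "joy":     ( 0.8,  0.5),   # positive valence, moderately active
        "calm":    ( 0.4, -0.6),   # positive valence, low arousal
        "sadness": (-0.6, -0.4),   # negative valence, rather passive
        "fear":    (-0.7,  0.7),   # negative valence, high arousal
        "anger":   (-0.6,  0.8),   # negative valence, high arousal
    }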

  • Discrete LIRIS-ACCEDE collection: a total of 160 films from various genres, split into 9,800 short clips with valence and arousal annotations. More details below.
  • Continuous LIRIS-ACCEDE collection: a total of 30 films with valence and arousal annotations for each second. More details below.
  • MediaEval 2015 Affective Impact of Movies collection: a subset of the films with labels for the presence of violence, as well as for felt valence and arousal. More details below.
  • MediaEval 2016 Emotional Impact of Movies collection: a subset of the films with score annotations for expected valence and arousal. More details below.
  • MediaEval 2017 Emotional Impact of Movies collection: a subset of the films with valence and arousal values and a label for the presence of fear for each 10-second segment, as well as precomputed features. More details below.
  • MediaEval 2018 Emotional Impact of Movies collection: a subset of the films with valence and arousal values for each second and begin-end times of scenes containing fear, as well as precomputed features. More details below.

Ground Truth

The ground truth for the Discrete LIRIS-ACCEDE collection consists of the ranking of all video clips along both the valence and arousal dimensions. These rankings were obtained using a pairwise comparison protocol designed for crowdsourcing (via the CrowdFlower service). For each pair of video clips presented, raters had to select the one that conveyed the given emotion most strongly in terms of valence or arousal. The high inter-annotator agreement that was achieved indicates that the annotations were highly consistent, despite the large diversity of the raters’ cultural backgrounds. Affective ratings (scores) were also collected for a subset of the 9,800 clips in order to cross-validate the crowdsourced annotations. These ratings also made it possible to learn a Gaussian Process for Regression, modeling the noisiness of the measurements and mapping the whole ranked LIRIS-ACCEDE dataset into the 2D valence-arousal affective space. More details can be found in [10].
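As a rough sketch of that last step (the exact modeling setup is described in [10]), a Gaussian Process regressor with a noise kernel can be fitted on the rated subset and then used to map ranks to scores. All names and data below are illustrative assumptions:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Hypothetical data: normalized valence ranks of the rated subset
    # and the affective scores collected for those clips.
    rated_ranks = np.random.rand(500, 1)           # placeholder ranks in [0, 1]
    rated_scores = 2 * rated_ranks[:, 0] - 1 + 0.1 * np.random.randn(500)

    # WhiteKernel models the noisiness of the crowdsourced measurements.
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
    gp.fit(rated_ranks, rated_scores)

    # Map every clip of the ranked dataset into the affective space.
    all_ranks = np.linspace(0, 1, 9800).reshape(-1, 1)
    valence_scores, score_std = gp.predict(all_ranks, return_std=True)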

To collect the ground truth for the Continuous and MediaEval 2016, 2017 and 2018 collections, which consists of valence and arousal scores for every second of the movies, French annotators had to continuously indicate their level of valence and arousal while watching the movies, using a modified version of the GTrace annotation tool [16] and a joystick. Each annotator continuously annotated one subset of the movies for induced valence and another subset for induced arousal, so that each movie was continuously annotated by three to five different annotators. The continuous valence and arousal annotations were then down-sampled by averaging over windows of 10 seconds with a shift of 1 second (i.e., yielding one value per second) in order to remove noise due to unintended movements of the joystick. Finally, the post-processed continuous annotations were averaged to create a continuous mean signal of the valence and arousal self-assessments, ranging from -1 (most negative for valence, most passive for arousal) to +1 (most positive for valence, most active for arousal). The details of this process are given in [11].
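A minimal numpy sketch of this post-processing, assuming raw per-frame annotations sampled at a known frame rate (all names and data are illustrative):

    import numpy as np

    def downsample(annotation, fps, win_s=10, step_s=1):
        """Average a raw per-frame annotation over 10 s windows
        shifted by 1 s, yielding one smoothed value per second."""
        win, step = int(win_s * fps), int(step_s * fps)
        return np.array([annotation[i:i + win].mean()
                         for i in range(0, len(annotation) - win + 1, step)])

    # Per-movie ground truth: average the smoothed signals of the
    # three to five annotators into one mean signal in [-1, 1].
    fps = 25.0                                               # assumed sampling rate
    raw_annotations = [np.random.uniform(-1, 1, 3000) for _ in range(4)]
    smoothed = [downsample(a, fps) for a in raw_annotations]
    mean_signal = np.mean(smoothed, axis=0)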

The ground truth for violence annotation, used in the MediaEval 2015 Affective Impact of Movies collection, was collected as follows. First, all the videos were annotated separately by two groups of annotators from two different countries. In each group, regular annotators labeled all the videos, and their labels were then reviewed by master annotators. Regular annotators were graduate students (typically single with no children) and master annotators were senior researchers. Within each group, each video received two different annotations, which were then merged by the master annotators into the final annotation for the group. Finally, the annotations from the two groups were merged and reviewed once more by the task organizers. The details can be found in [12].

The ground truth for fear annotations, used in the MediaEval 2017 and 2018 Emotional Impact of Movies collections, was generated using a tool specifically designed for the classification of audio-visual media, which allows annotation to be performed while watching the movie. The annotations were produced by two experienced team members of NICAM [17], both trained in the classification of media. Each movie was annotated by one annotator, who reported the start and stop times of each sequence in the movie expected to induce fear.

Conclusion

Through its six collections, the LIRIS-ACCEDE dataset constitutes a dataset of choice for affective video content analysis. It is one of the largest datasets for this purpose and is regularly enriched with new data, features and annotations. In particular, it is used for the Emotional Impact of Movies tasks at the MediaEval Benchmarking Initiative for Multimedia Evaluation. As all the movies are under Creative Commons licenses, the whole dataset can be freely shared and used by the research community, and is available at http://liris-accede.ec-lyon.fr.

Discrete LIRIS-ACCEDE collection [10]
In total, 160 films and short films of different genres were segmented into 9,800 video clips. The total runtime of all 160 films is 73 hours, 41 minutes and 7 seconds, and a video clip was extracted on average every 27 seconds. The segmented clips last between 8 and 12 seconds: long enough to obtain consistent excerpts that allow viewers to feel emotions, yet short enough that only one emotion is felt per excerpt.

The content of the movies was also considered in order to create homogeneous, consistent and meaningful excerpts that would not disturb viewers. A robust shot and fade-in/out detection was implemented to ensure that each extracted video clip starts and ends with a shot boundary or a fade. Furthermore, the order of excerpts within a film was preserved, allowing the study of temporal transitions of emotions.

Several movie genres are represented in this collection, such as horror, comedy, drama and action. Languages are mainly English, with a small set of Italian, Spanish, French and other languages subtitled in English. For this collection, the 9,800 video clips are ranked according to valence, from the clip inducing the most negative emotion to the most positive, and according to arousal, from the clip inducing the calmest emotion to the most active one. Besides the ranks, emotional scores (valence and arousal) are also provided for each clip.

Continuous LIRIS-ACCEDE collection [11]
The movie clips of the Discrete collection were annotated globally, with a single value of arousal and valence representing a whole 8-to-12-second video clip. In order to allow deeper investigation into the temporal dependencies of emotions (since a felt emotion may influence the emotions felt later), longer movies were considered in this collection. To this end, 30 movies were selected from the set of 160 such that their genre, content, language and duration were diverse enough to be representative of the original Discrete LIRIS-ACCEDE collection. The selected videos are between 117 and 4,566 seconds long (mean = 884.2 s ± 766.7 s SD), and the total length of the 30 selected movies is 7 hours, 22 minutes and 5 seconds. The emotional annotations consist of a score of expected valence and arousal for each second of each movie.

MediaEval 2015 Affective Impact of Movies collection [12]
This collection was used as the development and test sets for the MediaEval 2015 Affective Impact of Movies Task. The overall use case scenario of the task was to design a video search system that uses automatic tools to help users find videos fitting their particular mood, age or preferences. To address this, two subtasks were proposed:

  • Induced affect detection: the emotional impact of a video or movie can be a strong indicator for search or recommendation;
  • Violence detection: detecting violent content is an important aspect of filtering video content based on age.

The 9,800 video clips from the Discrete LIRIS-ACCEDE collection were used as the development set, and an additional 1,100 movie clips were provided as the test set. For each of the 10,900 video clips, the annotations consist of: a binary value indicating the presence of violence, the class of the excerpt for felt arousal (calm, neutral, active), and the class for felt valence (negative, neutral, positive).

MediaEval 2016 Emotional Impact of Movies collection [13]
The MediaEval 2016 Emotional Impact of Movies task required participants to deploy multimedia features to automatically predict the emotional impact of movies, in terms of valence and arousal. Two subtasks were proposed:

  • Global emotion prediction: given a short video clip (around 10 seconds), participants’ systems were expected to predict a score of induced valence (negative-positive) and induced arousal (calm-excited) for the whole clip;
  • Continuous emotion prediction: as an emotion felt during a scene may be influenced by the emotions felt during the previous scene(s), the purpose here was to consider longer videos and to predict valence and arousal continuously along the video. Thus, scores of induced valence and arousal were to be provided for each 1-second segment of each video.

The development set was composed of the Discrete LIRIS-ACCEDE collection for the first subtask and the Continuous LIRIS-ACCEDE collection for the second subtask. In addition, a test set was provided to assess the performance of participants’ methods. A total of 49 new movies under Creative Commons licenses were added. Using the same protocol as for the development set, 1,200 additional short video clips (between 8 and 12 seconds) were extracted for the first subtask, while 10 long movies (from 25 minutes to 1 hour and 35 minutes, for a total duration of 11.48 hours) were selected for the second subtask. The annotations thus consist of a score of expected valence and arousal for each movie clip used in the first subtask, and a score of expected valence and arousal for each second of the movies in the second subtask.

MediaEval 2017 Emotional Impact of Movies collection [14]
This collection was used for the MediaEval 2017 Emotional Impact of Movies task. Here, only long movies were considered, and emotion was considered in terms of valence, arousal and fear. The following two subtasks were proposed, in which the emotional impact had to be predicted for consecutive 10-second segments sliding over the whole movie with a shift of 5 seconds (a segmentation sketch follows the list):

  • Valence/Arousal prediction: participants’ systems were supposed to predict a score of expected valence and arousal for each consecutive 10-second segment;
  • Fear prediction: the purpose here was to predict whether each consecutive 10-second segment was likely to induce fear or not. The targeted use case was the prediction of frightening scenes to help systems protect children from potentially harmful video content. This subtask is complementary to the valence/arousal prediction task in the sense that discrete emotions often overlap when mapped into the 2D valence/arousal space (for instance, fear, disgust and anger all combine very negative valence with high arousal).
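A minimal sketch of this segmentation (function name and usage are assumptions, not the organizers’ code):

    def sliding_segments(duration_s, win_s=10, shift_s=5):
        """Yield (start, end) times of consecutive 10 s segments
        sliding over the movie with a 5 s shift."""
        start = 0
        while start + win_s <= duration_s:
            yield (start, start + win_s)
            start += shift_s

    # e.g. a 25-second movie yields (0, 10), (5, 15), (10, 20), (15, 25)
    print(list(sliding_segments(25)))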

The Continuous LIRIS-ACCEDE collection was used as the development set for both subtasks. The test set consisted of 14 new movies under Creative Commons licenses, distinct from the 160 original movies. They are between 210 and 6,260 seconds long, and the total length of the 14 selected movies is 7 hours, 57 minutes and 13 seconds. In addition to the video data, general-purpose audio and visual content features were also provided, including deep features, Fuzzy Color and Texture Histogram (FCTH) and Gabor features. The annotations consist of a valence value, an arousal value and a binary value for each 10-second segment, the latter indicating whether the segment was expected to induce fear.
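For participants wishing to compute comparable descriptors themselves, a sketch of generic deep feature extraction with a pretrained CNN might look as follows. The model choice, file name and preprocessing here are assumptions, not the organizers’ exact pipeline:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    # Pretrained CNN used as a generic feature extractor (assumption:
    # any ImageNet model works; the official features may differ).
    model = models.resnet50(pretrained=True)
    model.fc = torch.nn.Identity()   # drop the classifier, keep 2048-D features
    model.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    frame = Image.open("frame_000123.jpg")        # hypothetical extracted frame
    with torch.no_grad():
        feature = model(preprocess(frame).unsqueeze(0)).squeeze(0)  # shape (2048,)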

MediaEval 2018 Emotional Impact of Movies collection [15]
The MediaEval 2018 Emotional Impact of Movies task is similar to the 2017 one. However, more data was provided, and predictions of the emotional impact had to be made for every second of the movies rather than for 10-second segments. The two subtasks were:

  • Valence and Arousal prediction: participants’ systems had to predict a score of expected valence and arousal continuously (every second) for each movie;
  • Fear detection: the purpose here was to predict beginning and ending times of sequences inducing fear in movies. The targeted use case was the detection of frightening scenes to help systems protecting children from potentially harmful video content.

The development set for both subtasks consisted of the movies from the Continuous LIRIS-ACCEDE collection as well as the test set of the MediaEval 2017 Emotional Impact of Movies collection, i.e., 44 movies for a total duration of 15 hours and 20 minutes. The test set consisted of 12 other movies selected from the set of 160 movies, for a total duration of 8 hours and 56 minutes. As with the 2017 collection, general-purpose audio and visual content features were provided in addition to the video data. The annotations consist of valence and arousal values for each second of the movies (for the first subtask) as well as the beginning and ending times of each fear-inducing sequence (for the second subtask).
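Since this collection combines per-second valence/arousal values with begin-end fear times, the two formats can be related by rasterizing the fear sequences into per-second labels. A small illustrative sketch (names are assumptions):

    def fear_per_second(segments, duration_s):
        """Convert (begin, end) fear sequences, in seconds, into a
        binary label for each second of the movie."""
        labels = [0] * duration_s
        for begin, end in segments:
            for t in range(int(begin), min(int(end), duration_s)):
                labels[t] = 1
        return labels

    # e.g. fear between 12-15 s and 40-43 s of a 60 s movie
    labels = fear_per_second([(12, 15), (40, 43)], 60)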

Acknowledgments

This work was supported in part by the French research agency ANR through the VideoSense Project under the Grant 2009 CORD 026 02 and through the Visen project within the ERA-NET CHIST-ERA framework under the grant ANR-12-CHRI-0002-04.

Contact

Should you have any inquiries or questions about the dataset, do not hesitate to contact us by email at: emmanuel dot dellandrea at ec-lyon dot fr.

References

[1] L. Canini, S. Benini, and R. Leonardi, “Affective recommendation of movies based on selected connotative features”, in IEEE Transactions on Circuits and Systems for Video Technology, 23(4), 636–647, 2013.
[2] S. Zhang, Q. Huang, S. Jiang, W. Gao, and Q. Tian, “Affective visualization and retrieval for music video”, in IEEE Transactions on Multimedia 12(6), 510–522, 2010.
[3] S. Zhao, H. Yao, X. Sun, X. Jiang, and P. Xu, “Flexible presentation of videos based on affective content analysis”, in Advances in Multimedia Modeling, 2013.
[4] H. Katti, K. Yadati, M. Kankanhalli, and C. Tat-Seng, “Affective video summarization and story board generation using pupillary dilation and eye gaze”, in IEEE International Symposium on Multimedia (ISM), 2011.
[5] R.R. Shah,Y. Yu, and R. Zimmermann, “Advisor: Personalized video soundtrack recommendation by late fusion with heuristic rankings”, in ACM International Conference on Multimedia, 2014.
[6] K. Yadati, H. Katti, and M. Kankanhalli, “Cavva: Computational affective video-in-video advertising”, in IEEE Transactions on Multimedia 16(1), 15–23, 2014.
[7] http://www.multimediaeval.org/
[8] A. Hanjalic, “Extracting moods from pictures and sounds: Towards truly personalized TV”, in IEEE Signal Processing Magazine, 2006.
[9] J.A. Russell, “Core affect and the psychological construction of emotion”, in Psychological Review, 2003.
[10] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, “LIRIS-ACCEDE: A Video Database for Affective Content Analysis,” in IEEE Transactions on Affective Computing, 2015.
[11] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, “Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos,” in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
[12] M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, “The mediaeval 2015 affective impact of movies task,” in MediaEval 2015 Workshop, 2015.
[13] E. Dellandrea, L. Chen, Y. Baveye, M. Sjoberg and C. Chamaret, “The MediaEval 2016 Emotional Impact of Movies Task”, in Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, The Netherlands, October 20-21, 2016.
[14] E. Dellandrea, M. Huigsloot, L. Chen, Y. Baveye and M. Sjoberg, “The MediaEval 2017 Emotional Impact of Movies Task”, in Working Notes Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, September 13-15, 2017.
[15] E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye, Z. Xiao and M. Sjöberg, “The MediaEval 2018 Emotional Impact of Movies Task”, Working Notes Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, October 29-31, 2018.
[16] R. Cowie, M. Sawey, C. Doherty, J. Jaimovich, C. Fyans, and P. Stapleton, “Gtrace: General trace program compatible with emotionML”, in Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2013.
[17] http://www.kijkwijzer.nl/nicam.

MPEG Column: 124th MPEG Meeting in Macau, China

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The MPEG press release comprises the following aspects:

  • Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage
  • Compressed Representation of Neural Networks – MPEG issues Call for Proposals
  • Low Complexity Video Coding Enhancements – MPEG issues Call for Proposals
  • New Video Coding Standard expected to have licensing terms timely available – MPEG issues Call for Proposals
  • Multi-Image Application Format (MIAF) promoted to Final Draft International Standard
  • 3DoF+ Draft Call for Proposal goes Public

Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage

At its 124th meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to Committee Draft (CD) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colour. By leveraging existing video ecosystems (hardware acceleration, transmission services and infrastructure) as well as future video codecs, the V-PCC technology enables new applications. The current V-PCC encoder implementation provides a compression ratio of 125:1, which means that a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality.
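As a back-of-envelope illustration of what such a ratio implies, consider the raw data rate of a dynamic point cloud. The capture parameters below are assumptions rather than figures from the press release, so the result matches the quoted 8 Mbit/s only in order of magnitude:

    # Assumed raw format: 1 million points/frame, 30 frames/s,
    # 10 bits per coordinate (x, y, z) plus 8 bits per colour channel.
    points, fps = 1_000_000, 30
    bits_per_point = 3 * 10 + 3 * 8           # geometry + RGB = 54 bits
    raw_bps = points * fps * bits_per_point   # ~1.6 Gbit/s uncompressed
    compressed_bps = raw_bps / 125            # ~13 Mbit/s at 125:1

    print(f"raw: {raw_bps/1e9:.2f} Gbit/s, compressed: {compressed_bps/1e6:.1f} Mbit/s")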

A next step is the storage of V-PCC in ISOBMFF for which a working draft has been produced. It is expected that further details will be discussed in upcoming reports.

Research aspects: Video-based Point Cloud Compression (V-PCC) is at CD stage, and a first working draft for the storage of V-PCC in ISOBMFF has been provided. A natural next step is thus the delivery of V-PCC encapsulated in ISOBMFF over networks, utilizing various approaches, protocols, and tools. One may additionally consider different encapsulation formats if needed.

MPEG issues Call for Proposals on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and on the availability of large-scale training data sets. Some applications require the deployment of a particular trained network instance to a potentially large number of devices and could thus benefit from a standard for the compressed representation of neural networks. Therefore, MPEG has issued a Call for Proposals (CfP) for compression technology for neural networks, focusing on the compression of parameters and weights and on four use cases: (i) visual object classification, (ii) audio classification, (iii) visual feature extraction (as used in MPEG CDVA), and (iv) video coding.
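To give a flavour of what parameter compression means, below is a minimal sketch of uniform weight quantization, one of the simplest candidate techniques; the CfP does not prescribe any particular method:

    import numpy as np

    def quantize_weights(w, n_bits=8):
        """Uniformly quantize float32 weights to n_bits integers,
        returning the integer codes plus the scale for dequantization."""
        scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
        codes = np.round(w / scale).astype(np.int8)
        return codes, scale

    w = np.random.randn(1000).astype(np.float32)   # stand-in layer weights
    codes, scale = quantize_weights(w)
    w_hat = codes.astype(np.float32) * scale       # dequantized weights
    # 8-bit codes take 4x less space than float32, at some accuracy cost.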

Research aspects: As pointed out last time, research here will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects, such as the transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms, may become relevant in the near future.


MPEG issues Call for Proposals on Low Complexity Video Coding Enhancements

Upon request from the industry, MPEG has identified an area of interest in which video technology deployed in the market (e.g., AVC, HEVC) can be enhanced in terms of video quality without the need to necessarily replace existing hardware. Therefore, MPEG has issued a Call for Proposals (CfP) on Low Complexity Video Coding Enhancements.

The objective is to develop video coding technology with a data stream structure defined by two component streams: a base stream decodable by a hardware decoder and an enhancement stream suitable for software processing implementation. The project is meant to be codec agnostic; in other words, the base encoder and base decoder can be AVC, HEVC, or any other codec in the market.
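A schematic sketch of such a two-stream decoder is shown below; all function bodies are placeholder stand-ins (assumptions), since the actual tools will be defined by the CfP responses:

    import numpy as np

    def hardware_decode(bitstream):      # legacy base decoder (e.g. AVC/HEVC)
        return np.zeros((540, 960))      # placeholder low-resolution base picture

    def upsample(picture):               # lightweight software upscaler
        return picture.repeat(2, axis=0).repeat(2, axis=1)

    def decode_residual(bitstream):      # software enhancement decoder
        return np.zeros((1080, 1920))    # placeholder full-resolution correction

    def decode_frame(base_stream, enh_stream):
        """Codec-agnostic two-stream decoding: hardware base layer plus
        a software-decoded enhancement residual."""
        return upsample(hardware_decode(base_stream)) + decode_residual(enh_stream)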

Research aspects: The interesting aspect here is that this use case assumes a legacy base decoder – most likely realized in hardware – which is enhanced with software-based implementations to improve coding efficiency and/or quality, without sacrificing the capabilities of the end user in terms of complexity and, thus, energy efficiency, thanks to the software-based solution.


MPEG issues Call for Proposals for a New Video Coding Standard expected to have licensing terms timely available

At its 124th meeting, MPEG issued a Call for Proposals (CfP) for a new video coding standard to address combinations of both technical and application (i.e., business) requirements that may not be adequately met by existing standards. The aim is to provide a standardized video compression solution which combines coding efficiency similar to that of HEVC with a level of complexity suitable for real-time encoding/decoding and the timely availability of licensing terms.

Research aspects: This new work item is more related to business aspects (i.e., licensing terms) than technical aspects of video coding.


Multi-Image Application Format (MIAF) promoted to Final Draft International Standard

The Multi-Image Application Format (MIAF) defines interoperability points for the creation, reading, parsing, and decoding of images embedded in the High Efficiency Image File (HEIF) format by (i) defining only additional constraints on the HEIF format, (ii) limiting the supported encoding types to a set of specific profiles and levels, (iii) requiring specific metadata formats, and (iv) defining a set of brands for signaling such constraints, including specific depth map and alpha plane formats. For instance, it addresses use cases in which a capturing device uses one of the HEIF codecs with a specific HEVC profile and level in the HEIF files it creates, while a playback device is only capable of decoding AVC bitstreams.

Research aspects: MIAF is an application format which is defined as a combination of tools (incl. profiles and levels) of other standards (e.g., audio codecs, video codecs, systems) to address the needs of a specific application. Thus, the research is related to use cases enabled by this application format. 


3DoF+ Draft Call for Proposal goes Public

Following investigations on the coding of “three Degrees of Freedom plus” (3DoF+) content in the context of MPEG-I, the MPEG video subgroup has provided evidence demonstrating the capability to encode 3DoF+ content efficiently while maintaining compatibility with legacy HEVC hardware. As a result, MPEG decided to issue a draft Call for Proposals (CfP) to the public containing the information necessary to prepare for the final Call for Proposals, expected at the 125th MPEG meeting (January 2019) with responses due at the 126th MPEG meeting (March 2019).

Research aspects: This work item is about video (coding) and, thus, research is about compression efficiency.


What else happened at #MPEG124?

  • MPEG-DASH 3rd edition is still in the final editing phase and not yet available. Last time, I wrote that we expected final publication later this year or early next year, and we hope this is still the case. At this meeting, Amendment 5 progressed to DAM, and the conformance/reference software for SRD, SAND and Server Push was also promoted to DAM. In other words, DASH is pretty much in maintenance mode.
  • MPEG-I (systems part) is working on immersive media access and delivery and I guess more updates will come on this after the next meeting. OMAF is working on a 2nd edition for which a working draft exists and phase 2 use cases (public document) and draft requirements are discussed.
  • Versatile Video Coding (VVC): working draft 3 (WD3) and test model 3 (VTM3) have been issued at this meeting, including a large number of new tools. Both documents (and the software) will be publicly available after their editing periods (Nov. 23 for WD3 and Dec. 14 for VTM3).


JPEG Column: 81st JPEG Meeting in Vancouver, Canada

The 81st JPEG meeting was held in Vancouver, British Columbia, Canada, where significant efforts were put into the analysis of the responses to the call for proposals on the next generation image coding standard, nicknamed JPEG XL, which is expected to provide a solution for an image format with improved quality and flexibility allied with better compression efficiency. The responses to the call confirm the interest of different parties in this activity. Moreover, the initial subjective and objective evaluations of the different proposals confirm the significant improvement in both quality and compression efficiency that the future standard will provide.

Apart from the multiple activities related to the development of several standards, a workshop on Blockchain technologies was held at the Telus facilities in Vancouver, with several talks on Blockchain and Distributed Ledger Technologies and a panel where the influence of these technologies on multimedia was analysed and discussed. A new workshop is planned at the 82nd JPEG meeting, to be held in Lisbon, Portugal, in January 2019.

The 81st JPEG meeting had the following highlights:

  • JPEG Completes Initial Assessment on Responses for the Next Generation Image Coding Standard (JPEG XL);
  • Workshop on Blockchain technology;
  • JPEG XS Core Coding System submitted to ISO for immediate publication as International Standard;
  • HTJ2K achieves Draft International Standard status;
  • JPEG Pleno defines a generic file format syntax architecture.

The following summarizes various highlights during JPEG’s Vancouver meeting.

JPEG XL completes the initial assessment of responses to the call for proposals

 The JPEG Committee launched the Next Generation Image Coding activity, also referred to as JPEG XL, with the aim of developing a standard for image coding that offers substantially better compression efficiency than existing image formats, along with features desirable for web distribution and efficient compression of high quality images. A Call for Proposals on Next Generation Image Coding was issued at the 79th JPEG meeting.

Seven submissions were received in response to the Call for Proposals. The submissions, along with the anchors, were evaluated in subjective tests by three independent research labs. At the 81st JPEG meeting in Vancouver, Canada, the proposals were evaluated using subjective and objective evaluation metrics, and a verification model (XLM) was agreed upon. Following this selection process, a series of experiments has been designed to compare the performance of the current XLM with alternative choices of coding components, including technologies submitted by some of the top-performing submissions; these experiments are commonly referred to as core experiments and will serve to further refine and improve the XLM towards the final standard.

Workshop on Blockchain technology

On October 16th, 2018, JPEG organized its first workshop on Media Blockchain in Vancouver. Touradj Ebrahimi, JPEG Convenor, and Frederik Temmermans, a leading JPEG expert, presented the background of the JPEG standardization committee and ongoing JPEG activities such as JPEG Privacy and Security. Thereafter, Eric Paquet, Victoria Lemieux and Stephen Swift shared their experiences with blockchain technology, focusing on standardization challenges and formalization, real-world adoption in media use cases, and the state of the art in consensus models. The workshop closed with an interactive discussion between the speakers and the audience, moderated by JPEG Requirements Chair Fernando Pereira.

The presentations from the workshop are available for download on the JPEG website. In January 2019, during the 82nd JPEG meeting in Lisbon, Portugal, a 2nd workshop will be organized to continue the discussion and interact with European stakeholders. More information about the program and registration will be made available on jpeg.org.

In addition to the workshop, JPEG issued an updated version of its white paper “JPEG White paper: Towards a Standardized Framework for Media Blockchain and Distributed Ledger Technologies” that elaborates on the blockchain initiative, exploring relevant standardization activities, industrial needs and use cases. The white paper will be further extended in the future with more elaborated use cases and conclusions drawn from the workshops. To keep informed and get involved in the discussion, interested parties are invited to register to the ad hoc group’s mailing list via http://jpeg-blockchain-list.jpeg.org.


Touradj Ebrahimi, convenor of JPEG, giving the introductory talk in the Workshop on Blockchain technology.


JPEG XS

The JPEG committee is pleased to announce a significant milestone of the JPEG XS project, with the Core Coding System (aka JPEG XS Part 1) submitted to ISO for immediate publication as International Standard. This project aims at the standardization of a near-lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec within any AV market. Among the targeted use cases are video transport over professional video links (SDI, IP, Ethernet), real-time video storage, memory buffers, omnidirectional video capture and rendering, and sensor compression (for example in cameras and in the automotive industry). The Core Coding System allows for visually transparent quality at moderate compression rates, scalable end-to-end latency ranging from less than a line to a few lines of the image, and low-complexity real-time implementations in ASIC, FPGA, CPU and GPU. Besides the Core Coding System, the profiles and levels (addressing specific application fields and use cases), together with the transport and container formats (defining different means to store and transport JPEG XS codestreams in files, over IP networks or SDI infrastructures), are also being finalized, with submission for publication as International Standard expected in Q1 2019.

HTJ2K

The JPEG Committee has reached a major milestone in the development of an alternative block coding algorithm for the JPEG 2000 family of standards, with ISO/IEC 15444-15 High Throughput JPEG 2000 (HTJ2K) achieving Draft International Standard (DIS) status.

The HTJ2K algorithm has demonstrated an average tenfold increase in encoding and decoding throughput compared to the algorithm currently defined by JPEG 2000 Part 1. This increase in throughput results in an average coding efficiency loss of 10% or less in comparison to the most efficient modes of the block coding algorithm in JPEG 2000 Part 1, and enables mathematically lossless transcoding to and from JPEG 2000 Part 1 codestreams.

The JPEG Committee has begun the development of HTJ2K conformance codestreams and reference software.

JPEG Pleno

The JPEG Committee is currently pursuing three activities in the framework of the JPEG Pleno Standardization: Light Field, Point Cloud and Holographic content coding.

At the Vancouver meeting, a generic file format syntax architecture was outlined that allows for efficient exchange of these modalities by utilizing a box-based file format. This format will enable the carriage of light field, point cloud and holography data, including associated metadata for colour space specification, camera calibration etc. In the particular case of light field data, this will encompass both texture and disparity information.

For coding of point clouds and holographic data, activities are still in exploratory phase addressing the elaboration of use cases and the refinement of requirements for coding such modalities. In addition, experimental procedures are being designed to facilitate the quality evaluation and testing of technologies that will be submitted in later calls for coding technologies. Interested parties active in point cloud and holography related markets and applications, both from industry and academia are welcome to participate in this standardization activity.

Final Quote

“JPEG XL standard will enable a higher quality content while improving on compression efficiency and offering new features useful for emerging multimedia applications,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.  

The JPEG Committee nominally meets four times a year, in different world locations. The 81st JPEG Meeting was held on 12-19 October 2018, in Vancouver, Canada. The next 82nd JPEG Meeting will be held on 19-25 January 2019, in Lisbon, Portugal.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans (pr@jpeg.org) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on http://jpeg-news-list.jpeg.org.  

Future JPEG meetings are planned as follows:

  • No 82, Lisbon, Portugal, January 19 to 25, 2019
  • No 83, Geneva, Switzerland, March 16 to 22, 2019
  • No 84, Brussels, Belgium, July 13 to 19, 2019


Towards an Integrated View on QoE and UX: Adding the Eudaimonic Dimension

In the past, research on Quality of Experience (QoE) has frequently been limited to networked multimedia applications, such as the transmission of speech, audio and video signals. In parallel, usability and User Experience (UX) research addressed human-machine interaction systems that focus on either the functional (pragmatic) or the aesthetic (hedonic) aspect of the user’s experience. In both the QoE and UX domains, the context of use (mental, social, physical, societal, etc.) has mostly been treated as a control factor, in order to guarantee the functionality of the service or the ecological validity of the evaluation. This situation changes when considering systems that explicitly integrate the usage environment and context they are used in, such as Cyber-Physical Systems (CPS) used, e.g., in smart home or smart workplace scenarios. Such systems are equipped with sensors and actuators that can sample and manipulate the environment they are integrated into; the interaction with them is thus mediated by the environment, which can, for example, react to a user entering a room. In addition, such systems are used for applications that differ from standard multimedia communication in that they are frequently used over a long or repeating period of time, and/or in a professional use scenario. In such application scenarios, the motivation of system usage can be divided between the actual system user and a third party (e.g., the employer), resulting in different factors affecting the related experiences (in comparison to services used on the user’s own account). However, the impact of this duality of usage motivation on the resulting QoE or UX has rarely been addressed in the existing research of either scientific community.

In the context of QoE research, the European Network on Quality of Experience in Multimedia Systems and Services, Qualinet (COST Action IC 1003), as well as a number of Dagstuhl seminars [see note from the editors], started a scientific discussion about the definition of the term QoE and related concepts around 2011. This discussion resulted in a White Paper which defines QoE as “the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” [White Paper 2012]. Besides this definition, the white paper describes a number of factors that influence a user’s QoE perception, e.g., human, system and contextual factors. Although this discussion lists a large set of influencing factors quite thoroughly, it still focuses on rather short-term (or episodic) and media-related hedonic experiences. A first step towards integrating an additional quality dimension (besides the hedonic one) is described in [Hammer et al. 2018], where the authors introduce the eudaimonic perspective, referring to the user’s overall well-being as a result of system usage. The term “eudaimonic” stems from Aristotle and is commonly used to designate a deeper degree of well-being, resulting from self-fulfillment through developing one’s own strengths.

On the other side, UX research has historically evolved from usability research (which long focused on enhancing the efficiency and effectiveness of the system) and was initially concerned with the prevention of negative emotions related to technology use. Pragmatic aspects of the analyzed ICT systems were identified in usability research as an important contributor to such prevention. However, the shift towards a modern understanding of UX frames human-machine interaction as a specific emotional experience (e.g., pleasure) and considers pragmatic aspects only as enablers of positive experiences, not as contributors to them. In line with this understanding, the concept of Positive or Hedonic Psychology, as introduced by [Kahneman 1999], has been embedded and adopted in HCI and UX research. As a result, the related research community has mainly focused on the hedonic aspects of experiences, as described in [Diefenbach 2014] and as critically outlined by [Mekler 2016], where the authors argue that this concentration on hedonic aspects has overshadowed the importance of the eudaimonic aspects of well-being described in positive psychology. With respect to measurement, the devotion to hedonic psychology also brings the need to measure emotional responses (or experiential qualities). In contrast to the majority of QoE research, which focuses on measuring the (single) experienced (media) quality of a multimedia system, the measurement of experiential qualities in UX calls for measuring a range of qualities (e.g., [Bargas-Avila 2011] lists affect, emotion, fun, aesthetics, hedonic and flow as qualities assessed in the context of UX). Hence, this measurement approach considers a considerably broader range of quantified qualities. However, the development of the UX domain towards design-based UX research, which steers away from quantitatively measurable qualities and towards a qualitative research approach (one that does not generate measurable numbers), has marginalized this measurement- or model-based UX research camp in recent UX developments, as noted by [Law 2014].

While existing work in QoE mainly focuses on hedonic aspects (and in UX, also on pragmatic ones), eudaimonic aspects such as the development of one’s own strengths have not been considered extensively in either research area so far. Especially in the usage context of professional applications, the meaningfulness of system usage (which is strongly related to eudaimonic aspects) and the growth of the user’s capabilities will certainly influence the resulting experiential quality(ies). In particular, professional applications must be designed such that the user continues to use the system in the long run without frustration, i.e., they must support long-term acceptance of applications that the employer requires the user to use. In order to consider these aspects, the so-called “HEP cube” was introduced in [Hammer et al. 2018]. It opens a 3-dimensional space of hedonic (H), eudaimonic (E) and pragmatic (P) aspects of QoE and UX, which are integrated towards a Quality of User Experience (QUX) concept.

Whereas a simple definition of QUX has not yet been established in this context, a number of QUX-related aspects, e.g., utility (P), joy-of-use (H) and meaningfulness (E), have been integrated into a multidimensional HEP construct, displayed in Figure 1. In addition to the well-known hedonic and pragmatic aspects of UX, it incorporates the eudaimonic dimension. It thereby shows the assumed relationships between the aforementioned aspects of User Experience and QoE, along with usefulness and motivation (which is strongly related to the eudaimonic dimension). These aspects are triggered by user needs (first layer) and moderated by the respective dimension aspects: joy-of-use (hedonic), ease-of-use (pragmatic) and purpose-of-use (eudaimonic). The authors expect that considering these additional needs and QUX aspects, and incorporating them into application design, will lead not only to higher acceptance rates but also to deeply grounded well-being of users. Furthermore, incorporating these aspects into QoE and/or QUX modelling will improve the respective prediction performance and ecological validity.


Figure 1: QUX as a multidimensional construct involving HEP attributes, existing QoE/UX, need fulfillment and motivation. Picture taken from Hammer, F., Egger-Lampl, S., Möller, S.: Quality-of-User-Experience: A Position Paper, Quality and User Experience, Springer (2018).

References

  • [White Paper 2012] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.
  • [Kahneman 1999] Kahneman, D.: Well-being: Foundations of Hedonic Psychology, chap. Objective Happiness, pp. 3–25. Russell Sage Foundation Press, New York (1999).
  • [Diefenbach 2014] Diefenbach, S., Kolb, N., Hassenzahl, M.: The ‘hedonic’ in human-computer interaction: History, contributions, and future research directions. In: Proceedings of the 2014 Conference on Designing Interactive Systems, pp. 305–314. ACM (2014).
  • [Mekler 2016] Mekler, E.D., Hornbæk, K.: Momentary pleasure or lasting meaning?: Distinguishing eudaimonic and hedonic user experiences. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 4509–4520. ACM (2016).
  • [Bargas-Avila 2011] Bargas-Avila, J.A., Hornbæk, K.: Old wine in new bottles or novel challenges: A critical analysis of empirical studies of user experience. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2689–2698. ACM (2011).
  • [Law 2014] Law, E.L.C., van Schaik, P., Roto, V.: Attitudes towards user experience (UX) measurement. International Journal of Human-Computer Studies 72(6), 526–541 (2014).
  • [Hammer et al. 2018] Hammer, F., Egger-Lampl, S., Möller, S.: Quality-of-User-Experience: A Position Paper. Quality and User Experience, Springer (2018).

Note from the editors:

More details on the integrated view of QoE and UX can be found in Hammer, F., Egger-Lampl, S. & Möller, “Quality-of-user-experience: a position paper”. Springer Quality and User Experience (2018) 3: 9. https://doi.org/10.1007/s41233-018-0022-0

The Dagstuhl seminars mentioned by the authors started a scientific discussion about the definition of the term QoE in 2009. Three Dagstuhl Seminars were related to QoE: 09192 “From Quality of Service to Quality of Experience” (2009), 12181 “Quality of Experience: From User Perception to Instrumental Metrics” (2012), and 15022 “Quality of Experience: From Assessment to Application” (2015). A Dagstuhl Perspectives Workshop, 16472 “QoE Vadis?”, followed in 2016 and set out to jointly and critically reflect on future perspectives and directions of QoE research. During the Dagstuhl Perspectives Workshop, the QoE-UX wedding proposal came up to marry the areas of QoE and UX. The reports from the Dagstuhl seminars as well as the Manifesto from the Perspectives Workshop are available online.

One step towards an integrated view of QoE and UX is reflected by QoMEX 2019. The 11th International Conference on Quality of Multimedia Experience will be held from June 5th to 7th, 2019 in Berlin, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, quality of experience (QoE) and user experience (UX). This way, it will contribute towards an integrated view on QoE and UX and foster the exchange between the so-far distinct communities. More details: https://www.qomex2019.de/


SIGMM Records: News, Statistics, and Call for Contributions & Suggestions


A new editorial team has been leading the ACM SIGMM Records since the January 2017 issue. The goal is to consolidate the Records as a primary source of information and a communication vehicle for the multimedia community. With these objectives in mind, the Records were re-organized around three main categories (Open Science, Information, and Opinion), for which specific sections and columns were created (more details in http://sigmm.hosting.acm.org/2017/05/08/sigmm-records-serving-the-community/).


Since then, all sections and columns have provided relevant and high-quality contributions, with a higher impact than anticipated. In this new epoch of the Records, apart from the new columns, two additional initiatives have been incorporated:

  • Best social media reporter: It was decided to reward the SIGMM members with the most active and valuable social media posts during the SIGMM conferences. The selected Best Social Media Reporters are asked to provide a post-conference report to be published in the Records and receive a free registration to one of the upcoming SIGMM conferences. Up to now, the awardees have been: Miriam Redi (ICMR 2017), Christian Timmerer (MMSYS 2017), Benoit Huet and Conor Keighrey (MM 2017), Cathal Gurrin (ICMR 2018) and Gwendal Simon (MMSYS 2018). The criteria for the awards are specified here: http://sigmm.hosting.acm.org/2017/05/20/awarding-the-best-social-media-reporters/
  • Section on QoE: Starting in the third issue of 2018 (September 2018), the Records include a new section on QoE, edited by Tobias Hoßfeld and Christian Timmerer. You can find here the introduction column: http://sigmm.hosting.acm.org/2018/09/08/quality-of-experience-column-an-introduction/ 

Apart from the recurrent sections, the community has as well contributed with relevant feature articles. Some examples include the article about the flow of ideas around SIGMM conferences by Lexing Xie, the article about ACM Fellows in SIGMM by Alan Smeaton, the SIGMM Annual Report (2018) by the Chairs, and an article about data driven statistics and trends in SIGMM conferences by David Ayman Shamma.

Finally, the editorial team is also working on infrastructural aspects together with ACM. First, an effective communication protocol with the ACM Digital Library has been established, enabling the publication of the issues and individual contributions in HTML format. SIGMM has indeed been pioneering in adopting the HTML format in the publication of articles. Second, the process for migrating the Records website to an ACM server and domain has started, and should be completed before the end of the year.

Pablo Cesar, the editor-in-chief, presented the new team, structure and impact at ACM MM 2017 and will update the community during ACM MM 2018.



Reach of the SIGMM Records

Since August 2017, we have been collecting statistics about visitors and visits to the Records website, and making use of social media to disseminate contributions and news. In these 13 months, the daily number of visitors has ranged approximately between 100 and 400, with the variation strongly influenced by the publication of social media posts promoting published content. Over the same period, more than 80,000 visitors and nearly 500,000 visits (i.e., clicks) have been registered.

The top three countries by number of visitors are the US (>19,000), China (>10,000) and Germany (nearly 7,000), and the top 10 all surpass 2,000 visitors. Likewise, the top three posts with the highest impact, in terms of number of visits, are listed in Table 1.

Table 1. Top 3 posts on the Records website with highest impact

Post | Publication Date | Number of Visits
Impact of the New @sigmm Records | September 2017 | 3,051
Standards Column: JPEG and MPEG | May 2017 | 1,374
Practical Guide to Using the YFCC100M and MMCOMMONS on a Budget | October 2017 | 786

Finally, the top three referring sites (i.e., external websites from which visitors have clicked a URL to access the Records website) are Facebook (around 2,500 references), Google (around 2,500 references) and Twitter (>700 references). According to this, it seems clear that the social media strategy implemented by the editorial team is positively impacting the Records.

Regarding social media, two @sigmm channels are being used: a Facebook page and a Twitter account (@sigmm). The number of followers is still modest on Facebook (47) but has significantly increased on Twitter (247) compared to the previous report. However, the impact of the posts on these platforms, in terms of reach, likes and shares, is noteworthy. On Facebook, some posts have reached more than 1,000 users, and on Twitter many tweets have received tens of re-tweets and likes.

Contribute!

Our mission is to keep improving and consolidate the Records, and we are very open to getting extra help and feedback. So, if you would like to become member of our team, or simply have suggestions or ideas, please drop us a line!

We hope you are enjoying every new edition of the Records.

The Editorial team

JPEG Column: 80th JPEG Meeting in Berlin, Germany

The 80th JPEG meeting was held in Berlin, Germany, from 7 to 13 July 2018. During this meeting, JPEG issued a record number of ballots and output documents, spread across the multiple activities taking place. These record numbers reveal the level of commitment of the JPEG standardisation committee. A strong effort is being made on the standardisation of new solutions for emerging image technologies, enabling the interoperability of different systems in the growing multimedia market. Moreover, it is intended that these new initiatives provide royalty-free patent licensing solutions in at least one of the available profiles, which should promote wider adoption of these future JPEG standards by the consumer market as well as by application and system developers.

Significant progress in the low-latency and high-throughput standardisation initiatives took place at the Berlin meeting. The new Part 15 of JPEG 2000, known as High Throughput JPEG 2000 (HTJ2K), is finally ready and has reached Committee Draft status. Furthermore, the JPEG XS profiles and levels were released for their second and final ballot. These new low-complexity standards are thus expected to be finalised shortly, providing new solutions for developers and consumers in applications where mobility is important and large bandwidth is available. Virtual and augmented reality, as well as 360° images and video, are among the applications that might benefit from these new standards.


JPEG meeting plenary in Berlin.

The 80th JPEG meeting had the following highlights:

  • HTJ2K reaches Committee Draft status;
  • JPEG XS profiles and levels are under ballot;
  • JPEG XL publishes additional information to the CfP;
  • JPEG Systems – JUMBF & JPEG 360;
  • JPEG-in-HEIF;
  • JPEG Blockchain white paper;
  • JPEG Pleno Light Field verification model.

The following summarizes the various highlights during JPEG’s Berlin meeting.

HTJ2K

The JPEG committee is pleased to announce a significant milestone, with ISO/IEC 15444-15 High-Throughput JPEG 2000 (HTJ2K) reaching Committee Draft status.

HTJ2K introduces a new FAST block coder to the JPEG 2000 family. The FAST block coder can be used in place of the JPEG 2000 Part 1 arithmetic block coder and, as illustrated in Table 1, offers on average an order-of-magnitude increase in decoding and encoding throughput, at the expense of slightly reduced coding efficiency and the elimination of quality scalability.

Table 1. Comparison between the FAST block coder and the JPEG 2000 Part 1 arithmetic block coder. Results were generated by optimized implementations evaluated as part of the HTJ2K activity, using professional video test images in the transcoding context specified in the Call for Proposals available at https://jpeg.org. Figures are relative to the JPEG 2000 Part 1 arithmetic block coder (bpp: bits per pixel).

JPEG 2000 Part 1 Block Coder Bitrate      | 0.5 bpp | 1 bpp | 2 bpp | 4 bpp | 6 bpp | lossless
Average FAST Block Coder Speedup Factor   | 17.5x   | 19.5x | 21.1x | 25.5x | 27.4x | 43.7x
Average FAST Block Decoder Speedup Factor | 10.2x   | 11.4x | 11.9x | 14.1x | 15.1x | 24.0x
Average Increase in Codestream Size       | 8.4%    | 7.3%  | 7.1%  | 6.6%  | 6.5%  | 6.6%

Apart from the block coding algorithm itself, the FAST block coder does not modify the JPEG 2000 codestream and allows mathematically lossless transcoding to and from JPEG 2000 codestreams. As a result, the FAST block coding algorithm can be readily integrated into existing JPEG 2000 applications, where it can bring significant increases in processing efficiency.
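To make the trade-off concrete, here is a small back-of-the-envelope sketch (our illustration, not JPEG reference software; the Part 1 baseline throughput is a made-up placeholder) that combines the averaged 1 bpp figures from Table 1:

```python
# Back-of-the-envelope sketch using the averaged 1 bpp figures from Table 1.
# The Part 1 baseline throughput below is a hypothetical placeholder value.

BASELINE_ENCODE_MPIX_S = 50.0   # hypothetical Part 1 encoder throughput (Mpixels/s)
ENCODE_SPEEDUP = 19.5           # Table 1: FAST block coder speedup at 1 bpp
SIZE_INCREASE = 0.073           # Table 1: codestream size increase at 1 bpp (+7.3%)

fast_throughput = BASELINE_ENCODE_MPIX_S * ENCODE_SPEEDUP
relative_size = 1.0 + SIZE_INCREASE

print(f"FAST encoder: {fast_throughput:.0f} Mpixels/s "
      f"(vs {BASELINE_ENCODE_MPIX_S:.0f} for Part 1)")
print(f"Codestream size: {relative_size:.3f}x the Part 1 size")
```

In other words, at 1 bpp an application trades roughly 7% larger codestreams for close to twenty times the encoding throughput.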


JPEG XS

This project aims at the standardisation of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry, Pro-AV and other markets. Targeted use cases are video transport over professional video links (SDI, IP, Ethernet), real-time video storage, memory buffers, omnidirectional video capture and rendering, and sensor compression (in particular in the automotive industry). The Core Coding System, expected to be published in Q4 2018, allows for visually lossless quality at a 6:1 compression ratio for most content, 32 lines of end-to-end latency, and ultra-low-complexity implementations in ASIC, FPGA, CPU and GPU. Following the 80th JPEG meeting in Berlin, profiles and levels (addressing specific application fields and use cases) are now under final ballot (expected publication in Q1 2019). Different means to store and transport JPEG XS codestreams in files, over IP networks or over SDI infrastructures are also defined and are going to a first ballot.
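To put the 6:1 ratio in perspective, the following arithmetic sketch (ours; the video format is an assumed example, not one mandated by the standard) estimates the mezzanine bitrate for UHD production video:

```python
# Illustrative arithmetic only: estimate the JPEG XS mezzanine bitrate for
# an assumed UHD format (3840x2160, 60 fps, 4:2:2 10-bit).

width, height, fps = 3840, 2160, 60
bits_per_pixel = 10 + 10        # 4:2:2 10-bit: 10 bits luma + 10 bits chroma per pixel
compression_ratio = 6           # visually lossless target for most content

uncompressed_bps = width * height * fps * bits_per_pixel
mezzanine_bps = uncompressed_bps / compression_ratio

print(f"Uncompressed: {uncompressed_bps / 1e9:.2f} Gbit/s")   # ~9.95 Gbit/s
print(f"JPEG XS (~6:1): {mezzanine_bps / 1e9:.2f} Gbit/s")    # ~1.66 Gbit/s
```

At roughly 1.66 Gbit/s, such a stream fits comfortably on a single 3G-SDI link or a 10 GbE network, which is exactly the mezzanine scenario the standard targets.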


JPEG XL

The JPEG Committee issued a Call for Proposals (CfP) following its 79th meeting (April 2018), with the objective of seeking technologies that fulfil the objectives and scope of the Next-Generation Image Coding activity. The CfP, with all related information, can be found at https://jpeg.org/downloads/jpegxl/jpegxl-cfp.pdf. The deadline for expressions of interest and registration was August 15, 2018, and submissions to the CfP were due on September 1, 2018.

As an outcome of the 80th JPEG meeting in Berlin, a document was produced containing additional information related to the objective and subjective quality assessment methodologies that will be used to evaluate the anchors and the proposals to the CfP; it is available at https://jpeg.org/downloads/jpegxl/wg1n80024-additional-information-cfp.pdf. Moreover, a detailed workflow is described, together with the software and command lines used to generate the anchors and to compute the objective quality metrics.

To stay posted on the action plan of JPEG XL, please regularly consult our website at jpeg.org and/or subscribe to its e-mail reflector.


JPEG Systems – JUMBF & JPEG 360

The JPEG Committee progressed towards a common framework and definition for metadata that will improve the ability to share 360° images. At the 80th meeting, the Committee Draft ballot was completed and the comments were reviewed; the committee is now progressing towards DIS text for upcoming ballots on the “JPEG Universal Metadata Box Format (JUMBF)” as ISO/IEC 19566-5 and “JPEG 360” as ISO/IEC 19566-6. Investigations have started into applying the framework to the structure of JPEG Pleno files.


JPEG-in-HEIF

The JPEG Committee made significant progress towards standardising how JPEG XR, JPEG 2000 and the upcoming JPEG XS will be carried in the ISO/IEC 23008-12 image file container.


JPEG Blockchain

Fake news, copyright violations, media forensics, privacy and security are emerging challenges for digital media. JPEG has determined that blockchain technology has great potential as a component for addressing these challenges in transparent and trustable media transactions. However, blockchain needs to be integrated closely with a widely adopted standard to ensure broad interoperability of protected images. JPEG calls for industry participation to help define the use cases and requirements that will drive the standardisation process. To reach this objective, JPEG issued a white paper entitled “Towards a Standardized Framework for Media Blockchain” that elaborates on the initiative, exploring relevant standardisation activities, industrial needs and use cases. In addition, JPEG plans to organise a workshop during its 81st meeting in Vancouver on Tuesday, 16 October 2018. More information about the workshop is available at https://www.jpeg.org. To keep informed and get involved, interested parties are invited to register on the ad hoc group’s mailing list at http://jpeg-blockchain-list.jpeg.org.


JPEG Pleno

The JPEG Committee is currently pursuing three activities in the framework of the JPEG Pleno Standardization: Light Field, Point Cloud and Holographic content coding.

At its Berlin meeting, the committee produced a first version of the verification model software for light field coding. This software supports the core functionality intended for the light field coding standard and serves for intensive testing of the standard. JPEG Pleno Light Field Coding supports various sensors, ranging from lenslet cameras to high-density camera arrays, light-field-related content production chains, and light field displays.

For the coding of point clouds and holographic data, activities are still in an exploratory phase, addressing the elaboration of use cases and the refinement of requirements for coding such modalities. In addition, experimental procedures are being designed to facilitate the quality evaluation and testing of technologies that will be submitted in later calls for coding technologies. Interested parties active in point-cloud- and holography-related markets and applications, from both industry and academia, are welcome to participate in this standardisation activity.


Final Quote 

“After a record number of ballots and output documents generated during its 80th meeting, the JPEG Committee pursues its activity on the specification of effective and reliable solutions for image coding offering needed features in emerging multimedia applications. The new JPEG XS and JPEG 2000 part 15 provide low complexity compression solutions that will benefit many growing markets such as content production, virtual and augmented reality as well as autonomous cars and drones.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.


About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG Committee nominally meets four times a year, in different world locations. The 80th JPEG Meeting was held on 7-13 July 2018, in Berlin, Germany. The next 81st JPEG Meeting will be held on 13-19 October 2018, in Vancouver, Canada.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans (pr@jpeg.org) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on http://jpeg-news-list.jpeg.org.  


Future JPEG meetings are planned as follows:

  • No 81, Vancouver, Canada, October 13 to 19, 2018
  • No 82, Lisbon, Portugal, January 19 to 25, 2019
  • No 83, Geneva, Switzerland, March 16 to 22, 2019

MPEG Column: 123rd MPEG Meeting in Ljubljana, Slovenia

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

  • MPEG issues Call for Evidence on Compressed Representation of Neural Networks
  • Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
  • MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
  • MPEG releases software for MPEG-I visual activities
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and on the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in quite large sizes (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a large number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.
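As a toy illustration of why such compression is plausible (our sketch, not a technology under consideration by MPEG), uniform 8-bit quantization of 32-bit float weights alone shrinks a weight tensor by roughly 4x, before any entropy coding:

```python
import numpy as np

# Toy illustration (not an MPEG technology): uniform 8-bit quantization of
# 32-bit float weights gives ~4x size reduction before any entropy coding.

def quantize_uniform(weights: np.ndarray, bits: int = 8):
    """Map float weights to integer levels plus a (scale, offset) pair."""
    lo, hi = float(weights.min()), float(weights.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

w = np.random.randn(1_000_000).astype(np.float32)   # stand-in for one weight tensor
q, scale, lo = quantize_uniform(w)
w_hat = dequantize(q, scale, lo)

print(f"size: {w.nbytes / 1e6:.1f} MB -> {q.nbytes / 1e6:.1f} MB")
print(f"max abs error: {np.abs(w - w_hat).max():.4f}")
```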

At its 123rd meeting, MPEG issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, memory consumption, and impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Visual Analysis), and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).

Research aspects: As this is about the “compression” of structured data, research aspects will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects, such as the transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms, may become relevant in the (near) future. Furthermore, additional use cases should be communicated to MPEG before the next meeting.

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, and viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by entities in the network. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and make these services accessible through Application Programming Interfaces (APIs).

NBMP will support existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. The CfP had sought technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.

After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.
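To make the workflow idea more tangible, here is a purely hypothetical sketch of how an NBMP-style pipeline might be described; the schema and all field and function names are our invention for illustration and do not reflect the specification under development:

```python
# Purely hypothetical sketch of a network-based media processing workflow.
# Every name below is invented for illustration; the actual NBMP workflow
# and metadata formats were still being defined at the time of writing.

workflow = {
    "name": "360-video-viewport-service",
    "inputs": [{"id": "cam-feed", "type": "video/raw"}],
    "functions": [
        {"id": "stitch",    "task": "vr-stitching",        "after": []},
        {"id": "transcode", "task": "edge-transcoding",    "after": ["stitch"]},
        {"id": "viewport",  "task": "viewport-extraction", "after": ["transcode"]},
    ],
    "outputs": [{"id": "viewport-stream", "from": "viewport"}],
}

def execution_order(functions):
    """Resolve a simple execution order from the 'after' dependencies."""
    done, order = set(), []
    while len(order) < len(functions):
        for f in functions:
            if f["id"] not in done and all(d in done for d in f["after"]):
                done.add(f["id"])
                order.append(f["id"])
    return order

print(execution_order(workflow["functions"]))  # ['stitch', 'transcode', 'viewport']
```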

Research aspects: I reported about NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123rd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and unlimited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along the X, Y and Z axes), and 6DoF (3DoF with full translational movements along the X, Y and Z axes) experiences, but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.
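To make the three levels concrete, the following minimal sketch (our notation, not MPEG-I syntax) models a 3DoF pose as rotation only, with 6DoF adding unrestricted translation; 3DoF+ would constrain that translation to a small volume around the default viewing position:

```python
from dataclasses import dataclass

# Minimal illustration of the DoF levels discussed in the TR (our sketch).

@dataclass
class Pose3DoF:
    # rotations around the X, Y and Z axes: pitch, yaw, roll (degrees)
    pitch: float = 0.0
    yaw: float = 0.0
    roll: float = 0.0

@dataclass
class Pose6DoF(Pose3DoF):
    # 6DoF adds full translation along X, Y and Z (metres);
    # 3DoF+ would limit these to a small range (e.g., head movements).
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0

seated_viewer = Pose3DoF(pitch=-10.0, yaw=45.0)     # looking around from a fixed point
walking_viewer = Pose6DoF(yaw=90.0, x=1.5, z=0.8)   # free movement within the scene
```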

Research aspects: ISO/IEC technical reports are typically publicly available and provide informative descriptions of what a standard is about. For MPEG-I, this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on 3DoF and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest to the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six-degrees-of-freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+, with slight body and head movements in a sitting position, to 6DoF, allowing some walking steps from a central position. At the 123rd MPEG meeting, important progress was achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, which allows synthesizing virtual viewpoints from an unlimited number of input views. RVS integrates code bases from Universite Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to the 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full-reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+, expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).
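For readers unfamiliar with WS-PSNR, the core idea for equirectangular content is to weight each pixel's squared error by the area it covers on the sphere, which reduces to a cosine weight over latitude. The implementation below is a minimal sketch of our own, not the official utility:

```python
import numpy as np

# Minimal WS-PSNR for equirectangular projection (our sketch, not the official
# utility): squared errors are weighted by cos(latitude), compensating for the
# oversampling of equirectangular maps near the poles.

def ws_psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    h, w = ref.shape
    rows = np.arange(h)
    weights = np.cos((rows + 0.5 - h / 2) * np.pi / h)   # one weight per image row
    weights = np.repeat(weights[:, None], w, axis=1)
    err = (ref.astype(np.float64) - dist.astype(np.float64)) ** 2
    ws_mse = np.sum(weights * err) / np.sum(weights)
    return 10 * np.log10(max_val ** 2 / ws_mse)

ref = np.random.randint(0, 256, (960, 1920))             # stand-in reference frame
dist = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255)
print(f"WS-PSNR: {ws_psnr(ref, dist):.2f} dB")
```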

Research aspects: MPEG produces not only text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. It is therefore very much appreciated to have the new Reference View Synthesizer (RVS 2.0) and the Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available, as they enable interoperability and reproducibility of R&D efforts and results in this area.

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF reached their first milestone. Amendment 2 to the 6th edition of ISO/IEC 14496-12 will add the option of relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files, and will allow the creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment that reached its first milestone is the first amendment to the 3rd edition of ISO/IEC 23001-7. It will allow the use of multiple keys for a single sample and the scrambling of some parts of AVC or HEVC video bitstreams without breaking conformance to existing decoders; that is, the bitstream will be decodable by existing decoders, but some parts of the video will be scrambled. These amendments are expected to reach the final milestone in Q3 2019.
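For context, an ISOBMFF file is a sequence of length-prefixed boxes: a 32-bit big-endian size followed by a 4-character type code, with further boxes nested inside container boxes. The minimal walker below (our sketch, for context only) lists the top-level boxes of a file:

```python
import struct

# Minimal top-level ISOBMFF box walker (our sketch, for context only).

def list_top_level_boxes(path: str):
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            name = box_type.decode("ascii", errors="replace")
            if size == 1:                    # 64-bit largesize follows the type
                size = struct.unpack(">Q", f.read(8))[0]
                payload = size - 16
            elif size == 0:                  # box extends to the end of the file
                boxes.append((name, "to-eof"))
                break
            else:
                payload = size - 8
            boxes.append((name, size))
            f.seek(payload, 1)               # skip the payload to the next box
    return boxes

# e.g. [('ftyp', 24), ('moov', 1203), ('mdat', 918442)] for a typical MP4 file
```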

Research aspects: The ISOBMFF reference software is now available on GitHub, which is a valuable service to the community and allows for active participation in the standard’s development even from outside MPEG. Interested parties are encouraged to have a look at it and to consider contributing to this project.


What else happened at #MPEG123?

  • The MPEG-DASH 3rd edition is finally available as an output document (N17813; only available to MPEG members), combining the 2nd edition, four amendments, and two corrigenda. We expect final publication later this year or early next year.
  • There are a new DASH amendment and corrigenda items in the pipeline, which should also progress to final stages some time next year. The status of MPEG-DASH (July 2018) can be seen in the figure below.

Figure: Status of MPEG-DASH (July 2018).

  • MPEG received a rather interesting input document related to “streaming first”, which resulted in a publicly available output document entitled “Thoughts on adaptive delivery and access to immersive media”. The key idea is to focus on streaming (first) rather than on the file/encapsulation formats typically used for storage (with streaming second). This document should become available here.
  • For a couple of meetings now, MPEG has maintained a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It is definitely worth keeping this in mind when defining/updating your own roadmap.
  • JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG-internal, but JVET documents are publicly accessible here: http://phenix.it-sudparis.eu/jvet/. An interesting aspect is that VTM2/WD2 should offer a >20% rate reduction compared to HEVC, all at reasonable complexity, and the next benchmark set (BMS) should offer close to a 30% rate reduction vs. HEVC. Further improvements are expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches to loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion-based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
  • In addition to VVC (a joint activity with VCEG), MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low-complexity enhancement video codec. Both topics will be further discussed within the respective Ad-hoc Groups (AhGs), and further details are available here.
  • Finally, MPEG established an Ad-hoc Group (AhG) dedicated to long-term planning, which is also looking into application areas/domains other than media coding/representation.

In this context it is probably worth mentioning the following DASH awards at recent conferences

Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting

Quality of Experience Column: An Introduction

“Quality of Experience (QoE) is the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user’s personality and current state.” (Definition from the Qualinet Whitepaper 2013).

Research on Quality of Experience (QoE) has advanced significantly in recent years and attracts attention from various stakeholders. Different facets have been addressed by the research community: subjective user studies to identify QoE influence factors for particular applications such as video streaming, QoE models to capture the effects of those influence factors in concrete applications, and QoE monitoring approaches at the end-user site as well as within the network, to assess QoE during service consumption and to provide means for QoE management and improved QoE. However, in order to progress in the area of QoE, new research directions have to be taken. The application of QoE in practice needs to consider the entire QoE ecosystem and the stakeholders along the service delivery chain to the end user.

The term Quality of Experience dates back to a presentation in 2001 (interestingly, at a Quality of Service workshop), and Figure 1 depicts an overview of QoE showing some of the influence factors.


Figure 1. Quality of Experience (from Ebrahimi’09)

Different communities have been very active in the context of QoE. A long-established community is Qualinet, which started in 2010. The Qualinet community (www.qualinet.eu) provided a definition of QoE in its [Qualinet Whitepaper], a contribution of the European Network on Quality of Experience in Multimedia Systems and Services, Qualinet (COST Action IC 1003), to the scientific discussion about the term QoE and its underlying concepts. The concepts and ideas cited in that paper mainly refer to the QoE of multimedia communication systems, but may also be helpful for other areas where QoE is an issue. Qualinet is organized in different task forces that address various research topics: Managing Web and Cloud QoE; Gaming; QoE in Medical Imaging and Healthcare; Crowdsourcing; and Immersive Media Experiences (IMEx). There is also a liaison with VQEG and a task force on Qualinet Databases, which provides a platform with QoE-related datasets. The Qualinet database (http://dbq.multimediatech.cz/) is seen as key for current and future developments in Quality of Experience, as it constitutes a rich and internationally recognized database of content of different sorts, shared with the scientific community at large.

Another example of the Qualinet activities is the Crowdsourcing task force. Its goals include identifying the scientific challenges and problems of QoE assessment via crowdsourcing, as well as its strengths and benefits, and deriving a methodology and setup for crowdsourcing in QoE assessment, including statistical approaches for proper analysis. Crowdsourcing is a popular approach that outsources tasks via the Internet to a large number of users. Commercial crowdsourcing platforms provide a global pool of users employed to perform short and simple online tasks. For the quality assessment of multimedia services and applications, crowdsourcing enables new possibilities by moving the subjective test into the crowd, resulting in a larger diversity of test subjects, faster turnover of test campaigns, and reduced costs due to the low reimbursement of participants. Furthermore, crowdsourcing easily allows addressing additional features such as real-life environments. Crowdsourced quality assessment, however, is not a straightforward implementation of existing subjective testing methodologies in an Internet-based environment; additional challenges and differences from lab studies arise in conceptual, technical, and motivational areas. The white paper [Crowdsourcing Best Practices] summarizes the recommendations and best practices for crowdsourced quality assessment of multimedia applications from the Qualinet Task Force on “Crowdsourcing” and is also discussed within the ITU-T standardization activity P.CROWD.
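As a small illustration of the kind of statistical screening such best practices cover (a generic sketch of ours, not the procedure prescribed by the white paper), a common step is to compute mean opinion scores (MOS) and discard workers whose ratings correlate poorly with the panel average:

```python
import numpy as np

# Generic sketch of crowdsourced rating analysis (not the white paper's exact
# method): compute per-stimulus MOS, then screen out workers whose ratings
# correlate poorly with the panel mean. The threshold is an arbitrary example.

ratings = np.array([   # rows: workers, columns: stimuli, 1-5 ACR scale
    [5, 4, 2, 1, 3],
    [4, 4, 1, 2, 3],
    [5, 3, 2, 1, 4],
    [1, 5, 4, 5, 1],   # inconsistent worker, e.g. clicking randomly
])

mos = ratings.mean(axis=0)
corr = np.array([np.corrcoef(worker, mos)[0, 1] for worker in ratings])
reliable = corr >= 0.5                      # screening threshold (assumption)

print("per-worker correlation:", np.round(corr, 2))
print("MOS after screening:", ratings[reliable].mean(axis=0))
```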

A selection of QoE related communities is provided in the following to give an overview on the pervasion of QoE in research.

  • Qualinet (http://www.qualinet.eu): European Network on Quality of Experience in Multimedia Systems and Services, as outlined above. Qualinet is also a technical sponsor of QoMEX.
  • QoMEX (http://qomex.org/): The International Conference on Quality of Multimedia Experience (QoMEX) is a top-ranked international conference and among the twenty best conferences in Google Scholar for the subcategory Multimedia. In 2019, the 11th International Conference on Quality of Multimedia Experience will be held from June 5th to 7th, 2019, in Berlin, Germany. It will bring together leading experts from academia and industry to present and discuss current and future research on multimedia quality, quality of experience (QoE) and user experience (UX). In this way, it will contribute towards an integrated view of QoE and UX, and foster the exchange between the so-far distinct communities.
  • ACM SIGMM (http://www.sigmm.org/): Within the ACM community, QoE also plays a significant role in major events like ACM Multimedia (ACM MM), where “Experience” is one of the four major themes. ACM Multimedia Systems (MMSys) regularly publishes work on QoE and has included special sessions on these topics in recent years. ACM MMSys 2019 will be held from June 18-21, 2019, in Amherst, Massachusetts, USA.
  • ICME: The IEEE International Conference on Multimedia and Expo (IEEE ICME 2019) will be held from July 8-12, 2019, in Shanghai, China. Its call for papers includes topics such as multimedia quality assessment and metrics, and multi-modal media computing and human-machine interaction.
  • ACM SIGCOMM (http://www.sigcomm.com): Within ACM SIGCOMM, Internet-QoE workshops were organized in 2016 and 2017. The focus of the last edition was on QoE measurements, QoE-based traffic monitoring and analysis, and QoE-based network management.
  • Tracking QoE in the Internet Workshop: A summary and the outcomes of the “Workshop on Tracking Quality of Experience in the Internet” at Princeton give a very good impression of the QoE activities in the US, with a recent focus on QoE monitoring and measurable QoE parameters in the presence of constraints like encryption.
  • SPEC RG QoE (https://research.spec.org): The mission of SPEC’s Research Group (RG) is to promote innovative research in the area of quantitative system evaluation and analysis by serving as a platform for collaborative research efforts fostering the interaction between industry and academia in the field. The SPEC research group on QoE is the starting point for the release of QoE ideas, QoE approaches, QoE measurement tools, and QoE assessment paradigms.
  • QoENet (http://www.qoenet-itn.eu) is a Marie Curie project whose focus is the analysis, design, optimization and management of QoE in advanced multimedia services. It creates a fully integrated and multi-disciplinary network of 12 Early Stage Researchers working in, and seconded by, 7 academic institutions, 3 private companies and 1 standardization institute distributed across 6 European countries and South Korea. The project thereby fulfils its major objective of training young fellows through research, broadening the knowledge of the new generation of researchers. Significant research results have been achieved in the fields of QoE for online gaming, social TV and storytelling, and adaptive video streaming; QoE management in collaborative ISP/OTT scenarios; and models for HDR, VR/AR and 3D images and videos.
  • Many QoE-related activities are also happening at the national level. For example, a community of professors and researchers from Spain has organized a yearly workshop entitled “QoS and QoE in Multimedia Communications” since 2015 (URL of its latest edition: https://bit.ly/2LSlb2N). This community is targeted at establishing collaborations, sharing resources, and discussing the latest contributions and open issues. It is also pursuing the creation of a national network on QoE (like a Spanish Qualinet) and the involvement of international researchers in that network.
  • There are also several ongoing standardization-related activities, e.g., in the standardization groups ITU, JPEG, MPEG and VQEG. Their specific interest in QoE will be summarized in one of the upcoming QoE columns.

The first QoE column will discuss how to approach an integrated view of QoE and User Experience. While research on QoE has mostly been carried out in the area of multimedia communications, user experience (UX) research has addressed the hedonic and pragmatic usage aspects of interactive applications. In the case of QoE, the meaningfulness of the application to the user and the forces driving its use have been largely neglected, while in the UX field, respective research has been carried out but hardly incorporated into a model that combines the pragmatic and hedonic aspects. The first column will therefore be dedicated to recent ideas “Toward an integrated view of QoE and User Experience”. To give the readers an impression of the expected contents, we foresee the upcoming QoE columns discussing recent activities such as:

  • Point cloud subjective evaluation methodology
  • Complex, interactive narrative design for complexity
  • Large-Scale Visual Quality Assessment Databases
  • Status and upcoming QoE activities in standardization
  • Active Learning and Machine Learning for subjective testing and QoE modeling
  • QoE in 5G: QoE management in softwarized networks with big data analytics
  • Immersive Media Experiences e.g. for VR/AR/360° video applications

Our aim for the SIGMM Records is to share insights from the QoE community and to highlight recent developments and new research directions, but also lessons learned and best practices. If you are interested in writing for the QoE column, or would like to know more about a topic in this area, please do not hesitate to contact the editors. The SIGMM Records editors responsible for QoE are active in different communities and QoE research directions.

The QoE column is edited by Tobias Hoßfeld and Christian Timmerer.

[Qualinet Whitepaper] Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Patrick Le Callet, Sebastian Möller and Andrew Perkis, eds., Lausanne, Switzerland, Version 1.2, March 2013.

[Crowdsourcing Best Practices] Tobias Hoßfeld et al. “Best Practices and Recommendations for Crowdsourced QoE – Lessons Learned from the Qualinet Task Force ‘Crowdsourcing’” (2014).

Tobias Hoßfeld is a full professor at the University of Würzburg, where he holds the Chair of Communication Networks, and has been active in QoE research and teaching for more than 10 years. He finished his PhD in 2009 and his professorial thesis (habilitation), “Modeling and Analysis of Internet Applications and Services”, in 2013 at the University of Würzburg. From 2014 to 2018, he was head of the Chair “Modeling of Adaptive Systems” at the University of Duisburg-Essen, Germany. He has published more than 100 research papers in major conferences and journals and received the Fred W. Ellersick Prize 2013 (IEEE Communications Society) for one of his articles on QoE. Among others, he is a member of the advisory board of the ITC (International Teletraffic Congress) and of the editorial boards of IEEE Communications Surveys & Tutorials and Springer Quality and User Experience.
Christian Timmerer received his M.Sc. (Dipl.-Ing.) in January 2003 and his Ph.D. (Dr.techn.) in June 2006 (for research on the adaptation of scalable multimedia content in streaming and constrained environments), both from the Alpen-Adria-Universität (AAU) Klagenfurt. He joined the AAU in 1999 (as a system administrator) and is currently an Associate Professor at the Institute of Information Technology (ITEC) within the Multimedia Communication Group. His research interests include immersive multimedia communications, streaming, adaptation, Quality of Experience, and Sensory Experience. He was the general chair of WIAMIS 2008, QoMEX 2013, and MMSys 2016 and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, COST IC1003 QUALINET, and ICoSOLE. He also participated in ISO/MPEG work for several years, notably in the areas of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH, where he also served as standard editor. In 2012 he co-founded Bitmovin (http://www.bitmovin.com/) to provide professional services around MPEG-DASH, where he holds the position of Chief Innovation Officer (CIO).

Opinion Column: Review Process of ACM Multimedia


This quarter, our opinion column is dedicated to the review process of ACM Multimedia (MM). We report a summary of the discussions that arose at various points in time after the first round of reviews was returned to authors.

The core part of the discussion focused on how to improve review quality for ACM MM. Some participants pointed out that there have been complaints about the level and usefulness of some reviews in recent editions of ACM Multimedia. The members of our discussion forums (Facebook and LinkedIn) proposed some solutions.

A semi-automated paper assignment. Participants debated the best way of assigning papers to reviewers. Some suggested that automated assignment, i.e. using TPMS, helps reduce biases at scale: this year, MM followed the review model of CVPR, which handled 1,000+ submissions and peer reviews. Other participants observed that automated assignment systems often fail to match papers with the right reviewers. This is mainly due to the diversity of the Multimedia field: even within a single area, there is a lot of diversity in expertise and methodologies. Some participants advocated that the best solution is to have two steps: (1) a bidding period in which reviewers choose their favorite papers based on their areas of expertise, or, alternatively, an automated assignment step; (2) an “expert assignment” period in which, based on the previous choices, Area Chairs select the right people for a paper: a reviewer pool with relevant complementary expertise.

The authors’ advocate. Most participants agreed that the figure of the authors’ advocate is crucial for a fair reviewing process, especially for a community as diverse as the Multimedia community, and that the advocate should be provided in all tracks.

Non-anonymity among reviewers. It was observed that revealing the identity of reviewers to the other members of the program committee (e.g. Area Chairs and other reviewers) could encourage responsiveness and commitment during the review and discussion periods.

Quality over quantity. It was pointed out that increasing the number of reviews per paper is not always the right solution. This adds workload on the reviewers, thus potentially decreasing the quality of their reviews.

Less frequent changes to the review process. A few participants discussed the frequency of changes to the review process at ACM MM. In recent years, the conference organizers have tried different review formats, often inspired by other communities. It was observed that this lack of continuity in the review process might not allow the time needed to evaluate the success of a format, or to measure the quality of the conference overall. Moreover, changes should be communicated and announced to the authors and the reviewers well before they are implemented (and repeatedly, because people tend to overlook them).

This debate led to a higher-level discussion about the identity of the MM community. Some participants interpreted the frequent changes in the review process as a kind of identity crisis. It was proposed to use empirical evidence (e.g., a community survey) to analyse what the MM community actually is and how it should evaluate itself. The risk of becoming a second-tier conference to CVPR was also brought up: not only do authors resubmit to MM papers rejected from CVPR, but also, at times, reviewers assume that MM papers have to be reviewed as CVPR papers, potentially losing a lot of interesting papers for the conference.

We would like to thank all participants for their time and precious thoughts. As a next step for this column, we might consider running short surveys on specific topics, including the ones discussed in this issue of the SIGMM Records opinion column.

We hope this column will foster fruitful discussions during the conference, which will be held in Seoul, Korea, on 22-26 October 2018.

An interview with Assoc. Prof. Ragnhild Eg

Please describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

In high school, I really had no idea what I wanted to study in university. I liked writing, so I first tried out journalism. I soon discovered that I was too timid for this line of work, and the writing was less creative than I had imagined. So I returned to my favourite subject, psychology. I have always been fascinated by how the human mind works, how we can process all the information that surrounds us – and act on it. This fascination led me from a Bachelor in Australia back to Norway, where I started a Master in cognitive and biological psychology. One of my professors (whom I was lucky to have as a supervisor later) was working on a project on speech perception, and I still remember the first example she used to demonstrate how what we see can alter what we hear. I am delighted that I still encounter new examples of how multi-sensory processes can trick us. Most of all, I am intrigued by how these complex processes happen naturally, beyond our consciousness. And that is also what interests me in multimedia: how is it that we perceive information conveyed by digital systems in much the same way we perceive information from the physical world? And when we do not perceive it in the same way, what is causing the discrepancy?

My personal lessons are not to let a chosen path lead you in a direction you do not want to go. Moreover, not all of us are driven by a grand master plan. I am very much driven by impulses and curiosity, and this has led me to a line of work where curiosity is an asset.


Ragnhild Eg at the beginning of her research career in 2011.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

I currently work at a university college, where I have the opportunity to combine two passions: teaching and research. I wish to continue with both, so my vision relates to my research progression. My objective is pretty basic: I wish to broaden the scope of my research to include more perspectives on human perception. To do that, I want to start with new collaborations that can lead to long-term projects. As mentioned, I often let curiosity guide me, and I do not intend to stop doing just that.

Can you profile your current research, its challenges, opportunities, and implications?

In recent years, my research scope has extended from the perception of multimedia content to human-computer interactions, and further on to individual factors. Although we investigate perceptual processes in the context of computer systems’ limitations, our original approach was to generalise across a population. Yet the question of how universal perceptual processes can differ so much between individuals has become more and more intriguing.

How would you describe the role of women especially in the field of multimedia?

I have a love-hate relationship with stereotypes. Not only are they unavoidable, they are essential for us to process information. Moreover, it can be quite amusing to apply characteristics to stereotypes. On the other hand, stereotypes contribute to preserving, and even strengthening, certain conceptions about individuals. On the topic of women in multimedia, I find it important because we are a minority, and I believe any community benefits from diversity. However, I find it difficult to describe our role without falling back on stereotypical gender traits.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

The path that led me to multimedia research started with my studies in psychology, so I came into the field with a different outlook. I use my theoretical knowledge about human cognition and perception, and my experience with psychological research methods, to tackle multimedia challenges. For instance, designing behavioural studies with experimental controls and validity checks. Perhaps not innovative, my first approach to study the perception of multimedia quality was to avoid addressing quality, and rather control it as an experimental factor. Instead, I explored variations in perceptual integration, across different quality levels. Interestingly, I see more and more knowledge introduced from psychology and neuroscience to multimedia research. I regard these cross-overs as an indication that multimedia research has come to be an established field with versatile research methods, and I look forward to seeing what insights come out of it.

Over your distinguished career, what are your top lessons you want to share with the audience?

When I started my PhD, I came into a research environment dominated by computer science. The transition went far smoother than I had imagined, mostly due to open-minded and welcoming colleagues. Yet, working with inter-disciplinary research will lead to encounters where you do not understand the contributions of others, and they may not understand yours. Have respect for the knowledge and expertise others bring with them, and expect the same respect for your own strengths. This type of collaboration can be demanding, but can also bring about the most interesting questions and results.

Another lesson I want to share is perhaps one that can only come through personal experience. I enjoy collaborating on research projects, but being a researcher also requires a great deal of autonomy. Only at the end of the first year did I realise that no one could tell me what the focus of my PhD should be, even though I was expected to contribute to a larger project. Research is not constrained by clear boundaries, and I believe a researcher must be able to apply their own curiosity even when external forces seem to enforce limits.


Ragnhild Eg in 2018.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

I would ask what is the best joke you know! And my answer would undoubtedly be a knock-knock joke. 
Editor’s note: Officially added to the standard questionnaire!

What is the best joke you know? 🙂

Knock knock

– Who’s there?

A little old lady

– A little old lady who?

Wow, I had no idea you could yodel! 


Bios

Assoc. Prof. Ragnhild Eg: 

Ragnhild Eg is an associate professor at Kristiania University College, where she combines her background and interests in psychology with research and education. She teaches psychology and ethics, and pursues research interests spanning from perception and the effects of technological constraints to the consequences of online media consumption.

Michael Alexander Riegler: 

Michael is a scientific researcher at Simula Research Laboratory. His research interests are medical multimedia data analysis and understanding, image processing, image retrieval, parallel processing, crowdsourcing, social computing and user intent.