SIGMM Technical Achievement Award — Call for nominations

SIGMM Technical Achievement Award

for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications

AWARD DESCRIPTION

This award is presented every year to a researcher who has made significant and lasting contributions to multimedia computing, communications and applications. Outstanding technical contributions through research and practice are recognized. Towards this goal, contributions are considered from academia and industry that focus on major advances in multimedia including multimedia processing, multimedia content analysis, multimedia systems, multimedia network protocols and services, and multimedia applications and interfaces. The award recognizes members of the community for long-term technical accomplishments or those who have made a notable impact through a significant technical innovation. The selection committee focuses on candidates’ contributions as judged by innovative ideas, influence in the community, and/or the technical/social impact resulting from their work. The award includes a $2000 honorarium, an award certificate of recognition, and an invitation for the recipient to present a keynote talk at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). Travel expenses to the conference will be covered by SIGMM, and a public citation for the award will be placed on the SIGMM website.

FUNDING

The award honorarium, the award certificate of recognition and travel expenses to the ACM International Conference on Multimedia are fully sponsored by the SIGMM budget.

NOMINATION PROCESS

Nominations are solicited by May 31, 2016, with a decision made by July 30, 2016, in time to allow the above recognition and award presentation at ACM Multimedia 2016. Nominations for the award must include:

  • A statement summarizing the candidate’s accomplishments, description of the significance of the work and justification of the nomination (two pages maximum);
  • Curriculum Vitae of the nominee;
  • Three endorsement letters supporting the nomination including the significant contributions of the candidate. Each endorsement should be no longer than 500 words with clear specification of the nominee’s contributions and impact on the multimedia field;
  • A concise statement (one sentence) of the achievement(s) for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are:

  • The nominee can be any member of the scientific community.
  • The nominator must be a SIGMM member.
  • No self-nomination is allowed.
  • Nominations that do not result in an award will be valid for two further years. After three years a revised nomination can be resubmitted.
  • The SIGMM elected officers as well as members of the Awards Selection Committee are not eligible.

Please submit your nomination to the award committee by email.

PREVIOUS RECIPIENTS

  • 2015: Tat-Seng Chua (for pioneering contributions to multimedia, text and social media processing).
  • 2014: Klara Nahrstedt (for pioneering contributions in Quality of Service for MM systems and networking and for visionary leadership of the MM community).
  • 2013: Dick Bulterman (for outstanding technical contributions in multimedia authoring through research, standardization, and entrepreneurship).
  • 2012: Hong-Jiang Zhang (for pioneering contributions to and leadership in media computing including content-based media analysis and retrieval, and their applications).
  • 2011: Shih-Fu Chang (for pioneering research and inspiring contributions in multimedia analysis and retrieval).
  • 2010: Ramesh Jain (for pioneering research and inspiring leadership that transformed multimedia information processing to enhance the quality of life and visionary leadership of the multimedia community).
  • 2009: Lawrence A. Rowe (for pioneering research in continuous media software systems and visionary leadership of the multimedia research community).
  • 2008: Ralf Steinmetz (for pioneering work in multimedia communications and the fundamentals of multimedia synchronization).

SIGMM Rising Star Award — Call for nominations

SIGMM Rising Star Award

AWARD DESCRIPTION

Since 2014, ACM SIGMM has presented a “Rising Star” Award annually, recognizing a young researcher – an individual either no older than 35 or within 7 years of PhD – who has made outstanding research contributions to the field of multimedia computing, communication and applications during this early part of his or her career. Depth, impact, and novelty of the researcher’s contributions will be key criteria upon which the Rising Star award committee will evaluate the nominees. Also of particular interest are strong research contributions made independently from the nominee’s PhD advisor. The award includes a $1000 honorarium, an award certificate of recognition, and an invitation for the recipient to present a keynote talk at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). Travel expenses to the conference will be covered by SIGMM, and a public citation for the award will be placed on the SIGMM website.

FUNDING

The award honorarium, the award certificate of recognition and travel expenses to the ACM International Conference on Multimedia are fully sponsored by the SIGMM budget.

NOMINATION PROCESS

Nominations are solicited by June 15, 2016, with a decision made by July 30, 2016, in time to allow the above recognition and award presentation at ACM Multimedia 2016. The nomination rules are:

  • A nominee must be either 35 years of age or younger as of December 31 of the year in which the award would be made, or at most 7 years have passed since his/her PhD degree as of December 31 of the year in which the award would be made.
  • The nominee can be any member of the scientific community.
  • The nominator must be a SIGMM member.
  • No self-nomination is allowed.
  • Nominations that do not result in an award will remain in consideration for up to two years if the candidate still meets the criteria with regard to age or PhD award (i.e. no older than 35 or within 7 years of PhD). Afterwards, a new nomination must be submitted.
  • The SIGMM elected officers as well as members of the Awards Selection Committee are not eligible.

Material to be included in the nomination:

  1. Curriculum Vitae, including publications, of nominee.
  2. A letter from the nominator (maximum two pages) documenting the nominee’s research accomplishments as well as justifying the nomination, the significance of the work, and the nominee’s role in the work.
  3. A maximum of three endorsement letters of recommendation from others, identifying the rationale for the nomination and how the recommender knows of the nominee’s work.
  4. A concise statement (one sentence) of the achievement(s) for which the award is being given. This statement will appear on the award certificate and on the website.

Please submit your nomination to the award committee by email.

SIGMM Rising Star Award Committee (2016)

  • Klara Nahrstedt (klara@illinois.edu)
  • Dick Bulterman (Dick.Bulterman@fxpal.com)
  • Tat-Seng Chua (chuats@comp.nus.edu.sg)
  • Susanne Boll (susanne.boll@informatik.uni-oldenburg.de)
  • Nicu Sebe (nicusebe@gmail.com)
  • Shih-Fu Chang (shih.fu.chang@columbia.edu)
  • Rainer Lienhart (rainer.lienhart@informatik.uni-augsburg.de) (CHAIR)

SIGMM PhD Thesis Award — Call for nominations

SIGMM Award for Outstanding PhD Thesis in Multimedia Computing, Communications and Applications

Award Description

This award will be presented at most once per year to a researcher whose PhD thesis has the potential of very high impact in multimedia computing, communications and applications, or gives direct evidence of such impact. A selection committee will evaluate contributions towards advances in multimedia including multimedia processing, multimedia systems, multimedia network services, multimedia applications and interfaces. The award will recognize members of the SIGMM community and the research contributions of their PhD theses, as well as the potential impact of those theses on the multimedia area. The selection committee will focus on candidates’ contributions as judged by innovative ideas and potential impact resulting from their PhD work. The award includes a US$500 honorarium, an award certificate of recognition, and an invitation for the recipient to receive the award at a current year’s SIGMM-sponsored conference, the ACM International Conference on Multimedia (ACM Multimedia). A public citation for the award will be placed on the SIGMM website, in the SIGMM Records e-newsletter as well as in the ACM e-newsletter.

Funding

The award honorarium, the award plaque of recognition and travel expenses to the ACM International Conference on Multimedia will be fully sponsored by the SIGMM budget.

Nomination Applications

Nominations will be solicited by May 31, 2016, with an award decision to be made by August 30. This timing will allow a recipient to prepare for an award presentation at ACM Multimedia in that Fall (October/November). The initial nomination for a PhD thesis must relate to a dissertation deposited at the nominee’s academic institution between January and December of the year previous to the nomination. As discussed below, some dissertations may be held for up to three years by the selection committee for reconsideration. If the original thesis is not in English, a full English translation must be provided with the submission. Nominations for the award must include:

  1. PhD thesis (upload at: https://cmt.research.microsoft.com/SIGMMA2016/ )
  2. A statement summarizing the candidate’s PhD thesis contributions and potential impact, and justification of the nomination (two pages maximum);
  3. Curriculum Vitae of the nominee
  4. Three endorsement letters supporting the nomination, including the significant PhD thesis contributions of the candidate. Each endorsement should be no longer than 500 words, with a clear specification of the nominee’s PhD thesis contributions and potential impact on the multimedia field.
  5. A concise statement (one sentence) of the PhD thesis contribution for which the award is being given. This statement will appear on the award certificate and on the website.

The nomination rules are:

  1. The nominee can be any member of the scientific community.
  2. The nominator must be a SIGMM member.
  3. No self-nomination is allowed.

If a particular thesis is considered to be of exceptional merit but not selected for the award in a given year, the selection committee (at its sole discretion) may elect to retain the submission for consideration in at most two following years. The candidate will be invited to resubmit his/her work in these years. A thesis is considered to be outstanding if:

  1. Theoretical contributions are significant and application to multimedia is demonstrated.
  2. Applications to multimedia are outstanding, and the techniques are backed by solid theory with clear demonstration that the algorithms can be applied in new domains – e.g., algorithms must be demonstrably scalable in application in terms of robustness, convergence and complexity.

The submission of nominations will be preceded by a call for nominations. The call for nominations will be widely publicized by the SIGMM awards committee and by the SIGMM Executive Board at the different SIGMM venues, such as during SIGMM’s premier ACM Multimedia conference (at the SIGMM Business Meeting), on the SIGMM website, via the SIGMM mailing list, and via the SIGMM e-newsletter between September and December of the previous year.

Submission Process

  • Register an account at https://cmt.research.microsoft.com/SIGMMA2016/ and upload one copy of the nominated PhD thesis. The nominee will receive a Paper ID after the submission.
  • The nominator must then collate the other materials detailed in the previous section and upload them as supplementary materials, except the endorsement letters, which must be emailed separately as detailed below.
  • Contact your referees and ask them to send all endorsement letters to sigmmaward@gmail.com with the title: “PhD Thesis Award Endorsement Letter for [YourName]”. The web administrator will acknowledge receipt, and the CMT submission website will reflect the status of uploaded documents and endorsement letters.

It is the responsibility of the nominator to follow the process and make sure the documentation is complete. Theses with incomplete documentation will be considered invalid.

Chair of Selection Committee

Prof. Roger Zimmermann (rogerz@comp.nus.edu.sg) from National University of Singapore, Singapore

The Menpo Project

Overview

The Menpo Project [1] is a BSD-licensed set of tools and software designed to provide an end-to-end pipeline for the collection and annotation of image and 3D mesh data. In particular, the Menpo Project provides tools for annotating images and meshes with a sparse set of fiducial markers that we refer to as landmarks. For example, Figure 1 shows an example of a face image that has been annotated with 68 2D landmarks. These landmarks are useful in a variety of areas in Computer Vision and Machine Learning including object detection, deformable modelling and tracking. The Menpo Project aims to enable researchers, practitioners and students to easily annotate new data sources and to investigate existing datasets. Of most interest to the Computer Vision community is the fact that the Menpo Project contains completely open source implementations of a number of state-of-the-art algorithms for face detection and deformable model building.

Figure 1. A facial image annotated with 68 sparse landmarks.

In the Menpo Project, we are actively developing and contributing to the state-of-the-art in deformable modelling [2], [3], [4], [5]. Characteristic examples of widely used state-of-the-art deformable model algorithms are Active Appearance Models [6],[7], Constrained Local Models [8], [9] and Supervised Descent Method [10]. However, there is still a noteworthy lack of high quality open source software in this area. Most existing packages are encrypted, compiled, non-maintained, partly documented, badly structured or difficult to modify. This makes them unsuitable for adoption in cutting edge scientific research. Consequently, research becomes even more difficult since performing a fair comparison between existing methods is, in most cases, infeasible. For this reason, we believe the Menpo Project represents an important contribution towards open science in the area of deformable modelling. We also believe it is important for deformable modelling to move beyond the established area of facial annotations and to extend to a wide variety of deformable object classes. We hope Menpo can accelerate this progress by providing all of our tools completely free and permissively licensed.

Project Structure

The core functionality provided by the Menpo Project revolves around a powerful and flexible cross-platform framework written in Python. This framework has a number of subpackages, all of which rely on a core package called menpo. The specialised subpackages are all based on top of menpo and provide state-of-the-art Computer Vision algorithms in a variety of areas (menpofit, menpodetect, menpo3d, menpowidgets).

  • menpo – This is a general-purpose package that is designed from the ground up to make importing, manipulating and visualising image and mesh data as simple as possible. In particular, we focus on data that has been annotated with a set of sparse landmarks. This form of data is common within the fields of Machine Learning and Computer Vision and is a prerequisite for constructing deformable models. All menpo core types are Landmarkable, and visualising these landmarks is a primary concern of the menpo library. Since landmarks are first-class citizens within menpo, tasks like masking images, cropping images within the bounds of a set of landmarks, spatially transforming landmarks, extracting patches around landmarks and aligning images become simple (a minimal sketch of these basics follows this list). The menpo package has been downloaded more than 3000 times and we believe it is useful to a broad range of computer scientists.
  • menpofit – This package provides all the necessary tools for training and fitting a large variety of state-of-the-art deformable models under a unified framework. The methods can be roughly split in three categories:

    1. Generative Models: This category includes implementations of all variants of the Lucas-Kanade alignment algorithm [6], [11], [2], Active Appearance Models [7], [12], [13], [2], [3] and other generative models [14], [4], [5].
    2. Discriminative Models: The models of this category are Constrained Local Models [8] and other closely related techniques [9].
    3. Regression-based Techniques: This category includes the commonly-used Supervised Descent Method [10] and other state-of-the-art techniques [15], [16], [17].

    The menpofit package has been downloaded more than 1000 times.

  • menpodetect – This package contains methodologies for performing generic object detection in terms of a bounding box. Herein, we do not attempt to implement novel techniques, but instead wrap existing projects so that they integrate natively with menpo. The current wrapped libraries are DLib, OpenCV, Pico and ffld2.

  • menpo3d – Provides useful tools for importing, visualising and transforming 3D data. menpo3d also provides a simple OpenGL rasteriser for generating depth maps from mesh data.

  • menpowidgets – A package that includes Jupyter widgets for ‘fancy’ visualisation of menpo objects. It provides user-friendly, aesthetically pleasing, interactive widgets for visualising images, pointclouds, landmarks, trained models and fitting results.
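
To give a flavour of the menpo core API, the following minimal sketch loads a single annotated image and performs the same landmark-based cropping and greyscale conversion used in the training example later in this article. The file path is a placeholder, and the snippet assumes an image with an accompanying PTS landmark file.

import menpo.io as mio

# Placeholder path: an image file with a matching PTS landmark file
image = mio.import_image('/path/to/images/face_001.jpg')

# Landmarks are first-class citizens: the imported annotations are
# available as a named group on the image
print(image.landmarks['PTS'].lms.n_points)  # e.g. 68 for facial data

# Crop to the landmark bounding box (with a 10% margin) and convert
# to greyscale, exactly as in the training example further below
image = image.crop_to_landmarks_proportion(0.1)
if image.n_channels == 3:
    image = image.as_greyscale()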

The Menpo Project is primarily written in Python. The use of Python was motivated by its free availability on all platforms, unlike its major competitor in Computer Vision, Matlab. We believe this is important for reproducible open science. Python provides a flexible environment for performing research, and recent innovations such as the Jupyter notebook have made it incredibly simple to provide documentation via examples. The vast majority of the execution time in Menpo is actually spent in highly efficient numerical libraries and bespoke C++ code, allowing us to achieve sufficient performance for real time facial point tracking whilst not compromising on the flexibility that the Menpo Project offers.

Note that the Menpo Project has benefited enormously from the wealth of scientific software available within the Python ecosystem! The Menpo Project borrows from the best of the scientific software community wherever possible (e.g. scikit-learn, matplotlib, scikit-image, PIL, VLFeat, Conda) and the Menpo team have contributed patches back to many of these projects.

Getting Started

We, as the Menpo team, are firm believers in making installation as simple as possible. The Menpo Project is designed to provide a suite of tools to solve a complex problem and therefore has a complex set of 3rd party library dependencies. The default Python packaging environment does not make this an easy task. Therefore, we evangelise the use of the Conda ecosystem. On our website, we provide detailed step-by-step instructions on how to install Conda and then Menpo on all platforms (Windows, OS X, Linux) (please see http://www.menpo.org/installation/). Once the conda environment has been set up, installing each of the various Menpo libraries can be done with a single command:

$ source activate menpo
(menpo) $ conda install -c menpo menpofit
(menpo) $ conda install -c menpo menpo3d
(menpo) $ conda install -c menpo menpodetect

As part of the project, we maintain a set of Jupyter notebooks that help illustrate how Menpo should be used. The notebooks for each of the core Menpo libraries are kept inside their own repositories on our Github page, i.e. menpo/menpo-notebooks, menpo/menpofit-notebooks and menpo/menpo3d-notebooks. If you wish to view the static output of the notebooks, feel free to browse them online following these links: menpo, menpofit and menpo3d. This gives a great way to passively read the notebooks without needing a full Python environment. Note that these copies of the notebook are tied to the latest development release of our packages and contain only static output and thus cannot be run directly – to execute them you need to download them, install Menpo, and open the notebook in Jupyter.

Usage Example

Let us present a simple example that illustrates how easy it is to manipulate data and train deformable models using Menpo. In this example, we use annotated data to train an Active Appearance Model (AAM) for faces. This procedure involves four steps:

  1. Loading annotated training images
  2. Training a model
  3. Selecting a fitting algorithm
  4. Fitting the model to a test image

Firstly, we will load a set of images along with their annotations and visualize them using a widget. In order to save memory, we will crop the images and convert them to greyscale. For an example set of images, feel free to download the images and annotations provided by [18] from here. Assuming that all the image and PTS annotation files are located in /path/to/images, this can be easily done as:

import menpo.io as mio
from menpowidgets import visualize_images

images = []
for i in mio.import_images('/path/to/images', verbose=True):
    i = i.crop_to_landmarks_proportion(0.1)
    if i.n_channels == 3:
        i = i.as_greyscale()
    images.append(i)

visualize_images(images) # widget for visualising the images and their landmarks

An example of the visualize_images widget is shown in Figure 2.

Figure 2. Visualising images inside Menpo is highly customizable (within a Jupyter notebook)

The second step involves training the Active Appearance Model (AAM) and visualising it using an interactive widget. Note that we use Image Gradient Orientations (IGO) [13], [11] features to help improve the performance of the generic AAM we are constructing. An example of the output of the widget is shown in Figure 3.

from menpofit.aam import HolisticAAM
from menpo.feature import igo

aam = HolisticAAM(images, holistic_features=igo, verbose=True)

print(aam) # print information regarding the model
aam.view_aam_widget() # visualize aam with an interactive widget

Figure 3. Many of the base Menpo classes provide visualisation widgets that allow simple data exploration of the created models. For example, this widget shows the joint texture and shape model of the previously created AAM.

Next, we need to create a Fitter object for which we specify the Lucas-Kanade algorithm to be used, as well as the number of shape and appearance PCA components.

from menpofit.aam import LucasKanadeAAMFitter

fitter = LucasKanadeAAMFitter(aam, n_shape=[5, 15], n_appearance=0.6)

Assuming that we have a test_image and an initial bounding_box, the fitting can be executed and visualized with a simple command as:

from menpowidgets import visualize_fitting_result

fitting_result = fitter.fit_from_bb(test_image, bounding_box)
visualize_fitting_result(fitting_result) # interactive widget to inspect a fitting result

An example of the visualize_fitting_result widget is shown in Figure 4.

Now we are ready to fit the AAM to a set of test_images. The fitting process needs to be initialized with a bounding box, which we retrieve using the DLib face detector that is provided by menpodetect. Assuming that we have imported the test_images in the same way as shown in the first step, the fitting is as simple as:

from menpodetect import load_dlib_frontal_face_detector

detector = load_dlib_frontal_face_detector() # load face detector

fitting_results = []
for i, img in enumerate(test_images):
    # detect face's bounding box(es)
    bboxes = detector(img)

    # if at least one bbox is returned
    if bboxes:
        # groundtruth shape is ONLY useful for error calculation
        groundtruth_shape = img.landmarks['PTS'].lms
        # fit
        fitting_result = fitter.fit_from_bb(img, bounding_box=bboxes[0],
                                            gt_shape=groundtruth_shape)
        fitting_results.append(fitting_result)

visualize_fitting_result(fitting_results) # visualize all fitting results

Figure 4. Once fitting is complete, Menpo provides a customizable widget that shows the progress of fitting a particular image.

Web Based Landmarker

URL: https://www.landmarker.io/

landmarker.io is a web application for annotating 2D and 3D data, initially developed by the Menpo Team and then heavily modernised by Charles Lirsac. It has no dependencies beyond a modern web browser and is designed to be simple and intuitive to use. It has several exciting features such as Dropbox support, snap mode (Figure 6) and easy integration with the core types provided by the Menpo Project. Apart from the Dropbox mode, it also supports a server mode, in which the annotations and assets themselves are served to the client from a separate server component which is run by the user. This allows researchers to benefit from the web-based nature of the tool without having to compromise privacy or security. The server utilises Menpo to import assets and save out annotations. An example screenshot is given in Figure 5.
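
As a rough illustration of that import/export round trip (this is not the actual server code, and the paths are placeholders), the server-side use of menpo.io could look as follows; .ljson is the JSON-based landmark format used by landmarker.io.

import menpo.io as mio

# Load an asset to be served to the browser client (placeholder path)
image = mio.import_image('/path/to/assets/face_001.jpg')

# ... the annotator edits the landmarks in the browser ...

# Persist the (possibly edited) landmark group; the .ljson extension
# selects the JSON-based landmark format understood by landmarker.io
mio.export_landmark_file(image.landmarks['PTS'],
                         '/path/to/annotations/face_001.ljson',
                         overwrite=True)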

The application is designed in such a way as to allow for efficient manual annotation. The user can also annotate any object class and define their own template of landmark labels. Most importantly, the decentralisation of the landmarking software means that researchers can recruit annotators by simply directing them to the website. We strongly believe that this is a great advantage that can aid in acquiring large databases of correctly annotated images for various object classes. In the near future, the tool will support a semi-assisted annotation procedure, for which Menpo will be used to provide initial estimations of the correct points for the images and meshes of interest.

Figure 5. The landmarker provides a number of methods of importing assets, including from Dropbox and a custom Menpo server.

Figure 6. The landmarker provides an intuitive snap mode that enables the user to efficiently edit a set of existing landmarks.

Conclusion and Future Work

The research field of rigid and non-rigid object alignment lacks high-quality open source software packages. Most researchers release code that is not easily re-usable, which further makes it difficult to compare existing techniques in a fair and unified way. Menpo aims to fill this gap and give solutions to these problems. We have put a lot of effort into making Menpo a solid platform from which researchers of any level can benefit. Note that Menpo is a rapidly changing set of software packages that attempts to keep track of the recent advances in the field. In the future, we aim to add even more state-of-the-art techniques and increase our support for 3D deformable models [19]. Finally, we plan to develop a separate benchmark package that will standardise the way comparisons between various methods are performed.

Note that at the time this article was released, the versions of the Menpo packages were as follows:

Package         Version
menpo           0.6.0²
menpofit        0.3.0²
menpo3d         0.2.0
menpodetect     0.3.0²
menpowidgets    0.1.0
landmarker.io   0.2.1

If you have any questions regarding Menpo, please let us know on the menpo-users mailing list.

References

[1] J. Alabort-i-Medina, E. Antonakos, J. Booth, P. Snape, and S. Zafeiriou, “Menpo: A comprehensive platform for parametric image alignment and visual deformable models,” in Proceedings Of The ACM International Conference On Multimedia, 2014, pp. 679–682. http://doi.acm.org/10.1145/2647868.2654890

[2] E. Antonakos, J. Alabort-i-Medina, G. Tzimiropoulos, and S. Zafeiriou, “Feature-based lucas-kanade and active appearance models,” Image Processing, IEEE Transactions on, 2015. http://dx.doi.org/10.1109/TIP.2015.2431445

[3] J. Alabort-i-Medina and S. Zafeiriou, “Bayesian active appearance models,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 3438–3445. http://dx.doi.org/10.1109/CVPR.2014.439

[4] J. Alabort-i-Medina and S. Zafeiriou, “Unifying holistic and parts-based deformable model fitting,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 3679–3688. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Alabort-i-Medina_Unifying_Holistic_and_2015_CVPR_paper.pdf

[5] E. Antonakos, J. Alabort-i-Medina, and S. Zafeiriou, “Active pictorial structures,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 5435–5444. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Antonakos_Active_Pictorial_Structures_2015_CVPR_paper.pdf

[6] S. Baker and I. Matthews, “Lucas-kanade 20 years on: A unifying framework,” International Journal of Computer Vision, vol. 56, no. 3, pp. 221–255, 2004. http://dx.doi.org/10.1023/B:VISI.0000011205.11775.fd

[7] I. Matthews and S. Baker, “Active appearance models revisited,” International Journal of Computer Vision, vol. 60, no. 2, pp. 135–164, 2004. http://dx.doi.org/10.1023/B:VISI.0000029666.37597.d3

[8] J. M. Saragih, S. Lucey, and J. F. Cohn, “Deformable model fitting by regularized landmark mean-shift,” International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011. http://dx.doi.org/10.1007/s11263-010-0380-4

[9] A. Asthana, S. Zafeiriou, G. Tzimiropoulos, S. Cheng, and M. Pantic, “From pixels to response maps: Discriminative image filtering for face alignment in the wild,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015. http://dx.doi.org/10.1109/TPAMI.2014.2362142

[10] X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment,” in Computer Vision And Pattern Recognition (CVPR), 2013 IEEE Conference On, 2013, pp. 532–539. http://dx.doi.org/10.1109/CVPR.2013.75

[11] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “Robust and efficient parametric face alignment,” in Computer Vision (ICCV), 2011 IEEE International Conference On, 2011, pp. 1847–1854. http://dx.doi.org/10.1109/ICCV.2011.6126452

[12] G. Papandreou and P. Maragos, “Adaptive and constrained algorithms for inverse compositional active appearance model fitting,” in Computer Vision And Pattern Recognition (CVPR), 2008 IEEE Conference On, 2008, pp. 1–8. http://dx.doi.org/10.1109/CVPR.2008.4587540

[13] G. Tzimiropoulos, J. Alabort-i-Medina, S. Zafeiriou, and M. Pantic, “Active orientation models for face alignment in-the-wild,” Information Forensics and Security, IEEE Transactions on, vol. 9, no. 12, pp. 2024–2034, 2014. http://dx.doi.org/10.1109/TIFS.2014.2361018

[14] G. Tzimiropoulos and M. Pantic, “Gauss-newton deformable part models for face alignment in-the-wild,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1851–1858. http://dx.doi.org/10.1109/CVPR.2014.239

[15] A. Asthana, S. Zafeiriou, S. Cheng, and M. Pantic, “Incremental face alignment in the wild,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1859–1866. http://dx.doi.org/10.1109/CVPR.2014.240

[16] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Computer Vision And Pattern Recognition (CVPR), 2014 IEEE Conference On, 2014, pp. 1867–1874. http://dx.doi.org/10.1109/CVPR.2014.241

[17] G. Tzimiropoulos, “Project-out cascaded regression with an application to face alignment,” in Computer Vision And Pattern Recognition (CVPR), 2015 IEEE Conference On, 2015, pp. 3659–3667. http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Tzimiropoulos_Project-Out_Cascaded_Regression_2015_CVPR_paper.pdf

[18] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “300 faces in-the-wild challenge: The first facial landmark localization challenge,” in Computer Vision Workshops (ICCVW), 2013 IEEE International Conference On, 2013, pp. 397–403. http://www.cv-foundation.org/openaccess/content_iccv_workshops_2013/W11/papers/Sagonas_300_Faces_in-the-Wild_2013_ICCV_paper.pdf

[19] V. Blanz and T. Vetter, “A morphable model for the synthesis of 3D faces,” in Proceedings Of The 26th Annual Conference On Computer Graphics And Interactive Techniques, 1999, pp. 187–194. http://dx.doi.org/10.1145/311535.311556


  1. Alphabetical author order signifies equal contribution

  2. Currently unreleased – the next released versions of menpo, menpofit and menpodetect will reflect these version numbers. All samples were written using the current development versions.

MediaEval 2016 Multimedia Benchmark: Call for Feedback and Participation

Each year, the Benchmarking Initiative for Multimedia Evaluation (MediaEval) offers challenges to the multimedia research community in the form of shared tasks. MediaEval tasks place their focus on the human and social aspects of multimedia. We are interested in how multimedia content can be used to produce knowledge and to create algorithms that support people in their daily lives. Many tasks are related to how people understand multimedia content, how they react to it, and how they use it. We emphasize the “multi” in multimedia: speech, audio, music, visual content, tags, users, and context. MediaEval attracts researchers with backgrounds in diverse areas, including multimedia content analysis, information retrieval, speech technology, computer vision, music information retrieval, social computing, and recommender systems.

WSICC at ACM TVX’16

With three successful editions at EuroITV’13, TVX’14, and TVX’15, WSICC has established itself as a truly interactive workshop. The fourth edition of the WSICC workshop aims to bring together researchers and practitioners working on novel approaches for interactive multimedia content consumption. New technologies, devices, media formats, and consumption paradigms are emerging that allow for new types of interactivity. Examples include multi-panoramic video and object-based audio, increasingly available in live scenarios with content feeds from a multitude of sources. All these recent advances have an impact on different aspects related to interactive content consumption, which the workshop categorizes into Enabling Technologies, Content, User Experience, and User Interaction.

Report from the MMM Special Session Perspectives on Multimedia Analytics

This report summarizes the presentations and discussions of the special session entitled “Perspectives on Multimedia Analytics” at MMM 2016, which was held in Miami, Florida on January 6, 2016. The special session consisted of four brief paper presentations, followed by a panel discussion with questions from the audience. The session was organized by Björn Þór Jónsson and Cathal Gurrin, and chaired and moderated by Klaus Schoeffmann. The goal of this report is to record the conclusions of the special session, in the hope that it may serve members of our community who are interested in Multimedia Analytics.

Presentations

Alan Smeaton opens the discussion. From the left: Klaus Schoeffmann (moderator), Alan Smeaton, Björn Þór Jónsson, Guillaume Gravier and Graham Healy.

Firstly, Alan Smeaton presented an analysis of time-series-based recognition of semantic concepts [1]. He argued that while concept recognition in visual multimedia is typically based on simple concepts, there is a need to recognise semantic concepts which have a temporal aspect corresponding to activities or complex events. Furthermore, he argued that while various results are reported in the literature, there are research questions which remain unanswered, such as: “What concept detection accuracies are satisfactory for higher-level recognition?” and “Can recognition methods perform equally well across various concept detection performances?” Results suggested that, although improving concept detection accuracies can enhance the recognition of time-series-based concepts, concept detection does not need to be very accurate in order to characterize the dynamic evolution of time series if appropriate methods are used. In other words, even if semantic concept detectors still have low accuracy, it makes a lot of sense to apply them to temporally adjacent shots/frames in video in order to detect semantic events from them.

Secondly, Björn Þór Jónsson presented ten research questions for scalable multimedia analytics [2]. He argued that the scale and complexity of multimedia collections is ever increasing, as is the desire to harvest useful insight from the collections. To optimally support the complex quest for insight, multimedia analytics has emerged as a new research area that combines concepts and techniques from multimedia analysis and visual analytics into a single framework. Björn argued further, however, that state-of-the-art database management solutions are not yet designed for multimedia analytics workloads, and that research is therefore required into scalable multimedia analytics, built on the three underlying pillars of visual analytics, multimedia analysis and database management. Björn then proposed ten specific research questions to address in this area.

Third, Guillaume Gravier presented a study of the needs and expectations of media professionals for multimedia analytics solutions [3]. The goal of the work was to help clarify what multimedia analytics encompasses by studying users’ expectations. They focused on a very specific family of applications for search and navigation of broadcast and social news content. Through extensive conversations with media professionals, using mock-up interfaces and a human-centered design methodology, they analyzed the perceived usefulness of a number of functionalities leveraging existing or upcoming technology. Based on the results, Guillaume proposed a definition of, and research directions for, (multi)media analytics.

Graham Healy gives the final presentation of the session. Sitting, from the left: Klaus Schoeffmann (moderator), Alan Smeaton, Björn Þór Jónsson, and Guillaume Gravier.

Finally, Graham Healy presented an analysis of human annotation quality using neural signals such as electroencephalography (EEG) [4]. They explored how neurophysiological signals correlate with attention and perception, in order to better understand the image-annotation task. Results indicated potential issues with “how well” a person manually annotates images and variability across annotators. They proposed that such issues may arise in part as a result of subjectively interpretable instructions that may fail to elicit similar labelling behaviours and decision thresholds across participants. In particular, they found instances where an individual’s annotations differed from a group consensus, even though their EEG signals indicated that they were likely in consensus. Finally, Graham discussed the potential implications of the work for annotation tasks and crowd-sourcing in the future.

Discussions

Firstly, a question was asked about a definition for multimedia analytics, and its relationship to multimedia analysis. Björn proposed the following definition of the main goal of scalable multimedia analytics: “. . . to produce the processes, techniques and tools to allow many diverse users to efficiently and effectively analyze large and dynamic multimedia collections over a long period of time to gain insight and knowledge” [2]. Guillaume, on the other hand, proposed that multimedia analytics could be defined as: “. . . the process of organizing multimedia data collections and providing tools to extract knowledge, gain insight and help make decisions by interacting with the data” [3]. Finally, Alan added that, in contrast with multimedia analysis, the multimedia analytics user is involved in every stage of the whole process: from media production/capturing, over data inspection, filtering, and structuring, up to the final consumption, visualization, and usage of the media data.

Clearly, all three definitions are largely in agreement, as they focus on the insight and knowledge gained through interaction with data. In addition, the first definition includes scalability aspects, such as the number of analysts and the duration of analysis. The speakers agreed that multimedia analysis was mostly concerned with the automatic analysis of media content, while media interaction is definitely an important aspect to consider in multimedia analytics. Björn even proposed that users might be more satisfied with a system that takes a few iterations of user interaction to reach a conclusion, than with a system that takes a somewhat shorter time to reach the same conclusions without any interaction. Guillaume stressed that their work had demonstrated the importance of working with professional users to get their requirements early. When asked, Graham agreed that using neural sensors could potentially become a weapon in the analyst’s arsenal, helping the analyst to understand what the brain finds interesting.

A question was asked about potential application areas for multimedia analytics. There was general agreement that many and diverse areas could benefit from multimedia analytics techniques. Alan listed a number of application areas, such as: on-line education, lifelogging, surveillance and forensics, medicine and biomedicine, and so on; in fact he struggled to find an area that could not be affected. There was also agreement that many multimedia analytics application areas would need to involve very large quantities of data. As an example, the recent YFCC 100M collection has nearly 100 million images and around 800 thousand videos; yet compared to web-scale collections it is still very small.

A further thread of discussion centered on where to focus research efforts. The works described by Björn and Guillaume already propose some long-term research questions and directions. Based on his experience, Alan proposed that work on improving the quality of a particular concept detector from 95% to 96%, for example, would not have any significant impact, while work on improving the higher-level detection to use more (and more varied) information would be much more productive. Alan was then asked whether researchers working on concept detection should rather focus on more general concepts with higher recall but often low precision (e.g., beach, car, food, etc.) or more specific concepts with low recall but typically higher precision (e.g., NASCAR racing tyre, sushi, United Airlines plane, etc.). He answered that neither should be particularly preferred and that we need to continue working on both types of concepts.

Finally, some questions were posed to the participants about details of their respective works; however these will not be reported here.

Summary

Overall, the conclusion of the discussion is that multimedia analytics should be a very fruitful research area in the future, with diverse applications in many areas and for many users. While the finer-grained conclusions of the discussion that we have described above were perhaps not revolutionary, we nevertheless felt it would be a service to the community to write them down in this short report.

The panel format of the special session made the discussion much more lively and interactive than that of a traditional technical session. We would like to thank the presenters and their co-authors for their excellent contributions. The session chairs would also particularly like to thank the moderator, Klaus Schoeffmann, for his contribution to the session, as a good panel moderator is very important for the success of the session.

References

[1] Peng Wang, Lifeng Sun, Shiqiang Yang, and Alan Smeaton. What are the limits to time series based recognition of semantic concepts? In Proc. MMM, Miami, FL, USA, 2016.
[2] Björn Þór Jónsson, Marcel Worring, Jan Zahálka, Stevan Rudinac, and Laurent Amsaleg. Ten research questions for scalable multimedia analytics. In Proc. MMM, Miami, FL, USA, 2016.
[3] Guillaume Gravier, Martin Ragot, Laurent Amsaleg, Rémi Bois, Grégoire Jadi, Éric Jamet, Laura Monceaux, and Pascale Sébillot. Shaping-up multimedia analytics: Needs and expectations of media professionals. In Proc. MMM, Miami, FL, USA, 2016.
[4] Graham Healy, Cathal Gurrin, and Alan Smeaton. Informed perspectives on human annotation using neural signals. In Proc. MMM, Miami, FL, USA, 2016.

SIVA Suite: An Open-Source Framework for Hypervideos

Overview

The SIVA Suite is an open source framework for the creation, playback, and administration of hypervideos. Allowing the definition of complex navigational structures, our hypervideos are well suited for different scenarios. Compared to traditional linear videos, they especially excel in e-learning and training situations (see [1] and [2]), where fitting the teaching material to the needs of the viewer can be crucial. Other fields of application include virtual tours through buildings or cities, sports events, and interactive video stories. The SIVA Suite consists of an authoring tool (SIVA Producer), an HTML5 hypervideo player (SIVA Player), and a Web server (SIVA Server) for user and video management. It has been evaluated in various scenarios with several usability tests and has been improved step-by-step since 2008.

Introduction

The viewer of a traditional video takes a mostly passive role. Traditional videos are linear and cannot provide additional information about objects or scenes. In contrast to traditional linear videos, hypervideos are not merely a sequence of video scenes. Their essence is alternative storylines, user choices, and additional materials which can be viewed in parallel with the main content, as well as a navigational structure facilitating these features. Therefore, special players with extended controls and areas to present the additional information beyond the original content are necessary. User choices in the video can be made via a button panel for selecting the follow-up scene, a table of contents, or a keyword search.

One of the most advanced tools in this area is Hyper-Hitchcock [3], which can be used for the creation of detail-on-demand hypervideos with one main storyline and entry points for more detailed video explanations. However, an open source version of the software is not available. With new technologies like HTML5, CSS 3, and JavaScript, web-based tools like Klynt [4] emerged. Klynt allows the creation of hypervideos with a focus on different media types and provides many useful features, but it cannot be extended or customized due to its proprietary licensing. Finally, with the SIVA Suite we now offer the first customizable open source framework for the creation of hypervideos.

To simplify the creation process, our work focuses on videos as the main content. In the SIVA Producer, video scenes and navigational elements are arranged in a graph, called scene graph, to define the navigational structure of a hypervideo. Annotations offering additional information can be added to single scenes as well as to the whole video. For this purpose, images, texts, pdfs, audios, and even videos may be used. As a supplement to the video structure defined by the scene graph, further navigational elements like a table of contents and a keyword search enable the viewer to easily jump to points of interest.

Hypervideos are created in the SIVA Producer and then uploaded to the SIVA Server. Registered users can then download the video from the server or watch it online. If logging is enabled, user interactions during playback are logged by the player and sent to the database on the SIVA Server. Video administrators can access the logging data and view different diagrams or export the data for further analysis in a statistics tool. An overview of the system is shown in Figure 1.

Figure 1. SIVA Suite – Overview.


SIVA Producer

Requirements (recommended): Windows 7 or higher
Installation: executable setup file
License: Eclipse Public License (EPL)

Installation files of the SIVA Producer can be found at https://github.com/SIVAteam/SIVA-Suite/tree/master/producer, an installer of the latest release can be found at https://github.com/SIVAteam/SIVA-Suite/releases.

The SIVA Producer is used for the creation of hypervideos in which main video scenes are linked with each other in a scene graph. Each of the scenes may have one or more multimedia annotations. Further navigational structures are a table of contents as well as the definition of keywords which can then be searched for in the player. The GUI has been implemented and improved step-by-step since 2008 [5].

First Steps

  1. Create a new project: A new project is created with a wizard. The author can set the appearance of the player as well as functions the player will provide. It is, for example, possible to select a primary and a secondary color, to determine the width of the annotation panel, etc.
  2. Add media files to the project: Media files are imported into the media repository. These may be videos, audios, images, or html files. The Producer uses each media file in its original format during the creation process and only transforms it during the export.
  3. Create scenes: From videos in the media repository, scenes can be extracted. Those will be added to the scene repository from where they can be dragged to the scene graph to create the hypervideo structure.
  4. Create a scene graph: A scene graph (see Figure 2) consists of a defined start and an end, as well as several scenes and connection/branching elements allowing advanced navigation options during playback of the video. Scenes and navigation elements are added to the scene graph via drag and drop. These elements are linked with the connection tool from the scene graph tool bar. In order to produce a valid and exportable scene graph, two conditions have to be met. First, only one start scene is allowed. Second, every scene has to be connected by some path to the start and to the end of the video. The validity of the scene graph can be checked with a validation function (a minimal illustrative sketch of such a check follows this list).
    Figure 2. Scene graph of the SIVA Producer.

  5. Add annotations to scenes: Each scene in the scene graph may have one or more multimedia annotations. To add an annotation, a media file can either be dragged from the media repository and dropped on a scene, or an annotation editor (see Figure 3) can be used to customize its timing and appearance. Additionally, a hotspot can be added to the scene which invokes the display of the annotation only after a viewer clicks the marked area.
    Figure 3. Annotation editor of the SIVA Producer.

  6. Export video project: In a last step, finished hypervideo projects with valid scene graphs are exported for the player. The structure of the hypervideo with all possible actions is converted into a JSON file. The media files are transformed and transcoded for the desired target platform.
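
The two validity conditions from step 4 amount to a simple reachability check over the scene graph. The following sketch is purely illustrative (it is not the SIVA Producer's validation code) and assumes the graph is given as a set of node identifiers plus directed edges:

from collections import defaultdict

def is_valid_scene_graph(nodes, edges, start, end):
    # Illustrative check only: assumes a single start node and verifies
    # that every scene lies on some path from the start to the end.
    forward = defaultdict(set)   # node -> successors
    backward = defaultdict(set)  # node -> predecessors
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)

    def reachable(source, adjacency):
        seen, stack = set(), [source]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adjacency[node] - seen)
        return seen

    from_start = reachable(start, forward)   # nodes reachable from the start
    to_end = reachable(end, backward)        # nodes that can reach the end
    return all(n in from_start and n in to_end for n in nodes)

# Example: two alternative scenes A and B between the start and the end.
print(is_valid_scene_graph({'start', 'A', 'B', 'end'},
                           [('start', 'A'), ('start', 'B'),
                            ('A', 'end'), ('B', 'end')],
                           'start', 'end'))  # True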

Further Features

  • Global annotations: Besides annotations which are displayed with scenes, global annotations which are displayed during the whole hypervideo (and do not have timing information as a consequence) can be added with a separate editor. The editor is opened from the main menu or the quick access toolbar.
  • Keywords: Keywords can be added to scenes and annotations in the respective editors. They are added as whitespace-separated lists at the lower left part of the editors. Currently, only keywords added by the author are exported to the player and searchable with the search function; no automated analysis of the media files is performed.
  • Table of contents: The table of contents editor (see Figure 4) is used to create a tree structure of entries with meaningful headlines. A scene from the scene graph can be linked with one of the entries in the table of contents. A scene is added to an entry in the table of contents via drag and drop. The editor is opened from the main menu or the quick access toolbar.
    Figure 4. Table of contents editor of the SIVA Producer.

  • Advanced navigation: Besides a standard selection element where the user may select one of the attached paths to continue playback in the player, more advanced elements are available as well:
    • Forward button: A single button with only one label. It can be used to interrupt a linear sequence of scenes.
    • Random selection: One of the attached paths will be selected at random without user interaction.
    • Conditional selection: For attached paths, conditions can be defined which have to be fulfilled before the path is unlocked for playback.
  • Project handover: The SIVA Producer provides a function for handing over a project to another computer. Using this function, all media files as well as the project file are copied into a given file structure where they can easily be copied from.
  • Help: A help for the SIVA Producer can be found in the menu under “Help -> Help Contents“.

 


SIVA Player

Requirements (recommended): Firefox 42.0, Chrome 46.0, Opera 33, Internet Explorer 11, Safari 10.10
Installation: use the HTML export profile in the SIVA Producer, then integrate the player into a website by copying the body part of the exported HTML file and adapting the paths – or use it as a local stand-alone player
License: GPLv3

Installation files are contained in the SIVA Producer at https://github.com/SIVAteam/SIVA-Suite/tree/master/producer/org.iviPro.ui/libs-native/HTML5player.

The SIVA Player is used to play the hypervideo created in the SIVA Producer. The structure and media elements of the hypervideo are described in a JSON file which conforms to the XML structure described in [6]. A previous version of the player can be found in [7].

Figure 5. SIVA Player with video view and annotation area.

The playback of the described videos requires special players which are capable of providing navigational elements like selection panels for follow-up scenes, a table of contents, or a search function. Furthermore, areas for displaying additional information are necessary. Figure 5 shows a user interface of the player (with contents of a medical training scenario) with the following elements:

  • (1) standard controls like pause/play
  • (2) a progress bar (for the current video)
  • (3) a settings button
  • (4) a volume control
  • (5) entry point to the table of contents
  • (6) a button to jump to the previous scene
  • (7) title of the currently displayed scene
  • (8) a button to jump to the next scene (or to a selection panel)
  • (9) a search button (performs a live search and refines the search results with every keystroke)
  • (10) a button for the full-screen mode
  • (11) a foldout panel on the right shows additional information (here, an additional video (12) and two image galleries (13); the additional video provides standard controls and can be displayed in full-screen mode (14))

A click on one of the annotations opens its contents in full-screen mode for additional interactions (like browsing an image gallery or watching a video), while the player pauses the main video in the background. If a fork is reached in the scene graph, a button panel is provided at the left side of the main video area where the viewer has to select the next scene. The player also provides multiple language support if the author provides translations for all text and media elements (note: this functionality is not yet implemented in the SIVA Producer; the translations have to be made manually in the JSON file). Besides clicking or tapping the buttons, the basic functions of the player can also be controlled using the keyboard, namely with the space bar, the ESC key, and the left and right arrow keys.

All actions of the user can be recorded if logging is enabled by the author. In online mode, the player transmits the recorded actions to the server every 60 seconds as well as when the player starts or the video ends. In offline mode, logging data is collected locally and transmitted to the server once a connection can be established.
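A minimal sketch of this behaviour, assuming a hypothetical endpoint URL and event format, might look like this:

```typescript
// Hypothetical logging client; the endpoint URL and event format are assumptions.
interface LogEvent { type: string; sceneId: string; time: number }

const ENDPOINT = "https://example.org/siva-stats/log";   // placeholder URL
const buffer: LogEvent[] = [];

function record(event: LogEvent): void {
  buffer.push(event);
}

// Called every 60 seconds, when playback starts, and when the video ends.
async function flush(): Promise<void> {
  if (buffer.length === 0 || !navigator.onLine) return;  // keep buffering while offline
  const batch = buffer.splice(0, buffer.length);
  try {
    await fetch(ENDPOINT, { method: "POST", body: JSON.stringify(batch) });
  } catch {
    buffer.unshift(...batch);                            // keep the data for the next attempt
  }
}

setInterval(flush, 60_000);                              // periodic transmission in online mode
window.addEventListener("online", flush);                // send buffered data once reconnected
```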

The player can be configured in its HTML embedding. Thanks to its responsive design, it can be used not only on desktop PCs with varying screen sizes but also on mobile devices in landscape and portrait mode. The player can be used online over the internet or in offline mode when all files are stored on the end user's device.

 


SIVA Server

Main server application:

Requirements (recommended): Apache Tomcat 7, PostgreSQL 9.1 or newer, credentials for an SMTP account
Installation: deploy the WAR file into Tomcat's webapps folder, open the URL in a browser, and finish the installation by filling in all fields
License: GPLv3

Player Stats:

Requirements (recommended): Apache 2 web server, PHP 5.4, enabled Apache module mod_rewrite
Installation: put the back-end files into the virtual host's folder, open it in a browser, and complete the installation
License: GPLv3

Installation files for the server application and the player stats can be found at https://github.com/SIVAteam/SIVA-Suite/tree/master/server. Additionally, a WAR file for the main server application can be found at https://github.com/SIVAteam/SIVA-Suite/releases.

The SIVA Server provides a platform for hypervideos and evaluations based on logging data. Furthermore, it provides user and rights management for copyright protected videos.

Videos exported by the producer are uploaded via the Web interface, extracted by the server, and can then be viewed on the server. It is furthermore possible to provide a link to a video, for example when the video is also available as a Chrome App, or to offer a zip file for download. The latter can be extracted locally on the end user's device and watched without an internet connection.

Users may have different roles (like user, administrator, etc.) and rights according to their roles. Furthermore, each user may be a member of one or more groups. Access to videos is assigned at group level, which ensures that the visibility of videos matches the demands of the author or copyright restrictions. Help for the SIVA Server can be found on its start page.
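Conceptually, the check whether a given user may watch a given video reduces to an intersection of group memberships, as in the illustrative sketch below (the names, roles, and structures are assumptions, not the server's actual data model).

```typescript
// Illustrative access check; names and structures are hypothetical.
interface User  { id: string; roles: string[]; groups: string[] }
interface Video { id: string; allowedGroups: string[] }

function canWatch(user: User, video: Video): boolean {
  if (user.roles.includes("administrator")) return true;          // admins see everything
  return video.allowedGroups.some(g => user.groups.includes(g));  // group-level visibility
}
```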

Figure 6. SIVA Server – player stats with usage view.

The server furthermore provides the SIVA Player Stats, the back end for the logging functionality of the player. This part of the application facilitates analyzing and evaluating the logged usage data: the data can be viewed, searched, exported, and visualized on a per-video basis. One of the currently available diagram views is the Sunburst diagram (see Figure 6), which shows how often viewers took certain paths through a video. Another is a Treemap which shows the different scenes of the video and the events in these scenes, where the size of each box represents the frequency of a single event. This part of the application is only accessible to administrators registered in the front end.
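The aggregation behind both views can be illustrated with a simplified sketch that counts path and event frequencies over logged sessions; it is not the actual Player Stats implementation, and the session format is an assumption.

```typescript
// Simplified aggregation over logged sessions; not the actual Player Stats code.
interface Session { sceneSequence: string[]; events: { sceneId: string; type: string }[] }

// Path frequencies feed a Sunburst-like view: how often each prefix of scenes was taken.
function pathFrequencies(sessions: Session[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of sessions) {
    for (let i = 1; i <= s.sceneSequence.length; i++) {
      const path = s.sceneSequence.slice(0, i).join(" > ");
      counts.set(path, (counts.get(path) ?? 0) + 1);
    }
  }
  return counts;
}

// Event frequencies per scene feed a Treemap-like view: box size = number of occurrences.
function eventFrequencies(sessions: Session[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of sessions) {
    for (const e of s.events) {
      const key = `${e.sceneId}:${e.type}`;
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}
```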

 


Implementation

For implementation details, please refer to the documentation on GitHub at https://github.com/SIVAteam/SIVA-Suite or to [8].


Conclusion and Future Work

In this column we present the SIVA Suite, an open source framework for the creation, playback, and administration of hypervideos. The authoring tool, the SIVA Producer, provides several editors, like the scene graph and annotation editors, as well as an export function. The hypervideo player, the SIVA Player, has extended controls and display areas as well as an intuitive design. The Web server, the SIVA Server, provides functions for user, group, and video management. The framework, especially the authoring tool and the player, was successfully used for the creation and playback of several hypervideos, most notably a medical hypervideo training (see [1] and [2]). Both tools have been evaluated in several usability tests and improved step by step since 2008.

While the framework already provides all necessary functions for the creation, playback, and management of hypervideos, several additional functions might be desirable. For now, video conversion is done in the producer during the export of a hypervideo. Especially when several video versions (regarding resolution, quality, or video format) are needed, this task can block the production machine for a long period. To improve productivity, video conversion could be moved to the server component. Furthermore, a player preview in the producer would be desirable, avoiding the need to export the hypervideo in order to watch it. While currently a created hypervideo can only be translated by manually copying its structure to a new project, input forms for multilingualism in the producer would make this task easier. Pushing the interaction part to a new level, viewers could benefit from a collaborative editing function in the player, allowing them to add comments or additional materials to a video. Additionally, splitting the contents of the player onto a second screen could allow for easier interaction with and perception of hypervideos, especially in sports or medical training scenarios. Implementing download and cache management in the player, as described in [9] and [10], may help to reduce waiting times at scene changes.
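As a rough illustration of that last idea, a player could prefetch the videos of all possible follow-up scenes while the current scene is playing. The sketch below assumes a simple fetch-based cache and ignores the priorities, bandwidth, and storage constraints handled in [9] and [10].

```typescript
// Naive prefetch sketch for follow-up scenes; the strategies in [9] and [10]
// are considerably more elaborate (priorities, bandwidth and storage limits).
const prefetched = new Map<string, Promise<Blob>>();

function prefetchFollowUps(currentSceneId: string,
                           successors: Map<string, string[]>,
                           videoUrl: (sceneId: string) => string): void {
  for (const next of successors.get(currentSceneId) ?? []) {
    if (!prefetched.has(next)) {
      prefetched.set(next, fetch(videoUrl(next)).then(r => r.blob()));
    }
  }
}
```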

 

References

[1] Katrin Tonndorf, Christian Handschigl, Julian Windscheid, Harald Kosch & Michael Granitzer. The effect of non-linear structures on the usage of hypervideo for physical training. In: 2015 IEEE International Conference on Multimedia and Expo (ICME), pp.1-6, 2015.

[2] Britta Meixner, Katrin Tonndorf, Stefan John, Christian Handschigl, Kai Hofmann, Michael Granitzer, Michael Langbauer & Harald Kosch. A Multimedia Help System for a Medical Scenario in a Rehabilitation Clinic. In: Proceedings of I-Know, 14th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW ’14). ACM, New York, NY, USA, 25:1-25:8, 2014.

[3] Frank Shipman, Andreas Girgensohn & Lynn Wilcox. Authoring, Viewing, and Generating Hypervideo: An Overview of Hyper-Hitchcock. In: ACM Trans. Multimedia Comput. Commun. Appl., ACM, 5, 15:1-15:19, 2008.

[4] Honkytonk Films Klynt, http://www.klynt.net/, Website (accessed May 18, 2015), 2015.

[5] Britta Meixner, Katarzyna Matusik, Christoph Grill & Harald Kosch. Towards an easy to use authoring tool for interactive non-linear video. In: Multimedia Tools and Applications, Volume 70, Number 2, Springer Netherlands, pp. 1251-1276, ISSN 1380-7501, 2014.

[6] Britta Meixner & Harald Kosch. Interactive non-linear video: definition and XML structure. In: Proceedings of the 2012 ACM symposium on Document engineering (DocEng ’12). ACM, New York, NY, USA, 49-58, 2012.

[7] Britta Meixner, Beate Siegel, Peter Schultes, Franz Lehner & Harald Kosch. An HTML5 Player for Interactive Non-linear Video with Time-based Collaborative Annotations. In: Proceedings of the 10th International Conference on Advances in Mobile Computing & Multimedia, MoMM '13, ACM, New York, NY, USA, pp. 490-499, 2013.

[8] Britta Meixner, Stefan John & Christian Handschigl. SIVA Suite: Framework for Hypervideo Creation, Playback and Management. In: Proceedings of the 23rd Annual ACM Conference on Multimedia Conference (MM ’15). ACM, New York, NY, USA, 713-716, 2015.

[9] Britta Meixner & Jürgen Hoffmann. Intelligent Download and Cache Management for Interactive Non-Linear Video. In: Multimedia Tools and Applications, Volume 70, Number 2, Springer Netherlands, pp. 905-948, ISSN 1380-7501, 2014.

[10] Britta Meixner. Annotated Interactive Non-linear Video – Software Suite, Download and Cache Management. Doctoral Thesis, University of Passau, 2014.

An interview with Klara Nahrstedt

Michael Riegler (MR): Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Prof. Klara Nahrstedt

Klara Nahrstedt (KN): From my youth I have been attracted to and interested in mathematics, physics and other sciences. However, since most of my family were electrical and computer engineers, I was surrounded by engineering gadgets and devices, and one of them was a very early computer, able to answer various quiz questions about the world. I liked this new device with its many possibilities. Therefore, my interests and my family's influence guided me towards an educational journey between science and engineering. I did my undergraduate studies in Mathematics and my Diploma work in Numerical Analysis at the Humboldt University of Berlin in East Germany. After the Berlin Wall came down in 1989, my educational journey led me to the Computer and Information Science Department at the University of Pennsylvania in Philadelphia, where I did my PhD degree and studied multimedia systems and networking.

My interest in multimedia came during my time at the Institute for Informatics, where I worked as a research programmer. This was the time after my Diploma Degree and after my System Administrator job at the Computer Center of the Ministry of Agriculture in East Berlin. It was the time when Europe, in contrast to the USA, invested heavily in the new ISO-defined X.25-based digital networking technology and, with it, in the new X.400 email system and its applications. One of the very interesting discussions at the time was how to transport via email not only text messages, but also digital audio and images. I wanted to be part of that discussion, since I believed that a picture (image) is worth a thousand words and that auditory interfaces would make it easier for users to enter messages than typing text. I wanted to help develop solutions that would enable the transport of these multi-modal media, and so my long journey into multimedia systems and networks started. After I joined the University of Pennsylvania, as part of my PhD work, I was exposed to the research in the GRASP laboratory, where researchers studied computer vision algorithms and cameras mounted on robots. As a researcher interested in networking and multimedia, it was very natural for me to explore the integrated multimedia networking problems of tele-robotic applications and to enable video and control information to be transported from remote robots to operators in order to visualize what the remote robot was doing. Since my PhD, the journey into a deep understanding of multimedia systems and networks continues as new knowledge, technologies, applications, and users emerge.

The foundational lessons that I learned from this journey are: (1) acquire very strong fundamental knowledge in science and humanities very early, independent of what future opportunities, jobs, interests, and circumstances guide you towards; (2) work hard and believe in yourself; and (3) keep learning continuously.

MR: Tell us more about the vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?

KN: During my professional life, I had three different roles: researcher, educator and provider of professional services in different functions.

  • As a researcher, my vision and objective are to provide theoretical and practical cyber-solutions that enable people to communicate with each other and with their physical environments in a seamless and trustworthy manner.
  • As an educator, my vision and objective were and are to educate as best I can the next generation of undergraduate and graduate students who are very well prepared to tackle the numerous new challenges in the fast changing human-cyber-physical environments.
  • In the space of professional services, I have served in various roles: as a member of numerous program committees; as an organizing member, chair, co-chair, or editor of IEEE and ACM professional venues; as the chair of the ACM Special Interest Group on Multimedia (SIGMM); as a member of various departmental and college committees; and now as the Director of the interdisciplinary research unit, the Coordinated Science Laboratory (CSL), in the College of Engineering at the University of Illinois at Urbana-Champaign. In each of these administrative and service roles, my vision and objective are to provide high-quality service to the community, whether it is a high-quality technical program at a conference or journal, a fair and balanced allocation of resources that advances the mission of SIGMM, or broad support of interdisciplinary work in CSL.

I hope to achieve the vision and objectives of my research, educational and professional service activities via hard work, continuous learning, willingness to listen to others, and a very strong collaboration with others, especially my students, colleagues and staff members that I interact with.

MR: Can you profile your current research, its challenges, opportunities, and implications?

KN: My current research moves in three different directions which have some commonalities, but also differences. The major commonality of my research is in aiming to solve the underlying joint performance and trust issues in resource management of multi-modal systems and networking that we find in the current human-cyber-physical systems. The three different directions of my research are: (a) 3D teleimmersive systems for tele-health, (b) trustworthy cyber-physical systems such as power-grid, oil and gas, and (c) trustworthy and timely cloud-based cyber-infrastructures for scientific instruments such as distributed microscopes.

In all of these directions, the challenges lie in providing real-time acquisition, distribution, analysis and retrieval of multi-modal data in conjunction with providing security, reliability and safety.

The opportunities in the areas of human-cyber-physical systems in health and critical infrastructures are enormous, as people are aging, physical infrastructures are being fully stressed, and multimedia devices are challenging every societal cyber-infrastructure by generating Big Data in terms of volume, velocity and variety.

We are living in truly exciting times as digital systems are getting more and more complex. The implication is that we have a lot of work to do and many challenges to solve as a multimedia systems and networking community, in collaboration with many other communities. It is very clear that a single computing community is not able to solve the many problems that are coming upon us in the space of multi-modal human-cyber-physical systems. Inter- and cross-disciplinary research is the call of the day.

MR: How would you describe the role of women especially in the field of multimedia?

KN: “Difficult” comes to my mind. The number of women in multimedia computing is small, and in multimedia systems and networks even smaller. I wish that the role and visibility of women in the multimedia technology field were greater when it comes to IEEE and ACM awards, conference leadership roles, editorial board memberships, participation in SIGMM technical challenges, and other visible events and roles. Multimedia technology has become a ubiquitous base for numerous application fields, including education, training, entertainment, health care, and social work, which have very strong representation of women in general. Hence, I believe that women in multimedia should play an even more crucial role in the future than today, especially in innovation, leadership, and the interconnection of multimedia computing technologies with the above-mentioned application fields.

MR: How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

KN: My top innovative achievements range from bringing a much better understanding to the field of Quality of Service (QoS) management and QoS routing for multimedia systems and networks, to developing novel real-time and trusted resource management architectures and protocols for complex multi-modal applications, systems and networks such as 3D teleimmersion, energy-efficient mobile multimedia, and the trustworthy smart grid, to name a few. My QoS research impact can be seen in current wide-area wired and wireless networks and systems. The impact of the resource management algorithms, architectures and systems that I and my research group have developed can be seen throughout Microsoft, Google, HP, and IBM solutions, where my graduate and undergraduate students took up employment and brought with them research results and knowledge that then made their way into multimedia applications, systems and network products.

MR: Over your distinguished career, what are your top lessons you want to share with the audience?

KN: The top lessons that I would like to share are: be patient, honest, open-minded, and fair; don't give up; be humble but don't be shy to “toot your own horn” when appropriate; listen to what others have to say; and be respectful to others, since everybody has something to contribute to the community and society in his/her own way.

 

MPEG Column: Press release for the 113th MPEG meeting

MPEG explores new frontiers for coding technologies with Genome Compression

Geneva, CH − The 113th MPEG meeting was held in Geneva, CH, from 19 – 23 October 2015

MPEG issues Call for Evidence (CfE) for Genome Compression and Storage

At its 113th meeting, MPEG has taken its first formal step toward leveraging its compression expertise to code an entirely new kind of essential information, i.e. the single recipe that describes each one of us as an individual: the human genome. A genome is composed of the DNA sequences, approximately 3 billion DNA base pairs, that make up the genetic information within each human cell. It is fundamentally the complete set of our hereditary information.

To aid in the representation and storage of this unique information, MPEG has issued a Call for Evidence (CfE) on Genome Compression and Storage with the goal of assessing the performance of new technologies for the efficient compression of genomic information compared to currently used file formats. This is vitally important because the amount of genomic and related information from a single sequencing can be as high as several terabytes (trillions of bytes).

Additional purposes of the call are to:

  • become aware of which additional functionalities (e.g. non-sequential access, lossy compression efficiency, etc.) are provided by these new technologies
  • collect information that may be used in drafting a future Call for Proposals

Responses to the CfE will be evaluated during the 114th MPEG meeting in February 2016.

Detailed information, including how to respond to the CfE, will soon be available as documents N15740 and N15739 at the 113th meeting website.

Future Video Coding workshop explores requirements and technologies for the next video codec

A workshop on Future Video Coding Applications and Technologies was held on October 21st, 2015 during the 113th MPEG meeting in Geneva. The workshop was organized to acquire relevant information about the context in which video coding will be operating in the future, and to review the status of existing technologies with merits beyond the capabilities of HEVC, with the goal of guiding future codec standardization activity.

The event featured speakers from the MPEG community and invited outside experts from industry and academia, and covered several topics related to video coding. Prof. Patrick Le Callet from the University of Nantes presented recent results in the field of objective and subjective video quality evaluation. Various applications of video compression were introduced by Prof. Doug Young Suh from Kyung Hee University, Stephan Wenger from Vidyo, Jonatan Samuelsson from Ericsson, and Don Wu from HiSilicon. Dr. Stefano Andriani presented the Digital Cinema Workflow from ARRI. Finally, Debargha Mukherjee from Google and Tim Terriberry from Mozilla gave an overview of recent algorithmic improvements in the development of the VP10 and Daala codecs, and discussed the motivation for the royalty-free video compression technologies developed by their companies.

The workshop took place at a very timely moment, when MPEG and VCEG (ITU-T SG16 Q6) experts decided to join efforts in developing extensions of the HEVC standard for HDR.

MPEG‑V 3rd Edition reaches FDIS status for communication between actors in virtual and physical worlds

At the 113th MPEG meeting, Parts 1-6 of the 3rd Edition of MPEG-V, to be published as ISO/IEC 23005-[1-6]:2016, have reached FDIS status, the final stage in the development of a standard prior to formal publication by ISO/IEC. MPEG-V specifies the architecture and associated representations to enable the interaction of digital content and virtual worlds with the physical world, as well as information exchange between virtual worlds. Features of MPEG-V enable the specification of multi-sensorial content associated with audio/video data, and the control of multimedia applications via advanced interaction devices. In this 3rd Edition, MPEG-V also includes technology for environmental and camera-related sensors, and 4D-theater effects.

Configurable decoder framework extended with new bitstream parser

The Reconfigurable Media Coding framework, MPEG's toolkit that enables the functions of a decoder to be expressed in terms of functional units and data models, has been extended by a new building block called Parser Instantiation from BSD. Parser Instantiation from BSD can interpret information about a bitstream that is described via the Bitstream Syntax Description Language (defined in ISO/IEC 23001-5) and automatically instantiate a functional unit that is able to correctly parse the bitstream. This will, for example, enable on-the-fly decoding of bitstreams that have been reconfigured for dedicated purposes. The specification of the new technology will be included in a new edition of ISO/IEC 23001-4, which has also been issued by MPEG at its 113th meeting.

Multimedia Preservation Application Format (MP-AF) is finalized

At the 113th MPEG meeting, the Multimedia Preservation Application Format (MP-AF, ISO/IEC 23000-15) has reached the Final Draft International Standard (FDIS) stage. This new standard provides standardized description information to enable users to plan, execute, and evaluate preservation operations (e.g., checking the integrity of preserved content, migrating preserved content from one system to another, replicating parts of or the entire preserved content, etc.) in order to achieve the objectives of digital preservation. The standard also provides the industry with a coherent and consistent approach to managing multimedia preservation so that it can be implemented in a variety of scenarios. This includes applications, systems, and methods as well as different hardware and software in varying administrative domains, independent of technological changes.

Implementation guidelines and reference software for MP-AF are under development and have reached Draft Amendment stage at the 113th MPEG meeting. This latest amendment contains examples of applying MP-AF to use cases from the media industry. The final International Standard of MP-AF is expected to be issued at the 114th meeting (February 2016, San Diego).

Seminar for Genome Compression Standardization planned for the 114th MPEG meeting

After its successful seminar on “Prospects on Genome Compression Standardization” held in Geneva during the 113th meeting, MPEG plans to hold another open seminar at its next meeting in San Diego, California on 23rd February 2016 to collect further input and perspectives on genome data standardization from parties interested in the acquisition and processing of genome data.

The main topics covered by the planned seminar presentations are:

  • New approaches, tools and algorithms to compress genome sequence data
  • Genome compression and genomic medicine applications
  • Objectives and issues of quality scores compression and impact on downstream analysis applications

All interested parties are invited to join the seminar to learn more about genome data processing challenges and planned MPEG standardization activities in this area, share opinions, and work together towards the definition of standard technologies supporting improved storage, transport and new functionality for the processing of genomic information.

The seminar is open to the public and registration is free of charge.

Other logistical information on the 114th MPEG meeting is available online, together with the detailed program of the seminar.

How to contact MPEG, learn more, and find other MPEG facts

To learn about MPEG basics, discover how to participate in the committee, or find out more about the array of technologies developed or currently under development by MPEG, visit MPEG’s home page at http://mpeg.chiariglione.org. There you will find information publicly available from MPEG experts past and present including tutorials, white papers, vision documents, and requirements under consideration for new standards efforts.

Examples of tutorials that can be found on the MPEG homepage include tutorials for High Efficiency Video Coding, Advanced Audio Coding, Unified Speech and Audio Coding, and DASH, to name a few. A rich repository of white papers can also be found and continues to grow. These papers and tutorials for many of MPEG's standards are freely available. Press releases from previous MPEG meetings are also available. Journalists who wish to receive MPEG Press Releases by email should contact Dr. Arianne T. Hinds at a.hinds@cablelabs.com.

Further Information

Future MPEG meetings are planned as follows:

No. 114, San Diego, CA, USA, 22 – 26 February 2016
No. 115, Geneva, CH, 30 May – 3 June 2016
No. 116, Chengdu, CN, 17 – 21 October 2016
No. 117, Geneva, CH, 16 – 20 January, 2017