SIGMM Conferences and Journals Ranked High by CCF

SIGMM Conferences and Journals Ranked High by the China Computer Federation (CCF)

The China Computer Federation (CCF) ranking list provides a ranking of peer-reviewed conferences and journals in the broad area of computer science. This list is widely consulted by academic institutions in China as a quality metric for PhD promotions and tenure-track jobs.

The CCF ranking released in 2013 for “Multimedia and Graphics” is at the following link (one may use Google Translate to view the web page in English or other desired language):

http://www.ccf.org.cn/sites/ccf/biaodan.jsp?contentId=2567814757424

The CCF 2013 ranking for “Multimedia and Graphics” conferences and journals is first split into sections A, B, and C, and the conferences are ranked numerically within each section. We are very pleased and excited to share the news that the conferences and journals sponsored by SIGMM have been ranked high in the CCF list!

For Multimedia and Graphics conferences:

ACM Multimedia was ranked the highest in the A section, and the ACM International Conference on Multimedia Retrieval (ICMR) was ranked the highest in the B section.

For Multimedia and Graphics journals:

ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) was ranked top in the B section.

This recognition for SIGMM sponsored publishing avenues has been entirely due to the tireless and significant efforts from the organizers, steering committees, SIGMM officers, and most importantly, the multimedia research community.

MPEG Column: 105th MPEG Meeting

— original post by Christian Timmerer, AAU, on the Multimedia Communication blog

 

Opening plenary, 105th MPEG meeting, Vienna, Klagenfurt

At the 105th MPEG meeting in Vienna, Austria, a lot of interesting things happened. First, this was not only the 105th MPEG meeting but also the 48th VCEG meeting, 14th JCT-VC meeting, 5th JCT-3V meeting, and 26th SC29 meeting bringing together more than 400 experts from more than 20 countries to discuss technical issues in the domain of coding of audio, [picture (SC29 only),] multimedia and hypermedia information. Second, it was the 3rd meeting hosted in Austria after the 62nd in July 2002 and 77th in July 2006. In 2002, “the new video coding standard being developed jointly with the ITU-T VCEG organization was promoted to Final Committee Draft (FCD)” and in 2006 “MPEG Surround completed its technical work and has been submitted for final FDIS balloting” as well as “MPEG has issued a Final Call for Proposals on MPEG-7 Query Format (MP7QF)”.

The official press release of the 105th meeting can be found here, but I’d like to highlight a couple of interesting topics, including the research aspects covered or enabled by them. Research efforts may lead to standardization activities, but standardization also enables research, as you may see below.

MPEG selects technology for the upcoming MPEG-H 3D audio standard

Based on the responses submitted to the Call for Proposals (CfP) on MPEG-H 3D audio, MPEG selected technology supporting content based on multiple formats, i.e., channels and objects (CO) and higher order ambisonics (HOA). All submissions were evaluated by comprehensive and standardized subjective listening tests followed by statistical analysis of the results. Interestingly, at the highest bitrate of 1.2 Mb/s with a 22.2 channel configuration, both of the selected technologies achieved excellent quality and come very close to true transparency, i.e., listeners cannot differentiate between the encoded version and the uncompressed original. A first version of the MPEG-H 3D audio standard, covering the higher bitrates from around 1.2 Mb/s down to 256 kb/s, should be available by March 2014 (Committee Draft – CD), July 2014 (Draft International Standard – DIS), and January 2015 (Final Draft International Standard – FDIS), respectively.

Research topics: Although the technologies have been selected, it’s still a long way until the standard gets ratified by MPEG and published by ISO/IEC. Thus, there’s still plenty of room for research on efficient encoding tools, including the subjective quality evaluations thereof. Additionally, it may impact the way 3D audio bitstreams are transferred from one entity to another, including file-based, streaming, on-demand, and live services. Finally, within the application domain it may enable new use cases which are interesting to explore from a research point of view.

Augmented Reality Application Format reaches FDIS status

The MPEG Augmented Reality Application Format (ARAF, ISO/IEC 23000-13) enables the augmentation of the real world with synthetic media objects by combining multiple, existing standards within a single specific application format addressing certain industry needs. In particular, it combines standards providing representation formats for scene description (i.e., subset of BIFS), sensor/actuator descriptors (MPEG-V), and media formats such as audio/video coding formats. There are multiple target applications which may benefit from the MPEG ARAF standard, e.g., geolocation-based services, image-based object detection and tracking, mixed and augmented reality games and real-virtual interactive scenarios.

Research topics: Please note that MPEG ARAF only specifies the format to enable interoperability in order to support use cases enabled by this format. Hence, there are many research topics which could be associated with the application domains identified above.

What’s new in Dynamic Adaptive Streaming over HTTP?

The DASH outcome of the 105th MPEG meeting comes with a couple of highlights. First, a public workshop was held on session management and control (#DASHsmc) which will be used to derive additional requirements for DASH. All position papers and presentations are publicly available here. Second, the first amendment (Amd.1) to part 1 of MPEG-DASH (ISO/IEC 23009-1:2012) has reached the final stage of standardization and together with the first corrigendum (Cor.1) and the existing part 1, the FDIS of the second edition of ISO/IEC 23009-1:201x has been approved. This includes support for event messages (e.g., to be used for live streaming and dynamic ad insertion) and a media presentation anchor which enables session mobility among others. Third and finally, the FDIS of conformance and reference software (ISO/IEC 23009-2) has been approved providing means for media presentation conformance, test vectors, a DASH access engine reference software, and various sample software tools.

Research topics: The MPEG-DASH conformance and reference software provides the ideal playground for researchers as it can be used both to generate and to consume bitstreams compliant to the standard. This playground could be used together with other open source tools from the DASH-IF, GPAC, and DASH@ITEC. Additionally, see also Open Source Column: Dynamic Adaptive Streaming over HTTP Toolset.

HEVC support in MPEG-2 Transport Stream and ISO Base Media File Format

After the completion of High Efficiency Video Coding (HEVC) – ITU-T H.265 | MPEG HEVC – at the 103rd MPEG meeting in Geneva, HEVC bitstreams can now be delivered using the MPEG-2 Transport Stream (M2TS) and files based on the ISO Base Media File Format (ISOBMFF). For the latter, the scope of the Advanced Video Coding (AVC) file format has been extended to also support HEVC, and this part of MPEG-4 has been renamed to the Network Abstraction Layer (NAL) file format. This file format now covers not only AVC and its family (Scalable Video Coding – SVC and Multiview Video Coding – MVC) but also HEVC.

Research topics: Research in the area of delivering audio-visual material is manifold and very well reflected in conferences/workshops like ACM MMSys and Packet Video and associated journals and magazines. For these two particular standards, it would be interesting to see the efficiency of the carriage of HEVC with respect to the overhead.

Publicly available MPEG output documents

The following documents shall become available at http://mpeg.chiariglione.org/ (availability in brackets – YY/MM/DD). If you have difficulties accessing one of these documents, please feel free to contact me.

  • Requirements for HEVC image sequences (13/08/02)
  • Requirements for still image coding using HEVC (13/08/02)
  • Text of ISO/IEC 14496-16/PDAM4 Pattern based 3D mesh compression (13/08/02)
  • WD of ISO/IEC 14496-22 3rd edition (13/08/02)
  • Study text of DTR of ISO/IEC 23000-14, Augmented reality reference model (13/08/02)
  • Draft Test conditions for HEVC still picture coding performance evaluation (13/08/02)
  • List of stereo and 3D sequences considered (13/08/02)
  • Timeline and Requirements for MPEG-H Audio (13/08/02)
  • Working Draft 1 of Video Coding for browsers (13/08/31)
  • Test Model 1 of Video Coding for browsers (13/08/31)
  • Draft Requirements for Full Gamut Content Distribution (13/08/02)
  • Internet Video Coding Test Model (ITM) v 6.0 (13/08/23)
  • WD 2.0 MAR Reference Model (13/08/13)
  • Call for Proposals on MPEG User Description (MPEG-UD) (13/08/02)
  • Use Cases for MPEG User Description (13/08/02)
  • Requirements on MPEG User Description (13/08/02)
  • Text of white paper on MPEG Query Format (13/07/02)
  • Text of white paper on MPEG-7 AudioVisual Description Profile (AVDP) (13/07/02)

MPEG Column: Press release for the 104th MPEG meeting

Multimedia ecosystem event focuses on a broader scope of MPEG standards

The 104th MPEG meeting was held in Incheon, Korea, from 22 to 26 April 2013.

MPEG hosts Multimedia Ecosystem 2013 Event

During its 104th meeting, MPEG has hosted the MPEG Multimedia Ecosystem event to raise awareness of MPEG’s activities in areas not directly related to compression. In addition to world class standards for compression technologies, MPEG has developed media-related standards that enrich the use of multimedia such as MPEG-M for Multimedia Service Platform Technologies, MPEG-U for Rich Media User Interfaces, and MPEG-V for interfaces between real and virtual worlds. Also, new activities such as MPEG Augmented Reality Application Format, Compact Descriptors for Visual Search, Green MPEG for energy efficient media coding, and MPEG User Description are currently in progress. The event was organized with two sessions including a workshop and demonstrations. The workshop session introduced the seven standards described above while the demonstration session showed 17 products based on these standards.

MPEG issues CfP for Energy-Efficient Media Consumption (Green MPEG)

At the 104th MPEG meeting, MPEG has issued a Call for Proposals (CfP) on energy-efficient media consumption (Green MPEG) which is available in the public documents section at http://mpeg.chiariglione.org/. Green MPEG is envisaged to provide interoperable solutions for energy-efficient media decoding and presentation as well as energy-efficient media encoding based on encoder resources or receiver feedback. The CfP solicits responses that use compact signaling to facilitate reduced consumption from the encoding, decoding and presentation of media content without any degradation in the Quality of Experience (QoE). When power levels are critically low, consumers may prefer to sacrifice their QoE for reduced energy consumption. Green MPEG will provide this capability by allowing energy consumption to be traded off with the QoE. Responses to the call are due at the 105th MPEG meeting in July 2013.

APIs enable access to other MPEG technologies via MXM

The MPEG eXtensible Middleware (MXM) API technology specifications (ISO/IEC 23006-2) have reached the status of International Standard at the 104th MPEG meeting. MXM specifies the means to access individual MPEG tools through standardized APIs and is expected to help the creation of a global market of MXM applications that can run on devices supporting MXM APIs in addition to the other MPEG technologies. The MXM standard should also help the deployment of innovative business models because it will enable the easy design and implementation of media-handling value chains. The standard also provides reference software as open source with a business friendly license. The introductory part of the MXM family of specifications, 23006-1 MXM architecture and technologies, will soon be also freely available on the ISO web site.

MPEG introduces MPEG 101 with multimedia

MPEG has taken a further step toward communicating information about its standards in an easy and user-friendly manner, i.e., MPEG 101 with multimedia. MPEG 101 with multimedia will provide video clips containing overviews of individual standards along with explanations of the benefits that can be achieved by each standard, and will be available from the MPEG web site (http://mpeg.chiariglione.org/). During this 104th MPEG meeting, the first video clip, on the Unified Speech and Audio Coding (USAC) standard, has been prepared. USAC is the newest MPEG Audio standard, which was issued in 2012. It provides performance as good as or better than state-of-the-art codecs that are designed specifically for a single class of content, such as just speech or just music, and it does so for any content type, such as speech, music or a mix of speech and music. Over its target operating bit rate range, 12 kb/s for mono signals through 32 kb/s for stereo signals, USAC provides significantly better performance than the benchmark codecs, and continues to provide better performance as the bitrate is increased to higher rates. MPEG will apply the MPEG 101 with multimedia communication tool to other MPEG standards in the near future.

Digging Deeper – How to Contact MPEG

Communicating the large and sometimes complex array of technology that the MPEG Committee has developed is not a simple task. Experts, past and present, have contributed a series of tutorials and vision documents that explain each of these standards individually. The repository is growing with each meeting, so if something you are interested in is not yet there, it may appear shortly – but you should also not hesitate to request it. You can start your MPEG adventure at http://mpeg.chiariglione.org/

Further Information

Future MPEG meetings are planned as follows:

  • No. 105, Vienna, AT, 29 July – 2 August 2013
  • No. 106, Geneva, CH, 28 October – 1 November 2013
  • No. 107, San Jose, CA, USA, 13 – 17 January 2014
  • No. 108, Valencia, ES, 31 March – 04 April 2014

For further information about MPEG, please contact:
Dr. Leonardo Chiariglione (Convenor of MPEG, Italy)
Via Borgionera, 103
10040 Villar Dora (TO), Italy
Tel: +39 011 935 04 61
leonardo@chiariglione.org

or

Dr. Arianne T. Hinds
Cable Television Laboratories
858 Coal Creek Circle, Louisville, Colorado 80027, USA
Tel: +1 303 661 3419
a.hinds@cablelabs.com.

The MPEG homepage also has links to other MPEG pages that are maintained by the MPEG subgroups. It also contains links to public documents that are freely available for download by those who are not MPEG members. Journalists that wish to receive MPEG Press Releases by email should contact Dr. Arianne T. Hinds at a.hinds@cablelabs.com.

MediaEval Multimedia Benchmark: Highlights from the Ongoing 2013 Season

MediaEval is an international multimedia benchmarking initiative that offers tasks to the multimedia community that are related to the human and social aspects of multimedia. The focus is on addressing new challenges in the area of multimedia search and indexing that allow researchers to make full use of multimedia techniques that simultaneously exploit multiple modalities. A series of interesting tasks is currently underway in MediaEval 2013. As every year, the selection of tasks is made using a community-wide survey that gauges what multimedia researchers would find most interesting and useful. A new task to watch closely this year is Search and Hyperlinking of Television Content, which follows on the heels of a very successful pilot last year. The other main tasks running this year are:

The tagline of the MediaEval Multimedia Benchmark is: “The ‘multi’ in multimedia: speech, audio, visual content, tags, users, context”. This tagline explains the inspiration behind the choice of the Brave New Tasks, which are running for the first time this year. Here, we would like to highlight Question Answering for the Spoken Web, which builds on the Spoken Web Search tasks mentioned above. This task is a joint effort between MediaEval and the Forum for Information Retrieval Evaluation, an India-based information retrieval benchmark. MediaEval believes strongly in collaboration and complementarity between benchmarks, and we hope that this task will help us to better understand how joint tasks can best be designed and coordinated. The other Brave New Tasks at MediaEval this year are:

The MediaEval 2013 season culminates with the MediaEval 2013 workshop, which will take place in Barcelona, Catalunya, Spain, on Friday-Saturday 18-19 October 2013. Note that this is just before ACM Multimedia 2013, which will be held Monday-Friday 21-25 October 2013, also in Barcelona. We are currently finalizing the registration site for the workshop; it will open very soon and will be announced on the MediaEval website. In order to further foster our understanding and appreciation of user-generated multimedia, each year we designate a MediaEval filmmaker to make a YouTube video about the workshop. The MediaEval 2012 workshop video was made by John N.A. Brown and has recently appeared online. John decided to focus on the sense of community and the laughter that he observed at the workshop. Interestingly, his focus recalls the work done at MediaEval 2010 on the role of laughter in social video, see:

http://www.youtube.com/watch?v=z1bjXwxkgBs&feature=youtu.be&t=1m29s

We hope that this video inspires you to join us in Barcelona.

Call for Bids: ACM Multimedia 2016

The ACM Special Interest Group on Multimedia is inviting bids to host its flagship conference, the “ACM International Conference on Multimedia”. Bids are invited for holding the 2016 conference in EUROPE.

The details of the two required bid documents, evaluation procedure and deadlines are explained below (find updates and corrections at http://disi.unitn.it/~sebe/2016_bid.txt). The bids are due on September 2, 2013.

Required Bid Documents

Two documents are required:

  1. Bid Proposal: This document outlines all of the details except the budget. The proposal should contain:
    1. The organizing team: Names and brief bios of General Chairs, Program Chairs and Local Arrangements Chairs. Names and brief bios of at least one chair (out of the two) each for Workshops, Panels, Video, Brave New Ideas, Interactive Arts, Open Source Software Competition, Multimedia Grand Challenge, Tutorials, Doctoral Symposium, Preservation and Technical Demos. It is the responsibility of the General Chairs to obtain consent of all of the proposed team members. Please note that the SIGMM Executive Committee may suggest changes in the team composition for the winning bids. Please make sure that everyone who has been initially contacted understands this.
    2. The Venue: the details of the proposed conference venue including the location, layout and facilities. The layout should facilitate maximum interaction between the participants. It should provide for the normal required facilities for multimedia presentations including internet access. Please note that the 2016 ACM Multimedia Conference will be held in Europe.
    3. Accommodation: the bids should indicate a range of accommodations, catering for student, academic and industry attendees with easy as well as quick access to the conference venue. Indicative costs should be provided. Indicative figures for lunches/dinners and local transport costs for the location must be provided.
    4. Accessibility: the venue should be easily accessible to participants from Americas, Europe and Asia (the primary sources of attendees). Indicative cost of travel from these major destinations should be provided.
    5. Other aspects:
      1. commitments from the local government and organizations
      2. committed financial and in-kind sponsorships
      3. institutional support for local arrangement chairs
      4. conference date in September/October/November which does not clash with any major holidays or other major related conferences
      5. social events to be held with the conference
      6. possible venue(s) for the TPC Meeting
      7. any innovations to be brought into the conference
      8. cultural/scenic/industrial attractions
  2. Tentative Budget: The entire cost of holding the conference with realistic estimated figures should be provided. This template budget sheet should be used for this purpose: http://disi.unitn.it/~sebe/ACMMM_Budget_Template.xls. Please note that the sheet is quite detailed and you may not have all of the information; please try to fill it in as much as possible. All committed sponsorships for conference organization, meals, student subsidy and awards must be highlighted. Please note that estimated registration costs for ACM members, non-members and students will be required for preparing the budget. Estimates of the number of attendees will also be required.

Feedback from ACM Multimedia Steering Committee:

The bid documents will also be submitted to the ACM Multimedia Steering Committee. The feedback of this committee will have to be incorporated in the final submission of the proposal.

Bid Evaluation Procedure

Bids will be evaluated on the basis of:

  1. Quality of the Organizing Team (both technical strengths and conference organization experience)
  2. Quality of the Venue (facilities and accessibility)
  3. Affordability of the Venue (travel, stay and registration) to the participants
  4. Viability of the Budget: Since SIGMM fully sponsors this conference and it does not have reserves, the aim is to minimize the probability of making a loss and maximize the chances of making a small surplus.

The winning bid will be decided by the SIGMM Executive Committee by vote.

Bid Submission Procedure

Please upload the two required documents and any other supplementary material to a website. The general chairs should then email the formal intent to host, along with the URL of the bid documents website, to the SIGMM Chair (sfchang@ee.columbia.edu) and the Director of Conferences (sebe@disi.unitn.it) by Sep 02, 2013.

Time-line

Sep 02, 2013:
Bid URL to be submitted to SIGMM Chair and Director of Conferences
Sep 2013:
Bids open for viewing by SIGMM Executive Committee and ACM Multimedia Steering Committee
Oct 01, 2013:
Feedback from SIGMM Executive Committee and ACM Multimedia Steering Committee made available
Oct 15, 2013:
Bid Documents to be finalized
Oct 15, 2013:
Bids open for viewing by all SIGMM Members
Oct 24, 2013:
10-min Presentation of each Bid at ACM Multimedia 2013
Oct 25, 2013:
Decision by the SIGMM Executive Committee

Please note that there is a separate conference organization procedure which kicks in for the winning bids whose details can be seen at: http://www.acm.org/sigs/volunteer_resources/conference_manual

FXPAL hires Dick Bulterman to replace Larry Rowe as President

FX Palo Alto Laboratory announced on July 1, 2013, the hiring of Prof.dr. D.C.A. (Dick) Bulterman to be President and COO.  He will replace Dr. Lawrence A. Rowe on October 1, 2013. Dick is currently a Research Group Head of Distributed and Interactive Systems at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands. He is also a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. His recent research concerns socially-aware multimedia, interactive television, and media analysis. Together with his CWI colleagues, he has won a series of best paper and research awards within the multimedia community, and has managed a series of successful large-scale European Union projects. He was also the founder and first managing director of Oratrix Development BV.

Dick commented on his appointment: “I am very pleased to be able to join an organization with FXPAL’s reputation and scope. Becoming President is a privilege and a challenge. I look forward to working with the rest of the FXPAL team to continue to build a first-class research organization that addresses significant problems in information capture, sharing, archiving and visualization. I especially hope to strengthen ties with academic research teams around the world via joint projects and an active visitor’s program in Palo Alto.”

Larry Rowe has led FXPAL for the past six years.  During his tenure, he significantly upgraded the physical facilities and computing infrastructure and hired six PhD researchers. Working with VP Research Dr. Lynn Wilcox, he led the lab in pursuing a variety of research projects involving distributed collaboration (e.g., virtual and mixed reality systems), image and multimedia processing (e.g., Embedded Media Markers), information access (e.g., TalkMiner lecture video search system and Querium exploratory search system), video authoring and playback, awareness and presence systems (e.g., MyUnity), and mobile/cloud applications.  He worked with VP Marketing & Business Development Ram Sriram to develop the Fuji Xerox SkyDesk cloud applications product offering. Larry commented, “it has been an honor to lead FXPAL during this exciting time, particularly the changes brought about by the internet, mobile devices, and cloud computing.”

Larry will spend the next several months working on applications using media wall technology to simplify presentations and demonstrations that incorporate multiple windows displaying content from mobile devices and cloud applications.

FXPAL is a leading multimedia and human-computer interface research laboratory established in 1995 by Fuji Xerox Co., Ltd. FXPAL researchers invent technologies that address key issues affecting businesses and society. For more information, visit http://www.fxpal.com.

Open Source Column: OpenIMAJ – Intelligent Multimedia Analysis in Java

Introduction

Multimedia analysis is an exciting and fast-moving research area. Unfortunately, historically there has been a lack of software solutions in a common programming language for performing scalable integrated analysis of all modalities of media (images, videos, audio, text, web-pages, etc.). For example, in the image analysis world, OpenCV and Matlab are commonly used by researchers, whilst many common Natural Language Processing tools are built using Java. The lack of coherency between these tools and languages means that it is often difficult to research and develop rational, comprehensible and repeatable software implementations of algorithms for performing multimodal multimedia analysis. These problems are also exacerbated by the lack of principled software engineering (separation of concerns, minimised code repetition, maintainability, understandability, avoidance of premature and over-optimisation) often found in research code. OpenIMAJ is a set of libraries and tools for multimedia content analysis and content generation that aims to fill this gap and address these concerns. OpenIMAJ provides a coherent interface to a very broad range of techniques, and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection and description, etc.) and advanced data clustering and hashing, through to software that performs analysis on the content, layout and structure of webpages. A full list of all the modules and an overview of their functionalities for the latest OpenIMAJ release can be found here. OpenIMAJ is primarily written in Java and, as such, is completely platform independent. The video-capture and hardware libraries contain some native code but Linux, OSX and Windows are supported out of the box (under both 32 and 64 bit JVMs; ARM processors are also supported under Linux). It is possible to write programs that use the libraries in any JVM language that supports Java interoperability, for example Groovy and Scala. OpenIMAJ can even be run on Android phones and tablets. As it’s written using Java, you can run any application built using OpenIMAJ on any of the supported platforms without even having to recompile the code.

Some simple programming examples

The following code snippets and illustrations aim to give you an idea of what programming with OpenIMAJ is like, whilst showing some of the powerful features.
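
Before looking at the more advanced snippets below, the most basic OpenIMAJ pattern is simply loading an image and displaying it. The following minimal sketch uses the same ImageUtilities and DisplayUtilities helpers that appear in the streaming example later in this article; the image URL is a hypothetical placeholder.

  import java.net.URL;

  import org.openimaj.image.DisplayUtilities;
  import org.openimaj.image.ImageUtilities;
  import org.openimaj.image.MBFImage;

  public class HelloOpenIMAJ {
      public static void main(String[] args) throws Exception {
          // Load a multi-band (colour) image from a URL and show it in a window
          MBFImage img = ImageUtilities.readMBF(new URL("http://example.com/image.jpg"));
          DisplayUtilities.display(img);
      }
  }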

	...
	// A simple Haar-Cascade face detector
	HaarCascadeDetector det1 = new HaarCascadeDetector();
	DetectedFace face1 = det1.detectFaces(img).get(0);
	new SimpleDetectedFaceRenderer().drawDetectedFace(mbf,10,face1);

	// Get the facial keypoints
	FKEFaceDetector det2 = new FKEFaceDetector();
	KEDetectedFace face2 = det2.detectFaces(img).get(0);
	new KEDetectedFaceRenderer().drawDetectedFace(mbf,10,face2);

	// With the CLM Face Model
	CLMFaceDetector det3 = new CLMFaceDetector();
	CLMDetectedFace face3 = det3.detectFaces(img).get(0);
	new CLMDetectedFaceRenderer().drawDetectedFace(mbf,10,face3);
	...
Face detection, keypoint localisation and model fitting
	...
	// Find the features
	DoGSIFTEngine eng = new DoGSIFTEngine();
	List<Keypoint> sourceFeats = eng.findFeatures(sourceImage);
	List<Keypoint> targetFeats = eng.findFeatures(targetImage);

	// Prepare the matcher
	final HomographyModel model = new HomographyModel(5f);
	final RANSAC ransac = new RANSAC(model, 1500, 
	        new RANSAC.BestFitStoppingCondition(), true);
	ConsistentLocalFeatureMatcher2d matcher = 
	    new ConsistentLocalFeatureMatcher2d
	        (new FastBasicKeypointMatcher(8));

	// Match the features
	matcher.setFittingModel(ransac);
	matcher.setModelFeatures(sourceFeats);
	matcher.findMatches(targetFeats);
	...
Finding and matching SIFT keypoints
	...
	// Access First Webcam
	VideoCapture cap = new VideoCapture(640, 480);

	//grab a frame
	MBFImage last = cap.nextFrame().clone();

	// Process Video
	VideoDisplay.createOffscreenVideoDisplay(cap)
	    .addVideoListener(
	        new VideoDisplayListener<MBFImage>() {
	        public void beforeUpdate(MBFImage frame) {
	            frame.subtractInplace(last).abs();
	            last = frame.clone();
	        }
	...
Webcam access and video processing

The OpenIMAJ design philosophy

One of the main goals in the design and implementation of OpenIMAJ was to keep all components as modular as possible, providing a clear separation of concerns whilst maximising code reusability, maintainability and understandability. At the same time, this makes the code easy to use and extend. For example, the OpenIMAJ difference-of-Gaussian SIFT implementation allows different parts of the algorithm to be replaced or modified at will without having to modify the source-code of the existing components; an example of this is our min-max SIFT implementation [1], which allows more efficient clustering of SIFT features by exploiting the symmetry of features detected at minima and maxima of the scale-space. Implementations of commonly used algorithms are also made as generic as possible; for example, the OpenIMAJ RANSAC implementation works with generic Model objects and doesn’t care whether the specific model implementation is attempting to fit a homography to a set of point-pair matches or a straight line to samples in a space. Primitive media types in OpenIMAJ are also kept as simple as possible: Images are just an encapsulation of a 2D array of pixels; Videos are just encapsulated iterable collections/streams of images; Audio is just an encapsulated array of samples. The speed of individual algorithms in OpenIMAJ has not been a major development focus; however, OpenIMAJ cannot be called slow. For example, most of the algorithms implemented in both OpenIMAJ and OpenCV run at similar rates, and things such as SIFT detection and face detection can be run in real-time. Whilst the actual algorithm speed has not been a particular design focus, scalability of the algorithms to massive datasets has. Because OpenIMAJ is written in Java, it is trivial to integrate it with tools for distributed data processing, such as Apache Hadoop. Using the OpenIMAJ Hadoop tools [3] on our small Hadoop cluster, we have extracted and indexed visual term features from datasets with sizes in excess of 50 million images. The OpenIMAJ clustering implementations are able to cluster larger-than-memory datasets by reading data from disk as necessary.
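
To illustrate the point that images really are just 2D arrays of pixels, the short sketch below creates a greyscale FImage and writes a gradient directly into its public pixels array before displaying it. This is a minimal sketch based on the core image classes; field and method names should be double-checked against the OpenIMAJ javadoc.

  import org.openimaj.image.DisplayUtilities;
  import org.openimaj.image.FImage;

  public class PixelSketch {
      public static void main(String[] args) {
          // An FImage is essentially a 2D float array with pixel values in [0, 1]
          FImage img = new FImage(256, 256);
          for (int y = 0; y < img.getHeight(); y++)
              for (int x = 0; x < img.getWidth(); x++)
                  img.pixels[y][x] = x / 255f;  // simple horizontal gradient
          DisplayUtilities.display(img);
      }
  }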

A history of OpenIMAJ

OpenIMAJ was first made public in May 2011, just in time to be entered into the 2011 ACM Multimedia Open-Source Software Competition [2], which it went on to win. OpenIMAJ was not written overnight, however. As shown in the following picture, parts of the original codebase came from projects as long ago as 2005. Initially, the features were focused around image analysis, with a concentration on image features used for CBIR (i.e. global histogram features), features for image matching (i.e. SIFT) and simple image classification (i.e. cityscape versus landscape classification).

A visual history of OpenIMAJ

As time went on, the list of features began to grow; firstly with more implementations of image analysis techniques (i.e. connected components, shape analysis, scalable bags-of-visual-words, face detection, etc). This was followed by support for analysing more types of media (video, audio, text, and web-pages), as well as implementations of more general techniques for machine learning and clustering. In addition, support for various hardware devices and video capture was added. Since its initial public release, the community of people and organisations using OpenIMAJ has continued to grow, and includes a number of internationally recognised companies. We also have an active community of people reporting (and helping to fix) any bugs or issues they find, and suggesting new features and improvements. Last summer, we had a single intern working with us, using and developing new features (in particular with respect to text analysis and mining functionality). This summer we’re expecting two or three interns who will help us leverage OpenIMAJ in the 2013 MediaEval campaign. From the point-of-view of the software itself, the number of features in OpenIMAJ continues to grow on an almost daily basis. Since the initial release, the core codebase has become much more mature and we’ve added new features and implementations of algorithms throughout. We’ve picked a couple of the highlights from the latest release version and the current development version below:

Reference Annotations

As academics we are quite used to the idea of thoroughly referencing the ideas and work of others when we write a paper. Unfortunately, this is not often carried forward to other forms of writing, such as the writing of the code for computer software. Within OpenIMAJ, we implement and expand upon much of our own published work, but also the published work of others. For the 1.1 release of OpenIMAJ we decided that we wanted to make it explicit where the idea for an implementation of each algorithm and technique came from. Rather than haphazardly adding references and citations in the Javadoc comments, we decided that the process of referencing should be more formal, and that the references should be machine readable. These machine-readable references are automatically inserted into the generated documentation, and can also be accessed programmatically. It’s even possible to automatically generate a bibliography of all the techniques used by any program built on top of OpenIMAJ. For more information, take a look at this blog post. The reference annotations are part of a bigger framework currently under development that aims to encourage better code development for experimentation purposes. The overall aim of this is to provide the basis for repeatable software implementations of experiments and evaluations, with automatic gathering of the basic statistics that all experiments should have, together with more specific statistics based on the type of evaluation (i.e. ROC statistics for classification experiments; TREC-style Precision-Recall for information retrieval experiments, etc).
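
To make the idea of machine-readable references concrete, here is a small self-contained sketch of the general mechanism: a runtime-retained annotation carries the bibliographic data on the class that implements a technique, and a tool can later collect those annotations reflectively to build a bibliography. Note that the annotation below is a hypothetical illustration of the concept, not the actual OpenIMAJ annotation type, whose fields should be looked up in the OpenIMAJ documentation.

  import java.lang.annotation.Retention;
  import java.lang.annotation.RetentionPolicy;

  // Hypothetical citation annotation, for illustration only (not the real OpenIMAJ type)
  @Retention(RetentionPolicy.RUNTIME)
  @interface Cite {
      String author();
      String title();
      int year();
  }

  // The class implementing a technique carries its reference with it
  @Cite(author = "D. G. Lowe",
        title = "Distinctive Image Features from Scale-Invariant Keypoints",
        year = 2004)
  class MySiftLikeDetector {
      // ... detector implementation would go here ...
  }

  public class BibliographyDemo {
      public static void main(String[] args) {
          // A documentation or bibliography tool can read the reference reflectively
          Cite c = MySiftLikeDetector.class.getAnnotation(Cite.class);
          System.out.printf("[1] %s. %s. %d.%n", c.author(), c.title(), c.year());
      }
  }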

Stream Processing Framework

Processing streaming data is a hot topic currently. We wanted to provide a way in OpenIMAJ to experiment with the analysis of streaming multimedia data (see the description of the “Twitter’s visual pulse” application below for example). The OpenIMAJ Stream classes in the development trunk of OpenIMAJ provide a way to effectively gather, consume, process and analyse streams of data. For example, in just a few lines of code it is possible to get and display all the images from the live Twitter sample stream:

  //construct your twitter api key
  TwitterAPIToken token = ...

  // Create a twitter dataset instance connected to the live twitter sample stream
  StreamingDataset<Status> dataset = new TwitterStreamingDataset(token, 1);

  //use the Stream#map() method to transform the stream so we get images
  dataset
    //process tweet statuses to produce a stream of URLs
    .map(new TwitterLinkExtractor())
    //filter URLs to just get those that are URLs of images
    .map(new ImageURLExtractor())
    //consume the stream and display images
    .forEach(new Operation<URL>() {
      public void perform(URL url) {
        DisplayUtilities.display(ImageUtilities.readMBF(url));
      }
    });

The stream processing framework handles a lot of the hard work for you. For example, it can optionally drop incoming items if you are unable to consume the stream at a fast enough rate (in this case it will gather statistics about what it has dropped). In addition to the Twitter live stream, we’ve provided a number of other stream source implementations, including one based on the Twitter search API and one based on IRC chat. The latter was used to produce a simple visualisation on a world map of where Wikipedia edits are currently happening.

Improved face pipeline

The initial OpenIMAJ release contained some support for face detection and analysis, however, this has been and continues to be improved. The key advantage OpenIMAJ has over other libraries such as OpenCV in this area is that it implements a complete pipeline with the following components:

  1. Face Detection
  2. Face Alignment
  3. Facial Feature Extraction
  4. Face Recognition/Classification

Each stage of the pipeline is configurable, and OpenIMAJ contains a number of different algorithm implementations for each stage as well as offering the possibility to easily implement more. The pipeline is designed to allow researchers to focus on a specific area of the pipeline without having to worry about the other components. At the same time, it is fairly easy to modify and evaluate a complete pipeline. In addition to the parts of the recognition pipeline, OpenIMAJ also includes code for tracking faces in videos and comparing the similarity of faces.
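
The sketch below illustrates this pipeline idea in the abstract rather than with the concrete OpenIMAJ classes (whose recogniser APIs should be checked against the documentation): each stage sits behind a small interface, so, for instance, the alignment strategy can be swapped without touching detection, feature extraction or recognition. The interface and class names here are hypothetical and purely illustrative.

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical stage interfaces, for illustration only (not the OpenIMAJ API)
  interface Detector<I, F>   { List<F> detect(I image); }
  interface Aligner<F, A>    { A align(F face); }
  interface Extractor<A, V>  { V extract(A alignedFace); }
  interface Recogniser<V>    { String recognise(V feature); }

  // A face pipeline is just the composition of its four configurable stages
  class FacePipeline<I, F, A, V> {
      private final Detector<I, F> detector;
      private final Aligner<F, A> aligner;
      private final Extractor<A, V> extractor;
      private final Recogniser<V> recogniser;

      FacePipeline(Detector<I, F> d, Aligner<F, A> a, Extractor<A, V> e, Recogniser<V> r) {
          detector = d; aligner = a; extractor = e; recogniser = r;
      }

      // Detect every face in the image and return the recognised identity for each
      List<String> run(I image) {
          List<String> identities = new ArrayList<String>();
          for (F face : detector.detect(image))
              identities.add(recogniser.recognise(extractor.extract(aligner.align(face))));
          return identities;
      }
  }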

Improved audio processing & analysis functionality

When OpenIMAJ was first made public, there was little support for audio processing and analysis beyond playback, resampling and mixing. As OpenIMAJ has matured, the audio analysis components have grown, and now include standard audio feature extractors for things such as Mel-Frequency Cepstrum Coefficients (MFCCs), and higher level analysers for performing tasks such as beat detection, and determining if an audio sample is human speech. In addition, we’ve added a large number of generation, processing and filtering classes for audio signals, and also provided an interface between OpenIMAJ audio objects and the CMU Sphinx speech recognition engine.

Example applications

Every year our research group holds a 2-3 day Hackathon where we stop normal work and form groups to do a mini-project. For the last two years we’ve built applications using OpenIMAJ as the base. We’ve provided a short description together with some links so that you can get an idea of the varied kinds of application OpenIMAJ can be used to rapidly create.

Southampton Goggles

In 2011 we built “Southampton Goggles”. The ultimate aim was to build a geo-localisation/geo-information system based on content-based matching of images of buildings on the campus taken with a mobile device; the idea was that one could take a photo of a building as a query, and be returned relevant information about that building as a response (i.e. which faculty/school is located in it, whether there are vending machines/cafés in the building, the opening times of the building, etc). The project had two parts: the first part was data collection in order to collect and annotate the database of images which we would match against. The second part involved indexing the images, and making the client and server software for the search engine. In order to rapidly collect images of the campus, we built a hand-portable streetview-like camera device with 6 webcams, a GPS and a compass. The software for controlling this used OpenIMAJ to interface with all the hardware and record images, location and direction at regular time intervals. The camera rig and software are shown below:

The Southampton Goggles Capture Rig

The Southampton Goggles Capture Software, built using OpenIMAJ

For the second part of the project, we used the SIFT feature extraction, clustering and quantisation abilities of OpenIMAJ to build visual-term representations of each image, and used our ImageTerrier software [3,4] to build an inverted index which could be efficiently queried. For more information on the project, see this blog post.

Twitter’s visual pulse

Last year, we decided that for our mini-project we’d explore the wealth of visual information on Twitter. Specifically we wanted to look at which images were trending based not on counts of repeated URLs, but on the detection of near-duplicate images hosted at different URLs. In order to do this, we used what has now become the OpenIMAJ stream processing framework, described above, to:

  1. ingest the Twitter sample stream,
  2. process the tweet text to find links,
  3. filter out links that weren’t images (based on a set of patterns for common image hosting sites),
  4. download and resample the images,
  5. extract SIFT features,
  6. use locality sensitive hashing to sketch each SIFT feature and store in an ensemble of temporal hash-tables.

This process happens continuously in real-time. At regular intervals, the hash-tables are used to build a duplicates graph, which is then filtered and analysed to find the largest clusters of duplicate images, which are then visualised. OpenIMAJ was used for all the constituent parts of the software: stream processing, feature extraction and LSH. The graph construction and filtering uses the excellent JGraphT library that is integrated into the OpenIMAJ core-math module. For more information on the “Twitter’s visual pulse” application, see the paper [5] and this video.
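
As a rough, self-contained illustration of step 6 above, the sketch below hashes a SIFT-like descriptor with random hyperplanes (a standard locality sensitive hashing scheme) and stores the resulting sketch in a hash table keyed by a time window, so that near-duplicate features arriving within the same window collide. This is a simplified stand-in for what the OpenIMAJ LSH implementation does, not the actual OpenIMAJ code.

  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Random;
  import java.util.Set;

  public class TemporalLshSketch {
      static final int DIM = 128;                     // SIFT descriptor length
      static final int BITS = 32;                     // number of hyperplanes, i.e. bits per sketch
      static final long WINDOW_MS = 10 * 60 * 1000;   // 10-minute temporal buckets

      final double[][] hyperplanes = new double[BITS][DIM];
      // time window -> (sketch -> image ids whose features produced that sketch)
      final Map<Long, Map<Integer, Set<String>>> tables = new HashMap<>();

      TemporalLshSketch(long seed) {
          Random rng = new Random(seed);
          for (double[] h : hyperplanes)
              for (int i = 0; i < DIM; i++)
                  h[i] = rng.nextGaussian();
      }

      // Random-hyperplane LSH: one bit per hyperplane, set by the sign of the dot product
      int sketch(float[] descriptor) {
          int bits = 0;
          for (int b = 0; b < BITS; b++) {
              double dot = 0;
              for (int i = 0; i < DIM; i++) dot += hyperplanes[b][i] * descriptor[i];
              if (dot > 0) bits |= (1 << b);
          }
          return bits;
      }

      // Returns image ids seen in the same time window with the same sketch (duplicate candidates)
      Set<String> addAndMatch(String imageId, float[] descriptor, long timestampMs) {
          long window = timestampMs / WINDOW_MS;
          int s = sketch(descriptor);
          Map<Integer, Set<String>> table = tables.computeIfAbsent(window, w -> new HashMap<>());
          Set<String> bucket = table.computeIfAbsent(s, k -> new HashSet<>());
          Set<String> candidates = new HashSet<>(bucket);
          bucket.add(imageId);
          return candidates;
      }
  }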

Erica the Rhino

This year, we’re involved in a longer-running hackathon activity to build an interactive artwork for a mass public art exhibition called Go! Rhinos that will be held throughout Southampton city centre over the summer. The Go! Rhinos exhibition features a large number of rhino sculptures that will inhabit the streets and shopping centres of Southampton. Our school has sponsored a rhino sculpture called Erica which we’ve loaded with Raspberry Pi computers, sensors and physical actuators. Erica is still under construction, as shown in the picture below:

Erica, the OpenIMAJ-powered interactive rhino sculpture

OpenIMAJ is being used to provide visual analysis from the webcams that we’ve installed as eyes in the rhino sculpture (shown below). Specifically, we’re using a Java program built on top of the OpenIMAJ libraries to perform motion analysis, face detection and QR-code recognition. The rhino-eye program runs directly on a Raspberry Pi mounted inside the sculpture.

Erica’s eye is a servo-mounted webcam, powered by software written using OpenIMAJ and running on a Raspberry Pi

For more information, check out Erica’s website and YouTube channel, where you can see a prototype of the OpenIMAJ-powered eye in action.

Conclusions

For software developers, the OpenIMAJ library facilitates the rapid creation of multimedia analysis, indexing, visualisation and content generation tools using state-of-the-art techniques in a coherent programming model. The OpenIMAJ architecture enables scientists and researchers to easily experiment with different techniques, and provides a platform for innovating new solutions to multimedia analysis problems. The OpenIMAJ design philosophy means that building new techniques and algorithms, combining different approaches, and extending and developing existing techniques, are all achievable. We welcome you to come and try OpenIMAJ for your multimedia analysis needs. To get started watch the introductory videos, try the tutorial, and look through some of the examples. If you have any questions, suggestions or comments, then don’t hesitate to get in contact.

Acknowledgements

Early work on the software that formed the nucleus of OpenIMAJ was funded by the European Union’s 6th Framework Programme, the Engineering and Physical Sciences Research Council, the Arts and Humanities Research Council, Ordnance Survey and the BBC. Current development of the OpenIMAJ software is primarily funded by the European Union Seventh Framework Programme under the ARCOMEM and TrendMiner projects. The initial public releases were also funded by the European Union Seventh Framework Programme under the LivingKnowledge project, together with the LiveMemories project, funded by the Autonomous Province of Trento.

Introducing the new Board of ACM SIG Multimedia

Statement of the new Board

As we celebrate the 20th anniversary of the ACM Multimedia conference, we are thrilled to see the rapidly expanding momentum in developing innovative multimedia technologies in both the commercial and academic worlds. This can easily be seen in the continuously growing attendance at the SIGMM flagship conference (ACM Multimedia) and in the action-packed technical demo sessions contributed by researchers and practitioners from around the world each year.

However, the community is also faced with a set of new challenges in the coming years:

  • how to solidify the intellectual foundation of the SIGMM community and gain broader recognition as a field;
  • how to rebuild the strong relations with other communities closely related to multimedia;
  • how to broaden the scope of the multimedia conferences and journals and encourage participation of researchers from other disciplines;
  • how to develop a stimulating environment for nurturing next-generation leaders, including stronger female participation in the community.

To respond to these challenges and explore new opportunities, we will actively explore the following initiatives:

  • create opportunities for stronger collaboration with relevant fields through joint workshops, tutorials, and special issues;
  • recruit active contributions from broad technical, geographical, and organizational areas;
  • strengthen the SIGMM brand by establishing community-wide resources such as SIGMM distinguished lecture series, open source software, and online courses;
  • expand the mentoring and educational activities for students and minority members.


Chair: Shih-Fu Chang

Shih-Fu Chang

shih.fu.chang@columbia.edu
Shih-Fu Chang is the Richard Dicker Professor and Director of the Digital Video and Multimedia Lab at Columbia University. He is an active researcher leading development of innovative technologies for multimedia information extraction and retrieval, while contributing to fundamental advances of the fields of machine learning, computer vision, and signal processing. Recognized by many paper awards and citation impacts, his scholarly work set trends in several important areas, such as content-based visual search, compressed-domain video manipulation, image authentication, large-scale high-dimensional data indexing, and semantic video search. He co-led the ADVENT university-industry research consortium with participation of more than 25 industry sponsors. He has received IEEE Signal Processing Society Technical Achievement Award, ACM SIG Multimedia Technical Achievement Award, IEEE Kiyo Tomiyasu Award, Service Recognition Awards from IEEE and ACM, and the Great Teacher Award from the Society of Columbia Graduates. He served as the Editor-in-Chief of the IEEE Signal Processing Magazine (2006-8), Chairman of Columbia Electrical Engineering Department (2007-2010), Senior Vice Dean of Columbia Engineering School (2012-date), and advisor for several companies and research institutes. His research has been broadly supported by government agencies as well as many industry sponsors. He is a Fellow of IEEE and the American Association for the Advancement of Science.

Vice Chair: Rainer Lienhart

Rainer Lienhart

rainer.lienhart@informatik.uni-augsburg.de
Rainer Lienhart is a full professor in the computer science department of the University of Augsburg, heading the Multimedia Computing & Computer Vision Lab (MMC Lab). His group focuses on all aspects of very large-scale image, video, and audio mining algorithms, including feature extraction, image/video retrieval, object detection, and human pose estimation. Rainer Lienhart has been an active contributor to OpenCV. From August 1998 to July 2004 he worked as a Staff Researcher at Intel’s Microprocessor Research Lab in Santa Clara, CA, on transforming a network of heterogeneous, distributed computing platforms into an array of audio/video sensors and actuators capable of performing complex DSP tasks such as distributed beamforming, audio rendering, audio/visual tracking, and camera array processing. At the same time, he was also continuing his work on media mining. He is well known for his work in image/video text detection/recognition, commercial detection, face detection, shot and scene detection, automatic video abstraction, and large-scale image retrieval. The scientific work of Rainer Lienhart covers more than 80 refereed publications and more than 20 patents. He was a general co-chair of ACM Multimedia 2007 and SPIE Storage and Retrieval of Media Databases 2004 & 2005. He serves on the editorial boards of three international journals. For more than a decade he has been a committee member of ACM Multimedia. Since July 2009 he has been the vice chair of SIGMM, and since April 2010 he has also been the executive director of the Institute for Computer Science at the University of Augsburg.

Director of Conferences: Nicu Sebe

Nicu Sebe

sebe@disi.unitn.it
Nicu Sebe is Associate Professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He has been involved in the organization of major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including serving as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG 2008) and of the ACM International Conference on Image and Video Retrieval (CIVR) 2007 and 2010. He is the general chair of ACM Multimedia 2013 and a program chair of ECCV 2016, and was a program chair of ACM Multimedia 2011 and 2007. He has been a visiting professor at the Beckman Institute, University of Illinois at Urbana-Champaign, and at the Electrical Engineering Department, Darmstadt University of Technology, Germany. He is a co-chair of the IEEE Computer Society Task Force on Human-centered Computing and is an associate editor of IEEE Transactions on Multimedia, Computer Vision and Image Understanding, Machine Vision and Applications, Image and Vision Computing, and the Journal of Multimedia.

Open Source Column: Dynamic Adaptive Streaming over HTTP Toolset

Introduction

Multimedia content is nowadays omnipresent thanks to technological advancements in the last decades. Major drivers of today’s networks are content providers like Netflix and YouTube, which do not deploy their own streaming architectures but provide their services over-the-top (OTT). Interestingly, this streaming approach performs well and adopts the Hypertext Transfer Protocol (HTTP), which was initially designed for best-effort file transfer and not for real-time multimedia streaming. The assumption of earlier video streaming research that streaming on top of HTTP/TCP would not work smoothly due to its retransmission delays and throughput variations has apparently been overcome, as supported by [1]. Streaming on top of HTTP, which is currently mainly deployed in the form of progressive download, has several other advantages. The infrastructure deployed for traditional HTTP-based services (e.g., Web sites) can also be exploited for real-time multimedia streaming. Typical problems of real-time multimedia streaming like NAT or firewall traversal do not apply to HTTP streaming. Nevertheless, there are certain disadvantages, such as fluctuating bandwidth conditions, that cannot be handled with the progressive download approach, which is a major drawback, especially for mobile networks where the bandwidth variations are tremendous. One of the first solutions to overcome the problem of varying bandwidth conditions has been specified within 3GPP as Adaptive HTTP Streaming (AHS) [2]. The basic idea is to encode the media file/stream into different versions (e.g., bitrate, resolution) and chop each version into segments of the same length (e.g., two seconds). The segments are provided on an ordinary Web server and can be downloaded through HTTP GET requests. The adaptation to the bitrate or resolution is done on the client-side for each segment, e.g., the client can switch to a higher bitrate – if bandwidth permits – on a per-segment basis. This has several advantages because the client knows best its capabilities, received throughput, and the context of the user. In order to describe the temporal and structural relationships between segments, AHS introduced the so-called Media Presentation Description (MPD). The MPD is an XML document that associates uniform resource locators (URLs) with the different qualities of the media content and the individual segments of each quality. This structure provides the binding of the segments to the bitrate (resolution, etc.), among other information (e.g., start time and duration of segments). As a consequence, each client first requests the MPD, which contains the temporal and structural information for the media content, and based on that information it requests the individual segments that best fit its requirements. Additionally, the industry has deployed several proprietary solutions, e.g., Microsoft Smooth Streaming [3], Apple HTTP Live Streaming [4] and Adobe Dynamic HTTP Streaming [5], which more or less adopt the same approach.
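
To make the per-segment adaptation idea concrete, the following minimal client sketch fetches segments over plain HTTP with the standard Java library, measures the throughput of each download, and then picks the representation for the next segment accordingly. The bitrate ladder and the segment URL template are hypothetical placeholders (in DASH they would come from the MPD), and the switching rule is deliberately simplistic.

  import java.io.InputStream;
  import java.net.URL;

  public class SimpleAdaptiveClient {
      // Hypothetical bitrate ladder (kbit/s) and URL template; in DASH these come from the MPD
      static final int[] BITRATES = { 350, 700, 1500, 3000 };
      static final String TEMPLATE = "http://example.com/video_%dkbit/segment_%d.m4s";

      public static void main(String[] args) throws Exception {
          int quality = 0;  // start conservatively at the lowest representation
          for (int segment = 1; segment <= 10; segment++) {
              URL url = new URL(String.format(TEMPLATE, BITRATES[quality], segment));
              long start = System.nanoTime();
              long bytes = 0;
              try (InputStream in = url.openStream()) {
                  byte[] buf = new byte[16 * 1024];
                  for (int n; (n = in.read(buf)) != -1; ) bytes += n;  // would be fed to the decoder
              }
              double seconds = (System.nanoTime() - start) / 1e9;
              double throughputKbit = bytes * 8 / 1000.0 / seconds;

              // Per-segment adaptation: pick the highest representation the measured throughput sustains
              quality = 0;
              for (int i = 0; i < BITRATES.length; i++)
                  if (throughputKbit > BITRATES[i] * 1.5) quality = i;  // 1.5x safety margin
          }
      }
  }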

Figure 1: Concept of Dynamic Adaptive Streaming over HTTP.

Recently, ISO/IEC MPEG has ratified Dynamic Adaptive Streaming over HTTP (DASH) [6], an international standard that should enable interoperability among proprietary solutions. The concept of DASH is depicted in Figure 1. The Institute of Information Technology (ITEC) and, in particular, the Multimedia Communication Research Group of the Alpen-Adria-Universität Klagenfurt has participated in and contributed to this standard from the beginning. During the standardization process, a number of research tools have been developed for evaluation purposes and scientific contributions, including several publications. These tools are provided as open source for the community and are available at [7].

Open Source Tools Suite

Our open source tool suite consists of several components. On the client-side we provide libdash [8] and the DASH plugin for the VLC media player (also available on Android). Additionally, our suite includes a JavaScript-based client that utilizes the HTML5 media source extensions of the Google Chrome browser to enable DASH playback. Furthermore, we provide several server-side tools such as our DASH dataset, consisting of different movie sequences available in different segment lengths as well as bitrates and resolutions. Additionally, we provide a distributed dataset mirrored at different locations across Europe. Our datasets have been encoded using our DASHEncoder, which is a wrapper tool for x264 and MP4Box. Finally, a DASH online MPD validation service and a DASH implementation over CCN complete our open source tool suite.

libdash

Figure 2: Client-Server DASH Architecture with libdash.

The general architecture of DASH is depicted in Figure 2, where orange represents the standardized parts. libdash comprises the MPD parsing and HTTP part. The library provides interfaces for the DASH Streaming Control and the Media Player to access MPDs and downloadable media segments. The download order of such media segments is not handled by the library; this is left to the DASH Streaming Control, which is a separate component in this architecture but could also be included in the Media Player. In a typical deployment, a DASH server provides segments in several bitrates and resolutions. The client initially receives the MPD through libdash, which provides a convenient object-oriented interface to that MPD. Based on that information the client can download individual media segments through libdash at any point in time. Varying bandwidth conditions can be handled by switching to the corresponding quality level at segment boundaries in order to provide a smooth streaming experience. This adaptation is not part of libdash or the DASH standard and is left to the application using libdash.
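
Architecturally, the split between the DASH Streaming Control and the Media Player described above boils down to a producer/consumer arrangement around a bounded segment buffer. The plain-Java sketch below shows that structure with hypothetical stand-in types; it is an illustration of the architecture, not the libdash API (which is written in C++).

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class StreamingControlSketch {
      // Hypothetical stand-in for a downloaded media segment
      static class Segment {
          final int index;
          final byte[] data;
          Segment(int index, byte[] data) { this.index = index; this.data = data; }
      }

      public static void main(String[] args) {
          // Bounded buffer: keep at most four segments ahead of playback
          final BlockingQueue<Segment> buffer = new ArrayBlockingQueue<Segment>(4);

          // "DASH Streaming Control": decides what to download next and fills the buffer
          Thread control = new Thread(new Runnable() {
              public void run() {
                  try {
                      for (int i = 1; i <= 20; i++)
                          buffer.put(new Segment(i, download(i)));  // blocks when the buffer is full
                  } catch (InterruptedException ignored) { }
              }
          });

          // "Media Player": takes segments from the buffer and decodes/renders them
          Thread player = new Thread(new Runnable() {
              public void run() {
                  try {
                      for (int i = 1; i <= 20; i++) {
                          Segment s = buffer.take();                // blocks when the buffer is empty
                          System.out.println("playing segment " + s.index);
                      }
                  } catch (InterruptedException ignored) { }
              }
          });

          control.start();
          player.start();
      }

      // Placeholder: real code would fetch the segment via HTTP (e.g., through libdash)
      static byte[] download(int index) {
          return new byte[0];
      }
  }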

DASH-JS

Figure 3: Screenshot of DASH-JS.

DASH-JS seamlessly integrates DASH into the Web using the HTML5 video element. A screenshot is shown in Figure 3. It is based on JavaScript and uses the Media Source API of Google’s Chrome browser to present a flexible and potentially browser-independent DASH player. DASH-JS currently supports WebM-based media segments as well as segments based on the ISO Base Media File Format.

DASHEncoder

DASHEncoder is a content generation tool for DASH video-on-demand content, built on top of the open source encoder x264 and GPAC’s MP4Box. Using DASHEncoder, the user does not need to encode and multiplex each quality level of the final DASH content separately. Figure 4 depicts the workflow of the DASHEncoder. It generates the desired representations (quality/bitrate levels), fragmented MP4 files, and the MPD, based on a given configuration file or command-line parameters.

Figure 4: High-level structure of DASHEncoder.

The set of configuration parameters covers a wide range of possibilities. For example, DASHEncoder supports different segment sizes, bitrates, resolutions, encoding settings, URLs, etc. The modular implementation of DASHEncoder enables the batch processing of multiple encodings, which are finally assembled within a predefined directory structure and represented by a single MPD. DASHEncoder is available as open source on our Web site as well as on Github, with the aim that other developers will join this project. The content generated with DASHEncoder is compatible with our playback tools.
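For illustration, the basic encode-then-segment workflow that DASHEncoder automates can be sketched as follows. The x264 and MP4Box flags shown are approximations of a typical invocation and may need adjusting for a particular installation; DASHEncoder itself uses its own configuration and additionally merges everything into a single MPD and directory structure.

```python
# Rough sketch of the DASHEncoder workflow: encode each target bitrate with
# x264, then segment/multiplex with MP4Box into fragmented MP4 plus an MPD.
# The flags below are approximations, not the actual DASHEncoder commands.
import subprocess

INPUT = "source.y4m"                    # hypothetical raw input file
BITRATES_KBPS = [250, 500, 1200, 2500]  # assumed bitrate ladder
SEGMENT_MS = 2000                       # 2-second segments

for kbps in BITRATES_KBPS:
    h264_file = f"video_{kbps}k.264"
    mp4_file = f"video_{kbps}k.mp4"
    # 1) encode one representation with x264
    subprocess.run(["x264", "--bitrate", str(kbps), "-o", h264_file, INPUT], check=True)
    # 2) wrap the elementary stream into an MP4 container
    subprocess.run(["MP4Box", "-add", h264_file, mp4_file], check=True)
    # 3) create DASH segments and an MPD for this representation
    subprocess.run(["MP4Box", "-dash", str(SEGMENT_MS), "-out", f"video_{kbps}k.mpd", mp4_file], check=True)

# DASHEncoder additionally combines the per-representation information into a
# single MPD and arranges the output in a predefined directory structure.
```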

Datasets

Figure 5: DASH Dataset.

Our DASH dataset comprises multiple full-length movie sequences from different genres – animation, sport, and movie (cf. Figure 5) – and is located at our Web site. The DASH dataset is encoded and multiplexed using different segment sizes inspired by commercial products, ranging from 2 seconds (as used by Microsoft Smooth Streaming) to 10 seconds per fragment (as used by Apple HTTP Live Streaming) and beyond. In particular, each sequence of the dataset is provided with segment lengths of 1, 2, 4, 6, 10, and 15 seconds. Additionally, we also offer a non-segmented version of the videos and the corresponding MPD for the movies of the animation genre, which allows for byte-range requests. The provided MPDs of the dataset are compatible with the current implementation of the DASH VLC Plugin, libdash, and DASH-JS. Furthermore, we provide a distributed DASH (D-DASH) dataset which is, at the time of writing, replicated on five sites within Europe, i.e., Klagenfurt, Paris, Prague, Torino, and Crete. This allows for a real-world evaluation of DASH clients that perform bitstream switching between multiple sites, e.g., to simulate switching between multiple Content Distribution Networks (CDNs).

DASH Online MPD Validation Service

The DASH online MPD validation service implements the conformance software of MPEG-DASH and enables Web-based validation of MPDs provided as a file, a URI, or text. As the MPD is defined by an XML schema, it is also possible to use an external XML schema file for the validation.
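The core of such a validation step can also be reproduced locally; the sketch below checks an MPD against the DASH XML schema using lxml. The schema file name is an assumption (the normative schema is published with ISO/IEC 23009-1), and this is not the code of the online service itself.

```python
# Minimal sketch of schema-based MPD validation, similar in spirit to the
# online service: validate an MPD document against the DASH XML schema.
from lxml import etree

def validate_mpd(mpd_path: str, schema_path: str = "DASH-MPD.xsd") -> bool:
    """Return True if the MPD validates against the given schema; otherwise
    print the schema errors and return False."""
    schema = etree.XMLSchema(etree.parse(schema_path))
    mpd = etree.parse(mpd_path)
    if schema.validate(mpd):
        return True
    for error in schema.error_log:  # report line numbers and messages
        print(f"line {error.line}: {error.message}")
    return False

if __name__ == "__main__":
    print("valid" if validate_mpd("manifest.mpd") else "invalid")
```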

DASH over CCN

Finally, Dynamic Adaptive Streaming over Content Centric Networks (DASC, aka DASH over CCN) implements DASH utilizing a CCN naming scheme to identify content segments in a CCN network. It builds on the CCN concept of Jacobson et al. and PARC’s CCNx implementation (www.ccnx.org). In particular, video segments formatted according to MPEG-DASH are available in different quality levels, but instead of HTTP, CCN is used for referencing and delivery.

Conclusion

Our open source tool suite is available to the community with the aim to provide a common ground for research efforts in the area of adaptive media streaming in order to make results comparable with each other. Everyone is invited to join this activity – get involved in and excited about DASH.

Acknowledgments

This work was supported in part by the EC in the context of the ALICANTE (FP7-ICT-248652) and SocialSensor (FP7-ICT-287975) projects and partly performed in the Lakeside Labs research cluster at AAU.

References

[1] Sandvine, “Global Internet Phenomena Report 2H 2012”, Sandvine Intelligent Broadband Networks, 2012.
[2] 3GPP TS 26.234, “Transparent end-to-end packet switched streaming service (PSS); Protocols and codecs”, 2010.
[3] A. Zambelli, “IIS Smooth Streaming Technical Overview”, Technical Report, Microsoft Corporation, March 2009.
[4] R. Pantos, W. May, “HTTP Live Streaming”, IETF draft, http://tools.ietf.org/html/draft-pantos-http-live-streaming-07 (last access: Feb 2013).
[5] Adobe HTTP Dynamic Streaming, http://www.adobe.com/products/httpdynamicstreaming/ (last access: Feb 2013).
[6] ISO/IEC 23009-1:2012, Information technology – Dynamic adaptive streaming over HTTP (DASH) – Part 1: Media presentation description and segment formats.
[7] ITEC DASH, http://dash.itec.aau.at
[8] libdash open git repository, https://github.com/bitmovin/libdash

Call for Multimedia Grand Challenge Solutions

Overview

The Multimedia Grand Challenge presents a set of problems and issues from industry leaders, geared to engage the Multimedia research community in solving relevant, interesting and challenging questions about the industry’s 3-5 year vision for multimedia.
The Multimedia Grand Challenge was first presented as part of ACM Multimedia 2009 and has established itself as a prestigious competition in the multimedia community. This year’s conference will continue the tradition by repeating previous challenges and by introducing brand new challenges.

Challenges

NHK Where is beauty? Grand Challenge

Scene Evaluation based on Aesthetic Quality

Automatic understanding of viewers’ impressions from images or video sequences is a very difficult task, but an interesting theme for study, and more and more researchers have investigated it recently. To achieve automatic understanding, various elemental features or techniques need to be used in a comprehensive manner, such as the balance of color or contrast, composition, audio, object recognition, and object motion. In addition, we might have to consider not only image features but also semantic features.

The task NHK sets is “Where is Beauty?”, which aims at automatically recognizing beautiful scenes in a set of video sequences. The important point of this task is “how to evaluate beauty using an engineering approach”, which is challenging because it involves human feelings. We will provide participants with approx. 1,000 clips of raw broadcast video footage, containing various categories such as creatures, landscape, and CGI. These video clips last about 1 minute each. Participants will have to evaluate the beauty of these videos automatically and rank them in terms of beauty.

The proposed method will be evaluated on the basis of its originality and accuracy. We expect that participants will consider a diverse range of beauty, not only the balance of color but also composition, motion, audio, and other brand new features! The reliability and the diversity of the extracted beauty will be scored by using manually annotated data. In addition, if a short video composed of the highly ranked videos is submitted, it will be included in the evaluation.

More details

Technicolor – Rich Multimedia Retrieval from Input Videos Grand Challenge

Visual search that aims at retrieving copies of an image as well as information on a specific object, person or place in this image has progressed dramatically in the past few years. Thanks to modern techniques for large scale image description, indexing and matching, such an image-based information retrieval can be conducted either in a structured image database for a given topic (e.g., photos in a collection, paintings, book covers, monuments) or in an unstructured image database which is weakly labeled (e.g., via user-input tags or surrounding texts, including captions).

This Grand Challenge aims at exploring tools to push this search paradigm forward by addressing the following question: how can we search unstructured multimedia databases based on video queries? This problem is already encountered in professional environments where large semi-structured multimedia assets, such as TV/radio archives or cultural archives, are operationally managed. In these cases, resorting to trained professionals such as archivists remains the rule, both to annotate part of the database beforehand and to conduct searches. Unfortunately, this workflow does not apply to large-scale search into wildly unstructured repositories accessible on-line.

The challenge is to automatically retrieve and organize relevant multimedia documents based on an input video. In a scenario where the input video features a news story, for instance, can we retrieve other videos, articles, and photos about the same news story? And, when the retrieved information is voluminous, how can these multimedia documents be linked, organized, and summarized for easy reference, navigation, and exploitation?

More details

Yahoo! – Large-scale Flickr-tag Image Classification Grand Challenge

Image classification is one of the fundamental problems of computer vision and multimedia research. With the proliferation of the Internet, the availability of cheap digital cameras, and the ubiquity of cell-phone cameras, the amount of accessible visual content has increased astronomically. Websites such as Flickr alone boast over 5 billion images, not counting the many similar websites and countless other images that are not published online. This explosion poses unique challenges for the classification of images.

Classification of images with a large number of classes and images has attracted several research efforts in recent years. The availability of datasets such as ImageNet, which boasts over 14 million images and over 21 thousand classes, has motivated researchers to develop classification algorithms that can deal with large quantities of data. However, most of the effort has been dedicated to building systems that can scale up when the number of classes is large. In this challenge we are interested in learning classifiers when the number of images is large. There has been some recent work that deals with thousands of images for training; however, in this challenge we are looking at upwards of 250,000 images per class. What makes the challenge difficult is that the annotations are provided by users of Flickr (www.flickr.com), which might not always be accurate. Furthermore, each class can be considered as a collection of sub-classes with varied visual properties.

More details

Huawei/3DLife – 3D human reconstruction and action recognition Grand Challenge

3D human reconstruction and action recognition from multiple active and passive sensors

This challenge calls for demonstrations of methods and technologies that support real-time or near real-time 3D reconstruction of moving humans from multiple calibrated and remotely located RGB cameras and/or consumer depth cameras. Additionally, this challenge also calls for methods for human gesture/movement recognition from multimodal data. The challenge targets mainly real-time applications, such as collaborative immersive environments and inter-personal communications over the Internet or other dedicated networking environments.

To this end, we provide two data sets to support investigation of various techniques in the fields of 3D signal processing, computer graphics and pattern recognition, and enable demonstrations of various relevant technical achievements.

Consider multiple distant users who are captured in real time by their own visual capturing equipment, ranging from a single Kinect (simple user) to multiple Kinects and/or high-definition cameras (advanced users), as well as non-visual sensors, such as Wearable Inertial Measurement Units (WIMUs) and multiple microphones. The captured data is either processed at the capture site to produce 3D reconstructions of users or directly coded and transmitted, enabling rendering of multiple users in a shared environment, where users can “meet” and “interact” with each other or with the virtual environment via a set of gestures/movements.

More details

MediaMixer/VideoLectures.NET – Temporal Segmentation and Annotation Grand Challenge

Semantic VideoLectures.NET segmentation service

VideoLectures.NET mostly hosts lectures 1 to 1.5 hours long, linked with slides and enriched with metadata and additional textual content. With automatic temporal segmentation and annotation of the videos, we would improve the efficiency of our video search engine and be able to provide users with the ability to search for sections within a video, as well as to recommend similar content. The challenge is therefore for participants to develop tools for automatic segmentation of videos that could then be implemented in VideoLectures.NET.

More details

Microsoft: MSR – Bing Image Retrieval Grand Challenge

The Second Microsoft Research (MSR)-Bing challenge (the “Challenge”) is organized in a dual-track format, one scientific and the other industrial. The two tracks share exactly the same task and timelines but have independent submission and ranking processes.

For the scientific track, we will follow exactly what MM13 GC outlines. The papers will be submitted to MM13, and go through the review process. The accepted ones will be presented at the conference. At the conference, the authors of the accepted papers will be requested to introduce their solutions, give a quick demo, and take questions from the judges and the audience. Winners will be selected for Multimedia Grand Challenge Award based on their presentation.

The industrial track of the Challenge will be conducted over the internet through a website maintained by Microsoft. Contestants participating in the industrial track are encouraged to take advantage of the recent advancements in the cloud computing infrastructure and public datasets and must submit their entries in the form of publicly accessible REST-based web services (further specified below). Each entry will be evaluated against a test set created by Bing on queries received at Bing Image Search in the EN-US market. Due to the global nature of the Web the queries are not necessarily limited to the English language used in the United States.

More details

Submissions

Submissions should:

  • Significantly address one of the challenges posted on the web site.
  • Depict working, presentable systems or demos, using the grand challenge dataset where provided.
  • Describe why the system presents a novel and interesting solution.

Submission Guidelines

Submissions (max 4 pages) should be formatted according to the ACM Multimedia formatting guidelines. Multimedia Grand Challenge reviewing is double-blind, so authors should not reveal their identity in the paper. The finalists will be selected by a committee consisting of academia and industry representatives, based on novelty, presentation, scientific interest of the approach and, for the evaluation-based challenges, on the performance against the task.

Finalist submissions will be published in the conference proceedings, and will be presented in a special event during the ACM Multimedia 2013 conference in Barcelona, Spain. At the conference, finalists will be requested to introduce their solutions, give a quick demo, and take questions from the judges and the audience.
Winners will be selected for Multimedia Grand Challenge awards based on their presentation.

Important Dates

Challenges Announced: February 25, 2013
Paper Submission Deadline: July 1, 2013
Notification of Acceptance: July 29, 2013
Camera-Ready Submission Deadline: August 12, 2013

Contact

For any questions regarding the Grand Challenges please email the Multimedia Grand Challenge Solutions Chairs:

Neil O’Hare (Yahoo!, Spain)
Yiannis Kompatsiaris (CERTH, Greece)