VIREO-VH: Libraries and Tools for Threading and Visualizing a Large Video Collection

Introduction

“Video Hyperlinking” refers to the creation of links connecting videos that share near-duplicate segments. Like hyperlinks in HTML documents, video links help users navigate videos with similar content and facilitate the mining of iconic clips (or visual memes) spread among videos. Figure 1 shows some examples of iconic clips, which can be leveraged for linking videos; the results are potentially useful for multimedia tasks such as video search, mining and analytics.

VIREO-VH [1] is open source software developed by the VIREO research team. The software provides end-to-end support for the creation of hyperlinks, including libraries and tools for threading and visualizing videos in a large collection. The major software components are: near-duplicate keyframe retrieval, partial near-duplicate localization with time alignment, and galaxy visualization. These functionalities are mostly implemented based on state-of-the-art technologies, and each of them is developed as an independent tool with flexibility in mind, so that users can substitute any of the components with their own implementation. The earlier versions of the software are LIP-VIREO and SOTU, which have been downloaded more than 3,500 times. VIREO-VH has been used internally by VIREO since 2007 and has evolved over the years based on the experience of developing various multimedia applications, such as news event evolution analysis, novelty reranking, multimedia-based question answering [2], cross-media hyperlinking [3], and social video monitoring.

Figure 1: Examples of iconic clips.

Functionality

The software components include video pre-processing, bag-of-words based inverted file indexing for scalable near-duplicate keyframe search, localization of partial near-duplicate segments [4], and galaxy visualization of a video collection, as shown in Figure 2. The open source release comprises over 400 methods with 22,000 lines of code.

The workflow of the open source is as follows. Given a collection of videos, the visual content is first indexed based on a bag-of-words (BoW) representation. Near-duplicate keyframes are then retrieved and temporally aligned in a pairwise manner among videos. Segments of a video that are near-duplicates of segments in other videos in the collection are then hyperlinked, with the start and end times of the segments explicitly logged. The end product is a galaxy browser, in which the videos are visualized as a galaxy of clusters in a Web browser, with each cluster being a group of videos that are hyperlinked directly, or indirectly through transitivity propagation. User-friendly interaction is provided so that end users can zoom in and out to glance at or closely inspect the video relationships.
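The grouping by transitivity propagation can be seen as computing connected components over the graph of pairwise hyperlinks. The following is a minimal, illustrative union-find sketch in Java (not the VIREO-VH implementation; class and method names are our own):

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Illustrative only: groups videos into clusters by propagating
  // pairwise near-duplicate links transitively (union-find with path compression).
  public class TransitiveClustering {
      private final Map<String, String> parent = new HashMap<>();

      private String find(String v) {
          parent.putIfAbsent(v, v);
          String p = parent.get(v);
          if (!p.equals(v)) {
              p = find(p);
              parent.put(v, p); // path compression
          }
          return p;
      }

      // record a pairwise hyperlink between two videos
      public void link(String a, String b) {
          parent.put(find(a), find(b));
      }

      // collect the resulting clusters (connected components)
      public Collection<List<String>> clusters() {
          Map<String, List<String>> out = new HashMap<>();
          for (String v : new ArrayList<>(parent.keySet()))
              out.computeIfAbsent(find(v), k -> new ArrayList<>()).add(v);
          return out.values();
      }

      public static void main(String[] args) {
          TransitiveClustering tc = new TransitiveClustering();
          tc.link("video1", "video2"); // direct near-duplicate link
          tc.link("video2", "video3"); // video1 and video3 become linked transitively
          System.out.println(tc.clusters()); // one cluster containing all three videos
      }
  }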

Figure 2: Overview of VIREO-VH software architecture.

Interface

VIREO-VH can be used either as an end-to-end system that takes a video collection as input and outputs visual hyperlinks, or as independent functions for the development of different applications.

For content owners interested in the content-wise analysis of a video collection, VIREO-VH can be used as an end-to-end system by simply specifying the location of a video collection and the output paths (Figure 3). The resulting output can then be viewed with the provided interactive interface, which gives an overview of the video relationships in the collection.

Figure 3: Interface for end-to-end processing of video collection.

VIREO-VH also provides libraries that grant researchers programmatic access. The libraries consist of various classes (e.g., Vocab, HE, Index, SearchEngine and CNetwork), providing functions for vocabulary and Hamming signature training [5], keyframe indexing, near-duplicate keyframe searching and video alignment. Users can refer to the manual for details. Furthermore, the components of VIREO-VH are developed independently for flexibility, so users can substitute any of the components with their own implementation. This capability is particularly useful for benchmarking the users’ own choice of algorithms. As an example, users can choose their own visual vocabulary and Hamming median, but use the open source for building the index and retrieving near-duplicate keyframes. For instance, the following few lines of code implement a typical image retrieval system:

#include "Vocab_Gen.h"
#include "Index.h"
#include "HE.h"
#include "SearchEngine.h"
...
// train visual vocabulary using descriptors in folder "dir_desc"
// here we choose to train a hierarchical vocabulary with 1M leaf nodes (3 layers, 100 nodes / layer)
Vocab_Gen::genVoc("dir_desc", 100, 3);

// load pre-trained vocabulary from disk
Vocab* voc = new Vocab(100, 3, 128);
voc->loadFromDisk("vk_words/");

// Hamming Embedding training for the vocabulary
HE* he = new HE(32, 128, p_mat, 1000000, 12);
he->train(voc, "matrix", 8);

// index the descriptors with inverted file
Index::indexFiles(voc, he, "dir_desc/", ".feat", "out_dir/", 8);

// load index and conduct online search for images in "query_desc"
SearchEngine* engine = new SearchEngine(voc, he);
engine->loadIndexes("out_dir/");
engine->search_dir("query_desc", "result_file", 100);
...

Example

We use a video collection consisting of 220 videos (around 31 hours) as an example. The collection was crawled from YouTube using the keyword “economic collapse”. Using our open source software with default parameter settings, a total of 35 partial near-duplicate (ND) segments were located, resulting in 10 visual clusters (or snippets). Figure 4 shows two examples of the snippets. Based on our experiments, the precision of ND localization is as high as 0.95 and the recall is 0.66. Table 1 lists the running time for each step. The experiment was conducted on a PC with a dual-core 3.16 GHz CPU and 3 GB of RAM. In total, creating a galaxy view for 31.2 hours of videos (more than 4,000 keyframes) could be completed within 2.5 hours using our open source software. More details can be found in [6].

Pre-processing 75 minutes
ND Retrieval 59 minutes
Partial ND localization 8 minutes
Galaxy Visualization 55 seconds

Table 1: The running time for processing 31.2 hours of videos.

Figure 4: Examples of visual snippets mined from a collection of 220 videos. For ease of visualization, each cluster is tagged with a timeline description from Wikipedia using the techniques developed in [3].

Acknowledgements

The open source software described in this article was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (CityU 119610).

References

[1] http://vireo.cs.cityu.edu.hk/VIREO-VH/

[2] W. Zhang, L. Pang and C. W. Ngo. Snap-and-Ask: Answering Multimodal Question by Naming Visual Instance. ACM Multimedia, Nara, Japan, October 2012. Demo

[3] S. Tan, C. W. Ngo, H. K. Tan and L. Pang. Cross Media Hyperlinking for Search Topic Browsing. ACM Multimedia, Arizona, USA, November 2011. Demo

[4] H. K. Tan, C. W. Ngo, R. Hong and T. S. Chua. Scalable Detection of Partial Near-Duplicate Videos by Visual-Temporal Consistency. In ACM Multimedia, pages 145-154, 2009.

[5] H. Jegou, M. Douze, and C. Schmid. Improving bag-of-features for large scale image search. IJCV, 87(3):192-212, May 2010.

[6] L. Pang, W. Zhang and C. W. Ngo. Video Hyperlinking: Libraries and Tools for Threading and Visualizing a Large Video Collection. ACM Multimedia, Nara, Japan, Oct 2012.

Opencast Matterhorn – Open source lecture capture and video management

Since its formation in 2007, Opencast has become a global community around academic video and its related domains. It is a rich source of inspiring ideas and a large, active community for educational multimedia content creation and usage. Matterhorn is a community-driven collaborative project to develop an end-to-end, open source solution that supports the scheduling, capture, managing, encoding, and delivery of educational audio and video content, and the engagement of users with that content. Recently (July 2013), Matterhorn 1.4 was released after almost a year of feature planning, development and bug fixing by the community. The latest version, along with documentation, can be downloaded from the project website at Opencast Matterhorn.

Opencast Matterhorn Welcomes Climbers

The first screenshot shows a successful system installation and start in a web browser.

Opencast: A community for higher education

The Opencast community is a collaborative effort in which individuals, higher education institutions and organizations work together to explore, develop, define and document best practices and technologies for the management of audiovisual content in academia. As such, it aims to stimulate discussion and collaboration between institutions to develop and enhance the use of academic video. The community shares experiences with existing technologies and identifies future approaches and requirements. It seeks broad participation in this important and dynamic domain, allowing community members to share expertise and experience and to collaborate in related projects. The community was initiated by its founding members [2] to address the need, identified by academic institutions, for an affordable, flexible and enterprise-ready video management system. A list of current system adopters, along with detailed descriptions, can be found on the adopters page at List of Matterhorn adopters.

Matterhorn: Underlying technology

Matterhorn offers an open source reference implementation of an end-to-end enterprise lecture capture suite and a comprehensive set of flexible rich media services. It supports the typical lecture capture and video management phases: preparation, scheduling, capture, media processing, content distribution and usage. The picture below depicts the supported phases. These phases are also major differentiators in the system architecture. Additional information is available in [2].

Opencast Matterhorn phases of lecture recording

The core components are built on a service-based architecture leveraging Java OSGi technology, which provides a standardized, component-oriented environment for service coupling and cooperation [1], [2]. System administrators, lecturers and students do not need to handle Java objects, interfaces or service endpoints directly; instead they can create and interact with system components using fully customizable workflows (XML descriptions) for media recording, encoding, handling and/or content distribution. Matterhorn comes with tools for administrators that allow them to plan and schedule upcoming lectures as well as monitor processes across distributed Matterhorn nodes. Feeds (Atom/RSS) as well as a browsable media gallery enable users to customize and adapt content created with the system to local needs. Finally, content player components (engage applications) are provided that synchronize different streams (e.g., talking-head and screen-capture video or audience cameras), provide direct access to content based on media search queries, and use the media analysis capabilities for navigation purposes (e.g., slide changes).

Opencast Matterhorn welcome page

The project website provides a guided feature and demo tour, cookbook and installation sections about how to use and operate the system on a daily basis, as well as links to the community issue/feature tracker system Opencast issues.

Matterhorn2GO: Mobile connection to online learning repositories

Matterhorn2GO is a cross-platform open source mobile front-end for recorded video material produced with the Opencast Matterhorn system [3]. It can be used on Android or iOS smartphones and tablets; the core technology used is Apache Flex. It has been released in the Google Play Store as well as in Apple’s iTunes Store. Further information is available on the project website: Download / Install Matterhorn2GO. Matterhorn2GO brings lecture recordings and additional material created with Opencast Matterhorn to mobile learners worldwide. It comes with powerful in-content search capabilities based on Matterhorn’s core multimedia analysis features and is able to synchronize different content streams in one single view to fully follow the activity in the classroom. This allows users, for example, to directly access a certain aspect within numerous recorded series and/or single episodes. A user can choose between three different video view options: a parallel view (professor and corresponding synchronized slides or screen recording), just the professor, or only the lecture slides. Since most Opencast Matterhorn service endpoints offer a streaming option, a user can navigate directly to any time position in the video without waiting until it has been fully downloaded. The browse media page lists recordings from available Matterhorn installations. Students can follow their own eLectures but can also find out what else is being taught or presented at the local university or at other learning institutions abroad.

Stay informed and join the discussion

As an international open source community, Opencast has established several mechanisms for individuals to communicate, support each other and collaborate.

More information about communication within the community can be found at www.opencast.org/communications.

Conclusion

Matterhorn and the Opencast Community can offer research initiatives a prolific environment with a multitude of partners and a technology developed to be adapted, amended or supplemented by new features, be that voice recognition, face detection, support for mobile devices, semantic connections in learning objects or (big) data mining. The final objective is to ensure that research initiatives will consider Matterhorn a focal point for their activities. The governance model of the Opencast Community and the Opencast Matterhorn project can be found online at www.opencast.org/opencast-governance.

Acknowledgments and License

The authors would like to thank the Opencast Community and the Opencast Matterhorn developers for their support and creativity as well as the continuous efforts to create tools that can be used across campuses and learning institutes worldwide. Matterhorn is published under the Educational Community License (ECL) 2.0.

References

[1] Christopher A. Brooks, Markus Ketterl, Adam Hochman, Josh Holtzman, Judy Stern, Tobias Wunden, Kristofor Amundson, Greg Logan, Kenneth Lui, Adam McKenzie, Denis Meyer, Markus Moormann, Matjaz Rihtar, Ruediger Rolf, Nejc Skofic, Micah Sutton, Ruben Perez Vazquez, and Benjamin Wulff. OpenCast Matterhorn 1.1: reaching new heights. ACM Multimedia, pages 703-706. ACM, 2011.

[2] M. Ketterl, O. A. Schulte, A. Hochman. Opencast Matterhorn: A Community-Driven Open Source Software Project for Producing, Managing, and Distributing Academic Video. International Journal of Interactive Technology and Smart Education, Emerald Group Publishing Limited, Vol. 7, No. 3, pp. 168-180, 2010.

[3] Markus Ketterl, Leonid Oldenburger, Oliver Vornberger. Opencast 2 Go: Mobile Connections to Multimedia Learning Repositories. In Proceedings of the IADIS International Conference Mobile Learning, pages 181-188, Berlin, Germany, 2012.

MediaEval Multimedia Benchmark: Highlights from the Ongoing 2013 Season

MediaEval is an international multimedia benchmarking initiative that offers tasks to the multimedia community that are related to the human and social aspects of multimedia. The focus is on addressing new challenges in the area of multimedia search and indexing that allow researchers to make full use of multimedia techniques that simultaneously exploit multiple modalities. A series of interesting tasks is currently underway in MediaEval 2013. As every year, the selection of tasks is made using a community-wide survey that gauges what multimedia researchers would find most interesting and useful. A new task to watch closely this year is Search and Hyperlinking of Television Content, which follows on the heels of a very successful pilot last year. The other main tasks running this year are:

The tagline of the MediaEval Multimedia Benchmark is: “The ‘multi’ in multimedia: speech, audio, visual content, tags, users, context”. This tagline explains the inspiration behind the choice of the Brave New Tasks, which are running for the first time this year. Here, we would like to highlight Question Answering for the Spoken Web, which builds on the Spoken Web Search tasks mentioned above. This task is a joint effort between MediaEval and the Forum for Information Retrieval Evaluation, an India-based information retrieval benchmark. MediaEval believes strongly in collaboration and complementarity between benchmarks, and we hope that this task will help us better understand how joint tasks are best designed and coordinated. The other Brave New Tasks at MediaEval this year are:

The MediaEval 2013 season culminates with the MediaEval 2013 workshop, which will take place in Barcelona, Catalunya, Spain, on Friday-Saturday 18-19 October 2013. Note that this is just before ACM Multimedia 2013, which will be held Monday-Friday 21-25 October 2013, also in Barcelona. We are currently finalizing the registration site for the workshop; it will open very soon and will be announced on the MediaEval website. In order to further foster our understanding and appreciation of user-generated multimedia, each year we designate a MediaEval filmmaker to make a YouTube video about the workshop. The MediaEval 2012 workshop video was made by John N.A. Brown and has recently appeared online. John decided to focus on the sense of community and the laughter that he observed at the workshop. Interestingly, his focus recalls the work done at MediaEval 2010 on the role of laughter in social video, see:

http://www.youtube.com/watch?v=z1bjXwxkgBs&feature=youtu.be&t=1m29s

We hope that this video inspires you to join us in Barcelona.

Open Source Column: OpenIMAJ – Intelligent Multimedia Analysis in Java

Introduction

Multimedia analysis is an exciting and fast-moving research area. Unfortunately, there has historically been a lack of software solutions in a common programming language for performing scalable integrated analysis of all modalities of media (images, videos, audio, text, web-pages, etc). For example, in the image analysis world, OpenCV and Matlab are commonly used by researchers, whilst many common Natural Language Processing tools are built using Java. The lack of coherency between these tools and languages means that it is often difficult to research and develop rational, comprehensible and repeatable software implementations of algorithms for performing multimodal multimedia analysis. These problems are also exacerbated by the lack of any principled software engineering (separation of concerns, minimised code repetition, maintainability, understandability, avoidance of premature and over-optimisation) often found in research code.

OpenIMAJ is a set of libraries and tools for multimedia content analysis and content generation that aims to fill this gap and address these concerns. OpenIMAJ provides a coherent interface to a very broad range of techniques, and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection and description, etc.) and advanced data clustering and hashing, through to software that performs analysis on the content, layout and structure of webpages. A full list of all the modules and an overview of their functionalities for the latest OpenIMAJ release can be found here.

OpenIMAJ is primarily written in Java and, as such, is completely platform independent. The video-capture and hardware libraries contain some native code, but Linux, OSX and Windows are supported out of the box (under both 32 and 64 bit JVMs; ARM processors are also supported under Linux). It is possible to write programs that use the libraries in any JVM language that supports Java interoperability, for example Groovy and Scala. OpenIMAJ can even be run on Android phones and tablets. As it’s written in Java, you can run any application built using OpenIMAJ on any of the supported platforms without even having to recompile the code.

Some simple programming examples

The following code snippets and illustrations aim to give you an idea of what programming with OpenIMAJ is like, whilst showing some of the powerful features.

	...
	// A simple Haar-Cascade face detector
	HaarCascadeDetector det1 = new HaarCascadeDetector();
	DetectedFace face1 = det1.detectFaces(img).get(0);
	new SimpleDetectedFaceRenderer().drawDetectedFace(mbf,10,face1);

	// Get the facial keypoints
	FKEFaceDetector det2 = new FKEFaceDetector();
	KEDetectedFace face2 = det2.detectFaces(img).get(0);
	new KEDetectedFaceRenderer().drawDetectedFace(mbf,10,face2);

	// With the CLM Face Model
	CLMFaceDetector det3 = new CLMFaceDetector();
	CLMDetectedFace face3 = det3.detectFaces(img).get(0);
	new CLMDetectedFaceRenderer().drawDetectedFace(mbf,10,face3);
	...
Face detection, keypoint localisation and model fitting
	...
	// Find the features
	DoGSIFTEngine eng = new DoGSIFTEngine();
	List sourceFeats = eng.findFeatures(sourceImage);
	List targetFeats = eng.findFeatures(targetImage);

	// Prepare the matcher
	final HomographyModel model = new HomographyModel(5f);
	final RANSAC ransac = new RANSAC(model, 1500, 
	        new RANSAC.BestFitStoppingCondition(), true);
	ConsistentLocalFeatureMatcher2d matcher = 
	    new ConsistentLocalFeatureMatcher2d
	        (new FastBasicKeypointMatcher(8));

	// Match the features
	matcher.setFittingModel(ransac);
	matcher.setModelFeatures(sourceFeats);
	matcher.findMatches(targetFeats);
	...
Finding and matching SIFT keypoints
	...
	// Access First Webcam
	VideoCapture cap = new VideoCapture(640, 480);

	//grab a frame
	final MBFImage[] last = { cap.nextFrame().clone() }; // array holder so the anonymous listener below can update it

	// Process Video
	VideoDisplay.createOffscreenVideoDisplay(cap)
	    .addVideoListener(
	        new VideoDisplayListener<MBFImage>() {
	        public void beforeUpdate(MBFImage frame) {
	            frame.subtractInplace(last[0]).abs();
	            last[0] = frame.clone();
	        }
	...
Webcam access and video processing

The OpenIMAJ design philosophy

One of the main goals in the design and implementation of OpenIMAJ was to keep all components as modular as possible, providing a clear separation of concerns whilst maximising code reusability, maintainability and understandability. At the same time, this makes the code easy to use and extend. For example, the OpenIMAJ difference-of-Gaussian SIFT implementation allows different parts of the algorithm to be replaced or modified at will without having to modify the source code of the existing components; an example of this is our min-max SIFT implementation [1], which allows more efficient clustering of SIFT features by exploiting the symmetry of features detected at minima and maxima of the scale-space. Implementations of commonly used algorithms are also made as generic as possible; for example, the OpenIMAJ RANSAC implementation works with generic Model objects and doesn’t care whether the specific model implementation is attempting to fit a homography to a set of point-pair matches or a straight line to samples in a space.

Primitive media types in OpenIMAJ are also kept as simple as possible: images are just an encapsulation of a 2D array of pixels; videos are just encapsulated iterable collections/streams of images; audio is just an encapsulated array of samples.

The speed of individual algorithms in OpenIMAJ has not been a major development focus; however, OpenIMAJ cannot be called slow. For example, most of the algorithms implemented in both OpenIMAJ and OpenCV run at similar rates, and things such as SIFT detection and face detection can be run in real-time. Whilst raw algorithm speed has not been a particular design focus, scalability of the algorithms to massive datasets has. Because OpenIMAJ is written in Java, it is trivial to integrate it with tools for distributed data processing, such as Apache Hadoop. Using the OpenIMAJ Hadoop tools [3] on our small Hadoop cluster, we have extracted and indexed visual term features from datasets with sizes in excess of 50 million images. The OpenIMAJ clustering implementations are able to cluster larger-than-memory datasets by reading data from disk as necessary.
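As a small illustration of how simple the primitive types are, the sketch below binarises an image by writing to its pixel array directly. It assumes the single-band FImage class and its public pixels field as we recall them from the OpenIMAJ documentation, so the exact details should be treated as assumptions:

  import org.openimaj.image.FImage;

  public class PixelAccessSketch {
      public static void main(String[] args) {
          // an FImage is essentially a wrapper around a 2D float array of pixels
          FImage img = new FImage(320, 240);

          // fill with a horizontal gradient, then binarise it by writing
          // to the pixel array directly
          for (int y = 0; y < img.getHeight(); y++) {
              for (int x = 0; x < img.getWidth(); x++) {
                  float v = x / (float) img.getWidth();
                  img.pixels[y][x] = v > 0.5f ? 1f : 0f;
              }
          }
      }
  }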

A history of OpenIMAJ

OpenIMAJ was first made public in May 2011, just in time to be entered into the 2011 ACM Multimedia Open-Source Software Competition [2], which it went on to win. OpenIMAJ was not written overnight, however. As shown in the following picture, parts of the original codebase came from projects as long ago as 2005. Initially, the features were focused around image analysis, with a concentration on image features used for CBIR (e.g. global histogram features), features for image matching (e.g. SIFT) and simple image classification (e.g. cityscape versus landscape classification).

A visual history of OpenIMAJ

As time went on, the list of features began to grow, first with more implementations of image analysis techniques (e.g. connected components, shape analysis, scalable bags-of-visual-words, face detection, etc). This was followed by support for analysing more types of media (video, audio, text, and web-pages), as well as implementations of more general techniques for machine learning and clustering. In addition, support for various hardware devices and video capture was added. Since its initial public release, the community of people and organisations using OpenIMAJ has continued to grow, and includes a number of internationally recognised companies. We also have an active community of people reporting (and helping to fix) any bugs or issues they find, and suggesting new features and improvements. Last summer, we had a single intern working with us, using and developing new features (in particular with respect to text analysis and mining functionality). This summer we’re expecting two or three interns who will help us leverage OpenIMAJ in the 2013 MediaEval campaign. From the point of view of the software itself, the number of features in OpenIMAJ continues to grow on an almost daily basis. Since the initial release, the core codebase has become much more mature and we’ve added new features and implementations of algorithms throughout. We’ve picked a couple of the highlights from the latest release version and the current development version below:

Reference Annotations

As academics we are quite used to the idea of thoroughly referencing the ideas and work of others when we write a paper. Unfortunately, this practice is not often carried over to other forms of writing, such as the code of computer software. Within OpenIMAJ, we implement and expand upon much of our own published work, but also the published work of others. For the 1.1 release of OpenIMAJ we decided that we wanted to make it explicit where the idea for the implementation of each algorithm and technique came from. Rather than haphazardly adding references and citations in the Javadoc comments, we decided that the process of referencing should be more formal, and that the references should be machine readable. These machine-readable references are automatically inserted into the generated documentation, and can also be accessed programmatically. It’s even possible to automatically generate a bibliography of all the techniques used by any program built on top of OpenIMAJ. For more information, take a look at this blog post. The reference annotations are part of a bigger framework currently under development that aims to encourage better code development for experimentation purposes. The overall aim of this is to provide the basis for repeatable software implementations of experiments and evaluations, with automatic gathering of the basic statistics that all experiments should have, together with more specific statistics based on the type of evaluation (i.e. ROC statistics for classification experiments; TREC-style Precision-Recall for information retrieval experiments, etc).
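To give a flavour of what a machine-readable reference looks like, the sketch below attaches a citation annotation to a method. The annotation and its field names are recalled from the OpenIMAJ citation module and should be treated as assumptions rather than authoritative API documentation:

  import org.openimaj.citation.annotation.Reference;
  import org.openimaj.citation.annotation.ReferenceType;

  public class SIFTExtractor {
      // machine-readable citation attached directly to the implementing code;
      // the field names here are assumptions based on our recollection of the API
      @Reference(
          type = ReferenceType.Article,
          author = { "David G. Lowe" },
          title = "Distinctive Image Features from Scale-Invariant Keypoints",
          year = "2004",
          journal = "International Journal of Computer Vision"
      )
      public void extractFeatures() {
          // ... implementation of the cited technique would live here ...
      }
  }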

Stream Processing Framework

Processing streaming data is currently a hot topic. We wanted to provide a way in OpenIMAJ to experiment with the analysis of streaming multimedia data (see the description of the “Twitter’s visual pulse” application below, for example). The OpenIMAJ Stream classes in the development trunk of OpenIMAJ provide a way to effectively gather, consume, process and analyse streams of data. For example, in just a few lines of code it is possible to get and display all the images from the live Twitter sample stream:

  //construct your twitter api key
  TwitterAPIToken token = ...

  // Create a twitter dataset instance connected to the live twitter sample stream
  StreamingDataset<Status> dataset = new TwitterStreamingDataset(token, 1);

  //use the Stream#map() method to transform the stream so we get images
  dataset
    //process tweet statuses to produce a stream of URLs
    .map(new TwitterLinkExtractor())
    //filter URLs to just get those that are URLs of images
    .map(new ImageURLExtractor())
    //consume the stream and display images
    .forEach(new Operation<URL>() {
      public void perform(URL url) {
        try {
          DisplayUtilities.display(ImageUtilities.readMBF(url));
        } catch (IOException e) {
          // ignore images that fail to download or decode
        }
      }
    });

The stream processing framework handles a lot of the hard work for you. For example, it can optionally drop incoming items if you are unable to consume the stream at a fast enough rate (in which case it gathers statistics about what it has dropped). In addition to the Twitter live stream, we’ve provided a number of other stream source implementations, including one based on the Twitter search API and one based on IRC chat. The latter was used to produce a simple visualisation of a world map showing where Wikipedia edits are currently happening.

Improved face pipeline

The initial OpenIMAJ release contained some support for face detection and analysis; however, this has been, and continues to be, improved. The key advantage OpenIMAJ has over other libraries such as OpenCV in this area is that it implements a complete pipeline with the following components:

  1. Face Detection
  2. Face Alignment
  3. Facial Feature Extraction
  4. Face Recognition/Classification

Each stage of the pipeline is configurable, and OpenIMAJ contains a number of different algorithm implementations for each stage as well as offering the possibility to easily implement more. The pipeline is designed to allow researchers to focus on a specific area of the pipeline without having to worry about the other components. At the same time, it is fairly easy to modify and evaluate a complete pipeline. In addition to the parts of the recognition pipeline, OpenIMAJ also includes code for tracking faces in videos and comparing the similarity of faces.

Improved audio processing & analysis functionality

When OpenIMAJ was first made public, there was little support for audio processing and analysis beyond playback, resampling and mixing. As OpenIMAJ has matured, the audio analysis components have grown, and now include standard audio feature extractors such as Mel-Frequency Cepstral Coefficients (MFCCs), and higher-level analysers for tasks such as beat detection and determining whether an audio sample is human speech. In addition, we’ve added a large number of generation, processing and filtering classes for audio signals, and also provided an interface between OpenIMAJ audio objects and the CMU Sphinx speech recognition engine.

Example applications

Every year our research group holds a 2-3 day hackathon where we stop normal work and form groups to do a mini-project. For the last two years we’ve built applications using OpenIMAJ as the base. We’ve provided a short description together with some links so that you can get an idea of the varied kinds of applications OpenIMAJ can be used to rapidly create.

Southampton Goggles

In 2011 we built “Southampton Goggles”. The ultimate aim was to build a geo-localisation/geo-information system based on content-based matching of images of buildings on the campus taken with a mobile device; the idea was that one could take a photo of a building as a query, and be returned relevant information about that building as a response (i.e. which faculty/school is located in it, whether there are vending machines/cafés in the building, the opening times of the building, etc). The project had two parts: the first part was data collection, in order to collect and annotate the database of images against which we would match. The second part involved indexing the images, and making the client and server software for the search engine. In order to rapidly collect images of the campus, we built a hand-portable streetview-like camera device with 6 webcams, a GPS and a compass. The software for controlling this used OpenIMAJ to interface with all the hardware and record images, location and direction at regular time intervals. The camera rig and software are shown below:

The Southampton Goggles Capture Rig

The Southampton Goggles Capture Software, built using OpenIMAJ

For the second part of the project, we used the SIFT feature extraction, clustering and quantisation abilities of OpenIMAJ to build visual-term representations of each image, and used our ImageTerrier software [3,4] to build an inverted index which could be efficiently queried. For more information on the project, see this blog post.

Twitter’s visual pulse

Last year, we decided that for our mini-project we’d explore the wealth of visual information on Twitter. Specifically we wanted to look at which images were trending based not on counts of repeated URLs, but on the detection of near-duplicate images hosted at different URLs. In order to do this, we used what has now become the OpenIMAJ stream processing framework, described above, to:

  1. ingest the Twitter sample stream,
  2. process the tweet text to find links,
  3. filter out links that weren’t images (based on a set of patterns for common image hosting sites),
  4. download and resample the images,
  5. extract SIFT features,
  6. use locality-sensitive hashing to sketch each SIFT feature and store it in an ensemble of temporal hash-tables (a generic sketch of this hashing step is given below).

This process happens continuously in real-time. At regular intervals, the hash-tables are used to build a duplicates graph, which is then filtered and analysed to find the largest clusters of duplicate images, which are then visualised. OpenIMAJ was used for all the constituent parts of the software: stream processing, feature extraction and LSH. The graph construction and filtering uses the excellent JGraphT library that is integrated into the OpenIMAJ core-math module. For more information on the “Twitter’s visual pulse” application, see the paper [5] and this video.
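To illustrate the idea behind step 6, here is a generic random-hyperplane LSH sketch in plain Java; it is not the OpenIMAJ implementation, and the class and parameter names are our own:

  import java.util.Random;

  // Illustrative random-hyperplane LSH: similar descriptors tend to share sketches,
  // so near-duplicate SIFT features collide in the same hash buckets.
  public class LSHSketcher {
      private final float[][] hyperplanes; // one random hyperplane per sketch bit

      public LSHSketcher(int nbits, int dims, long seed) {
          Random rng = new Random(seed);
          hyperplanes = new float[nbits][dims];
          for (int i = 0; i < nbits; i++)
              for (int j = 0; j < dims; j++)
                  hyperplanes[i][j] = (float) rng.nextGaussian();
      }

      // sketch a descriptor (e.g. a 128-dimensional SIFT vector) into an nbits-bit code
      public long sketch(float[] descriptor) {
          long code = 0;
          for (int i = 0; i < hyperplanes.length; i++) {
              float dot = 0;
              for (int j = 0; j < descriptor.length; j++)
                  dot += hyperplanes[i][j] * descriptor[j];
              if (dot >= 0)
                  code |= 1L << i; // the sign of the projection gives one bit
          }
          return code;
      }

      public static void main(String[] args) {
          LSHSketcher lsh = new LSHSketcher(32, 128, 42);
          float[] sift = new float[128]; // a (dummy) SIFT descriptor
          System.out.println(Long.toBinaryString(lsh.sketch(sift)));
      }
  }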

Erica the Rhino

This year, we’re involved in a longer-running hackathon activity to build an interactive artwork for a mass public art exhibition called Go! Rhinos that will be held throughout Southampton city centre over the summer. The Go! Rhinos exhibition features a large number of rhino sculptures that will inhabit the streets and shopping centres of Southampton. Our school has sponsored a rhino sculpture called Erica which we’ve loaded with Raspberry Pi computers, sensors and physical actuators. Erica is still under construction, as shown in the picture below:

Erica, the OpenIMAJ-powered interactive rhino sculpture

OpenIMAJ is being used to provide visual analysis from the webcams that we’ve installed as eyes in the rhino sculpture (shown below). Specifically, we’re using a Java program built on top of the OpenIMAJ libraries to perform motion analysis, face detection and QR-code recognition. The rhino-eye program runs directly on a Raspberry Pi mounted inside the sculpture.

Erica’s eye is a servo-mounted webcam, powered by software written using OpenIMAJ and running on a Raspberry Pi

For more information, check out Erica’s website and YouTube channel, where you can see a prototype of the OpenIMAJ-powered eye in action.

Conclusions

For software developers, the OpenIMAJ library facilitates the rapid creation of multimedia analysis, indexing, visualisation and content generation tools using state-of-the-art techniques in a coherent programming model. The OpenIMAJ architecture enables scientists and researchers to easily experiment with different techniques, and provides a platform for innovating new solutions to multimedia analysis problems. The OpenIMAJ design philosophy means that building new techniques and algorithms, combining different approaches, and extending and developing existing techniques, are all achievable. We welcome you to come and try OpenIMAJ for your multimedia analysis needs. To get started watch the introductory videos, try the tutorial, and look through some of the examples. If you have any questions, suggestions or comments, then don’t hesitate to get in contact.

Acknowledgements

Early work on the software that formed the nucleus of OpenIMAJ was funded by the European Union’s 6th Framework Programme, the Engineering and Physical Sciences Research Council, the Arts and Humanities Research Council, Ordnance Survey and the BBC. Current development of the OpenIMAJ software is primarily funded by the European Union Seventh Framework Programme under the ARCOMEM and TrendMiner projects. The initial public releases were also funded by the European Union Seventh Framework Programme under the LivingKnowledge project, together with the LiveMemories project, funded by the Autonomous Province of Trento.

Open Source Column: Dynamic Adaptive Streaming over HTTP Toolset

Introduction

Multimedia content is nowadays omnipresent thanks to technological advancements in the last decades. A major driver of today’s networks are content providers like Netflix and YouTube, which do not deploy their own streaming architecture but provide their service over-the-top (OTT). Interestingly, this streaming approach performs well and adopts the Hypertext Transfer Protocol (HTTP), which was initially designed for best-effort file transfer and not for real-time multimedia streaming. The assumption of earlier video streaming research that streaming on top of HTTP/TCP will not work smoothly due to its retransmission delay and throughput variations has apparently been overcome, as supported by [1]. Streaming on top of HTTP, which is currently mainly deployed in the form of progressive download, has several other advantages. The infrastructure deployed for traditional HTTP-based services (e.g., Web sites) can also be exploited for real-time multimedia streaming. Typical problems of real-time multimedia streaming, like NAT or firewall traversal, do not apply to HTTP streaming. Nevertheless, there are certain disadvantages, such as fluctuating bandwidth conditions, that cannot be handled with the progressive download approach; this is a major drawback especially for mobile networks, where the bandwidth variations are tremendous.

One of the first solutions to overcome the problem of varying bandwidth conditions was specified within 3GPP as Adaptive HTTP Streaming (AHS) [2]. The basic idea is to encode the media file/stream into different versions (e.g., bitrate, resolution) and chop each version into segments of the same length (e.g., two seconds). The segments are provided on an ordinary Web server and can be downloaded through HTTP GET requests. The adaptation to the bitrate or resolution is done on the client side for each segment; e.g., the client can switch to a higher bitrate – if bandwidth permits – on a per-segment basis. This has several advantages, because the client knows best its own capabilities, received throughput, and the context of the user.

In order to describe the temporal and structural relationships between segments, AHS introduced the so-called Media Presentation Description (MPD). The MPD is an XML document that associates uniform resource locators (URLs) with the different qualities of the media content and the individual segments of each quality. This structure provides the binding of the segments to the bitrate (resolution, etc.) among other information (e.g., start time, duration of segments). As a consequence, each client first requests the MPD, which contains the temporal and structural information for the media content, and based on that information it requests the individual segments that best fit its requirements. Additionally, the industry has deployed several proprietary solutions, e.g., Microsoft Smooth Streaming [3], Apple HTTP Live Streaming [4] and Adobe Dynamic HTTP Streaming [5], which more or less adopt the same approach.
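The per-segment adaptation decision can be as simple as requesting the highest advertised bitrate that fits under the recently measured throughput. The following is a simplified, illustrative heuristic (not taken from any particular client implementation; names and the safety factor are our own assumptions):

  import java.util.List;

  // Simplified per-segment rate adaptation: request the highest representation
  // whose bitrate fits under a safety fraction of the measured throughput.
  public class SimpleRateAdaptation {
      public static int chooseRepresentation(List<Integer> bitratesBps,
                                             double measuredThroughputBps,
                                             double safetyFactor) {
          int chosen = 0; // fall back to the lowest representation
          for (int i = 0; i < bitratesBps.size(); i++) {
              if (bitratesBps.get(i) <= measuredThroughputBps * safetyFactor)
                  chosen = i;
          }
          return chosen;
      }

      public static void main(String[] args) {
          // representations as they might be advertised in an MPD, sorted by bitrate
          List<Integer> bitrates = List.of(400_000, 1_000_000, 2_500_000, 5_000_000);
          double throughput = 3_200_000; // bits per second, e.g. from the last segment download
          int idx = chooseRepresentation(bitrates, throughput, 0.8);
          System.out.println("Request the next segment at " + bitrates.get(idx) + " bps");
      }
  }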

Figure 1: Concept of Dynamic Adaptive Streaming over HTTP.

Recently, ISO/IEC MPEG has ratified Dynamic Adaptive Streaming over HTTP (DASH) [6], an international standard that should enable interoperability among the proprietary solutions. The concept of DASH is depicted in Figure 1. The Institute of Information Technology (ITEC) and, in particular, the Multimedia Communication Research Group of the Alpen-Adria-Universität Klagenfurt has participated in and contributed to this standard from the beginning. During the standardization process, a number of research tools have been developed for evaluation purposes and scientific contributions, including several publications. These tools are provided as open source for the community and are available at [7].

Open Source Tools Suite

Our open source tool suite consists of several components. On the client side we provide libdash [8] and the DASH plugin for the VLC media player (also available on Android). Additionally, our suite includes a JavaScript-based client that utilizes the HTML5 Media Source Extensions of the Google Chrome browser to enable DASH playback. Furthermore, we provide several server-side tools, such as our DASH dataset, consisting of different movie sequences available in different segment lengths as well as bitrates and resolutions. Additionally, we provide a distributed dataset mirrored at different locations across Europe. Our datasets have been encoded using our DASHEncoder, which is a wrapper tool for x264 and MP4Box. Finally, a DASH online MPD validation service and a DASH implementation over CCN complete our open source tool suite.

libdash

Figure 2: Client-Server DASH Architecture with libdash.

The general architecture of DASH is depicted in Figure 2, where orange represents the standardized parts. libdash comprises the MPD parsing and HTTP part. The library provides interfaces for the DASH Streaming Control and the Media Player to access MPDs and downloadable media segments. The download order of such media segments is not handled by the library; this is left to the DASH Streaming Control, which is a separate component in this architecture but could also be included in the Media Player. In a typical deployment, a DASH server provides segments in several bitrates and resolutions. The client initially receives the MPD through libdash, which provides a convenient object-oriented interface to that MPD. Based on that information, the client can download individual media segments through libdash at any point in time. Varying bandwidth conditions can be handled by switching to the corresponding quality level at segment boundaries in order to provide a smooth streaming experience. This adaptation is not part of libdash or the DASH standard and is left to the application using libdash.

DASH-JS

Figure 3: Screenshot of DASH-JS.

DASH-JS seamlessly integrates DASH into the Web using the HTML5 video element. A screenshot is shown in Figure 3. It is based on JavaScript and uses the Media Source API of Google’s Chrome browser to present a flexible and potentially browser independent DASH player. DASH-JS is currently using WebM-based media segments and segments based on the ISO Base Media File Format.

DASHEncoder

DASHEncoder is a content generation tool – built on top of the open source encoding tool x264 and GPAC’s MP4Box – for DASH video-on-demand content. Using DASHEncoder, the user does not need to encode and multiplex each quality level of the final DASH content separately. Figure 4 depicts the workflow of DASHEncoder. It generates the desired representations (quality/bitrate levels), fragmented MP4 files, and the MPD file based on a given configuration file or command line parameters.

Figure 4: High-level structure of DASHEncoder.

The set of configuration parameters covers a wide range of possibilities. For example, DASHEncoder supports different segment sizes, bitrates, resolutions, encoding settings, URLs, etc. The modular implementation of DASHEncoder enables the batch processing of multiple encodings, which are finally reassembled within a predefined directory structure represented by a single MPD. DASHEncoder is available as open source on our Web site as well as on Github, with the aim that other developers will join this project. The content generated with DASHEncoder is compatible with our playback tools.

Datasets

Figure 5: DASH Dataset.

Our DASH dataset comprises multiple full-length movie sequences from different genres – animation, sport and movie (cf. Figure 5) – and is located at our Web site. The DASH dataset is encoded and multiplexed using different segment sizes inspired by commercial products, ranging from 2 seconds (i.e., Microsoft Smooth Streaming) to 10 seconds per fragment (i.e., Apple HTTP Streaming) and beyond. In particular, each sequence of the dataset is provided with segment sizes of 1, 2, 4, 6, 10, and 15 seconds. Additionally, we also offer a non-segmented version of the videos and the corresponding MPD for the movies of the animation genre, which allows for byte-range requests. The provided MPDs of the dataset are compatible with the current implementation of the DASH VLC Plugin, libdash, and DASH-JS. Furthermore, we provide a distributed DASH (D-DASH) dataset which is, at the time of writing, replicated on five sites within Europe, i.e., Klagenfurt, Paris, Prague, Torino, and Crete. This allows for a real-world evaluation of DASH clients that perform bitstream switching between multiple sites, e.g., as a simulation of switching between multiple Content Distribution Networks (CDNs).

DASH Online MPD Validation Service

The DASH online MPD validation service implements the conformance software of MPEG-DASH and enables Web-based validation of MPDs provided as a file, URI, or text. As the MPD is defined by an XML schema, it is also possible to use an external XML schema file for the validation.
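For local experimentation, a similar schema check can be scripted with the standard Java XML validation APIs. This is a minimal sketch that assumes the MPEG-DASH schema file (DASH-MPD.xsd from ISO/IEC 23009-1) and an example MPD are available locally; it is not the code behind the ITEC service:

  import java.io.File;
  import javax.xml.XMLConstants;
  import javax.xml.transform.stream.StreamSource;
  import javax.xml.validation.Schema;
  import javax.xml.validation.SchemaFactory;
  import javax.xml.validation.Validator;

  public class MPDSchemaCheck {
      public static void main(String[] args) throws Exception {
          // DASH-MPD.xsd is the MPD schema from ISO/IEC 23009-1 (local path is an assumption)
          SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
          Schema schema = factory.newSchema(new File("DASH-MPD.xsd"));
          Validator validator = schema.newValidator();

          // throws an exception with a descriptive message if the MPD is not schema-valid
          validator.validate(new StreamSource(new File("example.mpd")));
          System.out.println("MPD is schema-valid");
      }
  }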

DASH over CCN

Finally, Dynamic Adaptive Streaming over Content Centric Networks (DASC, aka DASH over CCN) implements DASH utilizing a CCN naming scheme to identify content segments in a CCN network. It builds on the CCN concept from Jacobson et al. and the CCNx implementation (www.ccnx.org) from PARC. In particular, video segments formatted according to MPEG-DASH are available in different quality levels, but instead of HTTP, CCN is used for referencing and delivery.

Conclusion

Our open source tool suite is available to the community with the aim to provide a common ground for research efforts in the area of adaptive media streaming in order to make results comparable with each other. Everyone is invited to join this activity – get involved in and excited about DASH.

Acknowledgments

This work was supported in part by the EC in the context of the ALICANTE (FP7-ICT-248652) and SocialSensor (FP7-ICT-287975) projects and partly performed in the Lakeside Labs research cluster at AAU.

References

[1] Sandvine, “Global Internet Phenomena Report 2H 2012”, Sandvine Intelligent Broadband Networks, 2012.

[2] 3GPP TS 26.234, “Transparent end-to-end packet switched streaming service (PSS); Protocols and codecs”, 2010.

[3] A. Zambelli, “IIS Smooth Streaming Technical Overview”, Technical Report, Microsoft Corporation, March 2009.

[4] R. Pantos, W. May, “HTTP Live Streaming”, IETF draft, http://tools.ietf.org/html/draft-pantos-http-live-streaming-07 (last access: Feb 2013).

[5] Adobe HTTP Dynamic Streaming, http://www.adobe.com/products/httpdynamicstreaming/ (last access: Feb 2013).

[6] ISO/IEC 23009-1:2012, Information technology – Dynamic adaptive streaming over HTTP (DASH) – Part 1: Media presentation description and segment formats.

[7] ITEC DASH, http://dash.itec.aau.at

[8] libdash open git repository, https://github.com/bitmovin/libdash

Open Source Column: GPAC

GPAC, Toolbox for Interactive Multimedia Packaging, Delivery and Playback

Introduction

GPAC was born 10 years ago from the need for a lighter and more robust implementation of the MPEG-4 Systems standard [1], compared to the official reference software. It has since evolved into a much wider project, covering many tools required when exploring new research topics in multimedia, while keeping a strong focus on international standards coming from organizations such as W3C, ISO, ETSI or IETF. The goal of the project is to provide the tools needed to set up test beds and experiments for interactive multimedia applications, in any of the various environments used to deliver content in modern systems: broadcast, multicast, unicast unreliable streaming, HTTP-based streaming and file-based delivery.

Ambulant – a multimedia playback platform

Distributed multimedia is a field that depends on many technologies, including networking, coding and decoding, scheduling, rendering and user interaction. Often, this leads to multimedia researchers in one of those fields expending a lot of implementation effort to build a complete media environment when they actually only want to demonstrate an advance within their own field. In 2004 the authors, having gone through this process more than once themselves, decided to design an open source extensible and embeddable multimedia platform that could serve as a central research resource. The NLNet Foundation, www.nlnet.nl, graciously provided initial funding for the resulting Ambulant project. Ambulant was designed from the outset to be usable for experimentation in a wide range of fields, not only in a laboratory setting but also as a deployed player for end users. However, it was not intended to compete with general end-user playback systems such as the (then popular) RealPlayer, QuickTime or the Windows Media Player. Our goal was to build a glue environment in which various research groups could plug in new approaches to media scheduling, rendering and distribution. While some effort was spent on things like ease of installation, multi-platform compatibility and user interface issues, Ambulant has never hoped to usurp commercial media players. The user interface on three different platforms can be seen in the figure below.

The first deployment of the platform was during the W3C standardization of SMIL 2.1 and 3.0 [2, 3], when Ambulant was used to test the specification and create an open reference implementation. The fact that Ambulant supports SMIL out of the box means that it is not only useful to “low-level” multimedia researchers who want to experiment with replacing system components, but also to people interested in semantics or server-side document generation: by using SMIL as their output format they can use Ambulant to render their documents on any platform, including inside a web browser.

Design and Implementation

Ambulant is designed so that all key components are replaceable and extensible. This follows from the requirement that it is usable as an experimentation vehicle: if someone wants to replace the scheduler with one of their own design, this should be possible with little or no impact on the rest of the system. To ensure wide deployability, it was decided to create a portable platform. However, runtime efficiency is also an issue in multimedia playback, especially for audio and video decoding and rendering, so we decided to implement the core engine in C++. This allowed us to use platform-native decoding and rendering toolkits such as QuickTime and DirectShow, and gave us the added benefit of being able to use the native GUI toolkit on each platform, which makes life easier for end users and integrators. Using the native GUI required some extra effort up front to find the right spot to separate platform-independent and platform-dependent code, but by now porting to a new GUI toolkit takes about three man-months. About 8 GUI toolkits have been supported over time (or 11 if you count browser plugin APIs as a GUI toolkit). The current version of Ambulant runs natively on MacOSX, Linux, Windows and iOS, and a browser plugin is available for all major browsers on all desktop platforms (including Internet Explorer on Windows). Various old platforms (WM5, Maemo) were supported in the past and, while no longer maintained, the code is still available.

The design of Ambulant is shown in the figure above. On the document level there is a parser that reads external documents and converts them into a representation that the scheduler and layout engine handle during document playout. On the intermediate level there are datasources that read documents and media streams and hand them to the playout components. On the lower level there are the machine-dependent implementations of those stream readers and renderers. For each of these components there are multiple implementations, which can easily be replaced or extended. The design largely uses factory functions and abstract interfaces, and the implementation uses a plugin architecture to allow easy replacement of components at runtime without having to rebuild the complete application.

To make life even simpler, the API to the core Ambulant engine is available not only in C++ but also in Python. The Python bridge is complete and bidirectional: all classes that are accessible from C++ are just as accessible from Python and vice versa, and sending an object back and forth through the Python-C++ bridge results in the original object, not a new double-wrapped object. Moreover, not only can C++ classes be subclassed in Python, but also the reverse. This means both extending Ambulant through a plugin and embedding Ambulant can be done in pure Python, without having to write any C/C++ code and without having to rebuild Ambulant.

Applications

Over the years, Ambulant has been used extensively for experimentation, both within our group and externally. In this section we highlight some of these applications. The overview is not complete, but it illustrates the breadth of applications of Ambulant. One of the interests of the authors is maintaining the temporal scheduling integrity of dynamically modified multimedia presentations. In the Ambulant Annotator [4], we experimented with using secondary screens during playback, allowing user interaction on those secondary screens to modify existing shared presentations on the main screen. The modification and sharing interface was implemented as a plugin in Ambulant, which is also used to drive the main screen. In Ta2 MyVideos [5] we looked at a different form of live modification: a personalized video mashup that was created while the user is viewing it. Integration of live video conferencing and multimedia documents is another area in which we work. For the Ta2 Family Game project [6] we augmented Ambulant with renderers for low-delay live video rendering and digitizing, and a Flash engine. The resulting platform was used to play a cooperative action game in multiple locations. We are also using Ambulant to investigate protocols for synchronizing media playback at remote locations. In a wholly different application area, the Daisy Consortium has used Ambulant as the basis of AMIS, www.daisy.org/projects/amis. AMIS is software that reads Daisy Books, which are the international standard for digital talking books for the visually impaired. For this project Ambulant was only a small part of the solution. The main program allows the end user, who may be blind or dyslectic, to select books and navigate them. Timed playback is then handled by Ambulant, with added functionality to highlight paragraphs on-screen as the content is read out, etc. At a higher level, an instrumented version of Ambulant has also been deployed to indirectly evaluate social media systems. In 2004, it was submitted to the first ACM Multimedia Open Source Software Competition [1].

Obtaining and Using Ambulant

Ambulant is available via www.ambulantplayer.org, in three different forms: as a stable distribution (source and installers), as a nightly build (source and installers) and through Mercurial. Unfortunately, the stable distribution is currently lagging quite a bit behind, due to restricted manpower. We also maintain full API documentation, sample documents and community mailing lists. Ambulant is distributed under the LGPL2 license. This allows the platform to be used with commercial plugins developed by industry partners who provide proprietary software intended for limited distribution. We are considering a switch to dual licensing (GPL/BSD), but a concrete need has yet to arise.

The Bottom Line

Ambulant is a full open source media rendering pipeline. It provides an open, plug-in environment in which researchers from a wide variety of (sub)disciplines can test new algorithms and media sharing approaches without having to write mountains of less-relevant framework code. It can serve as an open environment for experimentation, validation and distribution. You are welcome to give it a try and to contribute to its growth.

References

[1] Bulterman, D. et al. 2004. Ambulant: a fast, multi-platform open source SMIL player. In Proceedings of the 12th annual ACM international conference on Multimedia (MULTIMEDIA ’04). ACM, New York, NY, USA, 492-495. DOI=10.1145/1027527.1027646

[2] Bulterman, D. et al. 2008. Synchronized Multimedia Integration Language (SMIL 3.0). W3C. URL=http://www.w3.org/TR/SMIL/

[3] Bulterman, D. and Rutledge, L. 2008. Interactive Multimedia for the Web, Mobile Devices and Daisy Talking Books. Springer-Verlag, Heidelberg, Germany, ISBN: 3-540-20234-X.

[4] Cesar, P. et al. 2009. Fragment, tag, enrich, and send: Enhancing social sharing of video. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), vol. 5 (3). DOI=10.1145/1556134.1556136

[5] Jansen, J. et al. 2012. Just-in-time personalized video presentations. In Proceedings of the 2012 ACM symposium on Document engineering (DocEng ’12). ACM, New York, NY, USA, 59-68. DOI=10.1145/2361354.2361368

[6] Jansen, J. et al. 2011. Enabling Composition-Based Video-Conferencing for the Home. IEEE Transactions on Multimedia, vol. 13 (5), pp. 869-881. DOI=10.1109/TMM.2011.2159369

Open Source Column: Tribler: P2P Search, Share and Stream

Six years ago, we created a new open source P2P file sharing program called Tribler. During this time over one million users have used it, and three generations of Ph.D. students have tested their algorithms in the real world. Tribler is built around BitTorrent. Introduced in 2001, BitTorrent revolutionized the P2P world because of its unprecedented efficiency. However, some problems are not properly addressed in BitTorrent. First, it does not specify how to search the network, relying instead on central websites. These websites allow users to find and download small metadata files called torrents. A torrent describes the content and is required for downloading to start. Second, BitTorrent’s method for downloading files is ill-suited to streaming: it is optimized for fast and reliable downloading and provides no mechanism for quick buffering. Tribler is the first client that continuously tries to improve upon the basic BitTorrent implementation by addressing the flaws described above. It implements, among other features, remote search, streaming, channels and reputation management. All these features are implemented in a completely distributed manner, without relying on any centralized component. Still, Tribler manages to remain fully backwards compatible with BitTorrent. Work on Tribler was initiated in 2005 and has been supported by multiple European grants.

In order to maximize the resource contribution of peers (other computers downloading/uploading the same file), BitTorrent splits a file into small pieces. This way, downloaders (called leechers) can upload completed pieces to other leechers without needing to have the complete file first. Furthermore, uploading is encouraged by the tit-for-tat incentive mechanism: a leecher ranks its connected peers by their upload speed and uploads only to the fastest uploaders. Peers that have the complete file, called seeders, can help others by sending them pieces for free. Before being able to download a file, a peer first has to obtain a torrent. The torrent describes the content of the file and includes a SHA-1 cryptographic hash per piece. This mechanism protects against transfer errors and malicious modifications of the file.
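
As a small illustration of the piece-level integrity check, the following sketch (not Tribler’s actual code) assumes the per-piece SHA-1 digests have already been parsed from the torrent into a list and verifies a downloaded piece against them:

import hashlib

def verify_piece(piece_index, piece_data, expected_hashes):
    """Check a downloaded piece against the SHA-1 digest listed in the torrent.

    `expected_hashes` is assumed to be a list of 20-byte SHA-1 digests,
    one per piece, as parsed from the torrent metadata.
    """
    digest = hashlib.sha1(piece_data).digest()
    return digest == expected_hashes[piece_index]

# A leecher would only store and announce pieces that verify, e.g.:
# if verify_piece(idx, data, hashes):
#     store_and_announce(idx, data)   # hypothetical helper
# else:
#     discard_and_rerequest(idx)      # hypothetical helper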

Tribler Design and Features

Tribler component overview

A basic overview of Tribler is shown in the figure on the right; it consists of four distinct components:

  • GUI: built using wxWidgets in order to be platform independent
  • BTengine: a BitTorrent engine, which has been altered to allow for our customizations
  • BuddyCast: our custom BitTorrent overlay, which is slowly being phased out
  • Dispersy: our new custom protocol, built with NAT traversal and distributed permissions

Tribler is built around PermIds, permanent identities that allow us to identify the actions of users. A PermId is stored as a public/private keypair and is used in Tribler to sign every message. Communication between peers is established using a ‘special’ BitTorrent swarm: all Tribler peers connect to this swarm and communicate with each other using the BuddyCast protocol. Peers are discovered by connecting to SuperPeers, which are identical to ‘normal’ peers but are assumed to always be online. BuddyCast connects to a new peer every 15 seconds and exchanges preference lists. Such a list contains the most recent downloads of a peer. By collecting these lists, we can calculate which peers are most similar to a given user. Those peers, called taste buddies, are then used during search. During the BuddyCast handshake a peer also exchanges its current connections, allowing it to hill-climb towards its most similar peers and to discover new peers.
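
To give an idea of how preference lists can be turned into a peer similarity score, the sketch below compares two preference lists (treated as sets of torrent infohashes) with cosine similarity and keeps the top-k peers as taste buddies. The function names and the choice of cosine similarity are illustrative assumptions, not necessarily the exact computation BuddyCast performs:

from math import sqrt

def peer_similarity(my_prefs, other_prefs):
    """Cosine similarity between two preference lists (sets of infohashes)."""
    my_set, other_set = set(my_prefs), set(other_prefs)
    if not my_set or not other_set:
        return 0.0
    overlap = len(my_set & other_set)
    return overlap / sqrt(len(my_set) * len(other_set))

def taste_buddies(my_prefs, candidates, k=10):
    """Rank candidate peers by similarity and keep the top-k as taste buddies.

    `candidates` is assumed to map a peer's PermId to its preference list.
    """
    ranked = sorted(candidates.items(),
                    key=lambda item: peer_similarity(my_prefs, item[1]),
                    reverse=True)
    return [peer_id for peer_id, _ in ranked[:k]]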

Search

Tribler search results

Performing remote search in a decentralized manner has been a problem for many years. An early P2P protocol, Gnutella, sent each query to all of a peer’s neighbors, which forwarded it in turn until a TTL of 7 was reached. Such an implementation is called flooding, as it causes a search query to be sent to almost all peers in the network. Flooding a network is very quick, but it consumes huge amounts of bandwidth.

In contrast, Tribler uses a TTL of 1 (i.e., it only uses its neighbors to perform a remote search). Because queries go to the user’s connections to similar peers, we can still obtain good results: using only taste buddies, hit rates over 60% are possible. This figure is further improved by local caches deployed at every peer. Each cache contains information on up to 50,000 torrents, which is used to improve search. Tribler connects to up to 10 taste buddies and 10 random peers, allowing the user to search within up to 1,050,000 torrents (its own cache plus those of 20 remote peers). A torrent is collected when our algorithms deem it interesting for a peer. This is calculated using the same user-item matrix that is used to find similar peers; the matrix is constructed by storing the BuddyCast preferences. Collaborative filtering then allows us to ‘recommend’ torrents to be collected. Collected torrents are thus tailored to every user, resulting in quicker search results, since the locally cached results can be displayed in the GUI before any response from a remote peer arrives. More details are available in our papers [1,2].
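
A simplified picture of how the user-item matrix can drive torrent collection is sketched below: torrents preferred by many similar peers score highest and are collected first. It reuses the hypothetical peer_similarity helper from the BuddyCast sketch above and only illustrates user-based collaborative filtering, not Tribler’s exact algorithm:

def recommend_torrents_to_collect(my_prefs, buddy_prefs, limit=50):
    """Pick torrents worth collecting locally, collaborative-filtering style.

    `buddy_prefs` maps a taste buddy's PermId to its preference list;
    `peer_similarity` is the helper from the previous sketch. Torrents that
    appear in the lists of many similar peers (weighted by similarity)
    score highest.
    """
    scores = {}
    for buddy_id, prefs in buddy_prefs.items():
        weight = peer_similarity(my_prefs, prefs)
        for infohash in prefs:
            if infohash not in my_prefs:
                scores[infohash] = scores.get(infohash, 0.0) + weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:limit]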

Streaming

Tribler VOD streaming

Tribler supports two distinct types of streaming: Video-on-Demand (VOD) and live streaming. Both extend BitTorrent by replacing one aspect of it. VOD requires a different approach to downloading pieces. The default policy of BitTorrent is to download the rarest piece first, ensuring the health of all pieces in the swarm. In VOD, by contrast, we want to download the first few pieces immediately so that playback can start as soon as possible.

To allow for this we have defined three priorities (high, mid and low). Priorities are assigned to pieces based on the current playback position. High priority pieces are downloaded in order, allowing playback to start quickly. After all high priority pieces are downloaded, we start downloading mid priority pieces. These are downloaded rarest-first, to maintain the overall health of the swarm. Because mid priority pieces are only a limited subset of all available pieces, this still keeps the playback buffer stable. After downloading all high and mid priority pieces, we start downloading the low priority pieces, again rarest-first. As the playback position moves forward, the priorities of the pieces are continuously updated. Furthermore, we replaced the default BitTorrent incentive mechanism (tit-for-tat) with Give-to-Get. This incentive mechanism ranks peers according to their forwarding rank, a metric describing how well a peer forwards pieces to other peers. Full details are available in our paper [5].
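
The three-band piece-picking policy can be sketched as follows. The window sizes, data structures and helper names are illustrative assumptions rather than the values Tribler uses; the sketch only shows the idea of downloading the high priority band in order and the other bands rarest-first:

import random

def next_piece(playback_pos, missing, availability,
               high_window=10, mid_window=100):
    """Pick the next piece to request under a simplified three-band policy.

    `missing` is the set of piece indices still needed, `availability`
    maps a piece index to the number of peers that have it.
    """
    high = [p for p in missing if playback_pos <= p < playback_pos + high_window]
    if high:
        return min(high)                      # high priority: strictly in order
    mid = [p for p in missing
           if playback_pos + high_window <= p < playback_pos + mid_window]
    band = mid if mid else [p for p in missing if p >= playback_pos + mid_window]
    if not band:
        return None
    # mid and low priority bands: rarest-first, random tie-break
    rarest = min(availability.get(p, 0) for p in band)
    return random.choice([p for p in band if availability.get(p, 0) == rarest])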

For live streaming we had to modify the actual torrent file. Because in live streaming the pieces are not known beforehand, we cannot include their SHA-1 hashes. We therefore replaced the hash-based verification scheme by specifying the public key of the original source in the torrent file. Using this public key, every peer can verify the validity of the pieces. Because a live stream may have an indefinite duration, we only keep pieces that are at most 15 minutes old relative to our playback position.
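
A minimal sketch of this bookkeeping, under the assumption that each piece carries a timestamp and a source signature, is shown below. The concrete signature scheme is deliberately left as a caller-supplied function, since it is not spelled out here; the sliding 15-minute window is the part taken directly from the description above:

LIVE_WINDOW_SECONDS = 15 * 60  # keep pieces at most 15 minutes behind playback

def accept_live_piece(pieces, index, timestamp, data, signature, verify):
    """Accept a live piece only if the source's signature checks out.

    `verify(data, signature)` is a caller-supplied function that checks the
    piece against the public key taken from the torrent file; the concrete
    signature scheme is an assumption and therefore left as a parameter.
    """
    if verify(data, signature):
        pieces[index] = (timestamp, data)
        return True
    return False

def prune_old_pieces(pieces, playback_time):
    """Drop pieces that fall outside the sliding 15-minute live window."""
    return {idx: (ts, data) for idx, (ts, data) in pieces.items()
            if playback_time - ts <= LIVE_WINDOW_SECONDS}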

Channels

Channel overview: listing all torrents

Channel comments: listing latest received comments for this Channel

Channel activity: listing latest activity for this Channel

Since December 2011 we have been evaluating, in the wild, the performance of our new transport protocol, Dispersy. Dispersy is the successor of BuddyCast, and it is focused on NAT traversal. Instead of using TCP, Dispersy uses UDP. Furthermore, while BuddyCast implemented one global overlay connecting all Tribler peers through a ‘special’ BitTorrent swarm, Dispersy creates a separate overlay per protocol.

Using Dispersy we implemented Channels. These are created by users and consist of a list of torrents they like. Channels are implemented as separate Dispersy overlays and are discovered through one special overlay to which all peers connect. Channels have evolved from a simple list of torrents into a community in which users can comment on torrents, modify their names and descriptions, and organize playlists. Modifications are publicly visible, in a system that resembles Wikipedia. By allowing everyone to edit and improve the metadata of torrents, we hope to reach a quality of experience similar to Wikipedia’s. If a channel owner (the user who created the channel) does not want other users to interfere with his channel, he can limit which messages other users are able to send. This is enabled by the decentralized permission system built into Dispersy, and it allows for a flexible configuration of channels. When a user votes on a Channel, Dispersy starts to collect its contents; before voting, only a snapshot of the content is available. More details regarding voting are described in our paper [6]. Currently, popular channels have well over 30,000 torrents, and our users have cast over 60,000 votes.

Reputation

BarterCast graph: showing data transfers between peers

A feature lacking in BitTorrent is the cross-swarm identification of peers. While downloading a file, peers have an incentive to upload to others due to tit-for-tat. But after completing a download, no incentives are in place to motivate a peer to keep uploading a file.

In order to address this, Tribler uses its PermIds to identify Tribler peers in other swarms. Additionally, we employ a mechanism called BarterCast, which builds a history of upload and download traffic between Tribler peers. From this history we can build a graph of the transfer behavior of peers and score them accordingly. A peer that has been shown to upload more than others is rewarded by being able to download at a faster rate, while peers that contribute little can be prevented from downloading at all. More details are available in our papers [3,4].
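
As an illustration of the idea, the sketch below records transfers in a directed graph and computes a simple net-contribution score per peer. This is deliberately naive; the deployed BarterCast mechanism derives reputations from the transfer graph in a more sophisticated way, as described in [3,4]:

from collections import defaultdict

class TransferGraph:
    """Directed graph of observed transfers: edge (a -> b) = bytes a uploaded to b."""

    def __init__(self):
        self.uploaded = defaultdict(lambda: defaultdict(int))

    def record(self, uploader, downloader, nbytes):
        self.uploaded[uploader][downloader] += nbytes

    def net_contribution(self, peer_id):
        """Bytes uploaded minus bytes downloaded, as seen in this graph."""
        up = sum(self.uploaded[peer_id].values())
        down = sum(edges.get(peer_id, 0) for edges in self.uploaded.values())
        return up - down

# A peer with a high net contribution could be granted a larger share of our
# upload bandwidth; a heavily negative one could be throttled or refused.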

Acknowledgments

Since the start of the project in 2005, many people have contributed to it. Among others: A. Bakker, J.J.D. Mol, J. Yang, L. d’Acunto, J.A. Pouwelse, J. Wang, P. Garbacki, A. Iosup, J. Doumen, J. Roozenburg, Y. Yuan, M. ten Brinke, L. Musat, F. Zindel, F. van der Werf, M. Meulpolder, J. Taal, R. Rahman, B. Schoon and N.S.M. Zeilemaker. Tribler is a project which continues to evolve with the help of its community. We currently have an active user base that comments on and suggests features in the forums, and we continue to innovate together with our European partners.

If you are intrigued by the text above and want to try out Tribler, you can download it from our website, http://www.tribler.org. The website also contains more documentation of the features Tribler has and has had; feel free to look around and leave a comment in the forums.

References
  1. Pouwelse JA, Garbacki P, Wang J, et al. Tribler: A social-based peer-to-peer system. Concurrency and Computation: Practice and Experience 2008. Available at: http://www3.interscience.wiley.com/journal/114219988/abstract
  2. Zeilemaker N, Capota M, Bakker A. Tribler: P2P media search and sharing. Proceedings of the 19th ACM international conference on Multimedia (ACM MM) 2011. Available at: http://dl.acm.org/citation.cfm?id=2072433
  3. Meulpolder M, Pouwelse J. Bartercast: A practical approach to prevent lazy freeriding in p2p networks. Sixth Int’l Workshop on Hot Topics in P2P Systems (HoT-P2P) 2009. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5160954
  4. Delaviz R, Andrade N, Pouwelse JA. Improving accuracy and coverage in an internet-deployed reputation mechanism. IEEE Tenth International Conference on Peer-to-Peer Computing (IEEE P2P) 2010. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5569965
  5. Mol J, Pouwelse J, Meulpolder M, Epema D, Sips H. Give-to-get: Free-riding-resilient video-on-demand in p2p systems. Fifteenth Annual Multimedia Computing and Networking (MMCN) 2008. Available at: http://www.pds.ewi.tudelft.nl/pubs/papers/mmcn2008.pdf
  6. Rahman R, Hales D, Meulpolder M, Heinink V, Pouwelse JA and Sips H. Robust vote sampling in a P2P media distribution system. In: Proceedings of IPDPS 2009. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5160946
  7. Tribler Protocol Specification. Available at: http://svn.tribler.org/bt2-design/proto-spec-unified/trunk/proto-spec-current.pdf

Open Source Column: Mozilla Popcorn

Mozilla Popcorn: Web Video Interaction Using Client-Side Javascript

Context and history

Popcorn is an HTML5 media project from Mozilla, the non-profit organization that makes the Firefox web browser. It makes media-oriented web development easy through shared development, open source libraries and tools.

Popcorn.js: a Javascript library for interactions between video and the web

Popcorn.js makes web media more connected by providing an event-driven API to hook <video> and <audio> content into the rest of the capabilities of the web platform (developer.mozilla.org). Prior to HTML5, web video lived exclusively inside browser plug-ins like Flash and VLC, which put it outside the reach of JavaScript, CSS, and other techniques for interacting with the rest of the surrounding HTML document. Popcorn.js turns media into fully interactive JavaScript objects, so that media objects can both trigger and listen for events. It enables developers to cue events along a media timeline using a simple JavaScript syntax:

// Wrap the <video> element with id "my-video" in a Popcorn instance.
var pop = Popcorn("#my-video");

// Show "Hello World" inside #my-div from 1.38s to 5.12s of playback.
pop.text({
  start: 1.38,
  end: 5.12,
  text: "Hello World",
  target: "my-div"
});

pop.play();

Live source available at http://jsfiddle.net/p8Kbs/80/. Additionally, media playback is accessible via popcorn.play(), popcorn.pause(), and popcorn.currentTime(seconds), which allows you to jump to any point in the timeline of the referenced media. As a nod toward the expectations of media producers and videographers, Popcorn.js also provides methods like popcorn.cue(), which simply ties actions to specific times. Aside from simple time-based triggers, you can use popcorn.listen(event,callback_function) to bind the callback function to a specified event. Built-in events are provided to handle typical HTML5 web video playback scenarios, such as “play,” “pause,” “loadstart,” “seeked,” “volumechange,” and so on. However, you can define custom events and trigger them directly by using popcorn.trigger(event[,data]), where the data parameter is an optional data object to send to listeners.

Extensibility

By design, Popcorn is extensible. Mozilla supports about 20 plugins that come packaged with the library, ranging from simple HTML element insertion to complex data retrieval and aggregation. Examples include a subtitle plugin, a Google Maps plugin, a Twitter plugin, a Facebook plugin, and a JavaScript code plugin. If some desired functionality doesn’t yet exist in the library, Popcorn.js has a well-documented plugin architecture: http://popcornjs.org/popcorn-docs/addon-development/. Popcorn works best with HTML5 media, but also has wrappers for arbitrary objects (through the “baseplayer”) and Flash players, like YouTube, Vimeo, Flowplayer, and SoundCloud. It’s easy to write a wrapper for any web-oriented video player. Popcorn also includes a set of parsers for reading common data files (SRT, TTML, XML, etc.). Of course, as with other parts of Popcorn, it is easy to create a custom data parser. Thoroughly tested, Popcorn.js supports all modern browsers and IE8. Currently, it’s stable at version 1.2. You can download the Popcorn source or use a web-based build tool to create a custom, compressed version.

Potential Applications

Popcorn is in use by a range of publishers, service providers, creative coders and individuals to mash video up with the rest of the web. RAMP, a content optimization company, uses an automated process to display time-coded metadata about significant people, places and things whenever they are mentioned in a video. Using Popcorn, RAMP can support a range of player types and contexts (web, mobile, headless) by developing against the common Popcorn API (http://www.ramp.com/solutions/optimized-video/metaplayer/popcorn/). The Dutch multimedia archive Beeld en Geluid has used Popcorn to create a “living archive,” connecting cultural archival material with a range of semantic metadata (http://www.openimages.eu/blog/2012/01/13/open-images-videos-enriched-with-open-data/). Popcorn has also been used to create hyperlinked transcripts that use text as an interface for traversing and editing long media assets (http://yoyodyne.cc/h/). Aside from large-scale applications, Popcorn is also supported by a burgeoning creative community (in fact, the project was started and is run by Brett Gaylor, a filmmaker). For instance, documentary producer Kat Cizek uses Popcorn to create web-based interactive films. In “1 Millionth Tower,” the web browser creates a navigable 3D space that simulates high rises in major cities around the world. Popcorn is used to turn the camera at key moments, spawn visual effects, and download live weather data from web APIs. If it’s raining in Toronto, it’s also raining in virtual Toronto (http://highrise.nfb.ca/onemillionthtower/1mt_webgl.php).

Popcorn Maker

A key goal of the Popcorn project is to enable more connected web video on a mass scale, and to open creative possibilities to individual media makers. Popcorn Maker is a user-facing web application for creating interactive media; it requires no coding knowledge. Users pick a video from YouTube or the wider web, open the media object in a prepared HTML template, customize the project, and publish. Popcorn Maker can be used to create pop-up videos, multimedia reports, guided web tours and more. Project composition happens live in the browser: users can drag and drop events onto a timeline interface, position objects on the page, and watch a live preview of the project as it is constructed. Popcorn Maker projects are entirely human-readable HTML, CSS and JavaScript. For the time being, Popcorn Maker does not support media editing and sequencing; users must come prepared with an edited video file. For this reason, Popcorn Maker is not a web-based video editor but rather a video-based web editor. We may revisit this decision later, when web browsers handle media playback and synchronization more precisely. Like the rest of the Popcorn project, Popcorn Maker is 100% free and open source. Developers of time-based multimedia apps are encouraged to build on the Butter SDK (source code available at http://github.com/mozilla/butter) and contribute back to the project. At the time of this writing, Popcorn Maker is in active development at version 0.5 and is scheduled for a 1.0 release in late 2012. http://mozillapopcorn.org.

Credits

The Popcorn project and its constituents are lovingly crafted by Ben Moskowitz, Bobby Richter, Brett Gaylor, David Seifried, Christopher De Cairos, Matthew Schranz, Jon Buckley, Scott Downe, Mohammed Buttu, Kate Hudson, David Humphrey, Jeremy Banks, Brian Chirls, James Burke, Robert Stanica, Anna Sobiepanek, Rick Waldron, Nick Cammarata, Daniel Hodgin, Daniel Brooks, Boaz Sender, Dan Ventura, Brad Chen, Minoo Ziaei, Cesar Gomes, Steven Weerdenburg, Cole Gillespie, and Nick Doiron.