Large-scale Sensor-rich Video Management and Delivery
Supervisor(s) and Committee member(s): Roger Zimmermann (supervisor), Wei-Tsang Ooi (examiner), Mun-Choon Chan (examiner), Pål Halvorsen (examiner)
In recent years, people have become accustomed to sharing and watching videos on the Internet. In particular, rapid advances in mobile device technology have drawn users to produce and consume videos on this newly booming platform. With this technological innovation, a new video life cycle has emerged: people capture a video on their smartphones, upload it to the Internet and make it available to the public; others discover the video, then download and watch it on smartphones as well as on traditional platforms. Within this new life cycle, a number of hardware and software problems arise.
This thesis focuses on the problems that arise during the second half of the aforementioned video life cycle and that are caused by new requirements and constraints, namely the large volume of videos and the large audience size. Specifically, the second half of the video life cycle (the process of accessing Internet videos) can be divided into two steps: (1) finding the desired video clip and (2) downloading and watching it in real time. The large volume of videos complicates the first step and, together with the large audience size, makes the second step difficult as well. Unfortunately, traditional solutions designed for small video corpora and small audiences are no longer applicable under the new conditions. Therefore, this thesis investigates and proposes state-of-the-art techniques that can be applied to the two steps to improve people’s experience of accessing Internet videos.
During the first step, to search for the desired videos, people tend to use traditional textual input (keywords), since textual annotation (tagging) has demonstrated its capability of making videos searchable. Manual tagging is laborious and often inaccurate, so researchers have proposed tagging videos automatically by analyzing their content. However, while signal-level features can easily be extracted from the content, high-level semantics have proven difficult to acquire with sound accuracy. Recently, the context of videos has been introduced to supplement high-level video semantics detection. Recognizing its promise, this thesis investigates a rich-context method, in which a video is enriched with multiple dimensions of sensor information. Based on this sensor-rich setup, a data-driven approach is proposed that automates the tag generation process by exploiting the geo-spatial properties of videos. Importantly, because it requires no pixel-wise computation, the proposed approach is efficient and able to cope with large video corpora. The thesis then discusses how to use crowdsourced information from online multimedia websites to improve the geo-referenced data source, which significantly influences the quality of the tags.
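To make the idea of tagging from geo-spatial properties concrete, the following is a minimal illustrative sketch, not the method developed in the thesis. It assumes each video frame carries a GPS position and a compass heading, and that a geo-referenced list of named landmarks is available. The names `Frame`, `Landmark`, `visible_tags` and `tag_video`, as well as the default viewing angle, visible distance and frame-fraction threshold, are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Frame:
    lat: float          # camera latitude in degrees
    lon: float          # camera longitude in degrees
    heading_deg: float  # compass direction the camera faces (0 = north, clockwise)


@dataclass
class Landmark:
    name: str
    lat: float
    lon: float


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (great-circle distance in metres, initial bearing in degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    # Haversine formula for the distance.
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing from point 1 towards point 2.
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return dist, bearing


def visible_tags(frame, landmarks, fov_deg=60.0, max_dist_m=500.0):
    """Names of landmarks inside the frame's assumed pie-shaped viewable area."""
    tags = []
    for lm in landmarks:
        dist, bearing = distance_and_bearing(frame.lat, frame.lon, lm.lat, lm.lon)
        # Smallest absolute angle between the camera heading and the landmark.
        diff = abs((bearing - frame.heading_deg + 180) % 360 - 180)
        if dist <= max_dist_m and diff <= fov_deg / 2:
            tags.append(lm.name)
    return tags


def tag_video(frames, landmarks, min_fraction=0.5):
    """Keep tags whose landmark is visible in at least min_fraction of the frames."""
    counts = {}
    for frame in frames:
        for tag in visible_tags(frame, landmarks):
            counts[tag] = counts.get(tag, 0) + 1
    return sorted(t for t, c in counts.items() if c / len(frames) >= min_fraction)
```

Note that the sketch touches only sensor metadata and never decodes a pixel, which is the property that keeps this style of tagging cheap on large corpora. The quality of the result depends directly on the landmark list, which is why the geo-referenced data source matters.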
For the second step, once the desired videos are found, the traditional paradigm for delivering them to users is client-server, where the content publisher is responsible for disseminating videos to each individual user. Hence, the bandwidth usage on the publisher side grows linearly with the audience size, and a huge audience may exhaust it. In contrast, P2P networks have proven to be a scalable paradigm by shifting the video delivery workload to users. Nevertheless, in recent years P2P networks have generated a huge amount of far-reaching Internet traffic, which may result in monetary cost for Internet service providers (ISPs), network congestion and degraded video quality. Consequently, it is worthwhile to study how to localize the traffic caused by P2P video streaming while preserving streaming quality. In this thesis, a real-world P2P streaming application is first measured to understand the peer distribution over networks, which confirms the opportunity for localizing traffic. Next, the optimal solution for ISP-scale traffic locality is derived and, based on that solution, a number of modifications compatible with current P2P streaming architectures are proposed. However, traffic inefficiency is found not to be restricted to the ISP scale. Therefore, the solution is further extended to LAN-scale traffic locality and to mobile wireless networks.
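The notion of traffic locality can be illustrated with a simple locality-biased neighbor selection. This is a hedged sketch of the general idea only; it is not the optimal solution derived in the thesis. It assumes each peer knows the ISP of its candidate neighbors. The names `Peer`, `select_neighbors` and `cross_isp_fraction`, and the `remote_slots` parameter (a few links deliberately kept to other ISPs so the overlay stays connected), are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    peer_id: str
    isp: str  # label of the ISP the peer belongs to (assumed to be known)


def select_neighbors(me, candidates, k, remote_slots=1, rng=None):
    """Pick up to k neighbors, preferring peers in the same ISP as `me`.

    Up to `remote_slots` links are reserved for peers in other ISPs; if there
    are too few local peers, the remaining slots are filled with remote ones.
    """
    if rng is None:
        rng = random.Random()
    others = [p for p in candidates if p.peer_id != me.peer_id]
    local = [p for p in others if p.isp == me.isp]
    remote = [p for p in others if p.isp != me.isp]
    rng.shuffle(local)
    rng.shuffle(remote)

    n_remote = min(remote_slots, len(remote), k)
    n_local = min(k - n_remote, len(local))
    chosen = local[:n_local] + remote[:n_remote]

    # Not enough local peers: fall back to additional remote peers.
    spare = remote[n_remote:]
    chosen += spare[: k - len(chosen)]
    return chosen


def cross_isp_fraction(me, neighbors):
    """Fraction of neighbor links that leave the peer's own ISP."""
    if not neighbors:
        return 0.0
    return sum(p.isp != me.isp for p in neighbors) / len(neighbors)
```

Compared with uniformly random neighbor selection, this bias lowers the share of links that cross ISP boundaries whenever enough local peers exist. Reserving a small number of remote links is one simple way to avoid isolating an ISP from the rest of the swarm, which would hurt streaming quality.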