Topic Models for Image Retrieval on Large-Scale Databases
Supervisor(s) and Committee member(s): Rainer Lienhart (supervisor), Wolfgang Effelsberg (reader), Bernhard Möller (reader)
With the explosion of the number of images in personal and on-line collections, efficient techniques for navigating, indexing, labeling and searching images become increasingly important. In this work we rely on the image content as the main source of information to retrieve images. We study the representation of images by topic models in its various aspects and extend the current models. Starting from a bag-of-visual-words image description based on local image features, image representations are learned in an unsupervised fashion, and each image is modeled as a mixture of the topics/object parts depicted in it. Topic models thus allow us to automatically extract high-level image content descriptions, which in turn can be used to find similar images. Further, the typically low-dimensional topic-model-based representation enables efficient and fast search, especially in very large databases.
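To make the "mixture of topics" idea concrete, the following is a minimal sketch of one classic topic model, probabilistic latent semantic analysis (pLSA), fitted with EM on a matrix of visual-word counts. The abstract does not say which topic models the thesis uses, so the choice of pLSA, the function name `plsa`, and all parameters here are illustrative assumptions rather than the author's implementation.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit a pLSA model with EM (illustrative sketch, not the thesis code).

    counts: array of shape (n_images, n_words) with visual-word counts.
    Returns P(z|d) of shape (n_images, n_topics) and
            P(w|z) of shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalized initialization of both distributions.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior[d, z, w] is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        posterior = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * posterior
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z
```

Each row of the returned `p_z_d` is a low-dimensional topic mixture for one image, which is the kind of compact representation that makes search in large databases fast.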
In this thesis we present a complete image retrieval system based on topic models and evaluate the suitability of different types of topic models for the task of large-scale retrieval on real-world databases. Different similarity measures are evaluated in a retrieval-by-example task.
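A retrieval-by-example task can be sketched as ranking database images by the distance between their topic mixtures and that of the query image. The abstract does not name the measures compared, so the three below (cosine distance, L1 distance, Jensen-Shannon divergence) are assumed examples of common choices, and `rank_by_example` is a hypothetical helper name.

```python
import numpy as np


def cosine_distance(p, q):
    """1 - cosine similarity between two topic vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 - float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))


def l1_distance(p, q):
    """Sum of absolute differences between two topic vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.abs(p - q).sum())


def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps  # smoothing avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_example(query, database, distance):
    """Return database indices sorted from most to least similar."""
    scores = np.array([distance(query, row) for row in database])
    return np.argsort(scores, kind="stable")
```

For example, `rank_by_example(query_topics, all_topics, jensen_shannon)` yields an ordering of the database, and swapping in another distance function lets the measures be compared on the same query set.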
Next, we focus on the incorporation of different types of local image features in the topic models. For this, we first evaluate which types of feature detectors and descriptors are appropriate to model the images, and then propose and explore models that fuse multiple types of local features. All basic topic models require the quantization of the otherwise high-dimensional continuous local feature vectors into a finite, discrete vocabulary to enable the bag-of-words image representation the topic models are built on. As it is not clear how to optimally quantize the high-dimensional features, we introduce different extensions to a basic topic model which model the visual vocabulary continuously, making the quantization step obsolete.
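The quantization step that basic topic models depend on can be sketched as nearest-neighbor assignment of each local descriptor to a codebook of visual words, followed by counting. The codebook is assumed to be given here (in practice it is often obtained by clustering, which is an assumption not stated in the abstract), and the function names are hypothetical.

```python
import numpy as np


def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest visual word.

    descriptors: array of shape (n_descriptors, dim).
    codebook:    array of shape (n_words, dim), assumed precomputed.
    """
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    d2 = (diffs ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def bag_of_visual_words(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook))
```

The hard assignment in `quantize` discards how close a descriptor was to its visual word, which is one way to see why a continuous vocabulary model might be attractive.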
On-line image repositories of the Web 2.0 often store additional information about the images besides their pixel values, called metadata, such as associated tags, date of creation, ownership and camera parameters. In this work we also investigate how to include such cues in our retrieval system. We present work in progress on (hierarchical) models which fuse features from multiple modalities.
Finally, we present an approach to find the most relevant, i.e., most representative, images in a large web-scale collection given a query term. Our unsupervised approach ranks highest the image whose content and various metadata types give the highest probability according to the model we automatically build for this tag.
Multimedia Computing Lab, University of Augsburg
The Multimedia Computing Lab at the Institute of Computer Science at the University of Augsburg is headed by Prof. Dr. Rainer Lienhart. The lab focuses on the theory and the computational aspects of media data, especially media mining.