Automatic Summarization of Personal Photo Collections
Supervisor(s) and Committee member(s): Professor Ramesh Jain (supervisor), Professor Sharad Mehrotra (committee member), Professor Padhraic Smyth (committee member), Professor Deva Ramanan (committee member)
URL: http://www.ics.uci.edu/~psinha/research/thesis/pinaki_thesis.pdf
Photo-taking and sharing devices (e.g., smartphones and digital cameras) have become extremely popular in recent times. Photo enthusiasts today capture moments of their personal lives using these devices, resulting in huge collections of photos stored in various personal archives. The exponential growth of online social networks and web-based photo sharing platforms has added fuel to this fire. According to recent estimates [46], three billion photos are uploaded to the social network Facebook every month. This photo data overload has created several major challenges. One of them is the automatic generation of representative overviews from large photo collections. Manual browsing of photo corpora is not only tedious but also time-consuming. Hence, developing an automatic photo summarization system is not only a research challenge but also a practical one. In this dissertation, we present a principled approach for generating size-constrained overview summaries from large personal photo collections.
We define a photo summary as an extractive subset that is a good representative of the larger photo set. We propose three properties that an effective summary should satisfy: Quality, Diversity, and Coverage. Modern digital photos come with heterogeneous content and context data, and we propose models that combine this multimodal data to compute the summary properties. Our summarization objective is modeled as an optimization of these properties. Further, the summarization framework can integrate user preferences in the form of inputs. Thus, different summaries may be generated from the same corpus to accommodate preference variations among users.
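To make the selection objective concrete, the following is a minimal sketch of size-constrained subset selection driven by quality, diversity, and coverage scores. It assumes a toy setup in which each photo is described by a single L2-normalized content feature vector and a scalar quality score; the scoring functions, the weights w_q, w_d, w_c, and the naive greedy strategy are illustrative assumptions rather than the models actually developed in the dissertation.

```python
# Illustrative sketch only: greedy size-constrained photo summarization that trades off
# Quality, Diversity, and Coverage. Feature representation, scoring functions, and weights
# are hypothetical stand-ins, not the dissertation's actual formulation.
import numpy as np

def summary_score(features, quality, selected, w_q=1.0, w_d=1.0, w_c=1.0):
    """Score a candidate summary (set of photo indices) on quality, diversity, and coverage."""
    if not selected:
        return 0.0
    S = np.array(sorted(selected))
    sims = features @ features.T                       # cosine similarities (features L2-normalized)
    q = quality[S].mean()                               # Quality: mean quality of selected photos
    if len(S) > 1:
        pair = sims[np.ix_(S, S)]
        d = 1.0 - (pair.sum() - len(S)) / (len(S) * (len(S) - 1))  # Diversity: 1 - mean pairwise similarity
    else:
        d = 1.0
    c = sims[:, S].max(axis=1).mean()                   # Coverage: every corpus photo has a near neighbor in the summary
    return w_q * q + w_d * d + w_c * c

def greedy_summarize(features, quality, k):
    """Greedily add the photo that most improves the combined objective, up to size budget k."""
    selected = set()
    for _ in range(k):
        base = summary_score(features, quality, selected)
        best, best_gain = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            gain = summary_score(features, quality, selected | {i}) - base
            if gain > best_gain:
                best, best_gain = i, gain
        selected.add(best)
    return sorted(selected)

# Toy usage: 50 photos with random normalized content features and quality scores.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 16))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
qual = rng.uniform(size=50)
print(greedy_summarize(feats, qual, k=5))
```

User preferences could enter such a scheme simply by adjusting the weights w_q, w_d, w_c or by biasing the quality scores, which is one way the same corpus could yield different summaries for different users.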
A traditional way of performing intrinsic evaluation in information retrieval is to compare the retrieved result set with a manually generated ground truth. However, given the variability of human behavior in selecting appealing photos, it may be difficult and non-intuitive to generate a unique ground truth summary of a large data corpus. Due to the personal nature of the data, only the contributor of a particular photo corpus can plausibly summarize it, since personal photos typically come with a great deal of background personal knowledge. While considerable effort has been directed towards the evaluation of annotation and ranking in multimedia, relatively few experiments have been conducted to evaluate photo summaries.
We conducted extensive user studies on summarization of photos from single life events. The experiments showed both uniformity and diversity in user preferences when generating and evaluating photo summaries. We also posit that photo summaries should serve the twin objectives of information discovery and reuse. Based on this assumption, we propose novel objective metrics that enable us to evaluate summaries from large personal photo corpora without user-generated ground truths. We also create a dataset of personal photos along with a host of contextual data, which can be helpful in future research. Our experiments show that the proposed summarization properties and framework can indeed be used to generate effective summaries. This framework can be extended to include other types of information (e.g., social ties among multiple users present in a dataset) and to create personalized photo summaries.
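As an illustration of what a ground-truth-free objective check could look like in this spirit, the sketch below scores a summary by the fraction of simple context facets of the corpus (here, hypothetical day/place pairs) that it represents. The facet definition and the metric are assumptions for exposition only, not the objective metrics proposed in the dissertation.

```python
# Illustrative sketch only: a "facet coverage" style evaluation that needs no ground-truth
# summary, assuming each photo carries simple context metadata (day and place). The facets
# and the metric are hypothetical stand-ins for the dissertation's actual metrics.
def facet_coverage(corpus, summary_ids):
    """Fraction of the corpus' (day, place) facets that the summary touches."""
    corpus_facets = {(p["day"], p["place"]) for p in corpus}
    summary_facets = {(p["day"], p["place"]) for p in corpus if p["id"] in summary_ids}
    return len(summary_facets) / len(corpus_facets)

# Toy usage: a weekend trip corpus and a two-photo summary.
photos = [
    {"id": 0, "day": "sat", "place": "beach"},
    {"id": 1, "day": "sat", "place": "beach"},
    {"id": 2, "day": "sat", "place": "restaurant"},
    {"id": 3, "day": "sun", "place": "museum"},
]
print(facet_coverage(photos, summary_ids={0, 3}))  # covers 2 of 3 facets -> ~0.67
```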