Search disambiguation techniques in multimedia collections
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2012 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/50768 |
Institution: | Nanyang Technological University |
Summary: | In the last few years, we have witnessed an explosive growth in multimedia content. Online repositories such as Flickr (images) and YouTube (videos) contain hundreds of millions of images and videos and are still growing by the day. However, the utility of these data sources is only as good as their accessibility. A primary motivation of this thesis is to make better use of this overwhelming wealth of information. Specifically, we look at ways to disambiguate search in a multimedia retrieval system. The need for search disambiguation often arises in practice because the typical issued query is short, imprecise and ambiguous, and it is difficult to infer the exact query intent. Search disambiguation is the general class of techniques that elucidate the multi-faceted information needs arising from an ambiguous query. The objective is to provide an overview of search results grouped by multiple facets, which can better clarify a user's search intention by enabling them to zoom into any specific facet of the query topic. A principled approach to search disambiguation has been lacking. Many multimedia retrieval systems address the issue of ambiguous queries through result clustering, near-duplicate removal and diversification. The aim is simply to prevent result pages from being cluttered with too many similar Web articles. In this thesis, we take a more principled approach and propose the following two-pronged methodology. First, we propose a novel faceted topic retrieval framework wherein multimedia documents are modeled as comprising facets or topics. The notion of facets can be interpreted broadly to encompass any binary property of a document that represents a fact or a topic contained in the query need. In contrast with traditional term-based retrieval models (e.g. TF-IDF), documents are now ranked by the relevance of their composite facets/topics to the query. 
The goal is then to return a set of documents that are not only relevant to the query but also cover the many different facets of the information need. We base our faceted retrieval framework on probabilistic topic models, a class of algorithms designed to discover the latent thematic structures in a document collection. Second, we augment our faceted retrieval framework with two modeling capabilities to better process ambiguous queries in multimedia collections such as videos and images. Firstly, because different queries may have varying levels of ambiguity, and hence varying levels of polysemy in the returned results, we develop a non-parametric Bayesian method to cluster search results. The main advantage of our clustering method is that the number of mixture components is not fixed a priori, but is determined during the posterior inference process. This allows our model to grow with the level of polysemy (and visual diversity) in the returned multimedia results. Secondly, we extend the basic probabilistic topic model (Latent Dirichlet Allocation, LDA) to jointly model the complementary information in the visual and textual streams. We show how the joint modeling can improve the quality of facet detection in news video, and in turn yield better user satisfaction in faceted topic retrieval. |
---|
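
The summary notes that the number of mixture components is not fixed a priori. As a rough illustration of that idea only, the sketch below samples cluster labels from a Chinese restaurant process, a standard non-parametric Bayesian prior. This is an assumption made for illustration: the record does not say which non-parametric model the thesis uses, and the code shows only the prior, not posterior inference over search results. The function name `crp_assignments` and the parameter `alpha` are hypothetical.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative only: alpha (> 0) is an assumed concentration parameter.
    Larger alpha tends to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index assigned to item i
    sizes = []   # sizes[k] = number of items currently in cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = crp_assignments(n_items=50, alpha=1.0, seed=1)
    print("clusters used:", len(sizes))
    print("cluster sizes:", sizes)
```

The number of clusters is never passed in; it emerges from the sampling process and can keep growing as more items arrive, which is the property the summary attributes to its clustering method.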