Search disambiguation techniques in multimedia collections

In the last few years, we have witnessed an explosive growth in multimedia content. Online repositories such as Flickr (image) and YouTube (video) contain hundreds of millions of images and videos and are still growing by the day. However, the utility of these data sources is only as good as their a...

Bibliographic Details
Main Author: Wan, Kong Wah
Other Authors: Tan Ah Hwee
Format: Theses and Dissertations
Language: English
Published: 2012
Subjects:
Online Access: https://hdl.handle.net/10356/50768
Institution: Nanyang Technological University
id sg-ntu-dr.10356-50768
record_format dspace
spelling sg-ntu-dr.10356-50768 2023-03-04T00:48:22Z
Search disambiguation techniques in multimedia collections
Wan, Kong Wah
Tan Ah Hwee
School of Computer Engineering
DOCTOR OF PHILOSOPHY (SCE)
2012-10-29T06:55:36Z
2012
Thesis
Wan, K. W. (2012). Search disambiguation techniques in multimedia collections. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/50768
10.32657/10356/50768
en
197 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description In the last few years, we have witnessed explosive growth in multimedia content. Online repositories such as Flickr (images) and YouTube (videos) contain hundreds of millions of images and videos and are still growing by the day. However, the utility of these data sources is only as good as their accessibility. A primary motivation of this thesis is to make better use of this overwhelming wealth of information. Specifically, we look at ways to disambiguate search in a multimedia retrieval system. The need for search disambiguation often arises in practice because the typical query is short, imprecise and ambiguous, so it is difficult to infer the exact query intent. Search disambiguation is the general class of techniques that elucidate the multi-faceted information needs arising from an ambiguous query. The objective is to provide an overview of search results grouped by multiple facets, which can better clarify a user's search intention by letting the user zoom into any specific facet of the query topic. There has been a lack of a principled approach to search disambiguation. Many multimedia retrieval systems address ambiguous queries by results clustering, near-duplicate removal and diversification. The aim is simply to prevent result pages from being cluttered by too many similar Web articles. In this thesis, we take a more principled approach and propose the following two-pronged methodology.
First, we propose a novel faceted topic retrieval framework in which multimedia documents are modeled as comprising facets or topics. The notion of a facet can be interpreted broadly to encompass any binary property of a document that represents a fact or a topic contained in the query need. In contrast with traditional term-based retrieval models (e.g. TF-IDF), documents are now ranked by the relevance of their constituent facets/topics to the query. The goal is then to return a set of documents that are not only relevant to the query but also cover the many different facets of the information need. We base our faceted retrieval framework on probabilistic topic models, a class of algorithms designed to discover the latent thematic structures in a document collection.
Second, we augment our faceted retrieval framework with two modeling capabilities to better process ambiguous queries in multimedia collections such as videos and images. Firstly, because different queries may have varying levels of ambiguity, and hence varying levels of polysemy in the returned results, we develop a non-parametric Bayesian method to cluster search results. The main advantage of our clustering method is that the number of mixture components is not fixed a priori, but is determined during the posterior inference process. This allows our model to grow with the level of polysemy (and visual diversity) in the returned multimedia results. Secondly, we extend the basic probabilistic topic model (Latent Dirichlet Allocation, LDA) to jointly model the complementary information in the visual and textual streams. We show how the joint modeling can improve the quality of facet detection in news video, and in turn yield better user satisfaction in faceted topic retrieval.
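The goal stated in the description, returning a set of documents that is relevant to the query and also covers its different facets, can be illustrated with a small sketch. This is not the method of the thesis, which builds on probabilistic topic models; it is a simple greedy coverage heuristic given here only to make the idea concrete. The function name `greedy_facet_cover`, the per-facet relevance weights, and the toy "jaguar" data are all assumptions made for illustration. Facets are treated as binary properties of a document, as in the description.

```python
from typing import Dict, List, Set


def greedy_facet_cover(
    doc_facets: Dict[str, Set[str]],
    facet_weight: Dict[str, float],
    k: int,
) -> List[str]:
    """Pick up to k documents, each time taking the one that adds the most
    not-yet-covered facet weight (ties broken by document id)."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(doc_facets)

    def gain(doc_id: str) -> float:
        # Weight of the facets this document would newly cover.
        return sum(facet_weight.get(f, 0.0) for f in remaining[doc_id] - covered)

    while remaining and len(selected) < k:
        best = min(remaining, key=lambda d: (-gain(d), d))
        if gain(best) <= 0.0:
            break  # no remaining document adds a new relevant facet
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Toy data (hypothetical): an ambiguous query "jaguar" with three facets.
docs = {
    "d1": {"animal"},
    "d2": {"animal", "car"},
    "d3": {"car"},
    "d4": {"os"},
}
weights = {"animal": 0.5, "car": 0.3, "os": 0.2}  # assumed facet relevance to the query

result = greedy_facet_cover(docs, weights, k=2)
print(result)  # ['d2', 'd4']
```

In this toy run the first pick is `d2` because it covers two facets at once, and the second pick is `d4` because it is the only remaining document that adds a facet not yet covered. A ranking by relevance alone would instead have preferred another "animal" or "car" document.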
author2 Tan Ah Hwee
author_facet Tan Ah Hwee
Wan, Kong Wah
format Theses and Dissertations
author Wan, Kong Wah
author_sort Wan, Kong Wah
title Search disambiguation techniques in multimedia collections
publishDate 2012
url https://hdl.handle.net/10356/50768
_version_ 1759855489930231808