Search disambiguation techniques in multimedia collections
In the last few years, we have witnessed an explosive growth in multimedia content. Online repositories such as Flickr (image) and YouTube (video) contain hundreds of millions of images and videos and are still growing by the day. However, the utility of these data sources is only as good as their a...
Main Author: | Wan, Kong Wah |
---|---|
Other Authors: | Tan Ah Hwee |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2012 |
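Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |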
Online Access: | https://hdl.handle.net/10356/50768 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-50768 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-50768 2023-03-04T00:48:22Z Search disambiguation techniques in multimedia collections Wan, Kong Wah Tan Ah Hwee School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
In the last few years, we have witnessed an explosive growth in multimedia content. Online repositories such as Flickr (image) and YouTube (video) contain hundreds of millions of images and videos and are still growing by the day. However, the utility of these data sources is only as good as their accessibility. A primary motivation of this thesis is to make better use of this overwhelming wealth of information. Specifically, we look at ways to disambiguate search in a multimedia retrieval system. The need for search disambiguation often arises in practice because a typical query is short, imprecise and ambiguous, which makes the exact query intent difficult to infer. Search disambiguation is the general class of techniques for elucidating the multi-faceted information needs arising from an ambiguous query. The objective is to provide an overview of search results grouped by multiple facets, which can better clarify a user's search intention by enabling the user to zoom into any specific facet of the query topic. There has been a lack of a principled approach to search disambiguation. Many multimedia retrieval systems address the issue of ambiguous queries by results clustering, near-duplicate removal and diversification. The aim is simply to prevent result pages from being cluttered by too many similar Web articles. In this thesis, we take a more principled approach and propose the following two-pronged methodology.
First, we propose a novel faceted topic retrieval framework wherein multimedia documents are modeled as comprising facets or topics. The notion of facets can be interpreted broadly to encompass any binary property of a document that represents a fact or a topic that is contained in the query need. In contrast with traditional term-based retrieval models (e.g. TF-IDF), documents are now ranked by the relevance of their composite facets/topics to the query. The goal is then to return a set of documents that not only are relevant to the query but also cover the many different facets of the information need. We base our faceted retrieval framework on probabilistic topic models, a class of algorithms designed to discover the latent thematic structures in a document collection.
Second, we augment our faceted retrieval framework with two modeling capabilities to better process ambiguous queries in multimedia collections such as video and images. Firstly, because different queries may have varying levels of ambiguity, and hence varying levels of polysemy in the returned results, we develop a non-parametric Bayesian method to cluster search results. The main advantage of our clustering method is that the number of mixture components is not fixed a priori, but is determined during the posterior inference process. This allows our model to grow with the level of polysemy (and visual diversity) in the returned multimedia results. Secondly, we extend the basic probabilistic topic model (Latent Dirichlet Allocation, LDA) to jointly model the complementary information in the visual and textual streams. We show how the joint modeling can improve the quality of facet detection in news video, and in turn yield better user satisfaction in faceted topic retrieval.
DOCTOR OF PHILOSOPHY (SCE) 2012-10-29T06:55:36Z 2012-10-29T06:55:36Z 2012 2012 Thesis Wan, K. W. (2012). Search disambiguation techniques in multimedia collections. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/50768 10.32657/10356/50768 en 197 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
author2 |
Tan Ah Hwee |
author_facet |
Tan Ah Hwee Wan, Kong Wah |
format |
Theses and Dissertations |
author |
Wan, Kong Wah |
title |
Search disambiguation techniques in multimedia collections |
publishDate |
2012 |
url |
https://hdl.handle.net/10356/50768 |
_version_ |
1759855489930231808 |
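
The abstract notes that the thesis clusters search results with a non-parametric Bayesian method in which the number of mixture components is not fixed a priori. As a rough illustration of that idea only, the sketch below samples cluster sizes from a Chinese restaurant process, a common prior in such models. It is not the thesis's method: it draws from a prior and performs no posterior inference over multimedia features, and the function name `crp_partition` and the parameter `alpha` are assumptions introduced here for illustration.

```python
import random


def crp_partition(n_items, alpha, seed=0):
    """Sample cluster sizes for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration: `alpha` is the concentration
    parameter (assumed name); larger values make new clusters more likely.
    """
    if n_items < 0 or alpha <= 0:
        raise ValueError("n_items must be >= 0 and alpha must be > 0")
    rng = random.Random(seed)
    sizes = []  # sizes[k] is the number of items currently in cluster k
    for i in range(n_items):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                break
        else:
            # No existing cluster was chosen, so a new one is created.
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    # The number of clusters is an outcome of sampling, not an input.
    for alpha in (0.5, 5.0):
        sizes = crp_partition(200, alpha, seed=1)
        print(alpha, len(sizes), sorted(sizes, reverse=True)[:5])
```

The caller never supplies a cluster count: it emerges from the data size and `alpha`, which loosely mirrors the abstract's claim that the model can grow with the level of polysemy in the returned results.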