Content and context analysis for mobile landmark recognition

In recent years, the use of mobile/cellular phones has increased greatly. By 2012, over 80 percent of the global population had become mobile cellular subscribers. Today, more than half of the mobile phones in use have camera features. Benefiting from the built-in cameras along with the advancement i...

Bibliographic Details
Main Author: Tao, Chen
Other Authors: Yap Kim Hui
Format: Theses and Dissertations
Language: English
Published: 2013
Subjects:
Online Access: https://hdl.handle.net/10356/54669
Institution: Nanyang Technological University
id sg-ntu-dr.10356-54669
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description In recent years, the use of mobile/cellular phones has increased greatly. By 2012, over 80 percent of the global population had become mobile cellular subscribers. Today, more than half of the mobile phones in use have camera features. Benefiting from built-in cameras and advances in network technologies, various mobile applications that take advantage of the interactive features offered by cellular phones have been designed. Among them, mobile landmark recognition, which uses a camera phone to capture a landmark and determine related information such as its name, history and activities, is becoming increasingly popular. This thesis therefore focuses on mobile device-based landmark recognition and proposes to address it using context-aware content analysis techniques.

In Chapter 3, a new content and context integration technique for mobile landmark recognition is proposed. A bag-of-words (BoW) framework that involves a new spatial pyramid decomposition scheme is developed to perform the content analysis. The location obtained through the built-in Global Positioning System (GPS) receiver of the mobile device and the direction obtained through its built-in digital compass are incorporated into the context analysis, which is then integrated with the content analysis for mobile landmark recognition.

In Chapter 4, a new BoW approach based on discriminative learning of patches, images and codewords is proposed for landmark recognition. An iterative learning approach based on a differential Gaussian mixture model (DGMM) is developed to estimate the discriminative information of each image patch. This information is then incorporated into vector quantization to generate a BoW histogram. An image signature weighting method is developed to score how well each image represents its landmark category, and the scores are used to train a discriminative classifier through a fuzzy support vector machine (SVM). Context information such as GPS location and direction is finally integrated with the proposed content analysis to reduce the recognition time and improve the recognition performance.

In Chapter 5, in contrast to the BoW-based methods above, a new soft bag-of-phrase (BoP) approach based on category-dependent phrase selection is proposed for mobile landmark recognition. In this chapter, each phrase consists of two visual words and is referred to as a second-order phrase. The proposed approach makes two contributions: (i) a discriminative selection approach that takes advantage of word-level and phrase-level semantic similarity is developed to select the important phrases from a large number of candidates and form the descriptive BoP dictionary; (ii) a soft encoding technique is developed to generate a BoP histogram for each image, which reduces the information loss induced by conventional BoP quantization.

In Chapter 6, unlike the above methods that adopt an SVM as the recognition technique, a fast landmark recognition approach based on a scalable vocabulary tree (SVT) is proposed. The method constructs direction-dependent SVTs (DSVTs) for image quantization and learns a discriminative compact vocabulary (DCV) to encode the query image. Direction information is first used to supervise image feature clustering when constructing the DSVTs. Location information is then incorporated into the DCV learning algorithm to select the discriminative codewords of the DSVT that form the DCV. An ImageRank technique and an iterative codeword selection algorithm are developed for DCV learning. Inverted index files are constructed for the codewords in the vocabulary, which can greatly improve the recognition efficiency.

The proposed algorithms and techniques are validated on several landmark databases, including the NTU50Landmark database created by the authors, the Oxford building dataset, and the San Francisco landmark database. The experimental results on these databases consistently show the effectiveness of the proposed methods for mobile landmark recognition. Furthermore, the comparison with other state-of-the-art techniques indicates that the proposed algorithms achieve better performance in terms of recognition accuracy and computational time.
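The abstract's combination of an inverted index over codewords with GPS context can be illustrated with a small sketch. This is not the thesis's DSVT/DCV method: it substitutes plain codeword voting for the learned vocabulary, and the function names, the 500 m radius, the coordinates and the toy database are all hypothetical, chosen only to show how location can prune candidates before content votes are counted.

```python
import math
from collections import Counter, defaultdict


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_inverted_index(db):
    """Map each codeword id to the set of database image ids that contain it."""
    index = defaultdict(set)
    for image_id, entry in db.items():
        for word in entry["words"]:
            index[word].add(image_id)
    return index


def recognize(query_words, query_gps, db, index, radius_m=500.0):
    """Vote for a landmark using only database images near the query location."""
    votes = Counter()
    for word in set(query_words):
        # The inverted index returns only images sharing this codeword,
        # so images with no codeword in common are never visited.
        for image_id in index.get(word, ()):
            entry = db[image_id]
            # Context step (assumed): discard images outside the GPS radius.
            if haversine_m(*query_gps, *entry["gps"]) <= radius_m:
                votes[entry["landmark"]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy database: codeword ids and coordinates are made up.
db = {
    "img1": {"landmark": "library", "gps": (1.3478, 103.6808), "words": [3, 7, 9]},
    "img2": {"landmark": "hall", "gps": (1.3483, 103.6831), "words": [2, 7, 11]},
    "img3": {"landmark": "far_tower", "gps": (1.2800, 103.8500), "words": [3, 7, 9, 11]},
}
index = build_inverted_index(db)
print(recognize([3, 9, 11], (1.3480, 103.6810), db, index))  # library
```

In this toy run, "far_tower" shares all three query codewords but is excluded by the location filter, so the nearby "library" image wins with two votes.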
author2 Yap Kim Hui
author_facet Yap Kim Hui
Tao, Chen
format Theses and Dissertations
author Tao, Chen
author_sort Tao, Chen
title Content and context analysis for mobile landmark recognition
publishDate 2013
url https://hdl.handle.net/10356/54669
_version_ 1772828125735944192
spelling sg-ntu-dr.10356-54669 2023-07-04T15:38:33Z Content and context analysis for mobile landmark recognition Tao, Chen Yap Kim Hui School of Electrical and Electronic Engineering Centre for Signal Processing DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision DOCTOR OF PHILOSOPHY (EEE) 2013-07-16T01:48:21Z 2013-07-16T01:48:21Z 2013 2013 Thesis Tao, C. (2013). Content and context analysis for mobile landmark recognition. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/54669 10.32657/10356/54669 en 171 p. application/pdf