Content and context analysis for mobile landmark recognition
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2013 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/54669 |
Institution: | Nanyang Technological University |
Summary: | In recent years, the use of mobile/cellular phones has increased greatly. By 2012, over 80 percent of the global population had become mobile cellular subscribers. Today, more than half of the mobile phones in use have camera features. Benefiting from built-in cameras and advances in network technologies, various mobile applications that take advantage of the interactive features of cellular phones have been designed. Among them, mobile landmark recognition, which uses a camera phone to capture a landmark and retrieve related information such as its name, history and activities, is becoming increasingly popular. In view of this, this thesis focuses on mobile device-based landmark recognition and proposes to address it using context-aware content analysis techniques. In Chapter 3, a new content and context integration technique for mobile landmark recognition is proposed. A bag-of-words (BoW) framework with a new spatial pyramid decomposition scheme is developed to perform the content analysis. Location information from the mobile device's built-in Global Positioning System (GPS) receiver and direction information from its built-in digital compass are incorporated into the context analysis, which is then integrated with the content analysis for mobile landmark recognition. In Chapter 4, a new BoW approach based on discriminative learning of patches, images and codewords is proposed for landmark recognition. An iterative learning approach based on a differential Gaussian mixture model (DGMM) is developed to estimate the discriminative information of each image patch. This information is then incorporated into vector quantization to generate a BoW histogram. An image signature weighting method is developed to score how well each image represents its landmark category; the scores are then used to train a discriminative classifier through a fuzzy support vector machine (SVM). Context information such as GPS location and direction is finally integrated with the proposed content analysis to reduce the recognition time and improve the recognition performance. In Chapter 5, in contrast to the BoW-based methods above, a new soft bag-of-phrase (BoP) approach based on category-dependent phrase selection is proposed for mobile landmark recognition. In this chapter, each phrase consists of two visual words and is referred to as a second-order phrase. The proposed approach makes two contributions: (i) a discriminative selection approach that takes advantage of word-level and phrase-level semantic similarity is developed to select the important phrases from a large number of candidates and form the descriptive BoP dictionary; and (ii) a soft encoding technique is developed to generate a BoP histogram for each image, which reduces the information loss induced by conventional BoP quantization. In Chapter 6, unlike the above methods, which adopt an SVM as the recognition technique, a fast landmark recognition approach based on the scalable vocabulary tree (SVT) is proposed. The method constructs direction-dependent SVTs (DSVTs) for image quantization and learns a discriminative compact vocabulary (DCV) to encode the query image. Direction information is first used to supervise image feature clustering when constructing the DSVTs. Location information is then incorporated into the DCV learning algorithm to select the discriminative codewords of the DSVT that form the DCV. An ImageRank technique and an iterative codeword selection algorithm are developed for DCV learning. Inverted index files are constructed for the codewords in the vocabulary, which greatly improves the recognition efficiency. We validate the proposed algorithms and techniques on several landmark databases, including the NTU50Landmark database, which we created, the Oxford Buildings dataset, and the San Francisco landmark database. The experimental results on these databases consistently show the effectiveness of the proposed methods for mobile landmark recognition. Furthermore, comparison with other state-of-the-art techniques indicates that the proposed algorithms achieve better performance in terms of recognition accuracy and computational time. |
---|
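
The summary describes combining context (GPS location) with content (a BoW histogram) but does not give the thesis's actual formulation. The following is only a minimal sketch of one generic way such a combination could work, not the author's method: candidates are first restricted to landmarks within a radius of the query's GPS fix, and the survivors are ranked by histogram intersection. The function names, the `radius_m` parameter, the dictionary layout, and the toy data are all hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def histogram_intersection(h1, h2):
    """Similarity of two equal-length BoW histograms (higher means more similar)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def recognize(query_hist, query_gps, landmarks, radius_m=300.0):
    """Context step: keep landmarks near the query. Content step: rank by histogram."""
    candidates = [
        lm for lm in landmarks
        if haversine_m(*query_gps, *lm["gps"]) <= radius_m
    ]
    if not candidates:
        # Assumption: fall back to all landmarks if the GPS filter removes everything.
        candidates = landmarks
    best = max(candidates, key=lambda lm: histogram_intersection(query_hist, lm["hist"]))
    return best["name"]


# Toy data (hypothetical): "C" matches the query histogram best but is far away.
landmarks = [
    {"name": "A", "gps": (1.3480, 103.6830), "hist": [0.5, 0.3, 0.2]},
    {"name": "B", "gps": (1.3485, 103.6835), "hist": [0.1, 0.1, 0.8]},
    {"name": "C", "gps": (1.4480, 103.6830), "hist": [0.1, 0.2, 0.7]},
]
print(recognize([0.1, 0.2, 0.7], (1.3482, 103.6832), landmarks))
```

Filtering by location before comparing histograms shrinks the candidate set, which is one plausible reason context can both speed up recognition and remove visually similar but distant landmarks from consideration.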