Context-aware mobile image recognition and annotation

Bibliographic Details
Main Author: Li, Zhen
Other Authors: Yap Kim Hui
School: School of Electrical and Electronic Engineering
Degree: Doctor of Philosophy (EEE)
Format: Theses and Dissertations
Extent: 174 p.
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems
Online Access: https://hdl.handle.net/10356/55100
DOI: 10.32657/10356/55100
Citation: Li, Z. (2013). Context-aware mobile image recognition and annotation. Doctoral thesis, Nanyang Technological University, Singapore.
Institution: Nanyang Technological University

Description:
The growing usage of mobile camera phones has led to the proliferation of mobile applications such as mobile city guides, mobile shopping, personalized mobile services, and personal album management. Mobile visual systems have been developed that analyze images taken by mobile devices to enable these applications. Two of these applications are particularly important: 1) mobile image recognition, which provides relevant information about scene/landmark images, and 2) mobile image annotation, which uses camera phones to capture images and annotate them. Mobile image recognition and annotation are closely related, and both build on mobile visual analysis.

To enhance the performance of a mobile visual system, it is natural to incorporate mobile domain-specific context information into conventional visual content analysis. The context information in this work includes location and direction information from mobile devices, mobile user interaction, and so on. However, context information is underutilized in most existing mobile visual systems: they mainly use the location provided by GPS (Global Positioning System) to shortlist candidate images near the query image's location, and then carry out content analysis within the shortlisted candidates to obtain the final recognition/annotation results. This is insufficient because (i) GPS is unreliable in densely built-up areas, where its errors can be large, and (ii) other context information, such as direction (recorded by the digital compass on the mobile device), is not used to further improve recognition.

For mobile image recognition, we propose several approaches based on content analysis, with possible incorporation of context information:
1) A new approach for scene image recognition that combines generative and discriminative models. A new image signature is proposed based on the Gaussian Mixture Model (GMM), and its soft relevance value is incorporated into the training of a Fuzzy Support Vector Machine (FSVM). The proposed GMM-FSVM approach is shown to outperform state-of-the-art Bag-of-Words (BoW) methods (a simplified sketch of this pipeline follows the list).
2) A new landmark image recognition method that incorporates image saliency information into the state-of-the-art Scalable Vocabulary Tree (SVT) approach. Because the saliency information emphasizes the foreground landmark object and suppresses the cluttered background, the proposed Saliency-Aware Vocabulary Tree (SAVT) algorithm improves recognition performance over the baseline SVT approach.
3) A real-valued multi-class AdaBoost algorithm with an exponential loss function (RMAE), which integrates visual content with two types of mobile context: location and direction. RMAE generates SVTs from content analysis and context analysis respectively, constructs weak classifiers on top of them, and then builds the final strong classifier from these weak classifiers, so that it carries both content and context information.
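
By way of illustration, the sketch below shows how a GMM-based image signature might be computed and fed to a classifier. It is a minimal sketch under stated assumptions, not the thesis implementation: scikit-learn's GaussianMixture and a plain linear SVC stand in for the proposed signature and the FSVM, the descriptors are random toy data, and the gmm_signature helper is hypothetical.

```python
# A minimal sketch, NOT the thesis implementation: scikit-learn's
# GaussianMixture and a plain linear SVC stand in for the proposed GMM
# signature and the Fuzzy SVM (FSVM) with soft relevance values.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def gmm_signature(descriptors, gmm):
    """Aggregate one image's local descriptors into a fixed-length vector.

    Computes posterior-weighted mean deviations per GMM component, a
    simplified Fisher-vector-style encoding (hypothetical helper).
    """
    post = gmm.predict_proba(descriptors)            # (n, K) soft assignments
    diffs = descriptors[:, None, :] - gmm.means_     # (n, K, d) deviations
    weighted = (post[:, :, None] * diffs).sum(axis=0)
    sig = weighted / (post.sum(axis=0)[:, None] + 1e-8)
    return sig.ravel()                               # (K * d,) signature

# Toy stand-in data: random "descriptors"; a real system would use local
# features (e.g. SIFT) extracted from each image.
rng = np.random.default_rng(0)
train_desc = [rng.normal(size=(100, 16)) + label for label in (0, 0, 1, 1)]
labels = np.array([0, 0, 1, 1])

# Universal GMM over descriptors pooled from all training images.
gmm = GaussianMixture(n_components=4, random_state=0)
gmm.fit(np.vstack(train_desc))

X = np.array([gmm_signature(d, gmm) for d in train_desc])
clf = SVC(kernel="linear").fit(X, labels)            # stand-in for FSVM
print(clf.predict(X))
```

The thesis additionally feeds each signature's soft relevance value into FSVM training, which a standard SVC cannot express; the sketch only shows the encode-then-classify structure.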
For mobile image annotation, we developed a system prototype and proposed several approaches utilizing content analysis, context analysis, and their integration. To study the effectiveness of context-based image annotation, a new algorithm is proposed that models tag distributions over the GPS locations of mobile images. Specifically, the tag distributions are obtained using an enhanced GMM; based on these distributions, a query image can be associated with tags according to its location, achieving context-based image annotation (a sketch of this location-to-tag scoring appears at the end of this description).

As part of the contributions, we also constructed two mobile image databases: i) the Singapore Landmark-40 dataset for recognition, and ii) the NTU Scene-25 dataset for annotation. The Singapore Landmark-40 dataset consists of 12,338 training images and 1,200 testing images covering 40 famous landmarks in Singapore. The NTU Scene-25 dataset consists of 3,916 images in 25 categories of geotagged scenes/landmarks/activities from the NTU campus, and includes context information such as GPS location and direction. Comprehensive experiments have been carried out on a number of mobile image datasets, and the results show that the proposed mobile image recognition and annotation methods outperform state-of-the-art methods and show good potential for mobile image sharing based on recognition and annotation.
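
As a rough illustration of the context-based annotation idea, the sketch below fits one location model per tag over GPS coordinates and ranks tags for a query by location likelihood. This is a minimal sketch assuming a standard GaussianMixture in place of the thesis's enhanced GMM; the tags, coordinates, and the rank_tags helper are all hypothetical.

```python
# A minimal sketch of context-based annotation, assuming a standard
# GaussianMixture in place of the thesis's enhanced GMM. The tags, the
# GPS coordinates, and the rank_tags helper are all hypothetical.
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical geotagged training data: tag -> (n, 2) array of (lat, lon).
rng = np.random.default_rng(1)
train = {
    "library": rng.normal([1.3483, 103.6831], 0.0005, size=(50, 2)),
    "stadium": rng.normal([1.3510, 103.6870], 0.0005, size=(50, 2)),
}

# Fit one location model per tag over the locations of images bearing it.
models = {tag: GaussianMixture(n_components=1).fit(locs)
          for tag, locs in train.items()}

def rank_tags(query_latlon):
    """Rank tags by the log-likelihood of the query's GPS location."""
    q = np.asarray(query_latlon).reshape(1, 2)
    scores = {tag: model.score(q) for tag, model in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# A query shot near the "library" model's centre should rank it first.
print(rank_tags([1.3484, 103.6832]))
```

In the full system, this purely location-based ranking would be integrated with content analysis rather than used on its own.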