Distance learning between image and class for object recognition

Bibliographic Details
Main Author: Wang, Zhengxiang
Other Authors: Chia Liang Tien, Clement
Format: Theses and Dissertations
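Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access: https://hdl.handle.net/10356/54819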
Institution: Nanyang Technological University
id sg-ntu-dr.10356-54819
record_format dspace
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description Object recognition is an active research topic in the computer vision community. A recently proposed Image-to-Class (I2C) distance addresses this problem by classifying images with a simple Naive-Bayes nearest-neighbor (NBNN) classifier, yet it performs surprisingly well. The I2C distance avoids feature quantization and generalizes better than the traditional Image-to-Image (I2I) distance. It has two drawbacks, however. It is expensive to compute, because its performance relies heavily on searching for the nearest neighbor (NN) among a large number of training features, and it does not fully use the label information of the training data, which limits its recognition performance. In this thesis, we aim to improve both the recognition performance and the efficiency of the I2C distance and to extend its field of application.

First, we add a training phase that improves recognition performance by learning a weighted I2C distance. We propose a large-margin optimization framework to learn the I2C distance function, which is modeled as a weighted combination of the distances from every local feature in an image to its NN in a candidate class. The weights, which are associated with the local features in the training set, are learned under the constraint that the I2C distance from an image to the class it belongs to should be smaller than its distance to any other class. To reduce the computation cost, we also propose two methods, based on spatial division and on hubness score, that accelerate the NN search; they substantially reduce the online testing time while preserving or even improving classification accuracy.

Second, we propose a distance metric learning method that further improves the performance of the I2C distance by learning Per-Class Mahalanobis metrics.
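
The weighted I2C distance from the first contribution can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes squared Euclidean distance between local features, a brute-force NN search, and one non-negative weight per training feature that is applied whenever that feature is selected as the nearest neighbor. The function names (`sq_dist`, `i2c_distance`, `classify`) are hypothetical, and the weights are supplied by hand rather than learned with the large-margin optimization.

```python
from typing import Dict, List, Optional, Sequence

Feature = Sequence[float]


def sq_dist(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance between two local feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def i2c_distance(
    image_feats: List[Feature],
    class_feats: List[Feature],
    class_weights: Optional[List[float]] = None,
) -> float:
    """Sum, over the image's local features, of the weighted squared
    distance to the nearest neighbor among the class's training features.

    class_weights[j] is an assumed per-training-feature weight. With
    class_weights=None every weight is 1, giving the unweighted distance.
    """
    if not class_feats:
        raise ValueError("class_feats must contain at least one feature")
    if class_weights is None:
        class_weights = [1.0] * len(class_feats)
    total = 0.0
    for feat in image_feats:
        # Brute-force NN search over all training features of the class.
        dists = [sq_dist(feat, c) for c in class_feats]
        j = min(range(len(dists)), key=dists.__getitem__)
        total += class_weights[j] * dists[j]
    return total


def classify(
    image_feats: List[Feature],
    classes: Dict[str, List[Feature]],
    weights: Optional[Dict[str, List[float]]] = None,
) -> str:
    """Return the name of the class with the smallest I2C distance."""
    weights = weights or {}
    return min(
        classes,
        key=lambda name: i2c_distance(
            image_feats, classes[name], weights.get(name)
        ),
    )
```

Calling `classify(image_feats, classes)` with a dictionary that maps each class name to its pooled training features returns the nearest class. A practical system would replace the brute-force search with an approximate NN index, which is the cost the abstract's acceleration methods target.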
The resulting Mahalanobis I2C distance adapts to each class through that class's learned metric. The Per-Class metrics are learned simultaneously by formulating a convex optimization problem, which is solved with an efficient subgradient descent method. For efficiency and scalability to large-scale problems, we also show how to simplify the method so that it learns a diagonal matrix for each class.

Third, we extend object recognition to the multi-label problem and propose a Class-to-Image (C2I) distance, which outperforms the I2C distance for multi-label image classification. However, because a class contains far more local features than a single image does, the C2I distance is more expensive to compute than the I2C distance. Moreover, the label information of the training images can help select the relevant local features for each class and further improve recognition performance. To make the C2I distance both faster and more accurate, we therefore propose an optimization algorithm that uses L_1-norm regularization and a large-margin constraint to learn the C2I distance. It reduces the number of local features in each class feature set and, by exploiting label information, also improves the performance of the C2I distance. We further use the C2I distance for object localization, so that it indicates not only whether a candidate class appears in a test image but also where it is located.

Together, these three contributions improve the recognition performance and efficiency of the I2C distance and make it applicable to the multi-label problem. The learned distance between image and class should therefore be more practical for real-world object recognition applications.
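
The per-class diagonal metric and the C2I distance described above could be sketched as below. This is an illustration under stated assumptions, not the thesis's formulation: the diagonal Mahalanobis distance is taken to be a non-negatively weighted squared Euclidean distance, the NN search is assumed to use the same metric, and the C2I distance is assumed to mirror the I2C distance by summing, over a class's local features, the distance to each feature's NN in the image. All function names are hypothetical, and the metric entries are hand-picked rather than learned.

```python
from typing import List, Optional, Sequence

Feature = Sequence[float]


def diag_mahalanobis_sq(x: Feature, y: Feature, diag: Sequence[float]) -> float:
    """Squared distance under a diagonal metric: sum_k diag[k] * (x_k - y_k)^2."""
    return sum(m * (a - b) ** 2 for m, a, b in zip(diag, x, y))


def nn_sum_distance(
    source: List[Feature],
    target: List[Feature],
    diag: Optional[Sequence[float]] = None,
) -> float:
    """Sum, over features in `source`, of the distance to the NN in `target`.

    diag=None falls back to the identity metric (plain squared Euclidean).
    """
    if not source:
        return 0.0
    if not target:
        raise ValueError("target must contain at least one feature")
    metric = [1.0] * len(source[0]) if diag is None else diag
    return sum(
        min(diag_mahalanobis_sq(s, t, metric) for t in target) for s in source
    )


def i2c_mahalanobis(
    image_feats: List[Feature],
    class_feats: List[Feature],
    class_diag: Sequence[float],
) -> float:
    """I2C distance using the (assumed diagonal) metric of one class."""
    return nn_sum_distance(image_feats, class_feats, class_diag)


def c2i_distance(class_feats: List[Feature], image_feats: List[Feature]) -> float:
    """Assumed C2I distance: every class feature searches its NN in the image."""
    return nn_sum_distance(class_feats, image_feats)
```

Because `c2i_distance` iterates over every feature in the class, its cost grows with the size of the class feature set, which is why pruning that set (for example through a sparsity-inducing L_1 penalty on per-feature weights) makes the distance cheaper to evaluate.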
spelling sg-ntu-dr.10356-54819 2023-03-04T00:37:15Z
School of Computer Engineering
Centre for Multimedia and Network Technology
DOCTOR OF PHILOSOPHY (SCE)
2013-08-30T01:59:50Z
Thesis
Citation: Wang, Z. (2013). Distance learning between image and class for object recognition. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/54819
DOI: 10.32657/10356/54819
en
133 p.
application/pdf