Distance metric learning for visual recognition

The design of an effective distance function plays an important role in many computer vision and pattern recognition tasks. Over the past decade, a variety of distance metric learning algorithms have been proposed in the literature, and many of them have achieved reasonable success in various visual...


Bibliographic Details
Main Author: Hu, Junlin
Other Authors: Tan Yap Peng
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: http://hdl.handle.net/10356/73159
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-73159
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
description The design of an effective distance function plays an important role in many computer vision and pattern recognition tasks. Over the past decade, a variety of distance metric learning algorithms have been proposed in the literature, and many of them have achieved reasonable success in various visual recognition applications, such as face recognition, image classification, person re-identification and visual search. To achieve better efficiency, however, a number of issues remain to be addressed in distance metric learning. In this thesis, we propose several deep metric learning methods and multi-view metric learning methods, and apply them to various visual recognition applications to demonstrate their effectiveness.

Firstly, most metric learning methods seek a single linear transformation to map data points into a new feature space, and hence are not effective in exploiting the nonlinear relationships among data points. Even when the kernel trick is employed to map data points into a high-dimensional feature space in which a discriminative distance metric is learned, kernel-based metric learning methods still suffer from scalability problems because they cannot obtain explicit nonlinear mapping functions. To explicitly address both the nonlinearity and scalability problems, we propose a discriminative deep metric learning (DDML) method based on a deep neural network architecture. DDML trains a neural network that learns a set of hierarchical nonlinear transformations to project data pairs into the same feature subspace, in which the distance of each positive pair is smaller than a small threshold and that of each negative pair is larger than a large threshold, so that more discriminative information can be exploited by the network. We have conducted extensive experiments on face verification and kinship verification to demonstrate the efficacy of the proposed method.

Secondly, most existing metric learning methods assume that the training and test samples are drawn from the same distribution, an assumption that does not necessarily hold in many real-world visual recognition applications. To address this problem, we propose a deep transfer metric learning (DTML) method that learns a set of hierarchical nonlinear transformations for cross-domain visual recognition by transferring discriminative knowledge from the labeled source domain to the unlabeled target domain. Specifically, DTML learns a deep metric network by maximizing the inter-class variations and minimizing the intra-class variations, while also minimizing the distribution divergence between the source domain and the target domain at the top layer of the network. To better exploit the discriminative information from the source domain, we further develop a deeply supervised transfer metric learning (DSTML) method by adding an additional objective to DTML, in which the outputs of both the hidden layers and the top layer are optimized jointly. To preserve the local manifold structure of input data points in the metric space, we present two further methods, DTML with autoencoder regularization (DTML-AE) and DSTML with autoencoder regularization (DSTML-AE). Experimental results on face verification, person re-identification, and handwritten digit recognition validate the effectiveness of the proposed methods.
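To make the pairwise constraint used by these deep metric learning methods concrete, below is a minimal NumPy sketch of a hinge-style pair loss that pulls positive pairs below a small distance threshold and pushes negative pairs above a large one. The function name, the threshold values, and the squared-Euclidean distance are illustrative assumptions, not the exact formulation used in the thesis.

import numpy as np

def pairwise_margin_loss(f1, f2, y, tau_small=1.0, tau_large=2.0):
    """Hinge-style loss on one pair of learned embeddings (illustrative only).

    f1, f2    : (d,) arrays, the network outputs for the two samples.
    y         : +1 for a positive (same-class) pair, -1 for a negative pair.
    tau_small : assumed threshold that positive-pair distances should stay below.
    tau_large : assumed threshold that negative-pair distances should exceed.
    """
    dist = float(np.sum((f1 - f2) ** 2))   # squared Euclidean distance in the learned space
    if y == 1:
        return max(0.0, dist - tau_small)  # penalize positive pairs that drift too far apart
    return max(0.0, tau_large - dist)      # penalize negative pairs that come too close

# Toy usage with 3-dimensional embeddings.
anchor = np.array([0.1, 0.2, 0.0])
print(pairwise_margin_loss(anchor, np.array([0.2, 0.1, 0.0]), y=1))   # 0.0: positive pair already within tau_small
print(pairwise_margin_loss(anchor, np.array([0.5, 0.6, 0.4]), y=-1))  # > 0: negative pair closer than tau_large

In DDML-style training, such a per-pair loss would be summed over all training pairs and backpropagated through the network that produces the embeddings; DTML would add a term penalizing the divergence between source- and target-domain distributions at the top layer.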
Thirdly, it is desirable to learn distance metrics from multiple feature representations so that more discriminative information can be exploited. To explore multiple feature representations in metric learning, we first propose a large-margin multi-metric learning (LM3L) method to collaboratively learn multiple distance metrics from multiple feature representations of the data, where one distance metric is learned for each feature representation and the correlations between the different feature representations of each sample are maximized. Furthermore, in the learned metric spaces, the distance of each positive pair is smaller than a small threshold and that of each negative pair is larger than a large threshold. In addition, we propose two local distance metric learning approaches, namely local metric learning (LML) and local large-margin multi-metric learning (L2M3L), to better exploit the local manifold structures of the data. Experimental results on face verification and kinship verification tasks show the effectiveness of the proposed methods.
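To illustrate the multi-metric idea, the sketch below combines per-view Mahalanobis distances, one learned metric per feature representation, into a single pairwise distance. The function name and the fixed view weights are hypothetical, and the thesis's correlation-maximization and large-margin terms are deliberately omitted for brevity.

import numpy as np

def multi_view_distance(views_a, views_b, metrics, weights):
    """Combined pairwise distance over multiple feature representations (views).

    views_a, views_b : lists of feature vectors (one per view) for samples a and b.
    metrics          : list of positive semi-definite matrices, one learned metric per view.
    weights          : nonnegative view weights (fixed here purely for illustration).
    """
    total = 0.0
    for xa, xb, M, w in zip(views_a, views_b, metrics, weights):
        diff = xa - xb
        total += w * float(diff @ M @ diff)  # per-view squared Mahalanobis distance
    return total

# Toy usage: two views with dimensions 2 and 3, identity metrics, equal weights.
a = [np.array([0.0, 1.0]), np.array([1.0, 0.0, 0.5])]
b = [np.array([0.5, 0.5]), np.array([0.8, 0.1, 0.4])]
identity_metrics = [np.eye(2), np.eye(3)]
print(multi_view_distance(a, b, identity_metrics, weights=[0.5, 0.5]))

In an LM3L-style objective, the per-view metrics would be learned jointly so that this combined distance satisfies the same small/large-threshold constraints on positive and negative pairs described above.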
author2 Tan Yap Peng
format Theses and Dissertations
author Hu, Junlin
title Distance metric learning for visual recognition
publishDate 2018
url http://hdl.handle.net/10356/73159
_version_ 1772826793381724160
spelling sg-ntu-dr.10356-73159 2023-07-04T17:24:39Z Distance metric learning for visual recognition Hu, Junlin Tan Yap Peng School of Electrical and Electronic Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Doctor of Philosophy (EEE) 2018-01-08T06:12:49Z 2018-01-08T06:12:49Z 2018 Thesis http://hdl.handle.net/10356/73159 10.32657/10356/73159 en 170 p. application/pdf