Kernel learning for visual perception

Saved in:
Bibliographic Details
Main Author: Wang, Chen
Other Authors: Xie Lihua
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects:
Online Access:https://hdl.handle.net/10356/105527
http://hdl.handle.net/10220/47835
Institution: Nanyang Technological University
id sg-ntu-dr.10356-105527
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
spellingShingle DRNTU::Engineering::Electrical and electronic engineering
Wang, Chen
Kernel learning for visual perception
description The visual perceptual system in animals allows them to assimilate information from their surroundings. In artificial intelligence, the objective of visual perception is to enable a computer system to interpret its surrounding environment using data acquired from cameras and other auxiliary sensors. Since the last century, researchers in visual perception have delivered many remarkable technologies and algorithms for applications such as object detection and image recognition. Despite this technological progress, the performance of artificial visual perceptual systems remains unsatisfactory. One of the main reasons is that traditional methods usually rely on large amounts of training data and powerful processors, and require great effort and time for process modeling. The research goal of this thesis is to develop visual perceptual systems that require fewer computational resources yet achieve higher performance. To this end, novel kernel learning methods for several basic visual perceptual tasks, including object tracking, localization, mapping, and image recognition, are proposed and demonstrated both theoretically and practically. In visual object tracking, the state-of-the-art algorithms that leverage kernelized correlation filters are limited to circulant training data and non-weighted kernel functions. This makes them applicable only to translation prediction and prevents their use in other applications. To overcome these problems, a kernel cross-correlator (KCC) is introduced. First, by introducing the kernel trick, the KCC extends linear cross-correlation to non-linear spaces, making it more robust to signal noise and distortion. Second, connections to existing work show that the KCC provides a unified formulation for correlation filters.
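The FFT-based correlator sketched below illustrates the general idea of a kernelized cross-correlator trained and applied entirely in the Fourier domain. It is a minimal numpy sketch in the spirit of the KCC, not the thesis implementation; the Gaussian kernel, the bandwidth `sigma`, and the regularizer `lam` are illustrative choices.

```python
import numpy as np

def gaussian_correlation(x, z, sigma=0.5):
    # Kernel values between x and all cyclic shifts of z, computed in
    # O(n log n) via the FFT instead of forming each shift explicitly.
    cross = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(z))).real
    d = np.dot(x, x) + np.dot(z, z) - 2.0 * cross   # squared distances
    return np.exp(-np.maximum(d, 0.0) / (sigma**2 * x.size))

def train(z, g, lam=1e-4, sigma=0.5):
    # Solve for the correlator in the Fourier domain:
    # f_hat = g_hat / (k_hat + lam), with g the desired response.
    k = gaussian_correlation(z, z, sigma)
    return np.fft.fft(g) / (np.fft.fft(k) + lam)

def detect(f_hat, z, x, sigma=0.5):
    # Response map for a new signal x; the peak gives the predicted shift.
    k = gaussian_correlation(x, z, sigma)
    return np.fft.ifft(f_hat * np.fft.fft(k)).real
```

Training a correlator on a signal with a response target peaked at index 0 and then presenting a cyclically shifted copy moves the response peak by exactly that shift, which is the mechanism trackers exploit for translation prediction.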
Third, the KCC is applicable to any training data and kernel function, and can also predict affine transforms with customized properties. Last, by leveraging the fast Fourier transform (FFT), the KCC avoids direct computation of kernel vectors, achieving better performance at a reasonable computational cost. Comprehensive experiments on visual tracking and on human activity recognition with wearable devices demonstrate its robustness, flexibility, and efficiency. Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. It is computed from sequences of ordered images and allows motion to be estimated as instantaneous image velocities, which is crucial for autonomous robot navigation. This thesis proposes a KCC-based algorithm, named correlation flow (CF), to determine optical flow using a monocular camera. CF provides reliable and accurate velocity estimation and is robust to motion blur. In addition, a joint kernel scale-rotation correlator is proposed to estimate the altitude velocity and yaw rate, which are not available from traditional methods. Autonomous flight tests on a quadcopter show that correlation flow provides robust trajectory estimation with very low processing power. In the problem of simultaneous localization and mapping (SLAM), traditional odometry methods resort to iterative algorithms, which are usually computationally expensive or require well-designed initialization. To overcome this problem, a KCC-based non-iterative solution for RGB-D-inertial odometry is proposed. To reduce odometry and inertial drifts, two frameworks for non-iterative SLAM (NI-SLAM) are presented: one incorporates visual loop closure detection, and the other seeks aid from ultra-wideband (UWB) technology.
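The translation component of correlation-based flow can be illustrated with plain phase correlation, the linear special case that a kernelized correlator generalizes: the dominant image shift between two frames appears as a peak in the inverse FFT of the normalized cross-power spectrum. This is an illustrative sketch, not the thesis code.

```python
import numpy as np

def phase_correlation(prev, curr):
    # Recover the dominant translation of `curr` relative to `prev`
    # from the peak of the normalized cross-power spectrum.
    spec = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    spec /= np.abs(spec) + 1e-12          # keep phase only
    corr = np.fft.ifft2(spec).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                        # unwrap to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Running the sketch on a frame and a cyclically shifted copy returns that shift; per-pixel image velocities, scale, and rotation as in CF require the kernelized and windowed machinery described above.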
Dominated by the FFT, the non-iterative front-end is only of $\mathcal{O}(n\log n)$ complexity, where $n$ is the number of pixels. Therefore, both frameworks provide reliable performance at very low computational cost. Map fusion is conducted by element-wise operations, so that both time and space complexity are further reduced. Extensive experiments show that, owing to the light weight of the proposed non-iterative front-end, both NI-SLAM frameworks run at a much faster speed while achieving accuracy comparable to the state of the art. The convolutional neural network (CNN) is one of the most powerful tools in visual perception, enabling state-of-the-art performance in image recognition, object detection, and other tasks. However, little effort has been devoted to establishing convolution in non-linear spaces. In this thesis, a new operation, kervolution (kernel convolution), is introduced to approximate the non-linear behavior of the human perceptual system. It generalizes traditional convolution and increases model capacity without introducing more parameters. Like convolution, kervolution can be computed through element-wise multiplication via the Fourier transform. Extensive experiments show that kervolutional neural networks (KNNs) achieve better performance and faster convergence than traditional CNNs on the MNIST, CIFAR, and ImageNet datasets. In summary, this thesis demonstrates the effectiveness of the proposed kernel tools, including KCC, CF, NI-SLAM, and KNN, for visual perceptual problems. These kernel tools may find use in further applications, such as the internet of things, robotics, transfer learning, and reinforcement learning.
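The core idea of kervolution, replacing each sliding inner product of convolution with a kernel evaluation, can be sketched in a few lines of numpy. This is a naive 1-D illustration of the concept, not the thesis implementation; the polynomial and Gaussian kernels and their parameters `c`, `d`, and `gamma` are illustrative.

```python
import numpy as np

def conv1d_valid(x, w):
    # Ordinary 'valid' convolution as used in CNNs: sliding inner products.
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def kervolve1d(x, w, kernel):
    # Kervolution: each sliding inner product becomes a kernel evaluation.
    n = len(x) - len(w) + 1
    return np.array([kernel(x[i:i + len(w)], w) for i in range(n)])

# Example kernels; with the linear kernel, kervolution reduces to convolution.
linear = lambda a, b: np.dot(a, b)
poly = lambda a, b, c=1.0, d=3: (np.dot(a, b) + c) ** d
gauss = lambda a, b, gamma=1.0: np.exp(-gamma * np.sum((a - b) ** 2))
```

The linear-kernel case reproducing ordinary convolution is what makes kervolution a strict generalization, and a non-linear kernel adds expressive power without adding any learnable parameters beyond the filter `w` itself.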
author2 Xie Lihua
author_facet Xie Lihua
Wang, Chen
format Theses and Dissertations
author Wang, Chen
author_sort Wang, Chen
title Kernel learning for visual perception
title_short Kernel learning for visual perception
title_full Kernel learning for visual perception
title_fullStr Kernel learning for visual perception
title_full_unstemmed Kernel learning for visual perception
title_sort kernel learning for visual perception
publishDate 2019
url https://hdl.handle.net/10356/105527
http://hdl.handle.net/10220/47835
_version_ 1772825488497049600
spelling sg-ntu-dr.10356-105527 2023-07-04T16:41:50Z Kernel learning for visual perception Wang, Chen Xie Lihua School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Doctor of Philosophy 2019-03-16T15:10:31Z 2019-12-06T21:53:01Z 2019-03-16T15:10:31Z 2019-12-06T21:53:01Z 2019 Thesis Wang, Chen. (2019). Kernel learning for visual perception. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/105527 http://hdl.handle.net/10220/47835 10.32657/10220/47835 en 197 p. application/pdf