Salient keypoint detectors and compact feature descriptors for 3D perception
Main Author: | Prakhya, Sai Manoj |
---|---|
Other Authors: | Lin Weisi |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2017 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/72484 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-72484 |
---|---|
record_format | dspace |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering |
description |
3D depth data acquisition has become extremely easy and affordable with the availability
of hand-held depth sensors such as Microsoft Kinect, Intel RealSense Camera and Google
Tango. Moreover, with the surge in smartphones equipped with depth sensors, such as the Lenovo
Phab2Pro and Asus Zenfone AR, it is essential to develop 3D perception applications that
are accurate and run with low memory, computational and bandwidth requirements.
The first two steps of various 3D perception applications, such as Simultaneous Localization and Mapping (SLAM), 3D object recognition and retrieval, and 3D reconstruction, are:
1. 3D Keypoint Detection - Detect meaningful 3D points of interest that can efficiently
represent the input 3D point cloud.
2. 3D Feature Description - Represent the 3D neighbourhood of the detected keypoints with
a multi-dimensional vector to determine keypoint correspondences.
The first part of the thesis focuses on 3D keypoint detection: first, a highly
repeatable salient 3D keypoint detection algorithm is proposed. Next, we consider a specific
3D perception application, SLAM with an RGB-D camera, and propose a new 3D keypoint
detection module that works best for it. The second part of the thesis focuses on 3D feature
description: we first propose a fast, real-valued, low-dimensional 3D descriptor, then
the first binary 3D descriptor in the literature and, lastly, a set of even lower-bitrate 3D descriptors
that are extremely fast to compute and match, yet still offer better performance.
Existing 3D keypoint detectors sometimes detect keypoints on non-salient or planar
regions, or detect noise and glitches as keypoints. Contrary to the existing norm of detecting
distinct keypoints, we propose to detect salient and highly repeatable keypoint sets (groups of
keypoints). Towards this, we propose the Histogram of Normal Orientations (HoNO) to detect
salient regions, and effectively remove planar regions by thresholding the kurtosis of the HoNO
calculated at every point in the point cloud. The final keypoint sets are then detected by
evaluating the properties of the HoNO and the neighbourhood covariance matrix.
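The planar-region test described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the bin layout (18 bins over 0 to 180 degrees), computing a Pearson kurtosis by treating the bin counts as a sample, and the threshold of 8.0 are all hypothetical choices, and unit normals and neighbourhoods are assumed to be computed beforehand. On a planar patch every neighbour normal is parallel to the centre normal, so all counts fall into the first bin and the histogram is sharply peaked (high kurtosis); on a curved patch the counts spread out and the kurtosis drops.

```python
import numpy as np

def hono(center_normal, neighbor_normals, n_bins=18):
    """Histogram of the angles (degrees) between a point's unit normal and the
    unit normals of its neighbours. Bin layout is a hypothetical choice."""
    cos = np.clip(neighbor_normals @ center_normal, -1.0, 1.0)
    angles = np.degrees(np.arccos(cos))
    hist, _ = np.histogram(angles, bins=n_bins, range=(0.0, 180.0))
    return hist.astype(float)

def histogram_kurtosis(hist):
    """Pearson kurtosis of the bin counts, treated as a sample (assumption)."""
    dev = hist - hist.mean()
    m2 = np.mean(dev ** 2)
    if m2 == 0.0:
        # All bins equal: no peak at all, report the lowest possible peakedness.
        return 0.0
    return float(np.mean(dev ** 4) / m2 ** 2)

def is_salient(center_normal, neighbor_normals, threshold=8.0):
    """Reject a point as planar when its HoNO is sharply peaked.
    The threshold is a hypothetical placeholder, not a value from the thesis."""
    return histogram_kurtosis(hono(center_normal, neighbor_normals)) < threshold

up = np.array([0.0, 0.0, 1.0])
plane = np.tile(up, (50, 1))                 # every neighbour normal parallel to the centre normal
t = np.radians(np.arange(0, 100, 10))        # normals tilting from 0 to 90 degrees
curved = np.stack([np.zeros_like(t), np.sin(t), np.cos(t)], axis=1)
print(is_salient(up, plane), is_salient(up, curved))  # False True
```

The function names `hono`, `histogram_kurtosis` and `is_salient` are illustrative only; the thesis also combines HoNO properties with the neighbourhood covariance matrix to select the final keypoint sets, which this sketch does not attempt.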
Next, we consider a 3D perception problem, SLAM with an RGB-D camera relying solely
on depth data. As a solution, we propose Sparse Depth Odometry (SDO), whose
main contribution is a new 3D keypoint detection module. The new keypoint
detection module comprises two existing keypoint detectors, SURE and NARF, and
is designed based on extensive theoretical and experimental analysis. The proposed keypoint
detection module finds reliable keypoints that work well with nearest-neighbour association
and represent the scene comprehensively while running in real time, which are the key requirements of SDO. Powered by the proposed keypoint detection module, SDO estimates the
ego-motion of an RGB-D camera solely from its depth data and runs online without a GPU.
As for 3D feature description, existing real-valued 3D descriptors are either high-dimensional or demand immense computational time for their extraction and matching. Hence, we
propose 3DHoPD, a new low-dimensional 3D feature descriptor that is extremely fast to compute.
The novelty lies in compactly encoding the 3D keypoint position by transforming it to a
new 3D space, where keypoints arising from similar 3D surface patches lie close to each
other. We then propose the Histograms of Point Distributions (HoPD) descriptor to capture the
neighbourhood structure, thus forming 3DHoPD (3D+HoPD). We also propose a tailored feature
descriptor matching technique, wherein the search space for each keypoint match is reduced
by 90%, and the exact match is then found using the proposed HoPD descriptor.
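The two-stage matching idea (prune by the encoded 3D position, then decide with the HoPD descriptor) can be illustrated with the sketch below. It assumes the transformed keypoint positions and the HoPD vectors are already computed (the position transform itself is not shown), and it uses a hypothetical Euclidean pruning radius and an L2 descriptor distance; the thesis's actual thresholds and distance measures may differ.

```python
import numpy as np

def two_stage_match(pos_a, desc_a, pos_b, desc_b, radius):
    """Hypothetical two-stage matcher in the spirit of 3DHoPD.

    pos_*  : (N, 3) keypoint positions already mapped to the new 3D space
    desc_* : (N, D) HoPD-like descriptors
    radius : hypothetical pruning radius in that 3D space
    Returns, for each keypoint in A, the index of its match in B, or -1.
    """
    matches = []
    for i in range(len(pos_a)):
        # Stage 1: a cheap 3D distance discards most candidates in B.
        d_pos = np.linalg.norm(pos_b - pos_a[i], axis=1)
        cand = np.flatnonzero(d_pos < radius)
        if cand.size == 0:
            matches.append(-1)  # nothing survives pruning
            continue
        # Stage 2: full descriptor distance only on the surviving candidates.
        d_desc = np.linalg.norm(desc_b[cand] - desc_a[i], axis=1)
        matches.append(int(cand[np.argmin(d_desc)]))
    return matches

pos_a = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
desc_a = np.array([[1.0, 0.0], [0.0, 1.0]])
pos_b = np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [9.0, 9.0, 9.0]])
desc_b = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(two_stage_match(pos_a, desc_a, pos_b, desc_b, radius=0.5))  # [1, -1]
```

The design point is that the 3D position check costs only a few operations per candidate, so the longer descriptor comparison runs on a small subset; the fraction pruned depends entirely on the chosen radius and data.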
There are several real-valued 3D descriptors, but there is no binary 3D descriptor for 3D
keypoint matching. Binary descriptors are known for their low memory footprint and fast
matching via Hamming distance. Hence, we introduce the first binary 3D descriptor, B-SHOT,
by proposing an adaptive binarization technique that converts a real-valued vector to a binary
vector. We apply this method to a state-of-the-art 3D feature descriptor, SHOT, to create
a new binary 3D descriptor. Compared to SHOT, B-SHOT requires 32 times less memory for its representation
and is 6 times faster in feature descriptor matching.
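The sketch below illustrates adaptive binarization followed by Hamming-distance matching. The rule used here (process the vector in groups of four and set the bits of the fewest largest entries whose sum exceeds 90% of the group total, leaving an all-zero group as 0000) is an assumption for illustration and not necessarily the exact rule used by B-SHOT. The 352-dimensional input mirrors the length of a SHOT descriptor, which shows where the 32-fold memory saving comes from: 352 float32 values occupy 1408 bytes, whereas 352 bits pack into 44 bytes.

```python
import numpy as np

def binarize(desc, ratio=0.9):
    """Hypothetical adaptive binarization: in each group of 4 values, set the bits
    of the fewest largest entries whose sum exceeds `ratio` of the group total."""
    desc = np.asarray(desc, dtype=float)
    if desc.size % 4 != 0:
        raise ValueError("descriptor length must be a multiple of 4")
    bits = np.zeros(desc.size, dtype=np.uint8)
    for g in range(0, desc.size, 4):
        group = desc[g:g + 4]
        total = group.sum()
        if total <= 0.0:
            continue  # all-zero group stays 0000
        acc = 0.0
        for idx in np.argsort(group)[::-1]:  # largest value first
            bits[g + idx] = 1
            acc += group[idx]
            if acc > ratio * total:
                break
    return np.packbits(bits)  # 8 bits per byte

def hamming(a, b):
    """Number of differing bits between two packed binary descriptors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
shot = rng.random(352).astype(np.float32)  # stand-in for a 352-bin SHOT vector
b = binarize(shot)
print(shot.nbytes, b.nbytes, shot.nbytes // b.nbytes)  # 1408 44 32
print(hamming(b, binarize(rng.random(352))))
```

NumPy is used here for readability; the matching speed of binary descriptors in practice comes from XOR and hardware popcount on packed machine words.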
Finally, applications that require online transfer of 3D descriptors over a network
need compressed 3D descriptors with an even lower memory footprint (and hence lower
bandwidth) that still retain high descriptiveness. Therefore, we propose to employ lattice quantization to efficiently compress 3D feature descriptors. These compressed low-bitrate 3D descriptors can be matched directly in the compressed domain without any need for decompression,
hence drastically reducing the memory footprint and computational requirements for matching.
We also propose double-stage lattice quantization to achieve even more compression in the
case of the SHOT descriptor. We provide a spectrum of possible bitrates and achievable keypoint
matching performance for three state-of-the-art 3D feature descriptors, to help users
choose the appropriate one based on their memory, bandwidth and performance requirements. |
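As one concrete instance of lattice quantization of a histogram-based descriptor, the sketch below quantizes a normalised histogram onto the type lattice {k/n : k_i ≥ 0 integer, Σk_i = n}; whether the thesis uses this particular lattice and parameter n is an assumption. Descriptors are compared directly on the integer vectors k, without reconstructing real values, and counting the lattice points gives the bits needed for a fixed-length index.

```python
import math
import numpy as np

def lattice_quantize(hist, n):
    """Quantize a non-negative histogram to the nearest point of the type
    lattice {k / n : k_i >= 0 integer, sum(k) = n}; returns k."""
    p = np.asarray(hist, dtype=float)
    p = p / p.sum()                        # normalise to a probability vector
    target = n * p
    k = np.floor(target + 0.5).astype(int)
    delta = int(k.sum()) - n               # how far the rounded counts miss n
    err = k - target                       # per-bin rounding error
    if delta > 0:
        # Too many counts: take one back from the bins rounded up the most.
        for i in np.argsort(-err)[:delta]:
            k[i] -= 1
    elif delta < 0:
        # Too few counts: add one to the bins rounded down the most.
        for i in np.argsort(err)[:-delta]:
            k[i] += 1
    return k

def compressed_l1(k_a, k_b, n):
    """L1 distance between two quantized histograms, computed on the integer
    lattice coordinates without converting back to real values."""
    return float(np.abs(np.asarray(k_a) - np.asarray(k_b)).sum() / n)

def lattice_bits(m, n):
    """Bits needed to index any point of the type lattice with m bins."""
    count = math.comb(n + m - 1, m - 1)
    return (count - 1).bit_length()

a = lattice_quantize([0.5, 0.3, 0.2], n=10)   # -> [5 3 2]
b = lattice_quantize([0.3, 0.3, 0.4], n=10)   # -> [3 3 4]
print(compressed_l1(a, b, 10), lattice_bits(3, 10))  # 0.4 7
```

Increasing n gives a finer lattice and a smaller quantization error at the cost of more bits per descriptor, which is the bitrate versus matching-performance trade-off described above.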
author2 | Lin Weisi |
author_facet | Lin Weisi; Prakhya, Sai Manoj |
format | Theses and Dissertations |
author | Prakhya, Sai Manoj |
author_sort | Prakhya, Sai Manoj |
title | Salient keypoint detectors and compact feature descriptors for 3D perception |
publishDate | 2017 |
url | http://hdl.handle.net/10356/72484 |
_version_ | 1759857848952553472 |
spelling |
sg-ntu-dr.10356-72484 2023-03-04T00:47:08Z. Lin Weisi. School of Computer Science and Engineering; A*STAR Institute for Infocomm Research (I2R). Doctor of Philosophy (SCE). 2017-08-07T02:54:12Z. Thesis.
Prakhya, S. M. (2017). Salient keypoint detectors and compact feature descriptors for 3D perception. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/72484
DOI: 10.32657/10356/72484, en, 158 p., application/pdf |