Geometry estimation by deep neural network

Geometry estimation predicts geometric information in a vision coordinate system. With the rise of deep learning, data-driven, learning-based geometry estimation has received much attention in recent decades. Following the development of geometry estimation, traditional methods compute...

Bibliographic Details
Main Author: Mei, Jianhan
Other Authors: Jiang Xudong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
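Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision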
Online Access: https://hdl.handle.net/10356/155046
Institution: Nanyang Technological University
id sg-ntu-dr.10356-155046
record_format dspace
spelling sg-ntu-dr.10356-155046
2023-07-04T16:40:18Z
Geometry estimation by deep neural network
Mei, Jianhan
Jiang Xudong
School of Electrical and Electronic Engineering
Rapid-Rich Object Search (ROSE) Lab
EXDJiang@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Geometry estimation predicts geometric information in a vision coordinate system. With the rise of deep learning, data-driven, learning-based geometry estimation has received much attention in recent decades. Traditional methods compute geometry from object correspondences, whereas deep learning-based algorithms train a "black-box" model to predict the parameters directly. In this thesis, we aim to construct deep neural networks for 6-degree-of-freedom (6DoF, or 6D) pose estimation. We also explore learning local image features, which is one of the fundamental stages of geometry estimation. Finally, we build a deep learning-based 6D pose estimation system that incorporates traditional keypoint estimation modules.
For learning local image region representations with deep neural networks, existing works mainly learn from matched corresponding image patches. The learned feature is therefore too sensitive to individual patch matching results and cannot handle aggregation-based tasks such as image-level retrieval. We thus propose to use both the matched corresponding image patches and the clustering results as labels for network training. To resolve the inconsistency between the matched correspondences and the clustering results, we propose a semi-supervised iterative training scheme together with a dual-margins loss. Moreover, a jointly learned spatial transform prediction network is used to improve the spatial transform invariance of the learned local features. Using SIFT as the label initializer, experimental results show performance comparable to or even better than the hand-crafted feature, which sheds light on learning local feature representations in an unsupervised or weakly supervised manner.
For 6D object pose estimation, we focus on two challenges: rotation ambiguity and object occlusion. To cope with strong occlusion and background noise, we propose to exploit the spatial structure of the object. Observing that a 3D mesh can be naturally abstracted as a graph, we build the graph using 3D points as vertices and mesh connections as edges. We construct the mapping from 2D image features to 3D points to fill the graph and fuse the 2D and 3D features. A Graph Convolutional Network (GCN) is then applied to exchange features among the objects' points in 3D space. To address rotation symmetry ambiguity, a spherical convolution is used, and the spherical feature is combined with the convolutional feature that is mapped to the graph. Predefined 3D keypoints are located by voting, and the 6DoF pose is obtained via optimization-based fitting. Inference both with and without depth information is discussed. Experiments on the YCB-Video and LINEMOD datasets demonstrate the effectiveness of the proposed method.
Doctor of Philosophy
2022-02-27T23:41:54Z
2022
Thesis-Doctor of Philosophy
Mei, J. (2022). Geometry estimation by deep neural network. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155046
10.32657/10356/155046
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
author2 Jiang Xudong
author_facet Jiang Xudong
Mei, Jianhan
format Thesis-Doctor of Philosophy
author Mei, Jianhan
author_sort Mei, Jianhan
title Geometry estimation by deep neural network
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/155046