Geometry estimation by deep neural network

Bibliographic Details
Main Author: Mei, Jianhan
Other Authors: Jiang Xudong
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/155046
Institution: Nanyang Technological University
Description
Summary: Geometry estimation predicts geometric information in a vision coordinate system. With the rise of deep learning, data-driven, learning-based geometry estimation has received much attention in recent decades. Traditional methods compute geometry from object correspondences, whereas deep learning-based algorithms train a "black box" to predict the parameters directly. In this thesis, we aim to construct deep neural networks for 6-DoF (6D) pose estimation. We also explore learning local image features, which is one of the fundamental stages of geometry estimation. Finally, we build a deep learning-based 6D pose estimation system that incorporates traditional keypoint estimation modules. For learning local image region representations with deep neural networks, existing works mainly learn from matched corresponding image patches; the learned features are therefore overly sensitive to individual patch matching results and cannot handle aggregation-based tasks such as image-level retrieval. We therefore propose to use both the matched corresponding image patches and the clustering results as labels for network training. To resolve the inconsistency between the matched correspondences and the clustering results, we propose a semi-supervised iterative training scheme together with a dual-margins loss. Moreover, a jointly learned spatial transform prediction network is used to improve the spatial transform invariance of the learned local features. Using SIFT as the label initializer, experimental results show performance comparable to or even better than the hand-crafted feature, which sheds light on learning local feature representations in an unsupervised or weakly supervised manner. For 6D object pose estimation, we focus on two challenges: rotation ambiguity and object occlusion.
Considering the strong occlusion and background noise, we propose to exploit spatial structure to better tackle this challenging task. Observing that a 3D mesh can be naturally abstracted as a graph, we build the graph using 3D points as vertices and mesh connections as edges. We construct a mapping from 2D image features to 3D points to fill the graph and fuse the 2D and 3D features. A Graph Convolutional Network (GCN) is then applied to exchange features among the object's points in 3D space. To address rotation symmetry ambiguity, a spherical convolution is used, and the spherical feature is combined with the convolutional feature that is mapped to the graph. Predefined 3D keypoints are voted for, and the 6-DoF pose is obtained via optimization-based fitting. Inference both with and without depth information is discussed. Experiments on the YCB-Video and LINEMOD datasets demonstrate the effectiveness of the proposed method.
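
Illustrative note (not part of the thesis record): the summary does not say which optimization is used to fit the 6-DoF pose to the voted keypoints. As a minimal sketch, assuming depth is available so that the voted keypoints are 3D points in the camera frame, one common choice is a closed-form least-squares rigid fit (the Kabsch algorithm). The function name fit_rigid_pose and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np


def fit_rigid_pose(model_kps, scene_kps):
    """Least-squares rigid fit (Kabsch algorithm).

    Assumption for illustration only: the thesis summary does not specify
    its fitting procedure. Finds R (3x3 rotation) and t (3-vector) that
    minimise sum_i || R @ model_kps[i] + t - scene_kps[i] ||^2.
    """
    model_kps = np.asarray(model_kps, dtype=float)  # (N, 3), object frame
    scene_kps = np.asarray(scene_kps, dtype=float)  # (N, 3), camera frame

    # Centre both keypoint sets on their centroids.
    mu_m = model_kps.mean(axis=0)
    mu_s = scene_kps.mean(axis=0)

    # 3x3 cross-covariance between the centred sets.
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    # rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t


if __name__ == "__main__":
    # Synthetic check: rotate and translate 8 hypothetical keypoints,
    # then recover the pose from the two point sets.
    theta = np.deg2rad(30.0)
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0,            0.0,           1.0]])
    t_true = np.array([0.1, -0.2, 0.5])
    model = np.random.default_rng(0).normal(size=(8, 3))
    scene = model @ R_true.T + t_true
    R, t = fit_rigid_pose(model, scene)
    print(np.allclose(R, R_true), np.allclose(t, t_true))
```

Without depth, the analogous step would typically be a 2D-3D (PnP-style) fit instead; which variant the thesis uses in each setting is not stated in the summary.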