Deep learning based monocular visual-inertial odometry

Autonomous vehicles need to know their state in the environment in order to make decisions and achieve their goals. The Visual-Inertial Odometry (VIO) system is one of the most critical modules of an autonomous vehicle: it enables a robot to estimate its position and orientation in the environment relative to its starting point. Despite decades of research with different sensors, current odometry systems still suffer from various problems that can lead to odometry failure. Researchers have studied many sensor types and combinations over the past decades and proposed novel methods. Among these combinations, pairing a monocular camera with an Inertial Measurement Unit (IMU) has drawn particular attention in recent years because the sensor suite is lightweight, low-cost, and compact, making it suitable for many applications, from smartphones to micro aerial vehicles. Traditional methods extract features from the visual information, track them across consecutive frames, and tightly optimize the relation between the tracked feature points and the accompanying IMU measurements. However, they perform poorly in textureless regions, fail when feature tracking breaks down, and struggle in challenging scenarios such as an object moving in front of the camera. Deep learning based VIO systems have been investigated to tackle these challenges; while they achieve satisfactory results, they suffer from incorrect motion estimation and scale extraction in many scenarios because of their limited generalization capability.

This research investigates the pros and cons of existing deep learning based monocular VIO systems. It then proposes a new approach to training Convolutional Neural Networks (CNNs) for the regression problem of 6-DOF camera re-localization, which makes the CNNs more robust, and extends this approach to solve Visual Odometry (VO) in an end-to-end manner. The effects of moving objects and motion blur are also studied, and approaches are proposed to make the visual odometry and camera re-localization modules robust to them. Furthermore, since the primary step of VO is to extract robust features that can be tracked across consecutive frames, a bio-inspired approach is proposed that learns to extract distinctive handcrafted features with high generalization ability from a single training image. The proposed method runs in real time and surpasses both traditional and deep learning based methods in robust handcrafted feature extraction.

IMUs suffer from bias and measurement noise, which makes Inertial Odometry (IO) considerably more complicated. Because errors propagate over time during position estimation, even a small inaccuracy can render the odometry and localization system unreliable and unusable within a fraction of a second. This research presents a novel triple-channel deep IO network based on the physical and mathematical model of the IMU. The proposed network outputs the magnitude of the orientation and position changes over time, simulates the IMU noise model during training so that it becomes robust to noise at test time, and takes the time interval between two consecutive IMU readings as an additional input, which makes it robust to changes in IMU frame rate and to missing IMU readings. The proposed network outperforms all existing solutions on the IMU readings of a challenging Micro Aerial Vehicle (MAV) dataset, improving accuracy by about 25 percent. Finally, the approach is extended to accurately estimate the full 3D orientation of the MAV.
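
As an illustration of the inertial-odometry idea summarized in the description above, the following is a minimal PyTorch-style sketch, not the thesis implementation: a three-branch network over a fixed window of IMU readings that also receives the time interval between consecutive readings and regresses the magnitude of the position and orientation change, trained with simulated bias and noise. All class and function names, layer sizes, and noise parameters here are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DeepInertialOdometry(nn.Module):
        def __init__(self, window=100):
            super().__init__()
            # Three input branches: gyroscope (3 axes), accelerometer (3 axes),
            # and the time interval between consecutive IMU readings (1 channel).
            self.gyro = nn.Sequential(nn.Conv1d(3, 32, 7, padding=3), nn.ReLU())
            self.acc = nn.Sequential(nn.Conv1d(3, 32, 7, padding=3), nn.ReLU())
            self.dt = nn.Sequential(nn.Conv1d(1, 8, 7, padding=3), nn.ReLU())
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear((32 + 32 + 8) * window, 128),
                nn.ReLU(),
                nn.Linear(128, 2),  # [|position change|, |orientation change|]
            )

        def forward(self, gyro, acc, dt):
            # gyro, acc: (batch, 3, window); dt: (batch, 1, window)
            feats = torch.cat([self.gyro(gyro), self.acc(acc), self.dt(dt)], dim=1)
            return self.head(feats)

    def simulate_imu_noise(gyro, acc, gyro_std=0.01, acc_std=0.05):
        # Training-time augmentation: add a random constant bias plus white noise
        # per window so the model learns to be robust to noisy IMU data.
        # (Noise magnitudes here are illustrative, not taken from the thesis.)
        gyro_bias = gyro_std * torch.randn(gyro.size(0), 3, 1)
        acc_bias = acc_std * torch.randn(acc.size(0), 3, 1)
        noisy_gyro = gyro + gyro_bias + gyro_std * torch.randn_like(gyro)
        noisy_acc = acc + acc_bias + acc_std * torch.randn_like(acc)
        return noisy_gyro, noisy_acc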

Bibliographic Details
Main Author: Mahdi Abolfazli Esfahani
Other Authors: Wang Han
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/152280
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
DOI: 10.32657/10356/152280
Citation: Mahdi Abolfazli Esfahani (2020). Deep learning based monocular visual-inertial odometry. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152280
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).