Deep learning based monocular visual-inertial odometry

Autonomous vehicles need to know their state in the environment in order to make decisions and achieve their goals. The Visual-Inertial Odometry (VIO) system is one of the most critical modules of an autonomous vehicle: it enables a robot to estimate its position and orientation in the environment relative to its starting point. Despite decades of research with different sensors, current odometry systems still suffer from various problems that can lead to odometry failure. Researchers have studied many sensor types and combinations over the past decades and proposed novel methods. Among these combinations, pairing a monocular camera with an Inertial Measurement Unit (IMU) has drawn particular attention in recent years because the sensor suite is lightweight, low-cost, and compact, making it suitable for many applications, from smartphones to micro aerial vehicles. Traditional methods extract features from the visual information, track them across consecutive frames, and tightly optimize the relation between the tracked feature points and the accompanying IMU measurements. However, they perform poorly in textureless regions, fail when feature tracking breaks down, and struggle in challenging scenarios such as an object moving in front of the camera. Deep learning based VIO systems have been investigated to tackle these challenges; while they achieve satisfactory results, they suffer from incorrect motion estimation and scale extraction in many scenarios because of their limited generalization capability.

This research investigates the pros and cons of existing deep learning based monocular VIO systems. It then proposes a new approach to training Convolutional Neural Networks (CNNs) for the regression problem of 6-DOF camera re-localization, which makes the CNNs more robust, and extends this approach to solve Visual Odometry (VO) in an end-to-end manner. The effects of moving objects and motion blur are also studied, and approaches are proposed to make the visual odometry and camera re-localization modules robust to them. Furthermore, since the primary step of VO is to extract robust features that can be tracked across consecutive frames, a bio-inspired approach is proposed that learns to extract distinctive handcrafted features with high generalization ability from a single training image. The proposed method runs in real time and surpasses both traditional and deep learning based methods in robust handcrafted feature extraction.

IMUs suffer from bias and measurement noise, which makes Inertial Odometry (IO) considerably more complicated. Because errors propagate over time during position estimation, even a small inaccuracy can render the odometry and localization system unreliable and unusable within a fraction of a second. This research presents a novel triple-channel deep IO network based on the physical and mathematical model of the IMU. The proposed network outputs the magnitude of the orientation and position changes over time, simulates the IMU noise model during training so that it becomes robust to noise at test time, and takes the time interval between two consecutive IMU readings as an additional input, which makes it robust to changes in IMU frame rate and to missing IMU readings. The proposed network outperforms all existing solutions on the IMU readings of a challenging Micro Aerial Vehicle (MAV) dataset, improving accuracy by about 25 percent. Finally, the approach is extended to accurately estimate the full 3D orientation of the MAV.
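
As an illustration of the inertial-odometry idea summarized in the description above, the following is a minimal PyTorch-style sketch, not the thesis implementation: a three-branch network over a fixed window of IMU readings that also receives the time interval between consecutive readings and regresses the magnitude of the position and orientation change, trained with simulated bias and noise. All class and function names, layer sizes, and noise parameters here are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DeepInertialOdometry(nn.Module):
        def __init__(self, window=100):
            super().__init__()
            # Three input branches: gyroscope (3 axes), accelerometer (3 axes),
            # and the time interval between consecutive IMU readings (1 channel).
            self.gyro = nn.Sequential(nn.Conv1d(3, 32, 7, padding=3), nn.ReLU())
            self.acc = nn.Sequential(nn.Conv1d(3, 32, 7, padding=3), nn.ReLU())
            self.dt = nn.Sequential(nn.Conv1d(1, 8, 7, padding=3), nn.ReLU())
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear((32 + 32 + 8) * window, 128),
                nn.ReLU(),
                nn.Linear(128, 2),  # [|position change|, |orientation change|]
            )

        def forward(self, gyro, acc, dt):
            # gyro, acc: (batch, 3, window); dt: (batch, 1, window)
            feats = torch.cat([self.gyro(gyro), self.acc(acc), self.dt(dt)], dim=1)
            return self.head(feats)

    def simulate_imu_noise(gyro, acc, gyro_std=0.01, acc_std=0.05):
        # Training-time augmentation: add a random constant bias plus white noise
        # per window so the model learns to be robust to noisy IMU data.
        # (Noise magnitudes here are illustrative, not taken from the thesis.)
        gyro_bias = gyro_std * torch.randn(gyro.size(0), 3, 1)
        acc_bias = acc_std * torch.randn(acc.size(0), 3, 1)
        noisy_gyro = gyro + gyro_bias + gyro_std * torch.randn_like(gyro)
        noisy_acc = acc + acc_bias + acc_std * torch.randn_like(acc)
        return noisy_gyro, noisy_acc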

Bibliographic Details
Main Author: Mahdi Abolfazli Esfahani
Other Authors: Wang Han
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/152280
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
DOI: 10.32657/10356/152280
Citation: Mahdi Abolfazli Esfahani (2020). Deep learning based monocular visual-inertial odometry. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152280
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).