Deep learning based monocular visual-inertial odometry
Saved in:
Main Author: | Mahdi Abolfazli Esfahani |
---|---|
Other Authors: | Wang Han |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/152280 |
DOI: | 10.32657/10356/152280 |
School: | School of Electrical and Electronic Engineering |
Citation: | Mahdi Abolfazli Esfahani (2020). Deep learning based monocular visual-inertial odometry. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152280 |
Rights: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). |
Institution: | Nanyang Technological University |
Description:
Autonomous vehicles require knowing their state in the environment to make decisions and achieve their desired goals. The Visual-Inertial Odometry (VIO) system is one of the most significant modules of an autonomous vehicle: it enables a robot to estimate its position and orientation in the environment relative to its starting point. Despite decades of research with different sensors, current odometry systems suffer from various problems that can result in odometry failure.
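As an illustrative aside (not taken from the thesis itself), odometry in this sense amounts to chaining relative pose estimates into a pose expressed in the starting frame; a minimal NumPy sketch with toy motions:

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Toy sequence of relative motions: identity rotation, small forward steps.
steps = [se3(np.eye(3), [0.1, 0.0, 0.0]) for _ in range(5)]

# Odometry chains the relative transforms to get the pose w.r.t. the starting frame.
pose = np.eye(4)
for T_rel in steps:
    pose = pose @ T_rel

print(pose[:3, 3])  # accumulated position: [0.5, 0.0, 0.0]
```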
Different types of sensors and their combinations have been studied and examined by researchers over the past decades, and novel methods have been proposed. Among all sensor combinations, pairing a monocular camera with an Inertial Measurement Unit (IMU) has been a major focus of research in the past three years because the combination is lightweight, low-cost, and compact. It can be used in many applications, from smartphones to micro aerial vehicles.
Traditional methods extract features from the visual information, track them across consecutive frames, and tightly optimize the relation between feature points in consecutive camera frames together with the accompanying IMU measurements. However, they perform poorly in textureless regions and fail when features cannot be tracked. Moreover, they cannot handle some challenging scenarios, such as an object moving in front of the camera. Deep learning based VIO systems have been investigated to tackle these challenges. While deep learning based VIO systems achieve satisfactory results, they suffer from incorrect motion estimation and scale extraction in various scenarios due to their low generalization capability.
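For illustration only, a minimal sketch of such a classical front end, detecting and matching features between two consecutive frames with OpenCV; the ORB detector and the frame file names are placeholder assumptions, not the specific pipeline discussed in the thesis:

```python
import cv2

# Two consecutive frames (hypothetical file names), loaded as grayscale.
prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe keypoints in both frames (assumes both frames yield descriptors).
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(prev, None)
kp2, des2 = orb.detectAndCompute(curr, None)

# Match descriptors; Hamming distance suits ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# The matched correspondences would feed the geometric/IMU optimization back end.
pts1 = [kp1[m.queryIdx].pt for m in matches]
pts2 = [kp2[m.trainIdx].pt for m in matches]
print(f"{len(matches)} tracked feature correspondences")
```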
This research investigates the pros and cons of existing deep learning based monocular VIO systems. It then proposes a new approach to training Convolutional Neural Networks (CNNs) for the regression problem of 6-DOF camera re-localization, which makes the CNNs more robust. The proposed approach is further extended to solve the problem of Visual Odometry (VO) in an end-to-end manner. The effects of moving objects and motion blur are also studied, and approaches to make the visual odometry and camera re-localization modules robust to them are proposed.
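As a rough, hypothetical sketch of CNN-based 6-DOF pose regression (the backbone, the quaternion parameterization, and the loss weighting here are assumptions, not the architecture proposed in the thesis):

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Toy CNN mapping an RGB image to a 6-DOF pose (3D translation + unit quaternion)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_t = nn.Linear(128, 3)   # translation
        self.fc_q = nn.Linear(128, 4)   # rotation as a quaternion

    def forward(self, x):
        h = self.features(x).flatten(1)
        t = self.fc_t(h)
        q = nn.functional.normalize(self.fc_q(h), dim=1)  # keep the quaternion unit-norm
        return t, q

def pose_loss(t_pred, q_pred, t_gt, q_gt, beta=100.0):
    """Weighted sum of translation and rotation errors, as in common pose-regression losses."""
    return nn.functional.mse_loss(t_pred, t_gt) + beta * nn.functional.mse_loss(q_pred, q_gt)

model = PoseRegressor()
t, q = model(torch.randn(2, 3, 224, 224))   # dummy batch of two images
print(t.shape, q.shape)                     # torch.Size([2, 3]) torch.Size([2, 4])
```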
Furthermore, since the primary step of VO is to extract robust features that can be tracked in consecutive frames, a bio-inspired approach is proposed that learns to extract distinctive handcrafted features from a single training image with high generalization ability. The proposed method runs in real time and surpasses traditional and deep learning based methods in terms of robust handcrafted feature extraction.
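Feature robustness of this kind is commonly judged by whether the same keypoints are re-detected under viewpoint change. A toy repeatability check under a known homography (the FAST detector, image path, and warp are placeholders, not the bio-inspired method itself) could look like:

```python
import cv2
import numpy as np

def repeatability(detector, img, H, tol=3.0):
    """Fraction of keypoints re-detected within tol pixels after warping the image by H."""
    warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
    kp1 = detector.detect(img, None)
    kp2 = detector.detect(warped, None)
    if not kp1 or not kp2:
        return 0.0
    pts1 = cv2.KeyPoint_convert(kp1).astype(np.float64)
    pts2 = cv2.KeyPoint_convert(kp2).astype(np.float64)
    # Project the original keypoints into the warped image using the known homography.
    proj = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H).reshape(-1, 2)
    dists = np.linalg.norm(proj[:, None, :] - pts2[None, :, :], axis=2)
    return float((dists.min(axis=1) < tol).mean())

img = cv2.imread("training_image.png", cv2.IMREAD_GRAYSCALE)   # hypothetical image path
H = np.array([[1.0, 0.02, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]])  # mild synthetic warp
print(repeatability(cv2.FastFeatureDetector_create(), img, H))
```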
The IMU suffers from bias and measurement noise, which makes the problem of Inertial Odometry (IO) much more complicated. Because errors propagate over time during robot position estimation, an inaccurate estimate or even a small error can render the odometry and localization system unreliable and unusable within a fraction of a second. This research presents a novel triple-channel deep IO network, based on the IMU's physical and mathematical model, to solve the IO problem. The proposed method outputs the magnitudes of the orientation and position changes over time. The model simulates the sensor noise during training and thus becomes robust to noise at test time. In addition, the network architecture takes the time interval between two consecutive IMU readings as an input, which makes it robust to changes in the IMU frame rate and to missing IMU readings. The proposed network outperforms all existing solutions on the IMU readings of the challenging Micro Aerial Vehicle (MAV) dataset and improves accuracy by about 25 percent. Furthermore, the approach is extended to accurately extract the full 3D orientation of the MAV.
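A minimal sketch of what an IMU-window regression network with an explicit time-interval input and training-time noise injection could look like; the channel layout, window length, and noise magnitudes are assumptions, not the thesis's triple-channel design:

```python
import torch
import torch.nn as nn

class DeepIO(nn.Module):
    """Toy 1-D CNN over a window of IMU samples: gyro (3), accel (3), and dt (1) channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(7, 64, 5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, 5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, 2),  # [magnitude of position change, magnitude of orientation change]
        )

    def forward(self, gyro, accel, dt):
        # gyro, accel: (B, window, 3); dt: (B, window, 1) -> stack as channels for Conv1d.
        x = torch.cat([gyro, accel, dt], dim=2).transpose(1, 2)
        return self.net(x)

def add_imu_noise(gyro, accel, gyro_std=0.01, accel_std=0.05, bias_std=0.02):
    """Training-time augmentation: inject white noise plus a per-window constant bias."""
    gyro = gyro + gyro_std * torch.randn_like(gyro) + bias_std * torch.randn(gyro.shape[0], 1, 3)
    accel = accel + accel_std * torch.randn_like(accel) + bias_std * torch.randn(accel.shape[0], 1, 3)
    return gyro, accel

model = DeepIO()
g, a = add_imu_noise(torch.randn(4, 200, 3), torch.randn(4, 200, 3))
dt = torch.full((4, 200, 1), 0.005)           # 200 Hz IMU -> 5 ms between readings
print(model(g, a, dt).shape)                  # torch.Size([4, 2])
```

Feeding the per-sample time interval as its own channel is what lets such a model cope with a changed IMU frame rate or dropped readings: the elapsed time is given explicitly rather than assumed constant.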