Depth map generation : depth estimation from images

Depth information is an important part of the 3D structure of a scene. Accurate depth information helps us understand a scene better and is useful in many applications, such as semantic segmentation, simultaneous localization and mapping (SLAM), autonomous driving, and 3D modeling. Traditional methods mostly use binocular or multi-view images for depth estimation. The most common approach is stereo matching, which uses triangulation to estimate scene depth from the images, but it is easily affected by the diversity of scenes and is computationally expensive. Monocular images place lower demands on equipment and environment, and depth estimation from a single image is closer to practical conditions and covers a wider range of application scenarios. With the breakthrough of deep learning, convolutional neural networks (CNNs) have demonstrated outstanding performance on depth estimation. State-of-the-art results indicate that monocular depth estimation methods require less memory and computation time while retaining relatively good accuracy, so monocular depth estimation has attracted increasing attention.

Our objective is to build a monocular depth estimation model based on deep learning. In this dissertation, three published models, BTSNet, BANet, and ViP-DeepLab, are investigated and implemented. Our BTSNet implementation achieves almost the same results as the original paper; its local plane guidance (LPG) layers retain local detail and recover object boundaries. Because the depth-to-space (D2S) layer used to obtain the full-resolution feature map in BANet lacks detailed information, we propose a new model that replaces the D2S layer with the outputs of the LPG layer. The model is trained on the KITTI dataset and achieves better performance than state-of-the-art methods. Inspired by the LPG layer in BTSNet, a new upsampling layer, the LPG upsampling layer, is proposed to provide detailed information when enlarging the resolution. By introducing the LPG upsampling layer into ViP-DeepLab, the model achieves a 5% improvement in root mean square error (RMSE).
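
The abstract refers to the depth-to-space (D2S) operation used to recover a full-resolution feature map and to the RMSE metric reported on KITTI. As a rough illustration only, and not code from the thesis, the minimal PyTorch sketch below shows both; the upscale factor, tensor shapes, and dummy data are assumptions made for the example.

    # Minimal sketch (PyTorch). Shapes, channel counts, and data are
    # illustrative assumptions, not values taken from the thesis.
    import torch
    import torch.nn as nn

    # Depth-to-space (D2S), also known as pixel shuffle: rearranges channel
    # blocks into spatial positions, turning a coarse feature map of shape
    # (N, C*r*r, H, W) into a full-resolution map of shape (N, C, H*r, W*r).
    r = 4                                        # assumed upscale factor
    d2s = nn.PixelShuffle(r)
    coarse = torch.rand(1, 1 * r * r, 88, 288)   # (N, 16, 88, 288)
    full_res = d2s(coarse)                       # (N, 1, 352, 1152)

    # RMSE, the metric cited in the abstract, computed over pixels that have
    # valid ground truth (LiDAR ground truth on KITTI is sparse).
    def rmse(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        valid = gt > 0
        return torch.sqrt(torch.mean((pred[valid] - gt[valid]) ** 2))

    pred_depth = full_res * 80.0                 # dummy predictions in metres
    gt_depth = torch.rand_like(pred_depth) * 80.0
    gt_depth[gt_depth < 40.0] = 0.0              # simulate sparse ground truth
    print(rmse(pred_depth, gt_depth).item())

How exactly BANet applies the D2S step is not detailed in this record, and the thesis replaces it with LPG outputs, which this sketch does not attempt to reproduce.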

Bibliographic Details
Main Author: Zhao, Yukai
Supervisor: Tay Wee Peng (wptay@ntu.edu.sg)
School: School of Electrical and Electronic Engineering
Degree: Master of Science (Computer Control and Automation)
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Citation: Zhao, Y. (2021). Depth map generation : depth estimation from images. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/150273
Online Access: https://hdl.handle.net/10356/150273
Institution: Nanyang Technological University