Depth map generation : depth estimation from images

Depth information is an important part of the 3D structure of a scene. Accurate depth information helps us understand a scene better and is useful in many applications, such as semantic segmentation, simultaneous localization and mapping (SLAM), autonomous driving, and 3D modeling. Traditional methods mostly use binocular or multi-view images for depth estimation. The most common approach is stereo matching, which uses triangulation to estimate scene depth from the images, but it is easily affected by the diversity of scenes and is computationally expensive. Monocular images place lower demands on equipment and environment, and depth estimation from a single image is closer to practical conditions and covers a wider range of application scenarios. With the breakthrough of deep learning, convolutional neural networks (CNNs) have demonstrated outstanding performance on depth estimation. State-of-the-art results indicate that monocular depth estimation methods require less memory and computation time while retaining relatively good accuracy, so monocular depth estimation has attracted increasing attention.

Our objective is to build a monocular depth estimation model based on deep learning. In this dissertation, three published models, BTSNet, BANet, and ViP-DeepLab, are investigated and implemented. Our BTSNet implementation achieves almost the same results as the original paper; its local plane guidance (LPG) layers retain local detail and recover object boundaries. Because the depth-to-space (D2S) layer used to obtain the full-resolution feature map in BANet lacks detailed information, we propose a new model that replaces the D2S layer with the outputs of the LPG layer. The model is trained on the KITTI dataset and achieves better performance than state-of-the-art methods. Inspired by the LPG layer in BTSNet, a new upsampling layer, the LPG upsampling layer, is proposed to provide detailed information when enlarging the resolution. By introducing the LPG upsampling layer into ViP-DeepLab, the model achieves a 5% improvement in root mean square error (RMSE).
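
The abstract refers to the depth-to-space (D2S) operation used to recover a full-resolution feature map and to the RMSE metric reported on KITTI. As a rough illustration only, and not code from the thesis, the minimal PyTorch sketch below shows both; the upscale factor, tensor shapes, and dummy data are assumptions made for the example.

    # Minimal sketch (PyTorch). Shapes, channel counts, and data are
    # illustrative assumptions, not values taken from the thesis.
    import torch
    import torch.nn as nn

    # Depth-to-space (D2S), also known as pixel shuffle: rearranges channel
    # blocks into spatial positions, turning a coarse feature map of shape
    # (N, C*r*r, H, W) into a full-resolution map of shape (N, C, H*r, W*r).
    r = 4                                        # assumed upscale factor
    d2s = nn.PixelShuffle(r)
    coarse = torch.rand(1, 1 * r * r, 88, 288)   # (N, 16, 88, 288)
    full_res = d2s(coarse)                       # (N, 1, 352, 1152)

    # RMSE, the metric cited in the abstract, computed over pixels that have
    # valid ground truth (LiDAR ground truth on KITTI is sparse).
    def rmse(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        valid = gt > 0
        return torch.sqrt(torch.mean((pred[valid] - gt[valid]) ** 2))

    pred_depth = full_res * 80.0                 # dummy predictions in metres
    gt_depth = torch.rand_like(pred_depth) * 80.0
    gt_depth[gt_depth < 40.0] = 0.0              # simulate sparse ground truth
    print(rmse(pred_depth, gt_depth).item())

How exactly BANet applies the D2S step is not detailed in this record, and the thesis replaces it with LPG outputs, which this sketch does not attempt to reproduce.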

Bibliographic Details
Main Author: Zhao, Yukai
Supervisor: Tay Wee Peng (wptay@ntu.edu.sg)
School: School of Electrical and Electronic Engineering
Degree: Master of Science (Computer Control and Automation)
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Citation: Zhao, Y. (2021). Depth map generation : depth estimation from images. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/150273
Online Access: https://hdl.handle.net/10356/150273
Institution: Nanyang Technological University