Depth map generation : depth estimation from images
Depth information is an important part of the 3D structure of a scene. Accurate depth helps us understand a scene better and is useful in many applications, such as semantic segmentation, simultaneous localization and mapping (SLAM), autonomous driving, and 3D modeling. Traditional methods mostly use binocular or multi-view images for depth estimation. The most common approach is stereo matching, which uses triangulation to estimate scene depth from corresponding image points, but it is easily affected by scene diversity and is computationally expensive. Monocular images place lower demands on equipment and environment, and depth estimation from a single image is closer to many real-world settings and covers a wider range of application scenarios. With the breakthrough of deep learning, convolutional neural networks (CNNs) have demonstrated outstanding performance on depth estimation. State-of-the-art results indicate that monocular methods require less memory and computation time while achieving relatively good accuracy, so monocular depth estimation has attracted increasing attention.

Our objective is to build a deep-learning-based monocular depth estimation model. In this dissertation, three published models, BTSNet, BANet, and ViP-DeepLab, are investigated and implemented. Our BTSNet implementation achieves results very close to those reported in the original paper; its local plane guidance (LPG) layers retain local detail and recover object boundaries. Because the depth-to-space (D2S) layer used to produce the full-resolution feature map in BANet contributes little fine detail, we propose a new model that replaces the D2S layer with the outputs of the LPG layer. The model is trained on the KITTI dataset and achieves better performance than the state-of-the-art methods. Inspired by the LPG layer in BTSNet, a new upsampling layer, called the LPG upsampling layer, is proposed; it aims to supply detailed information when enlarging resolution. By introducing the LPG upsampling layer into ViP-DeepLab, the model achieves a 5% improvement in root mean square error (RMSE).
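The triangulation step behind the stereo matching pipeline mentioned above can be summarized in a few lines. A minimal sketch, assuming a rectified stereo pair with known focal length and baseline (the calibration values below are illustrative, roughly matching a KITTI-style rig, and are not taken from the thesis):

    import numpy as np

    def depth_from_disparity(disparity_px, focal_px, baseline_m, eps=1e-6):
        # Classical stereo triangulation: Z = f * B / d.
        # disparity_px: per-pixel disparity (in pixels) from stereo matching
        # focal_px:     focal length in pixels, from camera calibration
        # baseline_m:   distance between the two camera centres, in metres
        return focal_px * baseline_m / np.maximum(disparity_px, eps)

    # Illustrative values: f ~ 721 px and B ~ 0.54 m
    print(depth_from_disparity(np.array([30.0, 60.0]), 721.0, 0.54))
    # -> approx. [12.98, 6.49] metres; larger disparity means a closer point

Finding the correct correspondences is what makes the method sensitive to scene diversity, and searching for those matches is where most of the computation goes.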
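The depth-to-space (D2S) operation that the proposed model replaces is, in effect, a pixel shuffle: it trades channels for spatial resolution without creating new detail, which is the limitation the abstract points to. A minimal PyTorch sketch under that reading (the tensor sizes are illustrative, not taken from BANet):

    import torch
    import torch.nn as nn

    # Depth-to-space with upscale factor 2: (N, 4C, H, W) -> (N, C, 2H, 2W).
    # Existing channel values are only rearranged onto a finer grid, so no new
    # spatial detail appears, which motivates replacing it with LPG outputs.
    d2s = nn.PixelShuffle(upscale_factor=2)

    x = torch.randn(1, 64, 32, 32)   # coarse feature map
    y = d2s(x)
    print(y.shape)                    # torch.Size([1, 16, 64, 64])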
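The 5% RMSE improvement quoted for the ViP-DeepLab variant refers to the standard root mean square error between predicted and ground-truth depth; a sketch of the usual way it is computed for KITTI-style sparse ground truth (the masking convention is an assumption, not spelled out in the record):

    import numpy as np

    def rmse(pred_depth, gt_depth):
        # Evaluate only where ground truth exists (sparse LiDAR returns).
        valid = gt_depth > 0
        diff = pred_depth[valid] - gt_depth[valid]
        return np.sqrt(np.mean(diff ** 2))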
Main Author: | Zhao, Yukai (School of Electrical and Electronic Engineering) |
---|---|
Other Authors: | Tay Wee Peng |
Format: | Thesis-Master by Coursework |
Degree: | Master of Science (Computer Control and Automation) |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
Online Access: | https://hdl.handle.net/10356/150273 |
Institution: | Nanyang Technological University |