Towards robust monocular depth estimation: a new baseline and benchmark
Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, the robustness of the models has been overlooked in...
Saved in:
Main Authors: | Xian, Ke, Cao, Zhiguo, Shen, Chunhua, Lin, Guosheng |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Monocular depth prediction; Generalization |
Online Access: | https://hdl.handle.net/10356/174734 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-174734 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1747342024-04-12T15:37:30Z Towards robust monocular depth estimation: a new baseline and benchmark Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng School of Computer Science and Engineering Computer and Information Science Monocular depth prediction Generalization Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, their robustness has been overlooked in previous research. Existing state-of-the-art methods exhibit strong generalization to clean, unseen scenes. Such methods, however, appear to degrade when the test image is perturbed. This is likely because prior arts typically use only basic 2D data augmentations (e.g., random horizontal flipping, random cropping, and color jittering), ignoring other common image degradations and corruptions. To mitigate this issue, we delve deeper into data augmentation and propose utilizing strong data augmentation techniques for robust depth estimation. In particular, we introduce 3D-aware defocus blur in addition to seven 2D data augmentations. We evaluate the generalization of our model on six clean RGB-D datasets that were not seen during training. To evaluate the robustness of MDE models, we create a benchmark by applying 15 common corruptions to the clean images from IBIMS, NYUDv2, KITTI, ETH3D, DIODE, and TUM. On this benchmark, we systematically study the robustness of our method and 9 representative MDE models. The experimental results demonstrate that our model exhibits better generalization and robustness than the previous methods. Specifically, we provide valuable insights about the choices of data augmentation strategies and network architectures, which would be useful for future research in robust monocular depth estimation. 
Our code, model, and benchmark are available at https://github.com/KexianHust/Robust-MonoDepth. Ministry of Education (MOE) Submitted/Accepted version This work was in part supported by the National Key R&D Program of China (No. 2022ZD0118700), and partly supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE-T2EP20220-0007). This work was also supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s). Z. Cao was supported by the National Natural Science Foundation of China (No. U1913602). 2024-04-08T07:43:08Z 2024-04-08T07:43:08Z 2024 Journal Article Xian, K., Cao, Z., Shen, C. & Lin, G. (2024). Towards robust monocular depth estimation: a new baseline and benchmark. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-023-01979-4 0920-5691 https://hdl.handle.net/10356/174734 10.1007/s11263-023-01979-4 2-s2.0-85182691022 en MOE-T2EP20220-0007 IAF-ICP International Journal of Computer Vision © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1007/s11263-023-01979-4. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Monocular depth prediction Generalization |
spellingShingle |
Computer and Information Science Monocular depth prediction Generalization Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng Towards robust monocular depth estimation: a new baseline and benchmark |
description |
Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, their robustness has been overlooked in previous research. Existing state-of-the-art methods exhibit strong generalization to clean, unseen scenes. Such methods, however, appear to degrade when the test image is perturbed. This is likely because prior arts typically use only basic 2D data augmentations (e.g., random horizontal flipping, random cropping, and color jittering), ignoring other common image degradations and corruptions. To mitigate this issue, we delve deeper into data augmentation and propose utilizing strong data augmentation techniques for robust depth estimation. In particular, we introduce 3D-aware defocus blur in addition to seven 2D data augmentations. We evaluate the generalization of our model on six clean RGB-D datasets that were not seen during training. To evaluate the robustness of MDE models, we create a benchmark by applying 15 common corruptions to the clean images from IBIMS, NYUDv2, KITTI, ETH3D, DIODE, and TUM. On this benchmark, we systematically study the robustness of our method and 9 representative MDE models. The experimental results demonstrate that our model exhibits better generalization and robustness than the previous methods. Specifically, we provide valuable insights about the choices of data augmentation strategies and network architectures, which would be useful for future research in robust monocular depth estimation. Our code, model, and benchmark are available at https://github.com/KexianHust/Robust-MonoDepth. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng |
format |
Article |
author |
Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng |
author_sort |
Xian, Ke |
title |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_short |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_full |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_fullStr |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_full_unstemmed |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_sort |
towards robust monocular depth estimation: a new baseline and benchmark |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174734 |
_version_ |
1806059798143172608 |