Towards robust monocular depth estimation: a new baseline and benchmark

Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been studied thoroughly, their robustness has been overlooked in previous research. Existing state-of-the-art methods generalize well to clean, unseen scenes; however, they degrade when the test image is perturbed. This is likely because prior work typically relies on basic 2D data augmentations (e.g., random horizontal flipping, random cropping, and color jittering) while ignoring other common image degradations and corruptions. To mitigate this issue, we delve deeper into data augmentation and propose strong augmentation techniques for robust depth estimation. In particular, we introduce a 3D-aware defocus blur in addition to seven 2D data augmentations. We evaluate the generalization of our model on six clean RGB-D datasets that were not seen during training. To evaluate the robustness of MDE models, we create a benchmark by applying 15 common corruptions to the clean images from iBims, NYUDv2, KITTI, ETH3D, DIODE, and TUM. On this benchmark, we systematically study the robustness of our method and nine representative MDE models. The experimental results demonstrate that our model exhibits better generalization and robustness than previous methods. We also provide valuable insights about the choice of data augmentation strategies and network architectures, which should be useful for future research in robust monocular depth estimation. Our code, model, and benchmark are available at https://github.com/KexianHust/Robust-MonoDepth.
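As a rough illustration of the kind of depth-aware ("3D-aware") defocus blur augmentation the abstract describes, the sketch below blurs pixels more strongly the farther they lie from a chosen focal plane, using a thin-lens circle-of-confusion proxy and a small stack of pre-blurred layers. This is a hypothetical minimal implementation: the function name, parameters, and layering scheme are illustrative assumptions, not the paper's actual method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def defocus_blur_3d(image, depth, focus_depth, max_sigma=3.0, n_layers=4):
    """Depth-aware defocus blur (illustrative sketch, not the paper's code).

    Pixels whose depth is far from `focus_depth` receive stronger Gaussian
    blur, approximating a thin-lens camera's circle of confusion.

    image: HxWxC float array; depth: HxW, same units as focus_depth.
    """
    # Circle-of-confusion proxy: blur grows with |1/d - 1/d_focus|.
    coc = np.abs(1.0 / np.maximum(depth, 1e-6) - 1.0 / focus_depth)
    coc = coc / (coc.max() + 1e-6)  # normalize to [0, 1]

    # Pre-blur the image at a few discrete sigmas ("layered" defocus).
    sigmas = np.linspace(0.0, max_sigma, n_layers)
    layers = [image if s == 0 else
              np.stack([gaussian_filter(image[..., c], s)
                        for c in range(image.shape[-1])], axis=-1)
              for s in sigmas]

    # Assign each pixel to the layer matching its blur strength.
    idx = np.minimum((coc * (n_layers - 1)).round().astype(int), n_layers - 1)
    out = np.zeros_like(image)
    for i, layer in enumerate(layers):
        mask = idx == i
        out[mask] = layer[mask]
    return out
```

In this sketch, regions on the focal plane are copied through unchanged (layer with sigma 0), while distant regions take their values from the most heavily blurred layer; a production augmentation would likely blend layers smoothly rather than hard-assigning them.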

Bibliographic Details
Main Authors: Xian, Ke, Cao, Zhiguo, Shen, Chunhua, Lin, Guosheng
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2024
Subjects: Computer and Information Science; Monocular depth prediction; Generalization
Online Access:https://hdl.handle.net/10356/174734
Institution: Nanyang Technological University
Code, model, and benchmark: https://github.com/KexianHust/Robust-MonoDepth
Citation: Xian, K., Cao, Z., Shen, C. & Lin, G. (2024). Towards robust monocular depth estimation: a new baseline and benchmark. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-023-01979-4
ISSN: 0920-5691
DOI: 10.1007/s11263-023-01979-4
Handle: https://hdl.handle.net/10356/174734
Scopus: 2-s2.0-85182691022
Version: Submitted/Accepted version
Funding: This work was in part supported by the National Key R&D Program of China (No. 2022ZD0118700) and partly by the Ministry of Education (MOE), Singapore, under its Academic Research Fund Tier 2 (MOE-T2EP20220-0007). It was also supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s). Z. Cao was supported by the National Natural Science Foundation of China (No. U1913602).
Rights: © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved. This article may be downloaded for personal use only; any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1007/s11263-023-01979-4.