Towards robust monocular depth estimation: a new baseline and benchmark
Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, the robustness of the models has been overlooked in...
Saved in:
Main Authors: | Xian, Ke, Cao, Zhiguo, Shen, Chunhua, Lin, Guosheng |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Monocular depth prediction; Generalization |
Online Access: | https://hdl.handle.net/10356/174734 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-174734 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1747342024-04-12T15:37:30Z Towards robust monocular depth estimation: a new baseline and benchmark Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng School of Computer Science and Engineering Computer and Information Science Monocular depth prediction Generalization Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, their robustness has been overlooked in previous research. Existing state-of-the-art methods exhibit strong generalization to clean, unseen scenes. Such methods, however, appear to degrade when the test image is perturbed. This is likely because prior arts typically use only basic 2D data augmentations (e.g., random horizontal flipping, random cropping, and color jittering), ignoring other common image degradations and corruptions. To mitigate this issue, we delve deeper into data augmentation and propose utilizing strong data augmentation techniques for robust depth estimation. In particular, we introduce 3D-aware defocus blur in addition to seven 2D data augmentations. We evaluate the generalization of our model on six clean RGB-D datasets that were not seen during training. To evaluate the robustness of MDE models, we create a benchmark by applying 15 common corruptions to the clean images from IBIMS, NYUDv2, KITTI, ETH3D, DIODE, and TUM. On this benchmark, we systematically study the robustness of our method and 9 representative MDE models. The experimental results demonstrate that our model exhibits better generalization and robustness than the previous methods. Specifically, we provide valuable insights about the choices of data augmentation strategies and network architectures, which would be useful for future research in robust monocular depth estimation. 
Our code, model, and benchmark are available at https://github.com/KexianHust/Robust-MonoDepth. Ministry of Education (MOE) Submitted/Accepted version This work was in part supported by the National Key R&D Program of China (No. 2022ZD0118700), and partly supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (MOE-T2EP20220-0007). This work was also supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from the industry partner(s). Z. Cao was supported by the National Natural Science Foundation of China (No. U1913602). 2024-04-08T07:43:08Z 2024-04-08T07:43:08Z 2024 Journal Article Xian, K., Cao, Z., Shen, C. & Lin, G. (2024). Towards robust monocular depth estimation: a new baseline and benchmark. International Journal of Computer Vision. https://dx.doi.org/10.1007/s11263-023-01979-4 0920-5691 https://hdl.handle.net/10356/174734 10.1007/s11263-023-01979-4 2-s2.0-85182691022 en MOE-T2EP20220-0007 IAF-ICP International Journal of Computer Vision © 2024 The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1007/s11263-023-01979-4. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Monocular depth prediction Generalization |
spellingShingle |
Computer and Information Science Monocular depth prediction Generalization Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng Towards robust monocular depth estimation: a new baseline and benchmark |
description |
Before deploying a monocular depth estimation (MDE) model in real-world applications such as autonomous driving, it is critical to understand its generalization and robustness. Although the generalization of MDE models has been thoroughly studied, their robustness has been overlooked in previous research. Existing state-of-the-art methods exhibit strong generalization to clean, unseen scenes. Such methods, however, appear to degrade when the test image is perturbed. This is likely because prior arts typically use only basic 2D data augmentations (e.g., random horizontal flipping, random cropping, and color jittering), ignoring other common image degradations and corruptions. To mitigate this issue, we delve deeper into data augmentation and propose utilizing strong data augmentation techniques for robust depth estimation. In particular, we introduce 3D-aware defocus blur in addition to seven 2D data augmentations. We evaluate the generalization of our model on six clean RGB-D datasets that were not seen during training. To evaluate the robustness of MDE models, we create a benchmark by applying 15 common corruptions to the clean images from IBIMS, NYUDv2, KITTI, ETH3D, DIODE, and TUM. On this benchmark, we systematically study the robustness of our method and 9 representative MDE models. The experimental results demonstrate that our model exhibits better generalization and robustness than the previous methods. Specifically, we provide valuable insights about the choices of data augmentation strategies and network architectures, which would be useful for future research in robust monocular depth estimation. Our code, model, and benchmark are available at https://github.com/KexianHust/Robust-MonoDepth. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng |
format |
Article |
author |
Xian, Ke Cao, Zhiguo Shen, Chunhua Lin, Guosheng |
author_sort |
Xian, Ke |
title |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_short |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_full |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_fullStr |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_full_unstemmed |
Towards robust monocular depth estimation: a new baseline and benchmark |
title_sort |
towards robust monocular depth estimation: a new baseline and benchmark |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174734 |
_version_ |
1806059798143172608 |