Cross-domain robustness in visual place recognition: an adversarial-based domain alignment approach

Bibliographic Details
Main Author: Lin, Yingying
Other Authors: Wang Dan Wei
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access:https://hdl.handle.net/10356/175346
Institution: Nanyang Technological University
Description
Summary: Visual Place Recognition (VPR) is an important component of robotic sensing, used mainly in navigation systems such as those of autonomous vehicles. VPR enables large-scale localization by comparing the visual cues of the current query image against a geo-tagged database of previously visited locations. The mainstream approach in VPR uses weakly supervised representation learning to generate compact but discriminative place descriptors, which are then used to retrieve the closest matching location from the database. NetVLAD achieves high recall when the test data are sampled from the same distribution as the training data, for example when inference is conducted in the same city and under the same conditions. Real applications, however, may face significant challenges from environmental changes such as variations in illumination, viewpoint, and architectural style. Images with such different styles can be viewed as samples from another data distribution, that is, a different domain. A VPR model trained on a source-domain dataset may suffer a sharp drop in performance when tested on a different target domain. This dissertation aims to alleviate this lack of cross-domain robustness and to enhance domain generalization. It focuses on improving the widely used NetVLAD architecture with Domain Adaptation strategies to obtain descriptors that are place-discriminative yet domain-invariant. A thorough theoretical review of VPR and Domain Adaptation is first conducted. It identifies resolving the inconsistency between geographic supervision and semantic information as the key to better place descriptors, and finds that mitigating the impact of domain-specific features depends most on adjusting the cluster centers and soft-assignment parameters of the NetVLAD aggregation layer.
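The NetVLAD aggregation and retrieval steps mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the dissertation's code: each local feature is soft-assigned to K cluster centers c_k through a softmax over w_k . x + b_k, the assignment-weighted residuals x - c_k are summed per cluster, and the result is normalized; database descriptors are then ranked by cosine similarity to the query. The initialization w_k = 2 * alpha * c_k, b_k = -alpha * ||c_k||^2 follows the original NetVLAD formulation, while the function names, alpha = 100, and the random toy data are assumptions for illustration. The centers and the soft-assignment parameters `w` and `b` are the quantities the summary singles out as most relevant to domain-specific features.

```python
import numpy as np


def init_soft_assignment(centers, alpha=100.0):
    """NetVLAD-style initialisation of the soft-assignment parameters:
    w_k = 2 * alpha * c_k and b_k = -alpha * ||c_k||^2."""
    w = 2.0 * alpha * centers                  # (K, D)
    b = -alpha * np.sum(centers ** 2, axis=1)  # (K,)
    return w, b


def netvlad_aggregate(local_feats, centers, w, b):
    """Aggregate N local features of one image (N, D) into a (K*D,) descriptor."""
    # L2-normalise each local feature, as is usual for NetVLAD's CNN features.
    x = local_feats / (np.linalg.norm(local_feats, axis=1, keepdims=True) + 1e-12)
    # Soft-assignment a_k(x_i) = softmax over k of (w_k . x_i + b_k).
    logits = x @ w.T + b                               # (N, K)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)        # each row sums to 1
    # Residuals x_i - c_k, weighted by the assignments and summed over i.
    residuals = x[:, None, :] - centers[None, :, :]    # (N, K, D)
    vlad = np.einsum("nk,nkd->kd", assign, residuals)  # (K, D)
    # Intra-normalisation (per cluster), then global L2 normalisation.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)


def retrieve(query_desc, db_descs, top_n=5):
    """Indices of the top_n database descriptors most similar to the query.
    Descriptors are unit-norm, so the dot product is the cosine similarity."""
    sims = db_descs @ query_desc
    return np.argsort(-sims)[:top_n]


# Toy usage with random data (hypothetical: real features come from a CNN backbone
# and real centers from k-means on those features).
rng = np.random.default_rng(0)
K, D = 8, 16
centers = rng.normal(size=(K, D))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
w, b = init_soft_assignment(centers)
images = [rng.normal(size=(50, D)) for _ in range(10)]
db = np.stack([netvlad_aggregate(f, centers, w, b) for f in images])
print(retrieve(netvlad_aggregate(images[3], centers, w, b), db, top_n=3))
```

Because every descriptor is unit-norm, a query identical to a database image ranks that image first; in real VPR the query comes from a different visit, which is exactly where a domain shift lowers the similarity of the true match.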
Building on these theoretical insights, this work performs domain alignment by introducing small-scale target-domain guidance, which adds prior information about the target domain during training and thereby adapts the VPR model to new environments. Drawing on the GAN principle of training against an adversary to blur domain-specific information, and on Domain Adversarial Neural Networks (DANN), this dissertation proposes domain alignment at three levels: pixel level, local feature level, and representation level. Each approach is analyzed theoretically and empirically, and their advantages and limitations are compared. Experimental results show that representation-level alignment best meets the research objectives, outperforming both pixel-level and local-feature-level alignment. This advantage is attributed to three properties of representation-level alignment: it matches the essence of representation learning, it is highly task-relevant, and it modifies the descriptors directly. As a result, it enhances target-domain robustness while preserving source-domain performance. To address the remaining limitations of this modification, the dissertation recommends directions for future work, such as enriching dataset information or reducing model complexity to safeguard generalization ability.
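The adversarial mechanism behind DANN-style representation-level alignment can be sketched with a gradient reversal layer (GRL): a domain classifier learns to tell source descriptors from target descriptors, while the encoder receives the classifier's gradient with its sign flipped, so it is pushed towards descriptors the classifier cannot separate. The sketch below illustrates only that mechanism, under loudly stated assumptions: a linear map `W` is a hypothetical stand-in for the NetVLAD descriptor network, a logistic regressor (`v`, `c`) plays the domain discriminator, source images are labelled 0 and target images 1, gradients are written out by hand in NumPy instead of using an autodiff framework, and the place-recognition loss a real model would also optimize is omitted. The names `grl_backward`, `dann_step`, and `lam` are illustrative, not taken from the dissertation.

```python
import numpy as np


def grl_forward(x):
    """Gradient reversal layer (GRL): identity in the forward pass."""
    return x


def grl_backward(grad, lam):
    """GRL backward pass: flips the sign of the gradient and scales it by lam."""
    return -lam * grad


def domain_loss_and_grads(X, W, v, c, y):
    """Logistic domain classifier on descriptors f = X @ W.

    y holds domain labels (0 = source, 1 = target). Returns the mean binary
    cross-entropy and its gradients w.r.t. v, c and the descriptors f.
    """
    f = grl_forward(X @ W)                         # (n, d) descriptors
    z = f @ v + c                                  # domain logits
    loss = np.mean(np.logaddexp(0.0, z) - y * z)   # stable BCE with logits
    p = 1.0 / (1.0 + np.exp(-z))                   # P(target | descriptor)
    dz = (p - y) / len(y)                          # dLoss/dz
    grad_v = f.T @ dz
    grad_c = dz.sum()
    grad_f = np.outer(dz, v)                       # dLoss/df, shape (n, d)
    return loss, grad_v, grad_c, grad_f


def dann_step(X, W, v, c, y, lr=0.1, lam=1.0):
    """One adversarial update on the domain loss only (task loss omitted)."""
    loss, grad_v, grad_c, grad_f = domain_loss_and_grads(X, W, v, c, y)
    # The discriminator descends on the domain loss...
    v_new = v - lr * grad_v
    c_new = c - lr * grad_c
    # ...while the encoder receives the reversed gradient through the GRL,
    # so it ascends on the same loss and makes the domains harder to separate.
    grad_W = X.T @ grl_backward(grad_f, lam)
    W_new = W - lr * grad_W
    return W_new, v_new, c_new, loss


# Toy usage: two shifted Gaussians stand in for source and target images.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)),
               rng.normal(1.0, 1.0, (20, 5))])
y = np.r_[np.zeros(20), np.ones(20)]
W = 0.5 * rng.normal(size=(5, 3))
v, c = rng.normal(size=3), 0.0
W, v, c, loss = dann_step(X, W, v, c, y)
print(f"domain loss before the step: {loss:.4f}")
```

In a full model the encoder update would add this reversed domain gradient to the gradient of the place-recognition loss, so the descriptors stay place-discriminative while becoming harder to attribute to a domain; `lam` trades off the two objectives.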