Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs

Most supervised geographic mapping methods with very-high-resolution (VHR) images are designed for a specific task, leading to high label dependency and limited task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) that carry both the physical and socio-economic properties of the geographies. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module: the former deeply mines visual features by contrasting different VHR image augmentations, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan in China) and on three related yet distinct geographic mapping tasks (mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that the proposed MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
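The abstract describes two contrastive objectives: an intra-modal one that contrasts two augmented views of the same VHR image, and an inter-modal one that contrasts an image embedding with the POI embedding of the same region. As a rough illustration of how such a pair of objectives is commonly combined, the PyTorch sketch below applies a symmetric InfoNCE loss to both pairings; the function names, temperature, and weight `alpha` are illustrative assumptions rather than details taken from the paper, whose actual implementation is released at https://github.com/bailubin/MMGR.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings.

    Row i of `anchor` is the positive match of row i of `positive`;
    every other row in the batch serves as a negative.
    """
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    # Average the anchor->positive and positive->anchor directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def dual_contrastive_loss(img_view1, img_view2, img_emb, poi_emb, alpha=0.5):
    """Weighted sum of the two objectives described in the abstract.

    intra: two augmentations of the same VHR image tile (mines visual features).
    inter: image vs. POI embedding of the same region (fuses physical and
    socio-economic features). `alpha` is a placeholder weight, not a value
    taken from the paper.
    """
    intra = info_nce(img_view1, img_view2)
    inter = info_nce(img_emb, poi_emb)
    return alpha * intra + (1.0 - alpha) * inter

# Toy usage: random tensors stand in for the outputs of the image and POI encoders.
B, D = 32, 128
loss = dual_contrastive_loss(torch.randn(B, D), torch.randn(B, D),
                             torch.randn(B, D), torch.randn(B, D))
print(float(loss))
```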

Bibliographic Details
Main Authors: Bai, Lubin, Huang, Weiming, Zhang, Xiuyuan, Du, Shihong, Cong, Gao, Wang, Haoyu, Liu, Bo
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects: Engineering::Computer science and engineering; Multi-Modal Representation Learning; Remote Sensing Images
Online Access:https://hdl.handle.net/10356/170129
Institution: Nanyang Technological University
id sg-ntu-dr.10356-170129
record_format dspace
spelling sg-ntu-dr.10356-170129
date_indexed 2023-08-29T02:47:09Z
title Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
authors Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
school School of Computer Science and Engineering
subjects Engineering::Computer science and engineering; Multi-Modal Representation Learning; Remote Sensing Images
funding National Research Foundation (NRF). The work presented in this paper is supported by the International Research Center of Big Data for Sustainable Development Goals (No. CBAS2022GSP06), the National Natural Science Foundation of China (No. 42001327, 42271469), the China Postdoctoral Science Foundation (No. 2019M660003 and No. 2020T130005), the National Key Research and Development Program of China (No. 2021YFE0117100), the Knut and Alice Wallenberg Foundation (to W.H.), and the National Research Foundation, Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative.
date_available 2023-08-29T02:47:09Z
publish_year 2023
type Journal Article
citation Bai, L., Huang, W., Zhang, X., Du, S., Cong, G., Wang, H. & Liu, B. (2023). Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs. ISPRS Journal of Photogrammetry and Remote Sensing, 201, 193-208. https://dx.doi.org/10.1016/j.isprsjprs.2023.05.006
issn 0924-2716
handle https://hdl.handle.net/10356/170129
doi 10.1016/j.isprsjprs.2023.05.006
scopus 2-s2.0-85160791013
volume 201
pages 193-208
language en
grant IAF-PP
journal ISPRS Journal of Photogrammetry and Remote Sensing
rights © 2023 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Multi-Modal Representation Learning
Remote Sensing Images
description Most supervised geographic mapping methods with very-high-resolution (VHR) images are designed for a specific task, leading to high label dependency and limited task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) that carry both the physical and socio-economic properties of the geographies. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module: the former deeply mines visual features by contrasting different VHR image augmentations, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan in China) and on three related yet distinct geographic mapping tasks (mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that the proposed MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering; Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
format Article
author Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
author_sort Bai, Lubin
title Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
title_sort geographic mapping with unsupervised multi-modal representation learning from vhr images and pois
publishDate 2023
url https://hdl.handle.net/10356/170129
_version_ 1779156326414286848