Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
Most supervised geographic mapping methods with very-high-resolution (VHR) images are designed for a specific task, leading to high label-dependency and inadequate task-generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs), to learn representations (regional vector embeddings) carrying both the physical and socio-economic properties of the geographies. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module, in which the former deeply mines visual features by contrasting different VHR image augmentations, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan in China) and three relevant yet distinctive geographic mapping tasks (i.e., mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that the proposed MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
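As a rough illustration of the two contrastive modules the abstract describes, the sketch below pairs a SimCLR-style intra-modal InfoNCE loss over two augmentations of the same VHR image with a CLIP-style inter-modal InfoNCE loss between image and POI embeddings. This is a minimal sketch, not the authors' implementation: the encoder architectures, the 64-dimensional bag-of-categories POI input, the temperature, and the equal loss weighting are all assumptions; the released code at https://github.com/bailubin/MMGR is authoritative.

```python
# Minimal sketch of MMGR-style intra- and inter-modal contrastive objectives.
# Encoders, dimensions, temperature, and loss weighting are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: matched rows of `a` and `b` are positives,
    all other pairings in the batch are negatives."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)

class MMGRSketch(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        # Stand-in encoders; the paper uses deeper image/POI encoders.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))
        self.poi_encoder = nn.Sequential(   # POIs as a bag-of-categories vector (assumed)
            nn.Linear(64, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, img_view1, img_view2, poi):
        z1 = self.image_encoder(img_view1)  # two augmentations of the same VHR patch
        z2 = self.image_encoder(img_view2)
        p = self.poi_encoder(poi)           # POI features of the same region
        intra = info_nce(z1, z2)            # intra-modal: image view vs. image view
        inter = info_nce(z1, p)             # inter-modal: image vs. POI
        return intra + inter                # equal weighting is an assumption

# Toy usage: one training step on random tensors.
model = MMGRSketch()
loss = model(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64), torch.randn(8, 64))
loss.backward()
```

After pre-training with such an objective, the image encoder can be reused and fine-tuned with a few labeled samples for downstream mapping tasks, which is the usage pattern the abstract describes.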
Main Authors: Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Multi-Modal Representation Learning; Remote Sensing Images
Online Access: https://hdl.handle.net/10356/170129
Institution: Nanyang Technological University
Citation: Bai, L., Huang, W., Zhang, X., Du, S., Cong, G., Wang, H. & Liu, B. (2023). Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs. ISPRS Journal of Photogrammetry and Remote Sensing, 201, 193-208. https://dx.doi.org/10.1016/j.isprsjprs.2023.05.006
ISSN: 0924-2716
DOI: 10.1016/j.isprsjprs.2023.05.006
Funding: The work presented in this paper is supported by the International Research Center of Big Data for Sustainable Development Goals (No. CBAS2022GSP06), the National Natural Science Foundation of China (No. 42001327, 42271469), the China Postdoctoral Science Foundation (No. 2019M660003 and No. 2020T130005), the National Key Research and Development Program of China (No. 2021YFE0117100), the Knut and Alice Wallenberg Foundation (to W.H.), and the National Research Foundation (NRF), Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative.
Rights: © 2023 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.