Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs

Most supervised geographic mapping methods with very-high-resolution (VHR) images are designed for a specific task, leading to high label dependency and limited task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) that carry both the physical and socio-economic properties of the geographies. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module: the former deeply mines visual features by contrasting different VHR image augmentations, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan in China) and on three related yet distinct geographic mapping tasks (mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that the proposed MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
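The abstract describes two contrastive objectives: an intra-modal one that contrasts two augmented views of the same VHR image, and an inter-modal one that contrasts an image embedding with the POI embedding of the same region. As a rough illustration of how such a pair of objectives is commonly combined, the PyTorch sketch below applies a symmetric InfoNCE loss to both pairings; the function names, temperature, and weight `alpha` are illustrative assumptions rather than details taken from the paper, whose actual implementation is released at https://github.com/bailubin/MMGR.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings.

    Row i of `anchor` is the positive match of row i of `positive`;
    every other row in the batch serves as a negative.
    """
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    # Average the anchor->positive and positive->anchor directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def dual_contrastive_loss(img_view1, img_view2, img_emb, poi_emb, alpha=0.5):
    """Weighted sum of the two objectives described in the abstract.

    intra: two augmentations of the same VHR image tile (mines visual features).
    inter: image vs. POI embedding of the same region (fuses physical and
    socio-economic features). `alpha` is a placeholder weight, not a value
    taken from the paper.
    """
    intra = info_nce(img_view1, img_view2)
    inter = info_nce(img_emb, poi_emb)
    return alpha * intra + (1.0 - alpha) * inter

# Toy usage: random tensors stand in for the outputs of the image and POI encoders.
B, D = 32, 128
loss = dual_contrastive_loss(torch.randn(B, D), torch.randn(B, D),
                             torch.randn(B, D), torch.randn(B, D))
print(float(loss))
```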

Bibliographic Details
Main Authors: Bai, Lubin, Huang, Weiming, Zhang, Xiuyuan, Du, Shihong, Cong, Gao, Wang, Haoyu, Liu, Bo
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects: Engineering::Computer science and engineering; Multi-Modal Representation Learning; Remote Sensing Images
Online Access:https://hdl.handle.net/10356/170129
Institution: Nanyang Technological University
id sg-ntu-dr.10356-170129
record_format dspace
spelling sg-ntu-dr.10356-170129
date_indexed 2023-08-29T02:47:09Z
title Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
authors Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
school School of Computer Science and Engineering
subjects Engineering::Computer science and engineering; Multi-Modal Representation Learning; Remote Sensing Images
funding National Research Foundation (NRF). The work presented in this paper is supported by the International Research Center of Big Data for Sustainable Development Goals (No. CBAS2022GSP06), the National Natural Science Foundation of China (No. 42001327, 42271469), the China Postdoctoral Science Foundation (No. 2019M660003 and No. 2020T130005), the National Key Research and Development Program of China (No. 2021YFE0117100), the Knut and Alice Wallenberg Foundation (to W.H.), and the National Research Foundation, Singapore under its Industry Alignment Fund – Prepositioning (IAF-PP) Funding Initiative.
date_available 2023-08-29T02:47:09Z
publish_year 2023
type Journal Article
citation Bai, L., Huang, W., Zhang, X., Du, S., Cong, G., Wang, H. & Liu, B. (2023). Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs. ISPRS Journal of Photogrammetry and Remote Sensing, 201, 193-208. https://dx.doi.org/10.1016/j.isprsjprs.2023.05.006
issn 0924-2716
handle https://hdl.handle.net/10356/170129
doi 10.1016/j.isprsjprs.2023.05.006
scopus 2-s2.0-85160791013
volume 201
pages 193-208
language en
grant IAF-PP
journal ISPRS Journal of Photogrammetry and Remote Sensing
rights © 2023 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Multi-Modal Representation Learning
Remote Sensing Images
description Most supervised geographic mapping methods with very-high-resolution (VHR) images are designed for a specific task, leading to high label dependency and limited task generality. Additionally, the lack of socio-economic information in VHR images limits their applicability to social/human-related geographic studies. To resolve these two issues, we propose an unsupervised multi-modal geographic representation learning framework (MMGR) using both VHR images and points-of-interest (POIs) to learn representations (regional vector embeddings) that carry both the physical and socio-economic properties of the geographies. In MMGR, we employ an intra-modal and an inter-modal contrastive learning module: the former deeply mines visual features by contrasting different VHR image augmentations, while the latter fuses physical and socio-economic features by contrasting VHR image and POI features. Extensive experiments are performed in two study areas (Shanghai and Wuhan in China) and on three related yet distinct geographic mapping tasks (mapping urban functional distributions, population density, and gross domestic product) to verify the superiority of MMGR. The results demonstrate that the proposed MMGR considerably outperforms seven competitive baselines in all three tasks, which indicates its effectiveness in fusing VHR images and POIs for multiple geographic mapping tasks. Furthermore, MMGR is a competent pre-training method that helps image encoders understand multi-modal geographic information, and it can be further strengthened by fine-tuning even with only a few labeled samples. The source code is released at https://github.com/bailubin/MMGR.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering; Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
format Article
author Bai, Lubin; Huang, Weiming; Zhang, Xiuyuan; Du, Shihong; Cong, Gao; Wang, Haoyu; Liu, Bo
author_sort Bai, Lubin
title Geographic mapping with unsupervised multi-modal representation learning from VHR images and POIs
title_sort geographic mapping with unsupervised multi-modal representation learning from vhr images and pois
publishDate 2023
url https://hdl.handle.net/10356/170129
_version_ 1779156326414286848