ACCURACY ENHANCEMENT OF AUTOMATIC ROAD EXTRACTION FROM SATELLITE IMAGES THROUGH VISION TRANSFORMER-BASED SYSTEM DEVELOPMENT

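Automated road extraction with deep learning offers a cost-effective and fast alternative to manual approaches and outperforms semi-automated methods. Nevertheless, these methods still fall short of the accuracy required for practical, real-world applications. This study improves automated road extraction by introducing a novel multi-axis, multi-scale attention network designed primarily to capture long-range dependencies. The architecture is a hierarchical encoder-decoder in which sparse local attention and dilated global attention are applied sequentially, with patch sizes adjusted at each stage. Distinct dilation rates for grid attention are introduced in the shallower network stage to strengthen long-range dependencies. An implicit inductive bias is integrated through conditional positional encoding in the feed-forward network and relative positional bias in the attention module, which improves positional-encoding efficiency and introduces a bias within local patches. In the decoding phase, a summation-based aggregation strategy is combined with a more refined decoder to recover fine spatial detail. The proposed model was validated experimentally on the DeepGlobe dataset, where it achieved results comparable to several state-of-the-art networks. In addition, a comprehensive ablation study clarifies the contributions of the embedded modules within the architecture and offers insights into careful tuning strategies.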
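
The abstract does not specify how the local and dilated (grid) attention groups are formed. As a rough illustration only, the sketch below assumes MaxViT/Swin-style conventions: tokens on an H×W feature map are gathered into window×window groups spaced `dilation` apart, so a dilation of 1 gives contiguous local windows and larger rates spread each group across the map. The relative-position lookup follows the Swin Transformer convention. Both function names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def dilated_groups(height, width, window, dilation):
    """Group (row, col) token coordinates for dilated window attention.

    Hypothetical helper: each group holds window x window tokens spaced
    `dilation` apart. dilation=1 gives contiguous local windows; choosing
    dilation = height // window gives a global, grid-style partition.
    """
    span = window * dilation  # side length of the region one group covers
    if height % span or width % span:
        raise ValueError("height and width must be divisible by window * dilation")
    groups = defaultdict(list)
    for r in range(height):
        for c in range(width):
            # Tokens share a group if they lie in the same span x span block
            # and have the same offset modulo the dilation rate.
            key = (r // span, c // span, r % dilation, c % dilation)
            groups[key].append((r, c))
    return list(groups.values())


def relative_position_index(window):
    """Swin-style index into a (2*window - 1)**2 relative-bias table (assumed convention)."""
    coords = [(r, c) for r in range(window) for c in range(window)]
    size = 2 * window - 1
    # Shift each row/column offset into [0, 2*window - 2], then flatten to one index.
    return [
        [(qr - kr + window - 1) * size + (qc - kc + window - 1) for (kr, kc) in coords]
        for (qr, qc) in coords
    ]


if __name__ == "__main__":
    local = dilated_groups(8, 8, window=4, dilation=1)
    grid = dilated_groups(8, 8, window=4, dilation=2)
    print(len(local), local[0][:4])  # 4 groups; first row of the top-left window
    print(len(grid), grid[0][:4])    # 4 groups; tokens two pixels apart
    print(relative_position_index(2)[0])
```

Changing only `dilation` while keeping `window` fixed leaves the per-group attention cost constant (window² tokens) but widens the area each group covers, which is one plausible reading of why distinct dilation rates help in shallower stages.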

Bibliographic Details
Main Author: Jogy Maratur Siburian, Arthur
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/77944
Institution: Institut Teknologi Bandung
Keywords: Very-high resolution imagery, vision transformer, conditional positional encoding, dilated window attention