ACCURACY ENHANCEMENT OF AUTOMATIC ROAD EXTRACTION FROM SATELLITE IMAGES THROUGH VISION TRANSFORMER-BASED SYSTEM DEVELOPMENT
Main Author: | Jogy Maratur Siburian, Arthur |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/77944 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:77944 |
---|---|
spelling |
id-itb.:77944; 2023-09-15T10:36:29Z; ACCURACY ENHANCEMENT OF AUTOMATIC ROAD EXTRACTION FROM SATELLITE IMAGES THROUGH VISION TRANSFORMER-BASED SYSTEM DEVELOPMENT; Jogy Maratur Siburian, Arthur; Indonesia; Theses; Very-high resolution imagery, vision transformer, conditional positional encoding, dilated window attention; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/77944; text |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Automated road extraction techniques based on deep learning offer a cost-effective
and fast alternative to manual approaches and are more effective than
semi-automated methods. Nevertheless, these methods still fall short of the
accuracy required for practical real-world
applications. This study improves the performance of automated road extraction
by introducing a novel multi-axis, multi-scale attention network that focuses
primarily on capturing long-range dependencies. The architecture is a hierarchical
encoder-decoder structure in which each stage applies sparse local attention
followed by dilated global attention, with the patch size adjusted at each
stage.
Distinct dilation rates for grid attention are introduced in the shallower network
stage, strengthening the modeling of long-range dependencies. An implicit inductive
bias is integrated through conditional positional encoding in the feed-forward
network and relative positional bias in the attention module, which improves the
efficiency of positional encoding and introduces a locality bias within patches.
In the decoding phase, a summation-based aggregation strategy is combined with a
more refined decoder to recover fine-grained spatial information. The proposed
model was validated experimentally on the DeepGlobe dataset, where it achieved
results comparable to several state-of-the-art networks. Additionally, a
comprehensive ablation study was conducted, clarifying the contribution of each
embedded module within the architecture and offering insights into fine-grained
tuning strategies. |
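The abstract describes each stage as sparse local attention followed by dilated global (grid) attention. The following is a minimal NumPy sketch of that multi-axis pattern, assuming a MaxViT-style partitioning: `p` is the local window size and `d` is the dilation (stride) of the global grid, so each global group gathers tokens `d` pixels apart across the whole map. The function names, the single-head attention without learned projections, and the residual wiring are illustrative assumptions, not the thesis implementation.

```python
import numpy as np


def window_partition(x: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping p x p windows -> (num_windows, p*p, C)."""
    H, W, C = x.shape
    assert H % p == 0 and W % p == 0, "H and W must be divisible by p"
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)


def window_reverse(w: np.ndarray, p: int, H: int, W: int) -> np.ndarray:
    """Inverse of window_partition: (num_windows, p*p, C) -> (H, W, C)."""
    C = w.shape[-1]
    w = w.reshape(H // p, W // p, p, p, C)
    return w.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def grid_partition(x: np.ndarray, d: int) -> np.ndarray:
    """Group tokens lying d pixels apart (a sparse, dilated global grid).

    Returns (d*d, (H//d)*(W//d), C); group a*d + b holds x[a::d, b::d].
    """
    H, W, C = x.shape
    assert H % d == 0 and W % d == 0, "H and W must be divisible by d"
    x = x.reshape(H // d, d, W // d, d, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(d * d, -1, C)


def grid_reverse(g: np.ndarray, d: int, H: int, W: int) -> np.ndarray:
    """Inverse of grid_partition: (d*d, N, C) -> (H, W, C)."""
    C = g.shape[-1]
    g = g.reshape(d, d, H // d, W // d, C)
    return g.transpose(2, 0, 3, 1, 4).reshape(H, W, C)


def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head softmax attention inside each group (no learned Q/K/V projections)."""
    C = tokens.shape[-1]
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def multi_axis_block(x: np.ndarray, p: int, d: int) -> np.ndarray:
    """Local window attention, then dilated grid attention, each with a residual add."""
    H, W, _ = x.shape
    x = x + window_reverse(self_attention(window_partition(x, p)), p, H, W)
    x = x + grid_reverse(self_attention(grid_partition(x, d)), d, H, W)
    return x
```

Because every grid group spans the whole map, stacking the two steps lets information flow between distant road segments within one block, while each attention call only sees `p*p` or `(H/d)*(W/d)` tokens.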
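The abstract also mentions a relative positional bias in the attention module. A common way to realize this, assumed here (the Swin Transformer scheme) rather than confirmed by the record, is a learnable table with one entry per relative offset inside a p × p window, added to the attention logits. `relative_position_index` and `biased_attention` are hypothetical names, and the table is a single head's bias vector of length (2p-1)².

```python
import numpy as np


def relative_position_index(p: int) -> np.ndarray:
    """(p*p, p*p) indices into a bias table of length (2p-1)**2 (Swin-style)."""
    coords = np.stack(np.meshgrid(np.arange(p), np.arange(p), indexing="ij"))  # (2, p, p)
    coords = coords.reshape(2, -1)                 # (2, N) with N = p*p
    rel = coords[:, :, None] - coords[:, None, :]  # (2, N, N), offsets in [-(p-1), p-1]
    rel += p - 1                                   # shift offsets to [0, 2p-2]
    return rel[0] * (2 * p - 1) + rel[1]           # flatten (dy, dx) to one index


def biased_attention(tokens: np.ndarray, bias_table: np.ndarray, p: int) -> np.ndarray:
    """Window attention whose logits get a bias looked up by relative offset.

    tokens: (num_windows, p*p, C); bias_table: ((2p-1)**2,) for a single head.
    """
    C = tokens.shape[-1]
    bias = bias_table[relative_position_index(p)]  # (N, N), shared by all windows
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C) + bias
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens
```

Since the bias depends only on the offset between two tokens, the same table is reused across all windows, which is what gives the attention a translation-invariant locality prior.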
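For the conditional positional encoding in the feed-forward network, one widely used construction is a depthwise 3 × 3 convolution on the hidden features, whose zero padding lets each token infer its position from its neighbourhood. The sketch below assumes that construction, a residual connection around the convolution, and the hypothetical names `depthwise_conv3x3` and `ffn_with_cpe`; the exact layer order used in the thesis is not given in this record.

```python
import numpy as np


def depthwise_conv3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Depthwise 3x3 cross-correlation with zero padding. x: (H, W, C), k: (3, 3, C)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zeros around the border
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += xp[dy:dy + H, dx:dx + W, :] * k[dy, dx]  # one kernel tap per channel
    return out


def gelu(x: np.ndarray) -> np.ndarray:
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def ffn_with_cpe(x: np.ndarray, w1: np.ndarray, k: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Linear -> depthwise conv (positional cue, residual form) -> GELU -> Linear."""
    h = x @ w1                       # (H, W, hidden)
    h = h + depthwise_conv3x3(h, k)  # assumed placement of the conditional encoding
    return gelu(h) @ w2              # back to (H, W, C)
```

On a constant input, an all-ones 3 × 3 kernel returns 4 at corners, 6 on edges, and 9 in the interior, which shows how border padding injects positional information without an explicit embedding table.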
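The decoder is described as aggregating features by summation. One plausible reading, sketched here under stated assumptions, is that each encoder scale is projected to a shared channel width `D` by a 1 × 1 projection, upsampled to the finest resolution, and added element-wise instead of concatenated. Nearest-neighbour upsampling, equal downsampling factors along both axes, and the names `upsample_nearest` and `sum_aggregate` are illustrative choices, not details taken from the thesis.

```python
from typing import Sequence

import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def sum_aggregate(features: Sequence[np.ndarray], projections: Sequence[np.ndarray]) -> np.ndarray:
    """Project every scale to width D, upsample to the finest scale, and sum.

    features: (H_i, W_i, C_i) maps, finest first, each downsampled equally along both axes.
    projections: (C_i, D) matrices acting as 1x1 convolutions.
    """
    H0 = features[0].shape[0]
    out = None
    for f, w in zip(features, projections):
        y = upsample_nearest(f @ w, H0 // f.shape[0])  # (H0, W0, D)
        out = y if out is None else out + y
    return out
```

Summation keeps the fused map at `D` channels however many scales are combined, whereas concatenation would grow to `D` times the number of scales.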
author |
Jogy Maratur Siburian, Arthur |
title |
ACCURACY ENHANCEMENT OF AUTOMATIC ROAD EXTRACTION FROM SATELLITE IMAGES THROUGH VISION TRANSFORMER-BASED SYSTEM DEVELOPMENT |