Recognizing text on maps

Scanned cartographic maps are publicly available repositories of geographical data that include various map symbols and text labels in different fonts, styles, and orientations. Due to the highly unstructured format of textual content in maps, text recognition in maps is a challenging task that requires manual work or advanced machine learning tools.

Bibliographic Details
Main Author: Goel, Tejas
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/172006
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172006
record_format dspace
spelling sg-ntu-dr.10356-172006 2023-11-24T15:37:58Z Recognizing text on maps Goel, Tejas Li Boyang School of Computer Science and Engineering boyang.li@ntu.edu.sg Engineering::Computer science and engineering Scanned cartographic maps are publicly available repositories of geographical data that include various map symbols and text labels in different fonts, styles, and orientations. Due to the highly unstructured format of textual content in maps, text recognition in maps is a challenging task that requires manual work or advanced machine learning tools. In this project, we tackle the task of recognizing text in maps, which broadly involves two major steps: detecting bounding boxes for text instances and recognizing the characters in the text. For this task, we study and adopt the state-of-the-art TESTR model, originally designed for Scene Text Recognition. Due to a lack of training data for fine-tuning the TESTR model, we investigate the application of cycle-GAN to automatically create a large dataset of annotated historical map images. Experiments show that the text spotting model achieves a 74% F-score, outperforming the other state-of-the-art models evaluated for this task. Finally, we examine and implement mapKurator, a machine learning pipeline that provides end-to-end tools for preprocessing map images, detecting and recognizing text labels in maps, and post-processing the output. The mapKurator pipeline makes the text spotting model easier to use, thereby promoting the FAIR principles for historical maps. Bachelor of Engineering (Computer Engineering) 2023-11-20T06:53:49Z 2023-11-20T06:53:49Z 2023 Final Year Project (FYP) Goel, T. (2023). Recognizing text on maps. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172006 https://hdl.handle.net/10356/172006 en SCSE22-0771 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Goel, Tejas
Recognizing text on maps
description Scanned cartographic maps are publicly available repositories of geographical data that include various map symbols and text labels in different fonts, styles, and orientations. Due to the highly unstructured format of textual content in maps, text recognition in maps is a challenging task that requires manual work or advanced machine learning tools. In this project, we tackle the task of recognizing text in maps, which broadly involves two major steps: detecting bounding boxes for text instances and recognizing the characters in the text. For this task, we study and adopt the state-of-the-art TESTR model, originally designed for Scene Text Recognition. Due to a lack of training data for fine-tuning the TESTR model, we investigate the application of cycle-GAN to automatically create a large dataset of annotated historical map images. Experiments show that the text spotting model achieves a 74% F-score, outperforming the other state-of-the-art models evaluated for this task. Finally, we examine and implement mapKurator, a machine learning pipeline that provides end-to-end tools for preprocessing map images, detecting and recognizing text labels in maps, and post-processing the output. The mapKurator pipeline makes the text spotting model easier to use, thereby promoting the FAIR principles for historical maps.
author2 Li Boyang
author_facet Li Boyang
Goel, Tejas
format Final Year Project
author Goel, Tejas
author_sort Goel, Tejas
title Recognizing text on maps
title_short Recognizing text on maps
title_full Recognizing text on maps
title_fullStr Recognizing text on maps
title_full_unstemmed Recognizing text on maps
title_sort recognizing text on maps
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/172006
_version_ 1783955620183932928