Recognizing text on maps
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/172006 |
Institution: | Nanyang Technological University |
Summary: | Scanned cartographic maps are publicly available repositories of geographical data that include various map symbols and text labels in different fonts, styles, and orientations. Because the textual content of maps is highly unstructured, recognizing text in maps is a challenging task that requires manual work or advanced machine learning tools. In this project, we tackle the task of recognizing text in maps, which broadly involves two major steps: detecting bounding boxes for text instances and recognizing the characters in the text. For this task, we study and adopt the state-of-the-art TESTR model, originally designed for Scene Text Recognition. Because training data for fine-tuning the TESTR model is lacking, we investigate the use of CycleGAN to automatically create a large dataset of annotated historical map images. Experiments on the text spotting model show a 74% F-score, which outperforms the other state-of-the-art models evaluated for this task. Finally, we examine and implement mapKurator, a machine learning pipeline that provides end-to-end tools for preprocessing map images, detecting and recognizing text labels in maps, and post-processing the output. The mapKurator pipeline makes the text spotting model easier to use, thereby promoting the FAIR principles for historical maps. |