Training deep network models for accurate detection of texts in scenes

Bibliographic Details
Main Author: Lee, Chun Fei
Other Authors: Lu Shijian
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/165885
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-165885
School: School of Computer Science and Engineering
Contact: Shijian.Lu@ntu.edu.sg (Lu Shijian)
Degree: Bachelor of Engineering (Computer Science)
Date available: 2023-04-17
Project ID: PSCSE21-0026
File format: application/pdf
Citation: Lee, C. F. (2023). Training deep network models for accurate detection of texts in scenes. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165885
Collection: DR-NTU (NTU Library, Singapore)

Description: In the scene text detection field, recent deep neural network-based approaches have attracted significant attention owing to their strong results on benchmark datasets such as ICDAR 2013 [52], ICDAR 2015 [34], and MSRA-TD500 [53]. However, several existing scene text detection methods rely on complex pipelines with multiple intermediate steps, such as text-line candidate generation [29], rule-based filtering [29], and word partitioning [7, 29]. These steps increase computational cost and processing time, which reduces efficiency and can degrade performance [49]. Vanishing gradients and overfitting also pose significant challenges for scene text detection models [1, 50], and low-resolution feature maps struggle to capture small or barely visible text in an image [51]. This research addresses these challenges by enhancing an existing scene text detection model with four key improvements. First, we refactor a widely used scene text detection method, EAST [1], into a simple yet efficient pipeline that serves as the basis for further enhancements. Second, we incorporate skip links into its feature extractor to mitigate the vanishing gradient problem. Third, we apply the Feature Pyramid Network (FPN) [2] to overcome the low resolution of deep feature maps: we up-sample feature maps at different scales and concatenate them to form a high-resolution feature map. Finally, we fine-tune the training schedule to reduce overfitting; concretely, we use fine-grained learning rates that let the model progress from easier concepts to more complex ones. Extensive experiments demonstrate the robustness and effectiveness of our method, which outperforms the original EAST implementation on the ICDAR 2015 dataset by 5.71 points, achieving an F-score of 82.12.
For more implementation details, please see our code at: https://github.com/ChunFei96/EAST_resnet50.
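The description credits skip links in the feature extractor with preventing vanishing gradients. The standard argument, from the identity-mapping analysis of residual networks, is written out below as a sketch. It assumes each block computes x_{l+1} = x_l + F(x_l, W_l) with an identity skip path and no nonlinearity after the addition; standard ResNet-50 blocks apply a ReLU after the sum, so for them the result is only approximate. E denotes the training loss, and scalars are used for readability (for vector features the 1 becomes an identity matrix).

```latex
x_{L} = x_{l} + \sum_{i=l}^{L-1} \mathcal{F}(x_{i}, W_{i})
\qquad\Longrightarrow\qquad
\frac{\partial E}{\partial x_{l}}
  = \frac{\partial E}{\partial x_{L}}
    \left( 1 + \frac{\partial}{\partial x_{l}} \sum_{i=l}^{L-1} \mathcal{F}(x_{i}, W_{i}) \right)
```

The constant 1 passes the gradient at the deeper block L straight back to block l, so the signal is unlikely to vanish even when the residual branches have small derivatives. Without the skip path, the same gradient is a product of L - l layer derivatives, which can shrink geometrically with depth.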
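The third improvement up-samples feature maps from different scales and concatenates them into one high-resolution map. The sketch below illustrates only that data flow in plain Python, assuming feature maps stored as nested lists of shape channels x height x width, ordered deepest first, with each stage exactly twice the spatial size of the previous one. Nearest-neighbour up-sampling is chosen for simplicity, the convolutions a real merging branch applies after each concatenation are omitted, and all function names are hypothetical. Note that the original FPN [2] merges its top-down and lateral paths by element-wise addition, whereas the description, like EAST's feature-merging branch, concatenates along the channel axis.

```python
from typing import List, Tuple

# Hypothetical representation: channels x height x width as nested lists,
# standing in for a framework tensor.
FeatureMap = List[List[List[float]]]


def upsample2x_nearest(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: FeatureMap = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))  # repeat each row
        out.append(rows)
    return out


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two maps of equal spatial size along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return [[list(r) for r in ch] for ch in a] + [[list(r) for r in ch] for ch in b]


def merge_top_down(maps: List[FeatureMap]) -> FeatureMap:
    """Merge maps ordered deepest (smallest) first; each next map is 2x larger.

    A real merging branch would apply convolutions after each concatenation;
    they are omitted so that only the up-sample/concatenate flow is shown.
    """
    if not maps:
        raise ValueError("need at least one feature map")
    merged = maps[0]
    for shallower in maps[1:]:
        merged = concat_channels(upsample2x_nearest(merged), shallower)
    return merged


def shape(fmap: FeatureMap) -> Tuple[int, int, int]:
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


def zeros(c: int, h: int, w: int) -> FeatureMap:
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


if __name__ == "__main__":
    merged = merge_top_down([zeros(4, 2, 2), zeros(2, 4, 4), zeros(1, 8, 8)])
    print(shape(merged))  # (7, 8, 8)
```

Concatenation keeps every channel from every scale, so the channel count grows at each stage (4 + 2 + 1 = 7 here); that growth is one reason merging branches typically follow each concatenation with a 1x1 convolution.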
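The description does not specify the "fine-grained learning rates" it uses. As one hypothetical illustration of a staged learning-rate policy, the sketch below combines a linear warm-up (small steps early in training) with step decay; the function name and every default value are assumptions, not settings taken from the project.

```python
def learning_rate(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000,
                  decay_every: int = 10000, decay_factor: float = 0.1) -> float:
    """Hypothetical warm-up + step-decay schedule; all defaults are assumptions."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if decay_every <= 0:
        raise ValueError("decay_every must be positive")
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # After warm-up, multiply by decay_factor once every decay_every steps.
    n_decays = (step - warmup_steps) // decay_every
    return base_lr * (decay_factor ** n_decays)


if __name__ == "__main__":
    for s in (0, 499, 999, 1000, 11000, 21000):
        print(s, learning_rate(s))
```

In a PyTorch training loop, a function like this could be wrapped with `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the factor its `lr_lambda` returns, for example `lambda s: learning_rate(s) / 1e-3`.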
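The reported result is an F-score, the harmonic mean of precision P and recall R: F = 2PR / (P + R). The sketch below computes it for axis-aligned boxes with greedy one-to-one matching at IoU above 0.5. This is a simplification assumed for illustration: the official ICDAR 2015 protocol scores quadrilateral annotations and ignores regions marked as do-not-care, so this code would not reproduce the reported 82.12. If the 5.71 gain is in absolute F-score points, the implied baseline for the original EAST implementation is 82.12 - 5.71 = 76.41.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_count(dets: List[Box], gts: List[Box], thresh: float = 0.5) -> int:
    """Greedy one-to-one matching: each ground truth is matched at most once,
    to the first detection (in list order) whose IoU with it exceeds thresh."""
    used = [False] * len(gts)
    matches = 0
    for d in dets:
        best_j, best_iou = -1, thresh
        for j, g in enumerate(gts):
            if not used[j]:
                score = iou(d, g)
                if score > best_iou:  # strictly above the threshold
                    best_j, best_iou = j, score
        if best_j >= 0:
            used[best_j] = True
            matches += 1
    return matches


def precision_recall_f(dets: List[Box], gts: List[Box],
                       thresh: float = 0.5) -> Tuple[float, float, float]:
    m = match_count(dets, gts, thresh)
    p = m / len(dets) if dets else 0.0  # matched detections / all detections
    r = m / len(gts) if gts else 0.0    # matched ground truths / all ground truths
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
    dets = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 0, 31, 10)]
    print(precision_recall_f(dets, gts))
```

In the example, two of the three detections match a ground-truth box, so precision is 2/3, recall is 1, and the F-score is 0.8.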