Training deep network models for accurate detection of texts in scenes
Main Author: | Lee, Chun Fei |
---|---|
Other Authors: | Lu Shijian |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/165885 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-165885 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-165885; 2023-04-21T15:37:24Z; Training deep network models for accurate detection of texts in scenes; Lee, Chun Fei; Lu Shijian; School of Computer Science and Engineering; Shijian.Lu@ntu.edu.sg; Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); 2023-04-17T06:33:05Z; 2023; Final Year Project (FYP); Lee, C. F. (2023). Training deep network models for accurate detection of texts in scenes. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165885; en; PSCSE21-0026; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Engineering::Computer science and engineering |
description |
In the scene text detection field, recent deep neural network-based approaches have attracted significant attention for their strong results on benchmark datasets such as ICDAR 2013 [52], ICDAR 2015 [34], and MSRA-TD500 [53]. However, several existing scene text detection methods rely on complex pipelines with multiple intermediate steps, such as text-line candidate generation [29], rule-based filtering [29], and word partitioning [7, 29]. These steps increase computational cost and processing time, which reduces efficiency and can degrade performance [49]. Vanishing gradients and overfitting also pose significant challenges for scene text detection methods [1, 50], and detectors that rely on low-resolution feature maps struggle to identify small, barely noticeable text in an image [51]. This research aims to address these challenges and improve existing scene text detection models.
We propose four key improvements. First, we refactor a widely used scene text detection method [1] into a simple yet efficient pipeline, which serves as the basis for further enhancements. Second, we incorporate skip connections into its feature extractor to mitigate the vanishing gradient problem. Third, we apply the Feature Pyramid Network (FPN) [2] to address the low-resolution issue: we upsample feature maps from different scales and concatenate them to form a high-resolution feature map. Lastly, we tune the training schedule to reduce overfitting; concretely, we train the model with fine-grained learning rates, enabling it to progress from easier concepts to more complex ones.
Through extensive experimentation, we demonstrate the robustness and effectiveness of our method. On the ICDAR 2015 dataset, our method outperforms the original EAST implementation by 5.71 points, achieving an F-score of 82.12. Implementation details are available in our code at https://github.com/ChunFei96/EAST_resnet50. |
author2 | Lu Shijian |
format | Final Year Project |
author | Lee, Chun Fei |
title | Training deep network models for accurate detection of texts in scenes |
publisher | Nanyang Technological University |
publishDate | 2023 |
url | https://hdl.handle.net/10356/165885 |
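The abstract credits skip connections in the feature extractor with mitigating vanishing gradients. The toy sketch below illustrates the intuition, assuming a chain of scalar layers `h -> tanh(w * h)` with a fixed, hypothetical weight `w`; it is not the project's ResNet-50 code. In a plain chain the backpropagated gradient is a product of per-layer derivatives, each at most `w`, so it shrinks geometrically with depth. With an identity skip (`h -> h + tanh(w * h)`) each factor is at least 1, so the gradient cannot vanish.

```python
import math


def chain_gradient(x: float, depth: int, w: float, residual: bool) -> float:
    """Return d(output)/d(input) for a stack of scalar tanh layers.

    Toy model (an illustrative assumption, not the project's network):
      plain layer:    h_next = tanh(w * h)
      residual layer: h_next = h + tanh(w * h)   # identity skip connection
    """
    h, grad = x, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)  # derivative of tanh(w * h) with respect to h
        if residual:
            local += 1.0  # the identity path contributes a derivative of exactly 1
            h = h + t
        else:
            h = t
        grad *= local  # chain rule: multiply the per-layer derivatives
    return grad


if __name__ == "__main__":
    plain = chain_gradient(0.5, depth=20, w=0.5, residual=False)
    skip = chain_gradient(0.5, depth=20, w=0.5, residual=True)
    print(f"plain: {plain:.3e}, with skip connections: {skip:.3e}")
```

Real networks have matrix-valued Jacobians rather than scalar derivatives, so this only conveys why an identity path keeps gradients from collapsing.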
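The abstract's third improvement upsamples feature maps from different scales and concatenates them into one high-resolution map. The pure-Python sketch below shows only that data flow, under stated assumptions: each feature map is a hypothetical list of 2-D channels, pyramid levels are ordered from finest to coarsest, each coarse size divides the finest size evenly, and nearest-neighbour upsampling is used. It omits the learned convolutions a real merging branch applies; the concatenation follows the abstract's wording, whereas the original FPN [2] merges levels by element-wise addition.

```python
from typing import List

Channel = List[List[float]]  # one 2-D channel: a list of rows
FeatureMap = List[Channel]   # a feature map: a list of channels


def upsample_nearest(channel: Channel, factor: int) -> Channel:
    """Nearest-neighbour upsampling: each value is repeated factor x factor times."""
    height, width = len(channel), len(channel[0])
    return [
        [channel[i // factor][j // factor] for j in range(width * factor)]
        for i in range(height * factor)
    ]


def merge_pyramid(levels: List[FeatureMap]) -> FeatureMap:
    """Upsample every level to the finest resolution and concatenate the channels.

    Assumes levels[0] is the highest-resolution map and that every other
    level's height and width divide those of levels[0] by the same factor.
    """
    target_h, target_w = len(levels[0][0]), len(levels[0][0][0])
    merged: FeatureMap = []
    for fmap in levels:
        h, w = len(fmap[0]), len(fmap[0][0])
        factor = target_h // h
        if factor * h != target_h or factor * w != target_w:
            raise ValueError("level size must divide the finest size evenly")
        merged.extend(upsample_nearest(ch, factor) for ch in fmap)
    return merged


if __name__ == "__main__":
    fine = [[[1, 2], [3, 4]]]  # 1 channel, 2x2
    coarse = [[[5]], [[6]]]    # 2 channels, 1x1
    for ch in merge_pyramid([fine, coarse]):
        print(ch)
```

The merged map keeps the finest spatial resolution and stacks channels from every level, which is what lets coarse, semantically strong features inform detection of small text.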
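The record does not specify the "fine-grained learning rates" used to reduce overfitting. One common schedule that starts gently and then lowers the learning rate in stages is linear warm-up followed by step decay; the sketch below implements that as a per-epoch function. The schedule shape and every value (`base_lr`, `warmup_epochs`, `milestones`, `gamma`) are hypothetical placeholders, not the project's settings.

```python
from typing import Sequence


def lr_at(
    epoch: int,
    base_lr: float = 1e-3,              # hypothetical peak learning rate
    warmup_epochs: int = 5,             # hypothetical warm-up length
    milestones: Sequence[int] = (60, 90),  # hypothetical decay epochs
    gamma: float = 0.1,                 # hypothetical decay factor
) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then step decay."""
    if epoch < warmup_epochs:
        # Ramp linearly up to base_lr over the first warmup_epochs epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # drop the rate by gamma at each milestone passed
    return lr


if __name__ == "__main__":
    for e in (0, 2, 4, 30, 60, 95):
        print(e, lr_at(e))
```

A per-epoch function like this can be plugged into most training loops; the small early rates keep initial updates gentle, and the later drops let the model settle instead of fitting noise.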
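The headline result is an F-score on ICDAR 2015, the harmonic mean of precision and recall, reported here on a 0 to 100 scale. The sketch below computes it and makes the reported comparison explicit: if the 5.71 gain is an absolute difference, the original EAST implementation scored 76.41 in this comparison. The function names are illustrative, and the precision and recall values in the example are made up, not figures from the project.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_baseline(new_score: float, gain: float) -> float:
    """Score of the compared method, assuming the gain is an absolute difference."""
    return round(new_score - gain, 2)


if __name__ == "__main__":
    print(f_score(90.0, 75.0))  # illustrative values only
    print(implied_baseline(82.12, 5.71))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-score by trading recall for precision (or vice versa).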