Training deep network models for accurate detection of texts in scenes

In the scene text detection field, recent deep neural network-based approaches have garnered significant attention due to their impressive results on various benchmark datasets, including ICDAR 2013 [52], ICDAR 2015 [34], and MSRA-TD500 [53]. However, several existing methods for scene text detectio...

Bibliographic Details
Main Author:	Lee, Chun Fei
Other Authors:	Lu Shijian
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/165885
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-165885
record_format	dspace
spelling	sg-ntu-dr.10356-165885 2023-04-21T15:37:24Z Training deep network models for accurate detection of texts in scenes Lee, Chun Fei Lu Shijian School of Computer Science and Engineering Shijian.Lu@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2023-04-17T06:33:05Z 2023-04-17T06:33:05Z 2023 Final Year Project (FYP) Lee, C. F. (2023). Training deep network models for accurate detection of texts in scenes. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165885 https://hdl.handle.net/10356/165885 en PSCSE21-0026 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Lee, Chun Fei Training deep network models for accurate detection of texts in scenes
description	In scene text detection, recent deep neural network-based approaches have attracted significant attention for their strong results on benchmark datasets including ICDAR 2013 [52], ICDAR 2015 [34], and MSRA-TD500 [53]. However, several existing methods employ complex pipelines with multiple intermediate steps, such as text-line candidate generation [29], rule-based filtering [29], and word partitioning [7, 29]. These steps increase computational cost and processing time, reducing efficiency and degrading performance [49]. Moreover, vanishing gradients and overfitting pose significant challenges for scene text detection methods [1, 50], and low-resolution feature maps struggle to capture small, barely noticeable text in an image [51]. This research aims to address these challenges and enhance existing scene text detection models through four key improvements. First, we refactor a widely used scene text detection method [1] and adapt its simple yet efficient pipeline, which serves as the basis for further enhancements. Second, we incorporate skip links into its feature extractor to counter the vanishing gradient problem. Third, we apply the Feature Pyramid Network (FPN) [2] to address the low-resolution issue: we up-sample feature maps at different scales and concatenate them to form a high-resolution feature map. Lastly, we fine-tune the training schedule to avoid overfitting; concretely, our approach uses fine-grained learning rates so that the model progresses from easier concepts to more complex ones. Through extensive experimentation, we demonstrate the robustness and effectiveness of our method, which outperforms the original EAST implementation on the ICDAR 2015 dataset by 5.71 points, achieving an F-score of 82.12. Implementation details are available in our code: https://github.com/ChunFei96/EAST_resnet50
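The description above mentions up-sampling feature maps at different scales and concatenating them into one high-resolution map. The following is only a toy sketch of that idea in plain Python, not the project's implementation: the names `upsample_nearest` and `merge_scales` are hypothetical, nearest-neighbour up-sampling is an assumption, and the learned convolutions a real network would apply after merging are omitted.

```python
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both height and width."""
    out: FeatureMap = []
    for channel in fmap:
        rows: List[List[float]] = []
        for row in channel:
            # Widen the row, then repeat the widened row `factor` times.
            wide = [value for value in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def merge_scales(maps: List[FeatureMap]) -> FeatureMap:
    """Up-sample each map to the size of the first (finest) map,
    then concatenate all of them along the channel axis."""
    target_h = len(maps[0][0])
    target_w = len(maps[0][0][0])
    merged: FeatureMap = []
    for fmap in maps:
        h, w = len(fmap[0]), len(fmap[0][0])
        factor = target_h // h
        if h * factor != target_h or w * factor != target_w:
            raise ValueError("each map must divide the finest resolution evenly")
        merged.extend(upsample_nearest(fmap, factor))
    return merged


# One fine 4x4 map with 1 channel and one coarse 2x2 map with 2 channels.
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
coarse = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
merged = merge_scales([fine, coarse])
print(len(merged), len(merged[0]), len(merged[0][0]))
```

The merged map keeps the finest spatial resolution while stacking the channels of every scale, which is what lets later layers see both fine detail and coarse context at each location.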
author2	Lu Shijian
author_facet	Lu Shijian Lee, Chun Fei
format	Final Year Project
author	Lee, Chun Fei
author_sort	Lee, Chun Fei
title	Training deep network models for accurate detection of texts in scenes
title_sort	training deep network models for accurate detection of texts in scenes
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/165885
_version_	1764208153371607040
