Training deep network models for accurate recognition of texts in scenes
Scene text recognition has long been an active research area in computer vision, and to some extent in natural language processing, because of its wide range of applications, including automatic document scanning and license plate recognition. Recently, deep learning-based approaches have attracted significant...
Main Author: | Sui, Lulu |
---|---|
Other Authors: | Lu Shijian |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/166170 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-166170 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-166170; 2023-04-28T15:41:32Z; Training deep network models for accurate recognition of texts in scenes; Sui, Lulu; Lu Shijian; School of Computer Science and Engineering; Shijian.Lu@ntu.edu.sg; Engineering::Computer science and engineering; Bachelor of Business Bachelor of Engineering (Computer Science); 2023-04-24T02:20:21Z; 2023; Final Year Project (FYP); Sui, L. (2023). Training deep network models for accurate recognition of texts in scenes. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/166170; en; SCSE22-0070; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
description |
Scene text recognition has long been an active research area in computer vision, and to some extent in natural language processing, because of its wide range of applications, including automatic document scanning and license plate recognition. Recently, deep learning-based approaches have attracted significant attention for their impressive results on benchmark datasets such as IIIT5K, ICDAR 2013 and ICDAR 2015. These results are achieved with techniques such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and attention mechanisms. However, many challenges remain, such as the high variability of scene text (orientation, lighting, low resolution) and the trade-off between accuracy and speed. Moreover, state-of-the-art (SOTA) models usually demand substantial computational resources and hardware to deploy efficiently. This research therefore aims to address these challenges by reproducing existing scene text recognition models and enhancing them through fine-tuning of current designs. The work has two key goals. The primary goal is to reproduce the ABINet and SAR models. ABINet incorporates linguistic knowledge into the model to achieve SOTA performance, while SAR is relatively lightweight and easy to train and deploy, with good performance compared to many other models. Reproducing both models builds a solid foundation for further enhancements. The second goal is to fine-tune the models through hyperparameter tuning by systematically testing various learning rates. This approach helps to find the hyperparameter setting that gives the best performance, the fastest training, and the least overfitting.
We used six benchmark test datasets to evaluate the models’ performance and the outcome of hyperparameter tuning. After extensive tuning experiments, we set the learning rate for the SAR model at 0.0001. In our experiments, ABINet achieved 80.1% accuracy and SAR achieved 84.6% accuracy. |
author2 |
Lu Shijian |
author_facet |
Lu Shijian Sui, Lulu |
format |
Final Year Project |
author |
Sui, Lulu |
title |
Training deep network models for accurate recognition of texts in scenes |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/166170 |
_version_ |
1765213833319677952 |
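
The description in this record mentions tuning by systematically testing various learning rates and reporting recognition accuracy. A minimal sketch of that kind of sweep is given below. It is not the project's code: the function names `word_accuracy`, `sweep_learning_rates` and `toy_validation_score` are hypothetical, exact string match is an assumed accuracy definition, and the toy scoring function is a synthetic stand-in for a real training run rather than a measurement.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple


def word_accuracy(predictions: List[str], labels: List[str]) -> float:
    """Fraction of samples whose predicted string equals the label exactly.

    Assumption: exact, case-sensitive match. The record does not state
    which matching rule the project used.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


def sweep_learning_rates(
    candidates: Iterable[float],
    train_and_validate: Callable[[float], float],
) -> Tuple[float, Dict[float, float]]:
    """Evaluate each candidate learning rate once and return the best.

    `train_and_validate` is a caller-supplied (hypothetical) function that
    trains a model with the given learning rate and returns a validation
    score where higher is better.
    """
    scores = {lr: train_and_validate(lr) for lr in candidates}
    if not scores:
        raise ValueError("at least one candidate learning rate is required")
    best_lr = max(scores, key=lambda lr: scores[lr])
    return best_lr, scores


def toy_validation_score(lr: float) -> float:
    """Synthetic score that peaks at lr = 1e-4; NOT a real result."""
    return 1.0 - abs(math.log10(lr) + 4.0) / 10.0


if __name__ == "__main__":
    best, all_scores = sweep_learning_rates(
        [1e-3, 1e-4, 1e-5], toy_validation_score
    )
    print("best learning rate:", best)
    print("accuracy example:", word_accuracy(["abc", "xyz"], ["abc", "xyq"]))
```

In a real sweep, `toy_validation_score` would be replaced by a function that trains the recognizer for a fixed budget and returns its validation accuracy, so that every candidate is compared under the same conditions.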