Training deep neural network models for accurate recognition of texts in scenes
Scene text recognition has been a research challenge for many years and is undoubtedly non-trivial due to varying conditions in natural scene images. This technology, however, is highly significant in many vision-based applications beyond document analysis. In this paper, a state-of-the-art neural n...
Main Author: | Lim, Joshen Eng Keat |
---|---|
Other Authors: | Lu Shijian |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/137977 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-137977 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-137977 2020-04-21T00:32:09Z Training deep neural network models for accurate recognition of texts in scenes Lim, Joshen Eng Keat Lu Shijian School of Computer Science and Engineering shijian.lu@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Bachelor of Engineering (Computer Science) 2020-04-21T00:32:09Z 2020-04-21T00:32:09Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/137977 en SCSE19-0040 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
description |
Scene text recognition has been a research challenge for many years and remains non-trivial because conditions in natural scene images vary widely. The technology is nonetheless highly significant in many vision-based applications beyond document analysis. In this paper, a state-of-the-art neural network architecture that tackles scene text recognition through image-based sequence recognition is studied and its published results are reproduced. Experiments are primarily conducted around the tuning of the model’s hyper-parameters in an effort to build the best-performing model, and the model’s accuracy is measured against two standard benchmark datasets, namely the IIIT 5k-word and the ICDAR13 datasets. Two main refinements are also added to the original implementation, namely early stopping during model training and fine-tuning of the model. Both refinements improve the model’s performance slightly beyond the published results. Additionally, a program is written to demonstrate the performance and efficiency of the trained text recognition model both in a real-time scenario through a live camera feed and with static images. In the latter scenario, the program is also able to display the detected texts in the order in which they are meant to be read from the image. |
author2 |
Lu Shijian |
author_facet |
Lu Shijian Lim, Joshen Eng Keat |
format |
Final Year Project |
author |
Lim, Joshen Eng Keat |
title |
Training deep neural network models for accurate recognition of texts in scenes |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/137977 |
_version_ |
1681056522210115584 |