Training deep neural network models for accurate recognition of texts in scenes

Bibliographic Details
Main Author: Lim, Joshen Eng Keat
Other Authors: Lu Shijian
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/137977
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Bachelor of Engineering (Computer Science)
Project Code: SCSE19-0040
Date Available: 2020-04-21
File Format: PDF
Contact: shijian.lu@ntu.edu.sg (Lu Shijian)
Library: NTU Library
Collection: DR-NTU
Country: Singapore
Record ID: sg-ntu-dr.10356-137977

Description
Scene text recognition has been a research challenge for many years and remains non-trivial because of the varying conditions in natural scene images. The technology is nonetheless highly significant in many vision-based applications beyond document analysis. In this project, a state-of-the-art neural network architecture that tackles scene text recognition as image-based sequence recognition is studied, and its published results are replicated. Experiments centre on tuning the model's hyper-parameters to build the best-performing model, and the model's accuracy is measured on two standard benchmark datasets: IIIT 5k-word and ICDAR13. Two main refinements were also added to the original implementation: early stopping during training and fine-tuning of the model. Both enhancements improved the model's performance slightly beyond the published results. Additionally, a program was written to demonstrate the performance and efficiency of the trained text recognition model, both in real time on a live camera feed and on static images. For static images, the program also displays the detected text in the order in which it is meant to be read.
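The abstract names early stopping as one of the two refinements but does not say how it was configured. As an illustration only, here is a minimal, framework-agnostic sketch in Python, assuming validation word accuracy is evaluated once per epoch and that higher is better. The class `EarlyStopping`, the helper `train_with_early_stopping`, the `patience`/`min_delta` parameters and the accuracy values are all hypothetical and are not taken from the project.

```python
import math
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Hypothetical early-stopping tracker (not the project's code).

    Assumes a validation metric where higher is better, such as word
    accuracy. patience and min_delta are illustrative defaults only.
    """
    patience: int = 5          # epochs to wait after the last improvement
    min_delta: float = 0.0     # smallest gain that counts as an improvement
    best: float = -math.inf    # best metric seen so far
    best_epoch: int = -1       # epoch at which `best` was recorded
    wait: int = 0              # epochs elapsed since the last improvement

    def step(self, epoch: int, metric: float) -> bool:
        """Record one epoch's validation metric; return True to stop."""
        if metric > self.best + self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = metric
            self.best_epoch = epoch
            self.wait = 0
        else:
            # No meaningful improvement this epoch.
            self.wait += 1
        return self.wait >= self.patience


def train_with_early_stopping(val_accuracies, patience=3):
    """Simulate a training loop by feeding per-epoch accuracies to the tracker.

    Returns (epoch_stopped, best_epoch, best_accuracy). In a real loop each
    value would come from evaluating the model after an epoch, and the
    checkpoint saved at best_epoch would be the one kept.
    """
    stopper = EarlyStopping(patience=patience)
    for epoch, acc in enumerate(val_accuracies):
        if stopper.step(epoch, acc):
            return epoch, stopper.best_epoch, stopper.best
    # Ran out of epochs without triggering the stop condition.
    return len(val_accuracies) - 1, stopper.best_epoch, stopper.best


# Made-up accuracies: the peak is at epoch 3, then three epochs without gain.
print(train_with_early_stopping([0.60, 0.72, 0.78, 0.80, 0.79, 0.80, 0.795, 0.81]))
# (6, 3, 0.8)
```

Training halts at epoch 6 even though a later epoch would have scored higher, which is the usual trade-off `patience` controls. Deep learning frameworks ship equivalent callbacks, such as Keras's `EarlyStopping`, which a real training script would more likely use.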