Training deep neural network models for accurate recognition of texts in scenes

Scene text recognition has been a research challenge for many years and is undoubtedly non-trivial due to varying conditions in natural scene images. This technology, however, is highly significant in many vision-based applications beyond document analysis. In this paper, a state-of-the-art neural n...

Bibliographic Details
Main Author:	Lim, Joshen Eng Keat
Other Authors:	Lu Shijian
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/137977
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-137977
record_format	dspace
spelling	sg-ntu-dr.10356-137977; 2020-04-21T00:32:09Z; School of Computer Science and Engineering; shijian.lu@ntu.edu.sg; Bachelor of Engineering (Computer Science); Final Year Project (FYP); en; SCSE19-0040; application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description	Scene text recognition has been a research challenge for many years and is non-trivial because of the varying conditions in natural scene images. The technology is nonetheless highly significant in many vision-based applications beyond document analysis. In this paper, a state-of-the-art neural network architecture that tackles scene text recognition through image-based sequence recognition is studied and its published results are reproduced. Experiments focus primarily on tuning the model's hyper-parameters in an effort to build the best-performing model, and the model's accuracy is measured against two standard benchmark datasets, namely the IIIT 5k-word and the ICDAR13 datasets. Two main refinements were also added to the original implementation, namely early stopping during model training and fine-tuning of the model. Both refinements improved the model's performance slightly beyond the published results. Additionally, a program was written to demonstrate the performance and efficiency of the trained text recognition model both in a real-time scenario through a live camera feed and with static images. In the latter scenario, the program is also able to display the detected texts in the order in which they are meant to be read from the image.
author2	Lu Shijian
author_facet	Lu Shijian Lim, Joshen Eng Keat
format	Final Year Project
author	Lim, Joshen Eng Keat
author_sort	Lim, Joshen Eng Keat
title	Training deep neural network models for accurate recognition of texts in scenes
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/137977
_version_	1681056522210115584

Similar Items