Deep learning for optical character recognition in online images

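Product images on e-commerce product listing sites contain a wealth of information about the products. The ability to convert the text in product images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. This allows data mining to be performed, laying the foundation for advanced features that add value for e-commerce platform users. In this study, a custom end-to-end OCR system optimized for speed, recall, and precision on online e-commerce images (online images) is proposed. The pipeline, consisting of a Mask R-CNN-based text detection model and an ABINet text recognition model, achieves an accuracy of 67.3%. This represents a 96% increase in performance relative to the benchmark OCR pipeline. The pipeline also achieves performance competitive with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital competition after accounting for differences in challenge level. A web platform was also developed to allow online text detection and recognition and to visualize the performance of the pipeline.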

Bibliographic Details
Main Author:	Lim, Yi Xian
Other Authors:	Gwee Bah Hwee
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	https://hdl.handle.net/10356/167016
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-167016
record_format	dspace
spelling	sg-ntu-dr.10356-167016 2023-07-07T17:22:44Z Deep learning for optical character recognition in online images Lim, Yi Xian Gwee Bah Hwee School of Electrical and Electronic Engineering Hong Xuenong ebhgwee@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Document and text processing Product images on e-commerce product listing sites contain a wealth of information about the products. The ability to convert the text in product images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. This allows data mining to be performed, laying the foundation for advanced features that add value for e-commerce platform users. In this study, a custom end-to-end OCR system optimized for speed, recall, and precision on online e-commerce images (online images) is proposed. The pipeline, consisting of a Mask R-CNN-based text detection model and an ABINet text recognition model, achieves an accuracy of 67.3%. This represents a 96% increase in performance relative to the benchmark OCR pipeline. The pipeline also achieves performance competitive with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital competition after accounting for differences in challenge level. A web platform was also developed to allow online text detection and recognition and to visualize the performance of the pipeline. Bachelor of Engineering (Electrical and Electronic Engineering) 2023-05-15T02:00:26Z 2023-05-15T02:00:26Z 2023 Final Year Project (FYP) Lim, Y. X. (2023). Deep learning for optical character recognition in online images. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167016 https://hdl.handle.net/10356/167016 en A2130-221 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Document and text processing Lim, Yi Xian Deep learning for optical character recognition in online images
description	Product images on e-commerce product listing sites contain a wealth of information about the products. The ability to convert the text in product images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. This allows data mining to be performed, laying the foundation for advanced features that add value for e-commerce platform users. In this study, a custom end-to-end OCR system optimized for speed, recall, and precision on online e-commerce images (online images) is proposed. The pipeline, consisting of a Mask R-CNN-based text detection model and an ABINet text recognition model, achieves an accuracy of 67.3%. This represents a 96% increase in performance relative to the benchmark OCR pipeline. The pipeline also achieves performance competitive with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital competition after accounting for differences in challenge level. A web platform was also developed to allow online text detection and recognition and to visualize the performance of the pipeline.
author2	Gwee Bah Hwee
author_facet	Gwee Bah Hwee Lim, Yi Xian
format	Final Year Project
author	Lim, Yi Xian
author_sort	Lim, Yi Xian
title	Deep learning for optical character recognition in online images
title_short	Deep learning for optical character recognition in online images
title_full	Deep learning for optical character recognition in online images
title_fullStr	Deep learning for optical character recognition in online images
title_full_unstemmed	Deep learning for optical character recognition in online images
title_sort	deep learning for optical character recognition in online images
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/167016
_version_	1772825671795474432
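
The description field above reports an accuracy of 67.3% and calls it a 96% increase relative to the benchmark OCR pipeline. The record does not state the benchmark figure itself. As a rough sketch, assuming the 96% is a relative increase measured on the same accuracy metric (an assumption, not something the record confirms), the implied benchmark accuracy can be back-calculated. The function name `implied_baseline` is hypothetical and used only for illustration.

```python
def implied_baseline(new_value: float, relative_increase: float) -> float:
    """Return the baseline b such that new_value == b * (1 + relative_increase)."""
    if relative_increase <= -1.0:
        raise ValueError("relative_increase must be greater than -1")
    return new_value / (1.0 + relative_increase)


# Figures taken from the description field of this record.
reported_accuracy = 67.3  # percent
reported_increase = 0.96  # 96%, ASSUMED to be a relative increase

baseline = implied_baseline(reported_accuracy, reported_increase)
print(f"Implied benchmark accuracy: {baseline:.1f}%")  # prints 34.3%
```

Under that reading, the benchmark pipeline would sit at roughly 34.3% accuracy. If the 96% were instead measured on a different metric, this estimate would not apply.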

Similar Items