Deep learning for optical character recognition in online images

Product images on e-commerce listing sites contain a wealth of information about the products. Encoding the text in these images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products.

Bibliographic Details
Main Author: Lim, Yi Xian
Other Authors: Gwee Bah Hwee
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
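Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing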
Online Access: https://hdl.handle.net/10356/167016
Institution: Nanyang Technological University
id sg-ntu-dr.10356-167016
record_format dspace
spelling sg-ntu-dr.10356-167016
2023-07-07T17:22:44Z
Deep learning for optical character recognition in online images
Lim, Yi Xian
Gwee Bah Hwee
School of Electrical and Electronic Engineering
Hong Xuenong
ebhgwee@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Bachelor of Engineering (Electrical and Electronic Engineering)
2023-05-15T02:00:26Z
2023
Final Year Project (FYP)
Lim, Y. X. (2023). Deep learning for optical character recognition in online images. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167016
en
A2130-221
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Product images on e-commerce listing sites contain a wealth of information about the products. Encoding the text in these images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. This enables data mining, laying the foundation for advanced features that add value for users of the e-commerce platform. In this study, a custom end-to-end OCR system optimized for speed, recall, and precision on online e-commerce images (online images) is proposed. The pipeline, consisting of a Mask R-CNN-based text detection model and an ABINet text recognition model, achieves an accuracy of 67.3%, a 96% increase relative to the benchmark OCR pipeline. The pipeline also achieves performance competitive with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital competition after accounting for differences in challenge level. A web platform was also developed to allow online text detection and recognition and to visualize the performance of the pipeline.
author2 Gwee Bah Hwee
author_facet Gwee Bah Hwee
Lim, Yi Xian
format Final Year Project
author Lim, Yi Xian
author_sort Lim, Yi Xian
title Deep learning for optical character recognition in online images
title_sort deep learning for optical character recognition in online images
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/167016
_version_ 1772825671795474432
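
Note on the reported figures: the description field gives an accuracy of 67.3% and a 96% increase relative to the benchmark OCR pipeline, but this record does not state the benchmark's own score. The sketch below shows the arithmetic that would recover it, assuming the 96% is a relative increase measured on the same accuracy metric. That reading is an assumption, and the function name `implied_baseline` is hypothetical and not taken from the project.

```python
def implied_baseline(new_score: float, relative_increase: float) -> float:
    """Return the baseline b such that new_score == b * (1 + relative_increase).

    Assumes the increase is relative (not percentage points) and that both
    scores use the same metric.
    """
    if relative_increase <= -1.0:
        raise ValueError("relative_increase must be greater than -1")
    return new_score / (1.0 + relative_increase)


# Figures quoted in the description field of this record.
pipeline_accuracy = 0.673  # 67.3% accuracy
relative_gain = 0.96       # 96% increase relative to the benchmark

# Implied benchmark accuracy under the stated assumption.
baseline = implied_baseline(pipeline_accuracy, relative_gain)
print(f"implied benchmark accuracy: {baseline:.3f}")
```

Under that assumption the benchmark pipeline would have scored roughly 34%. If the 96% instead refers to a different metric or to percentage points, this calculation does not apply.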