Deep learning for optical character recognition in online images

Bibliographic Details
Main Author: Lim, Yi Xian
Other Authors: Gwee Bah Hwee
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/167016
Institution: Nanyang Technological University
Description
Summary: Product images on e-commerce product listing sites contain a wealth of information about the products. Encoding the text in these images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. It enables data mining, laying the foundation for advanced features that add value for e-commerce platform users. In this study, a custom end-to-end OCR system optimized for speed, recall, and precision on online e-commerce images (online images) is proposed. The pipeline, consisting of a Mask R-CNN based text detection model and an ABINet text recognition model, achieves an accuracy of 67.3%. This represents a 96% increase in performance relative to the benchmark OCR pipeline. After accounting for differences in challenge level, the pipeline also performs competitively with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital competition. A web platform was also developed to allow for online text detection and recognition, and to visualize the performance of the pipeline.
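The two-stage structure described in the summary (detect text regions, then recognize the text in each region) can be illustrated with a minimal Python sketch. This is not the author's implementation: the names `run_ocr`, `crop`, `TextRegion`, `fake_detector`, and `fake_recognizer` are hypothetical, and the stand-in functions merely take the place of the Mask R-CNN based detector and the ABINet recognizer so that the control flow can be run without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values (assumption)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class TextRegion:
    box: Box   # where the text was found
    text: str  # what the recognizer read inside that box


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular patch described by `box` out of `image`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def run_ocr(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[TextRegion]:
    """Stage 1: detect text boxes. Stage 2: recognize each cropped patch."""
    return [TextRegion(box, recognize(crop(image, box))) for box in detect(image)]


# Hypothetical stand-ins for the real models, used only to exercise the flow.
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 1), (1, 1, 3, 2)]


def fake_recognizer(patch: Image) -> str:
    # Reports the patch size instead of reading real text.
    return f"{len(patch[0])}x{len(patch)}"


demo_image: Image = [[1, 2, 3], [4, 5, 6]]
regions = run_ocr(demo_image, fake_detector, fake_recognizer)
```

Passing the detector and recognizer as callables keeps the two stages independent, so either model could in principle be swapped without touching the pipeline code.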