Deep learning for optical character recognition in online images
Product images on e-commerce listing sites contain a wealth of information about the products. Encoding the text in these images into machine-readable form through Optical Character Recognition (OCR) is crucial for machines to develop a better understanding of the products. T...
Main Author: | Lim, Yi Xian |
---|---|
Other Authors: | Gwee Bah Hwee |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
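Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing |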
Online Access: | https://hdl.handle.net/10356/167016 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-167016 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-167016; 2023-07-07T17:22:44Z; Deep learning for optical character recognition in online images; Lim, Yi Xian; Gwee Bah Hwee; School of Electrical and Electronic Engineering; Hong Xuenong; ebhgwee@ntu.edu.sg; Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Bachelor of Engineering (Electrical and Electronic Engineering); 2023-05-15T02:00:26Z; 2023; Final Year Project (FYP); Lim, Y. X. (2023). Deep learning for optical character recognition in online images. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167016; en; A2130-221; application/pdf; Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
description |
Product images on e-commerce listing sites contain a wealth of information about
the products. Encoding the text in these images into machine-readable
form through Optical Character Recognition (OCR) is crucial for machines to develop a better
understanding of the products. It enables data mining, laying the
foundation for advanced features that add value for e-commerce platform
users.
In this study, a custom end-to-end OCR system optimized for
speed, recall, and precision on online e-commerce images (online images) is proposed.
The pipeline, consisting of a Mask R-CNN-based text detection model and an ABINet
text recognition model, achieves an accuracy of 67.3%. This represents a
96% increase in performance relative to the benchmark OCR pipeline. The pipeline also
achieves performance competitive with state-of-the-art (SOTA) algorithms in the ICDAR 2015 Born Digital
competition after accounting for differences in challenge level. A web platform was also
developed to allow for online text detection and recognition, and to visualize
the performance of the pipeline. |
author2 |
Gwee Bah Hwee |
author_facet |
Gwee Bah Hwee Lim, Yi Xian |
format |
Final Year Project |
author |
Lim, Yi Xian |
title |
Deep learning for optical character recognition in online images |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/167016 |
_version_ |
1772825671795474432 |