Deep learning-based text recognition of agricultural regulatory document

In this study, an OCR system based on deep learning techniques was deployed to digitize scanned agricultural regulatory documents comprising certificates and labels. Recognition of the certificates and labels is challenging as they are scanned images of the hard copy form and the layout and size...

Bibliographic Details
Main Authors: FWA, Hua Leong; CHAN, Farn Haur
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
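Subjects: Deep learning; Text detection; Optical character recognition; Regulatory document; Artificial Intelligence and Robotics; Databases and Information Systems; Numerical Analysis and Scientific Computing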
Online Access: https://ink.library.smu.edu.sg/sis_research/7334
https://ink.library.smu.edu.sg/context/sis_research/article/8337/viewcontent/OCR_agricultural_reg_doc_ICCCI.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8337
record_format dspace
spelling sg-smu-ink.sis_research-8337 2023-08-07T00:35:59Z
2022-09-01T07:00:00Z
text
application/pdf
info:doi/10.1007/978-3-031-16210-7_18
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Deep learning
Text detection
Optical character recognition
Regulatory document
Artificial Intelligence and Robotics
Databases and Information Systems
Numerical Analysis and Scientific Computing
description In this study, an OCR system based on deep learning techniques was deployed to digitize scanned agricultural regulatory documents comprising certificates and labels. Recognition of the certificates and labels is challenging as they are scanned images of the hard copy form, and the layout and size of the text as well as the languages vary between countries (due to diverse regulatory requirements). We evaluated and compared various state-of-the-art deep learning-based text detection and recognition models as well as a packaged OCR library, Tesseract. We then adopted a two-stage approach comprising text detection using Character Region Awareness For Text (CRAFT) followed by recognition using the OCR branch of a multi-lingual text recognition algorithm, E2E-MLT. A sliding-window text matcher is used to enhance the extraction of the required information such as trade names, active ingredients and crops. Initial evaluation revealed that the system achieves an accuracy of 91.9% for the recognition of trade names in certificates and labels. The system is currently deployed for use in the Philippines, one of our collaborator’s sites.
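The abstract mentions a sliding-window text matcher but this record does not say how it is implemented. The following is only a minimal sketch of the general idea, assuming that recognized text arrives as a list of tokens and that candidate trade names come from a known list. The function name `best_window_match`, the 0.8 threshold, the sample tokens, and the use of Python's standard-library `difflib.SequenceMatcher` for fuzzy scoring are all illustrative assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def best_window_match(ocr_tokens, target, min_ratio=0.8):
    """Slide a window as wide as the target phrase over the OCR tokens.

    Returns (start_index, matched_text, ratio) for the best-scoring window
    whose similarity is at least min_ratio, or None if no window qualifies.
    The threshold is an arbitrary illustrative value.
    """
    target_tokens = target.lower().split()
    width = len(target_tokens)
    target_text = " ".join(target_tokens)
    best = None
    # Compare every run of `width` consecutive tokens against the target.
    for start in range(len(ocr_tokens) - width + 1):
        window = ocr_tokens[start:start + width]
        window_text = " ".join(window).lower()
        ratio = SequenceMatcher(None, window_text, target_text).ratio()
        if ratio >= min_ratio and (best is None or ratio > best[2]):
            best = (start, " ".join(window), ratio)
    return best


# Hypothetical OCR output in which "i" was misread as "1".
tokens = "Trade Name : Agr1max 250 EC Active Ingredient".split()
match = best_window_match(tokens, "Agrimax 250 EC")
no_match = best_window_match(tokens, "Completely Different Name")
```

Here `match` points at the window starting at token index 3 despite the misread character, while `no_match` is `None`. Fuzzy scoring over fixed-width windows is one simple way to tolerate isolated OCR errors; a production system might use a different similarity measure or variable window widths.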
format text
author FWA, Hua Leong
CHAN, Farn Haur
title Deep learning-based text recognition of agricultural regulatory document
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7334
https://ink.library.smu.edu.sg/context/sis_research/article/8337/viewcontent/OCR_agricultural_reg_doc_ICCCI.pdf
_version_ 1773551433092694016