Deep learning-based text recognition of agricultural regulatory document
In this study, an OCR system based on deep learning techniques was deployed to digitize scanned agricultural regulatory documents comprising certificates and labels. Recognition of the certificates and labels is challenging as they are scanned images of hard-copy forms and the layout and size...
Main Authors: | FWA, Hua Leong; CHAN, Farn Haur |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
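Subjects: | Deep learning; Text detection; Optical character recognition; Regulatory document; Artificial Intelligence and Robotics; Databases and Information Systems; Numerical Analysis and Scientific Computing |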
Online Access: | https://ink.library.smu.edu.sg/sis_research/7334 https://ink.library.smu.edu.sg/context/sis_research/article/8337/viewcontent/OCR_agricultural_reg_doc_ICCCI.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8337 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-8337; 2023-08-07T00:35:59Z; Deep learning-based text recognition of agricultural regulatory document; FWA, Hua Leong; CHAN, Farn Haur; 2022-09-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/7334; info:doi/10.1007/978-3-031-16210-7_18; https://ink.library.smu.edu.sg/context/sis_research/article/8337/viewcontent/OCR_agricultural_reg_doc_ICCCI.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Deep learning; Text detection; Optical character recognition; Regulatory document; Artificial Intelligence and Robotics; Databases and Information Systems; Numerical Analysis and Scientific Computing |
description |
In this study, an OCR system based on deep learning techniques was deployed to digitize scanned agricultural regulatory documents comprising certificates and labels. Recognition of the certificates and labels is challenging as they are scanned images of hard-copy forms, and the layout and size of the text as well as the languages vary between countries (due to diverse regulatory requirements). We evaluated and compared various state-of-the-art deep learning-based text detection and recognition models as well as a packaged OCR library, Tesseract. We then adopted a two-stage approach comprising text detection using Character Region Awareness For Text (CRAFT) followed by recognition using the OCR branch of a multi-lingual text recognition algorithm, E2E-MLT. A sliding-window text matcher is used to enhance the extraction of the required information such as trade names, active ingredients and crops. Initial evaluation revealed that the system performs well, with a high accuracy of 91.9% for the recognition of trade names in certificates and labels, and the system is currently deployed for use in the Philippines, one of our collaborator’s sites. |
format |
text |
author |
FWA, Hua Leong; CHAN, Farn Haur |
title |
Deep learning-based text recognition of agricultural regulatory document |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/7334 https://ink.library.smu.edu.sg/context/sis_research/article/8337/viewcontent/OCR_agricultural_reg_doc_ICCCI.pdf |
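
The description mentions a sliding-window text matcher used to pull fields such as trade names out of the recognized text. This record does not say how that matcher works, so the following is only a minimal sketch of the general idea, assuming the OCR output is a list of word tokens and that similarity is measured with Python's standard `difflib`. The function name `best_window_match`, the `min_ratio` threshold, and the sample trade name are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def best_window_match(ocr_tokens, target, min_ratio=0.8):
    """Find the window of OCR tokens most similar to a known target phrase.

    Returns (score, start, end) for the best window, or None if no window
    reaches min_ratio. All names and the threshold are illustrative only.
    """
    target_norm = " ".join(target.lower().split())
    if not target_norm:
        return None
    width = len(target_norm.split())
    tokens = [t.lower() for t in ocr_tokens]

    best = None
    # Slide a window with the same token count as the target across the text.
    for start in range(0, max(len(tokens) - width, 0) + 1):
        window = " ".join(tokens[start:start + width])
        score = SequenceMatcher(None, window, target_norm).ratio()
        if best is None or score > best[0]:
            best = (score, start, start + width)

    if best is not None and best[0] >= min_ratio:
        return best
    return None


# Hypothetical OCR output with one misread character ("0" instead of "o").
tokens = ["Trade", "name", "Agr0zap", "Gold", "200", "SL"]
match = best_window_match(tokens, "Agrozap Gold")
if match is not None:
    score, start, end = match
    print(tokens[start:end], round(score, 3))
```

Comparing whole windows rather than single tokens lets a multi-word name survive a character-level OCR error in one of its words, which may be the kind of robustness the description alludes to.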