DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/79657 |
Institution: | Institut Teknologi Bandung |
Summary: | Digitizing archives is an effective way to meet current archiving needs.
One method that supports this is Optical Character Recognition (OCR).
However, most popular OCR software still requires an internet connection,
while internet access in Indonesia remains unevenly distributed.
Meanwhile, demand for OCR software that helps digitize formal documents
continues to grow, for example for taxpayer identification number
(Nomor Pokok Wajib Pajak, NPWP) documents. It is therefore necessary to develop a system
for understanding formal NPWP documents that takes into account the internet access available to
the Indonesian people. The NPWP formal document understanding system consists of 5
stages, namely image pre-processing, text recognition, re-alignment, text detection, and NER
Tagging. The models used in the understanding system are DB++ ResNet50 for text detection
with an F1 score of 93.4%, PP-OCRv3 for text recognition with a character error rate (CER)
of 7.87%, and the NER model from spaCy for NER Tagging with an accuracy of 100%. The models are
selected based on their performance, measured with metrics specific to each model, as
well as their size and inference time. The test results show that the models used in the system
perform well and were successfully integrated into the resulting OCR software. In
addition, a strategy to minimize internet usage was implemented through a backend system in
the form of a REST API built with the Flask framework. |
---|
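The summary reports a character error rate (CER) for text recognition and an F1 score for text detection. The following is a minimal sketch of how these two metrics are commonly computed, not the evaluation code used in the thesis. The function names and sample strings are illustrative assumptions, and the sketch assumes CER is defined as the edit distance divided by the reference length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical OCR output with one wrong character out of four.
    print(cer("NPWP", "NPVP"))
    # Hypothetical detection counts: 8 correct boxes, 2 spurious, 2 missed.
    print(f1_score(tp=8, fp=2, fn=2))
```

A lower CER means fewer character-level mistakes in the recognized text, while a higher F1 means the detector both finds most text regions and produces few spurious ones.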