DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION


Bibliographic Details
Main Author: Fakhiri Setiawan, Harith
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/79657
Institution: Institut Teknologi Bandung
Description
Summary: Digitizing archives is an effective way to meet current archiving needs, and Optical Character Recognition (OCR) is one method that supports it. However, most popular OCR software still requires an internet connection, while internet access in Indonesia remains unevenly distributed. At the same time, demand for OCR software that helps digitize formal documents, such as the formal taxpayer identification number (NPWP) document, continues to grow. It is therefore necessary to develop a system for understanding formal NPWP documents that takes the uneven internet access of the Indonesian population into account. The NPWP formal document understanding system consists of five stages: image pre-processing, text recognition, re-alignment, text detection, and NER tagging. The system uses DB++ ResNet50 for text detection, with an F1 score of 93.4%; PP-OCRv3 for text recognition, with a character error rate (CER) of 7.87%; and the NER model from spaCy for NER tagging, with an accuracy of 100%. The models were selected based on their performance on metrics specific to each task, as well as their size and inference time. The test results show that the selected models perform well and were successfully integrated into the resulting OCR software. In addition, a strategy to minimize internet usage is implemented through a backend system in the form of a REST API built with the Flask framework.
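
For readers unfamiliar with the CER figure quoted in the summary, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The record does not describe the thesis's own evaluation code, so the function names `levenshtein` and `cer` and the sample strings are illustrative assumptions, not the author's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output, for illustration only.
    truth = "NPWP 12.345"
    ocr_output = "NPWP 12.845"
    print(f"CER = {cer(truth, ocr_output):.2%}")
```

Under this definition, a CER of 7.87% would correspond to roughly eight character-level errors per hundred ground-truth characters.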