DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION

Digitalization of archiving is a good solution for handling the current needs of archiving. One method that can help in the digitalization of archiving is by implementing Optical Character Recognition (OCR). However, most of the OCR software that is currently popular still requires the internet t...

Bibliographic Details
Main Author: Fakhiri Setiawan, Harith
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/79657
Institution: Institut Teknologi Bandung
id id-itb.:79657
Keywords: Digital Archiving, OCR, NPWP, text recognition, text detection, NER tagging, model integration, backend.
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Digitalizing archives is a practical way to meet current archiving needs, and Optical Character Recognition (OCR) is one method that supports it. However, most popular OCR software still requires an internet connection, while internet access in Indonesia remains unevenly distributed. At the same time, demand for OCR software that helps digitize formal documents, such as the taxpayer identification number (NPWP) document, continues to grow. It is therefore necessary to develop an NPWP document understanding system that takes the internet access conditions of Indonesian users into account. The system consists of five stages: image pre-processing, text recognition, re-alignment, text detection, and NER tagging. The models used are DB++ ResNet50 for text detection, with an F1 score of 93.4%; PP-OCRv3 for text recognition, with a character error rate (CER) of 7.87%; and a spaCy NER model for NER tagging, with an accuracy of 100%. The models were selected based on their performance on model-specific metrics, as well as their size and inference time. The test results show that the models perform well and were successfully integrated into the resulting OCR software. In addition, a strategy to minimize internet usage is implemented through a backend system in the form of a REST API built with the Flask framework.
format Final Project
author Fakhiri Setiawan, Harith
title DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION
url https://digilib.itb.ac.id/gdl/view/79657
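
The description reports text recognition quality as a character error rate (CER). The record does not state how the thesis computed it, so the following is only a minimal sketch of the common definition: Levenshtein edit distance between the recognized text and the reference text, divided by the reference length. The function names and the sample strings are hypothetical and chosen for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # `reference` and the first j characters of `hypothesis`.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical OCR output with one wrong character out of four.
    print(character_error_rate("NPWP", "NPVP"))
```

Under this definition, a CER of 7.87% would correspond to roughly 8 character edits per 100 reference characters, assuming the thesis used the same formula.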