DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION

Bibliographic Details
Main Author:	Fakhiri Setiawan, Harith
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/79657
Institution:	Institut Teknologi Bandung

id	id-itb.:79657
spelling	id-itb.:79657 2024-01-15T07:54:16Z DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION Fakhiri Setiawan, Harith Indonesia Final Project Keywords: Digital Archiving, OCR, NPWP, text recognition, text detection, NER tagging, model integration, backend. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/79657 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Digitizing archives is an effective way to meet current archiving needs, and Optical Character Recognition (OCR) is one method that supports it. However, most popular OCR software still requires an internet connection, while internet access in Indonesia remains unevenly distributed. At the same time, demand for OCR software that helps digitize formal documents, such as taxpayer identification number (NPWP) documents, continues to grow. It is therefore necessary to develop a system for understanding formal NPWP documents that takes the internet access of the Indonesian people into account. The NPWP document understanding system consists of five stages: image pre-processing, text recognition, re-alignment, text detection, and NER tagging. The system uses DB++ ResNet50 for text detection, with an F1 score of 93.4%; PP-OCRv3 for text recognition, with a character error rate (CER) of 7.87%; and the NER model from spaCy for NER tagging, with an accuracy of 100%. The models were selected based on their performance on metrics specific to each task, as well as on model size and inference time. Test results show that the models perform well and were successfully integrated into the resulting OCR software. In addition, a strategy to minimize internet usage is implemented through a backend REST API built with the Flask framework.
format	Final Project
author	Fakhiri Setiawan, Harith
title	DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION
url	https://digilib.itb.ac.id/gdl/view/79657
_version_	1822281376238403584
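
The description reports a CER of 7.87% for text recognition. As a minimal sketch of how such a figure is conventionally computed, CER is the character-level edit distance between the recognized text and the ground truth, divided by the ground-truth length. The record does not say which implementation the author used, so the function names and the sample strings below are illustrative assumptions only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the first i-1 reference
    # characters and the first j hypothesis characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character out of four.
print(cer("NPWP", "NPVP"))
```

Under this definition, a CER of 7.87% corresponds to roughly 8 character edits per 100 ground-truth characters, averaged over the test set.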

Similar Items