DEVELOPMENT OF FORMAL TAXPAYER IDENTIFICATION NUMBER DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE BASED IMPLEMENTATION
Main Author: | Fakhiri Setiawan, Harith |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/79657 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:79657 |
---|---|
keywords |
Digital Archiving, OCR, NPWP, text recognition, text detection, NER tagging, model integration, backend |
timestamp |
2024-01-15T07:54:16Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
collection |
Digital ITB |
description |
Digitalizing archives is an effective way to meet current archiving needs. One method that can
support archive digitalization is Optical Character Recognition (OCR). However, most popular
OCR software still requires an internet connection, and internet access in Indonesia is still
unevenly distributed. At the same time, the need for OCR software that helps digitalize formal
document archives continues to grow, for example for digitizing formal taxpayer identification
number (NPWP) documents. Therefore, a system for understanding formal NPWP documents is needed
that takes the uneven internet access of Indonesian users into account. The NPWP formal
document understanding system consists of five stages: image pre-processing, text recognition,
re-alignment, text detection, and named entity recognition (NER) tagging. The system uses
DB++ ResNet50 for text detection, with an $F_{1/3}$ score of 93.4%; PP-OCRv3 for text
recognition, with a character error rate (CER) of 7.87%; and a spaCy NER model for NER
tagging, with an accuracy of 100%. Each model was selected based on its performance on a
task-specific metric, as well as its size and inference time. The test results show that the
selected models perform well and were successfully integrated into the resulting OCR software.
In addition, a strategy to minimize internet usage was implemented through a backend system
in the form of a REST API built with the Flask framework. |
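The text-detection result above is reported as an F-score. As a hedged illustration of how such a score is derived from precision P and recall R, the sketch below implements the general F-beta formula, F_beta = (1 + beta^2) P R / (beta^2 P + R): beta = 1 gives the usual F1, and beta < 1 weights precision more heavily. The precision and recall values in the usage lines are made-up placeholders, not results from this project.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta < 1 favours precision; beta > 1 favours recall; beta = 1 is F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


if __name__ == "__main__":
    # Hypothetical precision/recall, chosen only to show the weighting effect.
    p, r = 0.95, 0.85
    print(f"F1   = {f_beta(p, r):.4f}")
    print(f"F1/3 = {f_beta(p, r, beta=1/3):.4f}")
```

With these placeholder values the beta = 1/3 score lands closer to the precision than F1 does, which is the intended effect of a small beta.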
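CER is commonly computed as the character-level Levenshtein (edit) distance between the recognized text and the ground truth, divided by the number of ground-truth characters. A minimal sketch follows, assuming corpus-level aggregation (total edits over total reference characters); the abstract does not state which aggregation was used, and the NPWP-style strings in the usage line are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hc in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete rc
                curr[j - 1] + 1,           # insert hc
                prev[j - 1] + (rc != hc),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Character error rate: total edit distance / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / chars if chars else 0.0


if __name__ == "__main__":
    # One '0' misread as the letter 'O' in a 20-character string.
    print(cer(["12.345.678.9-012.345"], ["12.345.678.9-O12.345"]))  # 0.05
```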
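The abstract names a spaCy NER model for the tagging stage but not its label set. The sketch below does not use spaCy; it only illustrates the generic post-processing step of grouping per-token BIO tags (`B-` begins an entity, `I-` continues it, `O` is outside) into field values. The labels `NPWP` and `NAME` and the sample tokens are hypothetical.

```python
def bio_to_fields(tokens: list[str], tags: list[str]) -> dict[str, list[str]]:
    """Group BIO-tagged tokens into entity strings, keyed by label.

    An 'I-X' tag that does not continue an open entity of type X
    is treated as the start of a new entity.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    fields: dict[str, list[str]] = {}
    label, span = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            span.append(token)  # continue the open entity
            continue
        if label is not None:
            # Close the previous entity before handling this token.
            fields.setdefault(label, []).append(" ".join(span))
        if prefix in ("B", "I"):
            label, span = tag_label, [token]
        else:
            label, span = None, []
    if label is not None:
        fields.setdefault(label, []).append(" ".join(span))
    return fields


if __name__ == "__main__":
    tokens = ["NPWP", ":", "12.345.678.9-012.345", "Name", ":", "BUDI", "SANTOSO"]
    tags = ["O", "O", "B-NPWP", "O", "O", "B-NAME", "I-NAME"]
    print(bio_to_fields(tokens, tags))
```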
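The abstract says each model was chosen by a task-specific metric together with size and inference time, but not how these criteria were combined. One hypothetical rule, offered only as an assumption: discard candidates that exceed a size or latency budget, then keep the best metric (higher for an F-score, lower for CER). The `Candidate` type, the budgets, and the candidate numbers in the usage lines are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float       # task metric, e.g. F-score (higher) or CER (lower)
    size_mb: float     # model file size in megabytes
    latency_ms: float  # measured inference time per input, in milliseconds


def select_model(candidates: list[Candidate], max_size_mb: float,
                 max_latency_ms: float,
                 higher_is_better: bool = True) -> Optional[Candidate]:
    """Return the best-scoring candidate within both budgets, or None.

    Ties on score are broken by smaller size, then lower latency.
    """
    feasible = [c for c in candidates
                if c.size_mb <= max_size_mb and c.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    sign = -1 if higher_is_better else 1  # min() on -score picks the max score
    return min(feasible, key=lambda c: (sign * c.score, c.size_mb, c.latency_ms))


if __name__ == "__main__":
    # Placeholder candidates, not measurements from this project.
    detectors = [Candidate("model_a", 0.95, 200, 900),
                 Candidate("model_b", 0.93, 50, 300)]
    print(select_model(detectors, max_size_mb=100, max_latency_ms=500))
```

Hard budgets suit an on-device target, where a model that does not fit in memory or responds too slowly is unusable regardless of accuracy.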
author |
Fakhiri Setiawan, Harith |