DEVELOPMENT OF OCR MODULE FOR SCANNING KTP AND DEVELOPMENT OF NER MODULE FOR RECOGNIZING KTP, SIM, AND KK ENTITIES
Main Author: | Bernadetha Marbun, Sharon |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/78178 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:78178 |
---|---|
spelling |
Record id-itb.:78178, timestamp 2023-09-18T10:40:32Z. Keywords: document digitalization, OCR (Optical Character Recognition), text detection, text recognition, NER (Named Entity Recognition). |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
collection |
Digital ITB |
description |
Digitizing official documents such as identity cards (KTP), driver's licenses (SIM), and family
cards (KK) is crucial for streamlining data input, information retrieval, and data analysis.
However, internet access in Indonesia is uneven, which poses a challenge for solutions that
depend on a connection. The objective of this capstone project is therefore to develop a mobile
OCR application that works offline to digitize KTP, SIM, and KK documents.
The work is divided among the capstone team members, and this final project focuses on
developing the OCR modules, namely the text detection and text recognition modules, for
reading KTP documents. In addition, an NER module is developed to transform the OCR output
into structured data through entity recognition.
For KTP, the text detection and text recognition modules are each developed by selecting the
best pre-trained model through benchmarking and then training it on KTP datasets. The trained
models are then evaluated and converted to a mobile format for deployment. The model selected
for text detection is DB (Differentiable Binarization) with a MobileNetV3 backbone. After
training, it achieves a precision of 98.73%, a recall of 97.5%, and an hmean of 98.11%, with a
model size of 2.26 MB and an inference time of 2.0129 seconds. The model selected for text
recognition is SVTR with an SVTR-Tiny backbone, which offers good accuracy and efficiency:
99.37% accuracy, a model size of 8.85 MB, and an inference time of 1.4201 seconds.
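The reported hmean can be checked against the reported precision and recall. This sketch assumes that "hmean" denotes the harmonic mean of precision and recall (the F1 score), the usual convention in text detection benchmarks; the function name `hmean` is illustrative and not taken from the project.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1), on the same scale as the inputs."""
    # Assumption: "hmean" in the abstract is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall reported for the trained DB text detection model, in percent.
    print(round(hmean(98.73, 97.5), 2))  # 98.11
```

Because the harmonic mean never exceeds the arithmetic mean, a high hmean requires both precision and recall to be high, which is why it is preferred over a simple average for detection quality.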
The NER module, which recognizes entities in the OCR output of KTP, SIM, and KK documents, is
developed separately using lexicon-based and rule-based approaches, with lexicons and rules
tailored to the characteristics of each document. Evaluation shows that the NER module
recognizes entities in the OCR output of all three documents with 100% accuracy for every
entity. |
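A combined lexicon-based and rule-based recognizer of the kind described in the abstract can be sketched as follows. This is not the project's implementation: the entity names, the small lexicons, the regular expressions, the fuzzy-matching cutoff, and the sample input are all hypothetical choices for illustration, using KTP-style OCR lines. Rules (regular expressions) capture fields with a fixed shape, such as a 16-digit NIK or a DD-MM-YYYY date, while lexicons cover closed vocabularies such as religion and gender, with Python's `difflib.get_close_matches` tolerating small OCR errors.

```python
import difflib
import re
from typing import Dict, List, Optional

# Hypothetical lexicons for closed-vocabulary KTP fields (illustrative, not exhaustive).
LEXICONS: Dict[str, List[str]] = {
    "agama": ["ISLAM", "KRISTEN", "KATOLIK", "HINDU", "BUDDHA", "KONGHUCU"],
    "jenis_kelamin": ["LAKI-LAKI", "PEREMPUAN"],
}

# Hypothetical rules for fixed-shape fields: a 16-digit NIK and a DD-MM-YYYY date.
RULES: Dict[str, re.Pattern] = {
    "nik": re.compile(r"\b(\d{16})\b"),
    "tanggal_lahir": re.compile(r"\b(\d{2}-\d{2}-\d{4})\b"),
}

# Hypothetical label rule: the value after "Nama" (optionally followed by ":") on one line.
NAME_RULE = re.compile(r"^\s*nama\s*:?\s*(.+?)\s*$", re.IGNORECASE)


def match_lexicon(tokens: List[str], lexicon: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the lexicon entry closest to the first token that is similar enough.

    difflib's similarity ratio lets a token with a small OCR error
    (e.g. PEREMPUAM) still map to its lexicon entry (PEREMPUAN).
    """
    for token in tokens:
        close = difflib.get_close_matches(token.upper(), lexicon, n=1, cutoff=cutoff)
        if close:
            return close[0]
    return None


def extract_entities(ocr_lines: List[str]) -> Dict[str, str]:
    """Tag entities in OCR text lines; the first match for each entity wins."""
    entities: Dict[str, str] = {}
    for line in ocr_lines:
        # Rule-based: fixed-shape fields can appear anywhere in the line.
        for name, pattern in RULES.items():
            found = pattern.search(line)
            if found and name not in entities:
                entities[name] = found.group(1)

        # Rule-based: a labelled line such as "Nama : BUDI SANTOSO".
        found = NAME_RULE.match(line)
        if found and "nama" not in entities:
            entities["nama"] = found.group(1)

        # Lexicon-based: compare each token against the closed vocabularies.
        tokens = [t for t in re.split(r"[\s:]+", line) if t]
        for name, lexicon in LEXICONS.items():
            if name not in entities:
                hit = match_lexicon(tokens, lexicon)
                if hit:
                    entities[name] = hit
    return entities


if __name__ == "__main__":
    # Made-up sample OCR lines in KTP layout; PEREMPUAM simulates an OCR error.
    sample = [
        "NIK : 3171234567890001",
        "Nama : BUDI SANTOSO",
        "Tempat/Tgl Lahir : JAKARTA, 17-08-1990",
        "Jenis Kelamin : PEREMPUAM Gol. Darah : O",
        "Agama : ISLAM",
    ]
    print(extract_entities(sample))
```

On the sample lines, the misread `PEREMPUAM` is mapped to the lexicon entry `PEREMPUAN`. A cutoff set too low risks matching unrelated words such as personal names, so in practice it would likely need tuning per field, and SIM and KK documents would need their own lexicons and rules.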
author |
Bernadetha Marbun, Sharon |