DEVELOPMENT OF OCR MODULE FOR SCANNING KTP AND DEVELOPMENT OF NER MODULE FOR RECOGNIZING KTP, SIM, AND KK ENTITIES

Bibliographic Details
Main Author: Bernadetha Marbun, Sharon
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/78178
Institution: Institut Teknologi Bandung
id id-itb.:78178
spelling id-itb.:78178; 2023-09-18T10:40:32Z; DEVELOPMENT OF OCR MODULE FOR SCANNING KTP AND DEVELOPMENT OF NER MODULE FOR RECOGNIZING KTP, SIM, AND KK ENTITIES; Bernadetha Marbun, Sharon; Indonesia; Final Project; document digitalization, OCR (Optical Character Recognition), text detection, text recognition, NER (Named Entity Recognition); INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/78178; text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Digitizing official documents such as ID cards (KTP), driver's licenses (SIM), and family cards (KK) streamlines data input, information retrieval, and data analysis. However, uneven internet access in Indonesia poses a challenge. The objective of this capstone project is therefore to develop a mobile OCR application that works offline to digitize KTP, SIM, and KK documents. Tasks were divided among the capstone team members; this final project focuses on developing the OCR modules (text detection and text recognition) for reading KTP documents. In addition, an NER module was developed to turn the OCR output into structured data through entity recognition. The KTP text detection and text recognition modules were developed by benchmarking pre-trained models, selecting the best one for each task, and training it on KTP datasets. The trained models were then evaluated and converted to a mobile format for deployment. The model chosen for text detection is DB with the MobileNetV3 backbone. The trained text detection model achieves a precision of 98.73%, a recall of 97.5%, and an hmean (harmonic mean of precision and recall) of 98.11%, with a model size of 2.26 MB and an inference time of 2.0129 seconds. The model chosen for text recognition is SVTR with the SVTR-Tiny backbone, which achieves an accuracy of 99.37% with a model size of 8.85 MB and an inference time of 1.4201 seconds. The NER module, which recognizes entities in the OCR output of KTP, SIM, and KK documents, was developed separately using lexicon-based and rule-based approaches, with lexicons and rules tailored to the characteristics of each document type. In evaluation, the NER module performed well on the OCR output of all three documents, reaching 100% accuracy for each entity.
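Two points in the description can be made concrete with a short Python sketch. The first function computes hmean as the harmonic mean of precision and recall, which reproduces the reported 98.11% from the reported precision and recall. The second illustrates what a lexicon-based and rule-based entity extractor might look like. It is not the module built in this project: the label lexicon, the "label : value" line layout, the entity names, and the function names are all assumptions made for illustration. The only rule used is that a NIK is 16 digits long.

```python
import re


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also known as F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical lexicon: labels printed on a KTP mapped to entity names.
# The real project lexicon is not given in the record.
KTP_LABELS = {
    "NIK": "nik",
    "Nama": "name",
    "Alamat": "address",
}

# Rule (assumed for this sketch): a valid NIK is exactly 16 digits.
NIK_RULE = re.compile(r"^\d{16}$")


def extract_entities(ocr_lines):
    """Map OCR lines of the assumed form 'Label : value' to entities."""
    entities = {}
    for line in ocr_lines:
        label, sep, value = line.partition(":")
        label, value = label.strip(), value.strip()
        # Lexicon step: skip lines without a separator or a known label.
        if not sep or label not in KTP_LABELS:
            continue
        entity = KTP_LABELS[label]
        # Rule step: reject NIK values that are not 16 digits.
        if entity == "nik" and not NIK_RULE.match(value):
            continue
        entities[entity] = value
    return entities


if __name__ == "__main__":
    # Figures from the description: precision 98.73%, recall 97.5%.
    print(round(hmean(98.73, 97.5), 2))
    # Made-up OCR output, not taken from a real document.
    print(extract_entities(["NIK : 3171234567890123", "Nama : BUDI SANTOSO"]))
```

In this sketch the lexicon decides which lines carry an entity and the rule validates the value, so a misread NIK is dropped instead of being stored as structured data.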
format Final Project
author Bernadetha Marbun, Sharon
title DEVELOPMENT OF OCR MODULE FOR SCANNING KTP AND DEVELOPMENT OF NER MODULE FOR RECOGNIZING KTP, SIM, AND KK ENTITIES
url https://digilib.itb.ac.id/gdl/view/78178
_version_ 1822995651317727232