DEVELOPMENT OF FORMAL BANK STATEMENT DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE APPLICATION

Bibliographic Details
Main Author: Aisha Geubrina Yasmin, Syarifah
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/78172
Institution: Institut Teknologi Bandung
Description
Summary: Digital document archiving overcomes the quality limitations of physical documents and makes the information they contain easier to process. Digitalization can be aided by Optical Character Recognition (OCR) systems. However, integrated OCR software for formal documents, particularly bank statements, that takes internet access conditions in Indonesia into account has not been widely developed. A formal bank statement document understanding system is therefore needed. The proposed system has five stages (image preprocessing, text detection, re-alignment, text recognition, and NER tagging) that together extract the essential information from a bank statement. Text detection uses PP-OCRv3, with an F1/3 score of 93.8%. Text recognition uses SVTR, with a character error rate (CER) of 5.629%. NER tagging uses spaCy's NER model, with an accuracy of 100% on BCA statements and 99% on BNI statements. Each model was obtained by retraining a pre-trained model on synthesized bank statement data. Each model is evaluated with its task-specific metric, as well as its size and inference time. Finally, to minimize internet usage, the backend is implemented as an API built with the Flask framework.
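The text recognition result above is reported as a character error rate (CER). CER is conventionally defined as the character-level Levenshtein distance (substitutions + deletions + insertions) between the recognized text and the ground truth, divided by the number of characters in the ground truth. The following is a minimal sketch of that standard definition, not the project's own evaluation code. The function names are hypothetical, and so is the choice to pool edits over a whole dataset in `corpus_cer`. The record does not say whether the 5.629% figure was pooled this way or averaged per text line.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character substitutions, deletions and
    insertions needed to turn `ref` into `hyp` (two-row dynamic programming)."""
    # prev[j] = distance between the current ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into the empty string needs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER of a single recognized string against its ground truth."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical dataset-level CER: total edits / total reference characters."""
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_chars


if __name__ == "__main__":
    # Made-up example: the letter O misread as the digit 0 in a header field.
    print(character_error_rate("SALDO AWAL", "SALD0 AWAL"))  # 1 edit / 10 chars
```

Running the example prints `0.1`: the single O-to-0 substitution is one edit against ten reference characters (the space counts). Pooling over a dataset, as `corpus_cer` does, gives long lines more weight than averaging per-line CERs would. This is one reason the exact aggregation should be stated alongside the reported figure.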