DEVELOPMENT OF FORMAL BANK STATEMENT DOCUMENT UNDERSTANDING SYSTEM WITH OPTICAL CHARACTER RECOGNITION MODELS FOR MOBILE APPLICATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/78172 |
Institution: | Institut Teknologi Bandung |
Summary: | Digital document archiving overcomes the quality limitations of physical documents
and facilitates information processing. Digitization can be supported by Optical
Character Recognition (OCR) systems. However, integrated OCR software for formal
documents, particularly bank statements, that takes internet access conditions in
Indonesia into account has not been widely developed. A formal bank statement
document understanding system is therefore needed. The system comprises five
stages: image preprocessing, text detection, re-alignment, text recognition, and
named entity recognition (NER) tagging. Together, these stages extract the essential
information from a bank statement. The models used are PP-OCRv3 for text detection
(F1/3 score of 93.8%), SVTR for text recognition (character error rate, CER, of
5.629%), and spaCy's NER model for NER tagging (accuracy of 100% on BCA and 99% on
BNI statements). These models are obtained by retraining pre-trained models on
synthesized bank statement data. Each model is evaluated with its own task-specific
metric, as well as by model size and inference time. Additionally, to minimize
internet usage, the backend is implemented as an API using the Flask framework. |
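The summary's five stages run strictly in sequence, each consuming the previous stage's output. The Python sketch below illustrates only that ordering. The stage keys, `STAGE_ORDER`, and `build_pipeline` are hypothetical names, and the actual models (PP-OCRv3, SVTR, spaCy NER) are assumed to be supplied as the stage callables rather than implemented here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stage names, in the order given in the summary.
STAGE_ORDER: List[str] = [
    "image_preprocessing",
    "text_detection",    # e.g. a retrained PP-OCRv3 detector
    "re_alignment",
    "text_recognition",  # e.g. a retrained SVTR recognizer
    "ner_tagging",       # e.g. a retrained spaCy NER model
]

# A stage is any callable mapping the previous stage's output to its own output.
Stage = Callable[[Any], Any]


def build_pipeline(stages: Dict[str, Stage]) -> Callable[[Any], Any]:
    """Return a function that feeds an input through every stage in STAGE_ORDER."""
    missing = [name for name in STAGE_ORDER if name not in stages]
    if missing:
        raise ValueError(f"missing stages: {missing}")

    def run(image: Any) -> Any:
        data = image
        for name in STAGE_ORDER:
            data = stages[name](data)  # output of one stage is input to the next
        return data

    return run
```

Keeping each stage as a swappable callable makes it straightforward to compare alternative models for a single stage by model size and inference time, which the summary lists as evaluation criteria alongside each model's task-specific metric.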
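CER, the text-recognition metric quoted in the summary, is conventionally the character-level edit (Levenshtein) distance between the recognized text and the ground truth, divided by the number of ground-truth characters. A CER of 5.629% therefore corresponds to roughly 5.6 character edits per 100 reference characters. The Python sketch below computes it with the standard dynamic-programming edit distance. The function names are illustrative. `corpus_cer` pools edits and characters over all samples, which is one common convention; the summary does not state how the reported figure was aggregated.

```python
from typing import Iterable, Tuple


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    that turn hyp into ref (dynamic programming, one row at a time)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one sample: edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled CER over (reference, hypothesis) pairs: total edits / total chars."""
    total_edits = 0
    total_chars = 0
    for ref, hyp in pairs:
        total_edits += levenshtein(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars
```

For example, `cer("Balance", "8alance")` counts one substitution over seven reference characters, giving 1/7, about 14.3%.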