TEXT CLASSIFICATION FOR IDENTIFYING COST COMPONENTS IN FINANCIAL TRANSACTIONS OF ELECTRICITY SUPPLY USING MACHINE LEARNING
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/86928 |
Institution: | Institut Teknologi Bandung |
Summary: | To ensure accountability in the proper distribution of electricity subsidies, PT PLN
(Persero) is required to periodically report the cost components incurred in
delivering electricity distribution, referred to as the Basic Cost of Electricity Supply
(Biaya Pokok Penyediaan or BPP) components, to the government. In this process,
PT PLN (Persero) must identify BPP components (also known as Allowable Costs)
and non-BPP components (also known as Non-Allowable Costs) from financial
transactions stored in the company’s financial system. Currently, the classification
of BPP and non-BPP components is conducted manually, which requires
significant resources. The financial transaction data used for this identification
consists of account codes and transaction description texts.
To make the identification of BPP and non-BPP components in large financial
transaction datasets more efficient, the author proposes a machine learning-based
text classification model. Financial transactions from January to December 2023
are used for model development and evaluation, and transactions from January to
March 2024 are used for prediction with the developed model. The data is
categorized into three classes: AC, NAC, and PROP (with the PROP class
representing transactions with a proportional value relative to NAC).
The financial transaction data contains unstructured free text, characterized by
diverse formats, a mix of formal and informal language, and the use of
abbreviations, necessitating preprocessing before analysis. The preprocessing
steps are case folding, noise removal, tokenization, stop word
removal, spell checking, and word representation.
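The preprocessing steps listed above can be illustrated with a minimal sketch. It is not the thesis implementation: the stop-word list, the correction dictionary (standing in for spell checking), the sample description, and the choice of a bag-of-words count vector as the word representation are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical resources, not taken from the thesis: a tiny Indonesian
# stop-word list and an abbreviation map standing in for spell checking.
STOP_WORDS = {"dan", "di", "ke", "untuk", "yang", "atas"}
CORRECTIONS = {"pemb": "pembayaran", "by": "biaya", "tgl": "tanggal"}


def preprocess(text):
    """Return cleaned tokens for one transaction description."""
    text = text.lower()                                  # case folding
    text = re.sub(r"[^a-z\s]", " ", text)                # noise removal
    tokens = text.split()                                # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [CORRECTIONS.get(t, t) for t in tokens]       # dictionary correction


def bag_of_words(tokens, vocabulary):
    """Word representation (assumed): token counts in vocabulary order."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


tokens = preprocess("Pemb. BY Pemeliharaan Trafo di Gardu 12/03")
vocabulary = sorted(set(tokens))
vector = bag_of_words(tokens, vocabulary)
```

The steps run in the order given in the summary. A real pipeline would likely use a full stop-word list and a proper spelling or abbreviation resource for Indonesian financial text.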
Furthermore, the transaction data used in this study is imbalanced: its classes
are unevenly distributed, so a model trained on it may become biased toward the
majority class. To address this issue, the Synthetic Minority Oversampling
Technique (SMOTE) is applied to improve the accuracy of machine learning
predictions.
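The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the code used in the thesis: the function name `smote`, its parameters, and the random choice of base samples are assumptions, and feature vectors are assumed to be plain lists of numbers.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic vectors from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])   # one of the k nearest neighbours
        gap = rng.random()                   # position on the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic samples.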
The study involves two scenarios to compare model performance: the first scenario
uses the Random Forest method, while the second scenario uses the CNN method.
Findings reveal that the SMOTE-Random Forest model outperforms the SMOTE-CNN
model, achieving an accuracy of 97% and an AUC of 0.9871. When applied
to new data, the model demonstrates an accuracy of 85%. The implementation of
machine learning for classifying financial transactions to determine BPP and
non-BPP components in electricity subsidies significantly improves time
efficiency. The model can process data faster than manual classification methods.
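The two reported metrics can be computed as in the sketch below. The summary does not state how the AUC was aggregated over the three classes, so the macro-averaged one-vs-rest scheme used here is an assumption, as are the function names and the toy labels and probabilities.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Average one-vs-rest AUC; proba[i][c] is P(classes[c]) for sample i."""
    aucs = []
    for c, name in enumerate(classes):
        labels = [t == name for t in y_true]
        scores = [row[c] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy data, purely illustrative.
classes = ["AC", "NAC", "PROP"]
y_true = ["AC", "NAC", "PROP", "AC"]
y_pred = ["AC", "NAC", "AC", "AC"]
proba = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.1, 0.4], [0.9, 0.05, 0.05]]
acc = accuracy(y_true, y_pred)
auc = macro_ovr_auc(y_true, proba, classes)
```

Because accuracy can look high on imbalanced data even when minority classes are misclassified, a ranking metric such as AUC is a useful complement to it.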
|