TEXT CLASSIFICATION FOR IDENTIFYING COST COMPONENTS IN FINANCIAL TRANSACTIONS OF ELECTRICITY SUPPLY USING MACHINE LEARNING

Bibliographic Details
Main Author: Nirmalasari, Listyani
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86928
Institution: Institut Teknologi Bandung
Description
Summary: To ensure accountability in the distribution of electricity subsidies, PT PLN (Persero) is required to report periodically to the government the cost components it incurs in delivering electricity, referred to as the Basic Cost of Electricity Supply (Biaya Pokok Penyediaan, or BPP) components. In this process, PT PLN (Persero) must identify BPP components (also known as Allowable Costs) and non-BPP components (also known as Non-Allowable Costs) from the financial transactions stored in the company’s financial system. Currently, this classification is done manually, which requires significant resources. The financial transaction data used for the identification consists of account codes and transaction description texts. To make the identification of BPP and non-BPP components in large financial transaction datasets more efficient, the author proposes a machine learning-based text classification model. Financial transactions from January to December 2023 are used for model development and evaluation, and transactions from January to March 2024 are used for prediction with the developed model. The data is categorized into three classes: AC (Allowable Cost), NAC (Non-Allowable Cost), and PROP (transactions with a proportional value relative to NAC). The transaction descriptions are unstructured free text, with diverse formats, a mix of formal and informal language, and frequent abbreviations, so preprocessing is needed before analysis. The preprocessing steps are case folding, noise removal, tokenization, stop word removal, spell checking, and word representation. The transaction data is also imbalanced: its classes are unevenly distributed. This requires additional methods to counter potential bias toward the majority class in the machine learning results.
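The preprocessing steps listed above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the stop-word list, the abbreviation map, the function names `preprocess` and `bag_of_words`, and the sample description are all hypothetical. A small abbreviation lookup stands in for spell checking, and a bag-of-words count stands in for word representation, since the summary does not say which methods were used.

```python
import re
from collections import Counter

# Hypothetical resources for illustration only; a real system would need
# domain-specific lists built from the actual transaction descriptions.
STOP_WORDS = {"dan", "untuk", "di", "ke", "dari", "yang"}
ABBREVIATIONS = {"pemb": "pembayaran", "by": "biaya", "tgl": "tanggal"}


def preprocess(text):
    """Turn one raw transaction description into a list of cleaned tokens."""
    text = text.lower()                            # case folding
    text = re.sub(r"[^a-z\s]", " ", text)          # noise removal: digits, punctuation
    tokens = text.split()                          # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    # Assumption: abbreviation expansion as a simple stand-in for spell checking.
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def bag_of_words(tokens):
    """Assumed word representation: raw term counts for one description."""
    return Counter(tokens)


if __name__ == "__main__":
    sample = "Pemb. BY Pemeliharaan Trafo dan Kabel 2023/01"  # made-up description
    tokens = preprocess(sample)
    print(tokens)
    print(bag_of_words(tokens))
```

The steps run in the order the summary lists them. In practice the order of stop word removal and spelling normalization may need tuning, because an expanded abbreviation can itself be a stop word.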
To address the class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to improve the accuracy of the machine learning predictions. The study compares model performance in two scenarios: the first uses the Random Forest method, and the second uses a convolutional neural network (CNN). The SMOTE-Random Forest model outperforms the SMOTE-CNN model, achieving an accuracy of 97% and an AUC of 0.9871. When applied to new data, the model achieves an accuracy of 85%. Using machine learning to classify financial transactions into BPP and non-BPP components for electricity subsidy reporting significantly improves time efficiency, since the model processes data faster than manual classification.
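The core idea of SMOTE can be sketched in a few lines of standard-library Python: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a simplified illustration, not the thesis code; the function name `smote`, its parameters, and the toy vectors are assumptions. A real pipeline would more likely use a library such as imbalanced-learn's `SMOTE` together with scikit-learn's `RandomForestClassifier`, and would oversample only the training split.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)        # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()               # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy feature vectors standing in for a minority class such as PROP.
    minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(minority_vectors, n_new=4, k=2):
        print(point)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region the minority class already occupies rather than duplicating existing rows.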