TEXT CLASSIFICATION FOR IDENTIFYING COST COMPONENTS IN FINANCIAL TRANSACTIONS OF ELECTRICITY SUPPLY USING MACHINE LEARNING

To ensure accountability in the proper distribution of electricity subsidies, PT PLN (Persero) is required to periodically report the cost components incurred in delivering electricity distribution, referred to as the Basic Cost of Electricity Supply (Biaya Pokok Penyediaan or BPP) components, to...

Bibliographic Details
Main Author: Nirmalasari, Listyani
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/86928
Institution: Institut Teknologi Bandung
id id-itb.:86928
timestamp 2025-01-07T08:38:02Z
keywords CNN, Random Forest, SMOTE, Text classification
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description To ensure accountability in the distribution of electricity subsidies, PT PLN (Persero) is required to report periodically to the government the cost components incurred in delivering electricity, referred to as the Basic Cost of Electricity Supply (Biaya Pokok Penyediaan, or BPP) components. In this process, PT PLN (Persero) must identify BPP components (also known as Allowable Costs) and non-BPP components (also known as Non-Allowable Costs) from financial transactions stored in the company’s financial system. Currently, this classification is done manually, which requires significant resources. The financial transaction data used for the identification consists of account codes and transaction description texts. To make the identification of BPP and non-BPP components in large financial transaction datasets more efficient, the author proposes a machine learning-based text classification model. Financial transactions from January to December 2023 are used for model development and evaluation, and transactions from January to March 2024 are used for predictions with the developed model. The data is categorized into three classes: AC, NAC, and PROP (the PROP class represents transactions with a proportional value relative to NAC). The transaction data contains unstructured free text with diverse formats, a mix of formal and informal language, and abbreviations, so it must be preprocessed before analysis. The preprocessing steps are case folding, noise removal, tokenization, stop word removal, spell checking, and word representation. The transaction data is also imbalanced: its classes are unevenly distributed. This calls for additional methods to counter potential bias toward the majority class in the machine learning results.
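The preprocessing steps listed above can be illustrated with a minimal Python sketch. This is not the author's implementation: the stop word list, the abbreviation dictionary used for spelling normalization, and the function names are hypothetical, and a simple bag-of-words count stands in for the word representation, which the abstract does not specify.

```python
import re
from collections import Counter
from typing import List

# Hypothetical resources for illustration only; the abstract does not give
# the actual stop word list or spelling dictionary used in the thesis.
STOP_WORDS = {"dan", "di", "ke", "untuk", "yang"}
ABBREVIATIONS = {"byr": "bayar", "tgh": "tagihan", "ktr": "kantor"}


def preprocess(text: str) -> List[str]:
    """Turn one raw transaction description into a list of clean tokens."""
    text = text.lower()                                   # case folding
    text = re.sub(r"[^a-z\s]", " ", text)                 # noise removal: keep letters and spaces
    tokens = text.split()                                 # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    return [ABBREVIATIONS.get(t, t) for t in tokens]      # dictionary-based spelling normalization


def bag_of_words(tokens: List[str]) -> Counter:
    """Assumed word representation: raw term counts per description."""
    return Counter(tokens)


if __name__ == "__main__":
    tokens = preprocess("BYR Tgh. Listrik untuk KTR #123")
    print(tokens)
    print(bag_of_words(tokens))
```

A dictionary lookup is only one way to handle abbreviations and misspellings; the sketch keeps the steps in the order the abstract lists them so each stage can be swapped out independently.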
To address the class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to improve the accuracy of the machine learning predictions. The study compares model performance in two scenarios: the first uses the Random Forest method and the second uses the CNN method. The findings show that the SMOTE-Random Forest model outperforms the SMOTE-CNN model, achieving an accuracy of 97% and an AUC of 0.9871. When applied to new data, the model achieves an accuracy of 85%. Using machine learning to classify financial transactions into BPP and non-BPP components for electricity subsidies significantly improves time efficiency, because the model processes data faster than manual classification.
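The core idea of SMOTE can be shown in a few lines of Python. The sketch below is a simplified, standard-library illustration and not the implementation used in the thesis: it assumes the minority-class transactions have already been turned into numeric feature vectors, and the function name `smote` and its parameters are hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by neighbour interpolation.

    Assumes at least two minority samples, all with the same dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for point in smote(minority_class, n_new=4):
        print(point)
```

In practice, oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution; whether the thesis followed this convention is not stated in the abstract.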