TEXT CLASSIFICATION FOR IDENTIFYING COST COMPONENTS IN FINANCIAL TRANSACTIONS OF ELECTRICITY SUPPLY USING MACHINE LEARNING

To ensure accountability in the proper distribution of electricity subsidies, PT PLN (Persero) is required to periodically report the cost components incurred in delivering electricity distribution, referred to as the Basic Cost of Electricity Supply (Biaya Pokok Penyediaan or BPP) components, to...

Bibliographic Details
Main Author:	Nirmalasari, Listyani
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/86928
Institution:	Institut Teknologi Bandung

id	id-itb.:86928
spelling	id-itb.:86928; 2025-01-07T08:38:02Z; Nirmalasari, Listyani; Indonesia; Theses; CNN, Random Forest, SMOTE, Text classification; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/86928; text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	To ensure accountability in the distribution of electricity subsidies, PT PLN (Persero) must periodically report to the government the cost components incurred in supplying electricity, known as the Basic Cost of Electricity Supply (Biaya Pokok Penyediaan, BPP) components. To do so, PT PLN (Persero) must identify BPP components (also known as Allowable Costs) and non-BPP components (also known as Non-Allowable Costs) among the financial transactions stored in the company's financial system. This classification is currently done manually, which requires significant resources. The transaction data used for the identification consists of account codes and free-text transaction descriptions. To make identification more efficient on large transaction datasets, the author proposes a machine learning-based text classification model. Transactions from January to December 2023 are used for model development and evaluation, and transactions from January to March 2024 for prediction with the developed model. The data is divided into three classes: AC, NAC, and PROP (the PROP class represents transactions with a proportional value relative to NAC). The transaction text is unstructured free text with diverse formats, a mix of formal and informal language, and abbreviations, so it must be preprocessed before analysis. The preprocessing steps are case folding, noise removal, tokenization, stop word removal, spell checking, and word representation. The dataset is also imbalanced: its classes are unevenly distributed, which can bias a model toward the majority class. To address this, the Synthetic Minority Oversampling Technique (SMOTE) is applied to improve prediction accuracy. Two scenarios are compared: the first uses Random Forest and the second uses a CNN. The SMOTE-Random Forest model outperforms the SMOTE-CNN model, achieving an accuracy of 97% and an AUC of 0.9871. When applied to new data, the model achieves an accuracy of 85%. Using machine learning to classify financial transactions into BPP and non-BPP components for electricity subsidies significantly improves time efficiency, since the model processes data faster than manual classification.
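The preprocessing and oversampling ideas named in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the stop-word list, the noise-removal rule, the sample transaction text, and the function names `preprocess` and `smote` are all assumptions made for illustration, and spell checking, word representation, and the Random Forest and CNN models are left out. The `smote` function shows only the core SMOTE step, which interpolates between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random
import re

# Hypothetical stop-word list; the record does not say which list the thesis used.
STOP_WORDS = {"dan", "di", "ke", "untuk", "yang"}


def preprocess(text):
    """Apply case folding, noise removal, tokenization, and stop-word removal."""
    text = text.lower()                     # case folding
    text = re.sub(r"[^a-z\s]", " ", text)   # noise removal (assumed rule: keep letters only)
    tokens = text.split()                   # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()                  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical transaction description and toy 2-D feature vectors.
tokens = preprocess("Pembayaran BIAYA pemeliharaan trafo #123 untuk GI-Cawang")
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority_vectors, n_new=4)
```

In practice the oversampling would be applied only to the training split, after the text has been converted to numeric vectors, so that synthetic points do not leak into the evaluation data.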
format	Theses
author	Nirmalasari, Listyani
title	TEXT CLASSIFICATION FOR IDENTIFYING COST COMPONENTS IN FINANCIAL TRANSACTIONS OF ELECTRICITY SUPPLY USING MACHINE LEARNING
url	https://digilib.itb.ac.id/gdl/view/86928
_version_	1822999733075968000
