MULTI-LABEL CLASSIFICATION OF CUSTOMER COMPLAINTS USING FINE TUNING INDOBERT AND HANDLING IMBALANCED DATA
To enhance customer satisfaction and engagement, companies generally provide services such as Call Centers, Super Apps, and Social Media platforms (Twitter, Instagram, and Facebook) to handle customer complaints. However, the classification of complaints is often done manually by operators into Issu...
Main Author: | Firmansyah, Adi |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/87153 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:87153 |
---|---|
spelling |
id-itb.:87153 2025-01-14T09:15:27Z MULTI-LABEL CLASSIFICATION OF CUSTOMER COMPLAINTS USING FINE TUNING INDOBERT AND HANDLING IMBALANCED DATA Firmansyah, Adi Indonesia Theses Customer Complaints, Imbalance Dataset, Deep Learning, IndoBERT INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/87153 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
To enhance customer satisfaction and engagement, companies generally provide channels such as call centers, super apps, and social media platforms (Twitter, Instagram, and Facebook) to handle customer complaints. However, operators often classify complaints manually into Issue Types (complaint categories). A single complaint can belong to multiple Issue Types, which makes manual classification time-consuming and error-prone. The data are also imbalanced across Issue Types: most reported complaints concern service disruptions, while fewer concern other issues such as integrity reports. This study aims to develop a model for handling imbalanced datasets in multi-label text classification using a deep learning approach.
The research methodology is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework. Three main approaches were considered during modeling: 1) a pipeline model using a CNN as an encoder and XGBoost as a decoder, combined with word embeddings; 2) an end-to-end model with fine-tuned IndoBERT; and 3) an end-to-end model with fine-tuned IndoBERTweet. The dataset comprises 378,382 customer complaints collected from Twitter between January 1, 2023, and December 31, 2023. A combination of partial oversampling, partial undersampling, and class weighting was used to handle the data imbalance.
The results show that the end-to-end fine-tuned IndoBERTweet model, combined with partial oversampling, partial undersampling, and class weighting, delivered the best performance, with an accuracy of 0.86, an F1-score of 0.56, and a Hamming loss of 0.02. This model significantly outperformed the baseline IndoBERT-CNN-XGBoost model, which used the same imbalance-handling strategies but achieved a lower accuracy of 0.78, a lower F1-score of 0.43, and a higher Hamming loss of 0.03. |
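
The description names class weighting as one of the imbalance-handling strategies but does not give the weighting formula. As a minimal sketch, assuming a common inverse-frequency scheme (per-label positive weight = negatives / positives, as is often paired with a binary cross-entropy loss), the weights could be derived from a multi-hot label matrix as shown here. The function name, the toy label matrix, and the scheme itself are illustrative assumptions, not the thesis's implementation.

```python
from typing import List


def positive_class_weights(labels: List[List[int]]) -> List[float]:
    """Return one weight per label: (#negatives / #positives).

    Assumed inverse-frequency scheme (not taken from the thesis).
    Labels with no positive examples fall back to a weight of 1.0.
    """
    n_samples = len(labels)
    n_labels = len(labels[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in labels)
        negatives = n_samples - positives
        weights.append(negatives / positives if positives else 1.0)
    return weights


# Hypothetical toy data; columns stand for
# [service disruption, other issue, integrity report].
toy_labels = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
weights = positive_class_weights(toy_labels)
print(weights)
```

Under this scheme the dominant label receives a weight below 1 and rare labels receive larger weights, so errors on rare Issue Types contribute more to the loss.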
format |
Theses |
author |
Firmansyah, Adi |
title |
MULTI-LABEL CLASSIFICATION OF CUSTOMER COMPLAINTS USING FINE TUNING INDOBERT AND HANDLING IMBALANCED DATA |
url |
https://digilib.itb.ac.id/gdl/view/87153 |
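
The description reports accuracy, F1-score, and Hamming loss without stating which variants were used. As a hedged illustration only, the sketch below computes Hamming loss, subset (exact-match) accuracy, and micro-averaged F1 on multi-hot predictions. The choice of subset accuracy and micro averaging, the function name, and the toy data are assumptions made for this example.

```python
from typing import List, Tuple


def multilabel_metrics(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Return (hamming_loss, subset_accuracy, micro_f1) for multi-hot labels."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    wrong = exact = tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        exact += int(t_row == p_row)  # every label of the sample matches
        for t, p in zip(t_row, p_row):
            wrong += int(t != p)
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    hamming = wrong / (n_samples * n_labels)  # fraction of wrong label slots
    subset_acc = exact / n_samples
    denom = 2 * tp + fp + fn
    micro_f1 = 2 * tp / denom if denom else 0.0
    return hamming, subset_acc, micro_f1


# Hypothetical toy predictions for four complaints and three Issue Types.
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
hamming, subset_acc, micro_f1 = multilabel_metrics(y_true, y_pred)
print(hamming, subset_acc, micro_f1)
```

Hamming loss counts individual label slots, so it can stay small on sparse label matrices even when the F1-score is modest, which matches the pattern of the figures reported in the description.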