MULTI-LABEL CLASSIFICATION OF CUSTOMER COMPLAINTS USING FINE TUNING INDOBERT AND HANDLING IMBALANCED DATA

To enhance customer satisfaction and engagement, companies generally provide services such as Call Centers, Super Apps, and Social Media platforms (Twitter, Instagram, and Facebook) to handle customer complaints. However, the classification of complaints is often done manually by operators into Issu...

Bibliographic Details
Main Author: Firmansyah, Adi
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/87153
Institution: Institut Teknologi Bandung
id id-itb.:87153
spelling id-itb.:87153
2025-01-14T09:15:27Z
Customer Complaints, Imbalance Dataset, Deep Learning, IndoBERT
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description To enhance customer satisfaction and engagement, companies generally provide services such as call centers, super apps, and social media platforms (Twitter, Instagram, and Facebook) to handle customer complaints. However, operators often classify complaints manually into Issue Types (complaint categories). A single complaint can belong to multiple Issue Types, which makes manual classification time-consuming and error-prone. In addition, the data are imbalanced across Issue Types: most reported complaints concern service disruptions, while fewer concern other issues such as integrity reports. This study aims to develop a model for multi-label text classification on imbalanced datasets using a deep learning approach. The research methodology follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework. Three main approaches were considered during modeling: 1) a pipeline model using a CNN as the encoder and XGBoost as the decoder, combined with word embeddings; 2) an end-to-end model with fine-tuned IndoBERT; and 3) an end-to-end model with fine-tuned IndoBERTweet. The dataset comprises 378,382 customer complaints collected from Twitter between January 1, 2023, and December 31, 2023. A combination of partial oversampling, partial undersampling, and class weighting was used to handle the data imbalance. The results show that the end-to-end fine-tuned IndoBERTweet model, combined with partial oversampling, partial undersampling, and class weighting, delivered the best performance, with an accuracy of 0.86, an F1-score of 0.56, and a Hamming loss of 0.02. This model significantly outperformed the baseline IndoBERT-CNN-XGBoost model, which used the same imbalance-handling strategies but achieved a lower accuracy of 0.78, an F1-score of 0.43, and a Hamming loss of 0.03.
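The metrics and the class-weighting idea named in the abstract can be illustrated with a small sketch. This is not the thesis's code: the toy label matrices, the function names, the choice of subset (exact-match) accuracy, macro averaging for F1, and the negatives-to-positives weighting rule are all assumptions made here for illustration, since the record does not state which variants were used.

```python
from typing import List

# One row per complaint, one 0/1 column per issue type (hypothetical layout).
Labels = List[List[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of complaints whose whole label set is predicted exactly."""
    return sum(tr == pr for tr, pr in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1; a label with no hits scores 0."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def positive_class_weights(y_true: Labels) -> List[float]:
    """Per-label weight = negatives / positives, so rare labels weigh more.

    Assumed rule for illustration; labels with no positives get weight 1.0.
    """
    n = len(y_true)
    weights = []
    for j in range(len(y_true[0])):
        pos = sum(row[j] for row in y_true)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


# Toy data: label 0 plays the role of the dominant "service disruption" type.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 1]]

print("Hamming loss:", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
print("Macro F1:", macro_f1(y_true, y_pred))
print("Positive class weights:", positive_class_weights(y_true))
```

On the toy data a single missed minority label barely moves the Hamming loss but pulls macro F1 down sharply, which is one plausible reading of why a model can report a low Hamming loss alongside a moderate F1-score on imbalanced data.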
format Theses
author Firmansyah, Adi
title MULTI-LABEL CLASSIFICATION OF CUSTOMER COMPLAINTS USING FINE TUNING INDOBERT AND HANDLING IMBALANCED DATA
url https://digilib.itb.ac.id/gdl/view/87153
_version_ 1822011292865527808