TEXT CLASSIFICATION USING XLNET WITH INFOMAP AUTOMATIC LABELING PROCESS

Bibliographic Details
Main Author: Dewi Salma, Triana
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/55941
Institution: Institut Teknologi Bandung
Keywords: natural language processing, text classification, chatbot, question answering, community detection
Record ID: id-itb.:55941 (2021-06-20T11:33:59Z)
Collection: Digital ITB, Institut Teknologi Bandung Library (Indonesia)

Abstract:
Text data is growing rapidly and is used in many currently popular applications, such as chatbots and question answering. The quality of text data, especially in text classification, significantly affects model performance. Manual labeling by humans, the usual way of labeling training data for supervised learning, is expensive, error-prone, and yields only limited quantities of data. Automatic labeling that produces large amounts of high-quality training data is therefore needed to improve text classification performance. This study applies community detection with the Infomap algorithm to label training data automatically for text classification with XLNet. The model's accuracy is compared with a baseline trained on manually labeled data. The experiments use question answering data sets as case studies and proceed in two stages: first, the training data are labeled automatically using Infomap community detection; second, the community-labeled training data produced by the first stage are used for text classification with a pre-trained XLNet model. Three scenarios were compared: manually labeled data, bigram data, and trigram data. The experiments show that community detection cannot be evaluated by modularity alone: the class split and class merge values also affect the quality of the detected communities and, in turn, classification performance.

At the optimal threshold, the bigram data excelled at the 6th and 10th epochs, with accuracies of 0.2766 and 0.355 respectively, while the trigram data excelled at the 10th epoch, with an accuracy of 0.286. The study also shows that automatic labeling increased average classification speed by 79.13% compared with manually labeled data, although accuracy decreased by an average of 42.15% across all experiments. Thus, although its accuracy does not yet exceed the baseline overall, automatic labeling can produce large quantities of labeled data quickly.
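The first stage described in the abstract (building a graph over the training questions, detecting communities, and using community membership as class labels) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The rule that links two questions when they share at least `threshold` word n-grams is hypothetical, and a simple deterministic label propagation stands in for Infomap, which instead minimizes the map equation (a full implementation is available in the `infomap` Python package). The `modularity` function computes Newman's modularity for an undirected, unweighted graph, one of the measures the abstract mentions for judging community quality. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_graph(texts: list[str], n: int = 2, threshold: int = 2) -> dict[int, set[int]]:
    """Link texts i and j if they share >= `threshold` word n-grams.

    Hypothetical linking rule chosen for illustration; the thesis's exact
    graph construction is not given in the abstract.
    """
    grams = [ngrams(t, n) for t in texts]
    adj: dict[int, set[int]] = {i: set() for i in range(len(texts))}
    for i, j in combinations(range(len(texts)), 2):
        if len(grams[i] & grams[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj: dict[int, set[int]], max_iter: int = 100) -> dict[int, int]:
    """Deterministic asynchronous label propagation.

    A simple stand-in for Infomap (NOT Infomap itself): each node repeatedly
    adopts the most common label among its neighbours until nothing changes.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # Break ties by the smallest label so the result is reproducible.
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    # Renumber communities 0..k-1 in order of first appearance.
    remap: dict[int, int] = {}
    return {v: remap.setdefault(labels[v], len(remap)) for v in sorted(adj)}


def modularity(adj: dict[int, set[int]], labels: dict[int, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2], undirected and unweighted."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    if m == 0:
        return 0.0
    internal: Counter = Counter()  # twice the number of edges inside each community
    degree: Counter = Counter()    # total degree d_c of each community
    for v, nbrs in adj.items():
        degree[labels[v]] += len(nbrs)
        for u in nbrs:
            if labels[u] == labels[v]:
                internal[labels[v]] += 1
    return sum(internal[c] / (2 * m) - (degree[c] / (2 * m)) ** 2 for c in degree)


questions = [
    "how do i reset my password",
    "how do i reset my account password",
    "where is the nearest atm",
    "where is the nearest bank atm",
]
graph = build_graph(questions, n=2, threshold=2)
labels = label_propagation(graph)
print(labels)                     # {0: 0, 1: 0, 2: 1, 3: 1}
print(modularity(graph, labels))  # 0.5
```

Passing `n=3` gives a trigram variant, and `threshold` plays the role of the tunable threshold the abstract refers to. The resulting community IDs would then serve as class labels for the second stage, fine-tuning a pre-trained XLNet classifier (for example Hugging Face `XLNetForSequenceClassification`), which is not shown here.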