TEXT CLASSIFICATION USING XLNET WITH INFOMAP AUTOMATIC LABELING PROCESS
Text data is growing rapidly and is used in currently popular applications such as chatbots and question answering. The quality of the text data significantly affects model performance, especially in text classification. Manual labeling by humans, generally used in labeling...
Main Author: | Dewi Salma, Triana |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/55941 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:55941 |
---|---|
spelling |
id-itb.:55941 2021-06-20T11:33:59Z TEXT CLASSIFICATION USING XLNET WITH INFOMAP AUTOMATIC LABELING PROCESS Dewi Salma, Triana Indonesia Theses natural language processing, text classification, chatbot, question answering, community detection INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/55941 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Text data is growing rapidly and is used in currently popular applications such as
chatbots and question answering. The quality of the text data significantly affects
model performance, especially in text classification. Manual labeling by humans,
the usual way of labeling training data for supervised learning, is expensive,
error-prone, and yields only small amounts of labeled data. Automatic labeling that
provides training data of high quality and in high quantity is therefore needed to
improve text classification performance. This study uses community detection with
the Infomap algorithm to label data automatically for text classification with
XLNet. The accuracy of the resulting model is compared with a baseline that uses
manually labeled data. Experiments were carried out as case studies on question
answering data sets, in two stages: automatic labeling of the training data using
Infomap community detection, followed by text classification. The output of the
first stage, training data labeled according to the detected communities, is the
input to the second stage, in which classification is performed with a pre-trained
XLNet model. Three scenarios were run to compare manually labeled data, bigram
data, and trigram data. The experiments show that community detection cannot be
evaluated by modularity alone: the class split and class merge values also affect
the quality of community detection and, in turn, of classification. At the optimal
threshold, the bigram data performed best at the 6th and 10th epochs, with
accuracies of 0.2766 and 0.355 respectively, while the trigram data performed best
at the 10th epoch, with an accuracy of 0.286. The study also found that automatic
labeling increased the average classification speed by 79.13% compared with
manually labeled data, although accuracy decreased by an average of 42.15% across
the experiments. Although accuracy has not yet outperformed the baseline, the
results show that automatic labeling can label large quantities of data quickly. |
format |
Theses |
author |
Dewi Salma, Triana |
title |
TEXT CLASSIFICATION USING XLNET WITH INFOMAP AUTOMATIC LABELING PROCESS |
url |
https://digilib.itb.ac.id/gdl/view/55941 |
_version_ |
1822002211796811776 |
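
The first stage described in the abstract (automatic labeling by community detection) can be illustrated with a minimal sketch. This is not the thesis's implementation. The record does not say how the text graph is built, so the sketch assumes that two texts are linked when the Jaccard similarity of their word n-grams reaches a threshold. To stay within the Python standard library, connected components are used as a simple stand-in for Infomap, which would normally find finer-grained communities. The names `ngrams`, `jaccard`, and `auto_label`, and the default threshold of 0.2, are hypothetical.

```python
from itertools import combinations


def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auto_label(texts, n=2, threshold=0.2):
    """Assign a community id to each text.

    Assumption (not from the thesis): two texts are linked when their
    n-gram Jaccard similarity reaches `threshold`. Connected components
    stand in for Infomap community detection.
    """
    grams = [ngrams(t, n) for t in texts]
    parent = list(range(len(texts)))  # union-find forest, one node per text

    def find(i):
        # Follow parent links to the component root, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of sufficiently similar texts.
    for i, j in combinations(range(len(texts)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    # Renumber component roots as consecutive labels 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(texts))]
```

With `n=2` the function works on bigrams and with `n=3` on trigrams, mirroring the bigram and trigram scenarios named in the abstract. In the two-stage process the abstract describes, labels produced this way would then serve as training targets for the pre-trained XLNet classifier.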