Performance improvement of poem genre classification using a combination of SMOTE and support vector machine
Text classification aims to assign documents to the correct class. This study uses a support vector machine (SVM) for text classification in order to find out how well SVM performs at classifying text. A total of 841 poems were categorized i...
Main Authors: | Quratu Aini, Muljono, Fitri Yakub |
---|---|
Format: | Conference or Workshop Item |
Published: | 2023 |
Subjects: | T Technology (General) |
Online Access: | http://eprints.utm.my/107699/ http://dx.doi.org/10.1109/iSemantic59612.2023.10295293 |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.107699 |
---|---|
record_format |
eprints |
spelling |
my.utm.107699 2024-10-02T06:29:29Z http://eprints.utm.my/107699/ Performance improvement of poem genre classification using a combination of SMOTE and support vector machine. Quratu Aini, Quratu Aini; Muljono, Muljono; Yakub, Fitri. T Technology (General). 2023. Conference or Workshop Item. PeerReviewed. Quratu Aini, Quratu Aini and Muljono, Muljono and Yakub, Fitri (2023) Performance improvement of poem genre classification using a combination of SMOTE and support vector machine. In: 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), 16 September 2023-17 September 2023, Semarang, Indonesia. http://dx.doi.org/10.1109/iSemantic59612.2023.10295293 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
T Technology (General) |
description |
Text classification aims to assign documents to the correct class. This study uses a support vector machine (SVM) for text classification in order to find out how well SVM performs at classifying text. A total of 841 poems were categorized into four genres: affection, death, environment, and music. The data are cleaned in a text preprocessing stage consisting of case folding (lowercasing, removing punctuation and whitespace), tokenization, stopword removal, and lemmatization. Feature extraction using Bag of Words (BoW) produces 6860 features, which are then weighted using TF-IDF. The data are split into training and test sets at ratios of 80:20, 70:30, and 60:40. Because the classes are imbalanced, the data are balanced using SMOTE. Splitting is carried out for the original data, the balanced data, and the balanced data with PCA. The highest training accuracy, 97%, is obtained with the balanced data at a 60:40 split, while the highest test accuracy, 87%, is obtained with the balanced data at an 80:20 split. Thus the highest accuracy on both training and test data is obtained with the balanced data. |
format |
Conference or Workshop Item |
author |
Quratu Aini, Quratu Aini Muljono, Muljono Yakub, Fitri |
author_sort |
Quratu Aini, Quratu Aini |
title |
Performance improvement of poem genre classification using a combination of SMOTE and support vector machine |
publishDate |
2023 |
url |
http://eprints.utm.my/107699/ http://dx.doi.org/10.1109/iSemantic59612.2023.10295293 |
_version_ |
1814043510103343104 |
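
The description in this record names a chain of steps: preprocessing, Bag of Words features, TF-IDF weighting, and SMOTE balancing. The following is a minimal, dependency-free sketch of those steps on a toy corpus. It is not the authors' implementation. The stopword list, the smoothed IDF formula, the example poems, and the function names (`preprocess`, `tfidf_matrix`, `smote`) are all assumptions made for illustration. Lemmatization, PCA, and the SVM classifier itself are left out because they would normally come from a library.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stopword list; the record does not say which list was used.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def preprocess(text):
    """Case folding, punctuation removal, tokenization, stopword removal."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf_matrix(docs):
    """Return (vocabulary, rows); each row is a TF-IDF weighted BoW vector."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, one common variant (assumed; the exact formula is not given).
    idf = {t: math.log((1 + len(docs)) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy corpus (invented): four "environment" poems, two "music" poems.
poems = [
    "The river sings to the forest",
    "Green leaves and the quiet forest",
    "Rain falls on the mountain river",
    "Wind moves the green mountain",
    "A song of drums and the guitar",
    "The guitar hums a slow song",
]
labels = ["environment"] * 4 + ["music"] * 2

vocab, X = tfidf_matrix(poems)
music = [row for row, lab in zip(X, labels) if lab == "music"]
n_missing = labels.count("environment") - labels.count("music")
synthetic = smote(music, n_new=n_missing)

X_balanced = X + synthetic
y_balanced = labels + ["music"] * n_missing
print(Counter(y_balanced))
```

In this sketch the synthetic "music" vectors lie on the line segment between the two real "music" vectors, which is the core idea of SMOTE: new minority samples are interpolated rather than duplicated. In practice a library implementation (for example the `SMOTE` class in imbalanced-learn) would typically be used instead.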