Performance improvement of poem genre classification using a combination of SMOTE and support vector machine

Bibliographic Details
Main Authors:	Quratu Aini, Muljono, Fitri Yakub
Format:	Conference or Workshop Item
Published:	2023
Subjects:	T Technology (General)
Online Access:	http://eprints.utm.my/107699/ http://dx.doi.org/10.1109/iSemantic59612.2023.10295293
Institution:	Universiti Teknologi Malaysia

id	my.utm.107699
record_format	eprints
citation	Quratu Aini, Muljono and Fitri Yakub (2023). Performance improvement of poem genre classification using a combination of SMOTE and support vector machine. In: 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), 16-17 September 2023, Semarang, Indonesia. Peer reviewed. http://dx.doi.org/10.1109/iSemantic59612.2023.10295293 (record last modified 2024-10-02T06:29:29Z)
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	T Technology (General)
description	Text classification aims to assign documents to their correct classes. This study uses a support vector machine (SVM) for text classification and evaluates how well SVM performs on this task. The dataset consists of 841 poems categorized into four genres: affection, death, environment, and music. During text preprocessing, the data is cleaned through case folding (lowercasing and removing punctuation and extra whitespace), tokenization, stopword removal, and lemmatization. Feature extraction with Bag of Words (BoW) produces 6,860 features, which are then weighted using TF-IDF. The data is split into training and test sets at ratios of 80:20, 70:30, and 60:40. Because the classes are imbalanced, the data is balanced using SMOTE. The splits are applied to three versions of the data: the original data, the SMOTE-balanced data, and the SMOTE-balanced data with PCA applied. The highest training accuracy, 97%, is obtained on the balanced data with a 60:40 split, while the highest test accuracy, 87%, is obtained on the balanced data with an 80:20 split. Thus, the highest training and test accuracies are both obtained on the balanced data.
format	Conference or Workshop Item
author	Quratu Aini, Muljono, Fitri Yakub
title	Performance improvement of poem genre classification using a combination of SMOTE and support vector machine
publishDate	2023
url	http://eprints.utm.my/107699/ http://dx.doi.org/10.1109/iSemantic59612.2023.10295293
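The SMOTE balancing step described in the abstract can be illustrated with a minimal, dependency-free sketch of the technique's core idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest neighbors from the same class. This is not the authors' code, and the record does not say which implementation they used. The function name `smote`, the Euclidean distance, the default k = 5, oversampling every class up to the size of the largest one, and the toy two-dimensional vectors (standing in for TF-IDF rows) are all assumptions made for illustration. For real pipelines, the `SMOTE` class in the imbalanced-learn package (`fit_resample(X, y)`) is a maintained implementation.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE sketch (hypothetical helper, not the paper's code).

    Assumptions: X holds dense feature vectors (e.g. TF-IDF rows), distances
    are Euclidean, and every class is oversampled up to the largest class.
    Returns the original rows followed by the synthetic rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(v) for v in X], list(y)

    for label, n in counts.items():
        members = [list(v) for v, lab in zip(X, y) if lab == label]
        needed = target - n
        if needed == 0 or len(members) < 2:
            continue  # already balanced, or no same-class neighbor to use
        k_eff = min(k, len(members) - 1)
        # k nearest same-class neighbors of each member (by index).
        neighbors = []
        for i, v in enumerate(members):
            others = sorted(
                (j for j in range(len(members)) if j != i),
                key=lambda j: euclidean(v, members[j]),
            )
            neighbors.append(others[:k_eff])
        for _ in range(needed):
            i = rng.randrange(len(members))  # random minority sample
            j = rng.choice(neighbors[i])  # one of its k nearest neighbors
            gap = rng.random()  # random position on the segment, in [0, 1)
            base, nn = members[i], members[j]
            X_out.append([b + gap * (m - b) for b, m in zip(base, nn)])
            y_out.append(label)
    return X_out, y_out


# Toy, made-up 2-D vectors standing in for TF-IDF rows of two genres.
X = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95], [1.0, 1.0]]
y = ["death", "death", "music", "music", "music", "music"]
X_bal, y_bal = smote(X, y, k=1)
print(sorted(Counter(y_bal).items()))  # [('death', 4), ('music', 4)]
```

With k = 1, each "death" sample can only interpolate toward the other one, so both synthetic rows lie on the segment between [0.0, 0.1] and [0.1, 0.0]. Because the new rows are interpolations rather than copies, the balanced classes gain variety instead of exact duplicates, which is the usual motivation for choosing SMOTE over plain random oversampling.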
