The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-I...

Bibliographic Details
Main Authors: Nur Syafiqah, Mohd Nafis; Suryanti, Awang
Format: Article
Language: English
Published: University of Baghdad-College of Science 2020
Subjects: T Technology (General)
Online Access: http://umpir.ump.edu.my/id/eprint/30746/8/The%20Evaluation%20of%20Accuracy%20Performance.pdf
http://umpir.ump.edu.my/id/eprint/30746/
https://doi.org/10.24996/ijs.2020.61.12.28
Institution: Universiti Malaysia Pahang
id my.ump.umpir.30746
record_format eprints
spelling my.ump.umpir.30746 2021-02-25T03:14:50Z http://umpir.ump.edu.my/id/eprint/30746/ The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification Nur Syafiqah, Mohd Nafis Suryanti, Awang T Technology (General) Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique is able to measure a feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence to obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research runs sets of experiments using a text dataset retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents. University of Baghdad-College of Science 2020-12-31 Article PeerReviewed pdf en cc_by_4 http://umpir.ump.edu.my/id/eprint/30746/8/The%20Evaluation%20of%20Accuracy%20Performance.pdf Nur Syafiqah, Mohd Nafis and Suryanti, Awang (2020) The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification. Iraqi Journal of Science, 61 (12). pp. 3397-3407. ISSN 0067-2904 https://doi.org/10.24996/ijs.2020.61.12.28
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic T Technology (General)
spellingShingle T Technology (General)
Nur Syafiqah, Mohd Nafis
Suryanti, Awang
The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification
description Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique is able to measure a feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence to obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research runs sets of experiments using a text dataset retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
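The two-stage selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how the TF-IDF filter picks its subset, so ranking terms by mean TF-IDF is an assumption, as are the hand-rolled subgradient SVM, the toy posts, and every function name and parameter below (`tfidf`, `tfidf_filter`, `train_linear_svm`, `svm_rfe`, `keep`, `n_keep`).

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, X): X[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Assumes every document contains at least one whitespace-separated token.
    """
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    df = Counter(t for doc in tokens for t in set(doc))  # document frequency
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in vocab}
    X = []
    for doc in tokens:
        tf = Counter(doc)
        X.append([tf[t] / len(doc) * idf[t] for t in vocab])
    return vocab, X


def tfidf_filter(vocab, X, keep):
    """Stage 1 (filter, assumed rule): keep the `keep` terms with highest mean TF-IDF."""
    mean = [sum(row[j] for row in X) / len(X) for j in range(len(vocab))]
    top = sorted(sorted(range(len(vocab)), key=lambda j: -mean[j])[:keep])
    return [vocab[j] for j in top], [[row[j] for j in top] for row in X]


def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM fitted by subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1
            for j, xj in enumerate(xi):
                w[j] -= lr * (lam * w[j] - (yi * xj if violated else 0.0))
            if violated:
                b += lr * yi
    return w, b


def svm_rfe(names, X, y, n_keep):
    """Stage 2 (backward elimination): retrain, then drop the feature with smallest w_j^2."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        del active[min(range(len(active)), key=lambda k: w[k] ** 2)]
    return [names[j] for j in active]


# Toy training posts (made up for illustration); labels are +1 / -1.
docs = [
    "great phone love it",
    "love this great service",
    "terrible phone hate it",
    "hate this terrible service",
]
labels = [1, 1, -1, -1]

vocab, X = tfidf(docs)
kept_names, X_kept = tfidf_filter(vocab, X, keep=6)
selected = svm_rfe(kept_names, X_kept, labels, n_keep=2)
print(selected)
```

In a full pipeline the selected terms would then be used to train the final SVM classifier on the training split and to score the held-out test split, as the abstract outlines. A library implementation of recursive feature elimination would normally replace the hand-written loop.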
format Article
author Nur Syafiqah, Mohd Nafis
Suryanti, Awang
author_facet Nur Syafiqah, Mohd Nafis
Suryanti, Awang
author_sort Nur Syafiqah, Mohd Nafis
title The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification
title_sort evaluation of accuracy performance in an enhanced embedded feature selection for unstructured text classification
publisher University of Baghdad-College of Science
publishDate 2020
url http://umpir.ump.edu.my/id/eprint/30746/8/The%20Evaluation%20of%20Accuracy%20Performance.pdf
http://umpir.ump.edu.my/id/eprint/30746/
https://doi.org/10.24996/ijs.2020.61.12.28
_version_ 1692991973033508864