A review of feature selection on text classification

Textual data is high-dimensional. In high-dimensional data, the number of features exceeds the number of samples, which in turn increases the amount of noise and the number of irrelevant features. At this point, dimensionality reduction is necessary. Feature selection is an example of dimensionality re...

Bibliographic Details
Main Authors:	Nur Syafiqah, Mohd Nafis; Suryanti, Awang
Format:	Conference or Workshop Item
Language:	English
Published:	Universiti Malaysia Pahang 2018
Subjects:	QA76 Computer software
Online Access:	http://umpir.ump.edu.my/id/eprint/23030/7/A%20Review%20of%20Feature%20Selection%20on%20Text2.pdf http://umpir.ump.edu.my/id/eprint/23030/ http://ncon-pgr.ump.edu.my/index.php/en/download/proceedings-book
Institution:	Universiti Malaysia Pahang

id	my.ump.umpir.23030
record_format	eprints
spelling	my.ump.umpir.23030 2019-07-24T01:17:04Z http://umpir.ump.edu.my/id/eprint/23030/ A review of feature selection on text classification Nur Syafiqah, Mohd Nafis; Suryanti, Awang QA76 Computer software Textual data is high-dimensional. In high-dimensional data, the number of features exceeds the number of samples, which in turn increases the amount of noise and the number of irrelevant features. At this point, dimensionality reduction is necessary. Feature selection is one dimensionality reduction technique, and it has become an indispensable component of classification. In this paper, we present three feature selection approaches: filter, wrapper and embedded. Their aims, advantages and disadvantages are briefly explained. This study also reviews several significant studies of each feature selection approach for text classification. Based on these studies, we found that the wrapper approach is less used by researchers, since it is prone to over-fitting and to local optima in text classification, while the filter and embedded approaches have attracted more research. However, in the filter approach, classification accuracy cannot be guaranteed because it does not incorporate any learning algorithm. Therefore, we conclude that embedded feature selection can offer promising classification performance in terms of classification accuracy and computational time. Universiti Malaysia Pahang 2018-08 Conference or Workshop Item PeerReviewed pdf en http://umpir.ump.edu.my/id/eprint/23030/7/A%20Review%20of%20Feature%20Selection%20on%20Text2.pdf Nur Syafiqah, Mohd Nafis and Suryanti, Awang (2018) A review of feature selection on text classification. In: Proceedings Book: National Conference for Postgraduate Research (NCON-PGR 2018), 28-29 August 2018, Universiti Malaysia Pahang, Gambang, Pahang. pp. 8-14. ISBN 978-967-22260-5-5 http://ncon-pgr.ump.edu.my/index.php/en/download/proceedings-book
institution	Universiti Malaysia Pahang
building	UMP Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Pahang
content_source	UMP Institutional Repository
url_provider	http://umpir.ump.edu.my/
language	English
topic	QA76 Computer software
spellingShingle	QA76 Computer software Nur Syafiqah, Mohd Nafis Suryanti, Awang A review of feature selection on text classification
description	Textual data is high-dimensional. In high-dimensional data, the number of features exceeds the number of samples, which in turn increases the amount of noise and the number of irrelevant features. At this point, dimensionality reduction is necessary. Feature selection is one dimensionality reduction technique, and it has become an indispensable component of classification. In this paper, we present three feature selection approaches: filter, wrapper and embedded. Their aims, advantages and disadvantages are briefly explained. This study also reviews several significant studies of each feature selection approach for text classification. Based on these studies, we found that the wrapper approach is less used by researchers, since it is prone to over-fitting and to local optima in text classification, while the filter and embedded approaches have attracted more research. However, in the filter approach, classification accuracy cannot be guaranteed because it does not incorporate any learning algorithm. Therefore, we conclude that embedded feature selection can offer promising classification performance in terms of classification accuracy and computational time.
format	Conference or Workshop Item
author	Nur Syafiqah, Mohd Nafis; Suryanti, Awang
author_facet	Nur Syafiqah, Mohd Nafis; Suryanti, Awang
author_sort	Nur Syafiqah, Mohd Nafis
title	A review of feature selection on text classification
title_sort	review of feature selection on text classification
publisher	Universiti Malaysia Pahang
publishDate	2018
url	http://umpir.ump.edu.my/id/eprint/23030/7/A%20Review%20of%20Feature%20Selection%20on%20Text2.pdf http://umpir.ump.edu.my/id/eprint/23030/ http://ncon-pgr.ump.edu.my/index.php/en/download/proceedings-book
_version_	1643669503556452352
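
As an illustration of the filter approach named in the abstract above, here is a minimal sketch that ranks terms by a chi-square statistic computed from document counts alone, with no learning algorithm involved. It is not taken from the paper: the toy corpus, the function names `chi_square_scores` and `select_top_k`, and the choice of chi-square as the scoring function are all assumptions made for the example.

```python
from collections import Counter

def chi_square_scores(docs, labels, positive):
    """Score each term by the chi-square statistic of its 2x2 term/class table.

    Hypothetical helper for illustration; terms are whitespace tokens and
    each term is counted once per document (document frequency).
    """
    n = len(docs)
    doc_terms = [set(d.lower().split()) for d in docs]
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_all = Counter(t for terms in doc_terms for t in terms)
    df_pos = Counter(
        t for terms, y in zip(doc_terms, labels) if y == positive for t in terms
    )
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]   # positive documents containing the term
        b = df - a         # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy corpus (made up for this sketch).
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes for review",
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels, positive="spam")
selected = select_top_k(scores, k=3)
print(selected)
```

Because the scores depend only on term and class counts, the selection is cheap to compute, but nothing in it checks how a downstream classifier performs on the selected terms, which is the limitation of the filter approach that the abstract points out.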
