Optimal feature selection for learning-based algorithms for sentiment classification

Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are popular and widely used in v...

Bibliographic Details
Main Authors: WANG, Zhaoxia, LIN, Zhiping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/5887
https://ink.library.smu.edu.sg/context/sis_research/article/6882/viewcontent/Wang_Lin2020_Article_OptimalFeatureSelectionForLear__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6882
record_format dspace
spelling sg-smu-ink.sis_research-6882 2021-06-11T06:33:41Z Optimal feature selection for learning-based algorithms for sentiment classification WANG, Zhaoxia LIN, Zhiping
2020-01-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/5887
info:doi/10.1007/s12559-019-09669-5
https://ink.library.smu.edu.sg/context/sis_research/article/6882/viewcontent/Wang_Lin2020_Article_OptimalFeatureSelectionForLear__1_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Machine learning; feature selection; Optimal feature selection; relationship analysis; sentiment classification; social media; text analysis; Computational Engineering; Databases and Information Systems; Theory and Algorithms
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Machine learning
feature selection
Optimal feature selection
relationship analysis
sentiment classification
social media
text analysis
Computational Engineering
Databases and Information Systems
Theory and Algorithms
description Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are popular and widely used in various applications. Many enhancement strategies have been used to improve the performance of learning-based methods. Feature selection is one of these strategies, and it has been studied by many researchers. However, an unsolved and difficult problem is how to choose a suitable number of features to obtain the best sentiment classification performance from learning-based methods. We therefore investigate the relationship between the number of features selected and the sentiment classification performance of learning-based methods. A new method for selecting a suitable number of features is proposed, in which the Chi Square feature selection algorithm is employed and the features are selected using a preset score threshold. We find that there is a relationship between the logarithm of the number of features selected and the sentiment classification performance of the learning-based method, and that this relationship is independent of the learning-based method involved. These findings indicate that it is always possible for researchers to select the appropriate number of features for learning-based methods to obtain the best sentiment classification performance. This can guide researchers in selecting the proper features to optimize the performance of learning-based algorithms. (A preliminary version of this paper received a Best Paper Award at the International Conference on Extreme Learning Machines 2018.)
format text
author WANG, Zhaoxia
LIN, Zhiping
author_facet WANG, Zhaoxia
LIN, Zhiping
author_sort WANG, Zhaoxia
title Optimal feature selection for learning-based algorithms for sentiment classification
title_sort optimal feature selection for learning-based algorithms for sentiment classification
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/5887
https://ink.library.smu.edu.sg/context/sis_research/article/6882/viewcontent/Wang_Lin2020_Article_OptimalFeatureSelectionForLear__1_.pdf
_version_ 1770575640277286912
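
The description in this record mentions selecting features with the Chi Square algorithm and a preset score threshold. The following is a minimal sketch of that general idea, not the authors' implementation: the toy corpus, the function names chi_square_scores and select_by_threshold, the use of term presence (rather than counts), and the threshold values are all assumptions made for illustration. It scores each term against one class with the standard 2x2 Chi Square statistic and keeps the terms whose score meets the threshold.

```python
from collections import Counter

# Toy labelled corpus (invented for illustration; not data from the paper).
DOCS = [
    ("good great film", "pos"),
    ("great acting good plot", "pos"),
    ("good fun", "pos"),
    ("bad boring film", "neg"),
    ("boring plot bad acting", "neg"),
    ("bad dull", "neg"),
]


def chi_square_scores(docs, target="pos"):
    """Chi Square score of each term against one class, using term presence."""
    n = len(docs)
    n_target = sum(1 for _, label in docs if label == target)
    with_term = Counter()         # documents containing the term
    with_term_target = Counter()  # ... that also belong to the target class
    for text, label in docs:
        for term in set(text.split()):
            with_term[term] += 1
            if label == target:
                with_term_target[term] += 1
    scores = {}
    for term, total in with_term.items():
        a = with_term_target[term]  # term present, target class
        b = total - a               # term present, other class
        c = n_target - a            # term absent, target class
        d = n - n_target - b        # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_by_threshold(scores, threshold):
    """Keep every term whose score is at least the preset threshold."""
    return sorted(t for t, s in scores.items() if s >= threshold)


scores = chi_square_scores(DOCS)
for threshold in (0.5, 2.0, 5.0):  # hypothetical threshold values
    selected = select_by_threshold(scores, threshold)
    print(threshold, len(selected), selected)
```

Raising the threshold can only shrink the selected set, so sweeping it gives a range of feature counts. To examine the kind of relationship the description refers to, one would train a classifier at each count and compare its performance against the logarithm of that count; that step is left out here because it depends on the data and classifier used.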