On strategies for imbalanced text classification using SVM: A comparative study

Many real-world text classification tasks involve imbalanced training examples. However, the strategies proposed to address imbalanced classification (e.g., resampling, instance weighting) have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study o...

Bibliographic Details
Main Authors:	SUN, Aixin; LIM, Ee Peng; LIU, Ying
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2009
Subjects:	Imbalanced text classification; Support Vector Machines; SVM; Resampling; Instance weighting; Databases and Information Systems; Numerical Analysis and Scientific Computing
Online Access:	https://ink.library.smu.edu.sg/sis_research/757 https://ink.library.smu.edu.sg/context/sis_research/article/1756/viewcontent/1_s2.0_S0167923609001754_main.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-1756
record_format	dspace
spelling	sg-smu-ink.sis_research-1756 2018-06-25T03:48:56Z 2009-12-01T08:00:00Z text application/pdf info:doi/10.1016/j.dss.2009.07.011 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Imbalanced text classification; Support Vector Machines; SVM; Resampling; Instance weighting; Databases and Information Systems; Numerical Analysis and Scientific Computing
description	Many real-world text classification tasks involve imbalanced training examples. However, the strategies proposed to address imbalanced classification (e.g., resampling, instance weighting) have not been systematically evaluated in the text domain. In this paper, we conduct a comparative study on the effectiveness of these strategies in the context of imbalanced text classification using the Support Vector Machine (SVM) classifier. SVM is the focus of this study because of the good classification accuracy reported for it in many text classification tasks. We propose a taxonomy that organizes the proposed strategies according to the training and test phases of a text classification task. Based on the taxonomy, we survey the methods proposed to address imbalanced classification. Among them, 10 commonly used methods were evaluated in our experiments on three benchmark datasets: Reuters-21578, 20-Newsgroups, and WebKB. Using the area under the Precision–Recall Curve as the performance measure, our experimental results showed that the best decision surface was often learned by the standard SVM, not coupled with any of the proposed strategies. We believe this negative finding will benefit both researchers and application developers in the area by directing more attention to thresholding strategies.
format	text
author	SUN, Aixin; LIM, Ee Peng; LIU, Ying
title	On strategies for imbalanced text classification using SVM: A comparative study
publisher	Institutional Knowledge at Singapore Management University
publishDate	2009
url	https://ink.library.smu.edu.sg/sis_research/757 https://ink.library.smu.edu.sg/context/sis_research/article/1756/viewcontent/1_s2.0_S0167923609001754_main.pdf
_version_	1770570702270758912
