A generalized cluster centroid based classifier for text categorization

In this paper, a Generalized Cluster Centroid based Classifier (GCCC) and its variants for text categorization are proposed by utilizing a clustering algorithm to integrate two well-known classifiers, i.e., the K-nearest-neighbor (KNN) classifier and the Rocchio classifier. KNN, a lazy learning metho...

Bibliographic Details
Main Authors: PANG, Guansong, JIANG, Shengyi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects:
KNN
Online Access: https://ink.library.smu.edu.sg/sis_research/7028
https://ink.library.smu.edu.sg/context/sis_research/article/8031/viewcontent/1000006552265.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8031
record_format dspace
spelling sg-smu-ink.sis_research-8031 2022-03-17T14:58:56Z 2012-11-01T07:00:00Z text application/pdf info:doi/10.1016/j.ipm.2012.10.003 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Text categorization
KNN
Rocchio
Clustering
Generalized cluster centroid
Artificial Intelligence and Robotics
Databases and Information Systems
description This paper proposes a Generalized Cluster Centroid based Classifier (GCCC) and its variants for text categorization, which use a clustering algorithm to integrate two well-known classifiers: the K-nearest-neighbor (KNN) classifier and the Rocchio classifier. KNN, a lazy learning method, is remarkably effective but inefficient in online categorization. Rocchio categorizes efficiently but fails to obtain an expressive categorization model because of its inherent linear separability assumption. The proposed method has two main points: it uses a clustering algorithm to strengthen the expressiveness of the Rocchio model, and it employs the improved Rocchio model to speed up the categorization process of KNN. Extensive experiments on both English and Chinese corpora show that GCCC and its variants have better categorization ability than several state-of-the-art classifiers, namely Rocchio, KNN and Support Vector Machine (SVM).
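The idea in the description can be illustrated with a minimal sketch. It is not the authors' GCCC algorithm, whose details are not given in this record. It assumes dense term-weight vectors, cosine similarity, and a hypothetical single-pass clustering run inside each class; the function names, the `threshold` parameter, and the toy data are all illustrative assumptions. A very low threshold yields one centroid per class (the plain per-class centroid, i.e. Rocchio with no negative-example term), while a higher threshold yields several centroids per class, which lets a class made of two separate groups be represented.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_cluster_centroids(docs, labels, threshold):
    """Return a list of (centroid, label) prototypes.

    Assumption: single-pass clustering within each class. A document joins
    the most similar existing cluster of its own class if the similarity to
    that cluster's centroid reaches `threshold`; otherwise it starts a new one.
    """
    clusters = defaultdict(list)  # label -> list of clusters (lists of vectors)
    for vec, label in zip(docs, labels):
        best, best_sim = None, threshold
        for members in clusters[label]:
            sim = cosine(vec, mean(members))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters[label].append([vec])
        else:
            best.append(vec)
    return [(mean(m), label) for label, cs in clusters.items() for m in cs]


def predict(prototypes, vec):
    """Assign the label of the most similar centroid."""
    return max(prototypes, key=lambda p: cosine(vec, p[0]))[1]


# Toy data: class "A" consists of two separate groups, class "B" lies between them.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.8], [1.0, 0.6]]
labels = ["A", "A", "A", "A", "B", "B"]
query = [1.0, 0.05]  # close to the first group of class "A"

single = fit_cluster_centroids(docs, labels, threshold=-1.0)  # one centroid per class
multi = fit_cluster_centroids(docs, labels, threshold=0.8)    # several per class

print(predict(single, query))  # "B": the single "A" centroid falls between its two groups
print(predict(multi, query))   # "A": one of the two "A" centroids matches the query
```

In this sketch the query is compared only against the centroids rather than against every training document, which is the sense in which a centroid model can make nearest-neighbor style categorization cheaper.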
format text
author PANG, Guansong
JIANG, Shengyi
title A generalized cluster centroid based classifier for text categorization
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/sis_research/7028
https://ink.library.smu.edu.sg/context/sis_research/article/8031/viewcontent/1000006552265.pdf