A generalized cluster centroid based classifier for text categorization

Bibliographic Details
Main Authors: PANG, Guansong, JIANG, Shengyi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects:
KNN
Online Access: https://ink.library.smu.edu.sg/sis_research/7028
https://ink.library.smu.edu.sg/context/sis_research/article/8031/viewcontent/1000006552265.pdf
Institution: Singapore Management University
Description
Summary: In this paper, a Generalized Cluster Centroid based Classifier (GCCC) and its variants for text categorization are proposed. They use a clustering algorithm to integrate two well-known classifiers, the K-nearest-neighbor (KNN) classifier and the Rocchio classifier. KNN, a lazy learning method, achieves remarkable effectiveness but is inefficient in online categorization. Rocchio categorizes efficiently but fails to obtain an expressive categorization model because of its inherent linear separability assumption. The proposed method focuses on two points: first, a clustering algorithm is used to strengthen the expressiveness of the Rocchio model; second, the improved Rocchio model is employed to speed up the categorization process of KNN. Extensive experiments conducted on both English and Chinese corpora show that GCCC and its variants have better categorization ability than some state-of-the-art classifiers, namely Rocchio, KNN and Support Vector Machine (SVM).
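
The general idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not say which clustering algorithm, similarity measure, or parameters the paper uses, so the one-pass threshold clustering, the cosine similarity, the `threshold` value, the toy vectors, and all function names below are assumptions made for illustration only. Each class is split into clusters, every cluster centroid is kept as a labelled prototype (a Rocchio-style model with several centroids per class), and a new document takes the label of its most similar centroid, so it is compared with a handful of centroids rather than with every training document as in plain KNN.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_class(vectors, threshold):
    """One-pass clustering (an assumed stand-in for the paper's algorithm).

    A vector joins the most similar existing cluster if the similarity to
    that cluster's centroid reaches `threshold`; otherwise it opens a new one.
    Returns the list of cluster centroids.
    """
    clusters = []
    for v in vectors:
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(v, mean(members))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([v])
        else:
            best.append(v)
    return [mean(members) for members in clusters]


def fit(docs, labels, threshold=0.8):
    """Cluster each class separately; return (centroid, label) prototypes."""
    by_class = defaultdict(list)
    for v, y in zip(docs, labels):
        by_class[y].append(v)
    model = []
    for y, vecs in by_class.items():
        model.extend((c, y) for c in cluster_class(vecs, threshold))
    return model


def predict(model, doc):
    """Label of the centroid most similar to `doc` (nearest-centroid rule)."""
    return max(model, key=lambda proto: cosine(doc, proto[0]))[1]


# Toy data (hypothetical): class "A" covers two unrelated subtopics.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 1], [0, 0.1, 0.9],
        [0, 1, 0], [0.1, 0.9, 0]]
labels = ["A", "A", "A", "A", "B", "B"]
model = fit(docs, labels)
```

With this toy data, class "A" ends up with two centroids and class "B" with one, and `predict(model, [0, 0.2, 0.8])` is scored against three prototypes instead of six training documents. A KNN-style variant could vote over the few most similar centroids instead of taking only the top one; that too would be an assumption rather than something stated in this record.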