An improved K-nearest-neighbor algorithm for text categorization

Text categorization is an important tool for managing and organizing rapidly growing volumes of text data. Many text categorization algorithms have been explored in the literature, such as KNN, Naive Bayes and Support Vector Machine. KNN text categorization is an effective but less efficient classification metho...

Bibliographic Details
Main Authors:	JIANG, Shengyi; PANG, Guansong; WU, Meiling; KUANG, Limin
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2012
Subjects:	Text categorization; KNN text categorization; One-pass clustering; Spam filtering; Databases and Information Systems; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/7542 https://ink.library.smu.edu.sg/context/sis_research/article/8545/viewcontent/1_s2.0_S0957417411011511_main.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8545
record_format	dspace
spelling	sg-smu-ink.sis_research-8545 2022-11-29T07:10:24Z 2012-01-01T08:00:00Z text application/pdf info:doi/10.1016/j.eswa.2011.08.040 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Text categorization; KNN text categorization; One-pass clustering; Spam filtering; Databases and Information Systems; Theory and Algorithms
description	Text categorization is an important tool for managing and organizing rapidly growing volumes of text data. Many text categorization algorithms have been explored in the literature, such as KNN, Naive Bayes and Support Vector Machine. KNN text categorization is an effective but less efficient classification method. In this paper, we propose an improved KNN algorithm for text categorization, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm can substantially reduce the number of text similarity computations and outperform state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it scales well in many real-world applications. (C) 2011 Elsevier Ltd. All rights reserved.
format	text
author	JIANG, Shengyi; PANG, Guansong; WU, Meiling; KUANG, Limin
title	An improved K-nearest-neighbor algorithm for text categorization
publisher	Institutional Knowledge at Singapore Management University
publishDate	2012
url	https://ink.library.smu.edu.sg/sis_research/7542 https://ink.library.smu.edu.sg/context/sis_research/article/8545/viewcontent/1_s2.0_S0957417411011511_main.pdf
_version_	1770576369202233344
