On machine learning methods for Chinese document classification

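This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well-organized data sets. While kNN and ARAM yielded better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. In terms of efficiency, kNN was notably more costly in time and memory than the other two methods. SVM was highly efficient in learning from well-organized samples of moderate size, although on relatively large and noisy data the efficiency of SVM and ARAM was comparable.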

Bibliographic Details
Main Authors: HE, Ji, TAN, Ah-hwee, TAN, Chew-Lim
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2003
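Subjects: text categorization, machine learning, comparative experiments, Artificial Intelligence and Robotics, Databases and Information Systems, Software Engineering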
Online Access: https://ink.library.smu.edu.sg/sis_research/5243
https://ink.library.smu.edu.sg/context/sis_research/article/6246/viewcontent/download__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6246
record_format dspace
spelling sg-smu-ink.sis_research-6246
2020-07-23T18:23:29Z
2003-05-01T07:00:00Z
text
application/pdf
info:doi/10.1023%2FA%3A1023202221875
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic text categorization
machine learning
comparative experiments
Artificial Intelligence and Robotics
Databases and Information Systems
Software Engineering
description This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well-organized data sets. While kNN and ARAM yielded better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. In terms of efficiency, kNN was notably more costly in time and memory than the other two methods. SVM was highly efficient in learning from well-organized samples of moderate size, although on relatively large and noisy data the efficiency of SVM and ARAM was comparable.
format text
author HE, Ji
TAN, Ah-hwee
TAN, Chew-Lim
title On machine learning methods for Chinese document classification
publisher Institutional Knowledge at Singapore Management University
publishDate 2003
url https://ink.library.smu.edu.sg/sis_research/5243
https://ink.library.smu.edu.sg/context/sis_research/article/6246/viewcontent/download__1_.pdf