Efficient text classification

As the digital age pushes forward, data and document size have been increasing rapidly. A more efficient and accurate method of sampling data for training text classifiers is required. We require good samples and not just blind samples from Simple Random Sampling, therefore we experimented on a new...

Bibliographic Details
Main Author:	Tan, Cheryl Qian Ru.
Other Authors:	Manoranjan Dash
Format:	Final Year Project
Language:	English
Published:	2010
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Online Access:	http://hdl.handle.net/10356/39727
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-39727
record_format	dspace
spelling	sg-ntu-dr.10356-397272023-03-03T20:47:47Z Efficient text classification Tan, Cheryl Qian Ru. Manoranjan Dash School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing As the digital age pushes forward, data and document size have been increasing rapidly. A more efficient and accurate method of sampling data for training text classifiers is required. We require good samples and not just blind samples from Simple Random Sampling, therefore we experimented on a new proposed sampling algorithm – CONCISE. It is a novel sampling algorithm that is proposed for selecting training documents for text classification and experiments showed that it works particularly well with small sampling ratio. Experiments were conducted on the 20 Newsgroup corpus and Reuters 21578 document set using two classifiers SVM and Naïve Bayes classifier. CONCISE is compared with SRS in all experiments and results showed that CONCISE is consistent in accuracy no matter which classifier is used. In all experiments, CONCISE outperforms SRS in all sampling ratios and the accuracy with CONCISE is higher. However, CONCISE requires more running time but the trade off is small compared to the increase in accuracy. Bachelor of Engineering (Computer Science) 2010-06-03T06:38:30Z 2010-06-03T06:38:30Z 2010 2010 Final Year Project (FYP) http://hdl.handle.net/10356/39727 en Nanyang Technological University 57 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
spellingShingle	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing Tan, Cheryl Qian Ru. Efficient text classification
description	As the digital age advances, data volumes and document sizes have been increasing rapidly, and a more efficient and accurate method of sampling data for training text classifiers is required. Good samples are needed rather than the blind samples produced by Simple Random Sampling (SRS), so we experimented with a newly proposed sampling algorithm, CONCISE. It is a novel sampling algorithm for selecting training documents for text classification, and experiments showed that it works particularly well with small sampling ratios. Experiments were conducted on the 20 Newsgroups corpus and the Reuters-21578 document set using two classifiers, SVM and Naïve Bayes. CONCISE was compared with SRS in all experiments, and the results showed that CONCISE is consistent in accuracy regardless of which classifier is used. In all experiments, CONCISE achieved higher accuracy than SRS at every sampling ratio. CONCISE requires more running time, but this cost is small compared with the increase in accuracy.
author2	Manoranjan Dash
author_facet	Manoranjan Dash Tan, Cheryl Qian Ru.
format	Final Year Project
author	Tan, Cheryl Qian Ru.
author_sort	Tan, Cheryl Qian Ru.
title	Efficient text classification
title_short	Efficient text classification
title_full	Efficient text classification
title_fullStr	Efficient text classification
title_full_unstemmed	Efficient text classification
title_sort	efficient text classification
publishDate	2010
url	http://hdl.handle.net/10356/39727
_version_	1759853436988293120
