Short text classification

In the information age, short texts are encountered frequently and in large quantities on the web. Fundamental text mining techniques fail to achieve high accuracy because short texts are much shorter, noisier and sparser. Hence an efficient way is needed to process and categ...

Bibliographic Details
Main Author: Nagarajan, Divya
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2013
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/52084
Institution: Nanyang Technological University
id sg-ntu-dr.10356-52084
record_format dspace
spelling sg-ntu-dr.10356-52084 2023-03-03T20:32:48Z Short text classification Nagarajan, Divya Sun Aixin School of Computer Engineering DRNTU::Engineering Bachelor of Engineering (Computer Science) 2013-04-22T05:21:18Z 2013-04-22T05:21:18Z 2013 2013 Final Year Project (FYP) http://hdl.handle.net/10356/52084 en Nanyang Technological University 33 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
spellingShingle DRNTU::Engineering
Nagarajan, Divya
Short text classification
description In the information age, short texts are encountered frequently and in large quantities on the web. Fundamental text mining techniques fail to achieve high accuracy because short texts are much shorter, noisier and sparser. Hence an efficient way is needed to process and categorise them so that this information can be used to improve the performance of systems that deal with such data. The aim of this project is to classify a given piece of short text as accurately as possible. First, an existing algorithm was implemented to categorise a given piece of short text. The method first picks the most representative and topically indicative words from the short text. These are the query words used to perform the search. From the results retrieved, the category with the majority vote is chosen as the category label of the short text. Following this, the algorithm was enhanced using clustering and relevance ranking. The enhancement improved performance, and classification accuracy increased relative to the original algorithm.
author2 Sun Aixin
author_facet Sun Aixin
Nagarajan, Divya
format Final Year Project
author Nagarajan, Divya
author_sort Nagarajan, Divya
title Short text classification
title_short Short text classification
title_full Short text classification
title_fullStr Short text classification
title_full_unstemmed Short text classification
title_sort short text classification
publishDate 2013
url http://hdl.handle.net/10356/52084
_version_ 1759858356927856640
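
The query-and-vote procedure outlined in the description (pick representative words, search with them, take the majority category of the results) can be illustrated with a minimal Python sketch. This is not the project's implementation: the record does not say how query words are selected or what collection is searched, so the IDF-based word selection, the word-overlap search, the toy `CORPUS`, and all function names below are assumptions made purely for illustration. The clustering and relevance-ranking enhancement is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list and labelled collection, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is", "at"}

CORPUS = [
    ("stock markets rally as bank shares climb", "business"),
    ("central bank raises interest rates", "business"),
    ("investors sell shares after profit warning", "business"),
    ("striker scores twice in cup final", "sport"),
    ("tennis champion wins final in straight sets", "sport"),
    ("coach praises team after league win", "sport"),
]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def select_query_words(short_text, corpus, k=3):
    """Assumed selection rule: keep the k words with the highest IDF in the corpus.

    Words that never occur in the corpus are dropped because they cannot
    retrieve anything.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for text, _ in corpus:
        doc_freq.update(set(tokenize(text)))

    def idf(word):
        return math.log((1 + n_docs) / (1 + doc_freq[word]))

    candidates = [w for w in set(tokenize(short_text)) if doc_freq[w] > 0]
    # Sort by descending IDF, breaking ties alphabetically for determinism.
    return sorted(candidates, key=lambda w: (-idf(w), w))[:k]


def search(query_words, corpus, top_n=5):
    """Assumed search: rank labelled documents by how many query words they contain."""
    scored = []
    for text, category in corpus:
        overlap = len(set(query_words) & set(tokenize(text)))
        if overlap:
            scored.append((overlap, category))
    scored.sort(key=lambda pair: -pair[0])
    return [category for _, category in scored[:top_n]]


def classify(short_text, corpus, k=3, top_n=5):
    """Label the short text with the majority category among the retrieved results."""
    query = select_query_words(short_text, corpus, k)
    votes = Counter(search(query, corpus, top_n))
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(classify("bank shares fall", CORPUS))
    print(classify("cup final tonight", CORPUS))
```

When no query word matches anything in the collection, `classify` returns `None` rather than guessing a label; a real system would need its own fallback policy.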