Short text classification

Bibliographic Details
Main Author: Nagarajan, Divya
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10356/52084
Institution: Nanyang Technological University
Description
Summary: In the information age, short texts are encountered on the web in many contexts and in large quantities. Fundamental text mining techniques fail to achieve high accuracy on them because short texts are much shorter, noisier and sparser than ordinary documents. An efficient way to process and categorise them is therefore needed, so that this information can be used to improve the performance of systems that deal with such data. The aim of this project is to classify a given piece of short text as accurately as possible. First, an existing algorithm was implemented to categorise a given piece of short text. The method selects the most representative and topically indicative words from the short text; these serve as the query words for a search. From the results retrieved, the category with the majority vote is chosen as the category label of the short text. This algorithm was then enhanced using clustering and relevance ranking. The enhancement yielded performance improvements, with higher classification accuracy than the original algorithm.
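The baseline pipeline described in the summary (select query words, run a search, take a majority vote over the retrieved categories) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the tiny labelled collection, the IDF-based word selection, the word-overlap search, and every function name are hypothetical stand-ins, since the record does not say how the words are chosen or what is searched. The clustering and relevance-ranking enhancement is not sketched, because the summary gives no detail on it.

```python
import math
from collections import Counter

# Hypothetical labelled collection; the record does not say what corpus is searched.
LABELLED_DOCS = [
    ("sports", "the football team won the league match after a late goal"),
    ("sports", "the tennis player won the final match in straight sets"),
    ("technology", "the new smartphone has a faster processor and better camera"),
    ("technology", "the software update improves laptop battery and processor speed"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def idf_table(docs):
    """Inverse document frequency of every word in the labelled collection."""
    doc_freq = Counter()
    for _, text in docs:
        doc_freq.update(set(tokenize(text)))
    return {word: math.log(len(docs) / count) for word, count in doc_freq.items()}


def select_query_words(short_text, idf, k=3):
    """Pick the k rarest known words as query words (assumed selection rule)."""
    known = {word for word in tokenize(short_text) if word in idf}
    return sorted(known, key=lambda word: (-idf[word], word))[:k]


def search(query_words, docs, top_n=3):
    """Return labels of the top_n documents ranked by query-word overlap."""
    scored = []
    for label, text in docs:
        overlap = len(set(query_words) & set(tokenize(text)))
        if overlap:
            scored.append((overlap, label))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep collection order
    return [label for _, label in scored[:top_n]]


def classify(short_text, docs=LABELLED_DOCS):
    """Label a short text by majority vote over the retrieved documents."""
    query_words = select_query_words(short_text, idf_table(docs))
    labels = search(query_words, docs)
    if not labels:
        return None  # nothing retrieved, so no vote can be taken
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("late goal wins football match"))
    print(classify("smartphone camera processor"))
```

In a realistic setting the `search` function would presumably be replaced by a query against a real index or search engine, and the vote would be taken over many more retrieved results than this toy collection allows.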