Automatic document classification

Sentiment analysis is increasingly viewed as a major research area of Natural Language Processing from both an academic and an industrial standpoint. Automatic classification of natural language units has become a major target of sentiment analysis. Current models on document classification, ho...

Bibliographic Details
Main Author:	Zhao, Zinian
Other Authors:	Chan Pack Kwong
Format:	Final Year Project
Language:	English
Published:	2017
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/71212
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-71212
record_format	dspace
spelling	sg-ntu-dr.10356-71212 2023-07-07T15:42:16Z Automatic document classification Zhao, Zinian Chan Pack Kwong Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Bachelor of Engineering 2017-05-15T07:45:21Z 2017-05-15T07:45:21Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/71212 en Nanyang Technological University 66 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Zhao, Zinian Automatic document classification
description	Sentiment analysis is increasingly viewed as a major research area of Natural Language Processing from both an academic and an industrial standpoint. Automatic classification of natural language units has become a major target of sentiment analysis. Current document classification models, however, are limited to short text spans and cannot classify news and articles accurately. This project presents a solution for classifying real news and articles: a pipelined system consisting of representation learning followed by classification. Various document representation learning methods and classification techniques were investigated. In total, nine models were built and evaluated on classifying real news and articles into three categories: Positive, Neutral and Negative. In our experiments, the model combining Word Embeddings (WE) and Random Forests (RF) outperformed all pre-existing models, with a classification accuracy as high as 60%. Furthermore, this report presents an automatic news analytics system built on the WE and RF model. Given a keyword, the system classifies related news extracted from online sources. This project has successfully designed models that perform accurate document classification of news and that can be used in a wide range of applications, such as analyzing market trends and making investment decisions.
author2	Chan Pack Kwong
author_facet	Chan Pack Kwong Zhao, Zinian
format	Final Year Project
author	Zhao, Zinian
author_sort	Zhao, Zinian
title	Automatic document classification
publishDate	2017
url	http://hdl.handle.net/10356/71212
