Classification on big data set using data analytics techniques
Main Author: | Chung, Ka Wai |
---|---|
Other Authors: | Chan Chee Keong |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
Online Access: | https://hdl.handle.net/10356/138594 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-138594 |
---|---|
record_format |
dspace |
school |
School of Electrical and Electronic Engineering |
degree |
Bachelor of Engineering (Electrical and Electronic Engineering) |
supervisor_email |
eckchan@ntu.edu.sg |
date_available |
2020-05-11T02:16:38Z |
last_modified |
2023-07-07T18:09:55Z |
project_no |
A3039-191 |
file_format |
application/pdf |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
topic |
Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
description |
The advancement of big data allows data analytics to grow as the amount of information that can be processed increases. As information becomes more readily available, programs can be created to extract, analyse and classify online social media messages and comments. Existing word dictionaries are built from older literary texts and documents, so they cannot pick up internet slang or languages that blend several dialects and languages, such as Singlish. This project aims to build a classification model, trained on a localised dataset from an online message board, that categorises comments as positive, negative or neutral in sentiment. Three classification approaches were explored and five models were built, with accuracies ranging from 57% to 64%. A voting classifier combining all five models achieved a higher accuracy of 65.5%. A chatbot was also programmed to interact with the classification models and evaluate the sentiment of user input. The project can be applied in social data analytics and metrics to gauge feedback from online comments on news and updates. |
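The description mentions a voting classifier that combines all five models, but the record does not say how the votes were combined. As a minimal sketch, assuming hard (majority) voting over each model's predicted label and a tie-break rule chosen here for illustration, the combination step could look like the Python below. The helper names `majority_vote`, `ensemble_predict` and `accuracy`, and the sample predictions, are hypothetical and not taken from the project.

```python
from collections import Counter
from typing import Sequence

# Assumed label set, sorted alphabetically. Ties go to the label listed
# first, which mirrors argmax over alphabetically ordered classes.
LABELS = ("negative", "neutral", "positive")


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label predicted by the most models (hard voting)."""
    counts = Counter(votes)
    # max() keeps the first maximal element it meets, so iterating over
    # LABELS in a fixed order makes tie-breaking deterministic.
    return max(LABELS, key=lambda label: counts[label])


def ensemble_predict(per_model_predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine predictions given as one list of labels per model."""
    # zip(*...) regroups the lists so each tuple holds every model's
    # vote for a single comment.
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of comments whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical outputs of five models on four comments.
    model_outputs = [
        ["positive", "negative", "neutral", "positive"],
        ["positive", "neutral", "neutral", "negative"],
        ["neutral", "negative", "positive", "positive"],
        ["positive", "negative", "neutral", "negative"],
        ["negative", "negative", "neutral", "neutral"],
    ]
    gold = ["positive", "negative", "neutral", "neutral"]
    combined = ensemble_predict(model_outputs)
    print(combined)  # ['positive', 'negative', 'neutral', 'negative']
    print(f"accuracy = {accuracy(combined, gold):.2f}")  # accuracy = 0.75
```

Hard voting needs only each model's final label, so it works with any mix of classifiers. Soft voting would instead average predicted class probabilities, which requires every model to expose them.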
author2 |
Chan Chee Keong |
author |
Chung, Ka Wai |
title |
Classification on big data set using data analytics techniques |