Classification on big data set using data analytics techniques
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/138594 |
Institution: | Nanyang Technological University |
Summary: | The advancement of big data allows data analytics to grow with the amount of information that can be processed. As information becomes more readily available, programs can be created to extract, analyse and classify online social media messages and comments. Existing word dictionaries are based on older literary texts and documents, so they cannot pick up slang used by internet users or languages that are an amalgamation of different dialects and languages, such as Singlish. The project aims to create a classification model, based on a localised dataset from an online message board, that categorises comments as positive, negative or neutral in sentiment. Three classification concepts were explored and five different models were generated, with accuracies ranging from 57% to 64%. A voting classifier combining all five models achieved a higher accuracy of 65.5%. A chatbot was also programmed to interact with the classification models and evaluate the sentiment of user input. This project can be used in social data analytics and metrics to gauge feedback from online comments on news and updates. |
---|
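The voting step mentioned in the summary can be illustrated with a minimal sketch. The record does not say whether the project used hard (majority) or soft (probability-averaged) voting, nor which five models were combined, so the code below assumes hard voting over label predictions. The function names `majority_vote` and `vote_all` and the sample predictions are hypothetical and are not taken from the project.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most models (hard voting).

    Assumption: ties are broken in favour of the earliest model in the
    list whose label is among the most frequent ones.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_all(per_model_predictions):
    """Combine predictions from several models, comment by comment.

    per_model_predictions holds one list per model; all lists are
    aligned so that index i refers to the same comment in each.
    """
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of five models for a single comment.
single = majority_vote(["positive", "negative", "positive", "neutral", "positive"])

# Hypothetical outputs of three models for two comments.
combined = vote_all([
    ["positive", "negative"],
    ["positive", "neutral"],
    ["negative", "neutral"],
])
```

Under these assumptions, an ensemble can outperform its individual members when the models make different mistakes, which is consistent with the reported gain from 57% to 64% for single models to 65.5% for the combination.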