Classification on big data set using data analytics techniques

Bibliographic Details
Main Author: Chung, Ka Wai
Other Authors: Chan Chee Keong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/138594
Institution: Nanyang Technological University
Description
Summary: The advancement of big data allows data analytics to grow with the amount of information that can be processed. As information becomes more readily available, programs can be created to extract, analyse and classify online social media messages and comments. Existing word dictionaries are based on older literary texts and documents. They cannot pick up slang used by internet users, or languages that blend different dialects and languages, such as Singlish. The project aims to create a classification model, built on a localised dataset from an online message board, that categorises comments as positive, negative or neutral in sentiment. A total of 3 classification concepts were explored and 5 different models were generated, with accuracies ranging from 57% to 64%. A voting classifier combining all 5 models achieved a higher accuracy of 65.5%. A chatbot was also programmed to interact with the classification models and evaluate the sentiment of user input. This project can be used in social data analytics and metrics to gauge feedback on online comments about news and updates.
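The summary does not say how the voting classifier combines its five member models. The sketch below assumes hard (majority) voting over each model's predicted label. The function names, the tie-breaking rule and the example predictions are hypothetical and are not taken from the project. With five voters and three classes a 2-2-1 split can occur, so some tie-break is needed. Here the tied label that appears first in the model order wins.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical label set, matching the three sentiment classes named in the summary.
LABELS = ("positive", "negative", "neutral")


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most models (hard voting).

    Assumption: ties are broken in favour of the tied label that appears
    first in `predictions`, i.e. by model order.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the predictions in model order and return the first tied winner.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def vote_all(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine predictions where per_model[m][i] is model m's label for comment i."""
    # zip(*per_model) yields, for each comment, the tuple of all models' labels.
    return [majority_vote(column) for column in zip(*per_model)]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of comments whose predicted label matches the reference label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Made-up predictions from five hypothetical models on three comments.
    per_model = [
        ["positive", "negative", "neutral"],
        ["positive", "neutral", "neutral"],
        ["negative", "negative", "positive"],
        ["positive", "negative", "neutral"],
        ["neutral", "negative", "positive"],
    ]
    gold = ["positive", "negative", "positive"]
    combined = vote_all(per_model)
    print(combined)                  # ['positive', 'negative', 'neutral']
    print(accuracy(combined, gold))  # 0.6666666666666666
```

Soft voting is a common alternative. It averages the models' predicted class probabilities instead of counting labels, and it would need each model to expose probabilities.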