Mining social media data

Bibliographic Details
Main Author: Teo, Kelvin Mo Sheng
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: 2016
Online Access: http://hdl.handle.net/10356/66709
Institution: Nanyang Technological University
Description
Summary: In recent years, there has been huge growth in the use of social media. Despite the large amount of social media data available, it is still not fully utilised. Hence, there is a need for social media mining to find patterns and make sense of the available data. This study sought to predict popular topics by examining them on Twitter over a time window of 7 days. Three classification algorithms, namely Decision Tree Classifiers, Naïve Bayes Classifiers and Support Vector Machines, were applied, and their performance was compared to find the most effective algorithm for mining two different types of class labels, Absolute and Relative Addressing. The results showed that Support Vector Machines produced more accurate results but took a substantial amount of time to process. Decision Tree Classifiers, on the other hand, took a much shorter time to process, yet were still able to predict with only slightly lower accuracy than Support Vector Machines. Therefore, mining Twitter data proves useful in predicting popular topics, and mining social media data can be an effective method for commercial purposes. While this study focuses on only three classification algorithms and one data set with two types of class labels, further studies on other social media platforms, algorithms and data sets could provide more accurate and comprehensive findings.
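
The comparison described in the summary, accuracy against processing time for several classifiers, can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the study's implementation. The two-feature synthetic data, the labels `popular` and `not_popular`, and the names `compare`, `make_data`, `GaussianNaiveBayes` and `DecisionStump` are all assumptions made for this example. The stump is a depth-one stand-in for a full decision tree, and a Support Vector Machine is omitted for brevity.

```python
import math
import random
import time
from collections import Counter, defaultdict


class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes: one mean and variance per class and feature."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            params = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Small constant avoids division by zero for constant features.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def _predict_one(self, row):
        def log_posterior(label):
            prior, params = self.stats[label]
            score = math.log(prior)
            for v, (mean, var) in zip(row, params):
                score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            return score
        return max(self.stats, key=log_posterior)

    def predict(self, X):
        return [self._predict_one(row) for row in X]


class DecisionStump:
    """Depth-1 decision tree: the one feature/threshold split with fewest training errors."""

    def fit(self, X, y):
        majority = Counter(y).most_common(1)[0][0]
        # Fallback if no valid split exists: always predict the majority class.
        self.feature, self.threshold = 0, math.inf
        self.left = self.right = majority
        best_errors = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                if not left or not right:
                    continue
                l_lab, l_n = Counter(left).most_common(1)[0]
                r_lab, r_n = Counter(right).most_common(1)[0]
                errors = len(y) - l_n - r_n
                if best_errors is None or errors < best_errors:
                    best_errors = errors
                    self.feature, self.threshold = f, t
                    self.left, self.right = l_lab, r_lab
        return self

    def predict(self, X):
        return [self.left if row[self.feature] <= self.threshold else self.right
                for row in X]


def compare(classifiers, X_train, y_train, X_test, y_test):
    """Return {name: (accuracy, seconds)} for each classifier."""
    results = {}
    for name, clf in classifiers.items():
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        predicted = clf.predict(X_test)
        elapsed = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)
        results[name] = (accuracy, elapsed)
    return results


def make_data(n, rng):
    """Hypothetical synthetic data: feature 0 separates the two classes."""
    X, y = [], []
    for _ in range(n):
        label = rng.choice(["popular", "not_popular"])
        centre = 2.0 if label == "popular" else -2.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)
results = compare(
    {"naive_bayes": GaussianNaiveBayes(), "decision_stump": DecisionStump()},
    X_train, y_train, X_test, y_test,
)
for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.4f}s")
```

Any object with `fit` and `predict` methods can be added to the dictionary passed to `compare`, so a Support Vector Machine from a library could be measured in the same way.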