Identifying behavioral anomalies in Twitter users

Bibliographic Details
Main Author: Teo, Yue Qi
Other Authors: Yeo Chai Kiat
Format: Final Year Project
Language: English
Published: 2016
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/66710
Institution: Nanyang Technological University
id sg-ntu-dr.10356-66710
record_format dspace
spelling sg-ntu-dr.10356-66710 2023-03-03T20:42:26Z Identifying behavioral anomalies in Twitter users Teo, Yue Qi Yeo Chai Kiat School of Computer Engineering DRNTU::Engineering Bachelor of Engineering (Computer Science) 2016-04-21T08:18:38Z 2016-04-21T08:18:38Z 2016 Final Year Project (FYP) http://hdl.handle.net/10356/66710 en Nanyang Technological University 99 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
spellingShingle DRNTU::Engineering
Teo, Yue Qi
Identifying behavioral anomalies in Twitter users
description Social media are now widely used for communication, B2C engagement and more. Twitter in particular has become a popular communication medium, and its popularity has drawn increasing attention from spammers and cyber attackers who seek to disrupt the Twitter experience by spreading spam, hacking accounts and so on. The objective of this project is therefore to identify behavioral anomalies in Twitter users: to detect suspicious accounts from the features that distinguish them from normal users, so that early detection can minimize the damage they are able to inflict. To achieve this, the Twitter REST API was first used to obtain the latest 50 tweets of 10,000 users, and two data mining algorithms were applied: Random Forest and Time Series Anomaly Detection. Approximately 15,000 tweets were manually labelled; the remaining tweets were then labelled automatically using the predictions of the Random Forest algorithm, which was implemented with the ‘caret’ R package in RStudio. The tweets dataset was then processed into per-user features, which formed the final dataset for behavioral anomaly identification; Anaconda’s Spyder was used as the processing tool. Finally, the Time Series Anomaly Detection algorithm, implemented with Twitter’s AnomalyDetection R package in RStudio, was used to identify abnormal tweeting frequencies over a period of time. The values of the users’ features were plotted to show the differences between anomalous and normal users, and the results show that behavioral anomalies can be identified in features such as retweet counts, follower counts, friend counts, the followers-to-friends ratio and others.
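The step of turning the tweets dataset into per-user features can be sketched roughly as follows. This is only an illustration in Python, not the project's actual processing script: the flat record layout and the names `user_features`, `user_id`, `tweet_count` and `mean_retweets` are assumptions made for the example, and the feature set is limited to the features named in the description.

```python
from collections import defaultdict
from statistics import mean


def user_features(tweets):
    """Collapse per-tweet records into one feature row per user.

    Each tweet is assumed (hypothetically) to be a flat dict with the keys
    user_id, retweet_count, followers_count and friends_count.
    """
    grouped = defaultdict(list)
    for tweet in tweets:
        grouped[tweet["user_id"]].append(tweet)

    features = {}
    for user_id, rows in grouped.items():
        latest = rows[-1]  # profile counts are read from the last record seen
        followers = latest["followers_count"]
        friends = latest["friends_count"]
        features[user_id] = {
            "tweet_count": len(rows),
            "mean_retweets": mean(r["retweet_count"] for r in rows),
            "followers_count": followers,
            "friends_count": friends,
            # The ratio is undefined for accounts that follow nobody.
            "followers_to_friends": followers / friends if friends else None,
        }
    return features
```

Rows produced this way could then be plotted or compared between users labelled anomalous and users labelled normal.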
In addition, anomalous users tend to show greater variation in tweeting frequency, at unexpected occasions and time frames, whereas normal users usually become much more active around special events, in this case the Christmas and New Year festive seasons. The results also show that anomalous users’ behaviors are versatile and more prone to change than those of normal users. Overall, the objective of the project was achieved, but some areas were left unstudied because of time constraints and limited manpower; these can be researched further in future to achieve better identification of behavioral anomalies.
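The idea of flagging abnormal tweeting frequency can be illustrated with a small sketch. The project used Twitter's AnomalyDetection R package; the Python below swaps in a much simpler median/MAD rule (a robust z-score) and does not model seasonality, so it is a stand-in for the idea only. The function names and the threshold of 3.5 are assumptions made for this example.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def daily_counts(tweet_dates):
    """Count tweets per calendar day, filling silent days with zero."""
    counts = Counter(tweet_dates)
    if not counts:
        return {}
    day, last = min(counts), max(counts)
    series = {}
    while day <= last:
        series[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return series


def flag_anomalous_days(series, threshold=3.5):
    """Return the days whose count lies far from the user's median count."""
    values = list(series.values())
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so the robust z-score is undefined
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        day
        for day, count in series.items()
        if 0.6745 * abs(count - med) / mad > threshold
    ]
```

A rule like this treats a festive-season burst and a spam burst alike, so separating the two would still need per-user context such as the labels described above.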
author2 Yeo Chai Kiat
author_facet Yeo Chai Kiat
Teo, Yue Qi
format Final Year Project
author Teo, Yue Qi
author_sort Teo, Yue Qi
title Identifying behavioral anomalies in Twitter users
title_short Identifying behavioral anomalies in Twitter users
title_full Identifying behavioral anomalies in Twitter users
title_fullStr Identifying behavioral anomalies in Twitter users
title_full_unstemmed Identifying behavioral anomalies in Twitter users
title_sort identifying behavioral anomalies in twitter users
publishDate 2016
url http://hdl.handle.net/10356/66710
_version_ 1759854266541932544