Identifying behavioral anomalies in Twitter users
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/66710 |
Institution: | Nanyang Technological University |
Summary: | Social media are widely used for communication, business-to-consumer (B2C) engagement and more. Twitter, in particular, has become a popular communication medium, and its popularity has drawn increasing attention from spammers and cyber attackers who seek to disrupt the Twitter experience by spreading spam, hacking and other abuse.
The objective of this project is therefore to identify behavioral anomalies in Twitter users and to detect suspicious accounts by looking for features that distinguish such users from normal users, so that early detection can minimize the damage they are able to inflict.
To achieve this objective, the Twitter REST API was first used to obtain the latest 50 tweets of 10,000 users. Two data mining algorithms were then applied: Random Forest and Time Series Anomaly Detection. Approximately 15,000 tweets were manually labelled, and the remaining tweets were labelled automatically using the predictions of the Random Forest algorithm, which was implemented with the ‘caret’ R package in RStudio. The tweets dataset was then processed into per-user features, which formed the final dataset used for identifying behavioral anomalies. Anaconda's Spyder was used as the processing tool. Finally, the Time Series Anomaly Detection algorithm was used to identify abnormal tweeting frequencies over a period of time and was implemented with Twitter’s AnomalyDetection R package in RStudio.
Lastly, the values of the Twitter users’ features were plotted in graphs to show the differences between anomalous and normal users. The results show that behavioral anomalies can be identified in features such as retweet counts, followers counts, friends counts, followers-to-friends ratio and others. In addition, anomalous users tend to show greater variation in tweeting frequency at unexpected occasions and time frames, while normal users usually become much more active around special events, which in this case are the Christmas and New Year festive seasons. The results also show that anomalous users’ behaviors are more variable and more prone to change than those of normal users.
Overall, the objective of this project was achieved, but certain areas were not studied because of time constraints and limited manpower. These areas can be researched further in future to achieve better identification of behavioral anomalies. |
---|
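
The summary mentions two ideas that a short example may make more concrete: deriving a per-user feature such as the followers-to-friends ratio, and flagging unusual tweeting frequency over time. The Python sketch below is illustrative only and is not the project's implementation. The project used Twitter's AnomalyDetection R package; this sketch swaps in a much simpler median/MAD modified z-score. The function names, the handling of zero friends, and the 3.5 cutoff are all assumptions made for the example.

```python
from statistics import median


def followers_to_friends_ratio(followers: int, friends: int) -> float:
    """Hypothetical helper: ratio of followers to friends.

    Assumption: a user with zero friends is treated as having one,
    so the ratio stays finite.
    """
    return followers / max(friends, 1)


def flag_unusual_days(daily_counts, threshold=3.5):
    """Return the indices of days whose tweet count is unusually far
    from the median, using the modified z-score:

        0.6745 * (x - median) / MAD

    This is a simple stand-in, not the algorithm used by the
    AnomalyDetection R package. The threshold is an assumed cutoff.
    """
    counts = list(daily_counts)
    med = median(counts)
    # Median absolute deviation: a spread estimate that a single
    # extreme day does not inflate the way a standard deviation would.
    mad = median(abs(x - med) for x in counts)
    if mad == 0:
        # More than half the days share the same count; the score is
        # undefined, so this sketch flags nothing.
        return []
    return [
        i for i, x in enumerate(counts)
        if abs(0.6745 * (x - med) / mad) > threshold
    ]


if __name__ == "__main__":
    # Made-up daily tweet counts for one user, with one burst day.
    counts = [3, 4, 2, 5, 3, 4, 40, 3, 2, 4]
    print(followers_to_friends_ratio(200, 50))
    print(flag_unusual_days(counts))
```

A median-based score is used here because one burst of activity barely shifts the median, whereas it would pull a mean and standard deviation toward itself and could hide the burst. It does not model seasonality, so it would not by itself separate expected peaks, such as the festive seasons noted in the summary, from unexpected ones.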