TWITTER SPAMMER DETECTION USING SOCIAL NETWORK

Bibliographic Details
Main Author: Jeffry
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/50183
Institution: Institut Teknologi Bandung
Description
Summary: Social media is very useful in people's daily lives, especially in facilitating communication. Because of this convenience, social media platforms have hundreds of millions of users worldwide. This growth has been accompanied by an increase in spam on various social media platforms, including Twitter. Spam annoys legitimate users and can be used to hijack their devices by inserting malware. A system that can detect spam or spammers is therefore needed. Various studies have been conducted to develop spammer detection systems for Twitter, and classification models are often used for this purpose. Features that can be used to train the classification model include account profile information, tweet content, and the friendship graph. Few studies focus on friendship graphs, even though a lot of information can be obtained from them. In this research, a spammer detection system was developed using a classification model trained on features obtained from friendship graphs. To be usable for training, the features are first transformed into a social network graph and analyzed using Social Network Analysis. In testing, the detection system performed fairly well on the first dataset, reaching an accuracy of up to 83%. On the second dataset, however, its accuracy was only 33%. The most likely reason is that the amount of training data was insufficient, so the trained model failed to predict the data in the second dataset correctly.
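
The summary does not say which Social Network Analysis measures were extracted from the friendship graph, so the following is only an illustrative sketch of how per-account graph features could be computed before being passed to a classifier. The function name `graph_features`, the chosen measures (in-degree, out-degree, reciprocity, local clustering), and the sample account names are all assumptions made for this example, not details taken from the project.

```python
from collections import defaultdict
from itertools import combinations


def graph_features(edges):
    """Compute simple per-account features from a follow graph.

    edges: iterable of (follower, followee) pairs.
    The measures below are assumed for illustration; the original
    project may have used different ones.
    """
    following = defaultdict(set)  # account -> accounts it follows
    followers = defaultdict(set)  # account -> accounts following it
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-follows
        following[src].add(dst)
        followers[dst].add(src)

    features = {}
    for node in set(following) | set(followers):
        out_nbrs = following.get(node, set())
        in_nbrs = followers.get(node, set())

        # Share of followed accounts that follow back.
        mutual = out_nbrs & in_nbrs
        reciprocity = len(mutual) / len(out_nbrs) if out_nbrs else 0.0

        # Local clustering on the undirected neighbourhood: fraction of
        # neighbour pairs connected by a follow in either direction.
        nbrs = out_nbrs | in_nbrs
        k = len(nbrs)
        links = sum(
            1
            for a, b in combinations(nbrs, 2)
            if b in following.get(a, ()) or a in following.get(b, ())
        )
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0

        features[node] = {
            "in_degree": len(in_nbrs),
            "out_degree": len(out_nbrs),
            "reciprocity": reciprocity,
            "clustering": clustering,
        }
    return features


# Hypothetical toy graph: "spam" follows everyone, nobody follows back.
edges = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("bob", "carol"),
    ("spam", "alice"), ("spam", "bob"), ("spam", "carol"),
]
feats = graph_features(edges)
print(feats["spam"])
```

Each resulting feature dictionary could then be turned into one row of a training matrix for any classification model. Whether measures such as these separate spammers from legitimate accounts depends on the data, which is consistent with the gap the summary reports between the two test datasets.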