TWITTER SPAMMER DETECTION USING SOCIAL NETWORK
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/50183 |
Institution: | Institut Teknologi Bandung |
Summary: | Social media is very useful in people's daily lives, especially in facilitating communication
between people. Because of this convenience, social media users number in the hundreds of millions
globally. This growth has been accompanied by an increasing amount of spam on various social media
platforms, including Twitter. Spam annoys legitimate users and can be used to hijack other users'
devices by inserting malware. Therefore, a system that can detect spam or spammers is needed.
Various studies have been conducted to develop spammer detection systems for Twitter.
Classification models are often used for this purpose. Features that can be used to train the
classification model include account profile information, tweet content, and the friendship graph.
Only a small amount of research focuses on friendship graphs, even though a lot of information can
be obtained from them.
In this research, a spammer detection system was developed using a classification model trained
with features obtained from friendship graphs. To produce these features, the friendship data is
transformed into a social network graph and analyzed using Social Network Analysis.
In testing, the detection system performed reasonably well on the first dataset, reaching an
accuracy of up to 83%. However, on the second dataset its accuracy was only 33%. The most likely
reason is that the amount of data used to train the model is still insufficient, so the trained
model fails to predict the data in the second dataset correctly.
|
---|
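The pipeline described in the summary (friendship graph, then Social Network Analysis features, then a classification model evaluated by accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The record does not name the classifier or the exact SNA features. The sketch therefore assumes a directed "follows" graph built from hypothetical (follower, followed) pairs and picks hypothetical per-account features: follower count, following count, follower/following ratio, reciprocity, and local clustering coefficient. As a stand-in classifier, it uses a hand-written nearest-centroid model. The helper names (`build_graph`, `node_features`, `train_centroids`, `predict`, `accuracy`) are illustrative only.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn hypothetical (follower, followed) pairs into two adjacency maps."""
    following = defaultdict(set)  # account -> accounts it follows
    followers = defaultdict(set)  # account -> accounts that follow it
    for src, dst in edges:
        if src != dst:  # ignore self-follows
            following[src].add(dst)
            followers[dst].add(src)
    return dict(following), dict(followers)


def clustering(node, following, followers):
    """Local clustering coefficient, treating follow edges as undirected."""
    nbrs = list(following.get(node, set()) | followers.get(node, set()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = 0
    for i in range(k):
        for j in range(i + 1, k):
            a, b = nbrs[i], nbrs[j]
            if b in following.get(a, set()) or a in following.get(b, set()):
                links += 1
    return 2.0 * links / (k * (k - 1))


def node_features(node, following, followers):
    """Assumed SNA feature vector for one account (not the thesis's confirmed set)."""
    out_set = following.get(node, set())
    in_set = followers.get(node, set())
    out_deg, in_deg = len(out_set), len(in_set)
    # Share of followed accounts that follow back.
    reciprocity = len(out_set & in_set) / out_deg if out_deg else 0.0
    ratio = in_deg / (out_deg + 1)  # +1 avoids division by zero
    return [float(in_deg), float(out_deg), ratio, reciprocity,
            clustering(node, following, followers)]


def train_centroids(X, y):
    """Stand-in 'training': average feature vector per label (nearest centroid)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(acc, x)]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: a, b, c follow each other; s1 and s2 follow everyone but get no follow-backs.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c"), ("c", "a"),
         ("s1", "a"), ("s1", "b"), ("s1", "c"), ("s2", "a"), ("s2", "b"), ("s2", "c")]
following, followers = build_graph(edges)

train_nodes, train_labels = ["a", "b", "s1"], ["legit", "legit", "spam"]
test_nodes, test_labels = ["c", "s2"], ["legit", "spam"]

centroids = train_centroids(
    [node_features(n, following, followers) for n in train_nodes], train_labels)
preds = [predict(centroids, node_features(n, following, followers)) for n in test_nodes]
print(preds, accuracy(test_labels, preds))
```

On this toy graph, the sketch classifies both held-out accounts correctly. Real data would need feature scaling, because raw follower counts dominate the distance, and far more labeled accounts. That second point matches the summary's explanation for the drop to 33% accuracy on the second dataset.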