DETECTION OF ONLINE PROSTITUTION ACCOUNT ON TWITTER PLATFORM USING MACHINE LEARNING APPROACHES
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/54486 |
Institution: | Institut Teknologi Bandung |
Summary: | Twitter is one of the social media platforms used for online prostitution. According
to data from Kominfo, 1000 online prostitution accounts on Twitter are reported every
month. In dealing with online prostitution, the Indonesian National Police is
passive, meaning that it waits for reports from the public. One way to reduce online
prostitution is to take preventive measures, namely by detecting online
prostitution activities.
Machine learning is a technological approach that can detect online prostitution
accounts on Twitter. The research question is how to detect online prostitution
accounts using a machine learning approach. The research follows the CRISP-DM
methodology, which consists of six stages: business understanding, data
understanding, data preparation, modeling, evaluation, and deployment. The
algorithms used are SVM, Random Forest, and Naïve Bayes.
Data on online prostitution accounts were collected by crawling tweets with
prostitution-related hashtags such as #openbo. Data labeling produced two dataset
models. Dataset model 1 contains prostitution accounts and non-prostitution
accounts that do not use prostitution hashtags. Dataset model 2 contains
prostitution accounts and non-prostitution accounts that do use prostitution
hashtags. The results show that, for dataset model 1, the features that distinguish
prostitution accounts from non-prostitution accounts are the number of followers,
the number of tweets, the account age, and the content (words and hashtags). For
dataset model 2, the features that distinguish prostitution accounts from
non-prostitution accounts with prostitution hashtags are the number of tweets and
the content (hashtags and words). Of the three algorithms used (SVM, Random
Forest, and Naïve Bayes), SVM has the highest accuracy for dataset model 1, namely
98.83%, while Random Forest has the highest accuracy for dataset model 2, namely
82.93%. Furthermore, to determine which of the two dataset models is better, both
were also tested on the same set of 150 new, randomly selected data points.
Dataset model 2 performed better than dataset model 1 because it made fewer
prediction errors: only 29, compared with 37 for dataset model 1. |
---|
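
The final comparison in the summary can be restated as accuracy on the 150 new data points. The sketch below is only an illustration of that arithmetic, not code from the thesis: the error counts (37 and 29) and the sample size (150) come from the summary, while the function name `holdout_accuracy` and the dictionary keys are hypothetical names chosen for this example.

```python
def holdout_accuracy(n_errors: int, n_samples: int) -> float:
    """Fraction of correct predictions given a count of wrong ones."""
    if n_samples <= 0 or not 0 <= n_errors <= n_samples:
        raise ValueError("need n_samples > 0 and 0 <= n_errors <= n_samples")
    return (n_samples - n_errors) / n_samples


# Figures reported in the summary: both models were tested on the same 150 new data points.
N_NEW = 150
errors = {"dataset model 1": 37, "dataset model 2": 29}

# Convert each error count into an accuracy on the new data.
accuracy = {name: holdout_accuracy(e, N_NEW) for name, e in errors.items()}

# The model with the highest accuracy on new data (equivalently, the fewest errors).
best = max(accuracy, key=accuracy.get)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.2%}")
print("better on new data:", best)
```

Under these figures, dataset model 1 is correct on 113 of 150 data points (about 75.33%) and dataset model 2 on 121 of 150 (about 80.67%). Both are below the accuracies reported during evaluation (98.83% and 82.93%), and the gap is much larger for dataset model 1.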