DETECTION OF ONLINE PROSTITUTION ACCOUNT ON TWITTER PLATFORM USING MACHINE LEARNING APPROACHES


Bibliographic Details
Main Author: Kusuma, Nugrahadi
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/54486
Institution: Institut Teknologi Bandung
Description
Summary: Twitter is one of the social media platforms used for online prostitution. According to data from Kominfo, 1,000 online prostitution accounts on Twitter are reported every month. The Indonesian National Police handles online prostitution passively, meaning that it waits for reports from the public. One way to reduce online prostitution is to take preventive measures, namely detecting online prostitution activity. Machine learning is one technological approach that can detect online prostitution accounts on Twitter. The research question is how to detect online prostitution accounts with a machine learning approach. The research follows the CRISP-DM methodology, which consists of six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The algorithms used are SVM, Random Forest, and Naïve Bayes. Data on online prostitution accounts were collected by crawling hashtags associated with prostitution, such as #openbo. Data labeling produced two data sets. Data set 1 contains prostitution accounts and non-prostitution accounts that do not use prostitution hashtags. Data set 2 contains prostitution accounts and non-prostitution accounts that do use prostitution hashtags. The results show that, for data set 1, the features that distinguish prostitution accounts from non-prostitution accounts are the number of followers, the number of tweets, account age, and content (words and hashtags). For data set 2, the distinguishing features are the number of tweets and content (hashtags and words).
Of the three algorithms, SVM achieves the highest accuracy on data set 1 (98.83%), while Random Forest achieves the highest accuracy on data set 2 (82.93%). To determine which of the two models is better, both were also tested on the same 150 new, randomly selected data points. The data set 2 model performs better because it makes fewer prediction errors: 29, compared with 37 for the data set 1 model.
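
The summary reports the test on the 150 new accounts only as error counts. As a rough illustration, the sketch below converts those counts into accuracy figures so they can be compared with the accuracies quoted above. It assumes that each of the 150 accounts receives exactly one prediction from each model; the function name `holdout_accuracy` and the dictionary keys are hypothetical and do not come from the thesis.

```python
def holdout_accuracy(n_samples: int, n_errors: int) -> float:
    """Return the fraction of correct predictions, given the number of errors."""
    if n_samples <= 0 or not 0 <= n_errors <= n_samples:
        raise ValueError("need n_samples > 0 and 0 <= n_errors <= n_samples")
    return (n_samples - n_errors) / n_samples


# Figures taken from the summary: 150 new random accounts, 37 and 29 errors.
N_NEW = 150
errors = {"data set 1 model": 37, "data set 2 model": 29}

# Assumption: one prediction per account, so accuracy = 1 - errors / N_NEW.
accuracy = {name: holdout_accuracy(N_NEW, e) for name, e in errors.items()}

for name, acc in accuracy.items():
    print(f"{name}: {acc:.2%}")
# prints:
# data set 1 model: 75.33%
# data set 2 model: 80.67%
```

Under this assumption, the data set 2 model stays close to its reported 82.93% accuracy on the new data, while the data set 1 model falls well below its reported 98.83%, which is consistent with the conclusion that the data set 2 model is the better of the two.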