DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES

Bibliographic Details
Main Author: Muhammad Aulia, Faraz
Format: Thesis
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/86848
Institution: Institut Teknologi Bandung
Description
Summary: Online prostitution through social media, particularly Twitter, has become a serious issue with significant legal and social implications. This study developed a model to detect accounts involved in online prostitution by combining machine learning algorithms with linguistic representations from BERT (Bidirectional Encoder Representations from Transformers). The dataset was built by scraping more than 4,000 tweets carrying prostitution-related hashtags, followed by data cleaning, feature extraction, and manual labeling of accounts as prostitution or non-prostitution accounts. The study combines machine learning classifiers, namely Random Forest, Decision Tree, and Support Vector Machine (SVM), with the semantic representations of BERT, including a zero-shot classification approach. The probabilistic labels produced by BERT-based zero-shot classification, in the form of "prostitution" and "non-prostitution" scores, were added as new features to the numerical dataset. These features contribute semantic information that the numerical features alone do not capture, allowing the model to account for more complex linguistic context.

The results show a substantial improvement in model performance after the BERT features were added: accuracy increased from 71.21% to 80%, precision from 67.64% to 88.14%, and F1-score from 73.93% to 83.87%. These findings indicate that BERT-based zero-shot classification improves the reliability of detecting online prostitution accounts and achieves a better balance between precision and sensitivity (recall), making it an effective approach for addressing this issue on social media platforms.
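The step of adding zero-shot scores to the numerical dataset can be sketched as follows. This is a minimal, standard-library-only illustration, not the thesis's implementation. It assumes each zero-shot result has the shape returned by the Hugging Face `transformers` zero-shot pipeline (a dict with `labels` and `scores`, labels sorted by descending score), for example from `pipeline("zero-shot-classification")` called with `candidate_labels=["prostitution", "non-prostitution"]`. The model checkpoint used in the study is not stated in this record. The helper names `scores_to_features`, `augment`, and `metrics`, as well as the numeric features and scores in the demo, are hypothetical. The block also computes accuracy, precision, recall, and F1 with the prostitution class as the positive label, matching the metrics reported above.

```python
from typing import Dict, List, Sequence

# Candidate labels named in the abstract; a fixed order keeps feature columns stable.
CANDIDATE_LABELS = ("prostitution", "non-prostitution")


def scores_to_features(result: Dict[str, Sequence]) -> List[float]:
    """Map one zero-shot result to [p_prostitution, p_non_prostitution].

    Assumes the Hugging Face zero-shot output shape {"labels": [...], "scores": [...]}.
    Labels there are sorted by score, so scores are looked up by label name.
    """
    by_label = dict(zip(result["labels"], result["scores"]))
    return [float(by_label[label]) for label in CANDIDATE_LABELS]


def augment(numeric_rows: List[List[float]],
            zero_shot_results: List[Dict[str, Sequence]]) -> List[List[float]]:
    """Append the two zero-shot scores to each account's numeric feature row."""
    if len(numeric_rows) != len(zero_shot_results):
        raise ValueError("expected exactly one zero-shot result per account")
    return [list(row) + scores_to_features(res)
            for row, res in zip(numeric_rows, zero_shot_results)]


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1; label 1 = prostitution account (positive class)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("no predictions to evaluate")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical numeric features per account (e.g. follower and tweet counts).
    rows = [[1200.0, 35.0], [80.0, 410.0]]
    results = [
        {"labels": ["prostitution", "non-prostitution"], "scores": [0.91, 0.09]},
        {"labels": ["non-prostitution", "prostitution"], "scores": [0.77, 0.23]},
    ]
    print(augment(rows, results))
    print(metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Looking scores up by label name matters because the pipeline reorders labels for each input; indexing `scores[0]` directly would silently swap the two columns between accounts. In the pipeline's default single-label mode (`multi_label=False`) the two scores sum to 1, so either column alone carries the same information; keeping both simply mirrors the two features described in the summary. The augmented rows can then be passed to any of the classifiers mentioned (Random Forest, Decision Tree, SVM).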