DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86848
Institution: Institut Teknologi Bandung
Summary: Online prostitution through social media, particularly Twitter, has become
a serious issue with significant legal and social implications. This study developed
a model to detect accounts involved in online prostitution by combining machine
learning algorithms with contextual linguistic representations from BERT
(Bidirectional Encoder Representations from Transformers). The dataset was built
by scraping more than 4,000 tweets carrying prostitution-related hashtags, followed
by data cleaning, feature extraction, and manual labeling to distinguish
prostitution accounts from non-prostitution accounts.
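The summary does not specify which cleaning steps or numerical features were used. As a minimal sketch, assuming typical tweet preprocessing, the hypothetical helpers below (`clean_tweet`, `extract_counts`) strip URLs and @mentions, keep hashtag words without the `#`, and count URLs, mentions, and hashtags as example numerical features. None of these choices are confirmed by the study.

```python
import re

# Hypothetical preprocessing patterns; the thesis's actual rules are not stated.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WS_RE = re.compile(r"\s+")


def extract_counts(text: str) -> dict:
    """Hypothetical numerical features, counted on the raw (uncleaned) tweet."""
    return {
        "n_urls": len(URL_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
    }


def clean_tweet(text: str) -> str:
    """Remove URLs and mentions, keep hashtag words, normalize spacing and case."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # keep the hashtag word, drop the '#'
    text = WS_RE.sub(" ", text)
    return text.strip().lower()


sample = "Check @user this #Promo https://t.co/abc now"
print(clean_tweet(sample))
print(extract_counts(sample))
```

Counting on the raw text before cleaning matters here: once URLs, mentions, and `#` symbols are stripped, those signals can no longer be recovered from the cleaned string.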
This study combines machine learning methods, such as Random Forest, Decision
Tree, and Support Vector Machine (SVM), with the semantic representations of BERT,
including a zero-shot classification approach. The probabilistic labels produced
by BERT-based zero-shot classification, namely "prostitution" and
"non-prostitution" scores, were added as new features to the numerical dataset.
These features contribute semantic information that the numerical features lack,
allowing the model to capture more complex linguistic context.
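The feature-augmentation step can be sketched as follows, assuming each zero-shot result has the shape `{"labels": [...], "scores": [...]}`. That is the output format of the Hugging Face `transformers` zero-shot-classification pipeline, but the summary does not say which implementation the study used. Because that pipeline returns labels sorted by descending score, the hypothetical helper `scores_in_fixed_order` pins the scores to a fixed label order so that each column always means the same thing. The input numbers are placeholders, not data from the study.

```python
from typing import Dict, List, Sequence

# Candidate labels as named in the summary; their order fixes the feature columns.
LABELS = ("prostitution", "non-prostitution")


def scores_in_fixed_order(result: Dict, labels: Sequence[str] = LABELS) -> List[float]:
    """Map a zero-shot result {'labels': [...], 'scores': [...]} to scores in `labels` order."""
    by_label = dict(zip(result["labels"], result["scores"]))
    return [float(by_label[label]) for label in labels]


def augment_features(numeric_rows, zero_shot_results, labels=LABELS):
    """Append one score column per candidate label to each numerical feature row."""
    if len(numeric_rows) != len(zero_shot_results):
        raise ValueError("each feature row needs exactly one zero-shot result")
    return [
        list(row) + scores_in_fixed_order(result, labels)
        for row, result in zip(numeric_rows, zero_shot_results)
    ]


# Placeholder inputs (hypothetical numerical features and zero-shot outputs).
rows = [[3, 1, 0], [0, 2, 5]]
results = [
    {"labels": ["non-prostitution", "prostitution"], "scores": [0.7, 0.3]},
    {"labels": ["prostitution", "non-prostitution"], "scores": [0.9, 0.1]},
]
augmented = augment_features(rows, results)
print(augmented)
```

The augmented rows are ordinary numeric vectors, so they could be passed unchanged to any of the classifiers the summary names (Random Forest, Decision Tree, or SVM).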
The results show a substantial improvement in model performance after
incorporating the BERT features: accuracy increased from 71.21% to 80%, precision
from 67.64% to 88.14%, and F1-score from 73.93% to 83.87%. These findings
indicate that BERT-based zero-shot classification not only makes the detection of
online prostitution accounts more reliable but also achieves a better balance
between precision and sensitivity (recall), making it an effective approach to
this problem on social media platforms.
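The summary reports accuracy, precision, and F1 but not recall. Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the recall implied by each reported precision/F1 pair can be back-calculated as R = P·F1 / (2P − F1). The sketch below performs this arithmetic. It assumes the reported metrics belong to the same model and test set; the summary does not say which classifier produced them.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = P * F1 / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no recall value is consistent with this precision and F1")
    return precision * f1 / denom


# Precision and F1 as reported in the summary, written as fractions.
reported = {
    "before BERT features": (0.6764, 0.7393),
    "after BERT features": (0.8814, 0.8387),
}

for setting, (p, f1) in reported.items():
    r = implied_recall(p, f1)
    print(f"{setting}: precision={p:.2%}, implied recall={r:.2%}, F1={f1:.2%}")
```

Under these assumptions, recall works out to roughly 81.5% before and 80.0% after. The F1 gain therefore comes mainly from precision, which is consistent with the summary's claim of a better precision/recall balance. Because the reported percentages are rounded, these recall values are derived approximations, not figures reported by the study.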