DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES
Main Author: | Muhammad Aulia, Faraz |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/86848 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:86848 |
---|---|
keywords |
online prostitution, Twitter, machine learning, BERT, zero-shot classification, account detection |
record timestamp |
2024-12-26T21:28:33Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Online prostitution on social media, particularly Twitter, has become a serious
issue with significant legal and social implications. This study developed a model
to detect accounts involved in online prostitution by combining machine learning
algorithms with contextual language representations from BERT (Bidirectional
Encoder Representations from Transformers). The dataset was built by scraping more
than 4,000 tweets carrying prostitution-related hashtags; the data were then cleaned,
features were extracted, and accounts were manually labeled as prostitution or
non-prostitution.
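The abstract does not spell out the cleaning rules or how tweets are turned into account-level records. As a minimal sketch, assuming typical tweet preprocessing (lowercasing and stripping URLs, @mentions, and punctuation) and assuming each account's tweets are concatenated into one document before account-level labeling, the step might look like this. `clean_tweet` and `group_by_account` are hypothetical names, not code from the thesis.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed cleaning rules (illustrative; not taken from the thesis).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
PUNCT_RE = re.compile(r"[^\w\s#]")  # drop punctuation but keep '#' so hashtags survive
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, punctuation, and extra whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # remove links first so their characters do not leak
    text = MENTION_RE.sub(" ", text)  # remove @username mentions
    text = PUNCT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def group_by_account(tweets: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Join each account's cleaned tweets into one document (assumed account-level setup).

    `tweets` yields (account_id, raw_text) pairs; tweets that are empty after
    cleaning are skipped, so an account with no usable text is omitted.
    """
    docs: Dict[str, List[str]] = defaultdict(list)
    for account, text in tweets:
        cleaned = clean_tweet(text)
        if cleaned:
            docs[account].append(cleaned)
    return {account: " ".join(parts) for account, parts in docs.items()}


print(clean_tweet("Cek DM ya @user123 https://t.co/abc #TestTag!!"))  # -> cek dm ya #testtag
```

Keeping `#` preserves the hashtags that were the scraping criterion and may carry signal; whether the thesis kept them is not stated.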
The study combines machine learning classifiers such as Random Forest, Decision
Tree, and Support Vector Machine (SVM) with the semantic representations of BERT,
including a zero-shot classification approach. The BERT-based zero-shot classifier
produces probabilistic "prostitution" and "non-prostitution" scores, which were
added as new features to the numerical dataset. These features supply semantic
information that the numerical features alone lack, allowing the model to capture
more complex linguistic context.
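The feature-augmentation step can be sketched as follows, under stated assumptions: each account has one numeric feature row and one text, and `classify` is any function returning the output shape of the Hugging Face `transformers` zero-shot-classification pipeline (a dict with parallel `labels` and `scores` lists). The helper names are hypothetical, and the checkpoint used in the thesis is not named here, so `MODEL_NAME` in the comment is a placeholder.

```python
from typing import Callable, Dict, List, Sequence

# Candidate labels, in the column order they are appended (label names from the abstract).
LABELS = ("prostitution", "non-prostitution")


def scores_from_zero_shot(output: Dict) -> Dict[str, float]:
    """Turn a zero-shot result {"labels": [...], "scores": [...]} into {label: score}.

    The transformers pipeline sorts labels by descending score, so scores are
    looked up by label name rather than by position.
    """
    return dict(zip(output["labels"], output["scores"]))


def add_zero_shot_features(
    numeric_rows: Sequence[Sequence[float]],
    texts: Sequence[str],
    classify: Callable[[str], Dict],
) -> List[List[float]]:
    """Append the zero-shot label scores to each account's numeric feature row."""
    if len(numeric_rows) != len(texts):
        raise ValueError("expected one text per numeric feature row")
    augmented: List[List[float]] = []
    for row, text in zip(numeric_rows, texts):
        scores = scores_from_zero_shot(classify(text))
        augmented.append(list(row) + [scores[label] for label in LABELS])
    return augmented


# Example wiring with Hugging Face transformers (requires the library and a model
# download; MODEL_NAME is a placeholder for an NLI-finetuned checkpoint, and
# X_numeric / account_texts are hypothetical inputs):
#
#   from transformers import pipeline
#   zsc = pipeline("zero-shot-classification", model=MODEL_NAME)
#   classify = lambda text: zsc(text, candidate_labels=list(LABELS))
#   X_augmented = add_zero_shot_features(X_numeric, account_texts, classify)
```

In the pipeline's default single-label mode the two scores sum to 1, so one column carries all the information; tree-based models are unaffected, but the redundancy is worth knowing about for linear models such as a linear SVM.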
Adding the BERT features substantially improved model performance: accuracy
increased from 71.21% to 80%, precision from 67.64% to 88.14%, and F1-score from
73.93% to 83.87%. These findings indicate that BERT-based zero-shot classification
not only makes the detection of online prostitution accounts more reliable but also
achieves a better balance between precision and sensitivity (recall), making it an
effective approach for tackling online prostitution on social media platforms. |
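The abstract reports accuracy, precision, and F1 but not recall. Because F1 is the harmonic mean of precision P and recall R, recall can be backed out as R = F1·P / (2P − F1). The sketch below does this arithmetic, assuming the reported F1 is computed from the reported precision and recall of the prostitution class (not a macro average); rounding in the reported figures makes the result approximate.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall (sensitivity), and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Reported (precision, F1) pairs from the abstract, as fractions.
reported = {
    "numeric features only": (0.6764, 0.7393),
    "with BERT zero-shot features": (0.8814, 0.8387),
}
for setting, (p, f1) in reported.items():
    print(f"{setting}: implied recall = {implied_recall(p, f1):.1%}")
```

Under that assumption the implied recall is about 81.5% before and 80.0% after adding the BERT features, so the F1 gain would come mainly from precision, with precision and recall ending up closer together, which is consistent with the better balance between precision and sensitivity stated in the abstract.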
author |
Muhammad Aulia, Faraz |