DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES


Bibliographic Details
Main Author: Muhammad Aulia, Faraz
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86848
Institution: Institut Teknologi Bandung
id id-itb.:86848
timestamp 2024-12-26T21:28:33Z
keywords online prostitution, Twitter, machine learning, BERT, zero-shot classification, account detection
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Online prostitution through social media, particularly Twitter, has become a serious issue with significant legal and social implications. This study developed a model to detect accounts involved in online prostitution activities by combining machine learning algorithms with linguistic representations from the BERT model (Bidirectional Encoder Representations from Transformers). The dataset was obtained by scraping over 4,000 tweets with hashtags related to prostitution, followed by data cleaning, feature extraction, and manual labeling to distinguish between prostitution and non-prostitution accounts. The study integrates machine learning methods, such as Random Forest, Decision Tree, and Support Vector Machine (SVM), with the semantic representation power of BERT, including a zero-shot classification approach. Probabilistic labels generated by BERT-based zero-shot classification, in the form of "prostitution" and "non-prostitution" scores, were added as new features to the numerical dataset. These features contribute semantic information unavailable in the numerical data, enabling the model to capture more complex linguistic context. The results show a significant improvement in model performance after incorporating the BERT features: accuracy increased from 71.21% to 80%, precision from 67.64% to 88.14%, and F1-score from 73.93% to 83.87%. These findings indicate that BERT-based zero-shot classification not only enhances the reliability of detecting online prostitution accounts but also achieves a better balance between precision and sensitivity, making it an effective solution for addressing issues on social media platforms.
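The feature-augmentation step described in the abstract (appending zero-shot "prostitution" and "non-prostitution" scores to each row of numerical features) can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the record does not state which model, library, or numeric features were used, so the function names, the example feature columns, and the `keyword_stub_scorer` stand-in are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Candidate labels named in the abstract.
LABELS = ("prostitution", "non-prostitution")

# A scorer maps (text, labels) to one score per label.
# Assumption: in practice this would wrap a real zero-shot model.
ScoreFn = Callable[[str, Sequence[str]], Dict[str, float]]


def augment_with_zero_shot(
    numeric_rows: List[List[float]],
    texts: List[str],
    score_fn: ScoreFn,
    labels: Sequence[str] = LABELS,
) -> List[List[float]]:
    """Append one zero-shot score per label to each numeric feature row."""
    if len(numeric_rows) != len(texts):
        raise ValueError("numeric_rows and texts must have the same length")
    augmented = []
    for row, text in zip(numeric_rows, texts):
        scores = score_fn(text, labels)
        # Keep the label order fixed so every row has the same column layout.
        augmented.append(list(row) + [scores[label] for label in labels])
    return augmented


def keyword_stub_scorer(text: str, labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a zero-shot model, for demonstration only."""
    p = 0.9 if "#promo" in text.lower() else 0.1
    return {labels[0]: p, labels[1]: 1.0 - p}


# Hypothetical numeric account features (for example follower count, tweets per day).
numeric_rows = [[120.0, 35.0], [4500.0, 2.0]]
texts = ["Available tonight #promo", "Lovely weather in Bandung today"]

features = augment_with_zero_shot(numeric_rows, texts, keyword_stub_scorer)
for row in features:
    print(row)
```

In a real pipeline, `score_fn` could wrap a zero-shot classifier such as the Hugging Face Transformers `zero-shot-classification` pipeline, and the augmented rows would then be passed to a classifier such as Random Forest, Decision Tree, or SVM. Passing the scorer in as a parameter keeps the augmentation logic independent of whichever model is chosen.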