DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES

Online prostitution through social media, particularly Twitter, has become a serious issue with significant legal and social implications. This study developed a model to detect accounts involved in online prostitution activities by combining machine learning algorithms and advanced linguistic re...

Bibliographic Details
Main Author:	Muhammad Aulia, Faraz
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/86848
Institution:	Institut Teknologi Bandung

id	id-itb.:86848
timestamp	2024-12-26T21:28:33Z
keywords	online prostitution, Twitter, machine learning, BERT, zero-shot classification, account detection
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Online prostitution through social media, particularly Twitter, has become a serious issue with significant legal and social implications. This study develops a model to detect accounts involved in online prostitution by combining machine learning algorithms with linguistic representations from BERT (Bidirectional Encoder Representations from Transformers). The dataset was obtained by scraping over 4,000 tweets with prostitution-related hashtags, followed by data cleaning, feature extraction, and manual labeling to distinguish prostitution from non-prostitution accounts. The study combines machine learning classifiers (Random Forest, Decision Tree, and Support Vector Machine (SVM)) with BERT's semantic representations through a zero-shot classification approach. The probabilistic scores produced by BERT-based zero-shot classification for the labels "prostitution" and "non-prostitution" were added as new features to the numerical dataset. These features carry semantic information that the numerical data lack, allowing the model to capture more complex linguistic context. Model performance improved after the BERT features were added: accuracy rose from 71.21% to 80%, precision from 67.64% to 88.14%, and F1-score from 73.93% to 83.87%. These findings indicate that BERT-based zero-shot classification improves the reliability of detecting online prostitution accounts and achieves a better balance between precision and sensitivity (recall), making it an effective approach for addressing this problem on social media platforms.
format	Theses
author	Muhammad Aulia, Faraz
title	DETECTION OF ONLINE PROSTITUTION ACCOUNTS ON TWITTER USING MACHINE LEARNING AND BERT APPROACHES
url	https://digilib.itb.ac.id/gdl/view/86848
_version_	1822999697225154560
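
The abstract reports precision and F1-score but not sensitivity (recall) itself. As a rough check on the "balance between precision and sensitivity" claim, the sketch below back-computes the implied recall from the reported figures. It assumes the thesis uses the standard definition F1 = 2PR / (P + R); the function names and the `REPORTED` dictionary are illustrative and do not come from the thesis, and the results are only as precise as the rounded percentages in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("No valid recall exists for this precision/F1 pair")
    return f1 * precision / denominator


# (precision, F1) pairs as reported in the abstract, expressed as fractions.
# The dictionary keys are descriptive labels chosen for this sketch.
REPORTED = {
    "without BERT features": (0.6764, 0.7393),
    "with BERT features": (0.8814, 0.8387),
}

if __name__ == "__main__":
    for label, (precision, f1) in REPORTED.items():
        recall = recall_from_precision_f1(precision, f1)
        print(f"{label}: precision={precision:.2%}, "
              f"F1={f1:.2%}, implied recall={recall:.2%}")
```

Under these assumptions the implied recall is roughly 81.5% without the BERT features and roughly 80.0% with them. If that reading is right, the F1 gain comes mainly from higher precision while recall stays close to 80%, which is consistent with the abstract's description of a better balance between the two.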


Similar Items