HOAX DETECTION ON TWITTER SOCIAL MEDIA BASED ON TEXTUAL ANALYSIS AND SOCIAL NETWORK CHARACTERISTICS
Social media makes information easy to spread, which brings benefits but also threats. One of the biggest is the spread of hoaxes, or fake news, by irresponsible parties. This final project intends to build a hoax detection model for...
Main Author: | Fadhlurohman, Aufa |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/67258 |
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:67258 |
---|---|
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Social media makes information easy to spread, which brings benefits but also
threats. One of the biggest is the spread of hoaxes, or fake news, by
irresponsible parties. This final project builds a hoax detection model for
Twitter by identifying the textual and social network characteristics that
help with this task.
Currently, hoax detection in social network analysis applications still relies
on a text similarity search against known hoax data with a fixed threshold. This
final project proposes combining textual patterns, textual similarity, user
information, and network information to improve detection. The baseline model
uses shallow learning algorithms with text-similarity-level features as input.
The main model was developed with deep learning, using various word embeddings,
feature combinations, and architectures. One of the architectures tried is a
siamese architecture, intended to better identify textual similarity based on
context. The study focuses on experiments to find the best configuration for
the hoax classification model.
The dataset used in the experiments contains 6983 records in three classes:
counter-hoax, non-hoax, and hoax. In the experiments, the baseline model
achieved an F1-score of 0.6521. The best model uses the siamese similarity
architecture with additional user information features. It was built using
BERT and achieved an F1-score of 0.8086. |
format |
Final Project |
author |
Fadhlurohman, Aufa |
title |
HOAX DETECTION ON TWITTER SOCIAL MEDIA BASED ON TEXTUAL ANALYSIS AND SOCIAL NETWORK CHARACTERISTICS |
url |
https://digilib.itb.ac.id/gdl/view/67258 |
_version_ |
1822933296704651264 |
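
The description mentions that existing detection relies on a text similarity search against known hoax data with a threshold. The record does not say which similarity measure or threshold is used, so the following is only a minimal sketch under stated assumptions: bag-of-words cosine similarity, a hypothetical threshold of 0.6, and illustrative function names (`tokenize`, `cosine_similarity`, `is_probable_hoax`) that do not come from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def is_probable_hoax(tweet, known_hoaxes, threshold=0.6):
    """Flag a tweet when its best match against known hoax texts
    reaches the threshold. The 0.6 default is an assumed value,
    not one reported in the source."""
    best = max((cosine_similarity(tweet, h) for h in known_hoaxes),
               default=0.0)
    return best >= threshold, best
```

A call such as `is_probable_hoax(tweet_text, hoax_texts)` returns a flag and the best similarity score. A purely lexical measure like this one misses paraphrases, which is the kind of limitation the siamese, context-based similarity model described above is meant to address.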