HOAX DETECTION ON TWITTER SOCIAL MEDIA BASED ON TEXTUAL ANALYSIS AND SOCIAL NETWORK CHARACTERISTICS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/67258 |
Institution: | Institut Teknologi Bandung |
Summary: | The ease of spreading information in the era of social media growth brings
benefits but also gives rise to various threats. One of the biggest is the spread of
hoaxes, or fake news, by irresponsible parties. This final project aims to build a
hoax detection model for Twitter by identifying the textual and social network
characteristics that can assist in this task.
Currently, social media hoax detection in the social network analysis application still
relies on a text similarity search against known hoax data with a certain threshold.
This final project proposes combining textual patterns, textual similarity, user
information, and network information to improve detection ability. The baseline
model in this study uses shallow learning algorithms with textual similarity
information as input. The main model was developed using deep learning methods with
various word embeddings, feature combinations, and several architectures. One of the
architectures tried is the Siamese architecture, as an effort to better identify
textual similarity based on context. This study focuses on conducting
experiments to find the best configuration for the hoax classification model.
The data used in the experiments comprise 6983 items in three classes:
counter-hoaxes, non-hoaxes, and hoaxes. Based on the experimental results, the
baseline model obtained an F1-score of 0.6521. The best model uses the
Siamese similarity architecture with additional user information features. This best
model was built using BERT and achieved an F1-score of 0.8086. |
---|
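
The summary describes the existing approach as a text similarity search against known hoax data with a certain threshold. A minimal sketch of that idea is given below, only to illustrate the mechanism. The record does not say how similarity is computed or which threshold is used, so the bag-of-words cosine similarity, the `0.7` threshold, and all function names here are assumptions, not the author's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_hoax_match(tweet, known_hoaxes):
    """Return (highest similarity, matching hoax text) for a tweet."""
    tweet_vec = Counter(tokenize(tweet))
    scored = [
        (cosine_similarity(tweet_vec, Counter(tokenize(hoax))), hoax)
        for hoax in known_hoaxes
    ]
    # With no known hoaxes there is nothing to match against.
    return max(scored, default=(0.0, None))


def is_hoax_candidate(tweet, known_hoaxes, threshold=0.7):
    """Flag a tweet when its best match reaches the (assumed) threshold."""
    score, _ = best_hoax_match(tweet, known_hoaxes)
    return score >= threshold
```

A method like this only matches on surface wording, so a paraphrased hoax can fall below the threshold. That limitation is consistent with the project's motivation for adding context-aware similarity, user information, and network information.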