HOAX DETECTION ON TWITTER SOCIAL MEDIA BASED ON TEXTUAL ANALYSIS AND SOCIAL NETWORK CHARACTERISTICS

Bibliographic Details
Main Author: Fadhlurohman, Aufa
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/67258
Institution: Institut Teknologi Bandung
Description
Summary: The ease of spreading information in the era of social media growth brings benefits but also gives rise to various threats. One of the biggest is the spread of hoaxes, or fake news, by irresponsible parties. This final project builds a hoax detection model for Twitter by identifying textual and social network characteristics that can assist in the task. Currently, social media hoax detection in social network analysis applications still relies on searching for text similar to known hoax data, using a fixed similarity threshold. This final project proposes combining textual patterns, textual similarity, user information, and network information to improve detection ability. The baseline model uses shallow learning algorithms with textual similarity information as input. The main model was developed using deep learning with various word embeddings, feature combinations, and several architectures. One of the architectures tried is a siamese architecture, intended to better identify textual similarity based on context. The study focuses on experiments to find the best configuration for the hoax classification model. The dataset used in the experiments consists of 6,983 records in three classes: counter-hoaxes, non-hoaxes, and hoaxes. In the experiments, the baseline model achieved an F1-score of 0.6521. The best model uses the siamese similarity architecture with additional user information features; it was built with BERT and achieved an F1-score of 0.8086.
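To make the threshold-based similarity search mentioned in the summary concrete, the following is a minimal Python sketch of that general idea. It is an illustration under stated assumptions, not the thesis's implementation: the bag-of-words cosine measure, the function names (`tokenize`, `best_match`, `is_similar_to_hoax`), the example texts, and the threshold of 0.6 are all hypothetical choices made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Cosine similarity between two bag-of-words count vectors (Counters).
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(tweet, known_hoaxes):
    # Return (highest similarity, matching hoax text) over the reference set,
    # or (0.0, None) when the reference set is empty.
    tweet_vec = Counter(tokenize(tweet))
    scored = [
        (cosine_similarity(tweet_vec, Counter(tokenize(hoax))), hoax)
        for hoax in known_hoaxes
    ]
    return max(scored, default=(0.0, None))


def is_similar_to_hoax(tweet, known_hoaxes, threshold=0.6):
    # Flag the tweet when its best similarity reaches the threshold.
    # The value 0.6 is an assumed placeholder, not a figure from the study.
    score, _ = best_match(tweet, known_hoaxes)
    return score >= threshold
```

A call such as `is_similar_to_hoax(tweet_text, known_hoaxes)` would flag a tweet whose wording closely matches any entry in a reference list of known hoax texts. The score reflects only word overlap, which is the kind of limitation the summary's context-based siamese architecture and additional user and network features are meant to address.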