RELIABILITY OF OPINION SOURCES BASED ON MULTIFACTORS IN ONLINE SOCIAL NETWORKS

Bibliographic Details
Main Author: Budi Setiawan, Erwin
Format: Dissertations
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/49318
Institution: Institut Teknologi Bandung
Description
Summary: Online social networks (OSNs), such as Facebook and Twitter, have received tremendous attention in the last decade and are evolving from media for conversation and opinion sharing into platforms for disseminating information on current events. However, some content posted on OSNs cannot be trusted, because the information generally does not go through an editorial and fact-checking process. Inaccurate information that attracts attention is repeatedly exposed and spreads quickly, which leads to it being regarded as true. This problem can have a serious negative impact on users who rely on OSNs as a source of information and for decision making. There are many ways to overcome this problem, one of which is checking the reliability of the information sources. Source reliability helps users decide which sources can be relied on for accurate information. This dissertation proposes an integration of factors and the use of cross-group data to assess the reliability of sources on OSNs, namely Twitter and Facebook. The source reliability model is built on five main factors: topic-based factors, sentiment-based factors, spam-based factors, HCC factors (hoax-based, competence-based, and curator-based factors), and UCR factors (user-based, content-based, and retweet-based factors). This dissertation develops four models to build the source reliability model: the topic class model for topic-based factors, the sentiment model for sentiment-based factors, the spammer model for spam-based factors, and the information credibility model. The information credibility model is developed first, both to filter the information and to serve as a basis for identifying these key factors. In the next stage, the remaining three models are developed: the topic class model, the sentiment model, and the spammer model. The topic class model uses word embeddings to reduce vocabulary mismatch in tweet topic classification on Twitter; this stage expands features using Word2Vec.
Furthermore, the sentiment model focuses on developing a hybrid method that combines basic features with feature expansion based on Term Frequency-Inverse Document Frequency (TF-IDF) and feature expansion based on tweet-based features. The spammer model is obtained by adding four new features: spam_words_indo, total_spam, #like, and URL_rasio. Finally, this dissertation focuses on the development of a source reliability model on Twitter and Facebook. To measure the accuracy of source reliability, five classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (Logit), J48, and Random Forest (RF). The source reliability model is built in three stages: first a model for Twitter, then a model for Facebook, and finally a combined model (Twitter and Facebook). The proposed source reliability model on Twitter outperforms previous studies, with increases of 11.31% in accuracy and 16.68% in F-measure. Among the five classifiers, the highest accuracy and F-measure are achieved by the RF classifier with feature selection, at 90.46% and 0.9040. Source reliability on Facebook has, to the best of the author's knowledge, not been studied before; of the five classifiers used, the best accuracy and F-measure are achieved by the SVM classifier with feature selection, at 73.18% and 0.7205. For the combined source reliability model (Twitter and Facebook), the best accuracy and F-measure are achieved by the RF classifier, at 80.00% and 0.7974.
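
The summary reports results as accuracy and F-measure. As a minimal sketch of how these two metrics are commonly computed for a binary reliable/unreliable label, the Python below uses the standard definitions (F-measure as the harmonic mean of precision and recall). The function names, the label encoding, and the toy labels are assumptions made for illustration only; the record does not state whether the dissertation averages the F-measure across classes, so this is not necessarily the exact procedure used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Without true positives, precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (assumption): 1 = reliable source, 0 = unreliable source.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred)
print(f"accuracy = {acc:.4f}, F-measure = {f1:.4f}")
```

In this toy case six of eight predictions are correct, and precision and recall are both 3/4, so both metrics come out to 0.75. On real data the two usually differ, which is why the summary reports them separately.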