RELIABILITY OF OPINION SOURCES BASED ON MULTIFACTORS IN ONLINE SOCIAL NETWORKS
Main Author: | |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/49318 |
Institution: | Institut Teknologi Bandung |
Summary: | Online social networks (OSNs), e.g., Facebook and Twitter, have received
tremendous attention in the last decade and are evolving from media for conversation
and opinion sharing into platforms for disseminating information on current events.
However, some content posted on OSNs cannot be trusted, because the information
generally does not go through an editorial and fact-checking process. Inaccurate
information that attracts attention is repeatedly exposed and spreads quickly, which
leads to it being regarded as true. This problem can have a serious negative impact
on users who rely on OSNs as a source of information and for decision making. One
way to address this problem is to check the reliability of the information
sources. Source reliability helps users decide which sources can be relied on for
accurate information. This dissertation proposes an integration of factors and the
use of cross-group data for assessing the reliability of sources on OSNs,
specifically Twitter and Facebook.
The source reliability model is built on five main groups of factors: topic-based
factors, sentiment-based factors, spam-based factors, HCC factors (hoax-based,
competence-based, and curator-based), and UCR factors (user-based, content-based,
and retweet-based). Four supporting models are developed to build the source
reliability model: the topic class model for topic-based factors, the sentiment
model for sentiment-based factors, the spammer model for spam-based factors, and
the information credibility model. The information credibility model is developed
first, both to filter the information and as a basis for identifying the key
factors. In the next stage, the other three models are developed: the topic class
model, the sentiment model, and the spammer model. The topic class model is
developed to reduce vocabulary mismatch using word embeddings on tweet topic
classes on Twitter; this stage expands features using Word2Vec. The sentiment
model focuses on a hybrid method, which combines basic features with feature
expansion based on Term Frequency – Inverse Document Frequency (TF-IDF) and
feature expansion based on tweet-based features. The spammer model is obtained by
adding four new features: spam_words_indo, total_spam, #like, and URL_rasio.
Finally, this dissertation focuses on developing a source reliability model for
Twitter and Facebook. To measure the accuracy of the source reliability model,
five classifiers are used: Naïve Bayes (NB), Support Vector Machine (SVM),
Logistic Regression (Logit), J48, and Random Forest (RF).
The source reliability model is built in three stages: first, a source reliability
model for Twitter; second, a source reliability model for Facebook; and last, a
combined source reliability model (Twitter and Facebook). The proposed source
reliability model for Twitter outperforms previous studies, with increases of
11.31% in accuracy and 16.68% in F-measure. Among the five classifiers, the highest
accuracy and F-measure are achieved by the RF classifier with feature selection, at
90.46% and 0.9040, respectively. Source reliability on Facebook has, to the best of
the author's knowledge, not been studied before; of the five classifiers used, the
best accuracy and F-measure are achieved by the SVM classifier with feature
selection, at 73.18% and 0.7205, respectively. For the combined source reliability
model (Twitter and Facebook), the best accuracy and F-measure are achieved by the
RF classifier, at 80.00% and 0.7974, respectively. |
---|
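
The summary compares classifiers by accuracy and F-measure. As a reading aid, the following minimal Python sketch shows how these two metrics are commonly computed from true and predicted labels. It is not the dissertation's evaluation code: the label names (`reliable`, `unreliable`), the sample data, and the choice of a single-class F-measure (rather than, for example, a weighted average over classes) are assumptions made for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class.

    `positive` names the class treated as the positive label
    (an assumption; the summary does not say how F-measure was averaged).
    """
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly predicted positive
        elif p == positive and t != positive:
            fp += 1  # predicted positive, actually negative
        elif p != positive and t == positive:
            fn += 1  # missed a positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five sources (illustrative data, not from the study).
y_true = ["reliable", "reliable", "unreliable", "unreliable", "reliable"]
y_pred = ["reliable", "unreliable", "unreliable", "reliable", "reliable"]

acc = accuracy(y_true, y_pred)
f1 = f_measure(y_true, y_pred, positive="reliable")
print(f"accuracy={acc:.4f} f_measure={f1:.4f}")
```

Under this definition accuracy is naturally read as a percentage and F-measure as a value between 0 and 1, which matches how the two figures are reported in the summary.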