Identifying misinformation and their sources in social networks

Bibliographic Details
Main Author: Tang, Wenchang
Other Authors: Tay, Wee Peng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/138568
Institution: Nanyang Technological University
Description
Summary: Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may incur massive losses to society. It is therefore often important to identify misinformation and its sources accurately and promptly, so that proper control measures can be adopted. The research goal of this thesis is to develop methods for estimating misinformation sources based on network topology and partial timestamps, and for detecting misinformation based on appearance-level linguistic patterns.

In many practical applications, the network topology may not be known in advance and needs to be inferred. We first consider the problem of inferring network topology from information cascades and knowledge of some moments of the diffusion distribution across each edge, without needing to know the distribution itself. We propose an iterative tree inference algorithm and then generalize it heuristically to general graphs. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithm performs better than some current state-of-the-art network reconstruction methods.

We then study the problem of identifying misinformation sources in a network based on the network topology and a subset of infection timestamps. In the case of a single misinformation source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic involving an optimization over a parameterized family of Gromov matrices to develop a single source estimation algorithm for general graphs. In contrast to the breadth-first search tree heuristic commonly adopted in the literature, our approach achieves, in simulations, better estimation accuracy than several benchmark algorithms, even though these require more information such as the diffusion parameters. We next develop a multiple-source estimation algorithm for general graphs, which first partitions the graph into source candidate clusters and then applies our single source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. We also propose a sequential source estimation algorithm using a particle filter that is based on an approximate hidden Markov chain model, which can be interpreted as a "reverse" propagation process. This algorithm allows the source estimate to be updated quickly whenever a new timestamp becomes available. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithms are able to estimate the true misinformation source(s) to within a small number of hops when only a small portion of the infection timestamps is observed.

Finally, we consider the problem of fake news detection based on a statement and additional metadata. We develop a hybrid supervised learning framework combining a deep learning model and a graph model to predict the truthfulness label of a piece of news. We first train two deep neural networks: one to obtain a vector of likelihood scores over the truthfulness classes for any piece of news, and the other to model the relationship in truthfulness between any pair of news items. A directed and weighted graph is then constructed to compute another vector of likelihood scores over the truthfulness classes. We then combine the two vectors from the first neural network and the graph model, and obtain the label prediction by finding the truthfulness class with the highest likelihood score. We apply our framework to the publicly available LIAR dataset and obtain an improvement in prediction accuracy over the state of the art.
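
The final prediction step described in the summary (combining the two likelihood vectors and picking the highest-scoring class) can be illustrated with a minimal Python sketch. The summary does not state how the two vectors are combined, so the weighted average used here, the `weight` parameter, and the function names are assumptions for illustration only and not the thesis's actual method. The six class names follow the label scale of the LIAR dataset.

```python
# Six-way truthfulness scale used by the LIAR dataset.
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]


def normalize(scores):
    """Rescale non-negative scores so that they sum to 1."""
    total = float(sum(scores))
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def combine_scores(nn_scores, graph_scores, weight=0.5):
    """Combine two likelihood vectors over the same classes.

    ASSUMPTION: a convex combination with mixing weight `weight`.
    The summary only says the two vectors are combined; the actual
    combination rule in the thesis may differ.
    """
    if len(nn_scores) != len(graph_scores):
        raise ValueError("score vectors must have the same length")
    nn = normalize(nn_scores)
    gr = normalize(graph_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(nn, gr)]


def predict_label(nn_scores, graph_scores, weight=0.5, labels=LABELS):
    """Return the class whose combined likelihood score is highest."""
    combined = combine_scores(nn_scores, graph_scores, weight)
    best_index = max(range(len(combined)), key=lambda i: combined[i])
    return labels[best_index]


if __name__ == "__main__":
    # Made-up scores, purely to exercise the functions.
    nn_scores = [0.05, 0.10, 0.15, 0.40, 0.20, 0.10]
    graph_scores = [0.05, 0.05, 0.10, 0.20, 0.50, 0.10]
    print(predict_label(nn_scores, graph_scores))
```

With `weight=1.0` the sketch reduces to using the neural network's scores alone, which makes it easy to see how much the graph-based vector changes a prediction.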