Identifying misinformation and their sources in social networks

Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may potentially incur massive losses to society. Therefore, it is often i...

Bibliographic Details
Main Author: Tang, Wenchang
Other Authors: Tay, Wee Peng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/138568
Institution: Nanyang Technological University
id sg-ntu-dr.10356-138568
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Tang, Wenchang
Identifying misinformation and their sources in social networks
description Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may potentially incur massive losses to society. Therefore, it is often important to be able to accurately and promptly identify misinformation and its sources, so that proper control measures can be adopted. The research goal of this thesis is to develop methods for estimating misinformation sources based on network topology and partial timestamps, and for detecting misinformation based on appearance-level linguistic patterns. In many practical applications, the network topology may not be known in advance and needs to be inferred. We first consider the problem of inferring network topology from information cascades and knowledge of some moments of the diffusion distribution across each edge, without needing to know the distribution itself. We propose an iterative tree inference algorithm and then generalize it heuristically to general graphs. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithm performs better than some current state-of-the-art network reconstruction methods. We then study the problem of identifying misinformation sources in a network based on the network topology and a subset of infection timestamps. In the case of a single misinformation source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic involving an optimization over a parameterized family of Gromov matrices to develop a single-source estimation algorithm for general graphs.
In contrast to the breadth-first search tree heuristic commonly adopted in the literature, simulations demonstrate that our approach achieves better estimation accuracy than several other benchmark algorithms, even though these require more information, such as the diffusion parameters. We next develop a multiple-source estimation algorithm for general graphs, which first partitions the graph into source candidate clusters and then applies our single-source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. We also propose a sequential source estimation algorithm using a particle filter that is based on an approximate hidden Markov chain model, which can be interpreted as a "reverse" propagation process. This algorithm allows the source estimate to be updated quickly whenever a new timestamp is available. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithms are able to estimate the true misinformation source(s) to within a small number of hops when only a small portion of the infection timestamps is observed. Finally, we consider the problem of fake news detection based on a statement and its additional metadata. We develop a hybrid supervised learning framework combining a deep learning model and a graph model to predict the truthfulness label of a piece of news. We first train two deep neural networks: one to obtain a vector of likelihood scores over the truthfulness classes for any piece of news, the other to capture the relationship in truthfulness between any pair of news items. A directed and weighted graph is then constructed to compute another vector of likelihood scores over the truthfulness classes. Finally, we combine the two vectors from the first neural network and the graph model, and obtain the label prediction by finding the truthfulness class with the highest likelihood score.
We apply our framework to the publicly available LIAR dataset and obtain an improvement in prediction accuracy over the state of the art.
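The single-source setting described in the abstract (a known tree topology plus infection timestamps for only a subset of nodes) can be illustrated with a small sketch. This is not the thesis's maximum likelihood estimator or its Gromov-matrix heuristic; it is a simplified stand-in that assumes infection time grows roughly linearly with hop distance from the source and picks the candidate whose distances best explain the observed timestamps in the least-squares sense. All function names, the example graph and the timestamps are hypothetical.

```python
from collections import deque


def hop_distances(adj, root):
    """Breadth-first search: hop distance from root to every reachable node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def fit_residual(ds, ts):
    """Residual sum of squares of the line t = a + b * d, with slope b >= 0.

    Assumption for this sketch: b plays the role of an unknown mean
    per-edge diffusion delay, so a negative slope is clamped to zero.
    """
    n = len(ds)
    d_mean = sum(ds) / n
    t_mean = sum(ts) / n
    var_d = sum((d - d_mean) ** 2 for d in ds)
    cov = sum((d - d_mean) * (t - t_mean) for d, t in zip(ds, ts))
    slope = max(cov / var_d, 0.0) if var_d > 0 else 0.0
    intercept = t_mean - slope * d_mean
    return sum((t - intercept - slope * d) ** 2 for d, t in zip(ds, ts))


def estimate_source(adj, timestamps):
    """Return the node whose hop distances best fit the observed timestamps."""
    best_node, best_rss = None, float("inf")
    for candidate in adj:
        dist = hop_distances(adj, candidate)
        if any(v not in dist for v in timestamps):
            continue  # candidate cannot reach every observed node
        ds = [dist[v] for v in timestamps]
        ts = [timestamps[v] for v in timestamps]
        rss = fit_residual(ds, ts)
        if rss < best_rss:
            best_node, best_rss = candidate, rss
    return best_node


# Hypothetical example: a path 0-1-2-3-4-5-6 with true source 1 and a
# delay of 2 time units per hop; only four of the seven nodes are observed.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
observed = {0: 2.0, 3: 4.0, 5: 8.0, 6: 10.0}
estimated = estimate_source(path, observed)
```

Because the per-edge delay is fitted rather than supplied, the sketch needs no diffusion parameters as input, which mirrors the abstract's point that the parameters are treated as unknown. The linear-delay assumption is a simplification made here, not a claim about the model used in the thesis.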
author2 Tay, Wee Peng
author_facet Tay, Wee Peng
Tang, Wenchang
format Thesis-Doctor of Philosophy
author Tang, Wenchang
author_sort Tang, Wenchang
title Identifying misinformation and their sources in social networks
title_short Identifying misinformation and their sources in social networks
title_full Identifying misinformation and their sources in social networks
title_fullStr Identifying misinformation and their sources in social networks
title_full_unstemmed Identifying misinformation and their sources in social networks
title_sort identifying misinformation and their sources in social networks
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/138568
_version_ 1772827804148170752
spelling sg-ntu-dr.10356-138568 2023-07-04T17:18:47Z Identifying misinformation and their sources in social networks Tang, Wenchang Tay, Wee Peng School of Electrical and Electronic Engineering wptay@ntu.edu.sg Engineering::Electrical and electronic engineering Doctor of Philosophy 2020-05-08T06:42:59Z 2020-05-08T06:42:59Z 2020 Thesis-Doctor of Philosophy Tang, W. (2020). Identifying misinformation and their sources in social networks. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/138568 10.32657/10356/138568 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University