Identifying misinformation and their sources in social networks

Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may potentially incur massive losses to society. Therefore, it is often i...

Bibliographic Details
Main Author:	Tang, Wenchang
Other Authors:	Tay, Wee Peng
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/138568
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-138568
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
description	Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may potentially incur massive losses to society. Therefore, it is often important to be able to accurately and promptly identify misinformation and its sources, so that proper control measures can be adopted. The research goal of this thesis is to develop methods for estimating misinformation sources based on network topology and partial timestamps, and for detecting misinformation based on appearance-level linguistic patterns. In many practical applications, the network topology may not be known in advance and needs to be inferred. We first consider the problem of inferring network topology from information cascades and knowledge of some moments of the diffusion distribution across each edge, without needing to know the distribution itself. We propose an iterative tree inference algorithm and then generalize it heuristically to general graphs. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithm performs better than some current state-of-the-art network reconstruction methods. We then study the problem of identifying misinformation sources in a network based on the network topology and a subset of infection timestamps. In the case of a single misinformation source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic involving an optimization over a parameterized family of Gromov matrices to develop a single-source estimation algorithm for general graphs. 
Simulations demonstrate that, compared with the breadth-first search tree heuristic commonly adopted in the literature, our approach achieves better estimation accuracy than several other benchmark algorithms, even though these require more information such as the diffusion parameters. We next develop a multiple-source estimation algorithm for general graphs, which first partitions the graph into source candidate clusters and then applies our single-source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. We also propose a sequential source estimation algorithm using a particle filter that is based on an approximate hidden Markov chain model, which can be interpreted as a "reverse" propagation process. This algorithm allows the source estimate to be updated quickly whenever a new timestamp is available. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithms are able to estimate the true misinformation source(s) to within a small number of hops when only a small portion of the infection timestamps is observed. Finally, we consider the problem of fake news detection based on the statement and additional metadata. We develop a hybrid supervised learning framework combining a deep learning model and a graph model to predict the truthfulness label of a piece of news. We first train two deep neural networks, one to obtain a vector of likelihood scores for different truthfulness classes for any news item, the other to show the relationship of truthfulness between any pair of news items. A directed and weighted graph is then constructed to compute another vector of likelihood scores for different truthfulness classes. We finally combine the two vectors, one from the first neural network and one from the graph model, and obtain the label prediction by finding the truthfulness class with the highest likelihood score. 
We apply our framework to the publicly available LIAR dataset and obtain an improvement in prediction accuracy over the state of the art.
author2	Tay, Wee Peng
author_facet	Tay, Wee Peng Tang, Wenchang
format	Thesis-Doctor of Philosophy
author	Tang, Wenchang
author_sort	Tang, Wenchang
title	Identifying misinformation and their sources in social networks
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/138568
_version_	1772827804148170752
spelling	sg-ntu-dr.10356-138568 2023-07-04T17:18:47Z Identifying misinformation and their sources in social networks Tang, Wenchang Tay, Wee Peng School of Electrical and Electronic Engineering wptay@ntu.edu.sg Engineering::Electrical and electronic engineering Doctor of Philosophy 2020-05-08T06:42:59Z 2020-05-08T06:42:59Z 2020 Thesis-Doctor of Philosophy Tang, W. (2020). Identifying misinformation and their sources in social networks. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/138568 10.32657/10356/138568 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items