Identifying misinformation and their sources in social networks
Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may incur massive losses to society. Therefore, it is often i...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2020
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/138568 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-138568 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering |
description |
Online social networks such as Twitter, Facebook and Sina Weibo are becoming increasingly important sources of information for their users. Misinformation such as rumors and fake news can spread quickly within social circles and may incur massive losses to society. It is therefore often important to identify misinformation and its sources accurately and promptly, so that proper control measures can be adopted. The research goal of this thesis is to develop methods for estimating misinformation sources based on network topology and partial timestamps, and for detecting misinformation based on appearance-level linguistic patterns.
In many practical applications, the network topology may not be known in advance and needs to be inferred. We first consider the problem of inferring network topology from information cascades and knowledge of some moments of the diffusion distribution across each edge, without needing to know the distribution itself. We propose an iterative tree inference algorithm and then generalize it heuristically to general graphs. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithm performs better than some current state-of-the-art network reconstruction methods.
We then study the problem of identifying misinformation sources in a network based on the network topology and a subset of infection timestamps. In the case of a single misinformation source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic, involving an optimization over a parameterized family of Gromov matrices, to develop a single-source estimation algorithm for general graphs. Simulations demonstrate that, compared with the breadth-first search tree heuristic commonly adopted in the literature, our approach achieves better estimation accuracy than several other benchmark algorithms, even though these require more information such as the diffusion parameters. We next develop a multiple-source estimation algorithm for general graphs, which first partitions the graph into source candidate clusters and then applies our single-source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. We also propose a sequential source estimation algorithm using a particle filter based on an approximate hidden Markov chain model, which can be interpreted as a "reverse" propagation process. This algorithm allows the source estimate to be updated quickly whenever a new timestamp becomes available. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithms can estimate the true misinformation source(s) to within a small number of hops when only a small portion of the infection timestamps is observed.
Finally, we consider the problem of fake news detection based on a statement and additional metadata. We develop a hybrid supervised learning framework combining a deep learning model and a graph model to predict the truthfulness label of a piece of news. We first train two deep neural networks: one to obtain a vector of likelihood scores over the truthfulness classes for any piece of news, the other to capture the relationship of truthfulness between any pair of news items. A directed and weighted graph is then constructed to compute another vector of likelihood scores over the truthfulness classes. We finally combine the two vectors from the first neural network and the graph model, and obtain the label prediction by finding the truthfulness class with the highest likelihood score. We apply our framework to the publicly available LIAR dataset and obtain an improvement in prediction accuracy over the state of the art. |
author2 |
Tay, Wee Peng |
author_facet |
Tay, Wee Peng Tang, Wenchang |
format |
Thesis-Doctor of Philosophy |
author |
Tang, Wenchang |
author_sort |
Tang, Wenchang |
title |
Identifying misinformation and their sources in social networks |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/138568 |
_version_ |
1772827804148170752 |
spelling |
sg-ntu-dr.10356-138568; 2023-07-04T17:18:47Z; Identifying misinformation and their sources in social networks; Tang, Wenchang; Tay, Wee Peng; School of Electrical and Electronic Engineering; wptay@ntu.edu.sg; Engineering::Electrical and electronic engineering; Doctor of Philosophy; 2020-05-08T06:42:59Z; 2020-05-08T06:42:59Z; 2020; Thesis-Doctor of Philosophy; Tang, W. (2020). Identifying misinformation and their sources in social networks. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/138568; 10.32657/10356/138568; en; This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0); application/pdf; Nanyang Technological University |
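
The final step of the fake news detection framework summarized in the description (combining the score vector from the first neural network with the one from the graph model, then picking the class with the highest score) can be illustrated with a minimal sketch. The record does not say how the two vectors are combined, so the weighted average used here, the `weight` parameter, and the function names `combine_scores` and `predict_label` are all assumptions for illustration, not the thesis's method. The label list is the six-class scheme of the LIAR dataset.

```python
from typing import List, Sequence

# The six truthfulness classes of the LIAR dataset.
LIAR_LABELS = ["pants-fire", "false", "barely-true",
               "half-true", "mostly-true", "true"]


def combine_scores(nn_scores: Sequence[float],
                   graph_scores: Sequence[float],
                   weight: float = 0.5) -> List[float]:
    """Blend two per-class score vectors.

    ASSUMPTION: a convex combination is used purely for illustration;
    the source does not state the actual combination rule.
    """
    if len(nn_scores) != len(graph_scores):
        raise ValueError("score vectors must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(nn_scores, graph_scores)]


def predict_label(nn_scores: Sequence[float],
                  graph_scores: Sequence[float],
                  labels: Sequence[str] = LIAR_LABELS,
                  weight: float = 0.5) -> str:
    """Return the class whose combined score is highest."""
    combined = combine_scores(nn_scores, graph_scores, weight)
    if len(combined) != len(labels):
        raise ValueError("one score per label is required")
    best = max(range(len(combined)), key=combined.__getitem__)
    return labels[best]


if __name__ == "__main__":
    # Hypothetical scores, not results from the thesis.
    nn = [0.05, 0.10, 0.15, 0.40, 0.20, 0.10]
    graph = [0.05, 0.05, 0.10, 0.20, 0.50, 0.10]
    print(predict_label(nn, graph))
```

With `weight=1.0` the prediction depends only on the neural network's scores and with `weight=0.0` only on the graph model's, which makes it easy to see how much each component contributes on a given example.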