Efficient optimization of network metrics in large uncertain graphs

Bibliographic Details
Main Author: Saha, Arkaprava
Other Authors: Long Cheng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Online Access:https://hdl.handle.net/10356/163960
Institution: Nanyang Technological University
Description
Summary: Graphs constitute an omnipresent data structure that can model objects and their relationships in a wide variety of real-world applications, such as social networks and geo-spatial road networks. Most of these graphs, in addition to being huge, are intrinsically uncertain for a variety of reasons. As an example, the well-known social network Twitter alone has at least 187 million users and 132 billion connections at present, and these numbers keep increasing. The Twitter network is also uncertain in the sense that not every connection signifies complete influence or trust between the users involved at all times. Extracting relevant information from, or performing meaningful operations on, such graphs is therefore highly challenging.

A network metric is a measure that quantifies a certain property of a given component in a graph. Examples include the length of a given path, the density of a given subgraph and the influence spread of a given node set. The optimization of network metrics, i.e., finding the component that maximizes or minimizes a certain metric, finds use in a plethora of real-world applications, e.g., road network routing, finding social network communities and viral marketing. Most exact techniques for such tasks are prohibitively time-consuming and memory-intensive on the huge graphs usually encountered, so efficient approximation algorithms are needed. Although approximation may entail some loss in accuracy, the goal is to design techniques that achieve reasonably accurate results and scale well to very large graphs. This thesis focuses on the efficient optimization of network metrics in large uncertain graphs. Specifically, the following research problems have been studied.

The first problem aims to find, between a given pair of nodes in an uncertain graph, the k paths having the highest probabilities of being a shortest path. The proposed solution improves the existing filtering-and-verification framework by coupling Dijkstra's algorithm with Monte Carlo sampling, reducing the running time by orders of magnitude while providing end-to-end accuracy guarantees. Such shortest paths are then used to define a novel notion of betweenness centrality in uncertain graphs.

The next problem involves finding the k node sets in an uncertain graph which have the largest probabilities of inducing a densest subgraph according to a given notion of density (edge, clique or any arbitrary pattern). The proposed solution samples some deterministic graphs, computes the densest subgraphs of each, and eventually returns the k subgraphs with the highest frequencies. This problem is extended to computing the node sets with high probabilities of being contained in a densest subgraph; the solution is similar, except that the final step applies a maximal frequent itemset mining algorithm to all the sampled densest subgraphs. All the proposed methods are highly efficient and effective, both theoretically and experimentally.
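To make the Monte Carlo idea behind the first problem concrete, the following minimal Python sketch estimates each path's probability of being a shortest path by its frequency across sampled possible worlds. It is an illustration only, not the thesis's algorithm, which couples this sampling with a filtering step and end-to-end accuracy guarantees; the edge encoding (u, v, weight, probability) and all function names are assumptions.

import heapq
import random
from collections import defaultdict

def sample_world(prob_edges):
    # Keep each directed edge (u, v, weight, prob) independently with
    # probability prob, yielding one deterministic "possible world".
    return [(u, v, w) for (u, v, w, p) in prob_edges if random.random() < p]

def dijkstra(world_edges, source):
    # Standard Dijkstra on the sampled deterministic graph; returns the
    # shortest-path tree as a parent map.
    adj = defaultdict(list)
    for u, v, w in world_edges:
        adj[u].append((v, w))
    dist, parent, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return parent

def shortest_path(parent, source, target):
    # Walk the parent map back from target to source; None if unreachable.
    if target != source and target not in parent:
        return None
    path, node = [target], target
    while node != source:
        node = parent[node]
        path.append(node)
    return tuple(reversed(path))

def top_k_probable_shortest_paths(prob_edges, source, target, k, n_samples=1000):
    # Each path's frequency among sampled worlds estimates its probability
    # of being the shortest s-t path; return the k most frequent paths.
    counts = defaultdict(int)
    for _ in range(n_samples):
        parent = dijkstra(sample_world(prob_edges), source)
        path = shortest_path(parent, source, target)
        if path is not None:
            counts[path] += 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(path, cnt / n_samples) for path, cnt in ranked[:k]]

The second problem admits a similar sampling skeleton. In the sketch below, a Charikar-style greedy peeling (a classical 2-approximation for edge density on simple undirected graphs) stands in for whatever densest-subgraph routine suits the chosen density notion; the per-world routine, the encoding and the names are again assumptions, not the thesis's method.

import random
from collections import defaultdict

def greedy_peel_densest(nodes, edges):
    # Repeatedly remove a minimum-degree node and remember the remaining
    # node set with the best edge density |E|/|V| seen so far.
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    live, m = set(nodes), len(edges)
    best, best_density = frozenset(live), (m / len(live) if live else 0.0)
    while live:
        u = min(live, key=lambda x: len(adj[x]))  # min-degree node
        m -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        live.discard(u)
        if live and m / len(live) > best_density:
            best, best_density = frozenset(live), m / len(live)
    return best

def top_k_densest_node_sets(nodes, prob_edges, k, n_samples=500):
    # Sample worlds, compute a densest subgraph in each, and return the
    # k node sets observed most frequently with their estimated probabilities.
    counts = defaultdict(int)
    for _ in range(n_samples):
        world = [(u, v) for (u, v, p) in prob_edges if random.random() < p]
        counts[greedy_peel_densest(nodes, world)] += 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(set(s), c / n_samples) for s, c in ranked[:k]]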
The final problem is a novel variant of the well-known opinion maximization problem: given a social network of users with real-valued opinions (about different candidates) which diffuse according to a specified model starting from time 0, the goal is to choose the top-k seed users maximizing a specific voting-based score for the target candidate at a given finite time horizon. In contrast to prior works, the scoring function may not be submodular with respect to the seed set, which necessitates augmenting the well-known greedy algorithm with sandwich approximation using suitably designed bound functions. To improve efficiency, random walk and sketching-based methods are proposed, with theoretical quality guarantees.
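The sandwich approximation step can be sketched abstractly as follows: run the standard greedy routine on a submodular lower bound, on the score itself and on a submodular upper bound, then keep whichever seed set scores best under the true objective. In this sketch the score and bound functions are opaque callables; designing suitable bounds (and the random-walk and sketching accelerations) is the substance of the thesis and is not reproduced here.

def greedy(score, candidates, k):
    # Plain greedy maximization: repeatedly add the element with the
    # largest marginal gain under `score` until k seeds are chosen.
    seeds = set()
    for _ in range(k):
        best = max((c for c in candidates if c not in seeds),
                   key=lambda c: score(seeds | {c}))
        seeds.add(best)
    return frozenset(seeds)

def sandwich_seed_selection(f, lower_bound, upper_bound, candidates, k):
    # Greedy on the lower bound, the true score and the upper bound;
    # return whichever resulting seed set the true score f likes best.
    solutions = [greedy(g, candidates, k) for g in (lower_bound, f, upper_bound)]
    return max(solutions, key=f)

Taking the best of the three greedy solutions under f is what yields a data-dependent approximation guarantee in the sandwich framework, even though f itself is not submodular; the guarantee tightens as the bound functions hug f more closely.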