Efficient optimization of network metrics in large uncertain graphs
Graphs constitute an omnipresent data structure that can model objects and their relationships in a wide variety of real-world applications like social networks and geo-spatial road networks. Most of these graphs nowadays, in addition to being huge, are intrinsically uncertain for a variety of reaso...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2022
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/163960 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-163960 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Saha, Arkaprava Efficient optimization of network metrics in large uncertain graphs |
description |
Graphs constitute an omnipresent data structure that can model objects and their relationships in a wide variety of real-world applications like social networks and geo-spatial road networks. Most of these graphs nowadays, in addition to being huge, are intrinsically uncertain for a variety of reasons. As an example, the well-known social network Twitter alone has at least 187 million users and 132 billion connections as of now, and these numbers keep on increasing. Also, the Twitter network is uncertain in the sense that all connections do not signify complete influence or trust between the concerned users all the time. Thus, extracting relevant information from or performing meaningful operations on these graphs is highly challenging.
A network metric is a measure that quantifies a certain property of a given component in a graph. Examples include the length of a given path, the density of a given subgraph and the influence spread of a given node set. The optimization of network metrics, i.e., finding the component maximizing or minimizing a certain metric, finds use in a plethora of real-world applications, e.g., road network routing, finding social network communities and viral marketing. Most of the exact techniques for such tasks turn out to be prohibitively time-consuming and memory-intensive for the huge graphs that are usually encountered. Thus, there is a need for efficient approximation algorithms. Although this may result in some loss in accuracy, the goal is to design techniques which achieve reasonably accurate results and scale well to very large graphs. This thesis focuses on the efficient optimization of network metrics in large uncertain graphs. Specifically, in this thesis, the following research problems have been studied.
The first problem aims to find, between a given pair of nodes in an uncertain graph, the k paths having the highest probabilities of being a shortest path. The proposed solution improves the existing filtering-and-verification framework by coupling Dijkstra’s algorithm with Monte Carlo sampling to reduce the running time by some orders of magnitude, with end-to-end accuracy guarantees. Such shortest paths are then used to define a novel concept of betweenness centrality in uncertain graphs.
The next problem involves finding the k node sets in an uncertain graph which have the largest probabilities of inducing a densest subgraph according to a given notion of density (edge, clique or any arbitrary pattern). The proposed solution samples some deterministic graphs, for each of which it computes the densest subgraphs, and eventually returns the k subgraphs with the highest frequencies. This problem is extended to computing the node sets with high probabilities of being contained in a densest subgraph. The solution to this is similar to the previous one, except that the final step involves using a maximal frequent itemset mining algorithm on all the sampled densest subgraphs. All the proposed methods are highly efficient and effective, both theoretically as well as experimentally.
The final problem is a novel variant of the well-known opinion maximization problem where, given a social network of users with real-valued opinions (about different candidates) which diffuse according to a specified model starting from time 0, the goal is to choose the top-k seed users maximizing a specific voting-based score for the target candidate at a given finite time horizon. In contrast to prior works, the scoring function may not be submodular with respect to the seed set, which necessitates the augmentation of the well-known greedy algorithm with sandwich approximation using some suitably designed bound functions. To improve the efficiency, random walk and sketching-based methods are proposed, with theoretical quality guarantees. |
author2 |
Long Cheng |
author_facet |
Long Cheng Saha, Arkaprava |
format |
Thesis-Doctor of Philosophy |
author |
Saha, Arkaprava |
author_sort |
Saha, Arkaprava |
title |
Efficient optimization of network metrics in large uncertain graphs |
title_short |
Efficient optimization of network metrics in large uncertain graphs |
title_full |
Efficient optimization of network metrics in large uncertain graphs |
title_fullStr |
Efficient optimization of network metrics in large uncertain graphs |
title_full_unstemmed |
Efficient optimization of network metrics in large uncertain graphs |
title_sort |
efficient optimization of network metrics in large uncertain graphs |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/163960 |
_version_ |
1754611294810406912 |
spelling |
sg-ntu-dr.10356-1639602023-01-03T05:05:24Z Efficient optimization of network metrics in large uncertain graphs Saha, Arkaprava Long Cheng School of Computer Science and Engineering c.long@ntu.edu.sg Engineering::Computer science and engineering Graphs constitute an omnipresent data structure that can model objects and their relationships in a wide variety of real-world applications like social networks and geo-spatial road networks. Most of these graphs nowadays, in addition to being huge, are intrinsically uncertain for a variety of reasons. As an example, the well-known social network Twitter alone has at least 187 million users and 132 billion connections as of now, and these numbers keep on increasing. Also, the Twitter network is uncertain in the sense that all connections do not signify complete influence or trust between the concerned users all the time. Thus, extracting relevant information from or performing meaningful operations on these graphs is highly challenging. A network metric is a measure that quantifies a certain property of a given component in a graph. Examples include the length of a given path, the density of a given subgraph and the influence spread of a given node set. The optimization of network metrics, i.e., finding the component maximizing or minimizing a certain metric, finds use in a plethora of real-world applications, e.g., road network routing, finding social network communities and viral marketing. Most of the exact techniques for such tasks turn out to be prohibitively time-consuming and memory-intensive for the huge graphs that are usually encountered. Thus, there is a need for efficient approximation algorithms. Although this may result in some loss in accuracy, the goal is to design techniques which achieve reasonably accurate results and scale well to very large graphs. This thesis focuses on the efficient optimization of network metrics in large uncertain graphs. Specifically, in this thesis, the following research problems have been studied. The first problem aims to find, between a given pair of nodes in an uncertain graph, the k paths having the highest probabilities of being a shortest path. The proposed solution improves the existing filtering-and-verification framework by coupling Dijkstra’s algorithm with Monte Carlo sampling to reduce the running time by some orders of magnitude, with end-to-end accuracy guarantees. Such shortest paths are then used to define a novel concept of betweenness centrality in uncertain graphs. The next problem involves finding the k node sets in an uncertain graph which have the largest probabilities of inducing a densest subgraph according to a given notion of density (edge, clique or any arbitrary pattern). The proposed solution samples some deterministic graphs, for each of which it computes the densest subgraphs, and eventually returns the k subgraphs with the highest frequencies. This problem is extended to computing the node sets with high probabilities of being contained in a densest subgraph. The solution to this is similar to the previous one, except that the final step involves using a maximal frequent itemset mining algorithm on all the sampled densest subgraphs. All the proposed methods are highly efficient and effective, both theoretically as well as experimentally. The final problem is a novel variant of the well-known opinion maximization problem where, given a social network of users with real-valued opinions (about different candidates) which diffuse according to a specified model starting from time 0, the goal is to choose the top-k seed users maximizing a specific voting-based score for the target candidate at a given finite time horizon. In contrast to prior works, the scoring function may not be submodular with respect to the seed set, which necessitates the augmentation of the well-known greedy algorithm with sandwich approximation using some suitably designed bound functions. To improve the efficiency, random walk and sketching-based methods are proposed, with theoretical quality guarantees. Doctor of Philosophy 2022-12-27T04:03:45Z 2022-12-27T04:03:45Z 2022 Thesis-Doctor of Philosophy Saha, A. (2022). Efficient optimization of network metrics in large uncertain graphs. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/163960 https://hdl.handle.net/10356/163960 10.32657/10356/163960 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |