Querying and mining complex graphs : uncertainty and multi-relations
Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2020
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/137863 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-137863 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data |
spellingShingle |
Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data Ke, Xiangyu Querying and mining complex graphs : uncertainty and multi-relations |
description |
Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers
and practitioners.
Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions.
Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%.
Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system.
Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size. |
author2 |
Arijit Khan |
author_facet |
Arijit Khan Ke, Xiangyu |
format |
Thesis-Doctor of Philosophy |
author |
Ke, Xiangyu |
author_sort |
Ke, Xiangyu |
title |
Querying and mining complex graphs : uncertainty and multi-relations |
title_short |
Querying and mining complex graphs : uncertainty and multi-relations |
title_full |
Querying and mining complex graphs : uncertainty and multi-relations |
title_fullStr |
Querying and mining complex graphs : uncertainty and multi-relations |
title_full_unstemmed |
Querying and mining complex graphs : uncertainty and multi-relations |
title_sort |
querying and mining complex graphs : uncertainty and multi-relations |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/137863 |
_version_ |
1683493285560057856 |
spelling |
sg-ntu-dr.10356-1378632020-10-28T08:41:01Z Querying and mining complex graphs : uncertainty and multi-relations Ke, Xiangyu Arijit Khan School of Computer Science and Engineering arijit.khan@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size. Doctor of Philosophy 2020-04-17T00:35:04Z 2020-04-17T00:35:04Z 2019 Thesis-Doctor of Philosophy Ke, X. (2019). Querying and mining complex graphs : uncertainty and multi-relations. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/137863 10.32657/10356/137863 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |