Querying and mining complex graphs : uncertainty and multi-relations

Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications...

Full description

Saved in:
Bibliographic Details
Main Author: Ke, Xiangyu
Other Authors: Arijit Khan
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2020
Subjects:
Online Access:https://hdl.handle.net/10356/137863
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-137863
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computer applications
Engineering::Computer science and engineering::Data
spellingShingle Engineering::Computer science and engineering::Computer applications
Engineering::Computer science and engineering::Data
Ke, Xiangyu
Querying and mining complex graphs : uncertainty and multi-relations
description Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size.
author2 Arijit Khan
author_facet Arijit Khan
Ke, Xiangyu
format Thesis-Doctor of Philosophy
author Ke, Xiangyu
author_sort Ke, Xiangyu
title Querying and mining complex graphs : uncertainty and multi-relations
title_short Querying and mining complex graphs : uncertainty and multi-relations
title_full Querying and mining complex graphs : uncertainty and multi-relations
title_fullStr Querying and mining complex graphs : uncertainty and multi-relations
title_full_unstemmed Querying and mining complex graphs : uncertainty and multi-relations
title_sort querying and mining complex graphs : uncertainty and multi-relations
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/137863
_version_ 1683493285560057856
spelling sg-ntu-dr.10356-1378632020-10-28T08:41:01Z Querying and mining complex graphs : uncertainty and multi-relations Ke, Xiangyu Arijit Khan School of Computer Science and Engineering arijit.khan@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size. Doctor of Philosophy 2020-04-17T00:35:04Z 2020-04-17T00:35:04Z 2019 Thesis-Doctor of Philosophy Ke, X. (2019). Querying and mining complex graphs : uncertainty and multi-relations. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/137863 10.32657/10356/137863 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University