Querying and mining complex graphs : uncertainty and multi-relations

Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications...

Full description

Saved in:

Bibliographic Details
Main Author:	Ke, Xiangyu
Other Authors:	Arijit Khan
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2020
Subjects:	Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data
Online Access:	https://hdl.handle.net/10356/137863
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-137863
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data
spellingShingle	Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data Ke, Xiangyu Querying and mining complex graphs : uncertainty and multi-relations
description	Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size.
author2	Arijit Khan
author_facet	Arijit Khan Ke, Xiangyu
format	Thesis-Doctor of Philosophy
author	Ke, Xiangyu
author_sort	Ke, Xiangyu
title	Querying and mining complex graphs : uncertainty and multi-relations
title_short	Querying and mining complex graphs : uncertainty and multi-relations
title_full	Querying and mining complex graphs : uncertainty and multi-relations
title_fullStr	Querying and mining complex graphs : uncertainty and multi-relations
title_full_unstemmed	Querying and mining complex graphs : uncertainty and multi-relations
title_sort	querying and mining complex graphs : uncertainty and multi-relations
publisher	Nanyang Technological University
publishDate	2020
url	https://hdl.handle.net/10356/137863
_version_	1683493285560057856
spelling	sg-ntu-dr.10356-1378632020-10-28T08:41:01Z Querying and mining complex graphs : uncertainty and multi-relations Ke, Xiangyu Arijit Khan School of Computer Science and Engineering arijit.khan@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Data Graphs are important data representations that can ubiquitously model objects and their relations in a wide diversity of real-word applications. Effective and efficient graph analytics provide users with a deeper understanding of what is behind the data, thus can bene t a lot of useful applications such as node classification, node recommendation, link prediction, etc. Nowadays, besides the rapid growth in the volume of graphs, the inherent complexity on graphs has attracted great attention from data management community. This thesis will therefore focus on querying and mining uncertain and multi-relation graphs. Uncertain graphs can characterize the inherent uncertainty in the data due to many reasons, including noisy measurements, inference and prediction models, and explicit manipulations. For example, in a road network, each road junction can be represented as a node, and each road segment can serve as an edge with a blocking (e.g., traffic jam) probability. Meanwhile, the multi-relation graphs, where multiple edges may exist between a same pair of nodes, are even more expressive. Using the same example of a road network, the probability of traffic jam occurrence may vary with time. Other examples include interaction conditions and transformation probabilities from a protein to many others in a protein-protein interaction network, the topics and the corresponding probabilities that a user may re-tweet her friends' posts, etc. The uncertainty and the multi-relations over graphs bring about additional complexities to graph querying and mining. In this thesis, I investigated the following research problems. First, I conducted an experimental analyses about six state-of-the-art s-t reliability algorithms. The s-t reliability measures the probability that a target node t is reachable from a source node s. We introduced the algorithms and provided several corrections or adaptions. Then I implemented all of them on a same code base, and conducted the evaluation with uni ed metrics, medium to large real-world datasets, and same query workloads. We summarized their trade-offs, and provided guidelines for future researchers and practitioners. Then, I examined a network modification problem, aiming at maximizing the network reliability through adding a small budget of new edges. We designed solutions to the basic two-terminal reliability case, then generalized them to the multiple-source-target case, and further studied a restricted version of only considering most reliable paths. The experimental results and the real-world applications well validated the effectiveness and the efficiency of our solutions. Next, I studied a novel variant of well-known influence maximization problem by finding influential seed users and relevant topics simultaneously. Instead of using a fixed information diffusion probability between each pair of users, variations of diffusion probability in different topics were allowed. We formally characterized the problem and provided an iterative solution. The efficiency of the seed finding part was improved by up to 30%, while the accuracy of the topic finding part was improved by up to 30%. Furthermore, I applied the uncertain graph to model the crowd errors in the solution to the crowdsourcing entity resolution problem. With the help of our novel metric, the "reliability" of a clustering, the next crowdsourcing questions could be selected wisely, which thus improved the accuracy by 15%, and reduced the crowdsourcing cost by 50%. I also developed a demo software for the proposed PERC system. Finally, I presented our multi-relation graph summarization techniques. In our twostep baselines, a final summary was generated by properly aggregating the summary of each relation. I demonstrated their drawbacks and designed holistic solutions. Among them, the k-median clustering approach was able to return a summary with a provable bounded size. Doctor of Philosophy 2020-04-17T00:35:04Z 2020-04-17T00:35:04Z 2019 Thesis-Doctor of Philosophy Ke, X. (2019). Querying and mining complex graphs : uncertainty and multi-relations. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/137863 10.32657/10356/137863 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Querying and mining complex graphs : uncertainty and multi-relations

Similar Items