Enumeration of substructure patterns from large graphs

Graph is a ubiquitous data model and can be used to represent complex entities and their relationships, such as bioinformatics and biochemistry networks, traffic networks, communication networks, social networks, etc. This thesis study the problem of enumeration of fundamental substructure...

Full description

Saved in:

Bibliographic Details
Main Author:	Chu, Shumo.
Other Authors:	Cong Gao
Format:	Theses and Dissertations
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	http://hdl.handle.net/10356/54633
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-54633
record_format	dspace
spelling	sg-ntu-dr.10356-546332023-03-04T00:34:31Z Enumeration of substructure patterns from large graphs Chu, Shumo. Cong Gao School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering Graph is a ubiquitous data model and can be used to represent complex entities and their relationships, such as bioinformatics and biochemistry networks, traffic networks, communication networks, social networks, etc. This thesis study the problem of enumeration of fundamental substructure patterns in graphs. The enumeration of substructure patterns of graphs plays an important role in network analysis, since these problems are crucial to the understanding of complex networks and integral parts of many more advanced techniques of graph analysis. In recent years, we have seen a dramatic increase in the number, the size, and the variety of networks. For example, there are over 800 million active users in Facebook and there are over 1 billion webpages in WWW according to Google. How to handle big data becomes a very challenging problem for graph analysis. Due to the large volume, large graphs cannot fit into main memory of a single computer any longer. And due to the huge gap between the efficiency of accessing data from external memory and from main memory, (in terms of bandwidth and random access time), existing algorithms on enumerating substructures become inefficient or even infeasible for large graphs. In this thesis, we study the problem of enumerating substructures of large disk-resident graph. We make the following major contributions. -We developed the first I/O-efficient algorithms for listing triangles in massive networks and applied them in the computation of clustering coefficient, transitivity, triangular connectivity, etc. For the same dataset, our algorithm outperform existing algorithm by orders of magnitude. -We developed a general algorithm framework to guarantee performance in networks with different characteristics and worked out a non-trivial parallel version of this algorithm. This algorithm is up to 1,000 times faster than the state-of-the-art algorithms for clique enumeration in disk-resident networks with a single machine and shows near linear scale up given more machines. Master of Engineering (SCE) 2013-07-01T02:18:27Z 2013-07-01T02:18:27Z 2013 2013 Thesis http://hdl.handle.net/10356/54633 en 100 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering Chu, Shumo. Enumeration of substructure patterns from large graphs
description	Graph is a ubiquitous data model and can be used to represent complex entities and their relationships, such as bioinformatics and biochemistry networks, traffic networks, communication networks, social networks, etc. This thesis study the problem of enumeration of fundamental substructure patterns in graphs. The enumeration of substructure patterns of graphs plays an important role in network analysis, since these problems are crucial to the understanding of complex networks and integral parts of many more advanced techniques of graph analysis. In recent years, we have seen a dramatic increase in the number, the size, and the variety of networks. For example, there are over 800 million active users in Facebook and there are over 1 billion webpages in WWW according to Google. How to handle big data becomes a very challenging problem for graph analysis. Due to the large volume, large graphs cannot fit into main memory of a single computer any longer. And due to the huge gap between the efficiency of accessing data from external memory and from main memory, (in terms of bandwidth and random access time), existing algorithms on enumerating substructures become inefficient or even infeasible for large graphs. In this thesis, we study the problem of enumerating substructures of large disk-resident graph. We make the following major contributions. -We developed the first I/O-efficient algorithms for listing triangles in massive networks and applied them in the computation of clustering coefficient, transitivity, triangular connectivity, etc. For the same dataset, our algorithm outperform existing algorithm by orders of magnitude. -We developed a general algorithm framework to guarantee performance in networks with different characteristics and worked out a non-trivial parallel version of this algorithm. This algorithm is up to 1,000 times faster than the state-of-the-art algorithms for clique enumeration in disk-resident networks with a single machine and shows near linear scale up given more machines.
author2	Cong Gao
author_facet	Cong Gao Chu, Shumo.
format	Theses and Dissertations
author	Chu, Shumo.
author_sort	Chu, Shumo.
title	Enumeration of substructure patterns from large graphs
title_short	Enumeration of substructure patterns from large graphs
title_full	Enumeration of substructure patterns from large graphs
title_fullStr	Enumeration of substructure patterns from large graphs
title_full_unstemmed	Enumeration of substructure patterns from large graphs
title_sort	enumeration of substructure patterns from large graphs
publishDate	2013
url	http://hdl.handle.net/10356/54633
_version_	1759855222651355136

Enumeration of substructure patterns from large graphs

Similar Items