Mining Top-K Large Structural Patterns in a Massive Network

With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel...

Full description

Saved in:
Bibliographic Details
Main Authors: ZHU, Feida, QU, Qiang, LO, David, YAN, Xifeng, HAN, Jiawei, YU, Philip S.
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2011
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/1394
https://ink.library.smu.edu.sg/context/sis_research/article/2393/viewcontent/p807_zhu.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-2393
record_format dspace
spelling sg-smu-ink.sis_research-23932020-01-10T01:00:56Z Mining Top-K Large Structural Patterns in a Massive Network ZHU, Feida QU, Qiang LO, David YAN, Xifeng HAN, Jiawei YU, Philip S. With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods. 2011-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/1394 info:doi/10.14778/3402707.3402720 https://ink.library.smu.edu.sg/context/sis_research/article/2393/viewcontent/p807_zhu.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Numerical Analysis and Scientific Computing
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Numerical Analysis and Scientific Computing
spellingShingle Databases and Information Systems
Numerical Analysis and Scientific Computing
ZHU, Feida
QU, Qiang
LO, David
YAN, Xifeng
HAN, Jiawei
YU, Philip S.
Mining Top-K Large Structural Patterns in a Massive Network
description With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.
format text
author ZHU, Feida
QU, Qiang
LO, David
YAN, Xifeng
HAN, Jiawei
YU, Philip S.
author_facet ZHU, Feida
QU, Qiang
LO, David
YAN, Xifeng
HAN, Jiawei
YU, Philip S.
author_sort ZHU, Feida
title Mining Top-K Large Structural Patterns in a Massive Network
title_short Mining Top-K Large Structural Patterns in a Massive Network
title_full Mining Top-K Large Structural Patterns in a Massive Network
title_fullStr Mining Top-K Large Structural Patterns in a Massive Network
title_full_unstemmed Mining Top-K Large Structural Patterns in a Massive Network
title_sort mining top-k large structural patterns in a massive network
publisher Institutional Knowledge at Singapore Management University
publishDate 2011
url https://ink.library.smu.edu.sg/sis_research/1394
https://ink.library.smu.edu.sg/context/sis_research/article/2393/viewcontent/p807_zhu.pdf
_version_ 1770571101374513152