On the parallel complexity of hierarchical clustering and CC-complete problems

Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important an...

Full description

Saved in:
Bibliographic Details
Main Authors: Greenlaw R., Kantabutra S.
Format: Article
Language:English
Published: 2014
Online Access:http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f
http://cmuir.cmu.ac.th/handle/6653943832/5451
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Chiang Mai University
Language: English
id th-cmuir.6653943832-5451
record_format dspace
spelling th-cmuir.6653943832-54512014-08-30T02:56:32Z On the parallel complexity of hierarchical clustering and CC-complete problems Greenlaw R. Kantabutra S. Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. 2014-08-30T02:56:32Z 2014-08-30T02:56:32Z 2008 Article 10762787 10.1002/cplx.20238 http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f http://cmuir.cmu.ac.th/handle/6653943832/5451 English
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
language English
description Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc.
format Article
author Greenlaw R.
Kantabutra S.
spellingShingle Greenlaw R.
Kantabutra S.
On the parallel complexity of hierarchical clustering and CC-complete problems
author_facet Greenlaw R.
Kantabutra S.
author_sort Greenlaw R.
title On the parallel complexity of hierarchical clustering and CC-complete problems
title_short On the parallel complexity of hierarchical clustering and CC-complete problems
title_full On the parallel complexity of hierarchical clustering and CC-complete problems
title_fullStr On the parallel complexity of hierarchical clustering and CC-complete problems
title_full_unstemmed On the parallel complexity of hierarchical clustering and CC-complete problems
title_sort on the parallel complexity of hierarchical clustering and cc-complete problems
publishDate 2014
url http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f
http://cmuir.cmu.ac.th/handle/6653943832/5451
_version_ 1681420428139036672