On the parallel complexity of hierarchical clustering and CC-complete problems
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important an...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
2014
|
Online Access: | http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f http://cmuir.cmu.ac.th/handle/6653943832/5451 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Chiang Mai University |
Language: | English |
id |
th-cmuir.6653943832-5451 |
---|---|
record_format |
dspace |
spelling |
th-cmuir.6653943832-54512014-08-30T02:56:32Z On the parallel complexity of hierarchical clustering and CC-complete problems Greenlaw R. Kantabutra S. Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. 2014-08-30T02:56:32Z 2014-08-30T02:56:32Z 2008 Article 10762787 10.1002/cplx.20238 http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f http://cmuir.cmu.ac.th/handle/6653943832/5451 English |
institution |
Chiang Mai University |
building |
Chiang Mai University Library |
country |
Thailand |
collection |
CMU Intellectual Repository |
language |
English |
description |
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. |
format |
Article |
author |
Greenlaw R. Kantabutra S. |
spellingShingle |
Greenlaw R. Kantabutra S. On the parallel complexity of hierarchical clustering and CC-complete problems |
author_facet |
Greenlaw R. Kantabutra S. |
author_sort |
Greenlaw R. |
title |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_short |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_full |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_fullStr |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_full_unstemmed |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_sort |
on the parallel complexity of hierarchical clustering and cc-complete problems |
publishDate |
2014 |
url |
http://www.scopus.com/inward/record.url?eid=2-s2.0-60749115682&partnerID=40&md5=ee24cfc80b89c94f6b9a12f1769f1b0f http://cmuir.cmu.ac.th/handle/6653943832/5451 |
_version_ |
1681420428139036672 |