On the parallel complexity of hierarchical clustering and CC-complete problems
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important an...
Saved in:
Main Authors: | , |
---|---|
Format: | Journal |
Published: |
2018
|
Subjects: | |
Online Access: | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Chiang Mai University |
id |
th-cmuir.6653943832-60756 |
---|---|
record_format |
dspace |
spelling |
th-cmuir.6653943832-607562018-09-10T03:49:28Z On the parallel complexity of hierarchical clustering and CC-complete problems Raymond Greenlaw Sanpawat Kantabutra Multidisciplinary Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. 2018-09-10T03:49:28Z 2018-09-10T03:49:28Z 2008-11-01 Journal 10990526 10762787 2-s2.0-60749115682 10.1002/cplx.20238 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756 |
institution |
Chiang Mai University |
building |
Chiang Mai University Library |
country |
Thailand |
collection |
CMU Intellectual Repository |
topic |
Multidisciplinary |
spellingShingle |
Multidisciplinary Raymond Greenlaw Sanpawat Kantabutra On the parallel complexity of hierarchical clustering and CC-complete problems |
description |
Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. |
format |
Journal |
author |
Raymond Greenlaw Sanpawat Kantabutra |
author_facet |
Raymond Greenlaw Sanpawat Kantabutra |
author_sort |
Raymond Greenlaw |
title |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_short |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_full |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_fullStr |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_full_unstemmed |
On the parallel complexity of hierarchical clustering and CC-complete problems |
title_sort |
on the parallel complexity of hierarchical clustering and cc-complete problems |
publishDate |
2018 |
url |
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756 |
_version_ |
1681425494653796352 |