On the parallel complexity of hierarchical clustering and CC-complete problems

Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important an...

Full description

Saved in:
Bibliographic Details
Main Authors: Raymond Greenlaw, Sanpawat Kantabutra
Format: Journal
Published: 2018
Subjects:
Online Access:https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Chiang Mai University
id th-cmuir.6653943832-60756
record_format dspace
spelling th-cmuir.6653943832-607562018-09-10T03:49:28Z On the parallel complexity of hierarchical clustering and CC-complete problems Raymond Greenlaw Sanpawat Kantabutra Multidisciplinary Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. 2018-09-10T03:49:28Z 2018-09-10T03:49:28Z 2008-11-01 Journal 10990526 10762787 2-s2.0-60749115682 10.1002/cplx.20238 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Multidisciplinary
spellingShingle Multidisciplinary
Raymond Greenlaw
Sanpawat Kantabutra
On the parallel complexity of hierarchical clustering and CC-complete problems
description Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUITVALUE PROBLEM class. This class contains a variety of interesting problems,and nowfor the first time a problem fromdata mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc.
format Journal
author Raymond Greenlaw
Sanpawat Kantabutra
author_facet Raymond Greenlaw
Sanpawat Kantabutra
author_sort Raymond Greenlaw
title On the parallel complexity of hierarchical clustering and CC-complete problems
title_short On the parallel complexity of hierarchical clustering and CC-complete problems
title_full On the parallel complexity of hierarchical clustering and CC-complete problems
title_fullStr On the parallel complexity of hierarchical clustering and CC-complete problems
title_full_unstemmed On the parallel complexity of hierarchical clustering and CC-complete problems
title_sort on the parallel complexity of hierarchical clustering and cc-complete problems
publishDate 2018
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=60749115682&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/60756
_version_ 1681425494653796352