A split-merge framework for comparing clusterings

External clustering evaluation measures are often used to evaluate the performance of different clustering algorithms on a collection of data sets. The traditional normalization property is no longer suitable for this task, so a conditional normalization property is proposed based on the fact that one c...

Bibliographic Details
Main Author: Xiang, Qiaoliang
Other Authors: Tsang Wai-Hung, Ivor
Format: Theses and Dissertations
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/55194
Institution: Nanyang Technological University
id sg-ntu-dr.10356-55194
record_format dspace
spelling sg-ntu-dr.10356-55194
2023-03-04T00:48:44Z
A split-merge framework for comparing clusterings
Xiang, Qiaoliang
Tsang Wai-Hung, Ivor
School of Computer Engineering
Centre for Computational Intelligence
DRNTU::Engineering::Computer science and engineering
MASTER OF ENGINEERING (SCE)
2013-12-30T02:27:06Z
2013-12-30T02:27:06Z
2013
2013
Thesis
Xiang, Q. (2013). A split-merge framework for comparing clusterings. Master's thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/55194
10.32657/10356/55194
en
106 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
description External clustering evaluation measures are often used to evaluate the performance of different clustering algorithms on a collection of data sets. The traditional normalization property is no longer suitable for this task, so a conditional normalization property is proposed based on the fact that one clustering is the ground truth. Even though existing measures have been proposed from different points of view, we study them from the normalization point of view. In addition, we propose a new category of cluster counting measures and further group set matching measures into two subcategories according to how the matching is performed. Furthermore, we propose a generative model to study how existing measures are generated and to produce new measures according to application requirements. To understand the intrinsic properties of a measure, a graph-based model is presented that models two clusterings as a directed bipartite graph, which can be decomposed into weakly connected components. A measure can be expressed as a conic combination of scores on components, with different weights assigned to components when aggregating their scores. Based on the graph-based model, we propose a split-merge framework that breaks components into subcomponents and combines the scores of any two related subcomponents. The framework is conditionally normalized, whereas existing measures are not. It also has many desirable properties compared to other existing frameworks. We give some examples of the framework and compare one example with a few representative measures, theoretically and empirically, on a coreference resolution data set.
author2 Tsang Wai-Hung, Ivor
author_facet Tsang Wai-Hung, Ivor
Xiang, Qiaoliang
format Theses and Dissertations
author Xiang, Qiaoliang
author_sort Xiang, Qiaoliang
title A split-merge framework for comparing clusterings
title_sort split-merge framework for comparing clusterings
publishDate 2013
url https://hdl.handle.net/10356/55194
_version_ 1759855294678040576
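
The graph-based model mentioned in the description can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes that a ground-truth cluster and a predicted cluster are linked whenever they share at least one item, and it ignores edge direction, which is enough to find weakly connected components. The function names `weak_components`, `conic_score`, and `exact_match`, the size-proportional weights, and the per-component score are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def weak_components(truth, pred):
    """Group clusters of two clusterings into weakly connected components.

    truth, pred: lists of sets over the same items.
    Assumption: two clusters are linked when they share at least one item.
    Returns a sorted list of (truth_indices, pred_indices) pairs.
    """
    parent = {}  # union-find over nodes ("t", i) and ("p", j)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for i, t in enumerate(truth):
        find(("t", i))
        for j, p in enumerate(pred):
            find(("p", j))
            if t & p:  # overlapping clusters belong to one component
                union(("t", i), ("p", j))

    groups = defaultdict(lambda: (set(), set()))
    for side, idx in list(parent):
        root = find((side, idx))
        groups[root][0 if side == "t" else 1].add(idx)

    components = [(sorted(t), sorted(p)) for t, p in groups.values()]
    components.sort()
    return components


def exact_match(truth_clusters, pred_clusters):
    """Hypothetical component score: 1 for a one-to-one match, else 0."""
    return 1.0 if len(truth_clusters) == 1 and len(pred_clusters) == 1 else 0.0


def conic_score(truth, pred, component_score=exact_match):
    """Aggregate component scores with nonnegative weights.

    Assumption: each weight is the fraction of items in the component.
    """
    n_items = sum(len(t) for t in truth)
    total = 0.0
    for t_idx, p_idx in weak_components(truth, pred):
        weight = sum(len(truth[i]) for i in t_idx) / n_items
        total += weight * component_score(
            [truth[i] for i in t_idx], [pred[j] for j in p_idx]
        )
    return total


if __name__ == "__main__":
    truth = [{1, 2}, {3, 4}, {5}]
    pred = [{1, 2}, {3}, {4, 5}]
    print(weak_components(truth, pred))
    print(conic_score(truth, pred))
```

In this toy input the first ground-truth cluster is matched exactly, while the other two form a single component with two predicted clusters, so only the first component contributes to the aggregated score. Any other nonnegative weighting or per-component score could be substituted without changing the decomposition step.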