A split-merge framework for comparing clusterings

External clustering evaluation measures are often used to evaluate the performance of different clustering algorithms on a collection of data sets. The traditional normalization property is no longer suitable for this task, so a conditional normalization property is proposed, based on the fact that one c...

Bibliographic Details
Main Author:	Xiang, Qiaoliang
Other Authors:	Tsang Wai-Hung, Ivor
Format:	Theses and Dissertations
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/55194
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-55194
record_format	dspace
spelling	sg-ntu-dr.10356-55194 2023-03-04T00:48:44Z A split-merge framework for comparing clusterings Xiang, Qiaoliang Tsang Wai-Hung, Ivor School of Computer Engineering Centre for Computational Intelligence DRNTU::Engineering::Computer science and engineering MASTER OF ENGINEERING (SCE) 2013-12-30T02:27:06Z 2013-12-30T02:27:06Z 2013 2013 Thesis Xiang, Q. (2013). A split-merge framework for comparing clusterings. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/55194 10.32657/10356/55194 en 106 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering Xiang, Qiaoliang A split-merge framework for comparing clusterings
description	External clustering evaluation measures are often used to evaluate the performance of different clustering algorithms on a collection of data sets. The traditional normalization property is no longer suitable for this task, so a conditional normalization property is proposed, based on the fact that one clustering is the ground truth. Although existing measures have been proposed from different points of view, we study them from the normalization point of view. In addition, we propose a new category of cluster counting measures and further divide set matching measures into two subcategories according to how the matching is performed. Furthermore, we propose a generative model to study how existing measures are generated and to produce new measures according to application requirements. To understand the intrinsic properties of a measure, we present a graph-based model that represents two clusterings as a directed bipartite graph, which can be decomposed into weakly connected components. A measure can be expressed as a conic combination of scores on components, with different weights assigned to components when aggregating their scores. Based on the graph-based model, we propose a split-merge framework that breaks components into subcomponents and combines the scores of any two related subcomponents. The framework is conditionally normalized, whereas existing measures are not. It also has many nice properties compared to other existing frameworks. We give some examples of the framework and compare one example with a few representative measures, theoretically and empirically, on a coreference resolution data set.
author2	Tsang Wai-Hung, Ivor
author_facet	Tsang Wai-Hung, Ivor Xiang, Qiaoliang
format	Theses and Dissertations
author	Xiang, Qiaoliang
author_sort	Xiang, Qiaoliang
title	A split-merge framework for comparing clusterings
title_short	A split-merge framework for comparing clusterings
title_full	A split-merge framework for comparing clusterings
title_fullStr	A split-merge framework for comparing clusterings
title_full_unstemmed	A split-merge framework for comparing clusterings
title_sort	split-merge framework for comparing clusterings
publishDate	2013
url	https://hdl.handle.net/10356/55194
_version_	1759855294678040576
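
The description above mentions modelling two clusterings as a bipartite graph that decomposes into weakly connected components. The following is a minimal sketch of that decomposition only, not the thesis's own code. It assumes an edge joins a ground-truth cluster and a predicted cluster whenever they share at least one item. Edge direction is ignored, which is sufficient for weak connectivity. The function name `overlap_components` and the dict-based input format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def overlap_components(truth, predicted):
    """Group the clusters of two clusterings into weakly connected components.

    truth, predicted: dicts mapping each item to its cluster label
    (both dicts are assumed to cover the same items).
    Returns a list of (truth_labels, predicted_labels) set pairs.
    """
    parent = {}  # union-find forest over cluster nodes

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Each item links its ground-truth cluster to its predicted cluster.
    for item, t_label in truth.items():
        t_node = ("truth", t_label)
        p_node = ("pred", predicted[item])
        parent.setdefault(t_node, t_node)
        parent.setdefault(p_node, p_node)
        union(t_node, p_node)

    # Collect the labels on each side of every component.
    groups = defaultdict(lambda: (set(), set()))
    for node in parent:
        side, label = node
        groups[find(node)][0 if side == "truth" else 1].add(label)
    return sorted(groups.values(), key=lambda g: sorted(g[0]))


truth = {"a": "T1", "b": "T1", "c": "T2", "d": "T3"}
predicted = {"a": "P1", "b": "P2", "c": "P2", "d": "P3"}
components = overlap_components(truth, predicted)
```

In this toy input, clusters T1 and T2 are tied together through P2, so they fall into one component with P1 and P2, while T3 and P3 form a second component. Scoring and weighting of components, as the description outlines, are left out because the record does not specify them.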

