Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming

The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorith...

Full description

Saved in:
Bibliographic Details
Main Authors: CHEMBU, Aravinth, SANNER, Scott, KHURRAM, Hassan, KUMAR, Akshat
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/8518
https://ink.library.smu.edu.sg/context/sis_research/article/9521/viewcontent/aaai23_kcenters.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-9521
record_format dspace
spelling sg-smu-ink.sis_research-95212024-01-22T15:07:44Z Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming CHEMBU, Aravinth SANNER, Scott KHURRAM, Hassan KUMAR, Akshat The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorithms provide highly suboptimal solutions that can limit their practical application, reproducibility, and clustering quality. In this paper, we provide a novel scalable and globally optimal solution to a popular variant of the k-center problem known as generalized L1 k-center clustering that uses L1 distance and allows the selection of arbitrary vectors as cluster centers. We show that this clustering objective can be reduced to a mixed-integer linear program (MILP) that facilitates globally optimal clustering solutions. However, solving such a MILP may be intractable for large datasets; to remedy this, we present a scalable algorithm that leverages constraint generation to efficiently and provably converge to its global optimum. We further enhance outlier handling through a simple but elegant extension to our MILP objective. We first evaluate our algorithm on a variety of synthetic datasets to better understand its properties and then validate on 20 real benchmark datasets where we compare its performance to both traditional L1 distance k-center and k-medians baselines. Our results demonstrate significant suboptimality of existing algorithms in comparison to our approach and further demonstrate that we can find optimal generalized L1 k-center clustering solutions up to an unprecedented 1,000,000 data points. 2023-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8518 info:doi/10.1609/aaai.v37i6.25857 https://ink.library.smu.edu.sg/context/sis_research/article/9521/viewcontent/aaai23_kcenters.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Class imbalance Clustering problems Clustering solutions Clusterings Constraints generation Integer Linear Programming K-center Mixed integer linear Mixed integer linear program Mixed integer-linear programmes Databases and Information Systems Theory and Algorithms
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Class imbalance
Clustering problems
Clustering solutions
Clusterings
Constraints generation
Integer Linear Programming
K-center
Mixed integer linear
Mixed integer linear program
Mixed integer-linear programmes
Databases and Information Systems
Theory and Algorithms
spellingShingle Class imbalance
Clustering problems
Clustering solutions
Clusterings
Constraints generation
Integer Linear Programming
K-center
Mixed integer linear
Mixed integer linear program
Mixed integer-linear programmes
Databases and Information Systems
Theory and Algorithms
CHEMBU, Aravinth
SANNER, Scott
KHURRAM, Hassan
KUMAR, Akshat
Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
description The k-center clustering algorithm, introduced over 35 years ago, is known to be robust to class imbalance prevalent in many clustering problems and has various applications such as data summarization, document clustering, and facility location determination. Unfortunately, existing k-center algorithms provide highly suboptimal solutions that can limit their practical application, reproducibility, and clustering quality. In this paper, we provide a novel scalable and globally optimal solution to a popular variant of the k-center problem known as generalized L1 k-center clustering that uses L1 distance and allows the selection of arbitrary vectors as cluster centers. We show that this clustering objective can be reduced to a mixed-integer linear program (MILP) that facilitates globally optimal clustering solutions. However, solving such a MILP may be intractable for large datasets; to remedy this, we present a scalable algorithm that leverages constraint generation to efficiently and provably converge to its global optimum. We further enhance outlier handling through a simple but elegant extension to our MILP objective. We first evaluate our algorithm on a variety of synthetic datasets to better understand its properties and then validate on 20 real benchmark datasets where we compare its performance to both traditional L1 distance k-center and k-medians baselines. Our results demonstrate significant suboptimality of existing algorithms in comparison to our approach and further demonstrate that we can find optimal generalized L1 k-center clustering solutions up to an unprecedented 1,000,000 data points.
format text
author CHEMBU, Aravinth
SANNER, Scott
KHURRAM, Hassan
KUMAR, Akshat
author_facet CHEMBU, Aravinth
SANNER, Scott
KHURRAM, Hassan
KUMAR, Akshat
author_sort CHEMBU, Aravinth
title Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
title_short Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
title_full Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
title_fullStr Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
title_full_unstemmed Scalable and globally optimal generalized L1 K-center clustering via constraint generation in mixed integer linear programming
title_sort scalable and globally optimal generalized l1 k-center clustering via constraint generation in mixed integer linear programming
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8518
https://ink.library.smu.edu.sg/context/sis_research/article/9521/viewcontent/aaai23_kcenters.pdf
_version_ 1789483257698451456