Dimensionality's blessing: Clustering images by underlying distribution

Bibliographic Details
Main Authors: LIN, Wen-yan; LAI, Jian-Huang; LIU, Siying; MATSUSHITA, Yasuyuki
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: Computer and Systems Architecture; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/4810
https://ink.library.smu.edu.sg/context/sis_research/article/5813/viewcontent/Lin_Dimensionalitys_Blessing_Clustering_CVPR_2018_paper.pdf
Institution: Singapore Management University
DOI: 10.1109/CVPR.2018.00606
Venue: CVPR 2018
Date: 2018-06-01
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Description: Many high-dimensional vector distances tend to a constant. This is typically considered a negative “contrast-loss” phenomenon that hinders clustering and other machine learning techniques. We reinterpret “contrast-loss” as a blessing. Re-deriving “contrast-loss” using the law of large numbers, we show it results in a distribution’s instances concentrating on a thin “hyper-shell”. The hollow center means apparently chaotically overlapping distributions are actually intrinsically separable. We use this to develop distribution-clustering, an elegant algorithm for grouping data points by their (unknown) underlying distribution. Distribution-clustering creates notably clean clusters from raw unlabeled data, estimates the number of clusters for itself, and is inherently robust to “outliers”, which form their own clusters. This enables trawling for patterns in unorganized data and may be the key to enabling machine intelligence.
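The concentration effect that the description relies on is easy to check numerically. The sketch below is an illustration only, not the authors' distribution-clustering algorithm: it assumes isotropic Gaussian data N(0, σ²I_d), and the function names `shell_stats` and `shells_separate` and the σ values 1.0 and 1.2 are hypothetical choices. By the law of large numbers, ||x||²/d = (1/d)·Σ x_i² → σ², so instances sit near a shell of radius σ√d whose relative thickness shrinks as d grows; likewise ||x - y||²/d → 2σ² for independent instances, so pairwise distances approach σ√(2d).

```python
import numpy as np


def shell_stats(d: int, n: int = 1000, sigma: float = 1.0, seed: int = 0) -> dict:
    """Sample n points from an isotropic Gaussian N(0, sigma^2 I_d) (an assumed
    toy distribution) and summarize how tightly their norms and pairwise
    distances concentrate."""
    rng = np.random.default_rng(seed)
    x = sigma * rng.standard_normal((n, d))

    # Distance of each instance from the distribution's centre (the origin).
    norms = np.linalg.norm(x, axis=1)

    # Distances between disjoint pairs of instances (n // 2 pairs).
    pair_dists = np.linalg.norm(x[0::2] - x[1::2], axis=1)

    return {
        # Law of large numbers: ||x||^2 / d -> sigma^2, so ||x|| ~ sigma*sqrt(d).
        "norm_mean_over_sqrt_d": norms.mean() / np.sqrt(d),
        # Relative thickness of the shell; shrinks roughly like 1/sqrt(2d).
        "norm_rel_spread": norms.std() / norms.mean(),
        # ||x - y||^2 / d -> 2 sigma^2, so pairwise distances ~ sigma*sqrt(2d).
        "pair_mean_over_sqrt_2d": pair_dists.mean() / np.sqrt(2 * d),
        "pair_rel_spread": pair_dists.std() / pair_dists.mean(),
    }


def shells_separate(d: int, sigma_a: float = 1.0, sigma_b: float = 1.2,
                    n: int = 1000, seed: int = 1) -> bool:
    """Two Gaussians with the same centre overlap completely in space, yet in
    high dimension their instances lie on distinct shells of radius
    sigma_a*sqrt(d) and sigma_b*sqrt(d). Returns True if a single radius
    threshold separates every sampled point."""
    rng = np.random.default_rng(seed)
    ra = np.linalg.norm(sigma_a * rng.standard_normal((n, d)), axis=1)
    rb = np.linalg.norm(sigma_b * rng.standard_normal((n, d)), axis=1)
    return bool(ra.max() < rb.min())


if __name__ == "__main__":
    for d in (2, 20, 200, 2000):
        s = shell_stats(d)
        print(f"d={d:5d}  " + "  ".join(f"{k}={v:.3f}" for k, v in s.items()))
    print("separable at d=2:   ", shells_separate(2))
    print("separable at d=2000:", shells_separate(2000))
```

Run as a script, it prints the statistics for d = 2, 20, 200 and 2000: the relative spread falls roughly as 1/√(2d), and the two same-centred Gaussians, which overlap heavily in two dimensions, become separable by a single radius threshold in 2000 dimensions. The threshold test works here only because both Gaussians share a centre; it is not a general clustering method.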