CURE: Flexible categorical data representation by hierarchical coupling learning

The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categ...

Bibliographic Details
Main Authors: JIAN, Songlei; PANG, Guansong; CAO, Longbing; LU, Kai; GAO, Hang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7137
https://ink.library.smu.edu.sg/context/sis_research/article/8140/viewcontent/08395013.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8140
record_format dspace
spelling sg-smu-ink.sis_research-8140 2022-04-22T04:23:17Z
2019-05-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/7137; info:doi/10.1109/TKDE.2018.2848902; https://ink.library.smu.edu.sg/context/sis_research/article/8140/viewcontent/08395013.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Categorical data representation; unsupervised learning; coupling learning; non-IID learning; clustering; outlier detection; Databases and Information Systems; Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Categorical data representation
unsupervised learning
coupling learning
non-IID learning
clustering
outlier detection
Databases and Information Systems
Data Storage Systems
description The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into two models: coupled data embedding (CDE) for clustering and coupled outlier scoring of high-dimensional data (COSH) for outlier detection. These instantiations show that CURE is flexible in both value clustering and coupling learning between value clusters for different learning tasks. CDE embeds categorical data into a new space in which features are independent and semantics are rich. COSH represents data w.r.t. an outlying vector to capture complex outlying behaviors of objects in high-dimensional data. Substantial experiments show that CDE significantly outperforms three popular unsupervised encoding methods and three state-of-the-art similarity measures, and COSH performs significantly better than five state-of-the-art outlier detection methods on high-dimensional data. CDE and COSH are scalable and stable: their runtime is linear in data size and quadratic in the number of features, and they are insensitive to their parameters.
format text
author JIAN, Songlei
PANG, Guansong
CAO, Longbing
LU, Kai
GAO, Hang
title CURE: Flexible categorical data representation by hierarchical coupling learning
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/7137
https://ink.library.smu.edu.sg/context/sis_research/article/8140/viewcontent/08395013.pdf
_version_ 1770576229927223296