Selective value coupling learning for detecting outliers in high-dimensional categorical data
Main Authors: | PANG, Guansong; XU, Hongzuo; CAO, Longbing; ZHAO, Wentao |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2017 |
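Subjects: | Outlier Detection; High-Dimensional Data; Categorical Data; Feature Selection; Coupling Learning; Databases and Information Systems; Data Storage Systems |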
Online Access: | https://ink.library.smu.edu.sg/sis_research/7142 https://ink.library.smu.edu.sg/context/sis_research/article/8145/viewcontent/3132847.3132994.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-8145 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-8145 2022-04-22T04:21:13Z Selective value coupling learning for detecting outliers in high-dimensional categorical data PANG, Guansong XU, Hongzuo CAO Longbing, ZHAO, Wentao 2017-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7142 info:doi/10.1145/3132847.3132994 https://ink.library.smu.edu.sg/context/sis_research/article/8145/viewcontent/3132847.3132994.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Outlier Detection High-Dimensional Data Categorical Data Feature Selection Coupling Learning Databases and Information Systems Data Storage Systems |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
topic | Outlier Detection; High-Dimensional Data; Categorical Data; Feature Selection; Coupling Learning; Databases and Information Systems; Data Storage Systems |
description | This paper introduces a novel framework, SelectVC, and its instance, POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of the subsequent outlier scoring. As a result, they are seriously challenged by the overwhelming number of irrelevant features in high-dimensional data, both because of the noise those features introduce and because of the huge search space they create. In contrast, SelectVC works on a clean, condensed data space spanned by selective value couplings, which it obtains by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process that captures the selective value couplings. POP further defines a top-k outlying value selection method to keep it scalable over the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full-space- or subspace-based outlier detectors, as well as their combinations with three feature selection methods, on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) achieves good scalability, stable performance w.r.t. k, and a fast convergence rate. |
author | PANG, Guansong; XU, Hongzuo; CAO, Longbing; ZHAO, Wentao |
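The abstract in this record describes POP only at a high level and does not give its scoring or selection formulas. The following is a minimal Python sketch, under stated assumptions, of the general idea it names: alternating a value outlierness score that is propagated only from a selected set of outlying values with top-k outlying value selection. It is not POP. The function name `selective_value_scores`, the frequency-based base score, the conditional-probability coupling P(u | v), the mixing weight `alpha`, and the object score (sum over an object's selected values) are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def selective_value_scores(rows, k=2, alpha=0.5, max_iter=20):
    """Hypothetical sketch (not POP): alternate value scoring and top-k selection.

    rows: list of equal-length tuples of categorical values.
    Returns (object_scores, selected_values).
    """
    # Encode each value as (feature index, category) so that equal strings in
    # different features stay distinct.
    encoded = [[(j, x) for j, x in enumerate(row)] for row in rows]

    # Marginal count of every value and the count of the mode of each feature.
    count = Counter(v for obj in encoded for v in obj)
    mode = defaultdict(int)
    for (j, _), c in count.items():
        mode[j] = max(mode[j], c)

    # Assumed base outlierness: relative shortfall of a value's frequency
    # versus its feature's mode (0 for the mode, close to 1 for rare values).
    base = {v: (mode[v[0]] - c) / mode[v[0]] for v, c in count.items()}

    # Symmetric co-occurrence counts between values of different features.
    co = Counter()
    for obj in encoded:
        for u, v in combinations(obj, 2):
            co[(u, v)] += 1
            co[(v, u)] += 1

    def coupling(u, v):
        # Assumed coupling: P(u | v), with a value fully coupled to itself.
        return 1.0 if u == v else co[(u, v)] / count[v]

    def top_k(scores):
        # Keys of the k highest scores (ties keep insertion order).
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    score = dict(base)
    selected = top_k(score)
    for _ in range(max_iter):
        # Partial propagation: each value receives outlierness only from the
        # currently selected values, mixed with its own base score.
        score = {
            u: alpha * base[u]
            + (1 - alpha) * sum(coupling(u, v) * score[v] for v in selected)
            / max(len(selected), 1)
            for u in count
        }
        new_selected = top_k(score)
        if new_selected == selected:
            break
        selected = new_selected

    # Object score: total outlierness of the object's values that were selected.
    object_scores = [sum(score[v] for v in obj if v in selected) for obj in encoded]
    return object_scores, selected
```

For example, on three copies of `("a", "x", "p")` plus `("a", "y", "p")` and `("b", "z", "q")` with `k=2`, the sketch selects `b` (feature 0) and `q` (feature 2) and gives a non-zero score (1.5) only to the last row. Treating each selected value as fully coupled to itself helps keep the selected set from swapping members on every iteration; `max_iter` still bounds the loop if the set never stabilizes.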