Selective value coupling learning for detecting outliers in high-dimensional categorical data

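This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure its scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full space- or subspace-based outlier detectors and their combinations with three feature selection methods on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) obtains good scalability, stable performance w.r.t. k, and fast convergence rate.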

Bibliographic Details
Main Authors: PANG, Guansong; XU, Hongzuo; CAO, Longbing; ZHAO, Wentao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
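Subjects: Outlier Detection; High-Dimensional Data; Categorical Data; Feature Selection; Coupling Learning; Databases and Information Systems; Data Storage Systems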
Online Access: https://ink.library.smu.edu.sg/sis_research/7142
https://ink.library.smu.edu.sg/context/sis_research/article/8145/viewcontent/3132847.3132994.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8145
record_format dspace
spelling sg-smu-ink.sis_research-8145 2022-04-22T04:21:13Z
2017-11-01T07:00:00Z text application/pdf
info:doi/10.1145/3132847.3132994
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Outlier Detection
High-Dimensional Data
Categorical Data
Feature Selection
Coupling Learning
Databases and Information Systems
Data Storage Systems
description This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective value couplings by jointly optimizing outlying value selection and value outlierness scoring. Its instance POP defines a value outlierness scoring function by modeling a partial outlierness propagation process to capture the selective value couplings. POP further defines a top-k outlying value selection method to ensure its scalability to the huge search space. We show that POP (i) significantly outperforms five state-of-the-art full space- or subspace-based outlier detectors and their combinations with three feature selection methods on 12 real-world high-dimensional data sets with different levels of irrelevant features; and (ii) obtains good scalability, stable performance w.r.t. k, and fast convergence rate.
format text
author PANG, Guansong
XU, Hongzuo
CAO, Longbing
ZHAO, Wentao
title Selective value coupling learning for detecting outliers in high-dimensional categorical data
publisher Institutional Knowledge at Singapore Management University
publishDate 2017