Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings

Proper feature selection for unsupervised outlier detection can improve detection performance but is very challenging due to complex feature interactions, the mixture of relevant features with noisy/redundant features in imbalanced data, and the unavailability of class labels. Little work has been d...

Bibliographic Details
Main Authors: PANG, Guansong, CAO, Longbing, CHEN, Ling, LIU, Huan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7145
https://ink.library.smu.edu.sg/context/sis_research/article/8148/viewcontent/07837865.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8148
record_format dspace
spelling sg-smu-ink.sis_research-8148
2022-04-22T04:19:31Z
2016-12-01T08:00:00Z
text
application/pdf
info:doi/10.1109/ICDM.2016.0052
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Outlying Feature Selection
Coupling Learning
Non-IID Outlier Detection
Databases and Information Systems
Data Storage Systems
description Proper feature selection for unsupervised outlier detection can improve detection performance but is very challenging due to complex feature interactions, the mixture of relevant features with noisy/redundant features in imbalanced data, and the unavailability of class labels. Little work has been done on this challenge. This paper proposes a novel Coupled Unsupervised Feature Selection framework (CUFS for short) to filter out noisy or redundant features for subsequent outlier detection in categorical data. CUFS quantifies the outlierness (or relevance) of features by learning and integrating both the feature value couplings and feature couplings. Such value-to-feature couplings capture intrinsic data characteristics and distinguish relevant features from noisy or redundant ones. CUFS is further instantiated into a parameter-free Dense Subgraph-based Feature Selection method, called DSFS. We prove that DSFS retains a feature subset that is a 2-approximation to the optimal subset. Extensive evaluation results on 15 real-world data sets show that DSFS obtains an average 48% feature reduction rate, and enables three different types of pattern-based outlier detection methods to achieve substantially better AUC and/or run orders of magnitude faster than on the original feature set. Compared with its feature selection contender, on average, all three DSFS-based detectors achieve more than 20% AUC improvement.
format text
author PANG, Guansong
CAO, Longbing
CHEN, Ling
LIU, Huan
title Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings
publisher Institutional Knowledge at Singapore Management University
publishDate 2016
url https://ink.library.smu.edu.sg/sis_research/7145
https://ink.library.smu.edu.sg/context/sis_research/article/8148/viewcontent/07837865.pdf
_version_ 1770576231605993472
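
The description in this record says DSFS is dense-subgraph based and comes with a 2-approximation guarantee, but the record does not give the algorithm itself. As a rough illustration only, the sketch below applies the standard greedy peeling heuristic for the densest-subgraph problem (a well-known 2-approximation) to a small feature graph. The function name, the toy weight matrix, and the reading of edge weights as coupling strengths between features are all assumptions made here, not details taken from the paper.

```python
from typing import FrozenSet, List, Tuple


def greedy_densest_subgraph(weights: List[List[float]]) -> Tuple[FrozenSet[int], float]:
    """Greedy peeling on a symmetric, non-negative weight matrix.

    Density of a node set S is (sum of edge weights inside S) / |S|.
    Hypothetical helper for illustration; not the authors' DSFS code.
    """
    n = len(weights)
    if n == 0:
        return frozenset(), 0.0
    alive = set(range(n))
    # Weighted degree of every node, ignoring self-loops.
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(degree) / 2.0  # every edge was counted from both endpoints
    best_subset = frozenset(alive)
    best_density = total / n
    while len(alive) > 1:
        # Peel the node with the smallest weighted degree (ties: lowest index).
        v = min(alive, key=lambda i: (degree[i], i))
        alive.remove(v)
        total -= degree[v]
        for u in alive:
            degree[u] -= weights[v][u]
        density = total / len(alive)
        if density > best_density:
            best_subset, best_density = frozenset(alive), density
    return best_subset, best_density


# Toy graph (assumed values): features 0-2 are strongly coupled,
# feature 3 is only weakly coupled to the rest.
toy = [
    [0.0, 1.0, 1.0, 0.25],
    [1.0, 0.0, 1.0, 0.25],
    [1.0, 1.0, 0.0, 0.25],
    [0.25, 0.25, 0.25, 0.0],
]
subset, density = greedy_densest_subgraph(toy)
print(sorted(subset), density)
```

On this toy input the weakly coupled feature is peeled first and the remaining three features form the densest subset. How the paper actually builds the feature graph from value couplings is not stated in this record, so the weights here are placeholders.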