Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly...

Bibliographic Details
Main Authors: PANG, Guansong; CAO, Longbing; CHEN, Ling; LIU, Huan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2017
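Subjects: Machine Learning: Data Mining; Machine Learning: Feature Selection/Construction; Databases and Information Systems; Data Storage Systems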
Online Access: https://ink.library.smu.edu.sg/sis_research/7144
https://ink.library.smu.edu.sg/context/sis_research/article/8147/viewcontent/0360.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8147
record_format dspace
spelling sg-smu-ink.sis_research-8147
2022-04-22T04:20:16Z
2017-08-01T07:00:00Z
text
application/pdf
info:doi/10.24963/ijcai.2017/360
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Machine Learning: Data Mining
Machine Learning: Feature Selection/Construction
Databases and Information Systems
Data Storage Systems
description This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data because their search for feature subset(s) is independent of outlier scoring and can thus be misled by noisy features. In contrast, HOUR takes a wrapper approach that iteratively optimizes feature subset selection and outlier scoring, using a top-k outlier ranking evaluation measure as its objective function. In constructing a noise-resilient outlier scoring function, HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors are not independent; they bond together), which yields a reliable outlier ranking in each iteration. We show that HOUR (i) retains an outlier ranking that is a 2-approximation of the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https://sites.google.com/site/gspangsite/sourcecode.
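The wrapper idea in the description (alternate between scoring outliers and selecting features, guided by the current top-k ranking) can be illustrated with a minimal sketch. This is not the HOUR algorithm and does not model homophily couplings: the frequency-based scoring, the relevance rule, and the names `value_outlierness` and `wrapper_rank` are all assumptions made for illustration on categorical data.

```python
import numpy as np


def value_outlierness(X):
    """Per-cell outlierness for categorical data (an assumed, simplistic
    scoring): 1 minus the relative frequency of the value in its feature."""
    n, d = X.shape
    S = np.zeros((n, d))
    for j in range(d):
        _, inverse, counts = np.unique(
            X[:, j], return_inverse=True, return_counts=True
        )
        S[:, j] = 1.0 - counts[inverse] / n
    return S


def wrapper_rank(X, k, max_iter=20):
    """Hypothetical wrapper loop; requires 0 < k < number of objects.

    Each iteration scores objects on the active features, takes the
    top-k objects, and keeps only features whose mean outlierness is
    higher inside the top-k than outside it.
    """
    S = value_outlierness(X)
    active = np.arange(X.shape[1])
    for _ in range(max_iter):
        scores = S[:, active].mean(axis=1)
        order = np.argsort(-scores, kind="stable")
        top, rest = order[:k], order[k:]
        # Feature relevance: contrast between the top-k and the rest.
        relevance = (
            S[np.ix_(top, active)].mean(axis=0)
            - S[np.ix_(rest, active)].mean(axis=0)
        )
        keep = relevance > 0
        if not keep.any():
            # Always retain at least the most relevant feature.
            keep[np.argmax(relevance)] = True
        if keep.all():
            break  # feature subset is stable
        active = active[keep]
    scores = S[:, active].mean(axis=1)
    return np.argsort(-scores, kind="stable"), active
```

Calling `wrapper_rank(X, k)` on an array of categorical values returns the object indices ordered from most to least outlying together with the indices of the retained features. Because the ranking that guides feature selection is itself computed from the current features, a sketch this simple can still be misled when noisy features dominate the initial ranking, which is the failure mode a noise-resilient scoring function is meant to address.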
format text
author PANG, Guansong
CAO, Longbing
CHEN, Ling
LIU, Huan
title Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/7144
https://ink.library.smu.edu.sg/context/sis_research/article/8147/viewcontent/0360.pdf
_version_ 1770576231432978432