Heterogeneous univariate outlier ensembles in multidimensional data

In outlier detection, much recent research has shifted from univariate to multivariate methods because of the rapid growth of multidimensional data. However, a typical problem with this shift is that many multidimensional datasets contain mainly univariate outliers, in whic...

Bibliographic Details
Main Authors: PANG, Guansong, CAO, Longbing
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
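Subjects: Outlier detection; outlier ensemble; anomaly detection; univariate outlier; multidimensional data; heterogeneous data; Artificial Intelligence and Robotics; Databases and Information Systems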
Online Access: https://ink.library.smu.edu.sg/sis_research/7039
https://ink.library.smu.edu.sg/context/sis_research/article/8042/viewcontent/3403934.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8042
record_format dspace
spelling sg-smu-ink.sis_research-8042
2022-04-14T02:27:07Z
Heterogeneous univariate outlier ensembles in multidimensional data
PANG, Guansong
CAO, Longbing
2020-12-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7039
info:doi/10.1145/3403934
https://ink.library.smu.edu.sg/context/sis_research/article/8042/viewcontent/3403934.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Outlier detection
outlier ensemble
anomaly detection
univariate outlier
multidimensional data
heterogeneous data
Artificial Intelligence and Robotics
Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Outlier detection
outlier ensemble
anomaly detection
univariate outlier
multidimensional data
heterogeneous data
Artificial Intelligence and Robotics
Databases and Information Systems
description In outlier detection, much recent research has shifted from univariate to multivariate methods because of the rapid growth of multidimensional data. However, a typical problem with this shift is that many multidimensional datasets contain mainly univariate outliers, and many of their features are irrelevant. In such cases, multivariate methods are ineffective at identifying these outliers because of the potential biases and the curse of dimensionality introduced by irrelevant features. Such univariate outliers may be detected well by applying univariate outlier detectors to the individually relevant features. However, choosing the right univariate detector for each feature is very challenging, since different features may follow very different probability distributions. To address this challenge, we introduce a novel Heterogeneous Univariate Outlier Ensembles (HUOE) framework and its instance ZDD, which synthesize a set of heterogeneous univariate outlier detectors as base learners to build heterogeneous ensembles optimized for each individual feature. Extensive results on 19 real-world datasets and a collection of synthetic datasets show that ZDD obtains a 5%–14% average AUC improvement over four state-of-the-art multivariate ensembles and is substantially more robust with respect to irrelevant features.
format text
author PANG, Guansong
CAO, Longbing
title Heterogeneous univariate outlier ensembles in multidimensional data
publisher Institutional Knowledge at Singapore Management University
publishDate 2020
url https://ink.library.smu.edu.sg/sis_research/7039
https://ink.library.smu.edu.sg/context/sis_research/article/8042/viewcontent/3403934.pdf
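
The description in this record outlines a general idea: score each feature separately with several different univariate outlier detectors, then combine the results. A minimal Python sketch of that general idea might look like the following. It is not the authors' HUOE framework or its ZDD instance, whose details this record does not give. The two detectors (a robust z-score and a one-dimensional nearest-neighbour distance), all function names, and the combination rule (mean within a feature, maximum across features) are assumptions chosen purely for illustration.

```python
import numpy as np


def robust_standardize(s):
    """Centre on the median and scale by the median absolute deviation (MAD)."""
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    return (s - med) / (mad if mad > 0 else 1.0)  # guard against zero MAD


def robust_z_detector(x):
    """Hypothetical detector 1: absolute robust z-score of each value."""
    return np.abs(robust_standardize(x))


def knn_distance_detector(x, k=3):
    """Hypothetical detector 2: mean distance to the k nearest other values
    in the same feature, robustly standardized for comparability."""
    dists = np.abs(x[:, None] - x[None, :])  # pairwise 1-D distances
    dists.sort(axis=1)                       # column 0 holds the self-distance
    return robust_standardize(dists[:, 1:k + 1].mean(axis=1))


def per_feature_scores(X, k=3):
    """Average the two detectors within each feature, then take the
    maximum over features as the outlier score of each row."""
    per_feature = [
        0.5 * (robust_z_detector(X[:, j]) + knn_distance_detector(X[:, j], k))
        for j in range(X.shape[1])
    ]
    return np.max(np.column_stack(per_feature), axis=1)


# Synthetic example (an assumption, not data from the paper):
# five Gaussian features with one extreme value planted in feature 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[17, 2] = 12.0
scores = per_feature_scores(X)
print("highest-scoring row:", int(np.argmax(scores)))
```

Taking the maximum across features means a row is flagged when it is extreme in any single feature, which matches the univariate-outlier setting the description discusses. A real system would need a principled way to choose and weight detectors per feature, which this sketch does not attempt.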