Outlier detection in complex categorical data by modeling the feature value couplings

This paper introduces a novel unsupervised outlier detection method, namely Coupled Biased Random Walks (CBRW), for identifying outliers in categorical data with diversified frequency distributions and many noisy features. Existing pattern-based outlier detection methods are ineffective in handling...

Bibliographic Details
Main Authors: PANG, Guansong, CAO, Longbing, CHEN, Ling
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2016
Subjects: Databases and Information Systems; Data Storage Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/7146
https://ink.library.smu.edu.sg/context/sis_research/article/8149/viewcontent/272.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8149
record_format dspace
spelling sg-smu-ink.sis_research-8149 2022-04-22T04:19:10Z Outlier detection in complex categorical data by modeling the feature value couplings PANG, Guansong CAO, Longbing CHEN, Ling This paper introduces a novel unsupervised outlier detection method, namely Coupled Biased Random Walks (CBRW), for identifying outliers in categorical data with diversified frequency distributions and many noisy features. Existing pattern-based outlier detection methods are ineffective in handling such complex scenarios, as they misfit such data. CBRW estimates outlier scores of feature values by modelling feature value level couplings, which carry intrinsic data characteristics, via biased random walks to handle this complex data. The outlier scores of feature values can either measure the outlierness of an object or facilitate the existing methods as a feature weighting and selection indicator. Substantial experiments show that CBRW can not only detect outliers in complex data significantly better than the state-of-the-art methods, but also greatly improve the performance of existing methods on data sets with many noisy features. 2016-07-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7146 info:doi/10.5555/3060832.3060887 https://ink.library.smu.edu.sg/context/sis_research/article/8149/viewcontent/272.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Data Storage Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Data Storage Systems
spellingShingle Databases and Information Systems
Data Storage Systems
PANG, Guansong
CAO, Longbing
CHEN, Ling
Outlier detection in complex categorical data by modeling the feature value couplings
description This paper introduces a novel unsupervised outlier detection method, namely Coupled Biased Random Walks (CBRW), for identifying outliers in categorical data with diversified frequency distributions and many noisy features. Existing pattern-based outlier detection methods are ineffective in handling such complex scenarios, as they misfit such data. CBRW estimates outlier scores of feature values by modelling feature value level couplings, which carry intrinsic data characteristics, via biased random walks to handle this complex data. The outlier scores of feature values can either measure the outlierness of an object or facilitate the existing methods as a feature weighting and selection indicator. Substantial experiments show that CBRW can not only detect outliers in complex data significantly better than the state-of-the-art methods, but also greatly improve the performance of existing methods on data sets with many noisy features.
format text
author PANG, Guansong
CAO, Longbing
CHEN, Ling
title Outlier detection in complex categorical data by modeling the feature value couplings
publisher Institutional Knowledge at Singapore Management University
publishDate 2016
url https://ink.library.smu.edu.sg/sis_research/7146
https://ink.library.smu.edu.sg/context/sis_research/article/8149/viewcontent/272.pdf
_version_ 1770576231844020224