Comparison of robust estimators for detecting outliers in multivariate datasets

Detecting outliers in multivariate data is difficult and cannot be done by visual inspection. Mahalanobis distance (MD) is a classical method for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects. Masking effects h...

Bibliographic Details
Main Authors: Sharifah Sakinah, Syed Abd Mutalib; Siti Zanariah, Satari; Wan Nur Syahidah, Wan Yusoff
Format: Conference or Workshop Item
Language: English
Published: IOP Publishing 2021
Subjects: Q Science (General); QA Mathematics
Online Access: http://umpir.ump.edu.my/id/eprint/35199/1/Comparison%20of%20robust%20estimators%20for%20detecting%20outliers%20in%20multivariate%20datasets.pdf
http://umpir.ump.edu.my/id/eprint/35199/
https://doi.org/10.1088/1742-6596/1988/1/012095
Institution: Universiti Malaysia Pahang
id my.ump.umpir.35199
record_format eprints
spelling my.ump.umpir.35199 2022-11-07T06:14:09Z http://umpir.ump.edu.my/id/eprint/35199/ Comparison of robust estimators for detecting outliers in multivariate datasets. Sharifah Sakinah, Syed Abd Mutalib; Siti Zanariah, Satari; Wan Nur Syahidah, Wan Yusoff. Q Science (General); QA Mathematics. IOP Publishing 2021-08-17. Conference or Workshop Item. PeerReviewed. pdf. en. cc_by. http://umpir.ump.edu.my/id/eprint/35199/1/Comparison%20of%20robust%20estimators%20for%20detecting%20outliers%20in%20multivariate%20datasets.pdf
Sharifah Sakinah, Syed Abd Mutalib and Siti Zanariah, Satari and Wan Nur Syahidah, Wan Yusoff (2021) Comparison of robust estimators for detecting outliers in multivariate datasets. In: Journal of Physics: Conference Series, Simposium Kebangsaan Sains Matematik ke-28 (SKSM28), 28-29 July 2021, Kuantan, Pahang, Malaysia. pp. 1-10., 1988 (012095). ISSN 1742-6588 (print); 1742-6596 (online) https://doi.org/10.1088/1742-6596/1988/1/012095
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic Q Science (General)
QA Mathematics
description Detecting outliers in multivariate data is difficult and cannot be done by visual inspection. Mahalanobis distance (MD) is a classical method for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects. Masking occurs when outliers are not identified, and swamping occurs when inliers are identified as outliers. Hence, robust estimators have been proposed to overcome these problems. In this study, the performance of a new robust estimator named Test on Covariance (TOC) is tested and compared with four other robust estimators: Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME) and Index Set Equality (ISE). The five robust estimators are tested on five real multivariate datasets: Brain and weight, Hawkins-Bradu-Kass, Stackloss, Bushfire and Milk, which are well known in outlier detection studies. Results show that TOC is able to detect outliers, does not exhibit a masking effect, and performs the same as the other robust estimators on all datasets.
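The masking effect named in the description can be illustrated with a small sketch. This is not code from the paper and it implements none of TOC, FMCD, MVV, CME or ISE. It assumes made-up two-dimensional data, a 0.975 chi-square cutoff (a common convention, assumed here), and, as a stand-in for a robust estimator, a mean and covariance computed from points that are known in advance to be inliers. All function and variable names are hypothetical.

```python
import math

def mean_cov(points):
    """Sample mean and covariance (n - 1 divisor) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_md(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_cutoff_2d(q=0.975):
    """Chi-square quantile for 2 degrees of freedom: CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - q)

def flag_outliers(points, center, cov, q=0.975):
    """Indices of points whose squared distance exceeds the cutoff."""
    cutoff = chi2_cutoff_2d(q)
    return [i for i, p in enumerate(points) if squared_md(p, center, cov) > cutoff]

# Hypothetical data: 8 inliers around the origin, then 3 clustered outliers.
inliers = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]
outliers = [(10, 10), (10, 11), (11, 10)]
data = inliers + outliers

# Classical estimates use every point, so the outliers inflate the covariance.
classical = flag_outliers(data, *mean_cov(data))

# Stand-in for a robust fit: estimates taken from the known inliers only.
reference = flag_outliers(data, *mean_cov(inliers))

print(classical)  # [] : the outlier cluster is masked
print(reference)  # [8, 9, 10] : all three outliers are flagged
```

With the classical estimates the three outliers pull the mean toward themselves and stretch the covariance along their direction, so none of them exceeds the cutoff. In practice the inliers are not known in advance, which is the gap that robust estimators such as those compared in this record are meant to fill.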
format Conference or Workshop Item
author Sharifah Sakinah, Syed Abd Mutalib
Siti Zanariah, Satari
Wan Nur Syahidah, Wan Yusoff
title Comparison of robust estimators for detecting outliers in multivariate datasets
publisher IOP Publishing
publishDate 2021