Comparison of Robust Estimators’ Performance for Detecting Outliers in Multivariate Data

Bibliographic Details
Main Authors: Sharifah Sakinah, Syed Abd Mutalib, Siti Zanariah, Satari, Wan Nur Syahidah, Wan Yusoff
Format: Article
Language: English
Published: Universiti Teknologi MARA (UiTM) 2021
Online Access: http://umpir.ump.edu.my/id/eprint/32427/8/Comparison%20of%20Robust%20Estimators.pdf
http://umpir.ump.edu.my/id/eprint/32427/
https://ejournal.um.edu.my/index.php/JOSMA/article/view/32399
Institution: Universiti Malaysia Pahang
Description
Summary: In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. Mahalanobis distance (MD) has been one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects if the data contain outliers. Because of this problem, many studies replace the classical estimators of the mean and covariance matrix with robust estimators. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD has been widely used and is known as one of the best robust estimators. However, FMCD still falls short under certain conditions. MVV, CME, ISE and TOC are innovations on FMCD: these four robust estimators improve the last step of the FMCD algorithm. Hence, the objective of this study is to observe the performance of these five estimators in detecting outliers in multivariate data, with particular attention to TOC, the most recent of them. Simulation studies are conducted for two outlier scenarios under various conditions. Three performance measures, pout, pmask and pswamp, are used to assess the robust estimators. It is found that TOC gives better performance in pswamp for most conditions, and better results for pout and pmask for certain conditions.
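
The masking effect described in the summary can be illustrated with a minimal Python sketch. It is not the study's code and implements none of the five estimators. Everything in it is an assumption made for illustration: the toy two-dimensional data, the 0.975 chi-square cutoff, the helper names, and the use of estimates computed from the known clean points as a stand-in for what a robust estimator aims to recover. The two rates it reports are simplified analogues of masking and swamping rates, since the summary does not define pout, pmask and pswamp.

```python
import math

def mean_and_cov(points):
    """Classical sample mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_md(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def flag_outliers(points, mean, cov, quantile=0.975):
    """Flag points whose squared MD exceeds a chi-square quantile.

    For 2 degrees of freedom the quantile has the closed form -2*ln(1 - q).
    The 0.975 level is an assumed, commonly used choice.
    """
    cutoff = -2.0 * math.log(1.0 - quantile)
    return [squared_md(p, mean, cov) > cutoff for p in points]

def masking_and_swamping(flags, truth):
    """Assumed definitions: masking = share of true outliers not flagged,
    swamping = share of clean points wrongly flagged."""
    n_out = sum(truth)
    n_clean = len(truth) - n_out
    missed = sum(1 for f, t in zip(flags, truth) if t and not f)
    false_alarms = sum(1 for f, t in zip(flags, truth) if f and not t)
    return missed / n_out, false_alarms / n_clean

# Toy data (hypothetical): a 5x5 grid of clean points plus a tight outlier cluster.
clean = [(float(x), float(y)) for x in range(-2, 3) for y in range(-2, 3)]
outliers = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
data = clean + outliers
truth = [False] * len(clean) + [True] * len(outliers)

# Classical estimates are computed from the contaminated sample.
classical_mean, classical_cov = mean_and_cov(data)
classical_flags = flag_outliers(data, classical_mean, classical_cov)

# Stand-in for a robust fit: estimates from the known clean points only.
reference_mean, reference_cov = mean_and_cov(clean)
reference_flags = flag_outliers(data, reference_mean, reference_cov)

classical_rates = masking_and_swamping(classical_flags, truth)
reference_rates = masking_and_swamping(reference_flags, truth)
print("classical (masking, swamping):", classical_rates)
print("clean-subset reference (masking, swamping):", reference_rates)
```

On this toy data the outlier cluster pulls the classical mean toward itself and inflates the covariance, so the classical fit fails to flag any of the five planted outliers, while the clean-subset reference flags all of them. A robust estimator such as FMCD tries to reach the second situation without knowing in advance which points are clean.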