Statistical method for finding outliers in multivariate data using a boxplot and multiple linear regression

The objective of this study was to propose a method for detecting outliers in multivariate data. It is based on a boxplot and multiple linear regression. In our proposed method, the box plot was initially applied to filter the data across all variables to split the data set into two sets: normal dat...

Bibliographic Details
Main Authors: Theeraphat Thanwiset, Wuttichai Srisodaphol
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2023
Online Access: http://journalarticle.ukm.my/23099/1/SMS%2020.pdf
http://journalarticle.ukm.my/23099/
https://www.ukm.my/jsm/english_journals/vol52num9_2023/contentsVol52num9_2023.html
Institution: Universiti Kebangsaan Malaysia
id my-ukm.journal.23099
record_format eprints
spelling my-ukm.journal.23099 2024-02-19T07:06:55Z Theeraphat Thanwiset and Wuttichai Srisodaphol (2023) Statistical method for finding outliers in multivariate data using a boxplot and multiple linear regression. Sains Malaysiana, 52 (9). pp. 2725-2732. ISSN 0126-6039. Article, PeerReviewed, application/pdf, en.
institution Universiti Kebangsaan Malaysia
building Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description This study proposes a method for detecting outliers in multivariate data based on a boxplot and multiple linear regression. The boxplot is first applied to every variable to split the data set into two sets: normal data (observations lying within the lower and upper fences of the boxplot) and data that could be outliers. The normal data are then used to fit a multiple linear regression model, and the maximum residual error of that model is taken as the cut-off point. To evaluate the proposed method, a simulation study was conducted on multivariate normal data with and without contamination, at various contamination levels. The performance of the proposed method was compared with that of previous methods, namely the Mahalanobis distance and the Mahalanobis distance with robust estimators obtained by the minimum volume ellipsoid method, the minimum covariance determinant method, and the minimum vector variance method. The results showed that the proposed method outperformed the compared methods at all contamination levels. When applied to real data, the proposed method also identified outliers that were consistent with the real data.
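The two-stage procedure summarised in the description can be illustrated with a minimal sketch. The abstract does not give the implementation details, so the following are assumptions made purely for illustration: Tukey fences with a 1.5 x IQR multiplier, a row counted as normal only if every variable lies within its fences, one column chosen as the regression response, absolute residuals, and a rule that flags any observation whose absolute residual exceeds the largest one seen among the normal rows. The function names `boxplot_mask` and `detect_outliers` are hypothetical and are not taken from the article.

```python
import numpy as np


def boxplot_mask(X, k=1.5):
    """Return True for rows in which every variable lies within its fences.

    Assumption: Tukey fences Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


def detect_outliers(X, response=0, k=1.5):
    """Flag rows whose regression residual exceeds the cut-off.

    Assumptions: column `response` is the dependent variable, the remaining
    columns are predictors, and the cut-off is the maximum absolute residual
    among the rows that passed the boxplot filter.
    """
    X = np.asarray(X, dtype=float)
    normal = boxplot_mask(X, k)

    y = X[:, response]
    predictors = np.delete(X, response, axis=1)
    design = np.column_stack([np.ones(len(X)), predictors])  # intercept + predictors

    # Ordinary least squares fitted on the normal rows only.
    beta, *_ = np.linalg.lstsq(design[normal], y[normal], rcond=None)

    # Absolute residuals for every row, using the model fitted on normal rows.
    abs_resid = np.abs(y - design @ beta)
    cutoff = abs_resid[normal].max()
    return abs_resid > cutoff
```

Under these assumptions a row that passed the boxplot filter can never be flagged, because its residual cannot exceed the maximum taken over that same set; only rows set aside by the boxplot step are candidates. Whether the article applies the cut-off in exactly this way is not stated in the abstract.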
format Article
author Theeraphat Thanwiset,
Wuttichai Srisodaphol,
publisher Penerbit Universiti Kebangsaan Malaysia
publishDate 2023
url http://journalarticle.ukm.my/23099/1/SMS%2020.pdf
http://journalarticle.ukm.my/23099/
https://www.ukm.my/jsm/english_journals/vol52num9_2023/contentsVol52num9_2023.html
_version_ 1792152153073123328