A Systematic Review of Anomaly Detection within High Dimensional and Multivariate Data

Bibliographic Details
Main Authors: Suboh, Syahirah, Abdul Aziz, Izzatdin, Shaharudin, Shazlyn Milleana, Ismail, Saidatul Akmar, Mahdin, Hairulnizam
Format: Article
Language: English
Published: JOIV
Online Access: http://eprints.uthm.edu.my/11414/1/J15862_f3944b7e279a07421e2ed97fc6d397d2.pdf
http://eprints.uthm.edu.my/11414/
Institution: Universiti Tun Hussein Onn Malaysia
Description
Summary: In data analysis, recognizing unusual patterns (outlier analysis or anomaly detection) plays a crucial role in identifying critical events. Because of its widespread use in many applications, it remains an important and extensive research branch in data mining. As a result, numerous techniques for finding anomalies have been developed, and more are still being worked on. Researchers can gain vital knowledge by identifying anomalies, which helps them perform more meaningful data analyses. However, anomaly detection is even more challenging when the datasets are high-dimensional and multivariate. In the literature, anomaly detection in general has received much attention, but anomaly detection specifically in high-dimensional and multivariate conditions has received less. This paper systematically reviews the existing related techniques and presents extensive coverage of the challenges and perspectives of anomaly detection within high-dimensional and multivariate data. At the same time, it provides clear insight into the techniques developed for anomaly detection problems. This paper aims to help readers select the technique best suited to their purpose. It has been found that PCA, DOBIN, the Stray algorithm, and DAE-KNN have a high learning rate compared to Random projection, ROBEM, and OCP methods. Overall, most methods have shown an excellent ability to tackle the curse of dimensionality and multivariate features to perform anomaly detection. Moreover, a comparison of the algorithms for anomaly detection is provided to support the development of better algorithms. Finally, future studies could extend this work by comparing the methods on other domain-specific datasets and offering a comprehensive anomaly interpretation that describes the true nature of the anomalies.