Anomaly detection in multivariate time series using ensemble method

Bibliographic Details
Main Author: Liu, Yanling
Other Authors: Chng Eng Siong
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/155731
Institution: Nanyang Technological University
Description
Summary: Water distribution networks (WDNs) are essential to daily life and industrial production. Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted water service. Among the various approaches to anomaly detection, the matrix profile is recognized as the most time-efficient distance-based approach. The matrix profile identifies discords in univariate time series (UTS). Because the physical processes in water networks are interdependent, the data obtained from different sensors are correlated in order to detect anomalies in multivariate time series (MTS). However, this approach only collects positive predictions and is limited in its ability to eliminate false-positive detections. To address this limitation, we propose and demonstrate two methods: the matrix profile with autoencoder method and the boosting method. An autoencoder, an artificial neural network trained to copy its input to its output, is introduced to reduce false alarms. Moreover, the localization of anomalies is automated by analyzing the UTS anomaly detection results. Boosting is an ensemble learning technique in which each model focuses on correcting the examples misclassified by the previous model, sequentially converting weak learners into a strong learner. Three boosting methods, XGBoost, LightGBM, and CatBoost, are studied for the classification of anomalies. Specifically, the proposed matrix profile with autoencoder ensemble model is applied as a semi-supervised anomaly detection model, and the three boosting-based models are proposed as supervised anomaly detection models. To validate their effectiveness in the complex environment of a water distribution system (WDS), we tested the two proposed methods on simulated datasets containing labeled cyber-attacks. The matrix profile with autoencoder model and the CatBoost model achieve accuracies of 0.9645 and 0.9245, respectively, superior to existing state-of-the-art models. In addition, the boosting methods are applied to anomaly detection on a simulated leakage dataset that contains detailed leakage information for the WDS. LightGBM provides outstanding classification results, with accuracies of 0.945 and 0.985, which is competitive with frontier models.
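
For readers unfamiliar with the matrix profile, the idea of a discord in a univariate time series can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the window length `m`, the exclusion-zone size, and the synthetic sine series are all assumptions chosen for illustration, and the brute-force search shown here is far slower than the optimized algorithms in dedicated libraries such as STUMPY.

```python
import numpy as np

def znorm(x):
    """Z-normalize a window; a constant window maps to all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def matrix_profile(ts, m):
    """Brute-force matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-overlapping neighbor."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = np.full(n, np.inf)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf  # ignore the subsequence itself and its close overlaps
        profile[i] = d.min()
    return profile

def top_discord(ts, m):
    """Return the start index and score of the subsequence whose nearest
    neighbor is farthest away, i.e. the top discord."""
    profile = matrix_profile(ts, m)
    return int(np.argmax(profile)), float(profile.max())

# Hypothetical sensor reading: a clean periodic signal with a short offset fault.
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50)
series[500:510] += 2.0  # injected anomaly, for illustration only

idx, score = top_discord(series, m=50)
print(idx, score)
```

In this toy series every normal window has a near-identical copy one period away, so its profile value is close to zero, while windows overlapping the injected fault have no close match and stand out as discords.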