Anomaly detection in multivariate time series using ensemble method

Water distribution networks (WDNs) are essential to daily life and production. Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted water service. Among the various approaches to anomaly detection, the matrix profile is recognized as the most ti...

Bibliographic Details
Main Author: Liu, Yanling
Other Authors: Chng Eng Siong
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access: https://hdl.handle.net/10356/155731
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-155731
record_format dspace
spelling sg-ntu-dr.10356-1557312022-04-04T03:16:53Z Anomaly detection in multivariate time series using ensemble method Liu, Yanling Chng Eng Siong School of Computer Science and Engineering Xylem Water Solutions Singapore Pte Ltd Li Ye ASESChng@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Master of Engineering 2022-03-15T02:33:49Z 2022-03-15T02:33:49Z 2021 Thesis-Master by Research Liu, Y. (2021). Anomaly detection in multivariate time series using ensemble method. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155731 10.32657/10356/155731 en Industrial Postgraduate Program (IPP) This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Liu, Yanling
Anomaly detection in multivariate time series using ensemble method
description Water distribution networks (WDNs) are essential to daily life and production. Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted water service. Among the various approaches to anomaly detection, the matrix profile is recognized as the most time-efficient distance-based approach. The matrix profile identifies discords in univariate time series (UTS). Because the physical processes in water networks are interdependent, the data obtained from different sensors are correlated to detect anomalies in multivariate time series (MTS). However, this approach only collects positive predictions and is limited in its ability to eliminate false-positive detections. To address this limitation of the matrix profile, we propose and demonstrate two methods: the matrix profile with autoencoder method and the boosting method. An autoencoder, an artificial neural network trained to copy its input to its output, is introduced to reduce false alarms. Moreover, the localization of anomalies is automated by analyzing the UTS anomaly detection results. Boosting is an ensemble learning algorithm in which each model focuses on correcting the labels misclassified by the previous model, sequentially converting weak learners into a strong learner. Three boosting methods, XGBoost, LightGBM, and CatBoost, are studied for the classification of anomalies. Specifically, the proposed ensemble model based on the matrix profile with autoencoder is applied as a semi-supervised anomaly detection model, and the three boosting-based models are proposed as supervised anomaly detection models. To validate their effectiveness in the complex environment of a water distribution system (WDS), we tested the two proposed methods on simulated datasets containing labeled cyber-attacks. The matrix profile with autoencoder model and the CatBoost model achieve high accuracies of 0.9645 and 0.9245, respectively, superior to the existing state-of-the-art models. In addition, the boosting methods are applied to anomaly detection on a simulated leakage dataset that contains detailed leakage information in a WDS. LightGBM provides outstanding classification results with accuracies of 0.945 and 0.985, which is competitive among the frontier models.
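Illustrative note on the description above: the statement that the matrix profile identifies discords in a univariate time series can be made concrete with a minimal sketch. This is not taken from the thesis. It is a brute-force NumPy version that assumes z-normalized Euclidean distance and an exclusion zone of half the window length (a common convention, assumed here); the function names `znorm`, `matrix_profile`, and `top_discord` are hypothetical.

```python
import numpy as np


def znorm(window):
    """Z-normalize a 1-D window; a constant window maps to all zeros."""
    sd = window.std()
    return (window - window.mean()) / sd if sd > 0 else np.zeros_like(window)


def matrix_profile(ts, m):
    """Brute-force matrix profile of a univariate series.

    For each length-m subsequence, return the z-normalized Euclidean
    distance to its nearest neighbour outside an exclusion zone.
    Runs in O(n^2 * m) time, so it is only suitable for short series.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    windows = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = m // 2  # assumed exclusion zone, skips trivial self-matches
    profile = np.full(n, np.inf)
    for i in range(n):
        dist = np.linalg.norm(windows - windows[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        dist[lo:hi] = np.inf  # ignore the window itself and close overlaps
        profile[i] = dist.min()
    return profile


def top_discord(ts, m):
    """Start index of the subsequence farthest from its nearest neighbour."""
    return int(np.argmax(matrix_profile(ts, m)))
```

On a periodic signal with a short injected disturbance, `top_discord` should point at a window that overlaps the disturbance, since every undisturbed window has a near-identical copy one period away. Extending this to multivariate series, as the thesis does, would require combining the per-sensor results, which this sketch does not attempt.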
author2 Chng Eng Siong
author_facet Chng Eng Siong
Liu, Yanling
format Thesis-Master by Research
author Liu, Yanling
author_sort Liu, Yanling
title Anomaly detection in multivariate time series using ensemble method
title_sort anomaly detection in multivariate time series using ensemble method
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/155731
_version_ 1729789514162896896