Anomaly detection in multivariate time series using ensemble method
Water distribution networks (WDNs) are essential to daily life and industrial production. Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted water service. Among the various approaches to anomaly detection, matrix profile is recognized as the most ti...
Main Author: | Liu, Yanling |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Thesis-Master by Research |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
Online Access: | https://hdl.handle.net/10356/155731 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-155731 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-155731 2022-04-04T03:16:53Z Anomaly detection in multivariate time series using ensemble method Liu, Yanling Chng Eng Siong School of Computer Science and Engineering Xylem Water Solutions Singapore Pte Ltd Li Ye ASESChng@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Water distribution networks (WDNs) are essential to daily life and industrial production. Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted water service. Among the various approaches to anomaly detection, matrix profile is recognized as the most time-efficient distance-based approach. Matrix profile identifies discords in univariate time series (UTS). Because the physical processes in water networks are interdependent, the data obtained from different sensors are correlated to detect anomalies in multivariate time series (MTS). However, this approach only collects positive predictions and is limited in its ability to eliminate false-positive detections. To address this limitation of matrix profile, we propose and demonstrate two methods: matrix profile with an autoencoder, and boosting. An autoencoder, an artificial neural network trained to copy its input to its output, is introduced to reduce false alarms. Moreover, the localization of anomalies is automated by analyzing the UTS anomaly detection results. Boosting is an ensemble learning technique in which each model focuses on correcting the labels misclassified by the previous model. It sequentially converts weak learners into strong learners. Three boosting methods, XGBoost, LightGBM, and CatBoost, are studied for the classification of anomalies. Specifically, the proposed ensemble model based on matrix profile with an autoencoder is applied as a semi-supervised anomaly detection model. The three boosting-based models are proposed as supervised anomaly detection models. 
To validate their effectiveness in the complex environment of a water distribution system (WDS), we tested the two proposed methods on simulated datasets containing labeled cyber-attacks. The matrix profile with autoencoder model and the CatBoost model achieve high accuracies of 0.9645 and 0.9245, respectively, superior to the existing state-of-the-art models. In addition, the boosting methods are applied to anomaly detection on a simulated leakage dataset that contains detailed leakage information in WDS. LightGBM provides outstanding classification results with accuracies of 0.945 and 0.985, which is competitive among the frontier models. Master of Engineering 2022-03-15T02:33:49Z 2022-03-15T02:33:49Z 2021 Thesis-Master by Research Liu, Y. (2021). Anomaly detection in multivariate time series using ensemble method. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155731 https://hdl.handle.net/10356/155731 10.32657/10356/155731 en Industrial Postgraduate Program (IPP) This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Liu, Yanling Anomaly detection in multivariate time series using ensemble method |
description |
Water distribution networks (WDNs) are essential to daily life and industrial production.
Identifying anomalies and mitigating cyber-attacks are crucial to ensuring uninterrupted
water service. Among the various approaches to anomaly detection, matrix profile
is recognized as the most time-efficient distance-based approach. Matrix profile identifies
discords in univariate time series (UTS). Because the physical processes in water networks
are interdependent, the data obtained from different sensors are correlated to detect anomalies
in multivariate time series (MTS). However, this approach only collects positive predictions
and is limited in its ability to eliminate false-positive detections.
To address this limitation of matrix profile, we propose and demonstrate two methods:
matrix profile with an autoencoder, and boosting. An autoencoder, an artificial neural
network trained to copy its input to its output, is introduced to reduce false alarms.
Moreover, the localization of anomalies is automated by analyzing the UTS anomaly detection
results. Boosting is an ensemble learning technique in which each model focuses on
correcting the labels misclassified by the previous model. It sequentially converts weak
learners into strong learners. Three boosting methods, XGBoost, LightGBM, and CatBoost,
are studied for the classification of anomalies. Specifically, the proposed ensemble model
based on matrix profile with an autoencoder is applied as a semi-supervised anomaly
detection model. The three boosting-based models are proposed as supervised anomaly
detection models.
To validate their effectiveness in the complex environment of a water distribution system
(WDS), we tested the two proposed methods on simulated datasets containing labeled
cyber-attacks. The matrix profile with autoencoder model and the CatBoost model achieve
high accuracies of 0.9645 and 0.9245, respectively, superior to the existing
state-of-the-art models. In addition, the boosting methods are applied to anomaly detection
on a simulated leakage dataset that contains detailed leakage information in WDS. LightGBM
provides outstanding classification results with accuracies of 0.945 and 0.985, which is
competitive among the frontier models. |
author2 |
Chng Eng Siong |
author_facet |
Chng Eng Siong; Liu, Yanling |
format |
Thesis-Master by Research |
author |
Liu, Yanling |
author_sort |
Liu, Yanling |
title |
Anomaly detection in multivariate time series using ensemble method |
title_short |
Anomaly detection in multivariate time series using ensemble method |
title_full |
Anomaly detection in multivariate time series using ensemble method |
title_fullStr |
Anomaly detection in multivariate time series using ensemble method |
title_full_unstemmed |
Anomaly detection in multivariate time series using ensemble method |
title_sort |
anomaly detection in multivariate time series using ensemble method |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/155731 |
_version_ |
1729789514162896896 |
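
The description above notes that matrix profile identifies discords in univariate time series. As a rough illustration of that idea (not the thesis's implementation), the sketch below computes a brute-force matrix profile in plain Python and reports the discord, i.e. the subsequence whose nearest non-overlapping neighbor is farthest away. The synthetic sine series with a flat "stuck sensor" segment, the window length `m = 16`, the exclusion zone of `m // 2`, and all function names are assumptions made for this example.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force matrix profile (O(n^2 * m)).

    For every length-m subsequence, store the z-normalized Euclidean
    distance to its nearest neighbor outside an exclusion zone.
    """
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = m // 2  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i in range(n_sub):
        best = math.inf
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            best = min(best, d)
        profile.append(best)
    return profile


def top_discord(profile):
    """Return (start index, distance) of the largest matrix profile value."""
    idx = max(range(len(profile)), key=profile.__getitem__)
    return idx, profile[idx]


# Hypothetical data: a periodic signal with a flat segment at t = 60..67.
series = [math.sin(2 * math.pi * t / 16) for t in range(128)]
for t in range(60, 68):
    series[t] = 0.0

profile = matrix_profile(series, m=16)
idx, score = top_discord(profile)
print("discord starts at index", idx, "with distance", round(score, 3))
```

Subsequences that repeat elsewhere in the series get a matrix profile value near zero, while windows overlapping the flat segment have no close match and stand out. For real sensor data, an optimized library would normally replace this quadratic loop.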