Real-time data-processing framework with model updating for digital twins of water treatment facilities

Bibliographic Details
Main Authors: Wei, Yuying, Law, Adrian Wing-Keung, Yang, Chun
Other Authors: School of Civil and Environmental Engineering
Format: Article
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/165234
Institution: Nanyang Technological University
Description
Summary: Machine learning (ML) models are now widely used in digital twins of water treatment facilities. These models are commonly trained on historical datasets, and their predictions serve various important objectives, such as anomaly detection and optimization. While predictions from the trained models are being made continuously for the digital twin, model updating using newly available real-time data is also necessary so that the twin can mimic the changes in the physical system dynamically. Thus, a synchronicity framework needs to be established in the digital twin, which has not yet been addressed in the literature. In this study, a novel framework with new coverage-based algorithms is proposed to determine the necessity and timing of model updating during real-time data transfers, improving ML performance over time. The framework is tested in a prototype water treatment facility called the secure water treatment (SWaT) system. The results show that the framework generally performs well in synchronizing model updates and predictions, reducing errors by up to 97%. The good performance can be attributed particularly to the coverage-based updating algorithms, which control the size of the training datasets to accelerate ML model updating during synchronization.
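The summary does not describe how the coverage-based algorithms decide when to update or how they limit training-set size. As a hedged illustration only, the sketch below assumes that "coverage" means whether incoming real-time samples fall inside the per-feature range of the current training data. An update is triggered when too few new samples are covered, and only the uncovered samples are appended. The function names and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def bounding_box(X: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Per-feature minimum and maximum of the training inputs (hypothetical coverage region)."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def is_covered(x: Row, lo: List[float], hi: List[float]) -> bool:
    """True if every feature of x lies inside the training range."""
    return all(l <= v <= h for v, l, h in zip(x, lo, hi))


def coverage_update(train_X, train_y, new_X, new_y, threshold=0.9):
    """Hypothetical coverage-based update trigger (illustrative, not the authors' algorithm).

    Returns (X, y, updated). An update is signalled only when the fraction of
    new samples inside the training range falls below `threshold`; in that case
    only the uncovered samples are appended, so the training set grows as
    little as possible.
    """
    if not new_X:
        # No new data: nothing to evaluate, keep the current training set.
        return list(train_X), list(train_y), False
    lo, hi = bounding_box(train_X)
    covered = [is_covered(x, lo, hi) for x in new_X]
    fraction = sum(covered) / len(covered)
    if fraction >= threshold:
        # Existing training data already spans the new conditions: skip retraining.
        return list(train_X), list(train_y), False
    X = list(train_X) + [x for x, c in zip(new_X, covered) if not c]
    y = list(train_y) + [t for t, c in zip(new_y, covered) if not c]
    return X, y, True
```

For example, with training inputs `[[0.0, 0.0], [1.0, 1.0]]` and new inputs `[[0.5, 0.5], [2.0, 0.5]]`, only half of the new samples are covered, so an update is signalled and just `[2.0, 0.5]` is added. Appending only uncovered samples is one simple way to keep retraining fast. A real system would also need a policy for discarding stale data.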