Data assimilation with missing data in nonstationary environments for probabilistic machine learning models
In this study, we further develop the data assimilation framework proposed for probabilistic Machine Learning (ML) models, named Probabilistic Optimal Interpolation (POI), in nonstationary environments with missing data which are common in real-world situations. The dataset is based on a multi-scale...
Main Authors: | Wei, Yuying; Law, Adrian Wing-Keung; Yang, Chun |
---|---|
Other Authors: | School of Civil and Environmental Engineering |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Engineering::Civil engineering; Data Assimilation; Missing Data |
Online Access: | https://hdl.handle.net/10356/173067 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-173067 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-173067 2024-01-12T15:34:14Z
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models
Wei, Yuying; Law, Adrian Wing-Keung; Yang, Chun
School of Civil and Environmental Engineering; Interdisciplinary Graduate School (IGS); School of Mechanical and Aerospace Engineering; Environmental Process Modelling Centre; Nanyang Environment and Water Research Institute
Engineering::Civil engineering; Data Assimilation; Missing Data
In this study, we further develop the data assimilation framework proposed for probabilistic Machine Learning (ML) models, named Probabilistic Optimal Interpolation (POI), for nonstationary environments with missing data, which are common in real-world situations. The dataset is based on a multi-scale Lorenz 96 chaotic system. Three types of nonstationary environments (i.e., trend, heteroscedasticity, and random walk) are introduced in the dataset. In addition, the test datasets are masked with different missingness rates to evaluate the POI performance under scenarios with missing values. This study utilizes several filters to identify background noise for observation covariance initialization, and the covariance is updated during real-time data assimilation, specifically for nonstationary environments. The results show that heteroscedastic noise can be identified well, whereas random-walk noise is very difficult to analyze. Overall, the results show that the POI implementation can lead to reduced uncertainty, but POI performance can also be significantly affected by the limited accuracy of the ML models in nonstationary environments. The impact of missing values is then examined and compared between stationary and nonstationary environments. As expected, both prediction and POI updates are more accurate at lower missingness rates, and whether POI is bypassed or not at missing points does not affect the overall performance significantly. Finally, input evolution can perform well with POI under high noise levels and missingness rates in stationary environments, but it always yields worse results in nonstationary environments and thus is not recommended.
National Research Foundation (NRF); Public Utilities Board (PUB)
Submitted/Accepted version
This research / project is supported by the National Research Foundation, Singapore, and PUB, Singapore’s National Water Agency under its RIE2025 Urban Solutions and Sustainability (USS) (Water) Centre of Excellence (CoE) Programme, awarded to Nanyang Environment & Water Research Institute (NEWRI), Nanyang Technological University, Singapore (NTU).
2024-01-10T07:01:12Z 2024-01-10T07:01:12Z 2023
Journal Article
Wei, Y., Law, A. W. & Yang, C. (2023). Data assimilation with missing data in nonstationary environments for probabilistic machine learning models. Journal of Computational Science, 74, 102151. https://dx.doi.org/10.1016/j.jocs.2023.102151
1877-7503 https://hdl.handle.net/10356/173067 10.1016/j.jocs.2023.102151 2-s2.0-85174034277 74 102151 en Journal of Computational Science
© 2023 Elsevier B.V. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1016/j.jocs.2023.102151.
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Civil engineering; Data Assimilation; Missing Data |
spellingShingle |
Engineering::Civil engineering; Data Assimilation; Missing Data; Wei, Yuying; Law, Adrian Wing-Keung; Yang, Chun; Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
description |
In this study, we further develop the data assimilation framework proposed for probabilistic Machine Learning (ML) models, named Probabilistic Optimal Interpolation (POI), for nonstationary environments with missing data, which are common in real-world situations. The dataset is based on a multi-scale Lorenz 96 chaotic system. Three types of nonstationary environments (i.e., trend, heteroscedasticity, and random walk) are introduced in the dataset. In addition, the test datasets are masked with different missingness rates to evaluate the POI performance under scenarios with missing values. This study utilizes several filters to identify background noise for observation covariance initialization, and the covariance is updated during real-time data assimilation, specifically for nonstationary environments. The results show that heteroscedastic noise can be identified well, whereas random-walk noise is very difficult to analyze. Overall, the results show that the POI implementation can lead to reduced uncertainty, but POI performance can also be significantly affected by the limited accuracy of the ML models in nonstationary environments. The impact of missing values is then examined and compared between stationary and nonstationary environments. As expected, both prediction and POI updates are more accurate at lower missingness rates, and whether POI is bypassed or not at missing points does not affect the overall performance significantly. Finally, input evolution can perform well with POI under high noise levels and missingness rates in stationary environments, but it always yields worse results in nonstationary environments and thus is not recommended. |
author2 |
School of Civil and Environmental Engineering |
author_facet |
School of Civil and Environmental Engineering; Wei, Yuying; Law, Adrian Wing-Keung; Yang, Chun |
format |
Article |
author |
Wei, Yuying; Law, Adrian Wing-Keung; Yang, Chun |
author_sort |
Wei, Yuying |
title |
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
title_short |
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
title_full |
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
title_fullStr |
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
title_full_unstemmed |
Data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
title_sort |
data assimilation with missing data in nonstationary environments for probabilistic machine learning models |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/173067 |
_version_ |
1789483106571386880 |
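
The description above mentions three kinds of nonstationary noise, masking of test data at a given missingness rate, and bypassing the assimilation update at missing points. The sketch below illustrates those ideas only in a generic way; it is not the authors' POI implementation. It assumes a scalar Gaussian forecast, an identity observation operator, and the textbook optimal interpolation update. The function names (`add_nonstationary_noise`, `mask_missing`, `oi_update`), the noise formulas, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def add_nonstationary_noise(x, kind, sigma=0.1, rng=None):
    """Add an illustrative nonstationary noise term to a 1-D series.

    The three forms below are assumptions for this sketch, not the
    formulas used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    t = np.arange(n) / n  # normalized time in [0, 1)
    white = sigma * rng.standard_normal(n)
    if kind == "trend":
        noise = white + sigma * t          # mean drifts linearly
    elif kind == "heteroscedastic":
        noise = (1.0 + t) * white          # standard deviation grows with time
    elif kind == "random_walk":
        noise = np.cumsum(white)           # accumulated white-noise steps
    else:
        raise ValueError(f"unknown noise kind: {kind}")
    return x + noise


def mask_missing(y, rate, rng):
    """Return a float copy of y with roughly a fraction `rate` set to NaN."""
    y = np.asarray(y, dtype=float).copy()
    y[rng.random(y.shape) < rate] = np.nan
    return y


def oi_update(mu_b, var_b, y, var_o):
    """Scalar optimal interpolation update with identity observation operator.

    mu_b, var_b: background (forecast) mean and variance.
    y, var_o:    observation and its error variance.
    A missing observation (NaN) bypasses the update.
    """
    if np.isnan(y):
        return mu_b, var_b
    gain = var_b / (var_b + var_o)
    return mu_b + gain * (y - mu_b), (1.0 - gain) * var_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.sin(np.linspace(0.0, 6.0, 50))
    obs = mask_missing(add_nonstationary_noise(truth, "heteroscedastic", rng=rng), 0.3, rng)
    mu, var = 0.0, 1.0
    for y in obs:
        mu, var = oi_update(mu, var, y, var_o=0.04)
    print(mu, var)
```

In this sketch the analysis variance never exceeds the background variance, which mirrors the reduced-uncertainty behavior described in the abstract; a real application would also need a forecast step between updates and a time-varying observation variance.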