Data assimilation with missing data in nonstationary environments for probabilistic machine learning models
Format: | Article |
---|---|
Language: | English |
Published: | 2024 |
Online Access: | https://hdl.handle.net/10356/173067 |
Institution: | Nanyang Technological University |
Summary: | This study extends Probabilistic Optimal Interpolation (POI), a data assimilation framework proposed for probabilistic Machine Learning (ML) models, to nonstationary environments with missing data, both of which are common in real-world situations. The dataset is based on a multi-scale Lorenz 96 chaotic system. Three types of nonstationarity (trend, heteroscedasticity, and random walk) are introduced into the dataset. In addition, the test datasets are masked at different missingness rates to evaluate POI performance when values are missing. Several filters are used to identify background noise for initializing the observation covariance, and the covariance is then updated during real-time data assimilation specifically for nonstationary environments. The results show that heteroscedastic noise can be identified well, whereas random-walk noise is very difficult to analyze. Overall, POI can reduce uncertainty, but its performance can be significantly degraded by the limited accuracy of ML models in nonstationary environments. The impact of missing values is then examined and compared between stationary and nonstationary environments. As expected, both the predictions and the POI updates are more accurate at lower missingness rates, and whether or not POI is bypassed at missing points does not significantly affect overall performance. Finally, input evolution can perform well with POI under high noise levels and missingness rates in stationary environments, but it always yields worse results in nonstationary environments and is therefore not recommended. |
---|
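The summary describes masking test data at a given missingness rate and optionally bypassing the assimilation step at missing points. The sketch below illustrates that idea with a textbook scalar optimal-interpolation update, assuming an identity observation operator and known variances. It is not the POI formulation from the article, which may differ; the helper names `mask_missing` and `oi_update`, the variance values, and the sample numbers are all hypothetical.

```python
import math
import random


def mask_missing(values, rate, seed=0):
    """Replace each value with NaN with probability `rate`.

    Hypothetical helper: the article's masking procedure is not specified here.
    """
    rng = random.Random(seed)
    return [math.nan if rng.random() < rate else v for v in values]


def oi_update(background, background_var, observation, observation_var):
    """Classic scalar optimal-interpolation update (identity observation operator).

    Returns (analysis, analysis_var). If the observation is missing (NaN),
    the update is bypassed and the background is returned unchanged.
    Assumes background_var + observation_var > 0.
    """
    if math.isnan(observation):
        return background, background_var
    # Gain weights the observation by the relative background uncertainty.
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    # The analysis variance never exceeds the background variance.
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Illustrative values only: stand-ins for ML predictive means and variances.
predictions = [1.0, 2.0, 3.0, 4.0]
prediction_vars = [0.5, 0.5, 0.5, 0.5]
observations = mask_missing([1.2, 1.8, 3.3, 3.9], rate=0.5, seed=1)

updated = [
    oi_update(mean, var, obs, observation_var=0.25)
    for mean, var, obs in zip(predictions, prediction_vars, observations)
]
```

Under these assumptions, each entry of `updated` keeps the predicted mean and variance wherever the observation was masked, and otherwise moves the mean toward the observation with a smaller variance.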