Probabilistic optimal interpolation for data assimilation between machine learning model predictions and real time observations

Bibliographic Details
Main Authors: Wei, Yuying, Law, Adrian Wing-Keung, Yang, Chun
Other Authors: School of Civil and Environmental Engineering
Format: Article
Language: English
Published: 2023
Online Access:https://hdl.handle.net/10356/170149
Institution: Nanyang Technological University
Description
Summary: In this study, we propose a new framework for Data Assimilation (DA), named Probabilistic Optimal Interpolation (POI), that combines the predictions of Machine Learning (ML) models trained on historical data with real-time observations. The key objective is to improve the estimate of the state of the system. The framework uses the heteroscedastic uncertainty of the ML predictions and the residual-based uncertainty of the observations, and integrates the two through optimal interpolation. The quantification of both uncertainties is included directly within the framework itself. As an application example, we test the performance of POI on a multi-scale Lorenz 96 chaotic system with various levels of added noise. The ML model is based on a Long Short-Term Memory (LSTM) neural network, and Monte Carlo (MC) dropout is adopted for uncertainty quantification. The computational results show that POI can improve the predictions of the state of the system, with lower uncertainty, and can also filter the added noise effectively when the historical data are reasonably accurate. However, if the noise level is high, using the updated POI predictions as sequential inputs for the next time step does not guarantee better performance than using the real-time observations directly. Furthermore, under very noisy conditions, the averaged ML predictions after MC dropout can already reduce the noise substantially, and these predictions may even be better than the POI updates. Therefore, POI (or data assimilation in general) is not recommended with an ML-based surrogate model in a noisy environment.
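The summary does not state the POI equations, so the following is only a minimal sketch of how the two uncertainties could be combined, assuming the textbook scalar form of optimal interpolation applied independently to each state variable (diagonal covariances, uncorrelated errors). The background mean and variance are taken as the mean and spread of MC dropout forward passes; the observation error variance is simply passed in, whereas the paper derives it from residuals by a method not described here. The function names `mc_dropout_stats`, `oi_update` and `poi_step`, and the toy data, are hypothetical and are not the authors' implementation.

```python
import random
import statistics


def mc_dropout_stats(samples):
    """Mean and variance of stochastic forward passes for one state variable.

    `samples` stands in for predictions from T forward passes of a network
    with dropout kept active at inference time (MC dropout). Their spread is
    used as a per-point (heteroscedastic) predictive variance; no extra
    model-noise term is added in this sketch.
    """
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean, var


def oi_update(x_b, var_b, y, var_o):
    """Scalar optimal interpolation (BLUE) for one state variable.

    Assumes uncorrelated background and observation errors, so the gain is
    K = var_b / (var_b + var_o) and the analysis variance is (1 - K) * var_b.
    """
    k = var_b / (var_b + var_o)
    x_a = x_b + k * (y - x_b)
    var_a = (1.0 - k) * var_b
    return x_a, var_a


def poi_step(mc_samples, observations, obs_vars):
    """Apply the scalar update independently to every state variable."""
    analysis = []
    for samples, y, var_o in zip(mc_samples, observations, obs_vars):
        x_b, var_b = mc_dropout_stats(samples)
        analysis.append(oi_update(x_b, var_b, y, var_o))
    return analysis


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 3 state variables, 50 MC dropout passes each.
    truth = [1.0, -0.5, 2.0]
    mc_samples = [[t + rng.gauss(0.0, 0.3) for _ in range(50)] for t in truth]
    observations = [t + rng.gauss(0.0, 0.2) for t in truth]
    obs_vars = [0.2 ** 2] * len(truth)
    for x_a, var_a in poi_step(mc_samples, observations, obs_vars):
        print(f"analysis = {x_a:.3f}, variance = {var_a:.4f}")
```

With equal background and observation variances the gain is 0.5 and the analysis variance is half of either input, which is one way to read "less uncertainty" above. Under this assumed form the reported analysis variance is only as trustworthy as the input variances: if either is misestimated, as may happen in very noisy conditions, the update can be worse than the averaged MC dropout prediction even though its stated variance is smaller.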