DATA DRIVEN LINEAR OPTIMAL CONTROL USING MODEL-BASED AND MODEL-FREE REINFORCEMENT LEARNING
Main Author: | Novitarini Putri, Adi |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/84508 |
Subjects: | data driven LQG, Value Iteration, KalmanNet, data driven OPFB |
Institution: | Institut Teknologi Bandung |
id | id-itb.:84508 |
---|---|
institution | Institut Teknologi Bandung |
building | Institut Teknologi Bandung Library |
continent | Asia |
country | Indonesia |
content_provider | Institut Teknologi Bandung |
collection | Digital ITB |
language | Indonesia |

description
The objective of optimal control theory is to design control signals such that
the output of the controlled system achieves the desired reference while simultaneously
optimizing a performance index. Conventional optimal control systems
require solving the Hamilton-Jacobi-Bellman (HJB) equation to derive optimal
control laws. Solving the HJB equation requires an accurate model of the controlled
system. However, in practice, it is often challenging to obtain an accurate system
dynamics model due to uncertainties and time-varying dynamics. On the other
hand, Reinforcement Learning (RL) is a branch of artificial intelligence that can
be used to determine optimal solutions, making it a viable alternative for solving
optimal control problems. Consequently, RL serves as an intersection between
control theory and artificial intelligence. RL aims to optimize a performance
index, which quantifies how effectively the RL agent delivers control signals to its
environment. A data-driven paradigm for RL enables the agent to learn efficiently,
offering a potential solution to the limitations of conventional optimal control
systems. RL methods are classified into model-based and model-free approaches.
This dissertation explores a model-based RL approach for the Linear Quadratic
Gaussian (LQG) case, referred to as the Data-Driven LQG Method. Additionally,
a model-free RL approach is examined for the Output Feedback (OPFB) case,
referred to as the Data-Driven OPFB Method.
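For reference, the quadratic performance index and the Bellman (discrete-time dynamic-programming form of the HJB) equation that both methods approximate can be stated in a generic textbook form; the notation below is illustrative and not necessarily the dissertation's:

```latex
% Infinite-horizon quadratic performance index for x_{k+1} = A x_k + B u_k
J = \sum_{k=0}^{\infty} \left( x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k \right)

% Bellman equation for the optimal value function
V^{*}(x_k) = \min_{u_k} \left[ x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k + V^{*}(A x_k + B u_k) \right]
```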
In the Data-Driven LQG Method, the LQG controller combines the roles of the
Kalman filter and the Linear Quadratic Regulator (LQR) as an estimator and
controller. This combination effectively addresses the regulation of linear systems
subject to Gaussian-distributed disturbances. The limitation of this method is that
it requires knowledge of the linear dynamics and the stochastic characteristics of
the system disturbances and measurements. The proposed method in this research
combines KalmanNet and the Value Iteration algorithm to design a controller
for discrete-time stochastic systems. The Data-Driven LQG Method begins with
preparing a dataset of input and output signals from the control system. Explicit system
identification is then performed to obtain an approximate model. KalmanNet
is used to construct the state estimates; it is an algorithm that replaces the Kalman
filter's gain computation with a Recurrent Neural Network (RNN), specifically a Long
Short-Term Memory (LSTM) network in this study. For the control component, the Value
Iteration algorithm is used to compute the control gain. The resulting control
signal is applied to the system, producing an output signal. The performance
evaluation of the Data-Driven LQG Method involves analyzing the convergence
of the data-driven RL control gain compared to conventional optimal control
methods.
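As a concrete illustration of the two stages above, the sketch below shows (1) a KalmanNet-style estimator in which an LSTM produces the filter gain from the innovation, and (2) Value Iteration on an identified model to obtain the control gain. Class and function names, network sizes, and the numpy/torch implementation are assumptions made for illustration, not the dissertation's code.

```python
# Minimal sketch, assuming an identified discrete-time model (A, B, C) and
# cost weights (Q, R); names and sizes are illustrative.
import numpy as np
import torch
import torch.nn as nn

class KalmanNetGain(nn.Module):
    """KalmanNet-style estimator: an LSTM maps the innovation y_t - C @ x_hat_t
    to a learned filter gain K_t used in x_hat <- x_hat_pred + K_t @ innovation."""
    def __init__(self, n_states, n_outputs, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_outputs, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_states * n_outputs)
        self.n_states, self.n_outputs = n_states, n_outputs

    def forward(self, innovation, state=None):
        # innovation: (batch, 1, n_outputs) -> gain: (batch, n_states, n_outputs)
        h, state = self.lstm(innovation, state)
        gain = self.head(h[:, -1]).view(-1, self.n_states, self.n_outputs)
        return gain, state

def value_iteration_lqr(A, B, Q, R, iters=1000, tol=1e-9):
    """Value Iteration on the discrete-time Riccati recursion: iterate the value
    matrix P to convergence and return the state-feedback gain K (u = -K @ x_hat)."""
    P = np.zeros(A.shape)
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_new = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_new - P)) < tol:
            return K, P_new
        P = P_new
    return K, P
```

Applied to an identified model, for example, `K, _ = value_iteration_lqr(A_hat, B_hat, Q, R)` would give the gain used together with the KalmanNet state estimate, `u_t = -K @ x_hat_t`.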
In the Data-Driven OPFB Method, the focus is on designing control without full
state feedback. The objectives of the OPFB control scheme are (1) to ensure the
closed-loop system’s stability and (2) to enable the control system to track desired
reference signals. Solving the HJB equation for the OPFB scheme requires a system
dynamics model, which is practically difficult to obtain. Additionally, the OPFB
scheme requires an observer (estimator) to generate the state trajectory during
the learning process. The proposed Data-Driven OPFB Method employs Deep
Recurrent Q-Networks (DRQN) to generate the trajectory of optimal control signals
based on a dataset of input and output signals from the system. This approach is
based on the Q-Learning method from RL. An LSTM network is used
to estimate the Q-function and determine control signals for systems with unknown
models.
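The following is a minimal sketch of a Deep Recurrent Q-Network of the kind described, assuming the control signal has been discretized into a finite set of candidate actions; the architecture, sizes, and the TD-target helper are illustrative assumptions rather than the dissertation's exact network.

```python
# Minimal DRQN sketch: an LSTM reads the history of measured outputs and past
# inputs, and a linear head assigns a Q-value to each candidate control action.
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim) stacking past outputs and control inputs
        h, state = self.lstm(obs_seq, state)
        return self.q_head(h[:, -1]), state  # Q(history, a) for each candidate action

def td_target(q_net, next_obs_seq, reward, gamma=0.95):
    """One-step Q-learning target r + gamma * max_a Q(next history, a); in the
    control setting the reward is typically the negative quadratic stage cost."""
    with torch.no_grad():
        q_next, _ = q_net(next_obs_seq)
    return reward + gamma * q_next.max(dim=-1).values
```

At execution time, the applied control would be the candidate action with the largest predicted Q-value for the current input/output history.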
The Data-Driven LQG and Data-Driven OPFB Methods in this research have been
shown to produce optimal controllers with faster convergence times compared to
conventional methods. These methods were tested on three case studies: a cartpole
system, a batch distillation column, and an unstable system. In these case
studies, the norm of the control signals from the Data-Driven LQG Method was found
to be 49.83%, 75.68%, and 88.50% smaller than that of the conventional LQG method
for the first, second, and third case studies, respectively. The computation time
was 98.52%, 98.50%, and 14.66% faster than that of the conventional methods. The
controllers derived from the Data-Driven LQG Method effectively replicated the
role of conventional LQG, as evidenced by the reduction in error values of
5.31E-02, 2.68E-02, and 1.06E-02 for the first, second, and third case studies, respectively.
For the Data-Driven OPFB Method, the control signal norms for the first to third
case studies were 46.72%, 99.22%, and 23.03% smaller than those of the conventional
OPFB method. The convergence times for the Data-Driven OPFB Method were
80%, 76.92%, and 25% faster for the first, second, and third case studies, respectively,
compared to the conventional OPFB method. The controllers obtained from
the Data-Driven OPFB Method also ensured stability, as indicated by the finite
norms of the augmented-state trajectories, which were 0.3162, 5.35E-28, and
1.40E-45 for the first, second, and third case studies, respectively.