DATA DRIVEN LINEAR OPTIMAL CONTROL USING MODEL-BASED AND MODEL-FREE REINFORCEMENT LEARNING
Main Author:
Format: Dissertations
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/84508
Institution: Institut Teknologi Bandung
Summary: The objective of optimal control theory is to design control signals such that
the output of the controlled system achieves the desired reference while simultaneously
optimizing a performance index. Conventional optimal control design requires solving
the Hamilton-Jacobi-Bellman (HJB) equation to derive optimal control laws, and solving
the HJB equation in turn requires an accurate model of the controlled system. In practice,
however, it is often challenging to obtain an accurate model of the system dynamics due
to uncertainties and time-varying dynamics. On the other
hand, Reinforcement Learning (RL) is a branch of artificial intelligence that can
be used to determine optimal solutions, making it a viable alternative for solving
optimal control problems. Consequently, RL serves as an intersection between
control theory and artificial intelligence. RL aims to optimize a performance
index, which quantifies how effectively the RL agent delivers control signals to its
environment. A data-driven paradigm for RL enables the agent to learn efficiently,
offering a potential solution to the limitations of conventional optimal control
systems. RL methods are classified into model-based and model-free approaches.
This dissertation explores a model-based RL approach for the Linear Quadratic
Gaussian (LQG) case, referred to as the Data-Driven LQG Method. Additionally,
a model-free RL approach is examined for the Output Feedback (OPFB) case,
referred to as the Data-Driven OPFB Method.
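For concreteness, the performance index discussed above can be read as the standard discrete-time linear-quadratic cost; the weighting matrices Q and R below are illustrative assumptions, since the abstract does not specify them.

```latex
% Quadratic performance index for a linear stochastic system
% x_{k+1} = A x_k + B u_k + w_k (assumed standard LQ form):
J = \mathbb{E}\!\left[\sum_{k=0}^{\infty}
      \big( x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k \big)\right],
\qquad Q \succeq 0,\; R \succ 0 .

% The associated Bellman (discrete-time HJB) equation that the optimal
% value function must satisfy:
V(x_k) = \min_{u_k} \Big\{ x_k^{\top} Q\, x_k + u_k^{\top} R\, u_k
         + \mathbb{E}\big[ V(x_{k+1}) \big] \Big\} .
```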
In the Data-Driven LQG Method, the LQG controller combines the Kalman filter and the
Linear Quadratic Regulator (LQR) as estimator and controller, respectively. This
combination effectively addresses the regulation of linear systems subject to
Gaussian-distributed disturbances. Its limitation is that it requires knowledge of the
linear dynamics and of the stochastic characteristics of the process disturbances and
measurement noise. The proposed method in this research
combines KalmanNet and the Value Iteration algorithm to design a controller
for discrete-time stochastic systems. The Data-Driven LQG Method begins with
preparing a dataset of input and output signals from the control system. System
identification is then performed explicitly to obtain an approximate model. KalmanNet,
an algorithm that replaces the Kalman filter with a Recurrent Neural Network (RNN), in
this study a Long Short-Term Memory (LSTM) network, is used to build the state
estimates. For the control component, the Value Iteration algorithm is used to generate
the control gain. The resulting control signal is applied to the system and produces an
output signal. The performance
evaluation of the Data-Driven LQG Method involves analyzing the convergence
of the data-driven RL control gain compared to conventional optimal control
methods.
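As an illustration of the control component, the sketch below shows how a Value Iteration loop can recover the LQR gain from an identified model. The matrices A_hat and B_hat stand in for the model obtained from the system-identification step, and the weights Q and R are placeholder values; none of these are given in the abstract. The state estimate fed to this gain would come from the KalmanNet-style LSTM estimator described above.

```python
import numpy as np

def value_iteration_lqr(A, B, Q, R, tol=1e-9, max_iter=10_000):
    """Value Iteration on the discrete-time Riccati recursion.

    Returns the feedback gain K (so that u_k = -K @ x_hat_k, with x_hat_k
    supplied by the state estimator) and the converged cost matrix P.
    """
    n = A.shape[0]
    P = np.zeros((n, n))                                   # V_0(x) = 0
    for _ in range(max_iter):
        # Greedy policy improvement for the current value estimate.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Bellman backup: P_{j+1} = Q + A^T P_j (A - B K_j).
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return K, P_next
        P = P_next
    return K, P

# Placeholder identified model (hypothetical values, for illustration only).
A_hat = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
B_hat = np.array([[0.0],
                  [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = value_iteration_lqr(A_hat, B_hat, Q, R)
print("Control gain K:", K)
```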
In the Data-Driven OPFB Method, the focus is on designing control without full
state feedback. The objectives of the OPFB control scheme are (1) to ensure the
closed-loop system’s stability and (2) to enable the control system to track desired
reference signals. Solving the HJB equation for the OPFB scheme requires a system
dynamics model, which is practically difficult to obtain. Additionally, the OPFB
scheme requires an observer (estimator) to generate the state trajectory during
the learning process. The proposed Data-Driven OPFB Method employs a Deep Recurrent
Q-Network (DRQN) to generate the trajectory of optimal control signals from a dataset
of the system's input and output signals. The approach builds on the Q-Learning method
from RL: an LSTM network is used to estimate the Q-function and to determine control
signals for systems with unknown models.
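A minimal sketch of the DRQN idea follows, assuming PyTorch, a window of stacked input/output measurements as the observation sequence, and a discretized set of candidate control actions. The network size, window length, and reward definition (negative quadratic stage cost) are assumptions made for illustration, not details taken from the dissertation.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """LSTM Q-network: maps a history of measured (output, input) pairs
    to Q-values over a discretized set of candidate control actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, seq):                  # seq: (batch, time, obs_dim)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :])      # Q-values at the latest step

def q_learning_loss(net, target_net, obs_seq, actions, rewards, next_seq,
                    gamma=0.99):
    """One-step Q-learning target on a batch of logged transitions.

    In the OPFB setting the reward would be the negative quadratic stage
    cost, e.g. -(y_k^T Qy y_k + u_k^T R u_k), computed from the dataset.
    """
    q = net(obs_seq).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_seq).max(dim=1).values
    target = rewards + gamma * q_next
    return nn.functional.mse_loss(q, target)
```

At deployment, the control signal at each step would be the action whose Q-value is largest for the current measurement window, i.e. the argmax over the network output.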
The Data-Driven LQG and Data-Driven OPFB Methods in this research have been shown to
produce optimal controllers with faster convergence times than conventional methods.
These methods were tested on three case studies: a cart-pole system, a batch
distillation column, and an unstable system. In these case studies, the norm of the
control signals from the Data-Driven LQG Method was 49.83%, 75.68%, and 88.50% smaller
than that of the conventional LQG method for the first, second, and third case studies,
respectively, and the computation time was 98.52%, 98.50%, and 14.66% shorter than that
of the conventional methods. The controllers derived from the Data-Driven LQG Method
effectively replicated the role of conventional LQG, as evidenced by the reduction in
error values of 5.31 × 10⁻², 2.68 × 10⁻², and 1.06 × 10⁻² for the first, second, and
third case studies, respectively. For the Data-Driven OPFB Method, the control signal
norms for the first to third case studies were 46.72%, 99.22%, and 23.03% smaller than
those of the conventional OPFB method, and the convergence times were 80%, 76.92%, and
25% faster than those of the conventional OPFB method for the first, second, and third
case studies, respectively. The controllers obtained from the Data-Driven OPFB Method
also ensured stability, as indicated by the finite norms of the augmented state
trajectories, which were 0.3162, 5.35 × 10⁻²⁸, and 1.40 × 10⁻⁴⁵ for the first, second,
and third case studies, respectively.