DATA DRIVEN LINEAR OPTIMAL CONTROL USING MODEL-BASED AND MODEL-FREE REINFORCEMENT LEARNING

Bibliographic Details
Main Author: Novitarini Putri, Adi
Format: Dissertations
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/84508
Institution: Institut Teknologi Bandung
Description:
The objective of optimal control theory is to design control signals such that the output of the controlled system reaches the desired reference while simultaneously optimizing a performance index. Conventional optimal control requires solving the Hamilton-Jacobi-Bellman (HJB) equation to derive optimal control laws, and solving the HJB equation requires an accurate model of the controlled system. In practice, however, an accurate model of the system dynamics is often difficult to obtain because of uncertainties and time-varying dynamics. Reinforcement Learning (RL), a branch of artificial intelligence that can be used to determine optimal solutions, is a viable alternative for solving optimal control problems and thus sits at the intersection of control theory and artificial intelligence. RL optimizes a performance index that quantifies how effectively the RL agent delivers control signals to its environment, and a data-driven RL paradigm allows the agent to learn efficiently, offering a potential remedy for the limitations of conventional optimal control. RL methods are classified into model-based and model-free approaches. This dissertation explores a model-based RL approach for the Linear Quadratic Gaussian (LQG) case, referred to as the Data-Driven LQG Method, and a model-free RL approach for the Output Feedback (OPFB) case, referred to as the Data-Driven OPFB Method.

In the Data-Driven LQG Method, the LQG controller combines a Kalman filter and a Linear Quadratic Regulator (LQR) as estimator and controller, respectively. This combination effectively regulates linear systems subject to Gaussian-distributed disturbances, but it requires knowledge of the linear dynamics and of the stochastic characteristics of the process disturbances and measurement noise. The method proposed in this research combines KalmanNet and the Value Iteration algorithm to design a controller for discrete-time stochastic systems. The Data-Driven LQG Method begins by preparing a dataset of input and output signals from the control system, after which system identification is performed explicitly to obtain an approximate model. KalmanNet is then used to build the state estimates; KalmanNet replaces the Kalman filter with a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network in this study. For the control component, the Value Iteration algorithm generates the control gain, and the resulting control signal is applied to the system to produce the output signal. The performance evaluation of the Data-Driven LQG Method analyzes the convergence of the data-driven RL control gain against conventional optimal control methods.
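The control-gain step of the pipeline described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it takes an identified discrete-time model (A, B) and illustrative cost weights Q and R, and iterates the standard discrete-time Bellman (Riccati) recursion until the value matrix converges. In the full Data-Driven LQG pipeline, (A, B) would come from the system identification step, and the state fed through the resulting gain would be supplied by the KalmanNet estimator rather than measured directly.

# Minimal Value Iteration sketch for the discrete-time LQR gain (illustrative, not the dissertation's code).
import numpy as np

def lqr_value_iteration(A, B, Q, R, tol=1e-9, max_iter=10000):
    """Iterate the Bellman (Riccati) recursion until the value matrix P converges,
    then return the stationary feedback gain K for the control law u = -K x."""
    n = A.shape[0]
    P = np.zeros((n, n))                                      # value function V(x) ~ x' P x, initialized at zero
    K = np.zeros((B.shape[1], n))
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # greedy gain with respect to the current P
        P_next = Q + A.T @ P @ (A - B @ K)                    # Bellman backup of the quadratic value function
        if np.max(np.abs(P_next - P)) < tol:
            return K, P_next
        P = P_next
    return K, P

if __name__ == "__main__":
    # Illustrative second-order system; the matrices are placeholders, not one of the case studies.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    K, P = lqr_value_iteration(A, B, Q, R)
    print("Control gain K:", K)                               # u = -K x_hat, with x_hat supplied by the estimator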
In the Data-Driven OPFB Method, the focus is on designing control without full state feedback. The objectives of the OPFB control scheme are (1) to ensure closed-loop stability and (2) to enable the control system to track desired reference signals. Solving the HJB equation for the OPFB scheme requires a model of the system dynamics, which is difficult to obtain in practice, and the OPFB scheme additionally requires an observer (estimator) to generate the state trajectory during learning. The proposed Data-Driven OPFB Method employs a Deep Recurrent Q-Network (DRQN) to generate the trajectory of optimal control signals from a dataset of input and output signals measured on the system. The approach is based on the Q-Learning method from the RL framework, and an LSTM network is used to estimate the Q-function and determine control signals for systems with unknown models.

The Data-Driven LQG and Data-Driven OPFB Methods proposed in this research have been shown to produce optimal controllers with faster convergence than conventional methods. The methods were tested on three case studies: a cartpole system, a batch distillation column, and an unstable system. In these case studies, the norms of the control signals from the Data-Driven LQG Method were 49.83%, 75.68%, and 88.50% smaller than those of the conventional LQG method for the first, second, and third case studies, respectively, and the computation times were 98.52%, 98.50%, and 14.66% faster than the conventional method. The controllers derived from the Data-Driven LQG Method effectively replicated the role of conventional LQG, as evidenced by the reduction in error values of 5.31E-02, 2.68E-02, and 1.06E-02 for the first, second, and third case studies, respectively. For the Data-Driven OPFB Method, the control signal norms for the first to third case studies were 46.72%, 99.22%, and 23.03% smaller than those of the conventional OPFB method, and the convergence times were 80%, 76.92%, and 25% faster, respectively. The controllers obtained from the Data-Driven OPFB Method also ensured stability, as indicated by the finite norms of the augmented-state trajectories: 0.3162, 5.35E-28, and 1.40E-45 for the first, second, and third case studies, respectively.
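The DRQN component described above can likewise be sketched in a few lines. This is a hedged illustration under assumed details, not the dissertation's architecture: an LSTM reads a short window of measured input-output pairs and outputs one Q-value per action in a discretized control set, and the network is trained toward the standard one-step Q-learning target. The observation dimension, history length, hidden size, action grid, and reward signal below are all illustrative placeholders.

# Minimal DRQN-style Q-function sketch for output feedback (illustrative assumptions throughout).
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, obs_dim, hidden_dim, num_actions):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)   # recurrent Q-function over I/O histories
        self.head = nn.Linear(hidden_dim, num_actions)               # one Q-value per discretized control action

    def forward(self, history):                    # history: (batch, time, obs_dim)
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :])            # Q(history, a) for every candidate action a

# Illustrative setup: two measured signals (input u and output y), nine candidate control levels.
actions = torch.linspace(-1.0, 1.0, steps=9)       # discretized control actions (assumption, not from the text)
qnet = DRQN(obs_dim=2, hidden_dim=32, num_actions=actions.numel())
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
gamma = 0.95

def q_learning_step(hist, action_idx, reward, next_hist):
    """One gradient step toward the one-step target r + gamma * max_a' Q(next_hist, a')."""
    with torch.no_grad():
        target = reward + gamma * qnet(next_hist).max(dim=1).values
    q_sa = qnet(hist).gather(1, action_idx.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch only to show the expected shapes; random data stands in for logged I/O trajectories.
hist = torch.randn(4, 10, 2)                       # 4 trajectories, 10-step input-output window
next_hist = torch.randn(4, 10, 2)
action_idx = torch.randint(0, actions.numel(), (4,))
reward = torch.randn(4)                            # would be the negative stage cost in the OPFB setting
print("loss:", q_learning_step(hist, action_idx, reward, next_hist))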
Keywords: data driven LQG, Value Iteration, KalmanNet, data driven OPFB