Reinforcement learning for strategic airport slot scheduling: Analysis of state observations and reward designs

Because the strategic airport slot scheduling problem is NP-hard, it calls for sub-optimal approaches such as heuristics and learning-based methods. Moreover, the continuous increase in air traffic demand requires approaches that work well in new scenarios. While heuristic...

Bibliographic Details
Main Authors: Nguyen-Duy, Anh; Pham, Duc-Thinh; Lye, Jian-Yi; TA, Nguyen Binh Duong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
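Subjects: airport slot scheduling; Reinforcement Learning; strategic; Artificial Intelligence and Robotics; Operations Research, Systems Engineering and Industrial Engineering; Transportation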
Online Access: https://ink.library.smu.edu.sg/sis_research/9268
https://ink.library.smu.edu.sg/context/sis_research/article/10268/viewcontent/RL_AirportSlot_CAI_2024_av.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10268
record_format dspace
datestamp 2024-09-05T06:43:00Z
date 2024-07-01T07:00:00Z
file_format application/pdf
doi 10.1109/CAI59869.2024.00213
rights http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic airport slot scheduling
Reinforcement Learning
strategic
Artificial Intelligence and Robotics
Operations Research, Systems Engineering and Industrial Engineering
Transportation
description Because the strategic airport slot scheduling problem is NP-hard, it calls for sub-optimal approaches such as heuristics and learning-based methods. Moreover, the continuous increase in air traffic demand requires approaches that work well in new scenarios. While heuristics rely on a fixed set of rules, which limits their ability to explore new solutions, Reinforcement Learning offers a versatile framework to automate the search and generalize to unseen scenarios. Finding a suitable state observation and reward structure design is essential when using Reinforcement Learning. In this paper, we investigate the impact of providing the Reinforcement Learning agent with an intermediate positive signal in the reward structure, together with the use of the Full State Observation and the Local State Observation. We perform training with different combinations of the reward structure, the state observation, and the Deep Q-Network (DQN) algorithm to identify a training-efficient formulation. We use two types of scenarios, medium- and high-density, to test the approach's ability to generalize to unseen data. A separate model is trained on each type of scenario, giving Model 1 and Model 2. Model 1, which is trained on high-density scenarios, is tested on medium-density scenarios; its results are then compared with those of Model 2, and vice versa. We additionally compare the performance of the DQN models with that of Proximal Policy Optimization (PPO) models. Results suggest that combining the Local State Observation with the intermediate positive signal leads to stable convergence. The obtained DQN models outperform the PPO models, achieving an average displacement per request of 1.44/1.99 with, on average, only 0.00/0.02 unaccommodated requests for medium/high-density scenarios. The t-statistics of 0.0810/-1.0016 and p-values of 0.9356/0.3190 also suggest that the DQN models can generalize to unseen scenarios.
format text
author Nguyen-Duy, Anh
Pham, Duc-Thinh
Lye, Jian-Yi
TA, Nguyen Binh Duong
title Reinforcement learning for strategic airport slot scheduling: Analysis of state observations and reward designs
publisher Institutional Knowledge at Singapore Management University
publishDate 2024