REINFORCEMENT LEARNING-BASED TRAFFIC LIGHT DURATION CONTROL SYSTEM DEVELOPMENT

Bibliographic Details
Main Author: Pandu Irsyadi, Naufal
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/87727
Institution: Institut Teknologi Bandung
id id-itb.:87727
spelling id-itb.:87727 2025-02-03T07:52:42Z
keywords Traffic Congestion, Traffic Light Control, Reinforcement Learning, Q-learning, Policy Gradient, Google Maps
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description [...] fuel consumption, exhaust emissions, as well as the mental and physical health of the public. Fixed-duration traffic light control systems cannot adapt to dynamic traffic conditions, so an adaptive traffic light duration control system is needed to reduce congestion. This research develops a traffic light duration control system based on Reinforcement Learning (RL) that adjusts traffic light durations in real time. The methodology follows the CRISP-DM (Cross Industry Standard Process for Data Mining) framework, which consists of six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. Traffic data is obtained through the Google Maps API and synthesized into vehicle route data with the SUMO simulator to construct a simulated traffic environment. The RL model is trained with eight algorithms: SARSA, Q-Learning, Deep Q-Network (DQN), Double Deep Q-Network (DDQN), REINFORCE, Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). Grid search is used to select the best hyperparameters for each algorithm. The evaluation shows that DDPG, an off-policy policy gradient algorithm, reduces congestion more than the other models. On weekday traffic, it reduces travel time by 30%, queue length by 23%, and the maximum queue length at intersections by 79%. On weekend traffic, it reduces travel time by 16%, queue length by 14%, and the maximum queue length at intersections by 75%.
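The abstract lists both SARSA and Q-Learning and describes the winning model as off-policy. As a rough illustration of that distinction only, the sketch below shows the two tabular update rules side by side. It is not the thesis's implementation: the abstract does not give the state, action, or reward definitions, so the two-action set (keep or switch the green phase), the integer state labels, the reward value, and the `alpha`/`gamma` settings are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical action set: 0 = keep the current green phase, 1 = switch phase.
ACTIONS = (0, 1)


def epsilon_greedy(q, state, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s_next,
    regardless of which action the agent actually takes next."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next that the agent
    actually selected in s_next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


if __name__ == "__main__":
    # Assumed reward: negative queue length (3 waiting vehicles -> -3.0).
    q_off = defaultdict(float)
    q_on = defaultdict(float)
    for table in (q_off, q_on):
        table[(1, 0)] = 1.0
        table[(1, 1)] = 2.0

    q_learning_update(q_off, s=0, a=0, r=-3.0, s_next=1)
    sarsa_update(q_on, s=0, a=0, r=-3.0, s_next=1, a_next=0)
    print(q_off[(0, 0)], q_on[(0, 0)])
```

With identical starting values, the Q-Learning update moves the entry to about -0.12 because it uses the larger next-state value (2.0), while the SARSA update moves it to about -0.21 because it uses the value of the action actually chosen (1.0). Deep variants such as DQN and DDPG replace the table with neural networks but keep the same off-policy idea.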