HARDWARE ARCHITECTURE DESIGN OF DOUBLE Q-LEARNING ALGORITHM FOR SMART TRAFFIC CONTROLLER

Bibliographic Details
Main Author: Reswara Wiradjanu, Jalu
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/73890
Institution: Institut Teknologi Bandung
Description
Summary: Traffic congestion is a serious problem that causes many losses. Its causes include vehicle flow that exceeds road capacity, poor driving culture, and traffic control systems that do not adapt to road conditions. Most traffic management systems in Indonesia are still pre-timed, so they cannot adjust to traffic conditions as they change. This increases the risk of congestion from residual queues. An adaptive traffic control system based on Q-learning was developed for two adjacent intersections at the simulation level. The system simulates traffic conditions from traffic counting results, and the traffic lights are regulated by the Q-learning algorithm. Traffic conditions are also replicated in a miniature traffic model as a step toward real-world implementation. The Q-learning algorithm is implemented in the hardware description language Verilog. This implementation has not yet been successful: the Q-matrix results were intended to be sent to SUMO to manage the simulated traffic. The resulting Q-matrix is 256 x 4 in size, with a 32-bit signed Q-value data width. The reward graph shows the reward becoming increasingly positive. Because the reward is determined at each step, policy changes alter the reward graph. No performance comparison between the adaptive settings and manual or pre-timed settings has been made yet.
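To make the reported table dimensions concrete, the following is a minimal Python sketch of a tabular Q-learning update on a 256 x 4 table whose entries are clamped to the signed 32-bit range. It is not the thesis's Verilog design. The shift-based learning rate (1/8), the discount factor (7/8), the saturation behavior, and the names `q_update`, `saturate`, and `new_q_table` are all assumptions chosen for illustration, as is the idea that integer shifts stand in for multipliers in hardware.

```python
N_STATES, N_ACTIONS = 256, 4            # Q-matrix shape reported in the summary
Q_MIN, Q_MAX = -(2**31), 2**31 - 1      # range of a signed 32-bit Q-value

ALPHA_SHIFT = 3                         # assumed learning rate: 1/8
GAMMA_NUM, GAMMA_SHIFT = 7, 3           # assumed discount factor: 7/8


def new_q_table():
    """Return a zero-initialised 256 x 4 table of integer Q-values."""
    return [[0] * N_ACTIONS for _ in range(N_STATES)]


def saturate(value):
    """Clamp a value to the signed 32-bit range (assumed overflow policy)."""
    return max(Q_MIN, min(Q_MAX, value))


def q_update(q, state, action, reward, next_state):
    """Apply one Q-learning step with integer-only arithmetic.

    Implements Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with alpha and gamma realised as shifts (hypothetical values).
    """
    old = q[state][action]
    best_next = max(q[next_state])
    target = reward + ((GAMMA_NUM * best_next) >> GAMMA_SHIFT)
    # Python's >> on negative integers floors, like an arithmetic shift.
    q[state][action] = saturate(old + ((target - old) >> ALPHA_SHIFT))
    return q[state][action]


def greedy_action(q, state):
    """Pick the action with the largest Q-value for a state."""
    row = q[state]
    return row.index(max(row))
```

For example, starting from an all-zero table, calling `q_update(q, 5, 2, 80, 6)` moves `q[5][2]` one eighth of the way toward the target of 80. How the 256 states map onto queue conditions at the two intersections, and what the four actions represent, is not stated in the summary and is left open here.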