Automatic speech recognition and chat bot for air traffic control

Bibliographic Details
Main Author: Low, Ashton Kin Yun
Other Authors: Sameer Alam
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/177842
Institution: Nanyang Technological University
Description
Summary: Artificial Intelligence (AI) has demonstrated the ability to manage complex processes highly effectively and is therefore widely seen as a key component of future airport Air Traffic Management (ATM) systems. Future AI tools for ATM will rely on digital data, such as surveillance, radar, weather, and flight plans, for their operation. However, the foundational Air Traffic Control Officer (ATCo)-pilot communication medium is voice, which is a vital source of situational data. Controller Pilot Data Link Communications (CPDLC) has been developed as an alternative, text-based communication delivery method; however, ATCo-pilot communications will not be completely transitioned to this framework in the near term. Moreover, because CPDLC is a one-to-one communication paradigm, the additional situational awareness of other traffic provided by traditional party-line VHF communications is potentially lost. An automated speech-to-text translation tool can therefore be seen as a missing link, enabling traditional ATCo-pilot voice communications to be automatically translated and input into a datalink system such as CPDLC. To this end, this paper presents a Machine Learning (ML) based Automatic Speech Recognition (ASR) framework that is able to accurately translate VHF-quality ATCo-pilot speech communication to text, achieving a Word Error Rate (WER) of only 6.13%. Moreover, the presented model is able to extract crucial information with an accuracy and F1-score of 95.2% and 90.5%, respectively. A detailed design of the framework is provided to enable its replication by the wider research community.
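For readers unfamiliar with the Word Error Rate quoted in the summary, the metric is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the recognised text into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the project's evaluation code: the function name `word_error_rate`, the lowercase and whitespace normalisation, and the sample transcripts are all illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example transcripts (not taken from the project's data):
reference = "singapore six three one descend flight level one two zero"
hypothesis = "singapore six three one descend flight level one to zero"
wer = word_error_rate(reference, hypothesis)  # one substitution in ten words
```

Under this definition, a WER of 6.13% corresponds to roughly six word errors per hundred reference words.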