A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment

Bibliographic Details
Main Author: Thai, Van Phat
Other Authors: Sameer Alam
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
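Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision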
Online Access: https://hdl.handle.net/10356/168335
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-168335
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Thai, Van Phat
A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
description To accommodate the steady growth of air traffic, airports worldwide need to increase their capacity by expanding infrastructure and optimizing air traffic procedures. However, as airports bring new infrastructure (runways, taxiways, and aprons) into operation, airside management becomes more challenging. This may increase controller workload as well as the number of runway and taxiway incursion events. The problem is compounded by intrusions into the control zone by unauthorized, non-cooperative objects such as drones. Air traffic controllers are therefore required to monitor more traffic and direct more complex operations. For effective airport airside management, controllers must detect aircraft movement on the ground. They must also coordinate push-back procedures, taxiway routing, and runway sequencing to keep airside traffic circulating effectively. Controllers are traditionally located in a tall control tower and manage air traffic through an out-of-window view. Although physical towers have served airside control well, they have several limitations. First, a physical tower is costly for a small airport because the fixed costs of providing air traffic service are independent of the traffic volume. Second, one physical tower might not be sufficient to visually cover a large airport. For full visual coverage of a large airport, the tower may have to be very tall, which poses an obstruction to navigation in the airport control zone. To enhance safety and improve operational efficiency, the concept of the digital tower has been developed. A digital tower replaces the out-of-window view of a conventional tower with a visualization system fed by a network of high-resolution cameras. With such a system, an airport can have more than one digital tower if necessary, resulting in greater visual coverage. Visibility can be further enhanced by infrared and pan-tilt-zoom cameras.
Because they are built on a digital system, digital towers can provide many enhanced functions to assist controllers. However, the airport airside is a wide field-of-view environment with dynamic traffic and complex operations, and existing computer vision systems at airports each focus on only a portion of the airside, such as detecting an aircraft on the runway, tracking moving aircraft, or detecting objects on an apron. Moreover, such systems do not consider gate site management, such as push-back control or turnaround monitoring. The objective of this thesis is to develop a computer vision framework to detect, track and identify aircraft and relevant objects (cooperative and non-cooperative) across a wide field-of-view, multi-camera airside for effective airside monitoring and turnaround management. The framework includes a purpose-designed convolutional neural network, AirNet, customized for the unique characteristics of the airport airside environment. Its use of depthwise convolution significantly reduces computational cost while maintaining high detection performance. In addition, the AirNet architecture is divided into blocks that are customizable through hyper-parameters, so the model can be readily optimized to detect a wide range of objects in different airside environments. The framework also exploits spatial-temporal information: by representing spatial information as a gray-scale image colored by temporal information, it detects objects of various dimensions and speeds with high performance. Furthermore, placing cameras in different positions covers the airport airside from different angles and ranges. As a result, the framework can detect aircraft and relevant objects in real time with high performance. The experimental results show an average precision of approximately 97% for object detection on three different airport datasets. The framework also provides enhanced functions that depend on the aircraft's phase of operation.
When aircraft are airborne, during the final approach and take-off phases, small flying aircraft and drones are detected to prevent collisions. AirNet is validated on a public dataset with high variability, including complex backgrounds and different weather conditions. The results show that AirNet achieves an average precision of 85%, outperforming state-of-the-art models by a large margin. After landing or before taking off, aircraft maneuver through runways and taxiways, which requires multiple cameras to capture. During this maneuvering phase, aircraft speed and distance are estimated by converting pixel coordinates to geographic coordinates. To reduce the distortion effect of a wide field of view, the airport airside is non-uniformly divided into small regions for camera calibration. The average geographic estimation error, which combines detection error and transfer error, is 6 m, within the acceptable error for airport surveillance of 7.5 m. After exiting the taxiway, aircraft arrive at an apron to undergo a turnaround process in preparation for the next flight. Accurately predicting turnaround time may improve the predictability of runway demand and help determine an optimal push-back sequence to ensure smooth take-offs at the runways. By grouping turnaround activities into three different process chains, the framework significantly reduces the prediction error relative to the predictions provided by the airlines. Specifically, on sample data from Obihiro airport for the period 10/2020 to 03/2021, the mean error of the airlines' predictions is 411 seconds, which the framework's prediction model reduces to 224 seconds. Furthermore, as the framework predicts turnaround time on an ongoing basis, the mean error falls to 155 seconds at 10 minutes before push-back and 95 seconds at 5 minutes before push-back.
The results presented in this thesis might help realize the aims of next-generation digital towers, including monitoring multiple airports and integration into large-scale airports.
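The abstract states that depthwise convolution reduces computational cost. The sketch below shows the standard arithmetic behind that kind of saving by counting multiply-accumulate operations (MACs) for one layer. It is an illustration only, not AirNet's actual design: the abstract does not describe AirNet's layers, so the layer sizes are hypothetical, and pairing the depthwise step with a 1x1 pointwise convolution (the depthwise separable form used in networks such as MobileNet) is an assumption.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_conv_macs(h, w, c_in, k):
    """MACs for a depthwise k x k convolution: one filter per input channel."""
    return h * w * c_in * k * k


def pointwise_conv_macs(h, w, c_in, c_out):
    """MACs for a 1 x 1 convolution that mixes channels (assumed pairing)."""
    return h * w * c_in * c_out


def separable_cost_ratio(c_out, k):
    """Closed-form ratio of depthwise separable cost to standard cost."""
    return 1.0 / c_out + 1.0 / (k * k)


# Hypothetical layer: 56 x 56 output, 64 input channels, 128 output channels, 3 x 3 kernel.
h, w, c_in, c_out, k = 56, 56, 64, 128, 3
standard = standard_conv_macs(h, w, c_in, c_out, k)
separable = depthwise_conv_macs(h, w, c_in, k) + pointwise_conv_macs(h, w, c_in, c_out)

print(f"standard:  {standard:,} MACs")
print(f"separable: {separable:,} MACs")
print(f"ratio:     {separable / standard:.4f}")
```

For this hypothetical layer the separable form needs roughly 12% of the multiply-accumulates of the standard convolution. The ratio depends only on the kernel size and the number of output channels, so the saving grows as layers get wider.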
author2 Sameer Alam
author_facet Sameer Alam
Thai, Van Phat
format Thesis-Doctor of Philosophy
author Thai, Van Phat
author_sort Thai, Van Phat
title A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
title_short A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
title_full A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
title_fullStr A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
title_full_unstemmed A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
title_sort computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/168335
_version_ 1772827170312290304
spelling sg-ntu-dr.10356-168335 2023-06-03T16:54:13Z A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment Thai, Van Phat Sameer Alam School of Mechanical and Aerospace Engineering sameeralam@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Doctor of Philosophy 2023-05-29T05:25:41Z 2023-05-29T05:25:41Z 2022 Thesis-Doctor of Philosophy Thai, V. P. (2022). A computer vision framework to detect, track and identify objects in a wide field-of-view airport-airside environment. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168335 https://hdl.handle.net/10356/168335 10.32657/10356/168335 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University