Real time semantic segmentation by fully convolutional network for UAV localization in urban environment


Bibliographic Details
Main Author: Jiang, Muyun
Other Authors: Wang Dan Wei
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access:http://hdl.handle.net/10356/78780
Institution: Nanyang Technological University
Description
Summary: Deep learning extends neural networks with more layers of computation, allowing for higher levels of abstraction and prediction from data. To date, it has become the leading machine learning tool for general imaging and computer vision, and current research indicates that deep convolutional neural networks (DCNNs) are highly effective for automatic image analysis. Machine learning therefore has a wide range of uses in the perceptual positioning of robots. This project aims to localize a camera-equipped UAV flying autonomously among urban streets. UAV navigation suffers from day-night luminance and weather changes, so the UAV should remember only the invariant features of the environment, such as road boundaries and building structures, and ignore those that may change over time, such as cars and trees. Because the UAV's on-board computer has limited computational power, the network design must balance speed and accuracy. This work proposes an effective and efficient Fully Convolutional Network (FCN) based end-to-end encoder-decoder architecture for automatic semantic segmentation in urban environments. Given the computational limits of the UAV platform, the network is designed as a 12-layer FCN that uses dilated convolution to enlarge the field of view and extract multiscale information, and depthwise and pointwise separable convolutions to reduce the number of parameters and cut computation cost without degrading performance. To compensate for the limited number of dataset images, we use data augmentation and DropBlock to provide sufficient training data and improve generalization capability. The proposed model achieves a real-time image-processing rate of 11 Hz on the NVIDIA TX2 platform with over 86.5% mean IoU.
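The two efficiency ideas named in the abstract can be sketched with simple parameter arithmetic: a depthwise-plus-pointwise separable convolution replaces the k×k×C_in×C_out parameters of a standard convolution with k×k×C_in (depthwise) plus C_in×C_out (pointwise), and a dilated convolution enlarges the effective kernel extent without adding parameters. The layer sizes below are illustrative assumptions, not values taken from the thesis.

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by
    a 1 x 1 pointwise conv that mixes channels (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

def dilated_kernel_extent(k, dilation):
    """Effective spatial extent of a k x k kernel with a given dilation rate."""
    return dilation * (k - 1) + 1

# Assumed layer sizes for illustration: 128 -> 128 channels, 3x3 kernel.
c_in, c_out, k = 128, 128, 3
std = standard_conv_params(c_in, c_out, k)   # 147456
sep = separable_conv_params(c_in, c_out, k)  # 1152 + 16384 = 17536
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")

# A 3x3 kernel with dilation 2 covers a 5x5 region at no extra parameter cost.
print(dilated_kernel_extent(3, 2))  # 5
```

For this assumed layer, the separable form needs roughly 8.4× fewer parameters, which is the kind of saving that makes a 12-layer FCN feasible on a power-limited board such as the TX2.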