Deep learning for video segmentation
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/76976 |
Institution: | Nanyang Technological University |
Summary: | In recent years, CNNs have become the standard methodology for many computer vision tasks.
Although CNNs are state of the art in image classification and object detection challenges, they
have several limitations when it comes to semantic segmentation. The feature maps that result
from training the model are usually coarse. Moreover, a typical evaluation of the
state-of-the-art DeepLab model runs at approximately seven to eight FPS, which is not
suitable for real-time applications such as self-driving cars.
This final year project seeks to evaluate the effectiveness of atrous convolutions and the atrous
spatial pyramid pooling module on CNNs for the task of semantic segmentation. Before
training the CNN architectures, the feature extractors and semantic segmentation
architectures used in the project were analysed. Next, the
DeepLabV2, DeepLabV3 and dilated MobileNetV2 architectures were trained and evaluated
on the Computer Vision and Pattern Recognition (CVPR) Workshop on Autonomous Driving
(WAD) 2018 Berkeley DeepDrive dataset. In addition, the Cityscapes dataset and a Singapore video
were used to visualize the drivable road segmentations.
The DeepLabV3 and DeepLabV2 models used in this project achieved 84.30% and 78.83%
validation mIoU respectively. These findings suggest that atrous convolution and the atrous
spatial pyramid pooling module boost mIoU substantially and may be reused in several
other image classification architectures. These upsampling methodologies were incorporated
into MobileNetV2, which then achieved 76.10% validation mIoU, and the trade-off between
accuracy and efficiency for the DeepLabV2 and MobileNetV2 architectures is
discussed. |
---|
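
The summary above centres on atrous (dilated) convolution. As a rough illustration only, and not the project's own code, the following NumPy sketch applies a 1-D atrous convolution in the cross-correlation form used by CNN layers. The function name `atrous_conv1d` and its parameters are hypothetical, chosen for this example. It shows how a larger rate widens the receptive field without adding kernel weights.

```python
import numpy as np


def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1-D atrous (dilated) convolution, cross-correlation form.

    Hypothetical helper for illustration; `rate` is the dilation rate,
    i.e. the spacing between the input samples that the kernel touches.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    # Effective receptive field: k taps spread out with (rate - 1) gaps between them.
    span = (k - 1) * rate + 1
    if len(signal) < span:
        raise ValueError("signal is shorter than the dilated kernel span")
    n_out = len(signal) - span + 1
    out = np.empty(n_out)
    for i in range(n_out):
        # Pick every `rate`-th sample inside the span, then weight by the kernel.
        taps = signal[i : i + span : rate]
        out[i] = np.dot(taps, kernel)
    return out


x = np.arange(10)
dense = atrous_conv1d(x, [1, 1, 1], rate=1)   # each output sees 3 adjacent samples
dilated = atrous_conv1d(x, [1, 1, 1], rate=2)  # each output sees samples spread over 5
```

With the same three weights, the rate-2 call covers a span of five input samples per output, which is the property that lets segmentation networks enlarge their field of view while keeping feature maps at a higher resolution. An atrous spatial pyramid pooling module, broadly, applies several such convolutions at different rates in parallel and combines the results.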