Deep learning for video segmentation

Bibliographic Details
Main Author: Tan, Clement Xian Ren
Other Authors: Lin Guosheng (School of Computer Science and Engineering)
Format: Final Year Project (FYP)
Language: English
Published: 2019
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/76976
Institution: Nanyang Technological University
Degree: Bachelor of Engineering (Computer Science)
Physical Description: 59 p. (application/pdf)

Description:
In recent years, convolutional neural networks (CNNs) have become the dominant methodology for many computer vision tasks. Although CNNs are state of the art in image classification and object detection, they face several limitations in semantic segmentation: the feature maps produced by a trained model are usually coarse, and a typical forward pass of the state-of-the-art DeepLab model runs at approximately seven to eight frames per second (FPS), which is unsuitable for real-time applications such as self-driving cars. This final year project evaluates the effectiveness of atrous convolutions and the atrous spatial pyramid pooling (ASPP) module in CNNs for semantic segmentation. Before the CNN architectures were trained, the feature extractors and semantic segmentation architectures used in the project were analyzed. The DeepLabV2, DeepLabV3 and dilated MobileNetV2 architectures were then trained and evaluated on the Berkeley DeepDrive dataset from the 2018 Computer Vision and Pattern Recognition (CVPR) Workshop on Autonomous Driving (WAD). In addition, Cityscapes images and a Singapore driving video were used to visualize the drivable-road segmentations. The DeepLabV3 and DeepLabV2 models achieved 84.30% and 78.83% validation mIOU respectively, suggesting that atrous convolution and the ASPP module boost mIOU substantially and may be reused in other image classification architectures. These modules were also incorporated into MobileNetV2, which achieved 76.10% validation mIOU, and the trade-off between accuracy and efficiency for the DeepLabV2 and MobileNetV2 architectures is discussed.
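
The key operation evaluated in this project, atrous (dilated) convolution, inserts gaps between the kernel taps so that the receptive field grows without downsampling the feature map. The following is a minimal sketch, assuming PyTorch as the framework (the abstract does not name one), of how a dilated 3x3 convolution enlarges the receptive field while preserving spatial resolution:

```python
# Minimal sketch (assumes PyTorch): atrous vs. standard convolution.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 65, 65)  # dummy image batch

# Standard 3x3 convolution: 3x3 receptive field.
standard = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)

# Atrous 3x3 convolution with rate 2: 5x5 receptive field at the same cost.
# For a 3x3 kernel, setting padding = dilation keeps H and W unchanged.
atrous = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(standard(x).shape)  # torch.Size([1, 8, 65, 65])
print(atrous(x).shape)    # torch.Size([1, 8, 65, 65])
```

Because spatial resolution is preserved, the segmentation head can emit denser predictions than a stride-based backbone allows, which is the property the project credits for the mIOU gains.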
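The ASPP module applies several such convolutions in parallel at different rates to capture multi-scale context. Below is a simplified sketch in the spirit of DeepLabV3; the rates, channel sizes, and the omission of batch normalization and activation layers are illustrative assumptions, not the configuration used in the thesis:

```python
# Simplified ASPP sketch (assumes PyTorch); not the thesis configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus one 3x3 atrous branch per rate.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
               for r in rates]
        )
        # Image-level pooling branch supplies global context.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        # 1x1 projection fuses the concatenated branch outputs.
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1,
                                 bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Upsample the pooled branch back to the feature-map size.
        feats.append(F.interpolate(self.pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))

# Example: fuse multi-scale context on a backbone feature map.
aspp = ASPP(in_ch=2048)
y = aspp(torch.randn(1, 2048, 33, 33))
print(y.shape)  # torch.Size([1, 256, 33, 33])
```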