Deep learning for video segmentation

Bibliographic Details
Main Author: Tan, Clement Xian Ren
Other Authors: Lin Guosheng (School of Computer Science and Engineering)
Format: Final Year Project (FYP)
Language: English
Published: 2019
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/76976
Institution: Nanyang Technological University
Degree: Bachelor of Engineering (Computer Science)
Physical Description: 59 p. (application/pdf)

Description:
In recent years, convolutional neural networks (CNNs) have become the dominant methodology for many computer vision tasks. Although CNNs are state of the art in image classification and object detection, they face several limitations in semantic segmentation: the feature maps produced by a trained model are usually coarse, and a typical forward pass of the state-of-the-art DeepLab model runs at approximately seven to eight frames per second (FPS), which is unsuitable for real-time applications such as self-driving cars. This final year project evaluates the effectiveness of atrous convolutions and the atrous spatial pyramid pooling (ASPP) module in CNNs for semantic segmentation. Before the CNN architectures were trained, the feature extractors and semantic segmentation architectures used in the project were analyzed. The DeepLabV2, DeepLabV3 and dilated MobileNetV2 architectures were then trained and evaluated on the Berkeley DeepDrive dataset from the 2018 Computer Vision and Pattern Recognition (CVPR) Workshop on Autonomous Driving (WAD). In addition, Cityscapes images and a Singapore driving video were used to visualize the drivable-road segmentations. The DeepLabV3 and DeepLabV2 models achieved 84.30% and 78.83% validation mIOU respectively, suggesting that atrous convolution and the ASPP module boost mIOU substantially and may be reused in other image classification architectures. These modules were also incorporated into MobileNetV2, which achieved 76.10% validation mIOU, and the trade-off between accuracy and efficiency for the DeepLabV2 and MobileNetV2 architectures is discussed.
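
The key operation evaluated in this project, atrous (dilated) convolution, inserts gaps between the kernel taps so that the receptive field grows without downsampling the feature map. The following is a minimal sketch, assuming PyTorch as the framework (the abstract does not name one), of how a dilated 3x3 convolution enlarges the receptive field while preserving spatial resolution:

```python
# Minimal sketch (assumes PyTorch): atrous vs. standard convolution.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 65, 65)  # dummy image batch

# Standard 3x3 convolution: 3x3 receptive field.
standard = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)

# Atrous 3x3 convolution with rate 2: 5x5 receptive field at the same cost.
# For a 3x3 kernel, setting padding = dilation keeps H and W unchanged.
atrous = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)

print(standard(x).shape)  # torch.Size([1, 8, 65, 65])
print(atrous(x).shape)    # torch.Size([1, 8, 65, 65])
```

Because spatial resolution is preserved, the segmentation head can emit denser predictions than a stride-based backbone allows, which is the property the project credits for the mIOU gains.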
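The ASPP module applies several such convolutions in parallel at different rates to capture multi-scale context. Below is a simplified sketch in the spirit of DeepLabV3; the rates, channel sizes, and the omission of batch normalization and activation layers are illustrative assumptions, not the configuration used in the thesis:

```python
# Simplified ASPP sketch (assumes PyTorch); not the thesis configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus one 3x3 atrous branch per rate.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)]
            + [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
               for r in rates]
        )
        # Image-level pooling branch supplies global context.
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
        )
        # 1x1 projection fuses the concatenated branch outputs.
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1,
                                 bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Upsample the pooled branch back to the feature-map size.
        feats.append(F.interpolate(self.pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))

# Example: fuse multi-scale context on a backbone feature map.
aspp = ASPP(in_ch=2048)
y = aspp(torch.randn(1, 2048, 33, 33))
print(y.shape)  # torch.Size([1, 256, 33, 33])
```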