Real-time semantic image segmentation via light-weight neural networks

Bibliographic Details
Main Author: Xu, Jing
Other Authors: Jiang Xudong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access: https://hdl.handle.net/10356/140054
Institution: Nanyang Technological University
Description
Summary: Semantic image segmentation aims to produce a high-level classification of image regions, in which each pixel is associated with a class label from a predefined set. However, not all semantic segmentation architectures are suitable for real-time applications, because of their high computational cost and large number of parameters. We therefore consider the task of adapting an efficient and powerful semantic image segmentation architecture, Light Weight RefineNet, to further reduce its number of parameters while keeping competitive performance. In particular, we explore the use of the MobileNet family as the encoder backbone, namely MobileNet-v2 and its upgraded version MobileNet-v3. We try to improve their performance by adding a dilated convolution layer to the decoder, and we address the exploding gradient problem by applying gradient clipping and Residual with Zero initialisation (ReZero). The adapted architecture with MobileNet-v2 as encoder reduces the parameters by 15.27M, which is half of the parameters of the original RefineNet-LW-50, yet keeps similar accuracy. With MobileNet-v3 as encoder, performance is less competitive than with MobileNet-v2, but the architecture achieves a further parameter reduction to 8.32M and a speed of 77 FPS on 625 × 468 inputs.
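The two stabilisation techniques named in the summary can be illustrated with a minimal sketch. This is not the project's code: it uses plain Python lists in place of tensors, and the function names `clip_by_global_norm` and `rezero_residual`, the `max_norm` threshold, and the toy `branch` function are all assumptions made for illustration. Gradient clipping rescales the gradient when its L2 norm exceeds a threshold, and a ReZero residual computes x + alpha * F(x) with the scalar alpha initialised to zero, so that each block starts out as the identity.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        # Already within the threshold: leave the gradients unchanged.
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


def rezero_residual(x, branch, alpha=0.0):
    """ReZero-style residual connection: x + alpha * branch(x).

    alpha is a scalar that would be learned in practice; it starts at 0,
    so the block initially passes x through unchanged.
    """
    fx = branch(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


if __name__ == "__main__":
    # Hypothetical gradient with norm 5, clipped to norm 1.
    clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print(norm_before, clipped)

    # Hypothetical branch that scales its input by 10.
    branch = lambda v: [10.0 * t for t in v]
    print(rezero_residual([1.0, 2.0], branch))             # alpha = 0: identity
    print(rezero_residual([1.0, 2.0], branch, alpha=0.5))  # partially "opened" block
```

In a deep learning framework the same ideas would apply to tensors, with alpha registered as a trainable parameter; the sketch only shows the arithmetic.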