Deep neural network compression for pixel-level vision tasks

Deep convolutional neural networks (DCNNs) have demonstrated remarkable performance in many computer vision tasks. To achieve this, DCNNs typically require a large number of trainable parameters that are optimized to extract informative features. This often results in over-parameterized models with high computational complexity and large storage requirements, which hinder deployment on embedded devices with stringent computational and memory budgets. In this thesis, we aim to develop DCNN compression methods that produce compact models with performance comparable to the original ones. In particular, our methods must lend themselves well to DCNN models for pixel-level vision tasks (such as semantic segmentation and crowd counting). DCNN compression for pixel-level vision tasks has not been thoroughly investigated, as existing works mainly target the less challenging image-level classification task.

We first present a framework that uses knowledge distillation to recover the performance loss of DCNN models that have undergone network pruning. This departs from existing knowledge distillation approaches, in which the student and teacher models are pre-defined before knowledge transfer. Experiments on encoder-decoder models for semantic segmentation demonstrate that the proposed framework can effectively recover the performance loss of the compact student model after aggressive pruning in most cases, although in certain cases knowledge transfer does not outperform conventional fine-tuning of the pruned segmentation architectures.
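
As a concrete illustration of this prune-then-distill idea, the sketch below uses the original dense model as the teacher and its pruned copy as the student, trained with a standard soft-target distillation loss. This is a minimal sketch under assumed choices, not the thesis's implementation: the `prune_fn` hook, the temperature `T`, and the weighting `alpha` are all illustrative.

```python
# Prune-then-distill sketch (illustrative; not the thesis's actual code).
# The original dense model acts as the teacher; its pruned copy is the
# student whose performance loss we try to recover with distillation.
import copy

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from the teacher (temperature-scaled KL) plus the usual
    # hard-label cross-entropy; T and alpha are illustrative choices.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def prune_then_distill(model, prune_fn, loader, epochs=10, lr=1e-3):
    teacher = copy.deepcopy(model).eval()  # unpruned model as teacher
    student = prune_fn(model)              # assumed pruning hook -> student
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                teacher_logits = teacher(images)
            loss = distillation_loss(student(images), teacher_logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

The same losses apply per pixel for segmentation, since `F.cross_entropy` and the softmax over `dim=1` also accept (N, C, H, W) logits with (N, H, W) labels.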

Next, we propose Context-Aware Pruning (CAP), which uses channel association, capturing contextual information, to exploit parameter redundancy when pruning semantic segmentation models. We evaluated the framework on widely used benchmarks and showed its effectiveness on both large and lightweight models: it reduces the number of parameters of the state-of-the-art semantic segmentation models PSPNet-101, PSPNet-50, ICNet, and SegNet by 32%, 47%, 54%, and 63% respectively on the Cityscapes dataset, while preserving the best performance among all baseline pruning methods considered.
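
The abstract does not spell out how channel association is computed, so the sketch below assumes a simple stand-in: the absolute correlation between channel activation maps, with a channel's mean association to all others used as a redundancy score for pruning. Both the measure and the criterion are assumptions for illustration, not CAP's actual definition.

```python
# Channel-association sketch (the abstract does not define the measure;
# absolute correlation between channel activation maps is assumed here).
import torch

def channel_association(features: torch.Tensor) -> torch.Tensor:
    # features: (N, C, H, W) activations from one layer.
    n, c, h, w = features.shape
    flat = features.permute(1, 0, 2, 3).reshape(c, -1)  # (C, N*H*W)
    flat = flat - flat.mean(dim=1, keepdim=True)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    return (flat @ flat.t()).abs()  # (C, C) association matrix

def redundancy_scores(features: torch.Tensor) -> torch.Tensor:
    # A channel strongly associated with many others is treated as
    # contextually redundant, hence a pruning candidate (assumed criterion).
    assoc = channel_association(features)
    assoc.fill_diagonal_(0.0)
    return assoc.mean(dim=1)  # (C,); higher = more redundant
```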

Finally, we propose Adaptive Correlation-driven Sparsity Learning (ACSL), a DCNN compression method that delivers strong performance on both image-level and pixel-level vision tasks. ACSL extends CAP by inducing sparsity in the channel importance with an adaptive penalty strength. Experimental results demonstrate that ACSL outperforms state-of-the-art pruning methods on image-level classification, semantic segmentation, and dense crowd counting. For the crowd counting task in particular, ACSL reduces the DCNN model parameters by up to 94% while maintaining the performance of (and at times outperforming) the original model.
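
ACSL's adaptive penalty is likewise only named, not specified, in the abstract. The sketch below assumes a slimming-style setup in which BatchNorm scale factors act as channel importance, with an L1 penalty whose strength is scaled per channel by the redundancy score above; the adaptive rule shown is an illustrative assumption, not ACSL's actual formulation.

```python
# Adaptive sparsity sketch: BatchNorm scales as channel importance
# (slimming-style assumption) with a per-channel L1 penalty strength
# driven by the redundancy score; the exact ACSL rule is not in the
# abstract, so this weighting is an illustrative assumption.
import torch
import torch.nn as nn

def adaptive_sparsity_penalty(bn: nn.BatchNorm2d,
                              redundancy: torch.Tensor,
                              base_strength: float = 1e-4) -> torch.Tensor:
    # Redundant channels get a stronger push toward zero importance.
    per_channel = base_strength * (1.0 + redundancy)  # (C,) strengths
    return (per_channel * bn.weight.abs()).sum()

# Usage sketch inside a training step:
#   loss = task_loss + sum(adaptive_sparsity_penalty(bn, scores[name])
#                          for name, bn in bn_layers.items())
# Channels whose importance is driven to ~0 are then pruned.
```
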
Bibliographic Details
Main Author: He, Wei
Other Authors: Lam Siew Kei
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:https://hdl.handle.net/10356/150076
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Master of Engineering
Citation: He, W. (2021). Deep neural network compression for pixel-level vision tasks. Master's thesis, Nanyang Technological University, Singapore.
DOI: 10.32657/10356/150076
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Collection: DR-NTU (NTU Library)