Deep neural network compression for pixel-level vision tasks

Deep convolutional neural networks (DCNNs) have demonstrated remarkable performance in many computer vision tasks. To achieve this, DCNNs typically require a large number of trainable parameters that are optimized to extract informative features. This often results in over-parameterized models with high computational complexity and large storage requirements, which hinder deployment on embedded devices with stringent computational and memory budgets. In this thesis, we aim to develop DCNN compression methods that produce compact models with performance comparable to the original ones. In particular, our methods must lend themselves well to DCNN models for pixel-level vision tasks (such as semantic segmentation and crowd counting). DCNN compression for pixel-level vision tasks has not been thoroughly investigated, as existing works mainly target the less challenging image-level classification task.

We first present a framework that uses knowledge distillation to recover the performance loss of DCNN models that have undergone network pruning. This departs from existing knowledge distillation approaches, in which the student and teacher models are pre-defined before knowledge transfer. Experiments on encoder-decoder models for semantic segmentation demonstrate that the proposed framework can effectively recover the performance loss of the compact student model after aggressive pruning in most cases, although in certain cases knowledge transfer does not outperform conventional fine-tuning of the pruned segmentation architectures.
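
As a concrete illustration of this prune-then-distill idea, the sketch below uses the original dense model as the teacher and its pruned copy as the student, trained with a standard soft-target distillation loss. This is a minimal sketch under assumed choices, not the thesis's implementation: the `prune_fn` hook, the temperature `T`, and the weighting `alpha` are all illustrative.

```python
# Prune-then-distill sketch (illustrative; not the thesis's actual code).
# The original dense model acts as the teacher; its pruned copy is the
# student whose performance loss we try to recover with distillation.
import copy

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from the teacher (temperature-scaled KL) plus the usual
    # hard-label cross-entropy; T and alpha are illustrative choices.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def prune_then_distill(model, prune_fn, loader, epochs=10, lr=1e-3):
    teacher = copy.deepcopy(model).eval()  # unpruned model as teacher
    student = prune_fn(model)              # assumed pruning hook -> student
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                teacher_logits = teacher(images)
            loss = distillation_loss(student(images), teacher_logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

The same losses apply per pixel for segmentation, since `F.cross_entropy` and the softmax over `dim=1` also accept (N, C, H, W) logits with (N, H, W) labels.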

Next, we propose Context-Aware Pruning (CAP), which uses channel association, capturing contextual information, to exploit parameter redundancy when pruning semantic segmentation models. We evaluated the framework on widely used benchmarks and showed its effectiveness on both large and lightweight models: it reduces the number of parameters of the state-of-the-art semantic segmentation models PSPNet-101, PSPNet-50, ICNet, and SegNet by 32%, 47%, 54%, and 63% respectively on the Cityscapes dataset, while preserving the best performance among all baseline pruning methods considered.
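
The abstract does not spell out how channel association is computed, so the sketch below assumes a simple stand-in: the absolute correlation between channel activation maps, with a channel's mean association to all others used as a redundancy score for pruning. Both the measure and the criterion are assumptions for illustration, not CAP's actual definition.

```python
# Channel-association sketch (the abstract does not define the measure;
# absolute correlation between channel activation maps is assumed here).
import torch

def channel_association(features: torch.Tensor) -> torch.Tensor:
    # features: (N, C, H, W) activations from one layer.
    n, c, h, w = features.shape
    flat = features.permute(1, 0, 2, 3).reshape(c, -1)  # (C, N*H*W)
    flat = flat - flat.mean(dim=1, keepdim=True)
    flat = flat / (flat.norm(dim=1, keepdim=True) + 1e-8)
    return (flat @ flat.t()).abs()  # (C, C) association matrix

def redundancy_scores(features: torch.Tensor) -> torch.Tensor:
    # A channel strongly associated with many others is treated as
    # contextually redundant, hence a pruning candidate (assumed criterion).
    assoc = channel_association(features)
    assoc.fill_diagonal_(0.0)
    return assoc.mean(dim=1)  # (C,); higher = more redundant
```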

Finally, we propose Adaptive Correlation-driven Sparsity Learning (ACSL), a DCNN compression method that delivers strong performance on both image-level and pixel-level vision tasks. ACSL extends CAP by inducing sparsity in the channel importance with an adaptive penalty strength. Experimental results demonstrate that ACSL outperforms state-of-the-art pruning methods on image-level classification, semantic segmentation, and dense crowd counting. For the crowd counting task in particular, ACSL reduces the DCNN model parameters by up to 94% while maintaining the performance of (and at times outperforming) the original model.
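
ACSL's adaptive penalty is likewise only named, not specified, in the abstract. The sketch below assumes a slimming-style setup in which BatchNorm scale factors act as channel importance, with an L1 penalty whose strength is scaled per channel by the redundancy score above; the adaptive rule shown is an illustrative assumption, not ACSL's actual formulation.

```python
# Adaptive sparsity sketch: BatchNorm scales as channel importance
# (slimming-style assumption) with a per-channel L1 penalty strength
# driven by the redundancy score; the exact ACSL rule is not in the
# abstract, so this weighting is an illustrative assumption.
import torch
import torch.nn as nn

def adaptive_sparsity_penalty(bn: nn.BatchNorm2d,
                              redundancy: torch.Tensor,
                              base_strength: float = 1e-4) -> torch.Tensor:
    # Redundant channels get a stronger push toward zero importance.
    per_channel = base_strength * (1.0 + redundancy)  # (C,) strengths
    return (per_channel * bn.weight.abs()).sum()

# Usage sketch inside a training step:
#   loss = task_loss + sum(adaptive_sparsity_penalty(bn, scores[name])
#                          for name, bn in bn_layers.items())
# Channels whose importance is driven to ~0 are then pruned.
```
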
Bibliographic Details
Main Author: He, Wei
Other Authors: Lam Siew Kei
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University, 2021
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:https://hdl.handle.net/10356/150076
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Master of Engineering
Citation: He, W. (2021). Deep neural network compression for pixel-level vision tasks. Master's thesis, Nanyang Technological University, Singapore.
DOI: 10.32657/10356/150076
Rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Collection: DR-NTU (NTU Library)