Deep neural network compression for pixel-level vision tasks

Bibliographic Details
Main Author: He, Wei
Other Authors: Lam, Siew Kei
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/150076
Institution: Nanyang Technological University
Description
Summary: Deep convolutional neural networks (DCNNs) have demonstrated remarkable performance in many computer vision tasks. To achieve this, DCNNs typically require a large number of trainable parameters that are optimized to extract informative features. This often results in over-parameterization of the DCNN models, which incurs high computational complexity and large storage requirements that hinder their deployment on embedded devices with stringent computational and memory resources. In this thesis, we aim to develop DCNN compression methods that generate compact DCNN models which still produce performance comparable to the original models. In particular, our proposed methods must lend themselves well to DCNN models for pixel-level vision tasks (such as semantic segmentation and crowd counting). DCNN compression for pixel-level vision tasks has not been thoroughly investigated, as existing works mainly target the less challenging image-level classification task.

We first present a framework that utilizes knowledge distillation to recover the performance loss of DCNN models that have undergone network pruning. This departs from existing knowledge distillation approaches, where the student and teacher models are pre-defined before knowledge transfer. Experiments on encoder-decoder models for semantic segmentation demonstrate that the proposed framework can effectively recover the performance loss of the compact student model after aggressive pruning in most cases. However, in certain cases, knowledge transfer cannot outperform the conventional fine-tuning process on the pruned semantic segmentation architectures.

Next, we propose Context-Aware Pruning (CAP), which utilizes channel association, capturing contextual information, to exploit parameter redundancy for pruning semantic segmentation models. We evaluate our framework on widely used benchmarks and show its effectiveness on both large and lightweight models. Our framework reduces the number of parameters of the state-of-the-art semantic segmentation models PSPNet-101, PSPNet-50, ICNet, and SegNet by 32%, 47%, 54%, and 63%, respectively, on the Cityscapes dataset, while preserving the best performance among all the baseline pruning methods considered.

Finally, we propose Adaptive Correlation-driven Sparsity Learning (ACSL) for DCNN compression, which provides superior performance on both image-level and pixel-level vision tasks. ACSL extends CAP by inducing sparsity in the channel importance with an adaptive penalty strength. The experimental results demonstrate that ACSL outperforms state-of-the-art pruning methods on image-level classification, semantic segmentation, and dense crowd counting tasks. In particular, for the crowd counting task, the proposed ACSL framework reduces the DCNN model parameters by up to 94% while maintaining the same performance as (and at times outperforming) the original model.
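Note: The summary describes the methods only at a high level. As an illustrative aid, the PyTorch sketch below shows the general shape of such a pruning-with-recovery pipeline: an L1 sparsity penalty on BatchNorm scaling factors as a generic stand-in for channel importance, a global threshold for selecting channels to keep, and a soft-target distillation loss for recovering accuracy in the pruned student. This is a minimal sketch under common assumptions, not the thesis's actual CAP, ACSL, or distillation implementation; all function names (importance_sparsity_penalty, select_channels_to_keep, distillation_loss, train_step) and hyperparameter values are hypothetical.

# Illustrative sketch only: generic channel-pruning pipeline with a sparsity
# penalty on channel importance and distillation-based performance recovery.
# Not the thesis's CAP/ACSL method; all names and values are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def importance_sparsity_penalty(model, strength=1e-4):
    # L1 penalty on BatchNorm scaling factors, a common proxy for channel importance.
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return strength * penalty


def select_channels_to_keep(model, keep_ratio=0.5):
    # Rank all BN scaling factors globally and return a per-layer boolean keep mask.
    scores = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(scores, 1.0 - keep_ratio)
    return {name: m.weight.detach().abs() >= threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Pixel-wise soft-target KL divergence; for segmentation, logits are N x C x H x W.
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)


def train_step(student, teacher, images, labels, optimizer, alpha=0.5):
    # One optimization step combining the task loss, the distillation loss from a
    # frozen teacher, and the sparsity penalty that drives channels toward removal.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    task_loss = F.cross_entropy(student_logits, labels)  # labels: N x H x W class indices
    loss = (task_loss
            + alpha * distillation_loss(student_logits, teacher_logits)
            + importance_sparsity_penalty(student))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Using BatchNorm scaling factors as the importance signal is only a common baseline heuristic (as in network-slimming-style pruning); in the thesis, the channel-association and adaptive-penalty mechanisms of CAP and ACSL take the place of that heuristic.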