Deep neural network compression for pixel-level vision tasks

Bibliographic Details
Main Author: He, Wei
Other Authors: Lam, Siew Kei
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/150076
Institution: Nanyang Technological University
Description
Summary: Deep convolutional neural networks (DCNNs) have demonstrated remarkable performance in many computer vision tasks. To achieve this, DCNNs typically require a large number of trainable parameters that are optimized to extract informative features. This often results in over-parameterization of the DCNN models, which incurs high computational complexity and large storage requirements that hinder their deployment on embedded devices with stringent computational and memory resources. In this thesis, we aim to develop DCNN compression methods that generate compact DCNN models which still produce performance comparable to the original models. In particular, our proposed methods must lend themselves well to DCNN models for pixel-level vision tasks (such as semantic segmentation and crowd counting). DCNN compression for pixel-level vision tasks has not been thoroughly investigated, as existing works mainly target the less challenging image-level classification task.

We first present a framework that utilizes knowledge distillation to recover the performance loss of DCNN models that have undergone network pruning. This departs from existing knowledge distillation approaches, where the student and teacher models are pre-defined before knowledge transfer. Experiments on encoder-decoder models for semantic segmentation demonstrate that the proposed framework can effectively recover the performance loss of the compact student model after aggressive pruning in most cases. However, in certain cases, knowledge transfer cannot outperform the conventional fine-tuning process on the pruned semantic segmentation architectures.

Next, we propose Context-Aware Pruning (CAP), which utilizes channel association, capturing contextual information, to exploit parameter redundancy for pruning semantic segmentation models. We evaluate our framework on widely used benchmarks and show its effectiveness on both large and lightweight models. Our framework reduces the number of parameters of the state-of-the-art semantic segmentation models PSPNet-101, PSPNet-50, ICNet, and SegNet by 32%, 47%, 54%, and 63%, respectively, on the Cityscapes dataset, while preserving the best performance among all the baseline pruning methods considered.

Finally, we propose Adaptive Correlation-driven Sparsity Learning (ACSL) for DCNN compression, which provides superior performance on both image-level and pixel-level vision tasks. ACSL extends CAP by inducing sparsity in the channel importance with an adaptive penalty strength. The experimental results demonstrate that ACSL outperforms state-of-the-art pruning methods on image-level classification, semantic segmentation, and dense crowd counting tasks. In particular, for the crowd counting task, the proposed ACSL framework reduces the DCNN model parameters by up to 94% while maintaining the same performance as (and at times outperforming) the original model.
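Note: The summary describes the methods only at a high level. As an illustrative aid, the PyTorch sketch below shows the general shape of such a pruning-with-recovery pipeline: an L1 sparsity penalty on BatchNorm scaling factors as a generic stand-in for channel importance, a global threshold for selecting channels to keep, and a soft-target distillation loss for recovering accuracy in the pruned student. This is a minimal sketch under common assumptions, not the thesis's actual CAP, ACSL, or distillation implementation; all function names (importance_sparsity_penalty, select_channels_to_keep, distillation_loss, train_step) and hyperparameter values are hypothetical.

# Illustrative sketch only: generic channel-pruning pipeline with a sparsity
# penalty on channel importance and distillation-based performance recovery.
# Not the thesis's CAP/ACSL method; all names and values are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def importance_sparsity_penalty(model, strength=1e-4):
    # L1 penalty on BatchNorm scaling factors, a common proxy for channel importance.
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return strength * penalty


def select_channels_to_keep(model, keep_ratio=0.5):
    # Rank all BN scaling factors globally and return a per-layer boolean keep mask.
    scores = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(scores, 1.0 - keep_ratio)
    return {name: m.weight.detach().abs() >= threshold
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Pixel-wise soft-target KL divergence; for segmentation, logits are N x C x H x W.
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (temperature ** 2)


def train_step(student, teacher, images, labels, optimizer, alpha=0.5):
    # One optimization step combining the task loss, the distillation loss from a
    # frozen teacher, and the sparsity penalty that drives channels toward removal.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    task_loss = F.cross_entropy(student_logits, labels)  # labels: N x H x W class indices
    loss = (task_loss
            + alpha * distillation_loss(student_logits, teacher_logits)
            + importance_sparsity_penalty(student))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Using BatchNorm scaling factors as the importance signal is only a common baseline heuristic (as in network-slimming-style pruning); in the thesis, the channel-association and adaptive-penalty mechanisms of CAP and ACSL take the place of that heuristic.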