Evaluating the merits of ranking in structured network pruning

Bibliographic Details
Main Authors: Sharma, Kuldeep, Ramakrishnan, Nirmala, Prakash, Alok, Lam, Siew-Kei, Srikanthan, Thambipillai
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2021
Subjects:
Online Access:https://hdl.handle.net/10356/147716
Institution: Nanyang Technological University
Description
Summary: Pruning of channels in trained deep neural networks has been widely used to implement efficient DNNs that can be deployed on embedded/mobile devices. The majority of existing techniques employ criteria-based sorting of the channels to preserve salient channels during pruning, as well as to automatically determine the pruned network architecture. However, recent studies on widely used DNNs, such as VGG-16, have shown that selecting and preserving salient channels using pruning criteria is not necessary, since the plasticity of the network allows the accuracy to be recovered through fine-tuning. In this work, we further examine the value of ranking criteria in pruning and show that if channels are removed gradually and iteratively, alternating with fine-tuning on the target dataset, ranking criteria are indeed not necessary to select redundant channels. Experimental results confirm that even a random selection of channels for pruning leads to similar performance (accuracy). In addition, we demonstrate that even a simple pruning technique that uniformly removes channels from all layers in the network performs similarly to existing ranking criteria-based approaches, while leading to lower inference cost (GFLOPs). Our extensive evaluations also cover the context of embedded implementations of DNNs, specifically small networks such as SqueezeNet and aggressive pruning percentages. We leverage these insights to propose a GFLOPs-aware iterative pruning strategy that does not rely on any ranking criteria and yet further reduces inference cost by 15% without sacrificing accuracy.
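
The abstract describes the approach only at a high level, so the following is a minimal sketch of what criterion-free, GFLOPs-aware iterative pruning might look like in PyTorch. The helper functions finetune and estimate_gflops, the 5% per-step pruning fraction, and the use of mask-based random structured pruning (rather than physically rebuilding smaller layers) are all illustrative assumptions and not the authors' implementation.

    # Hypothetical sketch: iterative, criterion-free channel pruning that
    # alternates random channel removal with fine-tuning until a GFLOPs
    # budget is met. Not the paper's code; helpers and step sizes are assumed.
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def prune_random_channels(model: nn.Module, fraction: float) -> None:
        """Mask a random `fraction` of output channels in every conv layer
        (uniform across layers, no ranking criterion)."""
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                # dim=0 masks whole output channels (structured pruning).
                prune.random_structured(module, name="weight",
                                        amount=fraction, dim=0)

    def iterative_prune(model, finetune, estimate_gflops, target_gflops,
                        step=0.05):
        """Remove channels gradually, fine-tuning after each small step.
        `finetune` and `estimate_gflops` are hypothetical helpers;
        `estimate_gflops` is assumed to count only unmasked channels."""
        while estimate_gflops(model) > target_gflops:
            prune_random_channels(model, step)  # random selection, no criterion
            finetune(model)                     # plasticity recovers accuracy
        # Make the masks permanent. Physically removing the masked channels
        # (rebuilding smaller Conv2d layers) would be required to realize the
        # GFLOPs savings at inference time.
        for module in model.modules():
            if isinstance(module, nn.Conv2d) and prune.is_pruned(module):
                prune.remove(module, "weight")
        return model

Because the selection is random and uniform across layers, no saliency score is ever computed; accuracy recovery relies entirely on the fine-tuning performed after each small pruning increment, which is the behaviour the paper evaluates against ranking-based baselines.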