Resource efficient neural networks through Hessian based pruning

Neural network pruning is a practical way to reduce the size of trained models and the number of floating-point operations (FLOPs). One way of pruning is to use the relative Hessian trace to calculate the sensitivity of each channel, as compared to the more common magnitude pruning approach. However,...

Bibliographic Details
Main Author:	Chong, Jack Huai Jie
Other Authors:	Lihui Chen
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/167151
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-167151
record_format	dspace
spelling	sg-ntu-dr.10356-167151 2023-07-07T18:06:44Z Resource efficient neural networks through Hessian based pruning Chong, Jack Huai Jie Lihui Chen School of Electrical and Electronic Engineering A*STAR Institute for Infocomm Research ELHCHEN@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2023-05-23T12:09:05Z 2023-05-23T12:09:05Z 2023 Final Year Project (FYP) Chong, J. H. J. (2023). Resource efficient neural networks through Hessian based pruning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167151 https://hdl.handle.net/10356/167151 en B3061-221 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering
spellingShingle	Engineering::Electrical and electronic engineering Chong, Jack Huai Jie Resource efficient neural networks through Hessian based pruning
description	Neural network pruning is a practical way to reduce the size of trained models and the number of floating-point operations (FLOPs). One way of pruning is to use the relative Hessian trace to calculate the sensitivity of each channel, as compared to the more common magnitude pruning approach. However, the stochastic approach used to estimate the Hessian trace needs many iterations before it converges. This can be time-consuming for larger models with many millions of parameters. To address this problem, we modify the existing approach by estimating the Hessian trace using FP16 precision instead of FP32. We test the modified approach (EHAP) on ResNet-32/ResNet-56/WideResNet-28-8 trained on CIFAR10/CIFAR100 image classification tasks and achieve faster computation of the Hessian trace. Specifically, our modified approach achieves speedups ranging from 17% to as much as 44% in our experiments on different combinations of model architectures and GPU devices. Our modified approach also takes up ∼40% less GPU memory when pruning ResNet-32 and ResNet-56 models, which allows a larger Hessian batch size to be used for estimating the Hessian trace. We also present the results of pruning using both FP16 and FP32 Hessian trace calculation and show that there are no noticeable accuracy differences between the two. Overall, it is a simple and effective way to compute the relative Hessian trace faster without sacrificing pruned model performance. We also present a full pipeline combining EHAP and quantization aware training (QAT), using INT8 QAT to compress the network further after pruning. In particular, we use symmetric quantization for the weights and asymmetric quantization for the activations.
author2	Lihui Chen
author_facet	Lihui Chen Chong, Jack Huai Jie
format	Final Year Project
author	Chong, Jack Huai Jie
author_sort	Chong, Jack Huai Jie
title	Resource efficient neural networks through Hessian based pruning
title_short	Resource efficient neural networks through Hessian based pruning
title_full	Resource efficient neural networks through Hessian based pruning
title_fullStr	Resource efficient neural networks through Hessian based pruning
title_full_unstemmed	Resource efficient neural networks through Hessian based pruning
title_sort	resource efficient neural networks through hessian based pruning
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/167151
_version_	1772826956991037440
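
The description above refers to a stochastic estimate of the Hessian trace computed in FP16 instead of FP32. The following is a minimal sketch of that idea, not the author's implementation. It assumes Hutchinson's estimator with Rademacher probe vectors (the record does not name the estimator) and uses a small explicit symmetric matrix as a stand-in for a network's Hessian. The names `hutchinson_trace`, `H`, `est_fp32`, and `est_fp16`, and all sizes and sample counts, are hypothetical.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_samples, rng, dtype=np.float32):
    """Estimate trace(H) as the mean of v^T (H v) over random +/-1 vectors v.

    `matvec` is any callable returning H @ v; in a real network it would be
    a Hessian-vector product obtained by automatic differentiation.
    """
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        v = rng.choice([-1.0, 1.0], size=dim).astype(dtype)
        hv = matvec(v)
        # Accumulate the scalar v^T H v in float64 so that only the
        # matrix-vector product runs at the reduced precision.
        total += float(np.dot(v.astype(np.float64), hv.astype(np.float64)))
    return total / num_samples


# Toy symmetric positive semi-definite matrix standing in for a Hessian.
dim = 50
A = np.random.default_rng(0).standard_normal((dim, dim))
H = (A @ A.T) / dim
true_trace = float(np.trace(H))

H32 = H.astype(np.float32)
H16 = H.astype(np.float16)

# The same seed gives both runs the same probe vectors, so any difference
# between the two estimates comes from the precision of H @ v alone.
est_fp32 = hutchinson_trace(lambda v: H32 @ v, dim, 2000,
                            np.random.default_rng(1), dtype=np.float32)
est_fp16 = hutchinson_trace(lambda v: H16 @ v, dim, 2000,
                            np.random.default_rng(1), dtype=np.float16)

print(f"exact trace   : {true_trace:.3f}")
print(f"FP32 estimate : {est_fp32:.3f}")
print(f"FP16 estimate : {est_fp16:.3f}")
```

On this toy problem the two estimates are expected to agree closely, which is in the spirit of the description's claim that FP16 trace estimation does not noticeably change pruning results. The speed and memory savings reported in the description come from GPU half-precision arithmetic and are not reproduced by this CPU sketch.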

Similar Items