Resource efficient neural networks through Hessian based pruning

Neural network pruning is a practical way to reduce the size of trained models and the number of floating-point operations (FLOPs). One approach uses the relative Hessian trace to calculate the sensitivity of each channel, in contrast to the more common magnitude pruning approach. However,...

Bibliographic Details
Main Author: Chong, Jack Huai Jie
Other Authors: Lihui Chen
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/167151
Institution: Nanyang Technological University
id sg-ntu-dr.10356-167151
record_format dspace
spelling sg-ntu-dr.10356-167151 2023-07-07T18:06:44Z Resource efficient neural networks through Hessian based pruning Chong, Jack Huai Jie Lihui Chen School of Electrical and Electronic Engineering A*STAR Institute for Infocomm Research ELHCHEN@ntu.edu.sg Engineering::Electrical and electronic engineering Bachelor of Engineering (Electrical and Electronic Engineering) 2023-05-23T12:09:05Z 2023-05-23T12:09:05Z 2023 Final Year Project (FYP) Chong, J. H. J. (2023). Resource efficient neural networks through Hessian based pruning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167151 https://hdl.handle.net/10356/167151 en B3061-221 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering
spellingShingle Engineering::Electrical and electronic engineering
Chong, Jack Huai Jie
Resource efficient neural networks through Hessian based pruning
description Neural network pruning is a practical way to reduce the size of trained models and the number of floating-point operations (FLOPs). One approach uses the relative Hessian trace to calculate the sensitivity of each channel, in contrast to the more common magnitude pruning approach. However, the stochastic approach used to estimate the Hessian trace needs many iterations to converge, which can be time-consuming for larger models with many millions of parameters. To address this problem, we modify the existing approach by estimating the Hessian trace using FP16 precision instead of FP32. We test the modified approach (EHAP) on ResNet-32/ResNet-56/WideResNet-28-8 trained on CIFAR10/CIFAR100 image classification tasks and achieve faster computation of the Hessian trace. Specifically, our modified approach achieves speedups ranging from 17% to as much as 44% in our experiments on different combinations of model architectures and GPU devices. It also uses approximately 40% less GPU memory when pruning ResNet-32 and ResNet-56 models, which allows a larger Hessian batch size to be used for estimating the Hessian trace. We also present the results of pruning using both FP16 and FP32 Hessian trace calculation and show that there is no noticeable accuracy difference between the two. Overall, it is a simple and effective way to compute the relative Hessian trace faster without sacrificing pruned model performance. Finally, we present a full pipeline that combines EHAP with quantization-aware training (QAT), using INT8 QAT to compress the network further after pruning. In particular, we use symmetric quantization for the weights and asymmetric quantization for the activations.
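
The idea of estimating a Hessian trace stochastically at two precisions can be illustrated with a small NumPy sketch. The description does not name the estimator, so this example assumes Hutchinson's method, which averages v^T (H v) over random ±1 (Rademacher) vectors v. A small random symmetric matrix stands in for a network's Hessian; in a real model, H v would come from automatic differentiation rather than an explicit matrix. The names `make_symmetric_matrix` and `hutchinson_trace` are hypothetical and are not taken from the project.

```python
import numpy as np


def make_symmetric_matrix(dim, seed=0):
    """Build a random symmetric positive semi-definite matrix (stand-in Hessian)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((dim, dim))
    return (a @ a.T) / dim


def hutchinson_trace(hessian, num_samples, dtype, seed=1):
    """Estimate tr(H) as the average of v^T (H v) over Rademacher vectors v.

    The matrix-vector product runs in `dtype` (e.g. np.float16 or np.float32).
    The scalar v^T (H v) is accumulated in float64, which is an assumption of
    this sketch and not a detail taken from the project.
    """
    h = hessian.astype(dtype)
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe vector: each entry is -1 or +1 with equal probability.
        v = rng.choice([-1.0, 1.0], size=h.shape[0]).astype(dtype)
        hv = h @ v  # "Hessian-vector product" in the chosen precision
        total += float(np.dot(v.astype(np.float64), hv.astype(np.float64)))
    return total / num_samples


hessian = make_symmetric_matrix(dim=64)
exact = float(np.trace(hessian))
# The same seed gives both precisions the same probe vectors.
est_fp32 = hutchinson_trace(hessian, num_samples=200, dtype=np.float32)
est_fp16 = hutchinson_trace(hessian, num_samples=200, dtype=np.float16)
print(f"exact={exact:.3f} fp32={est_fp32:.3f} fp16={est_fp16:.3f}")
```

Because both runs share the same probe vectors, any gap between the FP16 and FP32 estimates comes from rounding alone, not from sampling noise. This toy comparison says nothing about speed or GPU memory, which depend on the hardware and framework used.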
author2 Lihui Chen
author_facet Lihui Chen
Chong, Jack Huai Jie
format Final Year Project
author Chong, Jack Huai Jie
author_sort Chong, Jack Huai Jie
title Resource efficient neural networks through Hessian based pruning
title_short Resource efficient neural networks through Hessian based pruning
title_full Resource efficient neural networks through Hessian based pruning
title_fullStr Resource efficient neural networks through Hessian based pruning
title_full_unstemmed Resource efficient neural networks through Hessian based pruning
title_sort resource efficient neural networks through hessian based pruning
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/167151
_version_ 1772826956991037440