Resource efficient neural networks through Hessian based pruning
Neural network pruning is a practical way to reduce the size of trained models and the number of floating-point operations (FLOPs). One approach is to use the relative Hessian trace to calculate the sensitivity of each channel, as opposed to the more common magnitude-based pruning. However, the stochastic method used to estimate the Hessian trace must run for many iterations before it converges, which can be time-consuming for larger models with many millions of parameters. To address this problem, we modify the existing approach by estimating the Hessian trace in FP16 precision instead of FP32. We test the modified approach (EHAP) on ResNet-32/ResNet-56/WideResNet-28-8 trained on CIFAR10/CIFAR100 image classification tasks and achieve faster computation of the Hessian trace. Specifically, our modified approach achieves speed-ups ranging from 17% to as much as 44% across different combinations of model architectures and GPU devices. It also uses ∼40% less GPU memory when pruning ResNet-32 and ResNet-56 models, which allows a larger Hessian batch size to be used for estimating the Hessian trace. We also present pruning results using both FP16 and FP32 Hessian trace calculation and show that there is no noticeable accuracy difference between the two. Overall, it is a simple and effective way to compute the relative Hessian trace faster without sacrificing pruned-model performance. Finally, we present a full pipeline combining EHAP with quantization-aware training (QAT), using INT8 QAT to compress the network further after pruning. In particular, we use symmetric quantization for the weights and asymmetric quantization for the activations.
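The abstract does not name the stochastic estimator, but the standard choice behind Hessian-aware pruning is Hutchinson's estimator: Tr(H) ≈ (1/m) Σᵢ vᵢᵀHvᵢ with random Rademacher vectors vᵢ, which converges as more samples are averaged. A minimal NumPy sketch, using an explicit toy matrix in place of the backprop-based Hessian-vector products a real pruning pipeline would use (all names here are illustrative, not from the thesis), including the lower-precision variant the work proposes:

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_iters=1000, dtype=np.float32, seed=0):
    """Estimate Tr(H) as the average of v^T H v over Rademacher vectors v.

    hvp: any linear map computing the Hessian-vector product H @ v;
    in a pruning pipeline this would come from double backprop.
    """
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_iters):
        v = rng.choice([-1.0, 1.0], size=dim).astype(dtype)  # Rademacher sample
        est += float(v @ hvp(v))  # accumulate v^T H v in float64
    return est / n_iters

# Toy symmetric PSD "Hessian" with a directly computable trace.
rng = np.random.default_rng(42)
A = rng.standard_normal((50, 50))
H = A @ A.T
exact = float(np.trace(H))

# FP32 estimate, and the cheaper FP16 variant analogous to the thesis idea.
fp32 = hutchinson_trace(lambda v: H.astype(np.float32) @ v, 50, dtype=np.float32)
fp16 = hutchinson_trace(lambda v: H.astype(np.float16) @ v, 50, dtype=np.float16)
```

Both estimates land within a few percent of the exact trace on this toy problem; the point, as in the thesis, is that the half-precision variant trades negligible estimator accuracy for lower memory and faster arithmetic.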
Main Author: Chong, Jack Huai Jie
Other Authors: Lihui Chen
Organisations: School of Electrical and Electronic Engineering; A*STAR Institute for Infocomm Research
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/167151
Institution: Nanyang Technological University
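The quantization scheme described in the abstract (symmetric INT8 for weights, asymmetric INT8 for activations) follows the standard affine-quantization formulas: symmetric quantization fixes the zero-point at 0, which suits roughly zero-centred weights, while asymmetric quantization adds a zero-point offset so one-sided post-ReLU activations can use all 256 levels. A small NumPy sketch of these formulas (function names are illustrative, not from the thesis):

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric quantization: zero-point fixed at 0, signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for INT8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_asymmetric(x, num_bits=8):
    """Asymmetric quantization: zero-point shifts the unsigned range."""
    qmin, qmax = 0, 2 ** num_bits - 1                   # 0..255 for INT8
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map integer codes back to real values: (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Toy data: zero-centred "weights" and one-sided post-ReLU "activations".
w = np.random.default_rng(0).normal(0.0, 0.1, size=1000)
x = np.maximum(np.random.default_rng(1).normal(size=1000), 0.0)
qw, w_scale = quantize_symmetric(w)
qx, x_scale, x_zp = quantize_asymmetric(x)
```

For round-to-nearest quantization, the reconstruction error of any unclipped value is at most half the scale, which is why QAT can recover accuracy by training through this rounding.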