Rethinking pruning for accelerating deep inference at the edge

There is a growing trend to deploy deep neural networks at the edge for high-accuracy, real-time data mining and user interaction. Applications such as speech recognition and language understanding often apply a deep neural network to encode an input sequence and then use a decoder to generate the output sequence. A promising technique to accelerate these applications on resource-constrained devices is network pruning, which compresses the size of the deep neural network without a severe drop in inference accuracy. However, we observe that although existing network pruning algorithms are effective at speeding up the preceding deep neural network, they lead to a dramatic slowdown of the subsequent decoding and may not always reduce the overall latency of the entire application. To rectify these drawbacks, we propose entropy-based pruning, a new regularizer that can be seamlessly integrated into existing network pruning algorithms. Our key theoretical insight is that reducing the information entropy of the deep neural network's outputs decreases the upper bound on the subsequent decoding search space. We validate our solution with two state-of-the-art network pruning algorithms on two model architectures. Experimental results show that, compared with existing network pruning algorithms, our entropy-based pruning method notably suppresses and even eliminates the increase in decoding time, and achieves shorter overall latency with only negligible extra accuracy loss in the applications.
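
The abstract's key mechanism, an entropy regularizer on the network's output distribution layered onto an existing pruning objective, can be illustrated with a short sketch. The PyTorch code below is a hypothetical illustration, not the authors' implementation: the function name, the plain L1 sparsity term standing in for the host pruning algorithm's regularizer, and the weighting coefficients are all assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_pruning_loss(logits, targets, prunable_weights,
                                     lambda_ent=0.1, lambda_sparse=1e-4):
    """Hypothetical sketch: task loss + output-entropy penalty + a generic
    sparsity term standing in for an existing pruning regularizer."""
    # Standard task loss, e.g. framewise cross-entropy for sequence labelling.
    task_loss = F.cross_entropy(logits, targets)

    # Shannon entropy of the network's output distribution. Driving this down
    # concentrates probability mass, which (per the paper's insight) tightens
    # the upper bound on the decoder's search space.
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1).mean()

    # Placeholder for whatever sparsity penalty the host pruning algorithm
    # uses; an L1 penalty on the prunable weights is assumed here.
    sparsity = sum(w.abs().sum() for w in prunable_weights)

    return task_loss + lambda_ent * entropy + lambda_sparse * sparsity
```

In this reading, the entropy term is simply added to whichever objective the host pruning algorithm already optimizes, which is consistent with the abstract's claim that the regularizer "can be seamlessly integrated into existing network pruning algorithms."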

Bibliographic Details
Main Authors: GAO, Dawei, HE, Xiaoxi, ZHOU, Zimu, TONG, Yongxin, XU, Ke, THIELE, Lothar
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
Subjects: Deep Learning; Sequence Labelling; Network Pruning; Automatic Speech Recognition; Named Entity Recognition; Databases and Information Systems; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/5292
https://ink.library.smu.edu.sg/context/sis_research/article/6295/viewcontent/3394486.3403058.pdf
DOI: 10.1145/3394486.3403058
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)