Pruning-aware merging for efficient multitask inference
Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost.
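The record's abstract argues that merging the single-task networks into one multitask network eliminates redundancy across related tasks before pruning. As a toy illustration of that cost argument only (the layer sizes and the shared-backbone layout below are hypothetical, and this is not the paper's PAM scheme), counting dense-layer parameters shows how sharing a feature extractor reduces the combined cost of running both tasks:

```python
def dense_params(sizes):
    """Parameter count of a chain of dense layers (weights + biases)."""
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

# Hypothetical architecture: each task = backbone + small task-specific head.
backbone = [256, 128, 128]   # feature extractor shared by the related tasks
head_a = [128, 64, 10]       # task-A head (input = backbone output)
head_b = [128, 64, 5]        # task-B head

# Separate networks duplicate the backbone; the merged network stores it once.
separate = dense_params(backbone + head_a[1:]) + dense_params(backbone + head_b[1:])
merged = dense_params(backbone) + dense_params(head_a) + dense_params(head_b)

assert merged < separate  # merging removes one duplicated backbone
```

In this toy model the saving grows as the shared backbone dominates the task-specific heads, which is exactly the regime the abstract targets with correlated tasks.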
Main Authors: | GAO, Dawei; HE, Xiaoxi; ZHOU, Zimu; TONG, Yongxin; THIELE, Lothar |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
Subjects: | Deep learning; Network pruning; Multitask inference; Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6804 https://ink.library.smu.edu.sg/context/sis_research/article/7807/viewcontent/kdd21_he.pdf |
Institution: | Singapore Management University |
GAO, Dawei; HE, Xiaoxi; ZHOU, Zimu; TONG, Yongxin; THIELE, Lothar (2021-08-01). Pruning-aware merging for efficient multitask inference. Research Collection School Of Computing and Information Systems, Singapore Management University. DOI: 10.1145/3447548.3467271. License: http://creativecommons.org/licenses/by-nc-nd/4.0/

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost because related tasks share redundant computation. A promising remedy is to merge the networks into a single multitask network, eliminating redundancy across tasks before network pruning. However, pruning a multitask network built by existing network merging schemes cannot minimise the computation cost of every task combination, because these schemes do not take the subsequent pruning into account. To this end, we theoretically identify the conditions under which pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme that constructs a multitask network approximating these conditions. The merged network is then ready to be further pruned by existing network pruning methods.

Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87× less computation than the baseline without network merging, and up to 2.01× less computation than a baseline using a state-of-the-art network merging scheme.

Keywords: Deep learning; Network pruning; Multitask inference; Software Engineering
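The abstract states that the merged network is "ready to be further pruned by existing network pruning methods." One widely used such method (a standard structured-pruning technique, not specific to this paper) is L1-norm filter pruning, which drops the convolutional filters with the smallest absolute-weight sums. A minimal NumPy sketch, with hypothetical tensor shapes:

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Keep the filters with the largest L1 norms.

    weights: conv kernel of shape (out_channels, in_channels, kh, kw).
    Returns the pruned kernel and the sorted indices of kept filters.
    """
    out_channels = weights.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))
    norms = np.abs(weights).reshape(out_channels, -1).sum(axis=1)  # L1 per filter
    keep = np.sort(np.argsort(norms)[-n_keep:])  # strongest filters, index-ordered
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 3, 3, 3))          # hypothetical 16-filter conv layer
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
assert pruned.shape == (8, 3, 3, 3)         # half the filters survive
```

When applied to a merged multitask network, a structured criterion like this removes whole channels, so the saving carries over to every task combination that uses the pruned layer.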