Pruning-aware merging for efficient multitask inference
Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost.
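The record's abstract argues that merging the single-task networks into one multitask network eliminates redundancy across related tasks before pruning. As a toy illustration of that cost argument only (the layer sizes and the shared-backbone layout below are hypothetical, and this is not the paper's PAM scheme), counting dense-layer parameters shows how sharing a feature extractor reduces the combined cost of running both tasks:

```python
def dense_params(sizes):
    """Parameter count of a chain of dense layers (weights + biases)."""
    return sum(sizes[i] * sizes[i + 1] + sizes[i + 1] for i in range(len(sizes) - 1))

# Hypothetical architecture: each task = backbone + small task-specific head.
backbone = [256, 128, 128]   # feature extractor shared by the related tasks
head_a = [128, 64, 10]       # task-A head (input = backbone output)
head_b = [128, 64, 5]        # task-B head

# Separate networks duplicate the backbone; the merged network stores it once.
separate = dense_params(backbone + head_a[1:]) + dense_params(backbone + head_b[1:])
merged = dense_params(backbone) + dense_params(head_a) + dense_params(head_b)

assert merged < separate  # merging removes one duplicated backbone
```

In this toy model the saving grows as the shared backbone dominates the task-specific heads, which is exactly the regime the abstract targets with correlated tasks.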
Main Authors: | GAO, Dawei; HE, Xiaoxi; ZHOU, Zimu; TONG, Yongxin; THIELE, Lothar |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2021 |
Subjects: | Deep learning; Network pruning; Multitask inference; Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6804 https://ink.library.smu.edu.sg/context/sis_research/article/7807/viewcontent/kdd21_he.pdf |
Institution: | Singapore Management University |
GAO, Dawei; HE, Xiaoxi; ZHOU, Zimu; TONG, Yongxin; THIELE, Lothar (2021-08-01). Pruning-aware merging for efficient multitask inference. Research Collection School Of Computing and Information Systems, Singapore Management University. DOI: 10.1145/3447548.3467271. License: http://creativecommons.org/licenses/by-nc-nd/4.0/

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost because related tasks share redundant computation. A promising remedy is to merge the networks into a single multitask network, eliminating redundancy across tasks before network pruning. However, pruning a multitask network built by existing network merging schemes cannot minimise the computation cost of every task combination, because these schemes do not take the subsequent pruning into account. To this end, we theoretically identify the conditions under which pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme that constructs a multitask network approximating these conditions. The merged network is then ready to be further pruned by existing network pruning methods.

Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87× less computation than the baseline without network merging, and up to 2.01× less computation than a baseline using a state-of-the-art network merging scheme.

Keywords: Deep learning; Network pruning; Multitask inference; Software Engineering
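The abstract states that the merged network is "ready to be further pruned by existing network pruning methods." One widely used such method (a standard structured-pruning technique, not specific to this paper) is L1-norm filter pruning, which drops the convolutional filters with the smallest absolute-weight sums. A minimal NumPy sketch, with hypothetical tensor shapes:

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Keep the filters with the largest L1 norms.

    weights: conv kernel of shape (out_channels, in_channels, kh, kw).
    Returns the pruned kernel and the sorted indices of kept filters.
    """
    out_channels = weights.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))
    norms = np.abs(weights).reshape(out_channels, -1).sum(axis=1)  # L1 per filter
    keep = np.sort(np.argsort(norms)[-n_keep:])  # strongest filters, index-ordered
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 3, 3, 3))          # hypothetical 16-filter conv layer
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
assert pruned.shape == (8, 3, 3, 3)         # half the filters survive
```

When applied to a merged multitask network, a structured criterion like this removes whole channels, so the saving carries over to every task combination that uses the pruned layer.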