Pruning-aware merging for efficient multitask inference

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost due to task relatedness. A promising remedy is to merge the networks into a multitask network to eliminate redundancy across tasks before network pruning. However, pruning a multitask network combined by existing network merging schemes cannot minimise the computation cost of every task combination because they do not consider such a future pruning. To this end, we theoretically identify the conditions such that pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme to construct a multitask network that approximates these conditions. The merged network is then ready to be further pruned by existing network pruning methods. Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87× less computation against the baseline without network merging, and up to 2.01× less computation against the baseline with a state-of-the-art network merging scheme.
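As a toy illustration of the redundancy the abstract describes (this is not code from the paper, and the layer costs are hypothetical): when two tasks share a trunk of layers, a merged network executes the shared trunk once per task combination, while separate networks pay for it once per task.

```python
# Toy illustration (hypothetical numbers, not from the paper): cost of
# running task combinations with separate networks vs. a merged network.

# Hypothetical per-layer costs (e.g., MFLOPs) for two single-task networks.
task_a_layers = [10, 10, 5]   # trunk layers, then task A's head
task_b_layers = [10, 10, 7]   # trunk layers, then task B's head
shared_trunk = [10, 10]       # layers a merged network could share

def cost_separate(networks):
    # Each single-task network runs independently: costs simply add up.
    return sum(sum(layers) for layers in networks)

def cost_merged(run_a, run_b):
    # The shared trunk runs once if any task is requested;
    # only the task-specific heads add further cost.
    cost = sum(shared_trunk) if (run_a or run_b) else 0
    if run_a:
        cost += task_a_layers[-1]   # A's head
    if run_b:
        cost += task_b_layers[-1]   # B's head
    return cost

print(cost_separate([task_a_layers, task_b_layers]))  # 52: both tasks, separate
print(cost_merged(True, True))                        # 32: both tasks, merged
print(cost_merged(True, False))                       # 25: task A alone
```

The gap between 52 and 32 is the cross-task redundancy that merging removes; PAM's contribution, per the abstract, is constructing the merged network so that a subsequent pruning step keeps every such combination cheap.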

Bibliographic Details
Main Authors: GAO, Dawei, HE, Xiaoxi, ZHOU, Zimu, TONG, Yongxin, THIELE, Lothar
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Subjects: Deep learning; Network pruning; Multitask inference; Software Engineering
Online Access:https://ink.library.smu.edu.sg/sis_research/6804
https://ink.library.smu.edu.sg/context/sis_research/article/7807/viewcontent/kdd21_he.pdf
Institution: Singapore Management University
DOI: 10.1145/3447548.3467271
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)
Published online: 2021-08-01