SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension
Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies a...
Main Authors: | WEERAKOON, Dulanga; SUBBARAJU, Vigneshwaran; TRAN, Tuan; MISRA, Archan |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
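Subjects: | Human-Robot Interaction; Referring Expression Comprehension; Pruning; Computer Vision; Natural Language Processing; Computer Engineering; Software Engineering |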
Online Access: | https://ink.library.smu.edu.sg/sis_research/7707 https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-8710 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-8710 2023-04-04T02:10:21Z SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension WEERAKOON, Dulanga SUBBARAJU, Vigneshwaran TRAN, Tuan MISRA, Archan Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but approximated with minimal additional computation. Experimental evaluation, using 3 benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings, with an accuracy loss 0.5% - 2%. 2022-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7707 info:doi/10.1145/3503161.3548432 https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Human-Robot Interaction Referring Expression Comprehension Pruning Computer Vision Natural Language Processing Computer Engineering Software Engineering |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Human-Robot Interaction Referring Expression Comprehension Pruning Computer Vision Natural Language Processing Computer Engineering Software Engineering |
spellingShingle | Human-Robot Interaction Referring Expression Comprehension Pruning Computer Vision Natural Language Processing Computer Engineering Software Engineering WEERAKOON, Dulanga SUBBARAJU, Vigneshwaran TRAN, Tuan MISRA, Archan SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
description | Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but approximated with minimal additional computation. Experimental evaluation, using 3 benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings, with an accuracy loss 0.5% - 2%. |
format | text |
author | WEERAKOON, Dulanga SUBBARAJU, Vigneshwaran TRAN, Tuan MISRA, Archan |
author_facet | WEERAKOON, Dulanga SUBBARAJU, Vigneshwaran TRAN, Tuan MISRA, Archan |
author_sort | WEERAKOON, Dulanga |
title | SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
title_short | SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
title_full | SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
title_fullStr | SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
title_full_unstemmed | SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension |
title_sort | softskip: empowering multi-modal dynamic pruning for single-stage referring comprehension |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2022 |
url | https://ink.library.smu.edu.sg/sis_research/7707 https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf |
_version_ | 1770576418347941888 |