SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension

Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies a...

Bibliographic Details
Main Authors: WEERAKOON, Dulanga, SUBBARAJU, Vigneshwaran, TRAN, Tuan, MISRA, Archan
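Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Human-Robot Interaction; Referring Expression Comprehension; Pruning; Computer Vision; Natural Language Processing; Computer Engineering; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/7707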
https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8710
record_format dspace
spelling sg-smu-ink.sis_research-8710
2023-04-04T02:10:21Z
SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension
WEERAKOON, Dulanga
SUBBARAJU, Vigneshwaran
TRAN, Tuan
MISRA, Archan
Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but approximated with minimal additional computation. Experimental evaluation, using three benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings, with an accuracy loss of 0.5% to 2%.
2022-10-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7707
info:doi/10.1145/3503161.3548432
https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Human-Robot Interaction
Referring Expression Comprehension
Pruning
Computer Vision
Natural Language Processing
Computer Engineering
Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Human-Robot Interaction
Referring Expression Comprehension
Pruning
Computer Vision
Natural Language Processing
Computer Engineering
Software Engineering
description Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but approximated with minimal additional computation. Experimental evaluation, using three benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings, with an accuracy loss of 0.5% to 2%.
format text
author WEERAKOON, Dulanga
SUBBARAJU, Vigneshwaran
TRAN, Tuan
MISRA, Archan
title SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7707
https://ink.library.smu.edu.sg/context/sis_research/article/8710/viewcontent/multimedia_final.pdf
_version_ 1770576418347941888