Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
As pervasive devices expand into complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors facilitating natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field-of-view referred to in a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame. This results in runtime complexity that increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements for implementing STVG models on a pervasive device.
Main Authors: | WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha; SUBBARAJU, Vigneshwaran; LIM, Joo Hwee; Misra, Archan |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Subjects: | Human-AI Collaboration; Spatio-Temporal Video Grounding; Computer Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/9219 https://ink.library.smu.edu.sg/context/sis_research/article/10177/viewcontent/Mobisys2024_ProfilingEventVision_PosterPaper_cameraready.pdf |
Institution: | Singapore Management University |
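The abstract above notes that current STVG models apply cross-attentional transformers to every video frame, which is why their runtime grows linearly with video length. The following is a minimal, illustrative PyTorch sketch of that per-frame cross-attention pattern; it is not the poster's actual model, and the feature dimensions, proposal layout, and scoring head are assumptions made purely for illustration.

```python
# Illustrative sketch of per-frame cross-attention for spatio-temporal video
# grounding (STVG). NOT the poster's model: dimensions, the scoring head, and
# the proposal layout are hypothetical, chosen only to show why per-frame
# cross-attention makes runtime scale linearly with the number of frames.
import torch
import torch.nn as nn


class PerFrameCrossAttentionGrounder(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Cross-attention: per-frame object proposals (queries) attend to text tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical head that scores each proposal's relevance to the query.
        self.score_head = nn.Linear(dim, 1)

    def forward(self, frame_proposals: torch.Tensor, text_tokens: torch.Tensor):
        # frame_proposals: (num_frames, num_proposals, dim) visual features
        # text_tokens:     (num_text_tokens, dim) language-query features
        scores = []
        text = text_tokens.unsqueeze(0)  # (1, num_text_tokens, dim)
        # One cross-attention pass per frame -> runtime grows with num_frames.
        for frame in frame_proposals:
            fused, _ = self.cross_attn(
                query=frame.unsqueeze(0), key=text, value=text
            )
            scores.append(self.score_head(fused).squeeze(-1).squeeze(0))
        # (num_frames, num_proposals): relevance of each proposal in each frame.
        return torch.stack(scores)


if __name__ == "__main__":
    model = PerFrameCrossAttentionGrounder()
    proposals = torch.randn(30, 10, 256)   # 30 frames, 10 proposals per frame
    query = torch.randn(12, 256)           # 12 text tokens
    relevance = model(proposals, query)
    # Best proposal per frame forms a spatio-temporal "tube" for the target object.
    print(relevance.argmax(dim=-1))        # shape: (30,)
```

Because the cross-attention is re-run once per frame, doubling the number of frames roughly doubles the compute, matching the linear-in-video-length complexity the abstract describes.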
id: | sg-smu-ink.sis_research-10177 |
DOI: | info:doi/10.1145/3643832.3661402 |
Date: | 2024-06-01 |
License: | http://creativecommons.org/licenses/by/4.0/ |
Collection: | Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries) |
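The abstract also frames the poster's goal as profiling the latency and energy of STVG inference on a pervasive device. Below is a hedged sketch of one way such a profile could be gathered, reusing the toy grounder sketched earlier: it times inference for increasing frame counts and converts latency into a rough energy figure using an assumed average power draw. The 5 W constant and the frame counts are hypothetical placeholders, not measurements from the paper.

```python
# Illustrative latency/energy profiling loop for the toy grounder above.
# The power-draw constant is a HYPOTHETICAL placeholder; real on-device
# profiling would read the device's own power sensors or an external power
# monitor rather than assume a fixed wattage.
import time
import torch

ASSUMED_AVG_POWER_WATTS = 5.0  # hypothetical device power draw under load


def profile_latency(model, num_frames_list, dim=256, num_proposals=10, text_len=12):
    results = {}
    model.eval()
    with torch.no_grad():
        for num_frames in num_frames_list:
            proposals = torch.randn(num_frames, num_proposals, dim)
            query = torch.randn(text_len, dim)
            start = time.perf_counter()
            model(proposals, query)
            latency_s = time.perf_counter() - start
            # Rough energy estimate: average power x elapsed time.
            energy_j = ASSUMED_AVG_POWER_WATTS * latency_s
            results[num_frames] = (latency_s, energy_j)
    return results


if __name__ == "__main__":
    model = PerFrameCrossAttentionGrounder()  # defined in the earlier sketch
    for n, (lat, en) in profile_latency(model, [15, 30, 60]).items():
        print(f"{n} frames: {lat * 1000:.1f} ms, ~{en:.2f} J (assumed power)")
```

Such a loop only approximates the deployment cost; the per-frame latency growth it exposes is the same linear scaling that motivates the efficiency question raised in the abstract.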