Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices

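As the use of pervasive devices expands into complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors facilitating natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field-of-view referred to in a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame. This results in runtime complexity that increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements for implementing STVG models on a pervasive device.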

Bibliographic Details
Main Authors: WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha, SUBBARAJU, Vigneshwaran, LIM, Joo Hwee, Misra, Archan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Human-AI Collaboration; Spatio-Temporal Video Grounding; Computer Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/9219
https://ink.library.smu.edu.sg/context/sis_research/article/10177/viewcontent/Mobisys2024_ProfilingEventVision_PosterPaper_cameraready.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10177
record_format dspace
spelling sg-smu-ink.sis_research-10177 2024-08-27T02:39:40Z 2024-06-01T07:00:00Z text application/pdf info:doi/10.1145/3643832.3661402 http://creativecommons.org/licenses/by/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Human-AI Collaboration
Spatio-Temporal Video Grounding
Computer Engineering
spellingShingle Human-AI Collaboration
Spatio-Temporal Video Grounding
Computer Engineering
WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha
SUBBARAJU, Vigneshwaran
LIM, Joo Hwee
Misra, Archan
Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
description As the use of pervasive devices expands into complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors facilitating natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field-of-view referred to in a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame. This results in runtime complexity that increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements for implementing STVG models on a pervasive device.
format text
author WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha
SUBBARAJU, Vigneshwaran
LIM, Joo Hwee
Misra, Archan
author_facet WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha
SUBBARAJU, Vigneshwaran
LIM, Joo Hwee
Misra, Archan
author_sort WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha
title Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
title_short Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
title_full Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
title_fullStr Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
title_full_unstemmed Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
title_sort poster: towards efficient spatio-temporal video grounding in pervasive mobile devices
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9219
https://ink.library.smu.edu.sg/context/sis_research/article/10177/viewcontent/Mobisys2024_ProfilingEventVision_PosterPaper_cameraready.pdf
_version_ 1814047811488972800