Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices

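As the use of pervasive devices expands into complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors facilitating natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field-of-view referred to in a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame. This results in runtime complexity that increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements for implementing STVG models on a pervasive device.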
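The abstract notes that current STVG models apply cross-attentional transformers to every video frame, so runtime grows linearly with video length. A rough cost model makes that scaling concrete. This is a minimal sketch under stated assumptions, not the authors' model: the layer count, token counts, and hidden size are hypothetical, and only the matrix-multiply work of each cross-attention layer is counted.

```python
# Rough cost model (hypothetical numbers, not taken from the paper):
# every frame runs through the same stack of cross-attention layers,
# so total work is per-frame work multiplied by the number of frames.

def cross_attention_macs(n_visual: int, n_text: int, dim: int) -> int:
    """Multiply-accumulates (MACs) for one cross-attention layer.

    Queries come from the frame's visual tokens; keys and values come from
    the text tokens. Assumes text keys/values are recomputed for every frame.
    Softmax and normalization costs are ignored.
    """
    q_proj = n_visual * dim * dim        # project visual tokens to queries
    kv_proj = 2 * n_text * dim * dim     # project text tokens to keys and values
    scores = n_visual * n_text * dim     # Q @ K^T
    mix = n_visual * n_text * dim        # softmax(scores) @ V
    out_proj = n_visual * dim * dim      # output projection
    return q_proj + kv_proj + scores + mix + out_proj


def video_macs(num_frames: int, num_layers: int = 6, n_visual: int = 400,
               n_text: int = 20, dim: int = 256) -> int:
    """Total MACs for grounding one language query over `num_frames` frames."""
    per_frame = num_layers * cross_attention_macs(n_visual, n_text, dim)
    return num_frames * per_frame


if __name__ == "__main__":
    # Doubling the clip length doubles the estimated work.
    for frames in (32, 64, 128):
        print(frames, video_macs(frames))
```

Caching the text keys and values across frames would lower the per-frame constant, but under this model the total still grows in proportion to the number of frames.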



Bibliographic Details
Main Authors:	WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha; SUBBARAJU, Vigneshwaran; LIM, Joo Hwee; Misra, Archan
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University, 2024
Subjects:	Human-AI Collaboration; Spatio-Temporal Video Grounding; Computer Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9219 https://ink.library.smu.edu.sg/context/sis_research/article/10177/viewcontent/Mobisys2024_ProfilingEventVision_PosterPaper_cameraready.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10177
record_format	dspace
record_datestamp	2024-08-27T02:39:40Z
published_date	2024-06-01T07:00:00Z
file_format	application/pdf
doi	10.1145/3643832.3661402
license	http://creativecommons.org/licenses/by/4.0/
research_collection	Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Human-AI Collaboration; Spatio-Temporal Video Grounding; Computer Engineering
description	As the use of pervasive devices expands into complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors facilitating natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field-of-view referred to in a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame. This results in runtime complexity that increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements for implementing STVG models on a pervasive device.
format	text
author	WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha; SUBBARAJU, Vigneshwaran; LIM, Joo Hwee; Misra, Archan
title	Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9219 https://ink.library.smu.edu.sg/context/sis_research/article/10177/viewcontent/Mobisys2024_ProfilingEventVision_PosterPaper_cameraready.pdf


Similar Items