Learning temporal dynamics in videos with image transformer

Temporal dynamics represent the evolution of video content over time and are critical for action recognition. In this paper, we ask: can an off-the-shelf image transformer architecture learn temporal dynamics in videos? To this end, we propose Multidimensional Stacked Image (MSImage)...

Bibliographic Details
Main Authors: SHU, Yan, QIU, Z, LONG, Fuchen, YAO, Ting, NGO, Chong-wah, MEI, Tao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Online Access: https://ink.library.smu.edu.sg/sis_research/9860
Institution: Singapore Management University