PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition

In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To add...

Bibliographic Details
Main Authors:	HAO, Yanbin; ZHOU, Diansong; WANG, Zhicai; NGO, Chong-wah; HE, Xiangnan; WANG, Meng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Multi-layer perceptron; Positional encoding; Spatio-temporal modeling; Video recognition; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/8256 https://ink.library.smu.edu.sg/context/sis_research/article/9259/viewcontent/PosMLP_preprint_pvoa_cc_by.pdf
Institution:	Singapore Management University

Similar Items