PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition

In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To add...

Bibliographic Details
Main Authors: HAO, Yanbin; ZHOU, Diansong; WANG, Zhicai; NGO, Chong-wah; HE, Xiangnan; WANG, Meng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Online Access: https://ink.library.smu.edu.sg/sis_research/8256
https://ink.library.smu.edu.sg/context/sis_research/article/9259/viewcontent/PosMLP_preprint_pvoa_cc_by.pdf
Institution: Singapore Management University