PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition
In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To add...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2023
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8256 https://ink.library.smu.edu.sg/context/sis_research/article/9259/viewcontent/PosMLP_preprint_pvoa_cc_by.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Singapore Management University |
Language: | English |