PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition
In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To add...
Main Authors: | HAO, Yanbin, ZHOU, Diansong, WANG, Zhicai, NGO, Chong-wah, HE, Xiangnan, WANG, Meng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2023 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8256 https://ink.library.smu.edu.sg/context/sis_research/article/9259/viewcontent/PosMLP_preprint_pvoa_cc_by.pdf |
Institution: | Singapore Management University |
Similar Items
- Architecture analysis of MLP by geometrical interpretation
  by: Xiang, C., et al.
  Published: (2014)
- Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
  by: WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha, et al.
  Published: (2024)
- Video partitioning by temporal slice coherency
  by: NGO, Chong-wah, et al.
  Published: (2001)
- Video segmentation: Temporally-constrained graph-based optimization
  by: LIU SIYING
  Published: (2010)
- Video Encoder Optimization for Real-Time Communication
  by: TAN YIH HAN
  Published: (2011)