Shunted self-attention via multi-scale token aggregation

Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually designate similar receptive fields of each...

Bibliographic Details
Main Authors: REN, Sucheng, ZHOU, Daquan, HE, Shengfeng, FENG, Jiashi, WANG, Xinchao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Online Access: https://ink.library.smu.edu.sg/sis_research/8530
https://ink.library.smu.edu.sg/context/sis_research/article/9533/viewcontent/Shunted_self_attention_via_multi_scale_token_aggregation.pdf
Institution: Singapore Management University