Shunted self-attention via multi-scale token aggregation

Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually designate the similar receptive fields of each...

Bibliographic Details
Main Authors:	REN, Sucheng, ZHOU, Daquan, HE, Shengfeng, FENG, Jiashi, WANG, Xinchao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Computation costs; Deep learning architecture and technique; Efficient learning; Efficient learning and inference; Image patches; Learning architectures; Learning techniques; Multi-scales; Receptive fields; Transformer modeling; Databases and Information Systems; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/8530 https://ink.library.smu.edu.sg/context/sis_research/article/9533/viewcontent/Shunted_self_attention_via_multi_scale_token_aggregation.pdf
Institution:	Singapore Management University

Similar Items