Self-promoted supervision for few-shot transformer

The few-shot learning ability of vision transformers (ViTs) is rarely investigated, though heavily desired. In this work, we empirically find that, within the same few-shot learning framework, e.g. MetaBaseline, replacing the widely used CNN feature extractor with a ViT model often severely impairs few-shot classification performance. Moreover, our empirical study shows that, in the absence of inductive bias, ViTs often learn low-quality token dependencies under the few-shot learning regime, where only a few labeled training samples are available, and this largely contributes to the above performance degradation. To alleviate this issue, we propose, for the first time, a simple yet effective few-shot training framework for ViTs, namely Self-promoted sUpervisioN (SUN). Specifically, besides the conventional global supervision for global semantic learning, SUN further pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token. This location-specific supervision tells the ViT which patch tokens are similar or dissimilar, and thus accelerates token-dependency learning. It also models the local semantics in each patch token, improving object grounding and recognition and helping the model learn generalizable patterns. To improve the quality of the location-specific supervision, we further propose two techniques: 1) background patch filtration, which filters out background patches and assigns them to an extra background class; and 2) spatial-consistent augmentation, which introduces sufficient diversity for data augmentation while keeping the generated local supervision accurate. Experimental results show that SUN with ViTs significantly surpasses other few-shot learning frameworks built on ViTs, and is the first to achieve higher performance than CNN-based state-of-the-art methods. Our code is publicly available at https://github.com/DongSky/few-shot-vit
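As an illustration of the idea described in the abstract, the sketch below shows how per-patch (location-specific) pseudo labels could be generated from a pretrained ViT's patch tokens, with low-confidence patches reassigned to an extra background class (background patch filtration). This is a minimal sketch based only on the abstract, not the authors' released implementation; the function name, tensor shapes, and the confidence threshold are assumptions.

# Minimal sketch (an illustration, not the released SUN code) of generating
# location-specific supervision: each patch-token feature from a ViT pretrained
# on the few-shot dataset is scored by the global classifier, and patches with
# low confidence are assigned to an extra "background" class.
import torch
import torch.nn.functional as F

def location_specific_supervision(patch_features: torch.Tensor,
                                  classifier_weight: torch.Tensor,
                                  bg_threshold: float = 0.5) -> torch.Tensor:
    # patch_features: (B, N, D) patch-token features from the pretrained ViT
    # classifier_weight: (C, D) weights of the global linear classifier
    # returns: (B, N) pseudo labels in [0, C], where index C is the extra
    #          background class introduced by background patch filtration
    logits = patch_features @ classifier_weight.t()      # (B, N, C)
    probs = F.softmax(logits, dim=-1)
    conf, labels = probs.max(dim=-1)                      # each (B, N)
    num_classes = classifier_weight.shape[0]
    background = torch.full_like(labels, num_classes)     # extra class id C
    return torch.where(conf < bg_threshold, background, labels)

# Toy usage with random tensors, only to show the shapes involved (assumed values).
feats = torch.randn(2, 196, 384)   # e.g. 14x14 patches, 384-dim tokens
w = torch.randn(64, 384)           # e.g. 64 base classes
print(location_specific_supervision(feats, w).shape)  # torch.Size([2, 196])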


Saved in:
Bibliographic Details
Main Authors: DONG, Bowen, ZHOU, Pan, YAN, Shuicheng, ZUO, Wangmeng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: few-shot learning; location-specific supervision; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/8984
https://ink.library.smu.edu.sg/context/sis_research/article/9987/viewcontent/2022_ECCV_few_shot__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9987
record_format dspace
record_updated 2024-07-25T08:30:38Z
publication_date 2022-10-01T07:00:00Z
format text application/pdf
doi 10.1007/978-3-031-20044-1_19
license http://creativecommons.org/licenses/by-nc-nd/4.0/
collection Research Collection School Of Computing and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic few-shot learning
location-specific supervision
Graphics and Human Computer Interfaces
format text
author DONG, Bowen
ZHOU, Pan
YAN, Shuicheng
ZUO, Wangmeng
title Self-promoted supervision for few-shot transformer
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/8984
https://ink.library.smu.edu.sg/context/sis_research/article/9987/viewcontent/2022_ECCV_few_shot__1_.pdf
_version_ 1814047700583186432