Co-advise: Cross inductive bias distillation

The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where only heavy convolution-based teachers are provided, in this paper we delve into th...

Bibliographic Details
Main Authors: REN, Sucheng, GAO, Zhengqi, HUA, Tianyu, XUE, Zihui, TIAN, Yonglong, HE, Shengfeng, ZHAO, Hang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/8538
https://ink.library.smu.edu.sg/context/sis_research/article/9541/viewcontent/Co_Advise__Cross_Inductive_Bias_Distillation.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-9541
record_format dspace
spelling sg-smu-ink.sis_research-9541 2024-01-22T14:56:00Z Co-advise: Cross inductive bias distillation REN, Sucheng GAO, Zhengqi HUA, Tianyu XUE, Zihui TIAN, Yonglong HE, Shengfeng ZHAO, Hang The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where only heavy convolution-based teachers are provided, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that the teacher's accuracy is not the dominant factor in the student's accuracy; the teacher's inductive bias matters more. We demonstrate that lightweight teachers with different architectural inductive biases can be used to co-advise the student transformer to outstanding performance. The rationale is that models designed with different inductive biases tend to focus on diverse patterns, so teachers with different inductive biases acquire different knowledge despite being trained on the same dataset. This diverse knowledge provides a more precise and comprehensive description of the data, and it compounds and boosts the performance of the student during distillation. Furthermore, we propose a token inductive bias alignment to align the inductive bias of each token with that of its target teacher model. With only lightweight teachers provided, and using this cross inductive bias distillation method, our vision transformers (termed CiT) outperform all previous vision transformers (ViT) of the same architecture on ImageNet. Moreover, our small-size model CiT-SAK further achieves 82.7% Top-1 accuracy on ImageNet without modifying the attention module of the ViT. 2022-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8538 info:doi/10.1109/CVPR52688.2022.01627 https://ink.library.smu.edu.sg/context/sis_research/article/9541/viewcontent/Co_Advise__Cross_Inductive_Bias_Distillation.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Adversarial attack and defense Distillation method Inductive bias Performance Representation learning Size models Teacher models Teachers' Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Adversarial attack and defense
Distillation method
Inductive bias
Performance
Representation learning
Size models
Teacher models
Teachers'
Databases and Information Systems
spellingShingle Adversarial attack and defense
Distillation method
Inductive bias
Performance
Representation learning
Size models
Teacher models
Teachers'
Databases and Information Systems
REN, Sucheng
GAO, Zhengqi
HUA, Tianyu
XUE, Zihui
TIAN, Yonglong
HE, Shengfeng
ZHAO, Hang
Co-advise: Cross inductive bias distillation
description The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where only heavy convolution-based teachers are provided, in this paper we delve into the influence of models' inductive biases in knowledge distillation (e.g., convolution and involution). Our key observation is that the teacher's accuracy is not the dominant factor in the student's accuracy; the teacher's inductive bias matters more. We demonstrate that lightweight teachers with different architectural inductive biases can be used to co-advise the student transformer to outstanding performance. The rationale is that models designed with different inductive biases tend to focus on diverse patterns, so teachers with different inductive biases acquire different knowledge despite being trained on the same dataset. This diverse knowledge provides a more precise and comprehensive description of the data, and it compounds and boosts the performance of the student during distillation. Furthermore, we propose a token inductive bias alignment to align the inductive bias of each token with that of its target teacher model. With only lightweight teachers provided, and using this cross inductive bias distillation method, our vision transformers (termed CiT) outperform all previous vision transformers (ViT) of the same architecture on ImageNet. Moreover, our small-size model CiT-SAK further achieves 82.7% Top-1 accuracy on ImageNet without modifying the attention module of the ViT.
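As an illustration of the co-advising idea summarized in the description above, here is a minimal sketch (in PyTorch) of a multi-teacher distillation objective: a supervised classification loss on the student's class output plus soft distillation losses from several lightweight teachers with different inductive biases, averaged with equal weight. The function name, the one-head-per-teacher setup, the equal teacher weighting, and the hyperparameters alpha and tau are illustrative assumptions, not the authors' exact implementation; the proposed token inductive bias alignment is not shown.

```python
# Illustrative sketch only: multi-teacher ("co-advising") soft distillation.
# Assumes the student exposes one logit head per teacher (e.g., one
# distillation token per teacher) plus a class head; teachers are frozen.
import torch
import torch.nn.functional as F


def co_advise_loss(cls_logits, student_heads, teacher_logits, labels,
                   alpha=0.5, tau=1.0):
    """cls_logits: (B, C) student class-token logits.
    student_heads: list of (B, C) student logits, one per teacher.
    teacher_logits: list of (B, C) logits from the frozen teachers.
    """
    # Supervised cross-entropy on the class token.
    ce = F.cross_entropy(cls_logits, labels)

    # Soft (KL) distillation from each teacher, averaged over teachers.
    kd = 0.0
    for s, t in zip(student_heads, teacher_logits):
        kd = kd + F.kl_div(
            F.log_softmax(s / tau, dim=-1),
            F.softmax(t.detach() / tau, dim=-1),
            reduction="batchmean",
        ) * (tau ** 2)
    kd = kd / len(teacher_logits)

    return (1 - alpha) * ce + alpha * kd


# Example with random tensors standing in for real model outputs:
# e.g., one convolution-based and one involution-based teacher.
B, C = 8, 1000
labels = torch.randint(0, C, (B,))
loss = co_advise_loss(
    cls_logits=torch.randn(B, C),
    student_heads=[torch.randn(B, C), torch.randn(B, C)],
    teacher_logits=[torch.randn(B, C), torch.randn(B, C)],
    labels=labels,
)
```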
format text
author REN, Sucheng
GAO, Zhengqi
HUA, Tianyu
XUE, Zihui
TIAN, Yonglong
HE, Shengfeng
ZHAO, Hang
author_facet REN, Sucheng
GAO, Zhengqi
HUA, Tianyu
XUE, Zihui
TIAN, Yonglong
HE, Shengfeng
ZHAO, Hang
author_sort REN, Sucheng
title Co-advise: Cross inductive bias distillation
title_short Co-advise: Cross inductive bias distillation
title_full Co-advise: Cross inductive bias distillation
title_fullStr Co-advise: Cross inductive bias distillation
title_full_unstemmed Co-advise: Cross inductive bias distillation
title_sort co-advise: cross inductive bias distillation
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/8538
https://ink.library.smu.edu.sg/context/sis_research/article/9541/viewcontent/Co_Advise__Cross_Inductive_Bias_Distillation.pdf
_version_ 1789483261226909696