Computation-efficient knowledge distillation via uncertainty-aware mixup

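Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, the efficiency of knowledge distillation becomes a pivotal concern. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that can reduce transfer cost by 20% to 30% while maintaining student performance comparable to, or even better than, that of conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach, which dynamically select informative samples from ample data and compact the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate that our approach can be applied to improve various existing KD approaches by reducing their queries to the teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and model are available at https://github.com/xuguodong03/UNIXKD.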

Bibliographic Details
Main Authors:	Xu, Guodong; Liu, Ziwei; Loy, Chen Change
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
Online Access:	https://hdl.handle.net/10356/172038
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-172038
record_format	dspace
spelling	sg-ntu-dr.10356-172038 2023-11-20T04:39:10Z Computation-efficient knowledge distillation via uncertainty-aware mixup Xu, Guodong; Liu, Ziwei; Loy, Chen Change School of Computer Science and Engineering Engineering::Computer science and engineering; Knowledge Distillation; Training Cost Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, the efficiency of knowledge distillation becomes a pivotal concern. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that can reduce transfer cost by 20% to 30% while maintaining student performance comparable to, or even better than, that of conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach, which dynamically select informative samples from ample data and compact the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate that our approach can be applied to improve various existing KD approaches by reducing their queries to the teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and model are available at https://github.com/xuguodong03/UNIXKD. Nanyang Technological University This study is supported by a Collaborative Research Grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093) and NTU NAP. 2023-11-20T04:39:10Z 2023-11-20T04:39:10Z 2023 Journal Article Xu, G., Liu, Z. & Loy, C. C. (2023). Computation-efficient knowledge distillation via uncertainty-aware mixup. Pattern Recognition, 138, 109338. https://dx.doi.org/10.1016/j.patcog.2023.109338 0031-3203 https://hdl.handle.net/10356/172038 10.1016/j.patcog.2023.109338 2-s2.0-85147248505 138 109338 en Pattern Recognition © 2023 Elsevier Ltd. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
description	Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, the efficiency of knowledge distillation becomes a pivotal concern. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that can reduce transfer cost by 20% to 30% while maintaining student performance comparable to, or even better than, that of conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach, which dynamically select informative samples from ample data and compact the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate that our approach can be applied to improve various existing KD approaches by reducing their queries to the teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and model are available at https://github.com/xuguodong03/UNIXKD.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering; Xu, Guodong; Liu, Ziwei; Loy, Chen Change
format	Article
author	Xu, Guodong; Liu, Ziwei; Loy, Chen Change
author_sort	Xu, Guodong
title	Computation-efficient knowledge distillation via uncertainty-aware mixup
publishDate	2023
url	https://hdl.handle.net/10356/172038
_version_	1783955545214943232
