Computation-efficient knowledge distillation via uncertainty-aware mixup
Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, knowledge distillation efficiency becomes a pivotal component. In this w...
Main Authors: | Xu, Guodong; Liu, Ziwei; Loy, Chen Change |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering; Knowledge Distillation; Training Cost |
Online Access: | https://hdl.handle.net/10356/172038 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-172038 |
---|---|
record_format |
dspace |
spelling |
Record: sg-ntu-dr.10356-172038 (2023-11-20T04:39:10Z)
Title: Computation-efficient knowledge distillation via uncertainty-aware mixup
Authors: Xu, Guodong; Liu, Ziwei; Loy, Chen Change
School: School of Computer Science and Engineering
Subjects: Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
Institution: Nanyang Technological University
Funding: This study is supported by Collaborative Research Grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093) and NTU NAP.
Year: 2023
Type: Journal Article
Citation: Xu, G., Liu, Z. & Loy, C. C. (2023). Computation-efficient knowledge distillation via uncertainty-aware mixup. Pattern Recognition, 138, 109338.
DOI: https://dx.doi.org/10.1016/j.patcog.2023.109338 (10.1016/j.patcog.2023.109338)
ISSN: 0031-3203
Handle: https://hdl.handle.net/10356/172038
Scopus EID: 2-s2.0-85147248505
Volume: 138; Article number: 109338
Language: en
Journal: Pattern Recognition
Rights: © 2023 Elsevier Ltd. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Knowledge Distillation; Training Cost |
description |
Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, knowledge distillation efficiency becomes a pivotal component. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that can reduce transfer cost by 20% to 30% while matching or even exceeding the student performance of conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach, which dynamically select informative samples from ample data and compact the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate that our approach can improve various existing KD approaches by reducing their queries to a teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and model are available at https://github.com/xuguodong03/UNIXKD. |
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Xu, Guodong; Liu, Ziwei; Loy, Chen Change |
title |
Computation-efficient knowledge distillation via uncertainty-aware mixup |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/172038 |
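
The abstract in this record describes two ingredients: uncertainty sampling that picks informative samples, and mixup that compacts several samples into one so that fewer teacher queries are needed. The following is a minimal NumPy sketch of that general idea, not the authors' implementation (their code is at the repository linked in the abstract). The use of predictive entropy as the uncertainty score, the fixed mixing weight `lam`, the `np.roll` pairing rule, and all function names are assumptions made for illustration only.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def entropy_uncertainty(student_logits):
    """Predictive entropy of the student's softmax output, one value per sample.

    Assumption: entropy stands in for whatever uncertainty score the paper uses.
    """
    probs = softmax(student_logits)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def select_and_mix(inputs, student_logits, keep, lam=0.7):
    """Keep the `keep` most uncertain samples and mix each with a partner.

    Returns the mixed inputs (the only ones a teacher would be queried on)
    and the indices of the selected samples, most uncertain first.
    """
    uncertainty = entropy_uncertainty(student_logits)
    order = np.argsort(-uncertainty)      # indices sorted by descending uncertainty
    chosen = order[:keep]
    partners = np.roll(chosen, 1)         # hypothetical pairing rule for the sketch
    mixed = lam * inputs[chosen] + (1.0 - lam) * inputs[partners]
    return mixed, chosen


# Toy batch: 4 samples with 2 features each, and 3-class student logits.
inputs = np.arange(8.0).reshape(4, 2)
student_logits = np.array([
    [5.0, 0.0, 0.0],   # confident prediction, low entropy
    [0.0, 0.0, 0.0],   # uniform prediction, highest entropy
    [2.0, 1.0, 0.0],   # moderately uncertain
    [0.1, 0.0, 0.0],   # nearly uniform
])
mixed, chosen = select_and_mix(inputs, student_logits, keep=2)
```

In this toy batch only two mixed inputs would be sent to the teacher instead of four, which is the kind of query reduction the abstract refers to. A real pipeline would compute the distillation loss on the teacher and student outputs for `mixed`.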