Computation-efficient knowledge distillation via uncertainty-aware mixup

Bibliographic Details
Main Authors: Xu, Guodong; Liu, Ziwei; Loy, Chen Change
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Knowledge Distillation; Training Cost
Online Access: https://hdl.handle.net/10356/172038
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172038
record_format dspace
spelling sg-ntu-dr.10356-172038
2023-11-20T04:39:10Z
Computation-efficient knowledge distillation via uncertainty-aware mixup
Xu, Guodong
Liu, Ziwei
Loy, Chen Change
School of Computer Science and Engineering
Engineering::Computer science and engineering
Knowledge Distillation
Training Cost
Nanyang Technological University
This study is supported by Collaborative Research Grant from SenseTime Group (CUHK Agreement No. TS1610626 & No. TS1712093) and NTU NAP.
2023-11-20T04:39:10Z
2023-11-20T04:39:10Z
2023
Journal Article
Xu, G., Liu, Z. & Loy, C. C. (2023). Computation-efficient knowledge distillation via uncertainty-aware mixup. Pattern Recognition, 138, 109338.
https://dx.doi.org/10.1016/j.patcog.2023.109338
0031-3203
https://hdl.handle.net/10356/172038
10.1016/j.patcog.2023.109338
2-s2.0-85147248505
138
109338
en
Pattern Recognition
© 2023 Elsevier Ltd. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Knowledge Distillation
Training Cost
description Knowledge distillation (KD) has emerged as an essential technique not only for model compression but also for other learning tasks such as continual learning. Given the richer application spectrum and potential online usage of KD, the efficiency of knowledge distillation becomes pivotal. In this work, we study this little-explored but important topic. Unlike previous works that focus solely on the accuracy of the student network, we attempt to achieve a harder goal: to obtain performance comparable to conventional KD at a lower computation cost during the transfer. To this end, we present UNcertainty-aware mIXup (UNIX), an effective approach that can reduce transfer cost by 20% to 30% while maintaining student performance comparable to, or even better than, that of conventional KD. This is made possible by effective uncertainty sampling and a novel adaptive mixup approach, which dynamically select informative samples from ample data and compact the knowledge in these samples. We show that our approach inherently performs hard sample mining. We demonstrate the applicability of our approach to improving various existing KD approaches by reducing their queries to a teacher network. Extensive experiments are performed on CIFAR100 and ImageNet. Code and model are available at https://github.com/xuguodong03/UNIXKD.
author2 School of Computer Science and Engineering
format Article
author Xu, Guodong
Liu, Ziwei
Loy, Chen Change
author_sort Xu, Guodong
title Computation-efficient knowledge distillation via uncertainty-aware mixup
publishDate 2023
url https://hdl.handle.net/10356/172038
_version_ 1783955545214943232