Discriminator-enhanced knowledge-distillation networks
Query auto-completion (QAC) serves as a critical functionality in contemporary textual search systems by generating real-time query completion suggestions based on a user’s input prefix. Despite the prevalent use of language models (LMs) in QAC candidate generation, LM-based approaches frequently suffer from overcorrection issues during pair-wise loss training and efficiency deficiencies. To address these challenges, this paper presents a novel framework—discriminator-enhanced knowledge distillation (Dis-KD)—for the QAC task. This framework combines three core components: a large-scale pre-trained teacher model, a lightweight student model, and a discriminator for adversarial learning. Specifically, the discriminator aids in discerning generative-level differences between the teacher and the student models. An additional discriminator score loss is amalgamated with the traditional knowledge-distillation loss, resulting in enhanced performance of the student model. Contrary to the stepwise evaluation of each generated word, our approach assesses the entire generation sequence. This method alleviates the prevalent overcorrection issue in the generation process. Consequently, our proposed framework boasts improvements in model accuracy and a reduction in parameter size. Empirical results highlight the superiority of Dis-KD over established baseline methods, with the student model surpassing the teacher model in QAC tasks for sub-word languages.
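The record contains no implementation details, but the objective described in the abstract (a conventional soft-target distillation term combined with a sequence-level discriminator score) can be sketched roughly as follows. This is an illustrative PyTorch-style sketch under stated assumptions, not the authors' code: the function name `dis_kd_loss`, the weighting factor `alpha`, the softmax `temperature`, and the assumption that the discriminator returns a per-sequence probability of the output coming from the teacher are all hypothetical.

```python
import torch
import torch.nn.functional as F

def dis_kd_loss(student_logits, teacher_logits, disc_score_student,
                temperature=2.0, alpha=0.5):
    """Sketch of a combined objective: soft-target knowledge distillation
    plus a sequence-level discriminator term. Shapes: logits are
    (batch, seq_len, vocab); disc_score_student is (batch,) probabilities."""
    # Token-level distillation: match the student's distribution to the
    # teacher's temperature-softened distribution (standard KD term).
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Sequence-level adversarial term: the discriminator scores the whole
    # generated completion, and the student is pushed to raise that score,
    # so individual word-level mismatches are not penalised step by step.
    adv = -torch.log(disc_score_student + 1e-8).mean()

    return alpha * kd + (1.0 - alpha) * adv
```

Scoring the completed sequence as a whole rather than each generated word is what the abstract credits with alleviating the overcorrection issue; the weighting between the two terms above is purely an assumption for illustration.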
Main Authors: Li, Zhenping; Cao, Zhen; Li, Pengfei; Zhong, Yong; Li, Shaobo
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Electrical and electronic engineering; Knowledge Distillation; Reinforcement Learning
Online Access: https://hdl.handle.net/10356/171765
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-171765
record_format: dspace
Citation: Li, Z., Cao, Z., Li, P., Zhong, Y. & Li, S. (2023). Discriminator-enhanced knowledge-distillation networks. Applied Sciences, 13(14), 8041. https://dx.doi.org/10.3390/app13148041
Journal: Applied Sciences, volume 13, issue 14, article 8041
ISSN: 2076-3417
DOI: 10.3390/app13148041
Scopus ID: 2-s2.0-85166261907
Type: Journal Article (published version)
File format: application/pdf
Deposited: 2023-11-07; record last updated 2023-11-10
Funding: This work was supported by the AI industrial technology innovation platform of Sichuan Province, grant number 2020ZHCG0002.
Rights: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
topic |
Engineering::Electrical and electronic engineering Knowledge Distillation Reinforcement Learning |
spellingShingle |
Engineering::Electrical and electronic engineering Knowledge Distillation Reinforcement Learning Li, Zhenping Cao, Zhen Li, Pengfei Zhong, Yong Li, Shaobo Discriminator-enhanced knowledge-distillation networks |
description |
Query auto-completion (QAC) serves as a critical functionality in contemporary textual search systems by generating real-time query completion suggestions based on a user’s input prefix. Despite the prevalent use of language models (LMs) in QAC candidate generation, LM-based approaches frequently suffer from overcorrection issues during pair-wise loss training and efficiency deficiencies. To address these challenges, this paper presents a novel framework—discriminator-enhanced knowledge distillation (Dis-KD)—for the QAC task. This framework combines three core components: a large-scale pre-trained teacher model, a lightweight student model, and a discriminator for adversarial learning. Specifically, the discriminator aids in discerning generative-level differences between the teacher and the student models. An additional discriminator score loss is amalgamated with the traditional knowledge-distillation loss, resulting in enhanced performance of the student model. Contrary to the stepwise evaluation of each generated word, our approach assesses the entire generation sequence. This method alleviates the prevalent overcorrection issue in the generation process. Consequently, our proposed framework boasts improvements in model accuracy and a reduction in parameter size. Empirical results highlight the superiority of Dis-KD over established baseline methods, with the student model surpassing the teacher model in QAC tasks for sub-word languages. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Li, Zhenping Cao, Zhen Li, Pengfei Zhong, Yong Li, Shaobo |
format |
Article |
author |
Li, Zhenping Cao, Zhen Li, Pengfei Zhong, Yong Li, Shaobo |
author_sort |
Li, Zhenping |
title |
Discriminator-enhanced knowledge-distillation networks |
title_short |
Discriminator-enhanced knowledge-distillation networks |
title_full |
Discriminator-enhanced knowledge-distillation networks |
title_fullStr |
Discriminator-enhanced knowledge-distillation networks |
title_full_unstemmed |
Discriminator-enhanced knowledge-distillation networks |
title_sort |
discriminator-enhanced knowledge-distillation networks |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171765 |