Continual learning optimizations for auto-regressive decoder of multilingual ASR systems
| Field | Value |
|---|---|
| Main Authors | |
| Other Authors | |
| Format | Conference or Workshop Item |
| Language | English |
| Published | 2024 |
| Subjects | |
| Online Access | https://hdl.handle.net/10356/180315 http://arxiv.org/abs/2407.03645v3 https://www.isca-archive.org/interspeech_2024/index.html |
| Institution | Nanyang Technological University |
Summary:

Continual Learning (CL) involves fine-tuning pre-trained models with new data while maintaining performance on the pre-trained data. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, mainly designed for computer vision and reinforcement learning tasks, often yield sub-optimal results when directly applied to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations for the decoder: decoder-layer gradient surgery, freezing unused token embeddings, suppressing the output of newly added tokens, and learning-rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of the pre-trained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER of the new languages.
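The record contains no code, but two of the four decoder optimizations named in the summary lend themselves to a short illustration. Below is a minimal PyTorch sketch, not the authors' implementation: it shows how one might freeze unused token embeddings with a gradient mask and suppress the logits of newly added tokens at decode time. The `ToyDecoder` class, the token-id ranges, and the `penalty` value are illustrative assumptions; the paper's Whisper-based setup, decoder-layer gradient surgery, and learning-rate re-scaling are not reproduced here.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Stand-in for an auto-regressive ASR decoder whose token embedding is tied
    to the output projection (as in Whisper-style models)."""

    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.hidden = nn.Linear(d_model, d_model)  # placeholder for the transformer stack

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.hidden(self.embed(tokens)))
        # Tied output projection: logits over the full (expanded) vocabulary.
        return h @ self.embed.weight.T


def freeze_unused_token_embeddings(decoder: ToyDecoder, used_token_ids: torch.Tensor) -> None:
    """Zero the embedding gradients of tokens that never occur in the new-language
    data, so fine-tuning cannot drift embeddings the pre-trained languages rely on."""
    mask = torch.zeros(decoder.embed.num_embeddings, 1)
    mask[used_token_ids] = 1.0
    # Tensor hook: rewrites the gradient of the embedding matrix before it is
    # accumulated, keeping only the rows of tokens seen in the new-language data.
    decoder.embed.weight.register_hook(lambda grad: grad * mask)


def suppress_new_token_logits(logits: torch.Tensor,
                              new_token_ids: torch.Tensor,
                              penalty: float = 10.0) -> torch.Tensor:
    """Subtract a constant from the logits of newly added tokens when decoding
    pre-trained languages, so the expanded vocabulary cannot hijack old outputs."""
    penalised = logits.clone()
    penalised[..., new_token_ids] -= penalty
    return penalised


if __name__ == "__main__":
    vocab_size = 100
    new_token_ids = torch.tensor([97, 98, 99])         # hypothetical new-language tokens
    decoder = ToyDecoder(vocab_size)
    freeze_unused_token_embeddings(decoder, used_token_ids=torch.arange(50, 100))

    tokens = torch.randint(50, 100, (2, 5))            # batch of new-language token ids
    decoder(tokens).sum().backward()
    print(decoder.embed.weight.grad[:50].abs().sum())  # 0: masked rows receive no gradient

    logits = decoder(tokens)
    print(suppress_new_token_logits(logits, new_token_ids).shape)  # torch.Size([2, 5, 100])
```

In this sketch the gradient mask keeps embeddings of tokens absent from the new-language data fixed during fine-tuning, while the logit penalty is applied only when transcribing the pre-trained languages; both are intended purely as an illustration of the ideas the abstract names, not as the paper's exact procedure.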