Continual learning optimizations for auto-regressive decoder of multilingual ASR systems


Bibliographic Details
Main Authors: Kwok, Chin Yuen, Yip, Jia Qi, Chng, Eng Siong
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/180315
http://arxiv.org/abs/2407.03645v3
https://www.isca-archive.org/interspeech_2024/index.html
Institution: Nanyang Technological University
Description
Summary: Continual Learning (CL) involves fine-tuning pre-trained models on new data while maintaining performance on the data they were originally trained on. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, designed mainly for computer vision and reinforcement learning tasks, often yield sub-optimal results when applied directly to MASR. We hypothesise that this is because continual learning of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder: decoder-layer gradient surgery, freezing unused token embeddings, suppressing the output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) on the pre-trained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER on the new languages.
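As a rough illustration of two of the listed optimizations, the PyTorch sketch below freezes embedding rows for tokens that never appear in the new-language training data and suppresses the logits of newly added tokens when decoding a pre-trained language. This is not the authors' implementation; the attribute path model.decoder.token_embedding and the index tensors used_token_ids and new_token_ids are assumptions based on a Whisper-style decoder.

import torch

def freeze_unused_token_embeddings(model, used_token_ids):
    # Zero the gradient for embedding rows of tokens that never occur in the
    # new-language training data, so those rows keep their pre-trained values.
    embedding = model.decoder.token_embedding  # assumed Whisper-style embedding table
    unused = torch.ones(embedding.weight.shape[0], dtype=torch.bool)
    unused[used_token_ids] = False

    def zero_unused_rows(grad):
        grad = grad.clone()
        grad[unused] = 0.0  # no update for embeddings of unused tokens
        return grad

    embedding.weight.register_hook(zero_unused_rows)

def suppress_new_tokens(logits, new_token_ids):
    # Mask the logits of tokens added for the new languages so the decoder
    # cannot emit them while transcribing a pre-trained language.
    logits[..., new_token_ids] = float("-inf")
    return logits

In this sketch the gradient hook is registered once before fine-tuning, while suppress_new_tokens would be applied to the decoder's output logits at each decoding step for utterances in the pre-trained languages.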