Continual learning optimizations for auto-regressive decoder of multilingual ASR systems


Bibliographic Details
Main Authors: Kwok, Chin Yuen, Yip, Jia Qi, Chng, Eng Siong
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects:
Online Access:https://hdl.handle.net/10356/180315
http://arxiv.org/abs/2407.03645v3
https://www.isca-archive.org/interspeech_2024/index.html
Institution: Nanyang Technological University
Description
Summary: Continual Learning (CL) involves fine-tuning pre-trained models on new data while maintaining performance on the data they were originally trained on. This is particularly relevant for expanding multilingual ASR (MASR) capabilities. However, existing CL methods, designed mainly for computer vision and reinforcement learning tasks, often yield sub-optimal results when applied directly to MASR. We hypothesise that this is because continual learning of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations on the decoder: decoder-layer gradient surgery, freezing unused token embeddings, suppressing the output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) on the pre-trained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER on the new languages.
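As a rough illustration of two of the listed optimizations, the PyTorch sketch below freezes embedding rows for tokens that never appear in the new-language training data and suppresses the logits of newly added tokens when decoding a pre-trained language. This is not the authors' implementation; the attribute path model.decoder.token_embedding and the index tensors used_token_ids and new_token_ids are assumptions based on a Whisper-style decoder.

import torch

def freeze_unused_token_embeddings(model, used_token_ids):
    # Zero the gradient for embedding rows of tokens that never occur in the
    # new-language training data, so those rows keep their pre-trained values.
    embedding = model.decoder.token_embedding  # assumed Whisper-style embedding table
    unused = torch.ones(embedding.weight.shape[0], dtype=torch.bool)
    unused[used_token_ids] = False

    def zero_unused_rows(grad):
        grad = grad.clone()
        grad[unused] = 0.0  # no update for embeddings of unused tokens
        return grad

    embedding.weight.register_hook(zero_unused_rows)

def suppress_new_tokens(logits, new_token_ids):
    # Mask the logits of tokens added for the new languages so the decoder
    # cannot emit them while transcribing a pre-trained language.
    logits[..., new_token_ids] = float("-inf")
    return logits

In this sketch the gradient hook is registered once before fine-tuning, while suppress_new_tokens would be applied to the decoder's output logits at each decoding step for utterances in the pre-trained languages.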