Bayesian neural network language modeling for speech recognition

State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full B...

Bibliographic Details
Main Authors:	Xue, Boyang; Hu, Shoukang; Xu, Junhao; Geng, Mengzhe; Liu, Xunying; Meng, Helen
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Bayesian Learning; Model Uncertainty
Online Access:	https://hdl.handle.net/10356/164438
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-164438
record_format	dspace
spelling	sg-ntu-dr.10356-1644382023-01-25T04:56:04Z Bayesian neural network language modeling for speech recognition Xue, Boyang Hu, Shoukang Xu, Junhao Geng, Mengzhe Liu, Xunying Meng, Helen School of Computer Science and Engineering Engineering::Computer science and engineering Bayesian Learning Model Uncertainty State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full Bayesian learning framework encompassing three methods is proposed in this paper to account for the underlying uncertainty in LSTM-RNN and Transformer LMs. The uncertainty over their model parameters, choice of neural activations and hidden output representations are modeled using Bayesian, Gaussian Process and variational LSTM-RNN or Transformer LMs respectively. Efficient inference approaches were used to automatically select the optimal network internal components to be Bayesian learned using neural architecture search. A minimal number of Monte Carlo parameter samples as low as one was also used. These allow the computational costs incurred in Bayesian NNLM training and evaluation to be minimized. Experiments are conducted on two tasks: AMI meeting transcription and Oxford-BBC LipReading Sentences 2 (LRS2) overlapped speech recognition using state-of-the-art LF-MMI trained factored TDNN systems featuring data augmentation, speaker adaptation and audio-visual multi-channel beamforming for overlapped speech. Consistent performance improvements over the baseline LSTM-RNN and Transformer LMs with point estimated model parameters and drop-out regularization were obtained across both tasks in terms of perplexity and word error rate (WER). 
In particular, on the LRS2 data, statistically significant WER reductions up to 1.3% and 1.2% absolute (12.1% and 11.3% relative) were obtained over the baseline LSTM-RNN and Transformer LMs respectively after model combination between Bayesian NNLMs and their respective baselines. This work was supported in part by Hong Kong Research Council GRF under Grants 14200218, 14200220, and 14200021 and in part by Innovation and Technology Fund under Grants ITS/254/19 and InP/057/21. 2023-01-25T04:56:04Z 2023-01-25T04:56:04Z 2022 Journal Article Xue, B., Hu, S., Xu, J., Geng, M., Liu, X. & Meng, H. (2022). Bayesian neural network language modeling for speech recognition. IEEE/ACM Transactions On Audio Speech and Language Processing, 30, 2900-2917. https://dx.doi.org/10.1109/TASLP.2022.3203891 2329-9290 https://hdl.handle.net/10356/164438 10.1109/TASLP.2022.3203891 2-s2.0-85137870890 30 2900 2917 en IEEE/ACM Transactions on Audio Speech and Language Processing © 2022 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering Bayesian Learning Model Uncertainty
spellingShingle	Engineering::Computer science and engineering Bayesian Learning Model Uncertainty Xue, Boyang Hu, Shoukang Xu, Junhao Geng, Mengzhe Liu, Xunying Meng, Helen Bayesian neural network language modeling for speech recognition
description	State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. They are prone to overfitting and poor generalization when given limited training data. To this end, an overarching full Bayesian learning framework encompassing three methods is proposed in this paper to account for the underlying uncertainty in LSTM-RNN and Transformer LMs. The uncertainty over their model parameters, choice of neural activations and hidden output representations are modeled using Bayesian, Gaussian Process and variational LSTM-RNN or Transformer LMs respectively. Efficient inference approaches were used to automatically select the optimal network internal components to be Bayesian learned using neural architecture search. A minimal number of Monte Carlo parameter samples as low as one was also used. These allow the computational costs incurred in Bayesian NNLM training and evaluation to be minimized. Experiments are conducted on two tasks: AMI meeting transcription and Oxford-BBC LipReading Sentences 2 (LRS2) overlapped speech recognition using state-of-the-art LF-MMI trained factored TDNN systems featuring data augmentation, speaker adaptation and audio-visual multi-channel beamforming for overlapped speech. Consistent performance improvements over the baseline LSTM-RNN and Transformer LMs with point estimated model parameters and drop-out regularization were obtained across both tasks in terms of perplexity and word error rate (WER). In particular, on the LRS2 data, statistically significant WER reductions up to 1.3% and 1.2% absolute (12.1% and 11.3% relative) were obtained over the baseline LSTM-RNN and Transformer LMs respectively after model combination between Bayesian NNLMs and their respective baselines.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Xue, Boyang Hu, Shoukang Xu, Junhao Geng, Mengzhe Liu, Xunying Meng, Helen
format	Article
author	Xue, Boyang Hu, Shoukang Xu, Junhao Geng, Mengzhe Liu, Xunying Meng, Helen
author_sort	Xue, Boyang
title	Bayesian neural network language modeling for speech recognition
title_short	Bayesian neural network language modeling for speech recognition
title_full	Bayesian neural network language modeling for speech recognition
title_fullStr	Bayesian neural network language modeling for speech recognition
title_full_unstemmed	Bayesian neural network language modeling for speech recognition
title_sort	bayesian neural network language modeling for speech recognition
publishDate	2023
url	https://hdl.handle.net/10356/164438
_version_	1756370602241818624
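
The description above reports the LRS2 gains both as absolute and as relative WER reductions. The two are linked by relative = absolute / baseline, so the following minimal sketch back-computes the baseline WER implied by each pair. It assumes that each absolute/relative pair refers to the same test condition; the record does not state this, and the baseline WERs themselves are not given here. The function name `implied_baseline_wer` is hypothetical and used only for illustration.

```python
def implied_baseline_wer(absolute_reduction: float, relative_reduction: float) -> float:
    """Return the baseline WER (in %) implied by relative = absolute / baseline.

    Both arguments are percentages, e.g. 1.3 (absolute) and 12.1 (relative).
    Assumption: both figures describe the same test condition.
    """
    return absolute_reduction / (relative_reduction / 100.0)


# Figures quoted in the record's description (LRS2 data).
reported = [("LSTM-RNN", 1.3, 12.1), ("Transformer", 1.2, 11.3)]

for name, abs_red, rel_red in reported:
    baseline = implied_baseline_wer(abs_red, rel_red)
    print(f"{name}: implied baseline WER is about {baseline:.1f}%")
```

Under that assumption, the implied baselines come out at roughly 10.7% and 10.6% WER. Because the published reductions are rounded and quoted as "up to", these are approximations rather than values taken from the paper.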
