Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition

Unifying acoustic and linguistic representation learning has become increasingly crucial for transferring the knowledge learned from abundant high-resource language data to low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to resolve the representation discrepancy between speech and text remains unexplored, which hinders the full utilization of acoustic and linguistic information. Moreover, previous works simply replace the embedding layer of the pre-trained language model with acoustic features, which may cause catastrophic forgetting. In this work, we introduce Wav-BERT, a cooperative acoustic and linguistic representation learning method that fuses and exploits the contextual information of speech and text. Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework. A Representation Aggregation Module is designed to aggregate acoustic and linguistic representations, and an Embedding Attention Module is introduced to incorporate acoustic information into BERT, which effectively facilitates the cooperation of the two pre-trained models and thus boosts representation learning. Extensive experiments show that Wav-BERT significantly outperforms existing approaches and achieves state-of-the-art performance on low-resource speech recognition.
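To make the fusion idea concrete, here is a minimal, hypothetical PyTorch sketch of unifying the two pre-trained models. This is an illustration based only on the abstract, not the authors' released code: the checkpoint names (facebook/wav2vec2-base, bert-base-uncased), the gated cross-attention inside RepresentationAggregation, and the CTC projection head are all assumptions made for this example.

```python
# Illustrative sketch only: module names and the fusion design below are
# assumptions inferred from the abstract, not the paper's actual code.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, BertModel

class RepresentationAggregation(nn.Module):
    """Hypothetical stand-in for the paper's Representation Aggregation Module.

    Acoustic frames attend to BERT token states, and the result is blended
    with the original acoustic features through a learned gate.
    """
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, acoustic: torch.Tensor, linguistic: torch.Tensor) -> torch.Tensor:
        # acoustic: (B, T, dim) wav2vec 2.0 frames; linguistic: (B, L, dim) BERT states
        attended, _ = self.cross_attn(query=acoustic, key=linguistic, value=linguistic)
        g = torch.sigmoid(self.gate(torch.cat([acoustic, attended], dim=-1)))
        return g * acoustic + (1.0 - g) * attended  # gated fusion, (B, T, dim)

class WavBertSketch(nn.Module):
    """End-to-end trainable wav2vec 2.0 + BERT with a CTC-style head on top."""
    def __init__(self, vocab_size: int = 32):
        super().__init__()
        self.acoustic = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.linguistic = BertModel.from_pretrained("bert-base-uncased")
        self.aggregate = RepresentationAggregation(dim=768)
        self.ctc_head = nn.Linear(768, vocab_size)  # projects fused frames to output units

    def forward(self, input_values: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
        a = self.acoustic(input_values).last_hidden_state   # (B, T, 768)
        l = self.linguistic(input_ids).last_hidden_state    # (B, L, 768)
        return self.ctc_head(self.aggregate(a, l))          # (B, T, vocab_size)

if __name__ == "__main__":
    model = WavBertSketch()
    wave = torch.randn(1, 16000)                # 1 s of 16 kHz audio
    tokens = torch.randint(0, 30000, (1, 12))   # a 12-token hypothesis transcript
    print(model(wave, tokens).shape)            # torch.Size([1, 49, 32])
```

Note that both pre-trained encoders stay trainable in this sketch, so gradients from a downstream loss would flow into wav2vec 2.0 and BERT alike, matching the abstract's "end-to-end trainable framework" at a high level.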

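The abstract also contrasts its Embedding Attention Module with prior work that overwrites BERT's embedding layer with acoustic features, a known trigger for catastrophic forgetting. One plausible (again, hypothetical) reading is that acoustic context is attended into the word embeddings while a residual path preserves them; the class name EmbeddingAttention and the residual design below are this example's assumptions, not details given in the abstract.

```python
# Hypothetical illustration: blend acoustic context into BERT's word
# embeddings before the encoder runs, instead of replacing them outright.
import torch
import torch.nn as nn
from transformers import BertModel

class EmbeddingAttention(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, word_emb: torch.Tensor, acoustic: torch.Tensor) -> torch.Tensor:
        # Word embeddings query the acoustic frames; the residual connection
        # keeps BERT's original embeddings intact.
        ctx, _ = self.attn(query=word_emb, key=acoustic, value=acoustic)
        return word_emb + ctx

bert = BertModel.from_pretrained("bert-base-uncased")
fuse = EmbeddingAttention()

input_ids = torch.randint(0, 30000, (1, 12))
acoustic = torch.randn(1, 49, 768)                  # e.g. wav2vec 2.0 frames
emb = bert.embeddings(input_ids=input_ids)          # (1, 12, 768)
hidden = bert.encoder(fuse(emb, acoustic)).last_hidden_state
print(hidden.shape)                                 # torch.Size([1, 12, 768])
```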

Bibliographic Details
Main Authors: ZHENG, Guolin, XIAO, Yubei, GONG, Ke, ZHOU, Pan, LIANG, Xiaodan, LIN, Liang
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: Graphics and Human Computer Interfaces; Programming Languages and Compilers
Online Access:https://ink.library.smu.edu.sg/sis_research/9000
https://ink.library.smu.edu.sg/context/sis_research/article/10003/viewcontent/2021_EMNLP_Wav_BERT.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
DOI: 10.18653/V1/2021.FINDINGS-EMNLP.236
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)