Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages

This paper presents a novel acoustic modeling technique for large vocabulary automatic speech recognition of under-resourced languages that leverages well-trained acoustic models of other languages (called source languages). The idea is to use a source language acoustic model to score the acoustic feat...

Bibliographic Details
Main Authors:	Do, Van Hai; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
Other Authors:	School of Computer Engineering
Format:	Article
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/100818 http://hdl.handle.net/10220/19586
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-100818
record_format	dspace
spelling	sg-ntu-dr.10356-100818 2020-09-26T22:17:40Z Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages Do, Van Hai; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou School of Computer Engineering Temasek Laboratories DRNTU::Engineering::Computer science and engineering Published version 2014-06-09T07:37:26Z 2019-12-06T20:28:54Z 2014 Journal Article DO, V. H., XIAO, X., CHNG, E. S., & LI, H. (2014). Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages. IEICE Transactions on Information and Systems, E97.D(2), 285-295. 0916-8532 https://hdl.handle.net/10356/100818 http://hdl.handle.net/10220/19586 10.1587/transinf.E97.D.285 en IEICE transactions on information and systems © 2014 The Institute of Electronics, Information and Communication Engineers. This paper was published in IEICE Transactions on Information and Systems and is made available as an electronic reprint (preprint) with permission of The Institute of Electronics, Information and Communication Engineers. The paper can be found at the following official DOI: http://dx.doi.org/10.1587/transinf.E97.D.285. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law. application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Engineering::Computer science and engineering; Do, Van Hai; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou; Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages
description	This paper presents a novel acoustic modeling technique for large vocabulary automatic speech recognition of under-resourced languages that leverages well-trained acoustic models of other languages (called source languages). The idea is to use a source language acoustic model to score the acoustic features of the target language, and then map these scores to the posteriors of the target phones using a classifier. The target phone posteriors are then used for decoding in the usual way of hybrid acoustic modeling. The motivation for this strategy is that human languages usually share similar phone sets, so it may be easier to predict the target phone posteriors from the scores generated by source language acoustic models than to train an under-resourced language acoustic model from scratch. The proposed method is evaluated on the Aurora-4 task with less than 1 hour of training data. Two types of source language acoustic models are considered, i.e. hybrid HMM/MLP and conventional HMM/GMM models. In addition, we also use triphone tied states in the mapping. Our experimental results show that by leveraging well-trained Malay and Hungarian acoustic models, we achieve a 9.0% word error rate (WER) given 55 minutes of English training data. This is close to the WER of 7.9% obtained by using the full 15 hours of training data and much better than the WER of 14.4% obtained by conventional acoustic modeling techniques with the same 55 minutes of training data.
author2	School of Computer Engineering
author_facet	School of Computer Engineering; Do, Van Hai; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
format	Article
author	Do, Van Hai; Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
author_sort	Do, Van Hai
title	Cross-lingual phone mapping for large vocabulary speech recognition of under-resourced languages
publishDate	2014
url	https://hdl.handle.net/10356/100818 http://hdl.handle.net/10220/19586
_version_	1681057324602490880
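
The mapping step summarized in the description can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain linear softmax classifier as a stand-in for the unspecified "classifier", uses synthetic data in place of real source-model scores, and every name in it (`train_phone_mapping`, `scaled_likelihoods`, the toy dimensions) is hypothetical. It shows only the general shape of the idea: source-language scores go in, target phone posteriors come out, and the posteriors are divided by phone priors as is customary in hybrid decoding.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_phone_mapping(src_scores, tgt_labels, n_tgt, lr=0.2, epochs=300):
    """Fit a linear softmax map from source-model scores to target phones.

    Assumption: a single linear layer trained by batch gradient descent;
    the paper's abstract only says "a classifier".
    """
    n_frames, n_src = src_scores.shape
    W = np.zeros((n_src, n_tgt))
    b = np.zeros(n_tgt)
    onehot = np.eye(n_tgt)[tgt_labels]
    for _ in range(epochs):
        post = softmax(src_scores @ W + b)
        grad = (post - onehot) / n_frames  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (src_scores.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def map_to_target_posteriors(src_scores, W, b):
    """Turn source-language scores into target phone posteriors."""
    return softmax(src_scores @ W + b)

def scaled_likelihoods(tgt_post, tgt_priors, floor=1e-8):
    """Hybrid convention: p(x | phone) is proportional to p(phone | x) / p(phone)."""
    return tgt_post / np.maximum(tgt_priors, floor)

# Synthetic stand-in data (hypothetical sizes): 6 source phones, 4 target phones.
rng = np.random.default_rng(0)
n_src, n_tgt, n_frames = 6, 4, 400
tgt_labels = rng.integers(0, n_tgt, size=n_frames)
# Each target phone excites a fixed pattern of source phones, plus frame noise.
pattern = rng.normal(size=(n_tgt, n_src))
src_scores = np.log(softmax(3.0 * pattern[tgt_labels] + rng.normal(size=(n_frames, n_src))))
# Standardize each source dimension so the fixed learning rate stays stable.
src_scores = (src_scores - src_scores.mean(axis=0)) / src_scores.std(axis=0)

W, b = train_phone_mapping(src_scores, tgt_labels, n_tgt)
tgt_post = map_to_target_posteriors(src_scores, W, b)
priors = np.bincount(tgt_labels, minlength=n_tgt) / n_frames
lik = scaled_likelihoods(tgt_post, priors)
frame_accuracy = float((tgt_post.argmax(axis=1) == tgt_labels).mean())
print(f"frame accuracy on the toy training set: {frame_accuracy:.2f}")
```

In a real system the rows of `src_scores` would come from the source-language acoustic models (for example, per-frame phone posteriors or likelihood scores), and `lik` would be passed to an HMM decoder; both of those stages are outside this sketch.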
