A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection

Bibliographic Details
Main Authors:	Huang, Hao, Xu, Haihua, Hu, Ying, Zhou, Gang
Other Authors:	Temasek Laboratories
Format:	Article
Language:	English
Published:	2017
Subjects:	Acoustic Analysis Speech Recognition
Online Access:	https://hdl.handle.net/10356/86625 http://hdl.handle.net/10220/44162
Institution:	Nanyang Technological University
Citation:	Huang, H., Xu, H., Hu, Y., & Zhou, G. (2017). A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection. The Journal of the Acoustical Society of America, 142(5), 3165-3177.
Journal:	The Journal of the Acoustical Society of America
ISSN:	0001-4966
DOI:	10.1121/1.5011159
Version:	Published version (Journal Article)
Extent:	13 p. (application/pdf)
Rights:	© 2017 Acoustical Society of America. This paper was published in Journal of the Acoustical Society of America and is made available as an electronic reprint (preprint) with permission of Acoustical Society of America. The published version is available at: [http://dx.doi.org/10.1121/1.5011159]. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper is prohibited and is subject to penalties under law.

Description

Goodness of pronunciation (GOP) is the most widely used method for automatic mispronunciation detection. This paper proposes a transfer learning approach to GOP-based mispronunciation detection for the case where maximum F1-score criterion (MFC) training is applied to deep neural network (DNN)-hidden Markov model (HMM) acoustic models. Rather than training the whole network with MFC, the approach uses a DNN whose hidden layers are borrowed from a native-speech recognition model, and only the softmax layer is trained according to the MFC objective function. This yields a significant improvement in mispronunciation detection. Building on this result, the two-stage transfer-learning-based GOP is investigated in depth. The first stage uses the hidden layer(s) to extract phonetically discriminative features; the second stage uses a trainable softmax layer to learn the human standard for judgment. The approach is validated by experimenting with different mispronunciation detection architectures using acoustic models trained with different criteria. The experiments show that frame-level cross-entropy is the preferable criterion for training the hidden-layer parameters. Classifier-based mispronunciation detection using features computed by transfer-learning-based GOP is also examined, and these features are likewise shown to yield better results.
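The two-stage structure described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the ReLU activation, the frame-averaged log-posterior-ratio form of GOP, and the sigmoid-smoothed F1 used as a stand-in for the MFC objective. Every name in the sketch (`TransferGOPModel`, `soft_f1`, `W_out`, and so on) is hypothetical. The hidden layers act as fixed feature extractors borrowed from a native-speech model. Only the softmax layer's `W_out` and `b_out` would be updated when optimizing the F1-style objective, and the training loop itself is omitted.

```python
import math

def relu(v):
    # Assumed hidden activation; the paper's DNN may use a different nonlinearity.
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    # y = W x + b, with W stored as a list of rows.
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def log_softmax(z):
    # Numerically stable log-softmax.
    m = max(z)
    log_norm = m + math.log(sum(math.exp(zi - m) for zi in z))
    return [zi - log_norm for zi in z]

def sigmoid(x):
    # Overflow-safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

class TransferGOPModel:
    """Hypothetical two-stage model.

    Stage 1: hidden layers borrowed from a native-speech DNN, kept frozen.
    Stage 2: a softmax layer (W_out, b_out), the only parameters an
    MFC-style objective would update.
    """

    def __init__(self, hidden_layers, W_out, b_out):
        self.hidden_layers = hidden_layers  # list of (W, b) pairs, never modified
        self.W_out = W_out
        self.b_out = b_out

    def features(self, frame):
        # Stage 1: feature extraction through the frozen hidden layers.
        h = frame
        for W, b in self.hidden_layers:
            h = relu(affine(W, b, h))
        return h

    def log_posteriors(self, frame):
        # Stage 2: phone log-posteriors from the trainable softmax layer.
        return log_softmax(affine(self.W_out, self.b_out, self.features(frame)))

    def gop(self, frames, phone):
        # Assumed GOP: frame-averaged log posterior of the canonical phone
        # minus the best log posterior on that frame (always <= 0).
        total = 0.0
        for frame in frames:
            lp = self.log_posteriors(frame)
            total += lp[phone] - max(lp)
        return total / len(frames)

def soft_f1(scores, labels, threshold, sharpness=10.0):
    # Smoothed F1 for detecting mispronunciations (label 1 = mispronounced).
    # A GOP score below the threshold pushes the soft decision toward 1.
    tp = fp = fn = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(sharpness * (threshold - s))
        tp += p * y
        fp += p * (1 - y)
        fn += (1 - p) * y
    denom = 2.0 * tp + fp + fn
    return 0.0 if denom == 0.0 else 2.0 * tp / denom

# Toy example: 2-dim frames, one frozen hidden layer, 3 phone classes.
hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
model = TransferGOPModel(hidden, [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
frames = [[1.0, 0.0], [0.9, 0.1]]
scores = [model.gop(frames, 0), model.gop(frames, 1)]  # approx [0.0, -1.8]
print(scores, soft_f1(scores, [0, 1], threshold=-0.9))
```

The log-softmax normalizer cancels in the difference, so this GOP variant is zero when the canonical phone has the highest posterior on every frame. It becomes more negative as competing phones dominate. A phone is flagged as mispronounced when its score falls below the threshold. Because the soft decisions are differentiable in the scores, an F1-style objective of this kind could in principle be maximized with respect to the softmax parameters alone.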

Similar Items