The NNI Vietnamese Speech Recognition System for MediaEval 2016

Bibliographic Details
Main Authors: Xiao, Xiong, Nwe, Tin Lay, Chng, Eng Siong, Ma, Bin, Li, Haizhou, Wang, Lei, Ni, Chongjia, Leung, Cheung-Chi, You, Changhuai, Xie, Lei, Xu, Haihua
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2019
Online Access: https://hdl.handle.net/10356/79851
http://hdl.handle.net/10220/48316
http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_52.pdf
Institution: Nanyang Technological University
Description
Summary: This paper provides an overall description of the Vietnamese speech recognition system developed by the joint team for MediaEval 2016. The submitted system consisted of three subsystems and adopted several deep neural network (DNN)-based techniques, such as fMLLR-transformed bottleneck features and sequence training. Besides these acoustic modeling techniques, speech data augmentation was also examined to build a more robust acoustic model. The I2R team collected a number of text resources from the Internet and made them available to the other participants in the task. This crawled web text was used to train a 5-gram language model. The submitted system obtained token error rates (TER) of 15.1, 23.0, and 50.5 on the Devel local, Devel, and Test sets, respectively.
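The TER figures reported in the summary are edit-distance scores. As an illustration only, the sketch below assumes TER is computed like word error rate over whitespace-separated tokens (for Vietnamese, typically syllables): the minimum number of substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the reference length and expressed as a percentage. The function name `token_error_rate` is hypothetical, and the official MediaEval 2016 scoring tool may apply its own text normalization that this sketch does not reproduce.

```python
def token_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Return TER as a percentage: (S + D + I) / N * 100.

    Assumption: TER is a Levenshtein distance over tokens, normalized by the
    number of reference tokens N (same definition as word error rate).
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference tokens
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution or match
            )
    return 100.0 * dist[n][m] / n


ref = "tôi đi học hôm nay".split()
hyp = "tôi đi hoc nay".split()
print(token_error_rate(ref, hyp))  # 1 substitution + 1 deletion over 5 tokens -> 40.0
```

Because TER is normalized by the reference length rather than the hypothesis length, it can exceed 100 when the recognizer inserts many spurious tokens.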