DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING

The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn the model parameters effectively. A large amount...

Bibliographic Details
Main Author:	YUWAN (NIM : 23516027), RAHMI
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/30162
Institution:	Institut Teknologi Bandung

id	id-itb.:30162
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn the model parameters effectively. However, a large amount of training data does not guarantee that every sample contains information useful for building a good model. Active learning can be used to select samples based on their informativeness. This research aims to build a DNN-based acoustic model for recognizing Indonesian spontaneous speech. The word error rate (WER) is used as the performance metric to compare the DNN-HMM model with the baseline GMM-HMM model. The research also includes experiments that use active learning to determine how much the data contribute to reducing WER for both models. This research uses the speech corpus of previous studies with additional data. The system is tested under closed and open schemes with respect to the language model. The training corpus for the acoustic model contains 35.17 hours of speech data with 14,572 utterances from 239 speakers. This corpus is also used as the baseline set for active learning. The test corpus consists of 1,989 randomly selected utterances (3.6 hours of speech) spoken by 10% of the total speakers. The DNN-HMM model achieves performance improvements of 2.53% and 3.89% over the triphone-based GMM-HMM in closed and open testing, respectively. Active learning for the GMM-HMM model achieves comparable performance using only 54% of the data in the baseline set. Meanwhile, increasing the amount of data for DNN modeling improves ASR performance.
format	Theses
author	YUWAN (NIM : 23516027), RAHMI
title	DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING
url	https://digilib.itb.ac.id/gdl/view/30162
_version_	1823636521292398592
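
The description names word error rate (WER) as the metric for comparing the DNN-HMM and GMM-HMM models. As a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words), and not the thesis's own scoring tool, the function below may help. The function name `word_error_rate` and the sample sentences are illustrative assumptions and do not come from the record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only: tokens are split on whitespace, with no text
    normalization, which a real ASR scoring pipeline would normally apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example sentences, not taken from the thesis corpus.
    reference = "saya pergi ke pasar"
    hypothesis = "saya pergi pasar"
    print(word_error_rate(reference, hypothesis))  # one deletion over four words: 0.25
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. The record does not say whether the reported 2.53% and 3.89% figures are absolute or relative differences, so the sketch makes no attempt to reproduce them.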

Similar Items