DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING
The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn its model parameters effectively. A large amount...
Main Author: | YUWAN (NIM : 23516027), RAHMI |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/30162 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:30162 |
---|---|
spelling |
id-itb.:30162 2018-03-16T14:01:41Z DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING YUWAN (NIM : 23516027), RAHMI Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/30162 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn its model parameters effectively. However, a large amount of training data does not guarantee that every sample contains information useful for building a good model. Active learning can be used to select training samples based on their informativeness.

This research aims to build a DNN-based acoustic model for recognizing Indonesian spontaneous speech. The word error rate (WER) is used as the performance metric to compare the DNN-HMM model with the baseline GMM-HMM model. The research also includes active learning experiments to determine how much the data contributes to reducing WER for both models.

This research uses the speech corpus from previous studies with additional data. The system is tested under closed and open schemes with respect to the language model. The training corpus for the acoustic model contains 35.17 hours of speech in 14,572 utterances from 239 speakers. This corpus also serves as the baseline set for active learning. The test corpus consists of 1,989 randomly selected utterances, 3.6 hours of speech in total, spoken by 10% of the speakers.

The DNN-HMM model improves performance by 2.53% and 3.89% over the triphone-based GMM-HMM in closed and open testing, respectively. Active learning for the GMM-HMM model shows balanced performance while using only 54% of the data in the baseline set. Meanwhile, increasing the amount of data for DNN modeling improves ASR performance. |
format |
Theses |
author |
YUWAN (NIM : 23516027), RAHMI |
title |
DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING |
url |
https://digilib.itb.ac.id/gdl/view/30162 |
_version_ |
1821995660407209984 |
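
The abstract in this record compares models by word error rate (WER). As a rough illustration only, the sketch below computes WER in its commonly used form: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical; the record does not describe the scoring tool the thesis actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, not taken from the thesis corpus.
wer = word_error_rate("saya mau makan nasi", "saya makan nasi goreng")
print(f"WER = {wer:.2f}")
```

In this example the hypothesis drops one reference word and adds one extra word, so two edits over four reference words are counted. Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.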