DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING

The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn the model parameters effectively. A large amount...

Bibliographic Details
Main Author: YUWAN (NIM : 23516027), RAHMI
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/30162
Institution: Institut Teknologi Bandung
id id-itb.:30162
spelling id-itb.:30162 2018-03-16T14:01:41Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description The shortcomings of the Gaussian Mixture Model (GMM) in modeling spontaneous speech make the Deep Neural Network (DNN) an alternative technique for acoustic modeling. A DNN needs an enormous amount of training data to learn the model parameters effectively. However, a large amount of training data does not guarantee that every sample contains the information needed to build a good model. Active learning can be used to select samples based on their informativeness. This research aims to build a DNN-based acoustic model for recognizing Indonesian spontaneous speech. The word error rate (WER) is used as the performance metric to compare the DNN-HMM model with the baseline GMM-HMM model. The research also includes active learning experiments to determine how much the data contributes to reducing WER for both models. This research uses the speech corpus from previous studies, extended with additional data. The system is tested using closed and open schemes with respect to the language model. The training corpus for the acoustic model contains 35.17 hours of speech in 14,572 utterances from 239 speakers. This corpus is also used as the baseline set for active learning. The test corpus consists of 1,989 randomly selected utterances, 3.6 hours of speech in total, spoken by 10% of the speakers. The DNN-HMM model shows a performance improvement of 2.53% and 3.89% over the triphone-based GMM-HMM in closed and open testing, respectively. Active learning for the GMM-HMM model achieves comparable performance using only 54% of the data in the baseline set. Meanwhile, increasing the amount of data for DNN modeling improves ASR performance.
format Theses
author YUWAN (NIM : 23516027), RAHMI
title DEEP NEURAL NETWORK ACOUSTIC MODELING FOR INDONESIAN SPONTANEOUS SPEECH RECOGNITION WITH ACTIVE LEARNING
url https://digilib.itb.ac.id/gdl/view/30162
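
The abstract compares the models by word error rate (WER) but does not say how it was computed. As a minimal sketch, assuming the standard definition (word-level edit distance between reference and hypothesis, divided by the number of reference words), WER can be computed as follows. The function name and the example sentences are hypothetical and are not taken from the thesis or its corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly;
    the thesis may have applied different text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one reference word ("ke") is dropped.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.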