INDONESIAN SPONTANEOUS SPEECH RECOGNITION SYSTEM USING DEEP NEURAL NETWORKS
The existing Indonesian speech recognition system still performs poorly on spontaneous speech. That system was trained with an HMM-GMM acoustic model. In this study, 14 hours of Indonesian spontaneous speech data were collected, and speech recognition syst...
Main Author: | Arif Rahman, Dandy |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/48149 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:48149 |
---|---|
keywords |
neural network, CNN, DNN, TDNN, acoustic model, WER |
timestamp |
2020-06-26T22:34:24Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
The existing Indonesian speech recognition system still performs poorly on spontaneous speech. That system was trained with an HMM-GMM acoustic model. In this study, 14 hours of Indonesian spontaneous speech data were collected, and recognition performance was improved by replacing the acoustic model with a neural network-based model. The neural network topologies used are the Deep Neural Network (DNN), the Convolutional Neural Network (CNN), and the Time Delay Neural Network (TDNN).
The baseline is an HMM-GMM acoustic model trained on dictated speech, which yields a word error rate (WER) of 73.87%. Training the model on noise-augmented data lowers the WER to 71.15%. Applying an adaptation technique to the model lowers the WER to 62.75%, and adding noise augmentation to the adapted model lowers it to 62.16%. Training on a mix of dictated and spontaneous speech lowers the WER further to 57.59%. The acoustic model was then replaced with a neural network-based model: the DNN model reaches a WER of 50.02% and the CNN model 47.58%. The lowest WER, 40.63%, was obtained with the TDNN acoustic model.
|
author |
Arif Rahman, Dandy |
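The abstract reports every result as a word error rate (WER). The record does not say how the WER was computed, so the sketch below assumes the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function names, the dictionary labels, and the example sentence are illustrative and do not come from the thesis; only the WER percentages are taken from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference words.

    Assumes the standard WER definition; the thesis record does not
    describe its scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# WER values in percent, copied from the abstract. The labels are paraphrases.
REPORTED_WER = {
    "HMM-GMM, dictated speech (baseline)": 73.87,
    "HMM-GMM + noise augmentation": 71.15,
    "HMM-GMM + adaptation": 62.75,
    "HMM-GMM + adaptation + noise augmentation": 62.16,
    "HMM-GMM, dictated + spontaneous speech": 57.59,
    "DNN": 50.02,
    "CNN": 47.58,
    "TDNN": 40.63,
}


def relative_reduction(baseline: float, value: float) -> float:
    """Relative WER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - value) / baseline


if __name__ == "__main__":
    baseline = REPORTED_WER["HMM-GMM, dictated speech (baseline)"]
    for name, wer in REPORTED_WER.items():
        reduction = relative_reduction(baseline, wer)
        print(f"{name:45s} WER {wer:6.2f}%  relative reduction {reduction:5.1f}%")

    # Made-up example: one of four reference words is missing from the hypothesis.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Read this way, the move from the 73.87% baseline to the 40.63% TDNN result is a relative WER reduction of roughly 45%.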