INDONESIAN SPONTANEOUS SPEECH RECOGNITION SYSTEM USING DEEP NEURAL NETWORKS

Bibliographic Details
Main Author: Arif Rahman, Dandy
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/48149
Institution: Institut Teknologi Bandung
Record ID: id-itb.:48149
Record timestamp: 2020-06-26T22:34:24Z
Keywords: neural network, CNN, DNN, TDNN, acoustic model, WER
Building: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia (Asia)
Description: The existing Indonesian speech recognition system, which was trained with an HMM-GMM acoustic model, still has poor accuracy on spontaneous speech. In this study, 14 hours of spontaneous Indonesian speech data were collected, and recognition performance was improved by replacing the acoustic model with a neural network-based model. Three neural network topologies were used: a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Time Delay Neural Network (TDNN). The baseline is an HMM-GMM acoustic model trained on dictated speech, which obtained a word error rate (WER) of 73.87%. Training the model on noise-augmented data lowered the WER to 71.15%. Applying an adaptation technique to the model lowered the WER to 62.75%, and adding noise augmentation to the adapted model lowered it further to 62.16%. In subsequent experiments, training the model on a mix of dictated and spontaneous speech lowered the WER to 57.59%. The acoustic model was then replaced with neural network-based models: the DNN model reached a WER of 50.02% and the CNN model 47.58%. The lowest WER, 40.63%, was obtained with the TDNN topology.
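Every result in this record is reported as a word error rate (WER). The Python sketch below shows how WER is commonly computed: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It also converts the reported baseline and TDNN figures into a relative reduction. The function names and the toy sentence pair are hypothetical illustrations, not code or data from the project, and the project's actual scoring tool may differ in details such as text normalization.

```python
"""Word error rate (WER) as a word-level Levenshtein distance.

A minimal illustrative sketch; function names are hypothetical and
not taken from the project's toolkit.
"""


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N, where N is the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up Indonesian pair: "ke" is deleted, "besok" becomes "besar".
    ref = "saya mau pergi ke pasar besok"
    hyp = "saya mau pergi pasar besar"
    print(word_error_rate(ref, hyp))
    # Baseline (HMM-GMM) and TDNN WER values from the record, in percent.
    print(round(relative_reduction(73.87, 40.63), 3))
```

On the toy pair, two edits against six reference words give a WER of one third. Applied to the reported figures, going from 73.87% to 40.63% WER is a relative reduction of about 45%.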