CONVOLUTIONAL NEURAL NETWORK ACOUSTIC MODEL FOR ROBUSTNESS OF INDONESIAN SPEECH RECOGNITION IN STATIONARY NOISE ENVIRONMENT

Bibliographic Details
Main Author: Jerremy Budiman, Marvin
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/39718
Institution: Institut Teknologi Bandung
id id-itb.:39718
timestamp 2019-06-27T14:28:29Z
keywords noise, acoustic model, speech recognition
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesian
description Noise reduces the accuracy of speech recognition systems. Several techniques have been proposed to overcome this problem. One is to use an artificial neural network (ANN) as the acoustic model; the convolutional neural network (CNN) is an ANN variant that has been used for acoustic modeling. Another is to pre-process the speech signal or the acoustic features extracted from it. Cepstral mean and variance normalization (CMVN) is one such pre-processing technique, and it has been shown to improve speech recognition accuracy. In this thesis, CNN acoustic models were built on CMVN-pre-processed acoustic features to make a noise-robust speech recognition system. Two models were built, each to handle 2 kinds of noise (babble noise and street noise). The models were tested with noisy speech at different signal-to-noise ratio (SNR) values, and the results were compared with those of GMM-HMM acoustic models. Accuracy increased when the models were trained on more varied training data and decreased when they were tested on speech with lower SNR. The comparison of CNN and GMM-HMM acoustic models gave insight into how the choice of acoustic feature affects model accuracy: CNN acoustic models built on FBANK features scored higher than GMM-HMM models built on the same features.
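As an illustration of the CMVN step named in the description, the following is a minimal sketch and not the thesis implementation. It assumes per-utterance normalization over a (frames x coefficients) feature matrix; the record does not say whether statistics were computed per utterance or per speaker. The function name cmvn, the eps parameter, and the synthetic 200 x 40 input are hypothetical choices made only for this example.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization (assumed variant).

    features: array of shape (num_frames, num_coeffs), e.g. FBANK frames.
    Each coefficient (column) is shifted to zero mean and scaled to unit
    variance using statistics computed over the frames of this utterance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # eps guards against division by zero for a constant coefficient.
    return (features - mean) / (std + eps)


# Synthetic stand-in for one utterance: 200 frames of 40 coefficients.
# Real input would come from a feature extractor, which is not shown here.
rng = np.random.default_rng(0)
fbank = rng.normal(loc=5.0, scale=3.0, size=(200, 40))
normalized = cmvn(fbank)
```

After normalization every coefficient has approximately zero mean and unit variance across the utterance, which removes constant offsets and scale differences in the features before they reach the acoustic model.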