CONVOLUTIONAL NEURAL NETWORK ACOUSTIC MODEL FOR ROBUSTNESS OF INDONESIAN SPEECH RECOGNITION IN STATIONARY NOISE ENVIRONMENT
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/39718 |
Institution: | Institut Teknologi Bandung |
Summary: | Noise reduces the accuracy of speech recognition systems. Several
techniques have been proposed to overcome this problem. One is to use an
artificial neural network (ANN) as the acoustic model; the convolutional
neural network (CNN) is an ANN variant that has been used for acoustic
modeling. Another is to pre-process the speech signal or the acoustic features
extracted from it. Cepstral mean and variance normalization (CMVN) is one such
pre-processing technique, and it has been shown to improve speech recognition
accuracy.
In this thesis, CNN acoustic models were built from CMVN pre-processed
acoustic features to make a noise-robust speech recognition system. Two models
were built to handle two kinds of noise (babble noise and street noise). The
acoustic models were tested with noisy speech at different signal-to-noise
ratio (SNR) values. Test results from the CNN acoustic models were compared
with those from GMM-HMM acoustic models.
The test results showed that accuracy increased when the models were trained
on more varied training data. Conversely, accuracy decreased when the models
were tested on speech with a lower SNR. Comparing the CNN and GMM-HMM acoustic
models gave insight into how the choice of acoustic feature affects model
accuracy. CNN acoustic models built on FBANK features achieved higher accuracy
than GMM-HMM models built on the same features. |
---|
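
The summary names two operations that are easy to state concretely: CMVN of the acoustic features, and noisy test speech at a chosen SNR. The sketch below illustrates both in plain Python. It is not the thesis implementation. The function names `cmvn` and `mix_at_snr` are hypothetical, the normalization is assumed to be per utterance, and the noisy speech is assumed to be produced by adding scaled noise to clean speech; the record does not state either detail.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance CMVN (assumed scope): normalize each feature
    dimension to zero mean and unit variance across all frames.

    frames: non-empty list of equal-length feature vectors, one per frame.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    # eps guards against division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB.

    Assumes both signals are sample lists of the same length and that
    the noise is not all zeros.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # SNR_dB = 10 * log10(p_speech / (scale^2 * p_noise)), solved for scale
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * x for s, x in zip(speech, noise)]


if __name__ == "__main__":
    feats = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
    print(cmvn(feats))
    clean = [1.0, -1.0] * 50
    babble = [0.5, 0.5, -0.5, -0.5] * 25  # stand-in for a real noise recording
    print(mix_at_snr(clean, babble, snr_db=10)[:4])
```

A lower `snr_db` gives a larger noise scale, which corresponds to the harder test conditions under which the summary reports lower accuracy.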