DEEP DENOISING BABBLE NOISE USING A COMBINATION OF CNN AND RNN
Babble noise, or chattering noise, is a type of audio noise that is difficult to suppress. This study addresses it with a deep learning model, specifically a combination of CNN- and RNN-based architectures. This combination was chosen because...
Main Author: | Nurul Hukmi, Imam |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/76866 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:76866 |
---|---|
spelling |
id-itb.:76866 2023-08-19T09:44:17Z DEEP DENOISING BABBLE NOISE USING A COMBINATION OF CNN AND RNN Nurul Hukmi, Imam Indonesia Final Project denoising, deep learning, CNN, RNN, combination of CNN and RNN INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/76866 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Babble noise, or chattering noise, is a type of audio noise that is difficult to suppress. This study
addresses it with a deep learning model, specifically a combination of CNN- and RNN-based
architectures. This combination was chosen for the complementary strengths of each type: CNN-based
architectures handle spatial data, generalize better, and run inference quickly, while RNN-based
architectures specialize in sequential data. The model is trained to identify the noise contained in a
noisy signal represented as a spectrogram. To assess its performance, the combined model is compared
with a CNN-only model and an RNN-only model. For the combined CNN-RNN model, which is the main
model, experiments were carried out to find the best architecture configuration. The evaluation metric
is PESQ (Perceptual Evaluation of Speech Quality), a metric designed to resemble how humans judge
signal quality; it assesses the difference between the noisy signal and the clean reference signal.
The experiments show that the best configuration for the combined model uses a U-Net CNN
architecture, the PReLU activation function, and GRU as the RNN layer type. After training, the
CNN-RNN, CNN, and RNN models achieved PESQ values of 2.23, 1.95, and 1.58, respectively.
Follow-up experiments show that the combined model produced better output than the other two
models on data with SNR levels within its training range. |
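The abstract refers to data at particular SNR levels but does not say how the noisy inputs were prepared. As a rough illustration only, the sketch below shows one common way to mix a clean signal with babble-like noise at a chosen SNR, using the definition SNR_dB = 10 log10(P_clean / P_noise). The function names, the 16 kHz sample rate, the 5 dB target, and the synthetic sine-based stand-ins for speech and babble are all assumptions made for this example and are not taken from the thesis.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence."""
    return sum(s * s for s in signal) / len(signal)


def snr_db(clean, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_clean / P_noise)."""
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR.

    Returns (noisy_signal, scaled_noise). Hypothetical helper, not from the thesis.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean, p_noise = mean_power(clean), mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must not be silent")
    # Solve 10*log10(p_clean / (gain^2 * p_noise)) = target_snr_db for gain.
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def toy_babble(num_samples, sample_rate, num_talkers=6, seed=0):
    """Crude stand-in for babble: a sum of tones with random pitch and phase."""
    rng = random.Random(seed)
    voices = [(rng.uniform(100.0, 400.0), rng.uniform(0.0, 2.0 * math.pi))
              for _ in range(num_talkers)]
    return [sum(math.sin(2.0 * math.pi * f * t / sample_rate + ph)
                for f, ph in voices)
            for t in range(num_samples)]


SAMPLE_RATE = 16000  # assumed sample rate for the example
# One second of a 220 Hz tone as a placeholder for clean speech.
clean = [math.sin(2.0 * math.pi * 220.0 * t / SAMPLE_RATE)
         for t in range(SAMPLE_RATE)]
babble = toy_babble(len(clean), SAMPLE_RATE)
noisy, scaled = mix_at_snr(clean, babble, target_snr_db=5.0)
print(round(snr_db(clean, scaled), 3))
```

In a real pipeline the placeholder signals would be replaced by recorded speech and recorded multi-talker noise, and the mixture would then be converted to a spectrogram before being passed to the model.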
format |
Final Project |
author |
Nurul Hukmi, Imam |
title |
DEEP DENOISING BABBLE NOISE USING A COMBINATION OF CNN AND RNN |
url |
https://digilib.itb.ac.id/gdl/view/76866 |