DEEP DENOISING BABBLE NOISE USING A COMBINATION OF CNN AND RNN

Main Author: Nurul Hukmi, Imam
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76866
Institution: Institut Teknologi Bandung
id id-itb.:76866
timestamp 2023-08-19T09:44:17Z
keywords denoising, deep learning, CNN, RNN, combination of CNN and RNN
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Babble noise, the background chatter of many simultaneous speakers, is a type of audio noise that is difficult to suppress. This study addresses it with a deep learning model that combines CNN-based and RNN-based architectures. The combination was chosen for the complementary strengths of each type: CNN-based architectures handle spatial data, generalize better, and run inference quickly, while RNN-based architectures specialize in sequential data. The model is trained to identify the noise contained in a noisy signal represented as a spectrogram. To assess its performance, the combined model is compared with a CNN-only model and an RNN-only model. For the combined CNN-RNN model, which is the main model, experiments were carried out to find the best architecture configuration. The evaluation metric is PESQ (Perceptual Evaluation of Speech Quality), which is designed to approximate how humans judge signal quality and scores the difference between the noisy signal and a clean reference signal. The experiments show that the best configuration for the combined model uses a U-Net CNN architecture, the PReLU activation function, and GRU layers for the RNN. After training, the CNN-RNN, CNN, and RNN models reached PESQ values of 2.23, 1.95, and 1.58, respectively. Follow-up experiments show that the combined model produced better output than the other two models on data with SNR levels within its training range.
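The description refers to noisy signals at specific SNR levels and to spectrogram inputs, but the record does not say how the data were prepared. The following is a minimal Python sketch of one common way to do both, not the author's pipeline. The function names `mix_at_snr` and `magnitude_spectrogram`, the 16 kHz sample rate, the frame length and hop size, and the synthetic stand-in signals are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB.

    Hypothetical helper: returns the noisy mixture and the scaled noise.
    """
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # From SNR_dB = 10 * log10(P_clean / P_noise), solve for the noise power.
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise, scaled_noise


def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Magnitude STFT with a Hann window; result shape is (freq_bins, frames).

    Frame length and hop are assumed values, not taken from the record.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate

# Synthetic stand-ins: a tone for clean speech, summed noise for babble.
clean = np.sin(2 * np.pi * 440.0 * t)
babble = rng.standard_normal((6, sample_rate)).sum(axis=0)

noisy, scaled = mix_at_snr(clean, babble, snr_db=5.0)
achieved_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean(scaled ** 2))
spec = magnitude_spectrogram(noisy)

print(f"achieved SNR: {achieved_snr:.2f} dB, spectrogram shape: {spec.shape}")
```

In a setup like this, a denoising model would take `spec` as input, and a metric such as PESQ would then compare the reconstructed waveform against `clean`.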