CHANNEL NORMALIZATION OF SPEECH ACOUSTIC SIGNAL USING WITHIN-CLASS COVARIANCE NORMALIZATION (WCCN) FOR SPEAKER RECOGNITION SYSTEM WITH BAHASA INDONESIA

Bibliographic Details
Main Author: Saifuddin, Habibi
Format: Thesis
Language: Indonesian
Subjects: Engineering (engineering and related activities)
Online Access: https://digilib.itb.ac.id/gdl/view/56272
Institution: Institut Teknologi Bandung
Keywords: Speaker Recognition System, I-vector, cosine distance, same-channel, channel mismatch, Within-Class Covariance Normalization, MFCC, Equal Error Rate
Collection: Digital ITB
Record timestamp: 2021-06-21T17:37:04Z

Description

A speaker recognition system is a technology that can verify a speaker's identity from a speech sample of unknown origin. In Indonesia, such systems are actively used by the anti-corruption agency, the Police, and the Attorney General's Office to support speaker verification for evidence presented in court. The speaker recognition system developed in this study uses i-vector modeling. It is trained and tested on an Indonesian speech database owned by the Acoustic Laboratory of the Physics Engineering Department at Institut Teknologi Bandung. The test data consist of speech from 46 male and 52 female speakers, and the training data consist of the first 20 speakers of each gender in each recording scenario. Speech features are extracted from each recording as 19 Mel-Frequency Cepstral Coefficients (MFCCs) plus 1 energy coefficient, 20 delta-MFCCs, and 20 delta-delta-MFCCs, giving 60 coefficients per frame. The extracted features are modeled with a 32-component Gaussian Universal Background Model (UBM) and 100-dimensional i-vectors. The similarity between Known (K) and Unknown (UK) samples is then assessed with the cosine distance. A previous experiment using the same dataset and parameters achieved its best result on the female interview scenario, with an Equal Error Rate (EER) of 3.50%.
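
The scoring and evaluation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the K and UK i-vectors are already extracted as NumPy arrays, scores a pair with cosine similarity (the cosine distance is one minus this score), and estimates the EER by sweeping every observed score as a decision threshold. The function names `cosine_score` and `equal_error_rate` are hypothetical.

```python
import numpy as np


def cosine_score(known: np.ndarray, unknown: np.ndarray) -> float:
    """Cosine similarity between a Known (K) and an Unknown (UK) i-vector.

    Higher means more similar; cosine distance = 1 - cosine_score.
    """
    return float(np.dot(known, unknown) / (np.linalg.norm(known) * np.linalg.norm(unknown)))


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: the threshold where false rejection ~= false acceptance.

    target_scores: scores of same-speaker (K and UK match) trials.
    nontarget_scores: scores of different-speaker trials.
    A trial is accepted when its score >= threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(target < t)       # same-speaker trials wrongly rejected
        far = np.mean(nontarget >= t)   # different-speaker trials wrongly accepted
        gap = abs(frr - far)
        if gap < best_gap:              # keep the threshold where the rates meet
            best_gap, eer = gap, (frr + far) / 2
    return float(eer)
```

With perfectly separated scores, for example `equal_error_rate([0.9, 0.8], [0.1, 0.2])`, the EER is 0. Sweeping only observed scores keeps the sketch simple; a finer estimate would interpolate between the two thresholds that bracket the crossing point.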

In this study, the Within-Class Covariance Normalization (WCCN) technique is applied to improve system performance both when recordings come from the same device (same-channel) and when they come from different, mismatched devices (mismatched-channel). The hypothesis is that applying WCCN improves the performance of both same-channel and mismatched-channel speaker recognition. In the same-channel experiments, performance improved by 31.43% relative to previous studies that used the same dataset and parameters without WCCN. The best EER obtained in this study was 2.40%, in a same-channel experiment on the female interview scenario; measured against the earlier best EER of 3.50%, this is a relative reduction of (3.50 - 2.40)/3.50 ≈ 31.43%, matching the reported same-channel improvement. Compared with the baseline i-vector system without WCCN, the mismatched-channel system with WCCN improved performance by an average of 33.75% across scenarios. The best mismatched-channel result is an EER of 20.52%, obtained in the female conversation scenario.
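
WCCN itself can be sketched in its common formulation: estimate the within-class covariance W by averaging each training speaker's i-vector covariance, take the Cholesky factor B of W⁻¹ (so that W⁻¹ = B Bᵀ), and project every i-vector w to Bᵀw before cosine scoring. This is a minimal NumPy sketch, not the thesis code; it assumes labeled training i-vectors with several sessions per speaker. The function names and the `reg` ridge parameter (added to keep W invertible when there are few sessions relative to the 100 i-vector dimensions) are hypothetical.

```python
import numpy as np


def within_class_covariance(ivectors: np.ndarray, speaker_labels) -> np.ndarray:
    """Average over speakers of each speaker's i-vector covariance matrix.

    ivectors: (n_sessions, dim) array, one i-vector per row.
    speaker_labels: length-n_sessions sequence of speaker ids.
    """
    labels = np.asarray(speaker_labels)
    speakers = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for spk in speakers:
        x = ivectors[labels == spk]            # all sessions of one speaker
        centered = x - x.mean(axis=0)          # remove that speaker's mean
        W += centered.T @ centered / len(x)    # this speaker's covariance
    return W / len(speakers)                   # average over speakers


def train_wccn(ivectors: np.ndarray, speaker_labels, reg: float = 0.0) -> np.ndarray:
    """Return lower-triangular B with inv(W) = B @ B.T.

    reg (hypothetical) is a ridge term added to W's diagonal before inversion.
    """
    W = within_class_covariance(ivectors, speaker_labels)
    W = W + reg * np.eye(W.shape[0])
    return np.linalg.cholesky(np.linalg.inv(W))


def apply_wccn(B: np.ndarray, ivectors: np.ndarray) -> np.ndarray:
    """Map each row vector w to B.T @ w (written for rows as w @ B)."""
    return ivectors @ B
```

After projection, the within-class covariance of the training i-vectors becomes the identity, so directions dominated by session and channel variability are down-weighted. Projected K and UK i-vectors are then compared with the same cosine scoring as before.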