DEVELOPMENT OF SPEAKER RECOGNITION SYSTEM IN INDONESIAN WITH DEEP NEURAL NETWORK AND VECTOR EMBEDDING

The speaker recognition system is a human biometric system that identifies a person from voice parameters. A person can be identified by modeling the characteristics of each speaker. The i-vector model is a speaker recognition model that is considered state-of-the-art. Along...

Bibliographic Details
Main Author:	Angelia, Tifany
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/69107
Institution:	Institut Teknologi Bandung

id	id-itb.:69107
spelling	id-itb.:69107 2022-09-20T11:54:01Z DEVELOPMENT OF SPEAKER RECOGNITION SYSTEM IN INDONESIAN WITH DEEP NEURAL NETWORK AND VECTOR EMBEDDING Angelia, Tifany Indonesia Final Project speaker recognition, data augmentation, i-vector, x-vector, deep learning, vector embedding. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/69107 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	A speaker recognition system is a biometric system that identifies a person from voice parameters. A person can be identified by modeling the characteristics of each speaker. The i-vector model is a speaker recognition model that is considered state-of-the-art. With the rise of deep learning, many deep-learning-based models have been designed, one of which is the x-vector model. The x-vector model is generally considered to perform better than the i-vector model, although some argue that the x-vector cannot outperform the i-vector. The models built for the speaker recognition system in this final project are the i-vector and x-vector models. The i-vector model is trained with unsupervised learning, while the x-vector model is a discriminative model trained with supervised learning. The data used in this study were recorded by the author with cellphones and laptops to allow multi-channel testing, and cover 150 speakers. To make the speaker recognition system more robust to speaker variability, data augmentation is applied to the training data. The augmentation techniques applied are changing the volume, adding white noise, shifting the pitch, stretching time, and simulating room reverberation. The feature extraction techniques used are MFCC with 60 features and Fbank with 40 features; the features are then processed with VAD and CMVN. The i-vector model extracts 400-dimensional vectors using a GMM with 512 Gaussian components. The x-vector model extracts its vectors with a deep neural network and applies LDA to reduce the vector dimension to 200. The backend that makes decisions uses the PLDA method, and the evaluation metric is the EER. The test results show that the x-vector model with MFCC features gives the lowest EER, 0%, when all training data are used. The x-vector model with MFCC features also gives a stable EER in the 5-fold cross-validation test scheme, with an average EER of 1.67%. In addition, when the test data were scored against the enrollment data, no non-target speaker was identified as the target speaker.
format	Final Project
author	Angelia, Tifany
title	DEVELOPMENT OF SPEAKER RECOGNITION SYSTEM IN INDONESIAN WITH DEEP NEURAL NETWORK AND VECTOR EMBEDDING
url	https://digilib.itb.ac.id/gdl/view/69107
_version_	1822990840448942080
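
The abstract in this record reports results as EER (equal error rate): the operating point at which the false acceptance rate on non-target trials equals the false rejection rate on target trials. The following is a minimal Python sketch of how an EER can be computed from verification scores. It is not the author's implementation; the function name equal_error_rate, the NumPy dependency, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    Hypothetical helper for illustration. A trial is accepted when its
    score is >= threshold, so higher scores mean "more likely the same
    speaker" (for example, PLDA log-likelihood ratios).
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that the false acceptance rate can reach zero.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False acceptance rate: fraction of non-target trials accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False rejection rate: fraction of target trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # With a finite number of trials the two curves rarely cross exactly,
    # so take the threshold where they are closest and average the rates.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])
```

With perfectly separated scores, such as targets [2.0, 3.0, 4.0] against non-targets [-1.0, 0.0, 1.0], this function returns an EER of 0, which is the situation the abstract describes when no non-target speaker is accepted as the target speaker.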

Similar Items