AUTOMATIC SPEAKER RECOGNITION FOR FORENSIC APPLICATIONS IN INDONESIA BASED ON I-VECTOR MODELING

Speaker recognition is a technology that identifies a speaker from a speech recording. Such a system can support forensic applications. In Indonesia, speaker recognition is used to help verify legal evidence in court by Komisi Pemberantasan Korupsi (KP...

Bibliographic Details
Main Author:	Hartanto, Jocelyn
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/50299
Institution:	Institut Teknologi Bandung

id	id-itb.:50299
spelling	id-itb.:50299 2020-09-23T12:50:06Z AUTOMATIC SPEAKER RECOGNITION FOR FORENSIC APPLICATIONS IN INDONESIA BASED ON I-VECTOR MODELING Hartanto, Jocelyn Indonesia Final Project Automated speaker recognition, Bahasa Indonesia, I-Vector, MFCC INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/50299 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Speaker recognition is a technology that identifies a speaker from a speech recording, and it can support forensic applications. In Indonesia, speaker recognition is used by Komisi Pemberantasan Korupsi (KPK), the police, and the judiciary to help verify legal evidence in court. The system currently in use is text-dependent and requires more time and human intervention. A system that reduces the analysis time while keeping the error small is therefore desirable for the verification process. The system constructed here is an automatic speaker recognition system based on the Identity Vector (I-Vector) model. It is trained and tested on a speech database in Bahasa Indonesia. The speech recordings were taken in the semi-anechoic chamber of the Adhiwijogo Acoustic Laboratory, Institut Teknologi Bandung. Features are extracted as 19+1-dimensional Mel Frequency Cepstral Coefficients (MFCC). In addition to the MFCC, 20 dimensions of delta MFCC and delta-delta MFCC are used to capture speech dynamics in more detail and to achieve higher accuracy. The extracted features are modeled as I-Vectors using 32 Gaussian components and 100 I-Vector dimensions. Trials are scored with cosine distance scoring to obtain target and nontarget scores. Score normalization using Zero Normalization (Z-norm), Test Normalization (T-norm), or Zero-Test Normalization (ZT-norm) is applied to further reduce the system's error. The system is tested on 46 male speech data and 52 female speech data, and trained on the first 20 data for both genders. The lowest Equal Error Rate (EER) achieved by the system is 3.50%, obtained with T-normed and ZT-normed scores in the female interview scenario, while the lowest EER for male speakers is 3.56%, obtained with T-normed scores in the conversation scenario. The low EER indicates that this system performs better than the previous speaker recognition system based on the GMM-UBM model.
format	Final Project
author	Hartanto, Jocelyn
title	AUTOMATIC SPEAKER RECOGNITION FOR FORENSIC APPLICATIONS IN INDONESIA BASED ON I-VECTOR MODELING
url	https://digilib.itb.ac.id/gdl/view/50299
_version_	1822000620714852352
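
The scoring steps named in the description (cosine distance scoring, score normalization, and the Equal Error Rate) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the use of NumPy, and the threshold sweep used to estimate the EER are assumptions made for illustration, and the I-Vectors are taken as already extracted. The same normalization function covers Z-norm (cohort scores from the speaker model against impostor utterances) and T-norm (cohort scores from the test utterance against impostor models); only the cohort differs.

```python
import numpy as np


def cosine_score(w_enroll, w_test):
    """Cosine similarity between an enrollment and a test I-Vector."""
    w_enroll = np.asarray(w_enroll, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    denom = np.linalg.norm(w_enroll) * np.linalg.norm(w_test)
    return float(np.dot(w_enroll, w_test) / denom)


def normalize_score(raw_score, cohort_scores):
    """Standardize a raw score with the mean and standard deviation of a cohort.

    Z-norm and T-norm share this formula; they differ in how the
    cohort scores are produced (assumption: cohort is non-constant).
    """
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    return float((raw_score - cohort_scores.mean()) / cohort_scores.std())


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Returns the average of the false-acceptance and false-rejection
    rates at the threshold where the two are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for threshold in np.sort(np.concatenate([target, nontarget])):
        far = np.mean(nontarget >= threshold)  # impostors accepted
        frr = np.mean(target < threshold)      # true speakers rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 100-dimensional I-Vectors, matching the dimension in the record.
    speaker = rng.normal(size=100)
    same = speaker + 0.5 * rng.normal(size=100)
    other = rng.normal(size=100)
    print("target trial score:   ", cosine_score(speaker, same))
    print("nontarget trial score:", cosine_score(speaker, other))
```

With real data, the target and nontarget score lists passed to `equal_error_rate` would come from scoring every trial, optionally after normalization.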

Similar Items