AUTOMATIC SPEAKER RECOGNITION FOR FORENSIC APPLICATIONS IN INDONESIA BASED ON I-VECTOR MODELING

Speaker recognition is a technology that identifies a speaker from a recording of their speech. Such a system can support forensic applications. In Indonesia, speaker recognition is used to help verify legal evidence in court by Komisi Pemberantasan Korupsi (KP...

Bibliographic Details
Main Author: Hartanto, Jocelyn
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/50299
Institution: Institut Teknologi Bandung
id id-itb.:50299
timestamp 2020-09-23T12:50:06Z
keywords Automated speaker recognition, Bahasa Indonesia, I-Vector, MFCC
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Speaker recognition is a technology that identifies a speaker from a recording of their speech. Such a system can support forensic applications. In Indonesia, speaker recognition is used by Komisi Pemberantasan Korupsi (KPK), the police, and the judiciary to help verify legal evidence in court. The system currently in use is text-dependent and requires more time and human intervention. A system that shortens the analysis time while keeping the error small is therefore desirable for the verification process. The system constructed here is an automatic speaker recognition system based on the Identity Vector (I-Vector) model. It is trained and tested on a speech database in Bahasa Indonesia. The speech recordings were made in the semi-anechoic chamber of the Adhiwijogo Acoustic Laboratory, Institut Teknologi Bandung. Features are extracted as 19+1-dimensional Mel Frequency Cepstral Coefficients (MFCC). In addition to the MFCCs, 20-dimensional delta MFCCs and delta-delta MFCCs are used to capture speech dynamics in more detail and to achieve higher accuracy. The extracted features are modeled with I-Vectors using 32 Gaussian components and 100 I-Vector dimensions. Trials are scored with cosine distance scoring to obtain the target and nontarget scores. Scores are normalized with Zero Normalization (Z-norm), Test Normalization (T-norm), or Zero-Test Normalization (ZT-norm) to further reduce the system's error. The system is tested using 46 male and 52 female speech data, with the first 20 data of each gender used for training. The lowest Equal Error Rate (EER) achieved by the system is 3.50%, obtained with T-normed and ZT-normed scores in the female interview scenario, while the lowest EER for male speakers is 3.56%, obtained with T-normed scores in the conversation scenario. These low EER values indicate that the system performs better than the previous speaker recognition system based on the GMM-UBM model.
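The scoring stage named in the description (cosine distance scoring, Z-norm and T-norm, and the EER) can be illustrated with a minimal Python sketch. This is not the author's implementation, which the record does not describe. The function names, the toy three-dimensional vectors (the thesis uses 100-dimensional I-Vectors), and the cohort sets are all assumptions made for illustration, and the EER is approximated by a simple threshold sweep.

```python
import math
from statistics import mean, pstdev


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors (e.g. two I-Vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def z_norm(raw, enrolled, impostor_tests):
    """Z-norm: shift and scale a raw score using the scores of the enrolled
    vector against a cohort of impostor test vectors (hypothetical cohort)."""
    cohort = [cosine_score(enrolled, imp) for imp in impostor_tests]
    return (raw - mean(cohort)) / pstdev(cohort)


def t_norm(raw, test, cohort_models):
    """T-norm: shift and scale a raw score using the scores of the test
    vector against a cohort of other speakers' models (hypothetical cohort)."""
    cohort = [cosine_score(model, test) for model in cohort_models]
    return (raw - mean(cohort)) / pstdev(cohort)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep every observed score as a threshold and return
    the mean of the false-accept and false-reject rates where they are closest."""
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy usage with made-up vectors; real trials would use extracted I-Vectors.
enrolled = [1.0, 0.0, 0.5]
test = [0.9, 0.1, 0.4]
cohort = [[0.0, 1.0, 0.0], [0.2, 0.1, 1.0], [-1.0, 0.3, 0.2]]
raw = cosine_score(enrolled, test)
print("raw:", raw)
print("Z-norm:", z_norm(raw, enrolled, cohort))
print("T-norm:", t_norm(raw, test, cohort))
print("EER:", equal_error_rate([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

ZT-norm is commonly computed by applying T-norm to scores that have already been Z-normed; it is left out here to keep the sketch short.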