AUTOMATIC SPEAKER RECOGNITION FOR FORENSIC APPLICATIONS IN INDONESIA BASED ON I-VECTOR MODELING
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/50299 |
Institution: | Institut Teknologi Bandung |
Summary: | Speaker recognition is a technology that identifies a speaker based on a
recording of their speech, and it can support forensic applications.
In Indonesia, speaker recognition is used by Komisi Pemberantasan Korupsi (KPK),
the police, and the judiciary to help verify legal evidence in court. The system
currently in use is text-dependent and requires more time and human
intervention. A system that reduces the time needed for analysis while
keeping the error small is therefore desirable for the verification process.
The constructed system is an automatic speaker recognition system based on
the Identity Vector (I-Vector) model. The system is trained and tested using a speech
database in Bahasa Indonesia. The speech recordings were taken in the semi-anechoic
chamber of the Adhiwijogo Acoustic Laboratory, Institut Teknologi Bandung. Features
are extracted as 19+1-dimensional Mel Frequency Cepstral
Coefficients (MFCC). In addition to the MFCCs, 20 dimensions of delta
MFCC and delta-delta MFCC are used to capture speech dynamics in more
detail and to achieve higher accuracy. The extracted features are modeled as I-Vectors
using 32 Gaussian components and 100 I-Vector dimensions. The
system is scored using cosine distance scoring to obtain the target and non-target
scores. Zero Normalization (Z-norm), Test Normalization (T-norm), or
Zero-Test Normalization (ZT-norm) is then applied to further reduce
the system’s error.
The system is tested using 46 male and 52 female speech data, with the first
20 data of each gender used for training. The lowest Equal Error Rate (EER)
achieved by the system is 3.50%, obtained with T-normed and ZT-normed
scores in the female interview scenario, while the lowest EER for male speakers is 3.56%,
achieved with T-normed scores in the conversation scenario. These low EER values indicate
that the system performs better than the previous speaker recognition system based on
the GMM-UBM model.
|
---|
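
The scoring steps named in the summary (cosine distance scoring, Z-norm and T-norm, and the Equal Error Rate) can be illustrated with a minimal sketch in plain Python. This is not the thesis implementation: the function names, the cohort arguments, and the threshold sweep used to approximate the EER are assumptions made for illustration, and the I-Vectors are taken as already extracted. ZT-norm, which combines the two normalizations, is not shown.

```python
import math
from statistics import mean, pstdev


def cosine_score(a, b):
    """Cosine similarity between two i-vectors (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def _standardize(raw, cohort_scores):
    """Shift and scale a raw score by the cohort mean and standard deviation."""
    # Assumes the cohort scores are not all identical (non-zero spread).
    return (raw - mean(cohort_scores)) / pstdev(cohort_scores)


def z_norm(enroll, test, impostor_tests):
    """Z-norm: statistics come from the enrolled model scored against
    impostor test utterances (hypothetical cohort argument)."""
    cohort = [cosine_score(enroll, imp) for imp in impostor_tests]
    return _standardize(cosine_score(enroll, test), cohort)


def t_norm(enroll, test, cohort_models):
    """T-norm: statistics come from the test utterance scored against
    a cohort of other speaker models (hypothetical cohort argument)."""
    cohort = [cosine_score(model, test) for model in cohort_models]
    return _standardize(cosine_score(enroll, test), cohort)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold
    and returning the point where false accepts and false rejects are closest."""
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In this sketch a trial is accepted when its score is at or above the threshold, so the EER is the error rate at the threshold where the false acceptance rate and the false rejection rate are as close as possible. With few trials the sweep gives only a coarse estimate.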