JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR

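A dialect is a variety of a language that can affect the way a person pronounces words. In a speech recognition system, which transcribes speech into text, the speaker's dialect may affect the recognition results. Earlier research on dialect identification has addressed Indian (Hindi), Arabic, and Bangladeshi dialects. Although Indonesia has many dialects, research on dialect recognition for Indonesian is still limited; this research therefore focuses on recognizing the Java and Sunda dialects, which have the most speakers in Indonesia. The research begins with collecting data for supervised machine learning experiments. The speech corpus used to build the models consists of recordings of 8 male and 2 female speakers per dialect reading a story in Indonesian, with a total training-data duration of 1.5 hours. The recognizer for Java and Sunda dialects in Indonesian speech combines MFCC and pitch features and uses GMM and i-vector modeling techniques. The dialect models were built with an 80:20 split between training and testing data. In addition, the models were evaluated with a 5-fold scheme on 4 test items in the closed test and 12 test items in the open test. Using the i-vector modeling technique with the combined MFCC + pitch features, the classification error is 35% for the closed test and 13.34% for the open test.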
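
This record names the features (MFCC + pitch), the modeling techniques (GMM, i-vector) and the evaluation metric (classification error), but not the decision rule. As a rough illustration only, the Python sketch below assumes a common GMM setup: each frame's MFCC vector is concatenated with its pitch value, an utterance is scored against one diagonal-covariance GMM per dialect, the dialect with the highest average log-likelihood wins, and the classification error is the fraction of misclassified test items. The `DiagGMM` class, all function names and the toy parameters are hypothetical; GMM training (EM) and i-vector extraction are not shown, and the thesis's actual configuration is not stated here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DiagGMM:
    """Hypothetical container for a diagonal-covariance GMM (parameters assumed already trained)."""
    weights: List[float]          # mixture weights; should be > 0 and sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component


def combine_features(mfcc_frames: List[List[float]], pitch: List[float]) -> List[List[float]]:
    """Append each frame's pitch value to its MFCC vector (MFCC + pitch)."""
    if len(mfcc_frames) != len(pitch):
        raise ValueError("MFCC and pitch streams must have the same number of frames")
    return [list(m) + [p] for m, p in zip(mfcc_frames, pitch)]


def log_gaussian_diag(x, mean, var) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
        for xi, mu, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x: List[float], gmm: DiagGMM) -> float:
    """log p(x | GMM), combining weighted components with a stable log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(gmm.weights, gmm.means, gmm.variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def utterance_score(frames: List[List[float]], gmm: DiagGMM) -> float:
    """Average per-frame log-likelihood of one utterance under one dialect GMM."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)


def classify(frames: List[List[float]], models: Dict[str, DiagGMM]) -> str:
    """Return the dialect label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: utterance_score(frames, models[label]))


def classification_error(predicted: List[str], actual: List[str]) -> float:
    """Fraction of test items assigned to the wrong dialect."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)


# Toy usage: 1 MFCC coefficient + pitch gives 2-D frames; all numbers are made up.
models = {
    "Java": DiagGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "Sunda": DiagGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
test_set = [
    (combine_features([[0.1], [0.9]], [0.2, -0.1]), "Java"),
    (combine_features([[5.2], [4.8]], [5.1, 4.9]), "Sunda"),
]
predicted = [classify(frames, models) for frames, _ in test_set]
print(predicted, classification_error(predicted, [label for _, label in test_set]))
```

Averaging frame log-likelihoods keeps scores comparable across utterances of different lengths. An i-vector back end would typically replace `utterance_score` with a scorer over fixed-length utterance vectors, which this sketch does not attempt.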

Bibliographic Details
Main Author: Rita Rahmawati (NIM: 23514023)
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/24044
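Institution: Institut Teknologi Bandung
Library: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia
Record ID: id-itb.:24044 (timestamp 2017-09-27T15:37:11Z)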