JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR

A dialect is a variety of a language that can affect how a speaker pronounces words. In a speech recognition system that transcribes speech into text, the speaker's dialect may affect the recognition results. Research on dialect identification has previously been conducted for Indian (Hindi), Arabic and Bang...

Bibliographic Details
Main Author:	RAHMAWATI (NIM: 23514023), RITA
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/24044
Institution:	Institut Teknologi Bandung

id	id-itb.:24044
spelling	id-itb.:24044 2017-09-27T15:37:11Z
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	A dialect is a variety of a language that can affect how a speaker pronounces words. In a speech recognition system that transcribes speech into text, the speaker's dialect may affect the recognition results. Research on dialect identification has previously been conducted for Indian (Hindi), Arabic and Bangladeshi dialects. Although Indonesia has many dialects, research on dialect recognition in Indonesian speech is still limited. This research therefore focuses on recognizing the Java and Sunda dialects, which have the most speakers in Indonesia. The research begins with collecting data for machine learning experiments based on supervised learning. The speech corpus used to build the model consists of recordings of 8 men and 2 women per dialect reading a story in Indonesian, with a total training data duration of 1.5 hours. Recognition of the Java and Sunda dialects from Indonesian speech was built on a combination of MFCC and pitch features, using GMM and I-vector modeling techniques. The dialect model was built with an 80:20 ratio of training to testing data. In addition, the constructed model was tested using a 5-fold scheme on 4 testing data in the closed test and 12 testing data in the open test. The classification error obtained with the I-vector modeling technique and the MFCC + pitch feature combination is 35% for the closed test and 13.34% for the open test.
format	Theses
author	RAHMAWATI (NIM: 23514023), RITA
title	JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR
url	https://digilib.itb.ac.id/gdl/view/24044

Similar Items