JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR
A dialect is a variety of a language that can affect a speaker's pronunciation. In a speech recognition system that converts speech to text, the speaker's dialect may affect the recognition results. Research on dialect identification was first carried out for Indian (Hindi), Arabic and Bang...
Main Author: | RAHMAWATI (NIM: 23514023), RITA |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/24044 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:24044 |
---|---|
spelling |
id-itb.:24044 2017-09-27T15:37:11Z JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR RAHMAWATI (NIM: 23514023), RITA Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/24044 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
A dialect is a variety of a language that can affect a speaker's pronunciation. In a speech recognition system that converts speech to text, the speaker's dialect may affect the recognition results. Research on dialect identification was first carried out for Indian (Hindi), Arabic, and Bangladeshi dialects. Although Indonesia has many dialects, research on dialect recognition in Indonesian speech is still limited; this research therefore focuses on recognizing the Java and Sunda dialects, which have the most speakers in Indonesia. The research begins with collecting data for supervised machine learning experiments. The speech corpus used to build the model consists of recordings of 8 men and 2 women per dialect reading a story in Indonesian, with a total training-data duration of 1.5 hours. Recognition of the Java and Sunda dialects from Indonesian speech was built on a combination of MFCC and pitch features, using GMM and i-vector modeling techniques. The dialect model was built with an 80:20 ratio of training to testing data. In addition, the constructed model was tested using a 5-fold scheme on 4 test data items in the closed test and 12 test data items in the open test. The classification error obtained with the i-vector modeling technique and the combined MFCC + pitch features is 35% for the closed test and 13.34% for the open test. |
format |
Theses |
author |
RAHMAWATI (NIM: 23514023), RITA |
title |
JAVA AND SUNDA DIALECT RECOGNITION FROM INDONESIAN SPEECH USING GMM AND I-VECTOR |
url |
https://digilib.itb.ac.id/gdl/view/24044 |
_version_ |
1822020278401630208 |
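
The evaluation protocol summarized in the description (an 80:20 training/testing ratio, a 5-fold scheme, and a classification error percentage) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `k_fold_indices` and `classification_error`, the round-robin fold assignment, and the 20-recording example are assumptions made for illustration only. The sketch relies on the fact that each of 5 equal folds holds 20% of the items, so every fold yields an 80:20 split; whether the thesis formed its folds this way is not stated in the record.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; each test fold holds about 1/k of the items."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    # Round-robin assignment of item indices to folds (an illustrative choice).
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training indices are every index that is not in the held-out fold.
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def classification_error(true_labels: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of items whose predicted dialect differs from the true dialect."""
    if len(true_labels) != len(predicted) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted))
    return 100.0 * wrong / len(true_labels)


if __name__ == "__main__":
    # Hypothetical example: 20 recordings, not the thesis corpus.
    for fold, (train, test) in enumerate(k_fold_indices(20, k=5)):
        print(f"fold {fold}: {len(train)} training items, {len(test)} testing items")

    # Hypothetical labels and predictions for one fold.
    truth = ["java", "sunda", "java", "sunda"]
    guess = ["java", "java", "java", "sunda"]
    print(f"classification error: {classification_error(truth, guess):.2f}%")
```

In a full pipeline, the predictions would come from the dialect models (GMM or i-vector) scored on MFCC and pitch features; the error from each fold would then typically be averaged across the 5 folds.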