HANDLING INDONESIAN-ENGLISH CODE-SWITCHING IN SPEECH RECOGNITION SYSTEMS USING A MACHINE SPEECH CHAIN APPROACH

Bibliographic Details
Main Author: Vaza Man Tazakka, Rais
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/78312
Institution: Institut Teknologi Bandung
Summary: There is a phenomenon known as code-switching in human conversations: the switching from one language to another during communication. In Indonesia, code-switching occurs between Indonesian and English, and it needs to be addressed in speech recognition systems. Most research on handling code-switching in speech recognition uses a supervised approach, which relies only on labeled data for training, even though unlabeled data is more readily available. The machine speech chain, by contrast, is a semi-supervised deep learning-based approach that can leverage unlabeled data in addition to labeled data to train speech recognition and speech synthesis models simultaneously. Experiments were therefore conducted to improve the performance of Indonesian-English code-switching speech recognition models by exploiting unlabeled data through the machine speech chain approach. The trained models were evaluated using the character error rate (CER) metric in two ways: (1) for Indonesian and English separately and (2) combined. The experiments show that unlabeled data can improve Indonesian-English code-switching speech recognition via the machine speech chain when the code-switching patterns are not yet represented in models trained with the supervised approach alone. Models first trained with only 10% and 30% labeled code-switching data improved when the remaining 90% and 70% of the code-switching data were used as unlabeled data through the machine speech chain mechanism, reducing the CER from 163.00% to 104.94% and from 124.11% to 84.00%, respectively. The model first trained with 50% labeled code-switching data showed a slight performance decrease when fine-tuned on the remaining 50% treated as unlabeled data, with the CER rising from 77.22% to 78.00%.
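Two short Python sketches follow to make the abstract's key technical ideas concrete. Both are illustrative sketches written for this record, not code from the thesis.

First, the character error rate used for evaluation: the character-level edit distance divided by the reference length. Because the numerator also counts insertions, CER can exceed 100% when the hypothesis is much longer than the reference, which is why a starting point such as 163.00% is possible. The function name and example strings below are hypothetical.

# Minimal CER sketch: Levenshtein distance over characters, normalized
# by the reference length. Illustrative only, not the thesis's code.
def cer(reference: str, hypothesis: str) -> float:
    r, h = reference, hypothesis
    # dp[i][j] = minimum edits to turn r[:i] into h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                                  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j                                  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitute/match
                dp[i - 1][j] + 1,                           # delete
                dp[i][j - 1] + 1)                           # insert
    return dp[len(r)][len(h)] / max(len(r), 1)

print(f"{cer('saya suka coding', 'saya suka kodding'):.2%}")  # 12.50%

Second, the machine speech chain's use of unlabeled data can be pictured as two closed loops: unlabeled speech is transcribed by the ASR model and reconstructed by the TTS model, while unlabeled text is synthesized by the TTS model and transcribed back by the ASR model. The deliberately tiny PyTorch modules, shapes, and hyperparameters below are stand-ins under that assumption; the actual speech chain uses sequence-to-sequence models with variable-length outputs.

# Simplified machine speech chain training step on unlabeled data.
# TinyASR/TinyTTS are toy stand-ins, not the thesis's architectures.
import torch
import torch.nn as nn

class TinyASR(nn.Module):
    """Toy speech-to-text model: mel frames -> per-frame character logits."""
    def __init__(self, n_mels=80, vocab=100):
        super().__init__()
        self.rnn = nn.GRU(n_mels, 128, batch_first=True)
        self.out = nn.Linear(128, vocab)

    def forward(self, mel):                  # mel: (batch, frames, n_mels)
        h, _ = self.rnn(mel)
        return self.out(h)                   # (batch, frames, vocab)

class TinyTTS(nn.Module):
    """Toy text-to-speech model: character ids -> mel frames."""
    def __init__(self, n_mels=80, vocab=100):
        super().__init__()
        self.emb = nn.Embedding(vocab, 128)
        self.rnn = nn.GRU(128, 128, batch_first=True)
        self.out = nn.Linear(128, n_mels)

    def forward(self, chars):                # chars: (batch, length)
        h, _ = self.rnn(self.emb(chars))
        return self.out(h)                   # (batch, length, n_mels)

asr, tts = TinyASR(), TinyTTS()
opt = torch.optim.Adam(list(asr.parameters()) + list(tts.parameters()), lr=1e-4)

# Loop 1: unlabeled SPEECH -> ASR pseudo-transcription -> TTS reconstruction.
# The reconstruction loss trains TTS; the discrete pseudo-label carries no gradient.
mel = torch.randn(4, 50, 80)                 # stand-in for unlabeled speech features
pseudo_text = asr(mel).argmax(dim=-1)        # greedy pseudo-transcription
loss_tts = nn.functional.l1_loss(tts(pseudo_text), mel)

# Loop 2: unlabeled TEXT -> TTS synthesis -> ASR transcribes it back.
# The cross-entropy against the original text trains ASR.
text = torch.randint(0, 100, (4, 50))        # stand-in for unlabeled text
synth = tts(text).detach()                   # synthesized speech, gradient cut
loss_asr = nn.functional.cross_entropy(
    asr(synth).reshape(-1, 100), text.reshape(-1))

(loss_tts + loss_asr).backward()
opt.step()

In each loop the model that generates the intermediate discrete or detached signal receives no gradient, which is the usual way the two chains are decoupled in semi-supervised speech chain training.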