HANDLING INDONESIAN-ENGLISH CODE-SWITCHING IN SPEECH RECOGNITION SYSTEMS USING A MACHINE SPEECH CHAIN APPROACH

Bibliographic Details
Main Author: Vaza Man Tazakka, Rais
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/78312
Institution: Institut Teknologi Bandung
Description
Summary: Code-switching is a phenomenon in human conversation in which speakers switch from one language to another during communication. In Indonesia, code-switching occurs between Indonesian and English, and it needs to be handled by speech recognition systems. Most research on handling code-switching in speech recognition uses a supervised approach, which relies only on labeled data for training, even though unlabeled data is more readily available. The machine speech chain, by contrast, is a semi-supervised deep learning-based approach that can leverage unlabeled data in addition to labeled data to train speech recognition and speech synthesis models simultaneously. Experiments were therefore conducted to improve the performance of Indonesian-English code-switching speech recognition models by exploiting unlabeled data through the machine speech chain approach.

The trained models were evaluated using the character error rate (CER) metric in two ways: (1) for Indonesian and English separately and (2) combined. Based on the experiments, unlabeled data can improve the performance of Indonesian-English code-switching speech recognition through the machine speech chain approach when the code-switching patterns are not yet represented in models trained with the supervised approach alone. Models first trained on only 10% and 30% labeled code-switching data improved when the remaining 90% and 70% of the code-switching data were used as unlabeled data through the machine speech chain mechanism, reducing the CER from 163.00% to 104.94% and from 124.11% to 84.00%, respectively. The model first trained on 50% labeled code-switching data showed a slight performance decrease when fine-tuned with the remaining 50% of the code-switching data as unlabeled data, with the CER increasing from 77.22% to 78.00%.
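The semi-supervised mechanism summarized above can be illustrated with a minimal sketch of one training step. The sketch assumes hypothetical asr and tts model objects with the interfaces shown (loss, transcribe, synthesize) and a batch object with is_paired/has_speech flags; none of these names come from the thesis, and the loop follows the general machine speech chain idea (Tjandra et al., 2017) rather than this work's exact implementation.

    import torch

    def speech_chain_step(asr, tts, batch, asr_opt, tts_opt):
        if batch.is_paired:
            # Supervised case: labeled (speech, text) pairs train both models directly.
            asr_loss = asr.loss(speech=batch.speech, text=batch.text)
            tts_loss = tts.loss(text=batch.text, speech=batch.speech)
        elif batch.has_speech:
            # Unpaired speech: ASR produces a pseudo-transcript (no gradient),
            # and the TTS reconstruction loss against the original speech updates TTS.
            with torch.no_grad():
                pseudo_text = asr.transcribe(batch.speech)
            tts_loss = tts.loss(text=pseudo_text, speech=batch.speech)
            asr_loss = None
        else:
            # Unpaired text: TTS synthesizes speech (no gradient), and the ASR
            # loss against the original text updates ASR.
            with torch.no_grad():
                pseudo_speech = tts.synthesize(batch.text)
            asr_loss = asr.loss(speech=pseudo_speech, text=batch.text)
            tts_loss = None

        for loss, opt in ((asr_loss, asr_opt), (tts_loss, tts_opt)):
            if loss is not None:
                opt.zero_grad()
                loss.backward()
                opt.step()

In this closed loop, unlabeled code-switched speech can still provide a training signal: each model generates pseudo-targets for the other, which is how the 90%, 70%, and 50% unlabeled portions described above are exploited.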
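For the CER figures quoted above, a minimal sketch of the metric follows. CER is the character-level Levenshtein (edit) distance divided by the number of reference characters; the function names here are illustrative, not taken from the thesis.

    def edit_distance(ref: str, hyp: str) -> int:
        """Levenshtein distance between two character sequences."""
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            curr = [i]
            for j, h in enumerate(hyp, start=1):
                curr.append(min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (r != h),  # substitution
                ))
            prev = curr
        return prev[-1]

    def cer(refs: list[str], hyps: list[str]) -> float:
        """Total edits divided by total reference characters, as a percentage."""
        edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
        chars = sum(len(r) for r in refs)
        return 100.0 * edits / chars

Because insertions are counted as edits, CER is not bounded by 100%: a hypothesis with many spurious characters can yield values such as the 163.00% reported for the 10% labeled-data model.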