HANDLING INDONESIAN-ENGLISH CODE-SWITCHING IN SPEECH RECOGNITION SYSTEMS USING A MACHINE SPEECH CHAIN APPROACH
| Main Author: | |
|---|---|
| Format: | Final Project |
| Language: | Indonesian |
| Online Access: | https://digilib.itb.ac.id/gdl/view/78312 |
| Institution: | Institut Teknologi Bandung |

Summary: There is a phenomenon in human conversation known as code-switching: switching from one language to another within a single communication process. In Indonesia, code-switching commonly occurs between Indonesian and English, and the phenomenon needs to be addressed in speech recognition systems. Most research on handling code-switching in speech recognition uses a supervised approach, which relies only on labeled data for training, even though unlabeled data is far more readily available. The machine speech chain, by contrast, is a semi-supervised deep-learning approach that trains speech recognition and speech synthesis models simultaneously and can therefore leverage unlabeled data in addition to labeled data. Experiments were therefore conducted to improve the performance of Indonesian-English code-switching speech recognition models by exploiting unlabeled data through the machine speech chain approach. The trained models were evaluated with the character error rate (CER) metric in two ways: (1) for Indonesian and English separately and (2) combined. The experiments show that unlabeled data can improve Indonesian-English code-switching speech recognition through the machine speech chain approach when the code-switching patterns are not yet represented in models trained with a supervised approach alone. Models first trained with only 10% and 30% of the code-switching data labeled improved when the remaining 90% and 70% were used as unlabeled data through the machine speech chain mechanism, reducing the CER from 163.00% to 104.94% and from 124.11% to 84.00%, respectively. The model first trained with 50% labeled code-switching data showed a slight performance decrease when fine-tuned with the remaining 50% as unlabeled data, with the CER rising from 77.22% to 78.00%.
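
In the original machine speech chain formulation, the semi-supervised mechanism summarized above consists of two unsupervised cycles: unlabeled speech is transcribed by the ASR model and reconstructed by the TTS model, updating TTS; unlabeled text is synthesized by TTS and re-transcribed by ASR, updating ASR. The following is a minimal PyTorch sketch of those two cycles, with toy linear/embedding stand-ins for the real sequence-to-sequence networks; all module names, dimensions, and losses here are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch of the machine speech chain's two unsupervised cycles.
# Toy models only; real systems use sequence-to-sequence ASR/TTS networks.
import torch
import torch.nn as nn

VOCAB, FEAT, T = 32, 40, 50  # hypothetical: vocab size, feature dims, frames

class ToyASR(nn.Module):          # speech features -> per-frame char logits
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(FEAT, VOCAB)
    def forward(self, speech):    # (batch, T, FEAT) -> (batch, T, VOCAB)
        return self.net(speech)

class ToyTTS(nn.Module):          # char ids -> speech features
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, FEAT)
    def forward(self, text):      # (batch, T) -> (batch, T, FEAT)
        return self.emb(text)

asr, tts = ToyASR(), ToyTTS()
opt_asr = torch.optim.Adam(asr.parameters(), lr=1e-3)
opt_tts = torch.optim.Adam(tts.parameters(), lr=1e-3)
ce, l1 = nn.CrossEntropyLoss(), nn.L1Loss()

# --- Cycle 1: unlabeled speech updates TTS ---
# ASR transcribes (detached); TTS tries to reconstruct the original speech.
speech = torch.randn(8, T, FEAT)              # stand-in for real utterances
with torch.no_grad():
    pseudo_text = asr(speech).argmax(-1)      # greedy pseudo-transcript
recon = tts(pseudo_text)
loss_tts = l1(recon, speech)
opt_tts.zero_grad(); loss_tts.backward(); opt_tts.step()

# --- Cycle 2: unlabeled text updates ASR ---
# TTS synthesizes (detached); ASR tries to recover the original text.
text = torch.randint(0, VOCAB, (8, T))        # stand-in for real sentences
with torch.no_grad():
    pseudo_speech = tts(text)
logits = asr(pseudo_speech)
loss_asr = ce(logits.reshape(-1, VOCAB), text.reshape(-1))
opt_asr.zero_grad(); loss_asr.backward(); opt_asr.step()
```

Each cycle detaches the intermediate output and updates only the model that closes the loop, which avoids backpropagating through the discrete transcript.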
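
On the evaluation metric: CER is the character-level Levenshtein edit distance between the hypothesis and the reference transcript, divided by the reference length. Because insertions count as errors, CER can exceed 100%, which is how figures such as 163.00% above arise. Below is a minimal sketch of the computation, not the thesis's evaluation script; the mixed Indonesian-English example strings are invented for illustration.

```python
# Character error rate (CER): character-level Levenshtein edit distance
# divided by the reference length. Can exceed 1.0 (100%) when the
# hypothesis contains many insertions.

def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    # prev[j] = edit distance between r[:i-1] and h[:j], rolled row by row
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        curr = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[len(h)] / len(r)

# Hypothetical mixed Indonesian-English reference vs. a noisy hypothesis:
print(cer("saya suka machine learning", "saya suka mesin learning"))
```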