HANDLING OF INDONESIAN-ENGLISH CODESWITCHING SPEECH IN END-TO-END INDONESIAN SPEECH RECOGNITION SYSTEM USING CONNECTIONIST TEMPORAL CLASSIFICATION MODEL

Bibliographic Details
Main Author:	Raditya Pratama Roosadi, Hizkia
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/74792
Institution:	Institut Teknologi Bandung

Description
Summary:	Code-switching in speech poses a significant challenge to Automatic Speech Recognition (ASR) systems. When left unaddressed, code-switching into a foreign language can reduce speech recognition accuracy. ASR systems have evolved into two main architectures: conventional and end-to-end (e2e). Compared to conventional architectures, e2e architectures are more commonly used due to their simplicity and superior performance. A widely used modeling technique in e2e ASR is the Connectionist Temporal Classification (CTC) model, which combines Recurrent Neural Networks (RNN) with the CTC loss function to handle cases where the alignment between speech and transcription is unknown. This study focuses on handling code-switching between Indonesian and English in e2e ASR using the CTC model. The proposed handling of code-switching involves pre-training and transfer learning. Pre-training is performed on Indonesian speech data, resulting in an average error rate of 13.23% (WER) and 4.13% (CER) on Indonesian test data. However, the error rate remains high on code-switched test data. Transfer learning is then conducted by fine-tuning the model on code-switched speech data. This improves performance, giving an average error rate of 48.115% (WER) and 16.8% (CER) on code-switched test data. However, the error rate on Indonesian test data increases. To address this, a two-model CTC system is developed that distinguishes between Indonesian-only and code-switched data using confidence values. The system reduces the average error rate to 24.6625% (WER) and 7.0525% (CER) on Indonesian data and achieves 53.852% (WER) and 20.7675% (CER) on code-switched data.
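
The summary reports results as word error rate (WER) and character error rate (CER) and describes a two-model system that uses confidence values. The Python sketch below illustrates how these quantities are commonly computed and one possible way to choose between two model outputs. It is not the author's implementation: the function names, the convention of counting spaces as characters in CER, and the "keep the more confident hypothesis" rule are all assumptions made for illustration, since the record does not specify them.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate. Assumption: spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def pick_transcript(indonesian_result: Tuple[str, float],
                    codeswitch_result: Tuple[str, float]) -> str:
    """Hypothetical selection rule: each argument is (transcript, confidence)
    from one of the two models; keep the more confident transcript.
    Ties go to the Indonesian-only model. The rule used in the study
    is not described in this record."""
    text_id, conf_id = indonesian_result
    text_cs, conf_cs = codeswitch_result
    return text_cs if conf_cs > conf_id else text_id
```

For example, `wer("saya mau meeting besok", "saya mau miting besok")` counts one substituted word out of four reference words, and `pick_transcript(("saya mau miting", 0.6), ("saya mau meeting", 0.9))` keeps the second transcript because its confidence is higher.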

Similar Items