HANDLING OF INDONESIAN-ENGLISH CODESWITCHING SPEECH IN END-TO-END INDONESIAN SPEECH RECOGNITION SYSTEM USING CONNECTIONIST TEMPORAL CLASSIFICATION MODEL
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/74792 |
Institution: | Institut Teknologi Bandung |
Summary: | Code-switching in speech poses a significant challenge to Automatic Speech
Recognition (ASR) systems. When left unaddressed, code-switching between
languages can reduce speech recognition accuracy. With
advancements in technology, ASR systems have evolved into two main
architectures: conventional and end-to-end (e2e). Compared to conventional
architectures, e2e architectures are more commonly used due to their simplicity
and superior performance. A widely used modeling technique in e2e ASR is the
Connectionist Temporal Classification (CTC) model, which combines
Recurrent Neural Networks (RNN) with the CTC loss function to handle
situations where the alignment between speech and transcription is unknown.
This study focuses on handling the code-switching phenomenon between
Indonesian and English in e2e ASR using the CTC model. The proposed approach to
code-switching combines pre-training and transfer learning. Pre-training is
performed on Indonesian speech data, resulting in an average error rate of
13.23% (WER) and 4.13% (CER) on Indonesian test data. However, the error
rate remains high for code-switched test data. Transfer learning is then
conducted on code-switched speech data by fine-tuning the model. This results
in improved performance, with an average error rate of 48.115% (WER) and
16.8% (CER) on code-switched test data. Nonetheless, the error rate on
Indonesian test data increases. To address this, a two-model CTC system is
developed, capable of distinguishing between Indonesian-only and
code-switched data using confidence values. The system successfully reduces the
average error rate to 24.6625% (WER) and 7.0525% (CER) for Indonesian data
and achieves 53.852% (WER) and 20.7675% (CER) for code-switched data. |
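The WER and CER figures in the summary are edit-distance-based error rates. The sketch below shows how such metrics are commonly computed, assuming whitespace tokenization and no text normalization; the record does not state which evaluation tooling the thesis used. The function names and the example sentences are illustrative and do not come from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra token in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical code-switched utterance (not taken from the thesis data):
# the English word "meeting" is misrecognized as "miting".
reference = "saya mau meeting besok pagi"
hypothesis = "saya mau miting besok pagi"

print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

In this example one of the five reference words is wrong, so the WER is 20%, while only two character edits are needed across 27 reference characters, so the CER is much lower. This is consistent with the pattern in the summary, where CER values are well below the corresponding WER values.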