HANDLING OF INDONESIAN-ENGLISH CODESWITCHING SPEECH IN END-TO-END INDONESIAN SPEECH RECOGNITION SYSTEM USING CONNECTIONIST TEMPORAL CLASSIFICATION MODEL

Bibliographic Details
Main Author: Raditya Pratama Roosadi, Hizkia
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/74792
Institution: Institut Teknologi Bandung
Record ID: id-itb.:74792
Record Timestamp: 2023-07-24T09:04:14Z
Keywords: ASR, CTC, codeswitching, transfer learning
Building: Institut Teknologi Bandung Library
Continent: Asia
Country: Indonesia
Content Provider: Institut Teknologi Bandung
Collection: Digital ITB
Description: Code-switching in speech poses a significant challenge to Automatic Speech Recognition (ASR) systems. Left unaddressed, code-switching between languages can reduce speech recognition accuracy. ASR systems have evolved into two main architectures: conventional and end-to-end (e2e). Compared with conventional architectures, e2e architectures are more commonly used because of their simplicity and superior performance. A widely used modeling technique in e2e ASR is the Connectionist Temporal Classification (CTC) model, which combines Recurrent Neural Networks (RNN) with the CTC loss function to handle cases where the alignment between speech and transcription is unknown.

This study focuses on handling Indonesian-English code-switching in e2e ASR using the CTC model. The proposed approach involves pre-training and transfer learning. Pre-training on Indonesian speech data yields an average error rate of 13.23% (WER) and 4.13% (CER) on Indonesian test data, but the error rate remains high on code-switched test data. Transfer learning is then performed by fine-tuning the model on code-switched speech data, which improves performance to an average error rate of 48.115% (WER) and 16.8% (CER) on code-switched test data. However, the error rate on Indonesian test data increases. To address this, a two-model CTC system is developed that distinguishes Indonesian-only from code-switched data using confidence values. The system lowers the average error rate on Indonesian data to 24.6625% (WER) and 7.0525% (CER), and achieves 53.852% (WER) and 20.7675% (CER) on code-switched data.
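The error rates quoted above and the confidence-based choice between two models can be illustrated with a minimal Python sketch. It is not the author's implementation. WER and CER are computed here in the standard way, as Levenshtein edit distance over words or characters divided by the reference length. The record does not say how confidence values are computed or compared, so the `Hypothesis` class, the `select_hypothesis` function, and the "keep the higher-confidence output" rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(list(ref), list(hyp)) / len(ref)


@dataclass
class Hypothesis:
    """Hypothetical container for one model's transcription."""
    text: str
    confidence: float  # assumed: a score where higher means more certain


def select_hypothesis(indonesian: Hypothesis,
                      code_switched: Hypothesis) -> Hypothesis:
    """Assumed selection rule: keep the output of the more confident model."""
    if indonesian.confidence >= code_switched.confidence:
        return indonesian
    return code_switched


if __name__ == "__main__":
    reference = "saya mau meeting besok"
    from_indonesian_model = Hypothesis("saya mau miting besok", 0.62)
    from_code_switched_model = Hypothesis("saya mau meeting besok", 0.81)

    chosen = select_hypothesis(from_indonesian_model, from_code_switched_model)
    print("chosen:", chosen.text)
    print("WER:", wer(reference, chosen.text))
    print("CER:", cer(reference, chosen.text))
```

The example sentence and confidence scores are invented. In a real system, the confidence would have to come from the CTC models themselves, for example from the output probabilities of the decoded sequence.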