HANDLING OF INDONESIAN-ENGLISH CODESWITCHING SPEECH IN END-TO-END INDONESIAN SPEECH RECOGNITION SYSTEM USING CONNECTIONIST TEMPORAL CLASSIFICATION MODEL
Main Author: | Raditya Pratama Roosadi, Hizkia |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/74792 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:74792 |
---|---|
spelling |
id-itb.:74792; 2023-07-24T09:04:14Z; Raditya Pratama Roosadi, Hizkia; Indonesia; Final Project; ASR, CTC, codeswitching, transfer learning; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/74792; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Code-switching in speech poses a significant challenge to Automatic Speech
Recognition (ASR) systems. Left unaddressed, code-switching with a foreign
language can reduce speech recognition accuracy. ASR systems have evolved
into two main architectures: conventional and end-to-end (e2e). Compared to
conventional architectures, e2e architectures are more commonly used because
of their simplicity and superior performance. A widely used modeling
technique in e2e ASR is the Connectionist Temporal Classification (CTC)
model, which combines Recurrent Neural Networks (RNNs) with the CTC loss
function to handle cases where the alignment between speech and
transcription is unknown. This study focuses on handling code-switching
between Indonesian and English in e2e ASR using the CTC model. The proposed
approach involves pre-training and transfer learning. Pre-training is
performed on Indonesian speech data and yields an average word error rate
(WER) of 13.23% and character error rate (CER) of 4.13% on Indonesian test
data. However, the error rate remains high on code-switched test data.
Transfer learning is then carried out by fine-tuning the model on
code-switched speech data. This improves performance, with an average error
rate of 48.115% (WER) and 16.8% (CER) on code-switched test data, but the
error rate on Indonesian test data increases. To address this, a two-model
CTC system is developed that distinguishes Indonesian-only from
code-switched data using confidence values. The system reduces the average
error rate on Indonesian data to 24.6625% (WER) and 7.0525% (CER) and
achieves 53.852% (WER) and 20.7675% (CER) on code-switched data. |
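The metrics and the two-model selection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not say how the confidence value is computed, so the mean per-frame log-probability of the greedy CTC path is used here purely as an assumption. The function names, the `"_"` blank symbol, and the tie-breaking rule (prefer the Indonesian model) are likewise hypothetical, and CER is assumed to count spaces as characters.

```python
import math

BLANK = "_"  # hypothetical CTC blank symbol


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalised by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(ref, hyp):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: units are characters (spaces included)."""
    return error_rate(list(ref), list(hyp))


def ctc_greedy_decode(frames, alphabet):
    """Greedy CTC decoding: take the best symbol per frame, collapse
    repeats, then drop blanks. `frames` is a list of per-frame probability
    lists over `alphabet`. Returns (text, confidence), where confidence is
    the mean log-probability of the chosen path (an assumption)."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    chars, prev = [], None
    for idx in best:
        if idx != prev and alphabet[idx] != BLANK:
            chars.append(alphabet[idx])
        prev = idx
    confidence = sum(math.log(f[i]) for f, i in zip(frames, best)) / len(frames)
    return "".join(chars), confidence


def pick_transcript(frames_indonesian, frames_codeswitched, alphabet):
    """Decode the outputs of both models and keep the more confident one."""
    text_id, conf_id = ctc_greedy_decode(frames_indonesian, alphabet)
    text_cs, conf_cs = ctc_greedy_decode(frames_codeswitched, alphabet)
    if conf_id >= conf_cs:
        return text_id, "indonesian"
    return text_cs, "code-switched"
```

In this sketch both models decode the same utterance and the transcript from the more confident model is kept. As an illustration of the metric, `wer("saya mau meeting besok", "saya mau miting besok")` gives 0.25, since one of the four reference words is wrong.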
format |
Final Project |
author |
Raditya Pratama Roosadi, Hizkia |
title |
HANDLING OF INDONESIAN-ENGLISH CODESWITCHING SPEECH IN END-TO-END INDONESIAN SPEECH RECOGNITION SYSTEM USING CONNECTIONIST TEMPORAL CLASSIFICATION MODEL |
url |
https://digilib.itb.ac.id/gdl/view/74792 |