ACOUSTIC MODELS CONSTRUCTION ON INDONESIAN AUTOMATIC SPEECH RECOGNITION THROUGH WEAKLY SUPERVISED LEARNING WITH AGREEMENT-BASED

Automatic speech recognition (ASR) is a rapidly growing field. ASR has been developed for many languages, including Indonesian (Bahasa Indonesia). However, Indonesian ASR has little transcribed data compared with other languages. Transcribing audio at the word level takes about 6-8 times the audi...

Bibliographic Details
Main Author:	Zakiah, Iftitakhul
Format:	Theses
Language:	Indonesian
Subjects:	Engineering (engineering and related activities)
Online Access:	https://digilib.itb.ac.id/gdl/view/48062
Institution:	Institut Teknologi Bandung

id	id-itb.:48062
spelling	id-itb.:48062; 2020-06-25T23:01:49Z; deep learning, agreement-based, segments, speech recognition
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
topic	Engineering (engineering and related activities)
description	Automatic speech recognition (ASR) is a rapidly growing field. ASR has been developed for many languages, including Indonesian (Bahasa Indonesia). However, Indonesian ASR has little transcribed data compared with other languages. Transcribing audio at the word level takes about 6-8 times the audio duration, and transcribing at the phoneme level takes even longer. Untranscribed data, by contrast, are abundant and easier to collect, which calls for another approach to improving ASR performance. Weakly supervised learning covers many approaches, and using untranscribed data is one of them. In this thesis, we use an agreement-based approach built on four models with heterogeneous topologies: DNN, LSTM, CNN, and TDNN. Each model decodes the untranscribed data and aligns its own decoding result. The aligned data are then voted on per frame by all models, and the result is re-formed into segments that the models agree on. These segments are used as additional data in the training process. DNN yields relative gains of up to 1.95%, CNN up to 1.56%, and TDNN up to 2.59%. Overall, LSTM showed no improvement, yet the approach increased relative performance on one corpus, formal_val, by up to 1.65%. The segmented data are not suitable for the LSTM topology because each segment loses the context of the segment before it. The DNN, CNN, and TDNN models, however, can be improved further.
format	Theses
author	Zakiah, Iftitakhul
title	ACOUSTIC MODELS CONSTRUCTION ON INDONESIAN AUTOMATIC SPEECH RECOGNITION THROUGH WEAKLY SUPERVISED LEARNING WITH AGREEMENT-BASED
url	https://digilib.itb.ac.id/gdl/view/48062
_version_	1822000013675331584
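
The agreement step described in the abstract (per-frame voting across the models, then regrouping agreed frames into segments) can be sketched roughly as follows. This is only an illustration, not the thesis's implementation. It assumes each model's alignment is a list with one label per frame, and that "voting" means unanimous agreement; the record does not state the actual voting rule. The function name `agreed_segments` and the `min_frames` parameter are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, List[str]]  # (start_frame, end_frame_exclusive, labels)


def agreed_segments(alignments: List[List[str]], min_frames: int = 1) -> List[Segment]:
    """Collect runs of consecutive frames on which every model gives the same label.

    ASSUMPTION: unanimous agreement is required; a majority rule would be
    another plausible reading of "voted per frame".
    """
    if not alignments:
        return []
    # Only compare frames that every alignment actually covers.
    n_frames = min(len(a) for a in alignments)
    segments: List[Segment] = []
    start = None  # start index of the current agreed run, if any
    for t in range(n_frames):
        unanimous = len({a[t] for a in alignments}) == 1
        if unanimous and start is None:
            start = t
        elif not unanimous and start is not None:
            if t - start >= min_frames:
                segments.append((start, t, alignments[0][start:t]))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start, n_frames, alignments[0][start:n_frames]))
    return segments


# Toy per-frame labels for the four topologies (made-up data for illustration).
dnn = ["a", "a", "k", "k", "u", "u"]
lstm = ["a", "a", "k", "u", "u", "u"]
cnn = ["a", "a", "k", "k", "u", "u"]
tdnn = ["a", "a", "k", "k", "u", "u"]

print(agreed_segments([dnn, lstm, cnn, tdnn]))
```

In this toy input the models disagree only at frame 3, so the utterance splits into two agreed segments around it. Raising `min_frames` discards short agreed runs, which is one possible way to avoid adding very short segments to the training data.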

Similar Items