FAR DISTANCE AUTOMATIC SPEECH RECOGNITION IN INDONESIAN LANGUAGE
Automatic speech recognition (ASR) is a system that transcribes speech into the corresponding text. Current ASR development focuses on close-range speech, in which the distance between the speaker and the microphone is less than 30 cm. There has been no research c...
Main Author: | Agus Haryono, Stefanus |
---|---|
Format: | Theses |
Language: | Indonesian |
Subjects: | Engineering (engineering and related activities) |
Online Access: | https://digilib.itb.ac.id/gdl/view/39800 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:39800 |
---|---|
spelling |
id-itb.:39800 2019-06-27T16:07:01Z ASR, acoustic model, phoneme. |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
topic |
Engineering (engineering and related activities) |
description |
Automatic speech recognition (ASR) is a system that transcribes speech into the
corresponding text. Current ASR development focuses on close-range speech, in
which the distance between the speaker and the microphone is less than 30 cm. No
research has yet been conducted on far-distance ASR for the Indonesian language.
This research conducts experiments on far-distance Indonesian ASR. Two approaches
are used to build the ASR: adding a speech-processing front-end and training a more
robust acoustic model. The tested front-ends consist of spectral subtraction, Wiener
filtering, volume normalization, and dynamic range compression. More robust acoustic
models are obtained by adding long-distance speech to the training data and by
applying volume perturbation. Experiments are conducted on speech recorded at four
distances, 0 m, 0.5 m, 1 m, and 2 m, with 96 data samples per distance. The results
show that applying spectral subtraction to the baseline model reduces the average
word error rate (WER) by 0.59%. Adding long-distance speech to the acoustic model's
training data also reduces the average WER, by 2.31%. Combining the new acoustic
model with spectral subtraction yields a WER reduction of 2.19%, which is smaller
than the reduction obtained with the new acoustic model alone, without a front-end. |
format |
Theses |
author |
Agus Haryono, Stefanus |
title |
FAR DISTANCE AUTOMATIC SPEECH RECOGNITION IN INDONESIAN LANGUAGE |
url |
https://digilib.itb.ac.id/gdl/view/39800 |
_version_ |
1822925406361092096 |
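The abstract reports all results as changes in word error rate (WER). As a reading aid, the sketch below shows the standard WER definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It is not taken from the thesis; the function name `word_error_rate` and the example sentences are hypothetical, and the record does not say whether the reported percentages are absolute or relative reductions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Illustrative helper only (hypothetical name); the thesis's own
    scoring tool and text normalization are not described in this record.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. An "average WER" such as the one reported in the abstract could then be the mean of this value over the four recording distances, though the record does not state how the averaging was done.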