FAR DISTANCE AUTOMATIC SPEECH RECOGNITION IN INDONESIAN LANGUAGE

Automatic speech recognition (ASR) is a system that converts speech into the corresponding text. Current ASR development focuses on close-range speech, in which the distance between the speaker and the microphone is less than 30 cm. There has been no research c...

Bibliographic Details
Main Author: Agus Haryono, Stefanus
Format: Theses
Language: Indonesian
Subjects: Teknik (Rekayasa, enjinering dan kegiatan berkaitan) [Engineering and related activities]
Online Access: https://digilib.itb.ac.id/gdl/view/39800
Institution: Institut Teknologi Bandung
id id-itb.:39800
spelling id-itb.:39800; 2019-06-27T16:07:01Z; ASR, acoustic model, phoneme
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
topic Teknik (Rekayasa, enjinering dan kegiatan berkaitan) [Engineering and related activities]
description Automatic speech recognition (ASR) is a system that converts speech into the corresponding text. Current ASR development focuses on close-range speech, in which the distance between the speaker and the microphone is less than 30 cm. No research has been conducted on far-distance ASR for the Indonesian language. This research conducts experiments on far-distance Indonesian ASR. Two approaches are used to build the ASR: a speech-processing front-end and a more robust acoustic model. The front-ends tested are spectral subtraction, Wiener filtering, volume normalization, and dynamic range compression. More robust acoustic models are obtained by adding long-distance speech to the training data and by volume perturbation. Experiments are conducted on speech recorded at distances of 0 m, 0.5 m, 1 m, and 2 m, with 96 data samples per distance. The results show that applying spectral subtraction to the baseline model reduces the average word error rate (WER) by 0.59%. Adding long-distance speech to the acoustic model's training data reduces the average WER by 2.31%. Combining the new acoustic model with spectral subtraction yields a WER reduction of 2.19%, which is smaller than the reduction obtained with the new acoustic model alone, without the front-end.
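All results in the description are reported as changes in WER. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following Python function may help. It is not taken from the thesis: the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the scoring tool used in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (assumed, not thesis data): one reference word is missing.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

With one of four reference words deleted, the example evaluates to 0.25. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.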