FAR DISTANCE AUTOMATIC SPEECH RECOGNITION IN INDONESIAN LANGUAGE
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Subjects: | |
Online Access: | https://digilib.itb.ac.id/gdl/view/39800 |
Institution: | Institut Teknologi Bandung |
Summary: | Automatic speech recognition (ASR) is a system that converts speech into the
corresponding text. Current ASR development focuses on close-range speech, in
which the distance between the speaker and the microphone is less than 30 cm. No
research has been conducted on far-distance ASR for the Indonesian language. In
this research, experiments on far-distance Indonesian ASR are conducted. Two
approaches are used to build the ASR system: adding a speech-processing front-end
and training a more robust acoustic model. The tested front-end consists of spectral
subtraction, a Wiener filter, volume normalization, and dynamic range compression.
More robust acoustic models are obtained by adding long-distance speech to the
training data and by applying volume perturbation. Experiments are conducted on
speech data recorded at multiple distances (0 meters, 0.5 meters, 1 meter, and
2 meters), with 96 samples
for each distance. The results show that applying spectral subtraction to the
baseline model reduces the average WER by 0.59%. Adding long-distance speech to the
acoustic model's training data also reduces the average WER, by 2.31%. Combining the
new acoustic model with spectral subtraction yields a WER reduction of 2.19%, which
is smaller than the reduction from the new acoustic model without the front-end. |
---|
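
The summary reports results as word error rate (WER) and lists volume perturbation as one of the augmentation steps. The sketch below illustrates both ideas in plain Python. It is not the thesis's implementation. The function names `word_error_rate` and `perturb_volume`, the gain range of 0.5 to 2.0, and the assumption that audio samples are floats in [-1.0, 1.0] are all hypothetical choices made for illustration; the record does not state the settings actually used.

```python
import random


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def perturb_volume(samples, low=0.5, high=2.0, rng=None):
    """Scale a waveform by one random gain (assumed range), clipping to [-1, 1]."""
    rng = rng or random.Random()
    gain = rng.uniform(low, high)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # A seeded generator makes the augmentation reproducible.
    print(perturb_volume([0.1, -0.2, 0.3], rng=random.Random(0)))
```

Averaging `word_error_rate` over the recordings at each distance would give a per-distance figure comparable in kind to the averages quoted in the summary, although the exact averaging procedure used in the thesis is not stated in this record.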