FAR DISTANCE AUTOMATIC SPEECH RECOGNITION IN INDONESIAN LANGUAGE

Bibliographic Details
Main Author:	Agus Haryono, Stefanus
Format:	Theses
Language:	Indonesian
Subjects:	Engineering (engineering and related activities)
Online Access:	https://digilib.itb.ac.id/gdl/view/39800
Institution:	Institut Teknologi Bandung

Description
Summary:	Automatic speech recognition (ASR) is a system that converts speech into the corresponding text. Current ASR development focuses on close-range speech, in which the distance between the speaker and the microphone is less than 30 cm. No research has been conducted on far-distance ASR for the Indonesian language. This research conducts experiments on far-distance Indonesian ASR. Two approaches are used to build the ASR: adding a speech-processing front-end and training a more robust acoustic model. The front-ends tested consist of spectral subtraction, Wiener filtering, volume normalization, and dynamic range compression. More robust acoustic models are obtained by adding long-distance speech to the training data and by applying volume perturbation. Experiments are conducted on speech data recorded at multiple distances (0 m, 0.5 m, 1 m, and 2 m), with 96 data samples for each distance. The results show that applying spectral subtraction to the baseline model reduces the average word error rate (WER) by 0.59%. Adding long-distance speech to the acoustic model's training data reduces the average WER by 2.31%. Combining the new acoustic model with spectral subtraction yields a WER reduction of 2.19%, which is smaller than the reduction achieved by the new acoustic model alone, without a front-end.
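
The results in the summary are all reported as changes in WER. As a reading aid, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the record does not say which scoring tool the thesis used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; not taken from the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.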

Similar Items