CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM

ABSTRACT CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM By AHMAD NAUFAL HAKIM NIM : 13517055 Automatic speech recognition systems that use servers as speech processors still have connectivity and latency issues, because there are delays wh...

Bibliographic Details
Main Author: Naufal Hakim, Ahmad
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/64100
Institution: Institut Teknologi Bandung
id id-itb.:64100
spelling id-itb.:64100 2022-03-29T08:57:39Z
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description ABSTRACT. CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM. By AHMAD NAUFAL HAKIM, NIM: 13517055. Automatic speech recognition systems that process speech on a server still suffer from connectivity and latency issues, because delays occur both when the client sends the speech signal to the server and when the server returns the recognition result to the client. In this study, an Indonesian real-time automatic speech recognition system was built using a client-only architecture. Building the system involved collecting voice and text data, training acoustic models based on the HMM-GMM and neural network (DNN) methods, and developing a graphical user interface. Word error rate (WER) and real-time factor (RTF) were used to measure the performance of the acoustic models. The HMM-GMM acoustic model has an average WER of 19.060%, while the DNN acoustic model has an average WER of 11.951%. Although the HMM-GMM model has the higher average WER, it decodes faster than the DNN model: its average RTF is around 0.048, compared with an average RTF of 0.1784 for the DNN model. Keywords: automatic speech recognition, acoustic model, WER, RTF
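The abstract reports WER and RTF without defining them. Under the standard definitions (WER is the word-level edit distance between reference and hypothesis divided by the number of reference words; RTF is decoding time divided by audio duration), the two metrics can be sketched as below. This is an illustrative sketch, not the evaluation code used in the thesis; the function names and the sample sentence are assumptions introduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decoding_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decoding_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the thesis data.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # Hypothetical timings chosen to match the reported HMM-GMM average RTF.
    print(real_time_factor(0.48, 10.0))
```

Read this way, an RTF of 0.048 means roughly 0.48 seconds of decoding per 10 seconds of audio, while 0.1784 means roughly 1.8 seconds for the same audio; both values are below 1.0, so both models decode faster than real time.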
format Final Project
author Naufal Hakim, Ahmad
title CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM
url https://digilib.itb.ac.id/gdl/view/64100