CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM

Bibliographic Details
Main Author:	Naufal Hakim, Ahmad
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/64100
Institution:	Institut Teknologi Bandung

id	id-itb.:64100
spelling	id-itb.:64100 2022-03-29T08:57:39Z
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	ABSTRACT. CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM. By Ahmad Naufal Hakim, NIM: 13517055. Automatic speech recognition systems that process speech on a server still suffer from connectivity and latency problems, because delays occur both when the client sends the speech signal to the server and when the server returns the recognition result to the client. In this study, an Indonesian real-time automatic speech recognition system was built using a client-only architecture. The work comprised collecting voice and text data, training acoustic models with the HMM-GMM and neural network (DNN) methods for use in the system, and developing a graphical user interface. Word error rate (WER) and real-time factor (RTF) were used to measure the performance of the acoustic models. The HMM-GMM acoustic model has an average WER of 19.060%, while the DNN acoustic model has an average WER of 11.951%. Although the HMM-GMM acoustic model has a higher average WER than the DNN acoustic model (that is, it is less accurate), it decodes faster: its average RTF is about 0.048, compared with 0.1784 for the DNN acoustic model. Keywords: automatic speech recognition, acoustic model, WER, RTF
format	Final Project
author	Naufal Hakim, Ahmad
title	CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM
url	https://digilib.itb.ac.id/gdl/view/64100
_version_	1822276923487682560
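
The abstract reports WER and RTF but does not say how they were computed. As a minimal sketch, assuming the standard definitions (WER as the word-level edit distance divided by the number of reference words, RTF as decoding time divided by audio duration), the two metrics could be calculated as below. The function names, the sample sentences, and the timing values are illustrative assumptions and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decoding_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio; below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decoding_seconds / audio_seconds


# Hypothetical example: one deleted word ("ke") and one substituted word ("ini" -> "itu").
wer = word_error_rate("saya pergi ke pasar pagi ini", "saya pergi pasar pagi itu")
print(f"WER: {wer:.3f}")  # WER: 0.333

# Hypothetical timing: 0.48 s of decoding for 10 s of audio.
rtf = real_time_factor(0.48, 10.0)
print(f"RTF: {rtf:.3f}")  # RTF: 0.048
```

Under these definitions, the reported averages mean that the DNN model makes roughly 37% fewer word errors than the HMM-GMM model (11.951% versus 19.060%), while taking about 3.7 times as long to decode the same audio (0.1784 versus 0.048). Both models still run well under real time, since both RTF values are below 1.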

Similar Items