CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM

Bibliographic Details
Main Author: Naufal Hakim, Ahmad
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/64100
Institution: Institut Teknologi Bandung
Description
Summary: ABSTRACT. CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM. By AHMAD NAUFAL HAKIM, NIM: 13517055. Automatic speech recognition systems that process speech on a server still suffer from connectivity and latency issues, because delays occur both when the client sends the speech signal to the server and when the server returns the recognition result to the client. In this study, an Indonesian real-time automatic speech recognition system was built using a client-only architecture. Building the system involved collecting voice and text data, training acoustic models based on the HMM-GMM and neural network (DNN) methods for use in the system, and developing a graphical user interface. Word error rate (WER) and real-time factor (RTF) were used to measure the performance of the acoustic models. The HMM-GMM acoustic model has an average WER of 19.060%, while the DNN acoustic model has an average WER of 11.951%. Although the HMM-GMM model's average WER is higher (that is, it is less accurate) than the DNN model's, it decodes faster: its average RTF is around 0.048, compared with 0.1784 for the DNN model. Keywords: automatic speech recognition, acoustic model, WER, RTF
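
The two metrics named in the summary can be illustrated with a minimal sketch. It uses the standard definitions (WER as word-level edit distance divided by the number of reference words, RTF as decoding time divided by audio duration) and is not the evaluation code used in the thesis. The function names and the sample sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein distance on words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio that was decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical transcripts: one reference word ("ke") is dropped.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # Hypothetical timing: 0.48 s of decoding for 10 s of audio.
    print(real_time_factor(0.48, 10.0))
```

An RTF below 1 means decoding runs faster than real time. Both reported averages (around 0.048 and 0.1784) meet that condition, which is a prerequisite for real-time recognition on the client.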