CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM
Main Author: | Ahmad Naufal Hakim |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/64100 |
Institution: | Institut Teknologi Bandung |
Summary: | ABSTRACT
CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON
INDONESIAN REAL-TIME AUTOMATIC SPEECH
RECOGNITION SYSTEM
By
AHMAD NAUFAL HAKIM
NIM : 13517055
Automatic speech recognition systems that rely on a server to process speech
still suffer from connectivity and latency problems, because delays occur when the
client sends speech signals to the server and when the server sends the recognition
results back to the client. In this study, an Indonesian real-time automatic speech
recognition system was built using a client-only architecture. Building the system
involved collecting voice and text data, training acoustic models based on the
HMM-GMM and neural network (DNN) methods for use in the system, and developing
a graphical user interface.
In this study, word error rate (WER) and real-time factor (RTF) were used to
measure the performance of the acoustic models. The HMM-GMM acoustic model
has an average WER of 19.060%, while the DNN acoustic model has an average
WER of 11.951%. Although the HMM-GMM model has the higher average WER, it
decodes faster than the DNN model: its average RTF is around 0.048, compared
with an average RTF of 0.1784 for the DNN model.
Keywords: automatic speech recognition, acoustic model, WER, RTF |
---|
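
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RTF as decoding time divided by audio duration. The function names and sample sentences are hypothetical and are not taken from the thesis, which may have computed these values with different tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decoding_seconds: float, audio_seconds: float) -> float:
    """Decoding time divided by the duration of the decoded audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decoding_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: one reference word is dropped by the recognizer.
    wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
    print(f"WER: {wer:.3%}")
    # Hypothetical timing: 0.48 s spent decoding 10 s of audio.
    print(f"RTF: {real_time_factor(0.48, 10.0):.3f}")
```

Under these definitions, an RTF below 1 means decoding runs faster than the audio plays. Both averages reported in the abstract (0.048 and 0.1784) are below 1.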