CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM
Main Author: | Naufal Hakim, Ahmad |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/64100 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:64100 |
---|---|
spelling |
id-itb.:64100; 2022-03-29T08:57:39Z; CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM; Naufal Hakim, Ahmad; Indonesia; Final Project; automatic speech recognition, acoustic model, WER, RTF; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/64100; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
ABSTRACT
CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON
INDONESIAN REAL-TIME AUTOMATIC SPEECH
RECOGNITION SYSTEM
By
AHMAD NAUFAL HAKIM
NIM : 13517055
Automatic speech recognition systems that process speech on a server still
suffer from connectivity and latency problems, because delays occur both when the
client sends the speech signal to the server and when the server returns the
recognition result to the client. In this study, an Indonesian real-time automatic
speech recognition system was built using a client-only architecture. Building the
system involved collecting voice and text data, training acoustic models with the
HMM-GMM and neural network methods for use as the system's models, and
developing a graphical user interface.
Word error rate (WER) and real-time factor (RTF) were used to measure the
performance of the acoustic models. The HMM-GMM acoustic model has an
average WER of 19.060%, while the DNN acoustic model has an average WER of
11.951%. Although the HMM-GMM acoustic model has a higher average WER than
the DNN acoustic model, it decodes faster: its average RTF is about 0.048,
compared with an average RTF of 0.1784 for the DNN acoustic model.
Keywords: automatic speech recognition, acoustic model, WER, RTF |
format |
Final Project |
author |
Naufal Hakim, Ahmad |
title |
CLIENT-ONLY ARCHITECTURE IMPLEMENTATION ON INDONESIAN REAL-TIME AUTOMATIC SPEECH RECOGNITION SYSTEM |
url |
https://digilib.itb.ac.id/gdl/view/64100 |
_version_ |
1822276923487682560 |
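
The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RTF as decoding time divided by audio duration. The function names and the sample sentences are hypothetical and are not taken from the thesis, which may have computed these values with a toolkit's own scoring scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words (assumed definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: one of four reference words is dropped.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
    # Hypothetical timing: 0.48 s of decoding for 10 s of audio.
    print(real_time_factor(0.48, 10.0))
```

Under these definitions, an RTF below 1 means decoding finishes faster than the audio plays, which holds for both average values reported in the abstract (0.048 and 0.1784).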