DEVELOPMENT OF INDONESIAN SPEECH RECOGNITION SYSTEM BASED ON DEEP NEURAL NETWORK FOR GIVING SUBTITLES ON RECORDED NEWS BROADCAST
Main Author:
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/67108
Institution: Institut Teknologi Bandung
Summary: An Indonesian speech recognition system was developed to improve accessibility for hearing-impaired news viewers. To create a speech recognition system for the news broadcast domain, a speech corpus, a text corpus, and appropriate acoustic modeling techniques are needed. This final project discusses the construction of a speech corpus and a text corpus in the news broadcast domain, as well as the construction of the acoustic models, language models, and lexicon that are then integrated into the speech recognition system.
The speech corpus is obtained from news broadcast recordings, which are then annotated. The text corpus is obtained from the annotated transcriptions of the speech corpus, from web scraping of relevant online news, and from a collection of online news items created by ILPS, Informatics Institute, University of Amsterdam. The lexicon was built using the Indonesian lexicon generator. The best acoustic modeling technique is selected by comparing the GMM-HMM, DNN-HMM, and CNN-HMM techniques.
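As a rough illustration of the lexicon step, the sketch below derives pronunciations from a word list with a simple grapheme-to-phoneme rule. It is an illustrative assumption only, not the Indonesian lexicon generator used in this project; the names and rules here are hypothetical.

```python
# Minimal sketch of a grapheme-based pronunciation lexicon (hypothetical,
# not the project's Indonesian lexicon generator). Indonesian orthography
# is largely phonemic, so mapping each letter or digraph to one phone
# already gives a usable baseline for native words.

# Digraphs must be matched before single letters.
DIGRAPHS = {"ng": "ng", "ny": "ny", "sy": "sy", "kh": "kh"}

def graphemes_to_phones(word: str) -> list[str]:
    """Split a word into phone symbols, preferring digraphs over single letters."""
    phones = []
    word = word.lower()
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phones.append(DIGRAPHS[word[i:i + 2]])
            i += 2
        else:
            phones.append(word[i])
            i += 1
    return phones

def build_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Map every distinct word in the text corpus to a phone sequence."""
    return {w: graphemes_to_phones(w) for w in sorted(set(words))}

if __name__ == "__main__":
    for word, phones in build_lexicon(["berita", "nyanyi", "bandung"]).items():
        print(word, " ".join(phones))
    # bandung -> b a n d u ng
    # berita  -> b e r i t a
    # nyanyi  -> ny a ny i
```

Loanwords and proper names usually break such rules, so a real lexicon generator would combine rules like these with an exception dictionary.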
The best speech corpus is determined by comparing the word error rate (WER) of the acoustic models built on it. The best text corpus is determined from the out-of-vocabulary (OOV) rate and the perplexity of the language model built from it. The best speech and text corpora were then used to compare the WER of the three acoustic models.
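The selection metrics themselves are standard. Below is a minimal sketch, with hypothetical example sentences, of how WER and the OOV rate can be computed; perplexity is normally reported directly by the language-model toolkit and is not reimplemented here.

```python
# Minimal sketch of the two corpus-selection metrics described above:
# word error rate (WER) from a Levenshtein alignment, and the
# out-of-vocabulary (OOV) rate of a test set against an LM vocabulary.

def wer(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[n][m] / n if n > 0 else 0.0

def oov_rate(test_tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of test-set tokens missing from the language-model vocabulary."""
    if not test_tokens:
        return 0.0
    return sum(1 for t in test_tokens if t not in vocabulary) / len(test_tokens)

if __name__ == "__main__":
    ref = "presiden meresmikan jalan tol baru".split()   # hypothetical reference
    hyp = "presiden meresmikan jalan tol".split()        # hypothetical ASR output
    print(f"WER: {wer(ref, hyp):.2%}")                   # one deletion -> 20.00%
    vocab = {"presiden", "meresmikan", "jalan", "tol"}
    print(f"OOV rate: {oov_rate(ref, vocab):.2%}")       # 'baru' unseen -> 20.00%
```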
The use of the CNN-HMM technique improves the performance of the speech recognition system by 4.42% compared to the GMM-HMM technique and by 1.98% compared to the DNN-HMM technique. Therefore, the acoustic model built with the CNN-HMM technique was chosen to be integrated with the subtitling system.