DEVELOPMENT OF INDONESIAN SPEECH RECOGNITION SYSTEM BASED ON DEEP NEURAL NETWORK FOR GIVING SUBTITLES ON RECORDED NEWS BROADCAST

Bibliographic Details
Main Author: Alghifari, Mochamad
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/67108
Institution: Institut Teknologi Bandung
id id-itb.:67108
timestamp 2022-08-10T03:23:03Z
keywords CNN-HMM, acoustic model, speech corpus, text corpus, news broadcast
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description developed to improve accessibility for hearing-impaired news viewers. Building a speech recognition system for the news broadcast domain requires a speech corpus, a text corpus, and a suitable acoustic modeling technique. This final project covers the construction of the speech and text corpora for the news broadcast domain, as well as the acoustic models, language models, and lexicon that are integrated into the speech recognition system. The speech corpus is built from annotated news broadcast recordings. The text corpus combines the annotated transcriptions of the speech corpus, text scraped from relevant online news sites, and a collection of online news items created by ILPS, Informatics Institute, University of Amsterdam. The lexicon was built using the Indonesian lexicon generator. The acoustic modeling technique is chosen by comparing GMM-HMM, DNN-HMM, and CNN-HMM. The best speech corpus is determined by comparing the word error rate (WER) of the acoustic models trained on each candidate. The best text corpus is determined from the out-of-vocabulary (OOV) rate and the perplexity of the resulting language model. The best speech and text corpora were then used to compare the WER of all three acoustic models. The CNN-HMM technique improves the performance of the speech recognition system by 4.42% compared to GMM-HMM and by 1.98% compared to DNN-HMM. The acoustic model built with CNN-HMM was therefore chosen for integration with the subtitle system.
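
The abstract relies on three evaluation measures: word error rate (WER) for the acoustic models, and OOV rate and perplexity for the text corpus. The following is a minimal Python sketch of their standard textbook definitions, not the project's actual scoring code. The function names, the whitespace tokenization, and the example sentences are assumptions made for illustration only.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists
    (substitutions, insertions, and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words.
    Assumes whitespace tokenization (an illustrative simplification)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def oov_rate(vocabulary, test_text):
    """Fraction of test tokens that are missing from the vocabulary."""
    tokens = test_text.split()
    if not tokens:
        raise ValueError("test text is empty")
    unknown = sum(1 for t in tokens if t not in vocabulary)
    return unknown / len(tokens)


def perplexity(word_probabilities):
    """Perplexity from the per-word probabilities a language model
    assigns to a test sequence: exp of the mean negative log-probability."""
    if not word_probabilities:
        raise ValueError("no probabilities given")
    log_sum = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_sum / len(word_probabilities))


# Hypothetical example sentences, not taken from the project's corpora.
reference = "presiden meresmikan jalan tol baru"
hypothesis = "presiden meresmikan jalan baru"
wer = word_error_rate(reference, hypothesis)

vocab = {"presiden", "jalan", "baru"}
oov = oov_rate(vocab, reference)

ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower values are better for all three measures. The abstract does not say whether the reported 4.42% and 1.98% improvements are absolute or relative WER differences, so this sketch makes no assumption about that.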