DEVELOPMENT OF INDONESIAN SPEECH RECOGNITION SYSTEM BASED ON DEEP NEURAL NETWORK FOR GIVING SUBTITLES ON RECORDED NEWS BROADCAST

Bibliographic Details
Main Author:	Alghifari, Mochamad
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/67108
Institution:	Institut Teknologi Bandung

id	id-itb.:67108
spelling	id-itb.:67108; 2022-08-10T03:23:03Z; DEVELOPMENT OF INDONESIAN SPEECH RECOGNITION SYSTEM BASED ON DEEP NEURAL NETWORK FOR GIVING SUBTITLES ON RECORDED NEWS BROADCAST; Alghifari, Mochamad; Indonesia; Final Project; CNN-HMM, acoustic model, speech corpus, text corpus, news broadcast; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/67108; text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	developed to improve accessibility for hearing-impaired news viewers. Building a speech recognition system for the news broadcast domain requires a speech corpus, a text corpus, and a suitable acoustic modeling technique. This final project covers the construction of the speech and text corpora for the news broadcast domain, as well as the acoustic models, language models, and lexicon that are integrated into the speech recognition system. The speech corpus was obtained from news broadcast recordings, which were then annotated. The text corpus was drawn from the annotated transcriptions of the speech corpus, web scraping of relevant online news, and a collection of online news items created by ILPS, Informatics Institute, University of Amsterdam. The lexicon was built using the Indonesian lexicon generator. The best acoustic modeling technique was selected by comparing the GMM-HMM, DNN-HMM, and CNN-HMM techniques. The best speech corpus was determined by comparing the word error rate (WER) of the resulting acoustic models. The best text corpus was determined from the out-of-vocabulary (OOV) rate and the perplexity of the resulting language model. The best speech and text corpora were then used to compare WER across all three acoustic models. The CNN-HMM technique improved the performance of the speech recognition system by 4.42% compared to the GMM-HMM technique and by 1.98% compared to the DNN-HMM technique. The acoustic model built with the CNN-HMM technique was therefore chosen for integration with the subtitle system.
format	Final Project
author	Alghifari, Mochamad
title	DEVELOPMENT OF INDONESIAN SPEECH RECOGNITION SYSTEM BASED ON DEEP NEURAL NETWORK FOR GIVING SUBTITLES ON RECORDED NEWS BROADCAST
url	https://digilib.itb.ac.id/gdl/view/67108
_version_	1822277819218001920
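
The description relies on two measures, word error rate (WER) and the out-of-vocabulary (OOV) rate, without defining them. The sketch below shows the standard definitions: WER as word-level edit distance divided by reference length, and OOV rate as the share of test words missing from the vocabulary. It is not the author's implementation; the function names and the sample sentences are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def oov_rate(vocabulary: Iterable[str], test_words: Sequence[str]) -> float:
    """Fraction of test tokens that do not appear in the vocabulary."""
    if not test_words:
        raise ValueError("test_words must contain at least one word")
    vocab = set(vocabulary)
    return sum(1 for word in test_words if word not in vocab) / len(test_words)


# Made-up example data (not taken from the thesis corpora).
reference = "presiden meresmikan jalan tol baru".split()
hypothesis = "presiden resmikan jalan tol".split()
wer = word_error_rate(reference, hypothesis)   # 1 substitution + 1 deletion over 5 words

vocabulary = ["presiden", "jalan", "tol"]
oov = oov_rate(vocabulary, reference)          # 2 of 5 words are outside the vocabulary

print(f"WER = {wer:.2f}, OOV rate = {oov:.2f}")
```

Under these definitions a lower WER and a lower OOV rate are both better. The description does not say whether the reported 4.42% and 1.98% improvements are absolute or relative WER differences.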

Similar Items