Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, inc...
Main Author: | Zakiah, Iftitakhul |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39882 |
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:39882 |
---|---|
spelling |
id-itb.:39882 2019-06-28T10:59:51Z Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition Zakiah, Iftitakhul Indonesia Final Project automatic speech recognition, transfer learning, language model, acoustic model, triphone GMM-HMM, MAP, N-gram interpolation, LSTM, WER. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/39882 In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, including in the lecture domain. Building an ASR system from scratch requires very large amounts of data, both speech recordings and text. An alternative is transfer learning, an approach that builds models by reusing existing models as source models. This final project begins with data collection in the lecture domain of Informatics ITB. The ASR experiments use spontaneous-speech language models from the news domain as source models. The work comprises three systems: one trained on the news domain (baseline), one on the lecture domain (baseline), and one on both (transfer learning). All three systems use a triphone GMM-HMM acoustic model, with MAP adaptation applied only in system C. The language models of all three systems use n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the language models through N-gram interpolation and transfer of the LSTMP model. The news-domain system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp).
The smaller the WER, the better the model, so the best ASR for lectures is the transfer learning approach on the language model with a triphone acoustic model. text |
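As an aside on the evaluation metric used above: WER is the word-level edit distance (substitutions + deletions + insertions) between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch, not taken from the project itself; the `wer` function and the sample sentences are illustrative only:

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

print(wer("the lecture starts now", "the lecture started"))  # 2 errors / 4 words = 0.5
```

A lower WER means fewer recognition errors, which is why the 52.734% (5-fold) transfer-learning result is the best of the three systems.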
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In teaching and learning activities, the knowledge conveyed by instructors comes not only from
references or presentation slides but also from their own experience and knowledge. Meanwhile,
automatic speech recognition (ASR) systems are developing rapidly and are increasingly being
deployed, including in the lecture domain. Building an ASR system from scratch requires very
large amounts of data, both speech recordings and text. An alternative is transfer learning, an
approach that builds models by reusing existing models as source models. This final project begins
with data collection in the lecture domain of Informatics ITB. The ASR experiments use
spontaneous-speech language models from the news domain as source models. The work comprises
three systems: one trained on the news domain (baseline), one on the lecture domain (baseline),
and one on both (transfer learning). All three systems use a triphone GMM-HMM acoustic model,
with MAP adaptation applied only in system C. The language models of all three systems use
n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the
language models through N-gram interpolation and transfer of the LSTMP model. The news-domain
system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232%
(5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp).
The smaller the WER, the better the model, so the best ASR for lectures is the transfer learning
approach on the language model with a triphone acoustic model. |
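The N-gram interpolation mentioned in the abstract combines the source (news) and target (lecture) language models as a weighted mixture, P(w|h) = λ·P_news(w|h) + (1−λ)·P_lecture(w|h). A hedged sketch assuming unigram probabilities for simplicity; the weight λ and the toy corpora are illustrative, not taken from the project:

```python
from collections import Counter

def unigram_probs(corpus):
    """Maximum-likelihood unigram probabilities from a whitespace-tokenized corpus."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_source, p_target, lam):
    """Linear interpolation: lam * source prob + (1 - lam) * target prob."""
    vocab = set(p_source) | set(p_target)
    return {w: lam * p_source.get(w, 0.0) + (1 - lam) * p_target.get(w, 0.0)
            for w in vocab}

news = unigram_probs("the president spoke the news today")      # source domain
lecture = unigram_probs("the algorithm sorts the list")         # target domain
mixed = interpolate(news, lecture, lam=0.3)
print(mixed["the"])  # 0.3 * (2/6) + 0.7 * (2/5) ≈ 0.38
```

In practice the interpolation weight is tuned on held-out target-domain text (e.g. by minimizing perplexity), and real systems interpolate higher-order n-gram probabilities rather than unigrams.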
format |
Final Project |
author |
Zakiah, Iftitakhul |
spellingShingle |
Zakiah, Iftitakhul Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
author_facet |
Zakiah, Iftitakhul |
author_sort |
Zakiah, Iftitakhul |
title |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_short |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_full |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_fullStr |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_full_unstemmed |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_sort |
transfer learning from news domain to lecture domain in automatic speech recognition |
url |
https://digilib.itb.ac.id/gdl/view/39882 |
_version_ |
1822925458798280704 |