Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition


Bibliographic Details
Main Author: Zakiah, Iftitakhul
Format: Final Project
Language: Indonesia
Online Access:https://digilib.itb.ac.id/gdl/view/39882
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:39882
spelling id-itb.:39882 2019-06-28T10:59:51Z Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition Zakiah, Iftitakhul Indonesia Final Project automatic speech recognition, transfer learning, language model, acoustic model, triphone GMM-HMM, MAP, N-gram interpolation, LSTM, WER. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/39882 In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, including in the lecture domain. Building an ASR system from scratch requires very large amounts of data, both voice recordings and text. Transfer learning offers an alternative: it builds a model by leveraging an existing model as the source model. This final project begins with data collection in the lecture domain of Informatics ITB. The ASR experiments use spontaneous-speech language models from the news domain as source models. The work comprises three systems: one using the news domain (baseline), one using the lecture domain (baseline), and one using both (transfer learning). All three systems use a triphone GMM-HMM acoustic model, with MAP adaptation applied only in system C. The language models of the three systems use n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the language models through n-gram interpolation and transfer of the LSTMP model. The news-domain system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp).
The lower the WER, the better the model; thus the best ASR for the lecture domain uses the transfer-learning approach for the language model and a triphone GMM-HMM acoustic model. text
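The abstract applies transfer learning to the language model via n-gram interpolation, i.e. mixing probabilities from a news-domain (source) model and a lecture-domain (target) model. A minimal unigram sketch of that idea follows; the corpora, the `unigram_lm`/`interpolated_prob` helpers, and the fixed weight `lam` are illustrative assumptions (the thesis works with higher-order n-grams, and in practice the interpolation weight would be tuned on held-out lecture data):

```python
from collections import Counter

def unigram_lm(corpus):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolated_prob(word, source_lm, target_lm, lam=0.5):
    """Linear interpolation: P(w) = lam * P_target(w) + (1 - lam) * P_source(w)."""
    return lam * target_lm.get(word, 0.0) + (1 - lam) * source_lm.get(word, 0.0)

# Toy source (news) and target (lecture) corpora, purely illustrative.
news_lm = unigram_lm("the news the report")
lecture_lm = unigram_lm("the lecture")

p = interpolated_prob("lecture", news_lm, lecture_lm, lam=0.5)
```

A word like "lecture" that never appears in the news corpus still gets non-zero probability from the target model, which is the point of interpolating rather than relying on either domain alone.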
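The systems are compared by word error rate (WER), the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A self-contained sketch of the standard dynamic-programming computation (this is the textbook metric, not code from the thesis):

```python
def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + insertions + deletions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein over words).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Lower is better, which is why the abstract concludes that the transfer-learning system, with the lowest 5-fold WER, builds the best lecture-domain ASR.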
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, including in the lecture domain. Building an ASR system from scratch requires very large amounts of data, both voice recordings and text. Transfer learning offers an alternative: it builds a model by leveraging an existing model as the source model. This final project begins with data collection in the lecture domain of Informatics ITB. The ASR experiments use spontaneous-speech language models from the news domain as source models. The work comprises three systems: one using the news domain (baseline), one using the lecture domain (baseline), and one using both (transfer learning). All three systems use a triphone GMM-HMM acoustic model, with MAP adaptation applied only in system C. The language models of the three systems use n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the language models through n-gram interpolation and transfer of the LSTMP model. The news-domain system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp). The lower the WER, the better the model; thus the best ASR for the lecture domain uses the transfer-learning approach for the language model and a triphone GMM-HMM acoustic model.
format Final Project
author Zakiah, Iftitakhul
spellingShingle Zakiah, Iftitakhul
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
author_facet Zakiah, Iftitakhul
author_sort Zakiah, Iftitakhul
title Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
title_short Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
title_full Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
title_fullStr Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
title_full_unstemmed Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
title_sort transfer learning from news domain to lecture domain in automatic speech recognition
url https://digilib.itb.ac.id/gdl/view/39882
_version_ 1822925458798280704