Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, inc...
Main Author: | Zakiah, Iftitakhul |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39882 |
Institution: | Institut Teknologi Bandung |
Language: | Indonesia |
id |
id-itb.:39882 |
---|---|
spelling |
id-itb.:39882 2019-06-28T10:59:51Z Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition Zakiah, Iftitakhul Indonesia Final Project automatic speech recognition, transfer learning, language model, acoustic model, triphone GMM-HMM, MAP, N-gram interpolation, LSTM, WER. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/39882 In teaching and learning activities, the knowledge conveyed by instructors comes not only from references or presentation slides but also from their own experience and knowledge. Meanwhile, automatic speech recognition (ASR) systems are developing rapidly and are increasingly being deployed, including in the lecture domain. Building an ASR system from scratch requires very large amounts of data, both speech recordings and text. An alternative is transfer learning, an approach that builds models by reusing existing models as source models. This final project begins with data collection in the lecture domain of Informatics ITB. The ASR experiments use spontaneous-speech language models from the news domain as source models. The work comprises three systems: one trained on the news domain (baseline), one on the lecture domain (baseline), and one on both (transfer learning). All three systems use a triphone GMM-HMM acoustic model, with MAP adaptation applied only in system C. The language models of all three systems use n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the language models through N-gram interpolation and transfer of the LSTMP model. The news-domain system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp).
The smaller the WER, the better the model, so the best ASR for lectures is the transfer learning approach on the language model with a triphone acoustic model. text |
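As an aside on the evaluation metric used above: WER is the word-level edit distance (substitutions + deletions + insertions) between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch, not taken from the project itself; the `wer` function and the sample sentences are illustrative only:

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance (dynamic programming)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

print(wer("the lecture starts now", "the lecture started"))  # 2 errors / 4 words = 0.5
```

A lower WER means fewer recognition errors, which is why the 52.734% (5-fold) transfer-learning result is the best of the three systems.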
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In teaching and learning activities, the knowledge conveyed by instructors comes not only from
references or presentation slides but also from their own experience and knowledge. Meanwhile,
automatic speech recognition (ASR) systems are developing rapidly and are increasingly being
deployed, including in the lecture domain. Building an ASR system from scratch requires very
large amounts of data, both speech recordings and text. An alternative is transfer learning, an
approach that builds models by reusing existing models as source models. This final project begins
with data collection in the lecture domain of Informatics ITB. The ASR experiments use
spontaneous-speech language models from the news domain as source models. The work comprises
three systems: one trained on the news domain (baseline), one on the lecture domain (baseline),
and one on both (transfer learning). All three systems use a triphone GMM-HMM acoustic model,
with MAP adaptation applied only in system C. The language models of all three systems use
n-grams and an LSTM with a projection layer (LSTMP). Transfer learning is applied to the
language models through N-gram interpolation and transfer of the LSTMP model. The news-domain
system yields a WER of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232%
(5-fold) and 62.18% (10sp), and the transfer-learning system 52.734% (5-fold) and 67.0% (10sp).
The smaller the WER, the better the model, so the best ASR for lectures is the transfer learning
approach on the language model with a triphone acoustic model. |
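The N-gram interpolation mentioned in the abstract combines the source (news) and target (lecture) language models as a weighted mixture, P(w|h) = λ·P_news(w|h) + (1−λ)·P_lecture(w|h). A hedged sketch assuming unigram probabilities for simplicity; the weight λ and the toy corpora are illustrative, not taken from the project:

```python
from collections import Counter

def unigram_probs(corpus):
    """Maximum-likelihood unigram probabilities from a whitespace-tokenized corpus."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_source, p_target, lam):
    """Linear interpolation: lam * source prob + (1 - lam) * target prob."""
    vocab = set(p_source) | set(p_target)
    return {w: lam * p_source.get(w, 0.0) + (1 - lam) * p_target.get(w, 0.0)
            for w in vocab}

news = unigram_probs("the president spoke the news today")      # source domain
lecture = unigram_probs("the algorithm sorts the list")         # target domain
mixed = interpolate(news, lecture, lam=0.3)
print(mixed["the"])  # 0.3 * (2/6) + 0.7 * (2/5) ≈ 0.38
```

In practice the interpolation weight is tuned on held-out target-domain text (e.g. by minimizing perplexity), and real systems interpolate higher-order n-gram probabilities rather than unigrams.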
format |
Final Project |
author |
Zakiah, Iftitakhul |
spellingShingle |
Zakiah, Iftitakhul Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
author_facet |
Zakiah, Iftitakhul |
author_sort |
Zakiah, Iftitakhul |
title |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_short |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_full |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_fullStr |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_full_unstemmed |
Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition |
title_sort |
transfer learning from news domain to lecture domain in automatic speech recognition |
url |
https://digilib.itb.ac.id/gdl/view/39882 |
_version_ |
1822925458798280704 |