Transfer Learning from News Domain to Lecture Domain in Automatic Speech Recognition
Main Author:
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/39882
Institution: Institut Teknologi Bandung
Summary: In teaching and learning activities, the knowledge conveyed by instructors comes not only
from references or presentation slides but also from other experience and knowledge. At the same
time, automatic speech recognition (ASR) systems are developing rapidly and are being deployed in
many settings, including the lecture domain. Building an ASR system from scratch requires a very
large amount of data, both voice recordings and text. An alternative is transfer learning, an
approach that builds a model by reusing an existing model as the source model. This final project
begins with collecting data from the ITB Informatics lecture domain. The ASR experiments use
spontaneous-speech language models from the news domain as source models. The work is divided into
three systems: one using the news domain (baseline), one using the lecture domain (baseline), and
one using both (transfer learning). All three systems use a triphone GMM-HMM acoustic model, with
MAP adaptation applied only in system C. The language models of all three systems use n-grams and
an LSTM with a projection layer (LSTMP). Transfer learning is applied to the language models
through n-gram interpolation and transfer of the LSTMP model. The news-domain system yields a WER
of 78.30% (5-fold) and 85.18% (10sp), the lecture-domain system 58.232% (5-fold) and 62.18% (10sp),
and the transfer learning system 52.734% (5-fold) and 67.0% (10sp). Since a lower WER indicates a
better model, the best ASR for lectures is the transfer learning approach on the language model
combined with a triphone acoustic model.
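The abstract states that transfer learning is applied to the language model via n-gram
interpolation. As a minimal sketch of that idea only (not code from the thesis), the snippet below
linearly combines the probabilities of a news-domain and a lecture-domain n-gram model; the weight
`lam` and the toy bigram probabilities are assumptions for illustration.

```python
# Minimal sketch of n-gram language model interpolation (illustrative only,
# not the thesis implementation). Two domain-specific models are combined as
#   P(w | h) = lam * P_news(w | h) + (1 - lam) * P_lecture(w | h)

def interpolate(p_news: float, p_lecture: float, lam: float = 0.5) -> float:
    """Linear interpolation of two language model probabilities."""
    return lam * p_news + (1.0 - lam) * p_lecture

# Toy bigram probabilities P(w | "mesin") from each domain (made-up numbers).
p_news = {"berita": 0.30, "pembelajaran": 0.05}
p_lecture = {"berita": 0.02, "pembelajaran": 0.40}

for word in p_news:
    p = interpolate(p_news[word], p_lecture[word], lam=0.3)
    print(f"P({word} | mesin) = {p:.3f}")
```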
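The systems are compared by word error rate (WER), where a lower value is better. The short
self-contained sketch below shows how WER is conventionally computed, as word-level edit distance
divided by the reference length; the sample sentences are made up for illustration.

```python
# Word error rate: (substitutions + deletions + insertions) / reference length,
# computed here with a standard word-level edit distance (Levenshtein).

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Toy example: one substitution and one deletion over four reference words -> 0.5
print(wer("sistem pengenalan ucapan otomatis", "sistem pengenal ucapan"))
```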