INDONESIAN AUTOMATIC SPEECH RECOGNITION DEVELOPMENT USING TRANSFER LEARNING ON MASSIVELY MULTILINGUAL SPEECH (MMS) AND WHISPER

Bibliographic Details
Main Author:	Adila, Aulia
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/78316
Institution:	Institut Teknologi Bandung

id	id-itb.:78316
keywords	end-to-end speech recognition model, transfer learning, MMS, Whisper, speech variability
timestamp	2023-09-18T23:52:49Z
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
collection	Digital ITB
description	An ideal speech recognition model accurately transcribes speech across a variety of voice signal characteristics, such as speaking style (dictated and spontaneous), speech context (formal and informal), and background noise conditions (clean and moderate). One possible approach is to build a model from scratch with a large amount of training data. However, no substantial amount of Indonesian speech training data representing this variability is available; therefore, an alternative approach is taken: the model is built effectively by reusing the knowledge already captured by pretrained models through transfer learning. In this final project, an Indonesian speech recognition model was developed by applying transfer learning to the state-of-the-art Massively Multilingual Speech (MMS) and Whisper models, using 48,570 recordings. The models produced by transfer learning (fine-tuned models) were tested on speech data representing a range of characteristics, and the results were compared with those of the models without transfer learning (baseline models). The experimental results show that transfer learning improved the models' predictive capability, as indicated by a decrease in WER (word error rate). The fine-tuned Whisper model achieved the lowest WER in every test data group. The lowest WER was recorded on the DFC (dictated-formal-clean) test data, and the highest on the SIC (spontaneous-informal-clean) test data. Furthermore, it was concluded that the characteristics that most influence the model's predictive capability are variations in speaking style and speech context.
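The abstract in this record reports its results as word error rate (WER). The usual definition is WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the model output, and N is the number of reference words. The following is a minimal sketch of that definition using word-level Levenshtein distance. It is not the evaluation code used in the thesis. The function names and sample Indonesian sentences are hypothetical, and the normalization (lowercasing and splitting on whitespace) is an assumption, since this record does not say how transcripts were normalized.

```python
from typing import Iterable, List, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words; only one DP row is kept at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> List[str]:
    # Assumed normalization (hypothetical): lowercase, split on whitespace.
    return text.lower().split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_word_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    # Pools edit counts over all utterances, then divides by the total number
    # of reference words (rather than averaging per-utterance WERs).
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references must contain at least one word in total")
    return errors / words


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) pairs, for illustration only.
    pairs = [
        ("saya pergi ke pasar hari ini", "saya pergi ke pasar ini"),  # 1 deletion
        ("saya pergi ke pasar", "saya pergi pasar besar"),            # 1 deletion + 1 insertion
    ]
    for ref, hyp in pairs:
        print(f"{word_error_rate(ref, hyp):.3f}")              # 0.167, then 0.500
    print(f"corpus WER: {corpus_word_error_rate(pairs):.3f}")  # corpus WER: 0.300
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a model outputs many extra words. Pooling errors across utterances, as in `corpus_word_error_rate`, weights long utterances more heavily than a simple mean of per-utterance scores would.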
