POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM

This research aims to build a Text to Speech (TTS) system for the Indonesian language whose emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was developed directly from the FastPitch model. The development is intended so that the resulting prosody is not only...

Bibliographic Details
Main Author:	Azhar Dhiaulhaq, Moch
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/68652
Institution:	Institut Teknologi Bandung

id	id-itb.:68652
spelling	id-itb.:68652; 2022-09-19T08:00:24Z; keywords: Text to Speech System, FastPitch, HifiGAN, MOS
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	This research aims to build a text-to-speech (TTS) system for the Indonesian language whose emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was developed directly from the FastPitch model so that the resulting prosody is predicted not only from the input sentence but also from an emotion label. To generate emotions better, the Emotion FastPitch model also handles energy as an additional feature. The HifiGAN model is used as the vocoder that converts mel spectrograms into audio. All models are trained on an expressive corpus of 11,500 sets of text, audio, and emotion label, with a total duration of 21 hours and 57 minutes. The corpus covers angry, happy, sad, and neutral emotions. The Emotion FastPitch model is compared directly with the FastPitch model as the baseline, which is likewise paired with the HifiGAN vocoder. Both models were evaluated using the Mean Opinion Score (MOS). The Emotion FastPitch model scored 3.77 ± 0.09257, higher than the baseline model's 3.272 ± 0.1067.
format	Theses
author	Azhar Dhiaulhaq, Moch
title	POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM
title_sort	post-control prosody and emotion for indonesia text to speech system
url	https://digilib.itb.ac.id/gdl/view/68652
_version_	1822005812977991680
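
The description reports each MOS as a mean with a ± margin but does not say how the margin was obtained. One common convention is a 95% confidence interval under a normal approximation. The sketch below illustrates that convention only. The function name `mos_with_ci`, the `z` parameter, and the example ratings are hypothetical and are not taken from the thesis.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of listener ratings.

    half_width is z * s / sqrt(n), i.e. a normal-approximation
    confidence interval (z=1.96 corresponds to roughly 95%).
    This is an assumed convention, not the thesis's stated method.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    stdev = statistics.stdev(ratings)  # sample standard deviation
    half_width = z * stdev / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings on a 1-5 scale (not data from the thesis).
example_ratings = [4, 3, 5, 4, 4, 3, 4, 5]
mean, half_width = mos_with_ci(example_ratings)
print(f"MOS = {mean:.2f} ± {half_width:.2f}")
```

Under this convention, two systems whose intervals do not overlap (as with the 3.77 and 3.272 values reported above, if they are such intervals) would usually be read as differing by more than rating noise.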

Similar Items