POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM

This research aims to build a text-to-speech (TTS) system for the Indonesian language whose emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was developed directly from the FastPitch model. The development is intended so that the resulting prosody is not only...

Bibliographic Details
Main Author: Azhar Dhiaulhaq, Moch
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/68652
Institution: Institut Teknologi Bandung
id id-itb.:68652
spelling id-itb.:68652 2022-09-19T08:00:24Z POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM Azhar Dhiaulhaq, Moch Indonesia Theses Text to Speech System, FastPitch, HifiGAN, MOS INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/68652 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description This research aims to build a text-to-speech (TTS) system for the Indonesian language whose emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was developed directly from the FastPitch model so that the resulting prosody is predicted not only from the input sentence but also from an emotion label. To generate emotions better, the Emotion FastPitch model also handles energy as an additional feature. The HifiGAN model is used as the vocoder that converts mel spectrograms into audio. All models are trained on an expressive corpus of 11,500 entries, each consisting of text, audio, and an emotion label, with a total duration of 21 hours and 57 minutes. The corpus covers angry, happy, sad, and neutral emotions. The Emotion FastPitch model is compared directly with the FastPitch model as the baseline; the baseline is also paired with the HifiGAN vocoder. Both models were evaluated using the Mean Opinion Score (MOS). The Emotion FastPitch model scored 3.77 ± 0.09257, higher than the baseline model's 3.272 ± 0.1067.
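The MOS figures above are reported as a mean with a ± margin. The record does not say how that margin was computed; the sketch below assumes it is the half-width of a 95% confidence interval under a normal approximation, which is a common convention for MOS reporting. The function name `mos_with_ci` and the ratings are hypothetical and are not data from the thesis.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) for a list of listener ratings.

    Assumption: the margin is z * standard error of the mean, with
    z = 1.96 for an approximate 95% confidence interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * sem


# Hypothetical ratings on a 1-5 scale, for illustration only.
example_ratings = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4]
mean, half_width = mos_with_ci(example_ratings)
print(f"MOS = {mean:.2f} ± {half_width:.2f}")
```

Under this assumption, two systems whose intervals do not overlap (as with 3.77 ± 0.09257 and 3.272 ± 0.1067) differ by more than the reported margins.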