POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM
This research aims to build a text-to-speech (TTS) system for the Indonesian language whose emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was developed directly from the FastPitch model. The development is intended so that the resulting prosody is not only...
Main Author: | Azhar Dhiaulhaq, Moch |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/68652 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:68652 |
---|---|
spelling |
id-itb.:68652; 2022-09-19T08:00:24Z; POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM; Azhar Dhiaulhaq, Moch; Indonesia; Theses; Text to Speech System, FastPitch, HifiGAN, MOS; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/68652; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
This research aims to build a text-to-speech (TTS) system for the Indonesian language whose
emotion and prosody can be controlled. It uses the Emotion FastPitch model, which was
developed directly from the FastPitch model so that the resulting prosody is predicted not
only from the input sentence but also from an emotion label. To generate emotions better,
the Emotion FastPitch model also handles energy as an additional feature. The vocoder used
to convert mel spectrograms into audio is the HifiGAN model. All models are trained on an
expressive corpus of 11,500 entries, each pairing text, audio, and an emotion label, with a
total duration of 21 hours and 57 minutes. The corpus covers angry, happy, sad, and neutral
emotions.
The Emotion FastPitch model is compared directly with the FastPitch model as the baseline;
the baseline is also paired with the HifiGAN vocoder. Both models were evaluated using the
Mean Opinion Score (MOS). The Emotion FastPitch model scored 3.77 ± 0.09257, higher than
the baseline model's 3.272 ± 0.1067. |
format |
Theses |
author |
Azhar Dhiaulhaq, Moch |
title |
POST-CONTROL PROSODY AND EMOTION FOR INDONESIA TEXT TO SPEECH SYSTEM |
url |
https://digilib.itb.ac.id/gdl/view/68652 |
_version_ |
1822005812977991680 |
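
The abstract in this record reports each MOS as a mean with a "±" margin. The sketch below shows one common way such a figure can be computed: the mean of listener ratings plus a normal-approximation 95% confidence half-width. This is only an illustration under stated assumptions. The record does not say how the thesis derived its margins, the function name `mos_with_interval` is hypothetical, and the ratings are made-up placeholder data, not results from the thesis.

```python
import math
import statistics


def mos_with_interval(ratings, z=1.96):
    """Return (mean, half_width) for a list of listener ratings.

    Assumption: the margin is a normal-approximation confidence
    interval (z = 1.96 for roughly 95%). The thesis may have used a
    different method.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, z * stderr


# Placeholder ratings on a 1-5 scale (illustrative only).
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, half_width = mos_with_interval(ratings)
print(f"MOS = {mos:.3f} ± {half_width:.3f}")
```

With more ratings the half-width shrinks in proportion to the square root of the sample size, which is why MOS studies usually collect many judgments per system.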