EXPRESSIVE TEXT TO SPEECH SYSTEM TO READ INDONESIA NOVEL BASED ON DEEP NEURAL NETWORK USING GLOBAL STYLE TOKEN AND TACOTRON 2

Bibliographic Details
Main Author: Azhar Dhiaulhaq, Moch.
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/56159
Institution: Institut Teknologi Bandung
Description
Summary: This research aims to build an expressive text-to-speech (TTS) system for the Indonesian language. The study uses Tacotron 2 with Global Style Token (GST) as an additional feature and Parallel WaveGAN as the vocoder. Linguistic features are extracted from the input text by Tacotron 2 and combined with the GST, which acts as an emotion representation extracted from reference audio. The Tacotron 2 decoder processes the combined features to produce a spectrogram, which Parallel WaveGAN then converts into expressive output audio. Both the GST + Tacotron 2 model and the Parallel WaveGAN model are trained on the same expressive corpus, which consists of 11,482 text-audio pairs with a total duration of 21 hours 57 minutes and covers angry, happy, sad, and neutral emotions. The GST + Tacotron 2 model is compared with a baseline model: the Tacotron 2 architecture without Global Style Token, also combined with Parallel WaveGAN as the vocoder. Both models are evaluated using Mean Opinion Score (MOS) and AB testing. The GST + Tacotron 2 model achieves a MOS of 3.90 ± 0.07, higher than the baseline model's 3.33 ± 0.10. In the AB test, most respondents preferred the GST + Tacotron 2 model (65.93%) over the baseline model (34.07%).
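
The combination step described in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes a single-head attention over a small bank of style tokens, random values standing in for learned parameters, and concatenation as the way features are "combined" (the summary does not say which operation is used). The function names `style_embedding` and `condition_encoder_outputs` and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(ref_embedding, style_tokens):
    """Attend over the style tokens, using the reference embedding as query.

    Assumption: single-head scaled dot-product attention. Real GST layers
    are typically multi-head and use learned projections.
    """
    dim = len(ref_embedding)
    scores = [
        sum(q * k for q, k in zip(ref_embedding, token)) / math.sqrt(dim)
        for token in style_tokens
    ]
    weights = softmax(scores)
    # Weighted sum of tanh-squashed tokens gives one style vector.
    style = [
        sum(w * math.tanh(token[i]) for w, token in zip(weights, style_tokens))
        for i in range(dim)
    ]
    return style, weights


def condition_encoder_outputs(encoder_outputs, style):
    """Append the same style vector to every text-encoder frame."""
    return [list(frame) + list(style) for frame in encoder_outputs]


random.seed(0)
# Hypothetical sizes: 12 text positions, 8 text features, 4 tokens of size 6.
encoder_outputs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(12)]
style_tokens = [[random.gauss(0, 1) for _ in range(6)] for _ in range(4)]
ref_embedding = [random.gauss(0, 1) for _ in range(6)]

style, weights = style_embedding(ref_embedding, style_tokens)
decoder_inputs = condition_encoder_outputs(encoder_outputs, style)
```

In a full system, `decoder_inputs` would feed the Tacotron 2 decoder, and the resulting spectrogram would be passed to the vocoder. The attention weights show how strongly each token contributes to the style of a given reference audio.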