MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
Text-to-speech models generally produce the voice of only a single speaker. The most straightforward way to produce another speaker’s voice is to build a standalone synthesis model for each desired speaker. However, such an approach needs a large amount of training data and computational resourc...
Main Author: | Jerremy Budiman, Marvin |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/70713 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:70713 |
---|---|
spelling |
id-itb.:70713; 2023-01-19T13:28:44Z; MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE; Jerremy Budiman, Marvin; Indonesia; Theses; speech synthesis, multi speaker, Indonesian language; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/70713; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Text-to-speech models generally produce the voice of only a single speaker. The
most straightforward way to produce another speaker’s voice is to build a
standalone synthesis model for each desired speaker. However, such an approach
needs a large amount of training data and computational resources. To overcome this
problem, several architectures have succeeded in producing synthesized speech
from various speakers efficiently in terms of data and computation. One of these
architectures is Deep Voice 3.
In this work, a multi-speaker speech synthesis system is built for the Indonesian
language. The system uses the Deep Voice 3 architecture, with several additional
components for preprocessing and post-processing. Some of these components are
implemented specifically for the Indonesian language. The system is built using a
multi-speaker dataset consisting of speech data from 145 Indonesian speakers. The
system is evaluated subjectively to assess the naturalness, similarity to the original
speaker, and intelligibility of the produced speech. The results show that the system
achieves a MOS (mean opinion score) of 3.39 for speech naturalness and 3.11 for
speech similarity. In assessing speech intelligibility using SUS (semantically
unpredictable sentences), the test yields 73.88% sentence accuracy and 93.48% word
accuracy. |
format |
Theses |
author |
Jerremy Budiman, Marvin |
title |
MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE |
url |
https://digilib.itb.ac.id/gdl/view/70713 |
_version_ |
1822006387988758528 |
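
The description reports a mean opinion score (MOS) and, for the SUS intelligibility test, a sentence accuracy and a word accuracy. The record does not say how the thesis computed these scores, so the following is only a minimal sketch of one common way to compute such figures. The function names, the word-matching rule (longest matching word blocks via `difflib`), and the sample sentences and ratings are all assumptions made for illustration; none of them come from the thesis or its data.

```python
from difflib import SequenceMatcher
from statistics import mean


def normalize(sentence: str) -> list[str]:
    """Lowercase and split on whitespace (punctuation handling is omitted)."""
    return sentence.lower().split()


def mean_opinion_score(ratings: list[int]) -> float:
    """MOS as the arithmetic mean of listener ratings, e.g. on a 1-5 scale."""
    return mean(ratings)


def sus_accuracy(references: list[str], transcripts: list[str]) -> tuple[float, float]:
    """Return (sentence accuracy, word accuracy) as fractions in [0, 1].

    Assumed definitions (not taken from the thesis):
    - a sentence counts as correct when every word matches the reference;
    - word accuracy is the share of reference words that the listener's
      transcript reproduces in order.
    """
    exact_sentences = 0
    matched_words = 0
    total_words = 0
    for ref, hyp in zip(references, transcripts):
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        exact_sentences += ref_words == hyp_words
        matcher = SequenceMatcher(None, ref_words, hyp_words, autojunk=False)
        matched_words += sum(block.size for block in matcher.get_matching_blocks())
        total_words += len(ref_words)
    return exact_sentences / len(references), matched_words / total_words


if __name__ == "__main__":
    # Made-up SUS-style sentences and listener transcripts, for illustration only.
    refs = ["meja biru makan awan", "kucing hijau menulis sungai"]
    heard = ["meja biru makan awan", "kucing hijau menulis bunga"]
    sentence_acc, word_acc = sus_accuracy(refs, heard)
    print(f"MOS: {mean_opinion_score([4, 3, 4, 3]):.2f}")
    print(f"sentence accuracy: {sentence_acc:.2%}, word accuracy: {word_acc:.2%}")
```

Under these assumed definitions, one wrong word makes the whole sentence count as incorrect while lowering word accuracy only slightly. This may help explain why a sentence accuracy (73.88%) can sit well below the corresponding word accuracy (93.48%).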