MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE

Generally, text-to-speech models only produce voice from a single speaker. The most straightforward method to produce another speaker’s voice is to build a standalone synthesis model for each desired speaker’s voice. But such an approach needs a large amount of training data and computational resourc...

Bibliographic details
Main author: Jerremy Budiman, Marvin
Format: Theses
Language: Indonesia
Online access: https://digilib.itb.ac.id/gdl/view/70713
Institution: Institut Teknologi Bandung
Language: Indonesia
id id-itb.:70713
spelling id-itb.:70713 2023-01-19T13:28:44Z MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE Jerremy Budiman, Marvin Indonesia Theses speech synthesis, multi speaker, Indonesian language. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/70713 text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Text-to-speech models generally produce the voice of a single speaker only. The most straightforward way to produce another speaker’s voice is to build a standalone synthesis model for each desired speaker. Such an approach, however, needs a large amount of training data and computational resources. To overcome this problem, several architectures have succeeded in producing synthesized speech from various speakers efficiently in terms of data and computation. One of these architectures is Deep Voice 3. In this work, a multi-speaker speech synthesis system is built for the Indonesian language. The system uses the Deep Voice 3 architecture with several additional components for preprocessing and post-processing, some of which are implemented specifically for Indonesian. The system is built using a multi-speaker dataset consisting of speech data from 145 Indonesian speakers. The system is evaluated subjectively to assess the naturalness, similarity to the original speaker, and intelligibility of the produced speech. The results show a MOS (mean opinion score) of 3.39 for speech naturalness and 3.11 for speech similarity. In the intelligibility assessment using SUS (semantically unpredictable sentences), the test gives 73.88% sentence accuracy and 93.48% word accuracy.
format Theses
author Jerremy Budiman, Marvin
spellingShingle Jerremy Budiman, Marvin
MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
author_facet Jerremy Budiman, Marvin
author_sort Jerremy Budiman, Marvin
title MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
title_short MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
title_full MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
title_fullStr MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
title_full_unstemmed MULTI SPEAKER SPEECH SYNTHESIS SYSTEM FOR INDONESIAN LANGUAGE
title_sort multi speaker speech synthesis system for indonesian language
url https://digilib.itb.ac.id/gdl/view/70713
_version_ 1823650745316016128