DEVELOPMENT OF TEXT-TO-SPEECH SYSTEM FOR AN INDONESIAN SMART SPEAKER

Bibliographic Details
Main Author: David Partogi, Ignatius
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/72116
Institution: Institut Teknologi Bandung
Keywords: smart speaker system, TTS system, Tacotron 2, Parallel WaveGAN, MOS, SUS
Collection: Digital ITB

Abstract

Smart speakers are usually operated in English, even though Indonesians generally have limited English proficiency. A smart speaker has three components: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS). An end-to-end (E2E) TTS system generates audio directly from input text and consists of two parts: a spectrogram generator, which predicts a spectrogram from the text, and a vocoder, which converts that spectrogram into an audio waveform. The TTS system in this research uses Tacotron 2, a state-of-the-art model, as the spectrogram generator and Parallel WaveGAN as the vocoder. The dataset consists of 3,000 audio clips paired with their text transcriptions, taken from audiobook recordings of Indonesian-language school and college textbooks, with a total duration of 9 hours, 22 minutes, and 30 seconds. Mean Opinion Score (MOS) testing of the system yielded a score of 3.24 ± 0.29, and Semantically Unpredictable Sentence (SUS) testing yielded an accuracy of (91.82 ± 7.63)%.
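To make the split between spectrogram generator and vocoder more concrete, the sketch below computes a log-mel spectrogram. This is the kind of intermediate representation that a model such as Tacotron 2 predicts from text and that a vocoder such as Parallel WaveGAN turns back into a waveform. It is a minimal NumPy illustration, not the code used in this thesis. The sample rate, FFT size, hop length, 80 mel bands, and 80 to 7600 Hz range are assumed values that are typical of public recipes, and all function and constant names are hypothetical.

```python
import numpy as np

# Assumed analysis settings (typical of public Tacotron 2 / Parallel WaveGAN
# recipes; NOT taken from the thesis).
SAMPLE_RATE = 22050
N_FFT = 1024
HOP_LENGTH = 256
N_MELS = 80
F_MIN = 80.0
F_MAX = 7600.0


def hz_to_mel(f):
    """Convert Hz to mels using the HTK-style formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS, fmin=F_MIN, fmax=F_MAX):
    """Triangular filters mapping n_fft // 2 + 1 linear bins to n_mels mel bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points, equally spaced on the mel scale, give each triangle's
    # left edge, center, and right edge.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wave, sr=SAMPLE_RATE, n_fft=N_FFT, hop=HOP_LENGTH):
    """Return an (n_frames, n_mels) log10-mel spectrogram of a mono waveform."""
    wave = np.asarray(wave, dtype=float)
    if wave.size < n_fft:
        wave = np.pad(wave, (0, n_fft - wave.size))
    window = np.hanning(n_fft)
    n_frames = 1 + (wave.size - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft).T    # (n_frames, n_mels)
    return np.log10(np.maximum(mel, 1e-10))          # floor avoids log(0)
```

For a 1 s, 1 kHz test tone, `log_mel_spectrogram` returns an array of shape (83, 80), and the band with the highest average energy is the one centered near 1 kHz. A real pipeline would more likely rely on a library such as librosa. Writing the steps out makes one point visible: under these assumed settings, each frame advances by `HOP_LENGTH` samples, so the spectrogram generator has to predict about 86 frames for every second of speech.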