DEVELOPMENT OF TEXT-TO-SPEECH SYSTEM FOR AN INDONESIAN SMART SPEAKER
Smart speakers are generally operated in English, even though many Indonesians have limited English proficiency. A smart speaker has three components: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS). End-to-End (E2...
Main Author: | David Partogi, Ignatius |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/72116 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:72116 |
---|---|
spelling |
id-itb.:72116 2023-03-06T03:53:10Z DEVELOPMENT OF TEXT-TO-SPEECH SYSTEM FOR AN INDONESIAN SMART SPEAKER David Partogi, Ignatius Indonesian Final Project Keywords: smart speaker system, TTS system, Tacotron 2, Parallel WaveGAN, MOS, SUS. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/72116 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Smart speakers are generally operated in English, even though many Indonesians have limited English proficiency. A smart speaker has three components: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS). An End-to-End (E2E) TTS system processes text directly and generates audio from it. It has two parts: a spectrogram generator and a vocoder. The TTS system in this research was built with Tacotron 2, which the author regards as the state of the art in TTS, as the spectrogram generator and Parallel WaveGAN as the vocoder. The dataset consists of 3000 pairs of audio clips and their text transcriptions, sourced from an audiobook of Indonesian-language school and college books, with a total duration of 9 hours, 22 minutes, and 30 seconds. Mean Opinion Score (MOS) testing of the system yielded a score of 3.24 ± 0.29, and Semantically Unpredictable Sentence (SUS) testing yielded an accuracy of (91.82 ± 7.63)%. |
format |
Final Project |
author |
David Partogi, Ignatius |
title |
DEVELOPMENT OF TEXT-TO-SPEECH SYSTEM FOR AN INDONESIAN SMART SPEAKER |
url |
https://digilib.itb.ac.id/gdl/view/72116 |
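
The figures in the description can be sanity-checked with a little arithmetic. The sketch below derives the average clip length from the stated dataset size and total duration, and shows one common way a "mean ± margin" MOS figure is computed. The record does not say what the ± denotes, so the 95% normal-approximation confidence interval here is an assumption, and the `ratings` list is hypothetical illustration data, not the study's listener scores.

```python
import math
import statistics


def average_clip_seconds(hours, minutes, seconds, n_clips):
    """Average clip length in seconds for a dataset of n_clips clips."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / n_clips


def mean_with_margin(scores, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval.

    z=1.96 corresponds to a 95% interval; this choice is an assumption,
    since the record does not state how its +/- values were obtained.
    """
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, half_width


# Dataset figures from the description: 3000 pairs, 9 h 22 min 30 s in total.
print(average_clip_seconds(9, 22, 30, 3000))  # 11.25

# Hypothetical 1-5 listener ratings, for illustration only.
ratings = [3, 4, 3, 2, 4, 3, 5, 3]
mean, margin = mean_with_margin(ratings)
print(f"MOS = {mean:.2f} ± {margin:.2f}")
```

On the stated figures, the clips average 11.25 seconds each.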