DEVELOPMENT OF TEXT-TO-SPEECH SYSTEM FOR AN INDONESIAN SMART SPEAKER
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/72116 |
Institution: | Institut Teknologi Bandung |
Summary: | Smart speakers are generally operated in English, even though English proficiency among Indonesians is generally low. A smart speaker has three components: Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS). An End-to-End (E2E) TTS system processes text and generates audio from it directly. It has two parts: a spectrogram generator and a vocoder. The TTS system in this research was built with Tacotron 2, a state-of-the-art TTS model, as the spectrogram generator and Parallel WaveGAN as the vocoder. The dataset consists of 3000 pairs of audio recordings and their text transcriptions, sourced from an audiobook of Indonesian-language school and college books, with a total duration of 9 hours, 22 minutes, and 30 seconds. Mean Opinion Score (MOS) testing of the system yielded a score of 3.24 ± 0.29, and Semantically Unpredictable Sentence (SUS) testing yielded an accuracy of (91.82 ± 7.63)%. |
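The summary reports the MOS as a mean with a ± margin but does not say how that margin was computed. The sketch below shows one common way such a figure can be obtained, assuming the margin is a 95% confidence interval under a normal approximation. The function name `mean_with_ci` and the ratings list are hypothetical and are not taken from the study.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, half_width) for a normal-approximation confidence interval.

    z=1.96 corresponds to a 95% interval; this choice is an assumption,
    since the record does not state how its margin was derived.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation
    half_width = z * std / math.sqrt(len(scores))
    return mean, half_width


# Hypothetical listener ratings on the 1-5 MOS scale (not data from the study).
ratings = [3, 4, 3, 2, 4, 3, 4, 3, 3, 4]
mos, half_width = mean_with_ci(ratings)
print(f"MOS = {mos:.2f} ± {half_width:.2f}")
```

With few listeners, a Student's t critical value would give a wider and more conservative interval than the fixed z used here.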