Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis



Bibliographic Details
Main Author: Teo, Clarence Kai Xuan
Other Authors: Scott Reid Moisik
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/174253
Institution: Nanyang Technological University
Description
Summary: In contemporary text-to-speech (TTS) research, achieving naturalness in synthetic speech remains a significant challenge. This Final Year Project applied and advanced current TTS techniques to address a specific instance of that challenge: low-resource Singapore English, for which existing speech corpora offer minimal resources. The objective was to enhance the naturalness of synthetic Singapore English speech, a goal that has been under-represented in contemporary TTS research. This report reviews the latest innovations in TTS systems and introduces a novel spoken corpus of Singapore English. Using this corpus, a TTS model based on the FastSpeech2 and HiFiNet2 architectures was then trained and evaluated. The evaluation focused on three key metrics: Perceptual Evaluation of Speech Quality (PESQ), signal-to-distortion ratio (SDR) and the MOS-X2 subjective rating scale. Findings indicate notable improvements in the naturalness of synthesised Singapore English speech, capturing its distinctive nuances and characteristics. These advancements mark a significant step for TTS technology, particularly in enhancing the naturalness and authenticity of synthetic speech for under-represented varieties such as Singapore English, amid a predominance of voices in other varieties of English.
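The summary names SDR as one of the evaluation metrics but does not say which variant of SDR the project computed. As a rough illustration only, the sketch below uses the simplest energy-ratio form, SDR = 10 log10(sum of s² / sum of (s - ŝ)²), assuming a time-aligned reference recording s and synthesised signal ŝ of equal length. It is not the projection-based BSS Eval variant, and the function name `sdr_db` is hypothetical.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Simple signal-to-distortion ratio in decibels (illustrative sketch).

    Assumes both signals are already time-aligned and of equal length.
    This is the plain energy-ratio definition, not the BSS Eval variant.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Energy of the reference (target) signal.
    signal_energy = sum(s * s for s in reference)
    # Energy of the distortion: sample-wise difference from the reference.
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if distortion_energy == 0.0:
        return math.inf  # identical signals: no distortion at all
    if signal_energy == 0.0:
        return -math.inf  # silent reference: only distortion is present
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Usage: an estimate scaled to 90% of the reference gives roughly 20 dB.
print(round(sdr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]), 6))  # prints 20.0
```

Higher values mean the synthesised waveform is closer to the reference; because this form compares samples directly, real evaluations typically require careful alignment first, which this sketch assumes has already been done.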