Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/174253 |
Institution: | Nanyang Technological University |
Summary: | Achieving naturalness in synthetic speech remains a significant
challenge for contemporary Text-to-Speech (TTS) techniques. The
challenge is sharper for low-resource Singapore English, which has
minimal resources available in existing speech corpora. This Final
Year Project applied and advanced these TTS techniques with the
objective of enhancing the naturalness of synthetic Singapore English
speech, a goal that has been under-represented in contemporary TTS
research.
This paper reviews the latest innovations in TTS systems and
introduces a novel spoken corpus in Singapore English. The new corpus
is then used to train and evaluate a TTS model based on the
FastSpeech2 and HiFiNet2 architectures. The
evaluation focused on three key metrics: PESQ, SDR and MOS-X2.
Findings indicate notable improvements in the naturalness of
synthesised Singapore English speech, effectively capturing its unique
nuances and characteristics. These advancements mark a significant
step in TTS technology, particularly in enhancing the naturalness
and authenticity of synthetic speech for under-represented languages
like Singapore English, amidst a sea of native English voices. |
---|