Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis
In contemporary Text-to-Speech (TTS) techniques, achieving naturalness in synthetic speech presents a significant challenge. Additionally, to address the specific challenge of low-resource Singapore English (with minimal resources available in existing speech corpora), this Final Year Project fo...
Main Author: | Teo, Clarence Kai Xuan |
---|---|
Other Authors: | Scott Reid Moisik |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
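Subjects: | Arts and Humanities; Computer and Information Science; Transformers; Text-to-speech; Singapore English |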
Online Access: | https://hdl.handle.net/10356/174253 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-174253 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-174253 2024-03-30T16:56:12Z Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis Teo, Clarence Kai Xuan Scott Reid Moisik School of Humanities Home Team Science and Technology Agency Aloysius Tan scott.moisik@ntu.edu.sg, Aloysius_TAN@htx.gov.sg Arts and Humanities Computer and Information Science Transformers Text-to-speech Singapore English In contemporary Text-to-Speech (TTS) techniques, achieving naturalness in synthetic speech presents a significant challenge. Additionally, to address the specific challenge of low-resource Singapore English (with minimal resources available in existing speech corpora), this Final Year Project focused on applying and advancing these TTS techniques. The objective was to enhance the naturalness of synthetic Singapore English speech, a goal that was under-represented in contemporary TTS research. This paper delves into the latest innovations in TTS systems and introduces a novel spoken corpus in Singapore English. Following the creation of this new corpus is the training and evaluation of a TTS model based on FastSpeech2 and HiFiNet2 architectures. The evaluation focused on three key metrics: PESQ, SDR and MOS-X2. Findings indicate notable improvements in the naturalness of synthesised Singapore English speech, effectively capturing its unique nuances and characteristics. These advancements mark a significant step in TTS technology, particularly in enhancing the naturalness and authenticity of synthetic speech for under-represented languages like Singapore English, amidst a sea of native English voices. Bachelor's degree 2024-03-25T00:03:03Z 2024-03-25T00:03:03Z 2024 Final Year Project (FYP) Teo, C. K. X. (2024). Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis. Final Year Project (FYP), Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/174253 en application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Arts and Humanities Computer and Information Science Transformers Text-to-speech Singapore English |
spellingShingle |
Arts and Humanities Computer and Information Science Transformers Text-to-speech Singapore English Teo, Clarence Kai Xuan Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
description |
In contemporary Text-to-Speech (TTS) techniques, achieving
naturalness in synthetic speech presents a significant challenge.
Additionally, to address the specific challenge of low-resource
Singapore English (with minimal resources available in existing
speech corpora), this Final Year Project focused on applying and
advancing these TTS techniques. The objective was to enhance the
naturalness of synthetic Singapore English speech, a goal that has
received little attention in contemporary TTS research.
This paper reviews recent innovations in TTS systems and
introduces a new spoken corpus of Singapore English. The corpus is
then used to train and evaluate a TTS model based on the
FastSpeech2 and HiFiNet2 architectures. The
evaluation focused on three key metrics: PESQ, SDR and MOS-X2.
Findings indicate notable improvements in the naturalness of
synthesised Singapore English speech, effectively capturing its unique
nuances and characteristics. These advancements mark a significant
step in TTS technology, particularly in enhancing the naturalness
and authenticity of synthetic speech for under-represented languages
like Singapore English, amidst a sea of native English voices. |
author2 |
Scott Reid Moisik |
author_facet |
Scott Reid Moisik Teo, Clarence Kai Xuan |
format |
Final Year Project |
author |
Teo, Clarence Kai Xuan |
author_sort |
Teo, Clarence Kai Xuan |
title |
Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
title_short |
Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
title_full |
Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
title_fullStr |
Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
title_full_unstemmed |
Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis |
title_sort |
synthesising the singaporean voice: enhancing singapore english in neural speech synthesis |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174253 |
_version_ |
1795302118388662272 |