Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis

Bibliographic Details
Main Author: Teo, Clarence Kai Xuan
Other Authors: Scott Reid Moisik
Format: Final Year Project
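Language: English
Published: Nanyang Technological University, 2024
Subjects: Arts and Humanities; Computer and Information Science; Transformers; Text-to-speech; Singapore English
Online Access: https://hdl.handle.net/10356/174253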
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-174253
Record format: dspace
Contributors: Scott Reid Moisik (scott.moisik@ntu.edu.sg); School of Humanities; Home Team Science and Technology Agency; Aloysius Tan (Aloysius_TAN@htx.gov.sg)
Degree: Bachelor's degree
Date available: 2024-03-25
Record timestamp: 2024-03-30T16:56:12Z
Citation: Teo, C. K. X. (2024). Synthesising the Singaporean voice: enhancing Singapore English in neural speech synthesis. Final Year Project (FYP), Nanyang Technological University, Singapore.
File format: application/pdf
Building: NTU Library
Continent: Asia
Country: Singapore
Content provider: NTU Library
Collection: DR-NTU
Description: Achieving naturalness in synthetic speech remains a significant challenge for contemporary Text-to-Speech (TTS) techniques. This Final Year Project applied and advanced these techniques to address the specific challenge of Singapore English, a low-resource variety with minimal representation in existing speech corpora. The objective was to enhance the naturalness of synthetic Singapore English speech, a goal that has been under-represented in contemporary TTS research. The report reviews the latest innovations in TTS systems and introduces a novel spoken corpus of Singapore English. This corpus was then used to train and evaluate a TTS model based on the FastSpeech2 and HiFiNet2 architectures, with the evaluation focusing on three key metrics: PESQ, SDR and MOS-X2. The findings indicate notable improvements in the naturalness of synthesised Singapore English speech, with the model capturing the variety's distinctive nuances and characteristics. These advancements mark a significant step in TTS technology, particularly in enhancing the naturalness and authenticity of synthetic speech for under-represented varieties such as Singapore English, amid a predominance of native English voices.