Zero-shot text to speech with Singaporean accent

Bibliographic Details
Main Author: Li, Haoyang
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/167121
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-167121
School: School of Computer Science and Engineering
Email: ASESChng@ntu.edu.sg (Chng Eng Siong)
Degree: Bachelor of Engineering (Computer Science)
Date deposited: 2023-05-23
Project code: SCSE22-0106
File type: PDF
Citation: Li, H. (2023). Zero-shot text to speech with Singaporean accent. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167121
Collection: DR-NTU (NTU Library)

Abstract: Zero-shot text-to-speech (TTS) is an assistive technology that converts written text into speech using target voices that are unseen during training. Zero-shot TTS has a wide range of applications, such as personalized chatbots and virtual assistants. Currently, there is a lack of research and development on zero-shot TTS that can generate Singaporean-accented speech. This is because international demand for Singaporean-accented TTS is low, given Singapore’s small population, and TTS systems trained on popular benchmark corpora cannot adapt to the unique acoustic features of Singaporean-accented speech. To improve Singaporean users' experience with TTS services, this project develops a state-of-the-art (SoTA) zero-shot TTS model for Singaporean-accented English by fine-tuning a pre-trained British-accented TTS model on a Singaporean-accented English corpus. Our subjective evaluation found that fine-tuning significantly improved the model's mean opinion score (MOS) from 2.52 to 4.14. Our objective evaluation further showed that speaker similarity to the target speakers, measured as cosine similarity, improved from 0.861 to 0.933. We hope this project will contribute to knowledge sharing within the research community on developing Singaporean-accented and other accent-specific TTS.
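The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that MOS is the arithmetic mean of listener ratings on a 1 to 5 scale, and that speaker similarity is the cosine similarity between fixed-length speaker embeddings of synthesized and reference utterances, averaged over pairs. The record does not say which speaker encoder or listening-test protocol the project used, so the function names (`mean_opinion_score`, `cosine_similarity`, `speaker_similarity`) are hypothetical and the vectors are toy placeholders, not the project's data.

```python
import math
from typing import Sequence


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Average listener ratings, assumed to be on a 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return sum(ratings) / len(ratings)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)


def speaker_similarity(
    synthesized: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Mean cosine similarity over paired (synthesized, reference) speaker embeddings.

    Assumption: the embeddings come from some pre-trained speaker encoder;
    which encoder the project used is not stated in this record.
    """
    if len(synthesized) != len(reference) or not synthesized:
        raise ValueError("need equal, non-empty lists of embeddings")
    scores = [cosine_similarity(s, r) for s, r in zip(synthesized, reference)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(mean_opinion_score([4, 5, 4, 3, 5]))  # 4.2
    print(round(speaker_similarity([[1.0, 0.0], [0.6, 0.8]],
                                   [[1.0, 0.0], [0.8, 0.6]]), 3))  # 0.98
```

Under these assumptions, a higher average cosine similarity means the synthesized voices sit closer to the target speakers in embedding space, which is how an increase such as 0.861 to 0.933 would be read.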