Zero-shot text to speech with Singaporean accent
Main Author: | Li, Haoyang |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/167121 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-167121 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-167121; 2023-05-26T15:37:34Z; Zero-shot text to speech with Singaporean accent; Li, Haoyang; Chng Eng Siong; School of Computer Science and Engineering; ASESChng@ntu.edu.sg; Engineering::Computer science and engineering; Bachelor of Engineering (Computer Science); 2023-05-23T07:03:16Z; 2023; Final Year Project (FYP); Li, H. (2023). Zero-shot text to speech with Singaporean accent. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167121; en; SCSE22-0106; application/pdf; Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
topic |
Engineering::Computer science and engineering |
description |
Zero-shot text-to-speech (TTS) is an assistive technology that converts written text into speech in target voices that were not seen during training. Zero-shot TTS has a wide range of applications, such as personalized chatbots and virtual assistants. Currently, there is a lack of research and development on zero-shot TTS that can generate Singaporean-accented speech. This is because international demand for Singaporean-accented TTS is low given Singapore’s small population, and TTS systems trained on popular benchmark corpora cannot adapt to the distinctive acoustic features of Singaporean-accented speech. To improve Singaporean users' experience of TTS services, this project develops a state-of-the-art (SoTA) zero-shot TTS model for Singaporean-accented English by fine-tuning a pre-trained British-accented TTS model on a Singaporean-accented English corpus. Our subjective evaluation found that fine-tuning significantly improved the model’s mean opinion score (MOS) from 2.52 to 4.14. Our objective evaluation further showed that speaker similarity to the target speakers, measured by cosine similarity, improved from 0.861 to 0.933. We hope this project will contribute to knowledge sharing within the research community on developing Singaporean-accented and other accent-specific TTS. |
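The abstract reports two evaluation figures, a MOS rising from 2.52 to 4.14 and a cosine speaker similarity rising from 0.861 to 0.933, but this record does not describe how either was computed. As a minimal sketch, assuming listener ratings on the usual 1 to 5 opinion scale and fixed-length speaker embeddings produced by some speaker encoder (not named in the record), the two metrics could be computed as follows. The function names are hypothetical, and the normal-approximation 95% confidence interval is an illustrative choice, not necessarily what the project used.

```python
import math
import statistics
from typing import Iterable, Sequence, Tuple

# Hypothetical helpers illustrating the two metrics named in the abstract.
# Assumption: speaker embeddings are plain float vectors from some speaker
# encoder; ratings are integers on a 1-5 opinion scale.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average cosine similarity over (synthesized, reference) embedding pairs."""
    return statistics.fmean(cosine_similarity(syn, ref) for syn, ref in pairs)


def mean_opinion_score(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (MOS, half-width of a normal-approximation 95% confidence interval)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # spread is undefined for a single rating
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    mos, ci = mean_opinion_score([4, 4, 5, 3, 4])
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
    sim = mean_speaker_similarity([([0.2, 0.9, 0.4], [0.25, 0.85, 0.35])])
    print(f"speaker similarity = {sim:.3f}")
```

For example, mean_opinion_score([4, 4, 5, 3]) returns a MOS of 4.0. In a real evaluation, each pair passed to mean_speaker_similarity would hold the embedding of a synthesized utterance and the embedding of a reference recording of the same target speaker, so a higher average means the fine-tuned model's output sounds closer to the intended voice.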
author2 |
Chng Eng Siong |
format |
Final Year Project |
author |
Li, Haoyang |
title |
Zero-shot text to speech with Singaporean accent |