Zero-shot text to speech with Singaporean accent

Bibliographic Details
Main Author: Li, Haoyang
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/167121
Institution: Nanyang Technological University
Description
Summary: Zero-shot text-to-speech (TTS) is an assistive technology that converts written text into speech using target voices that are unseen during training. Zero-shot TTS has a wide range of applications, such as personalized chatbots and virtual assistants. Currently, there is a lack of research and development on zero-shot TTS that can generate Singaporean-accented speech. This is because international demand for Singaporean-accented TTS is low given Singapore’s small population, and TTS systems trained on popular benchmark corpora cannot adapt to the unique acoustic features of Singaporean-accented speech. To improve Singaporean users' experience of TTS services, this project develops a state-of-the-art zero-shot TTS model for Singaporean-accented English by fine-tuning a pre-trained British-accented TTS model on a Singaporean-accented English corpus. Our subjective evaluation found that fine-tuning significantly improved the model’s mean opinion score (MOS) from 2.52 to 4.14. Our objective evaluation further showed that speaker similarity to the target speakers, measured using cosine similarity, improved from 0.861 to 0.933. We hope this project will contribute to the sharing of knowledge on developing Singaporean-accented and other accent-specific TTS systems within the research community.
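
The speaker-similarity figure in the summary is a cosine similarity, which is typically computed between fixed-length speaker embeddings of a synthesized utterance and a reference recording of the target speaker. The sketch below illustrates only that final scoring step. The summary does not say which embedding model was used or how scores were aggregated, so the function names, the averaging over pairs, and the toy vectors are assumptions for illustration, not the project's actual pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(pairs):
    """Average cosine similarity over (synthesized, reference) embedding pairs.

    Averaging over pairs is an assumption; the summary reports a single score
    without describing the aggregation.
    """
    scores = [cosine_similarity(synth, ref) for synth, ref in pairs]
    return sum(scores) / len(scores)


# Toy 2-dimensional placeholders; real speaker embeddings would come from a
# speaker-encoder model and usually have hundreds of dimensions.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
average_score = mean_speaker_similarity(toy_pairs)
```

A score closer to 1 means the synthesized voice points in nearly the same embedding direction as the reference speaker, which is the sense in which the reported change from 0.861 to 0.933 reads as an improvement.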