Deep learning for speech synthesis
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/159591 |
Institution: | Nanyang Technological University |
Summary: | Speech is the most natural way for humans to communicate, and it carries the majority of the information transmitted in daily communication. Speech synthesis plays an important role in voice interaction. Although speech synthesis has been under development for more than half a century and has undergone several evolutions, how to make synthesised speech sound more natural remains an active research topic. Recent advances in deep learning have shown impressive results in speech synthesis. This dissertation begins with an introduction to the development of speech synthesis and then examines in detail the application of two deep learning methods to the field. This is followed by an analysis of the fundamental theories and supporting technologies used in this dissertation. Finally, a multispeaker text-to-speech system composed of three neural-network-based building blocks is implemented, which can synthesise speech in the voices of different speakers. Using a synthesiser trained on multiple corpora in different languages, the proposed system can perform a variety of monolingual and even mixed-language speech synthesis tasks. |