Deep learning for speech synthesis

Bibliographic Details
Main Author:	Duan, Yue
Other Authors:	Tan Yap Peng
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Electrical and electronic engineering; Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/159591
Institution:	Nanyang Technological University

Description
Summary:	Speech is the most natural way for humans to communicate, and it carries most of the information exchanged in daily communication. Speech synthesis plays an important role in voice interaction. Although speech synthesis has been developed for more than half a century and has undergone several evolutions, making synthesised speech sound more natural remains an active research topic. Recent advances in deep learning have shown impressive results in speech synthesis. This dissertation begins with an introduction to the development of speech synthesis and then examines in detail the application of two deep learning methods to the field. This is followed by an analysis of the fundamental theories and supporting technologies used in this dissertation. Finally, a multispeaker text-to-speech system composed of three neural-network-based building blocks is implemented, which can synthesise speech in the voices of different speakers. Using a synthesiser trained on multiple corpora in different languages, the proposed system is able to perform a variety of monolingual and even mixed-language speech synthesis tasks.

Similar Items