Voice conversion using deep neural networks

This thesis focuses on techniques to improve the performance of voice conversion. Voice conversion modifies the recorded speech of a source speaker towards a given target speaker. The resultant speech should sound like the target speaker, with the language content unchanged. This technology has been ap...

Bibliographic Details
Main Author:	Nguyen, Quy Hy
Other Authors:	Chng Eng Siong
Format:	Theses and Dissertations
Language:	English
Published:	2017
Subjects:	DRNTU::Science DRNTU::Engineering::Computer science and engineering
Online Access:	http://hdl.handle.net/10356/72102
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-72102
record_format	dspace
spelling	sg-ntu-dr.10356-72102 2023-03-04T00:47:46Z Voice conversion using deep neural networks Nguyen, Quy Hy Chng Eng Siong School of Computer Science and Engineering DRNTU::Science DRNTU::Engineering::Computer science and engineering This thesis focuses on techniques to improve the performance of voice conversion. Voice conversion modifies the recorded speech of a source speaker towards a given target speaker. The resultant speech should sound like the target speaker, with the language content unchanged. This technology has been applied to create personalized voices for text-to-speech or virtual avatars, for speech-to-singing synthesis, and for spoofing attacks on speaker verification systems. To perform voice conversion, the usual approach is to create a conversion function that is applied to the source speaker’s speech features, such as timbre and prosodic features, to generate the corresponding target features. Over the past decade, most voice conversion research has focused on spectral mapping, i.e. conversion of the features representing the timbre characteristics in a frame-by-frame manner. In chapter 3, we investigate a comprehensive approach to training the conversion function using a DNN that considers both timbre and prosodic features simultaneously. For better modelling, we use high-dimensional spectral features. However, this further worsens the ability to robustly train a DNN, which typically requires a large amount of training data. To overcome the issue of limited training data, we propose a new pretraining process using an autoencoder. The experimental results show that the proposed comprehensive framework with pretraining performs better than conventional voice conversion systems, including the state-of-the-art GMM-based system. The technique introduced in chapter 3 only learns a DNN system to convert between a single pair of speakers. To reduce the need for parallel training data for each new speaker pair, in chapter 4 we examine a novel DNN adaptation technique for voice conversion that includes two bias vectors representing the source and target speakers. With this configuration, conversion for new speaker pairs is achieved. Our preliminary results show that conversion to new target speakers’ voices could be achieved. Master of Engineering (SCE) 2017-05-25T08:57:53Z 2017-05-25T08:57:53Z 2017 Thesis Nguyen, Q. H. (2017). Voice conversion using deep neural networks. Master's thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/72102 10.32657/10356/72102 en 56 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Science DRNTU::Engineering::Computer science and engineering
spellingShingle	DRNTU::Science DRNTU::Engineering::Computer science and engineering Nguyen, Quy Hy Voice conversion using deep neural networks
description	This thesis focuses on techniques to improve the performance of voice conversion. Voice conversion modifies the recorded speech of a source speaker towards a given target speaker. The resultant speech should sound like the target speaker, with the language content unchanged. This technology has been applied to create personalized voices for text-to-speech or virtual avatars, for speech-to-singing synthesis, and for spoofing attacks on speaker verification systems. To perform voice conversion, the usual approach is to create a conversion function that is applied to the source speaker’s speech features, such as timbre and prosodic features, to generate the corresponding target features. Over the past decade, most voice conversion research has focused on spectral mapping, i.e. conversion of the features representing the timbre characteristics in a frame-by-frame manner. In chapter 3, we investigate a comprehensive approach to training the conversion function using a DNN that considers both timbre and prosodic features simultaneously. For better modelling, we use high-dimensional spectral features. However, this further worsens the ability to robustly train a DNN, which typically requires a large amount of training data. To overcome the issue of limited training data, we propose a new pretraining process using an autoencoder. The experimental results show that the proposed comprehensive framework with pretraining performs better than conventional voice conversion systems, including the state-of-the-art GMM-based system. The technique introduced in chapter 3 only learns a DNN system to convert between a single pair of speakers. To reduce the need for parallel training data for each new speaker pair, in chapter 4 we examine a novel DNN adaptation technique for voice conversion that includes two bias vectors representing the source and target speakers. With this configuration, conversion for new speaker pairs is achieved. Our preliminary results show that conversion to new target speakers’ voices could be achieved.
author2	Chng Eng Siong
author_facet	Chng Eng Siong Nguyen, Quy Hy
format	Theses and Dissertations
author	Nguyen, Quy Hy
author_sort	Nguyen, Quy Hy
title	Voice conversion using deep neural networks
title_short	Voice conversion using deep neural networks
title_full	Voice conversion using deep neural networks
title_fullStr	Voice conversion using deep neural networks
title_full_unstemmed	Voice conversion using deep neural networks
title_sort	voice conversion using deep neural networks
publishDate	2017
url	http://hdl.handle.net/10356/72102
_version_	1759855725681573888

