Voice conversion by speech synthesis

A speech signal carries two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion...

Bibliographic Details
Main Author:	Lee, Ming Hui.
Other Authors:	Wan Chunru
Format:	Final Year Project
Language:	English
Published:	2009
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Online Access:	http://hdl.handle.net/10356/16707
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-16707
record_format	dspace
spelling	sg-ntu-dr.10356-16707 2023-07-07T15:59:51Z Voice conversion by speech synthesis Lee, Ming Hui. Wan Chunru School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing Bachelor of Engineering 2009-05-28T02:34:47Z 2009-05-28T02:34:47Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16707 en Nanyang Technological University 105 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing Lee, Ming Hui. Voice conversion by speech synthesis
description	A speech signal carries two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion. Voice conversion transforms the speaker characteristics of speech uttered by one speaker (the source speaker) so that the generated speech has the voice characteristics of a desired speaker (the target speaker). Voice characteristics lie at the linguistic, suprasegmental and segmental levels. Speaker characteristics at the linguistic and suprasegmental levels are learned features and are therefore difficult to derive from data and to model. Speaker characteristics at the segmental level, by contrast, can be attributed to the speech production mechanism and are reflected in the source and system characteristics of the physical system. The model of human speech production used here is the source-filter model, and the two variants examined are linear prediction (LP) and formant synthesis. Research has shown that the synthesis quality of the LP synthesizer is superior to that of the formant synthesizer, and because linear prediction is the most basic methodology, it serves as an appropriate baseline for beginners in speech processing. LP synthesis therefore forms the central idea of this project. Because the author had little knowledge of speech signal processing before this project, and because speech is a specialized kind of data, it was necessary to understand the acoustic features and properties of speech data before proceeding to speech analysis and synthesis. Routines and functions with graphical user interface support are implemented in Matlab so that the user can step through program execution with ease. The programs closely reference, and are built on, existing toolboxes.
Finally, the performance of the system in converting speech from one voice to another is summarized, tabulated and discussed. Drawbacks and shortcomings are identified and examined. Methods for evaluating the transformations produced by the voice conversion system are studied, and subjective testing is used to evaluate the results obtained in this project. The report concludes with a brief look at speech-to-speech translation, an application in which voice conversion has served as an invaluable tool.
author2	Wan Chunru
author_facet	Wan Chunru Lee, Ming Hui.
format	Final Year Project
author	Lee, Ming Hui.
author_sort	Lee, Ming Hui.
title	Voice conversion by speech synthesis
title_sort	voice conversion by speech synthesis
publishDate	2009
url	http://hdl.handle.net/10356/16707
_version_	1772825710698692608

Similar Items