Voice conversion by speech synthesis

Bibliographic Details
Main Author: Lee, Ming Hui.
Other Authors: Wan Chunru
Format: Final Year Project
Language: English
Published: 2009
Online Access: http://hdl.handle.net/10356/16707
Institution: Nanyang Technological University
Description
Summary: A speech signal contains two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion. Voice conversion transforms the speaker characteristics of speech uttered by one speaker (the source speaker) to generate speech that has the voice characteristics of a desired speaker (the target speaker). Voice characteristics lie at the linguistic, suprasegmental and segmental levels. The speaker characteristics at the linguistic and suprasegmental levels are learned features and are therefore difficult to derive from data and to model. Speaker characteristics at the segmental level, on the other hand, can be attributed to the speech production mechanism and are reflected in the source and system characteristics of the physical system. The model built on this view of human speech production is known as the source-filter model, and the two variants examined are linear prediction (LP) and formant synthesis. Research has shown that the quality of synthesis with an LP synthesizer is superior to that with a formant synthesizer, and because linear prediction is the most basic methodology, it serves as an appropriate baseline for beginners in speech processing. LP therefore forms the central idea of this project. Because the author had little knowledge of speech signal processing before this project, and because speech is a specialized kind of data, it was first necessary to gain an understanding of the acoustic features and properties of speech data before moving on to speech analysis and synthesis. Routines and functions with graphical user interface support are implemented in Matlab so that the user can step through the program's execution with ease. The programs closely reference and build on existing toolboxes.
Finally, the performance of the system in converting speech from one voice to another is summarized, tabulated and discussed, and its drawbacks and shortcomings are identified and examined. Methods for evaluating the transformations produced by a voice conversion system are studied, and a subjective test is the method used to evaluate the results obtained in this project. The report concludes with a brief look at speech-to-speech translation, an application for which voice conversion has served as an invaluable tool.
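
Note: The linear prediction source-filter idea named in the summary can be illustrated with a minimal sketch. It is not the project's code (the project used Matlab and existing toolboxes); it assumes the textbook autocorrelation method with the Levinson-Durbin recursion, and all function names are hypothetical. A frame is inverse-filtered to obtain the source (residual) signal, and passing that residual back through the all-pole filter recovers the frame.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, err), where a[0] == 1 and
    A(z) = sum_k a[k] z^-k is the prediction-error (inverse) filter.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err


def lp_residual(frame, a):
    """Inverse filtering: e[n] = sum_k a[k] * x[n-k] (the 'source')."""
    order = len(a) - 1
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]


def lp_synthesize(excitation, a):
    """All-pole synthesis: x[n] = e[n] - sum_{k>=1} a[k] * x[n-k]."""
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * out[n - k] for k in range(1, order + 1) if n - k >= 0)
        out.append(e - past)
    return out
```

In a voice conversion setting, one would presumably modify the filter coefficients or the residual between the analysis and synthesis steps; the sketch leaves both unchanged, so the synthesis simply reconstructs the input frame.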