Voice conversion by speech synthesis

A speech signal carries two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion...

Bibliographic Details
Main Author: Lee, Ming Hui.
Other Authors: Wan Chunru
Format: Final Year Project
Language: English
Published: 2009
Subjects:
Online Access: http://hdl.handle.net/10356/16707
Institution: Nanyang Technological University
id sg-ntu-dr.10356-16707
record_format dspace
spelling sg-ntu-dr.10356-16707; 2023-07-07T15:59:51Z; Voice conversion by speech synthesis; Lee, Ming Hui.; Wan Chunru; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing; Bachelor of Engineering; 2009-05-28T02:34:47Z; 2009; Final Year Project (FYP); http://hdl.handle.net/10356/16707; en; Nanyang Technological University; 105 p.; application/pdf
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
description A speech signal carries two kinds of information: (i) the message the speaker wants to convey to the listener and (ii) the characteristics of the speaker. This project focuses on the analysis and manipulation of the speaker characteristics embedded in the speech signal for voice conversion. Voice conversion transforms the speaker characteristics of speech uttered by one speaker (the source speaker) so that the generated speech has the voice characteristics of a desired speaker (the target speaker). Voice characteristics lie at the linguistic, suprasegmental and segmental levels. Speaker characteristics at the linguistic and suprasegmental levels are learned features, so they are difficult to derive from data and to model. Speaker characteristics at the segmental level, on the other hand, can be attributed to the speech production mechanism and are reflected in the source and system characteristics of the physical system. The model of human speech production used here is known as the source-filter model, and the two variants examined are linear prediction (LP) and formant synthesis. Research has shown that the synthesis quality of the LP synthesizer is superior to that of the formant synthesizer, and because linear prediction is the most basic methodology, it serves as an appropriate baseline for beginners in speech processing. LP therefore forms the central idea of this project.
The author began with little prior knowledge of speech signal processing, and speech is a specialized kind of data, so it was necessary to understand the acoustic features and properties of speech data before advancing to speech analysis and synthesis. Routines and functions with graphical user interface support are implemented in Matlab so that the user can step through program execution with ease. The programs closely reference and build on existing toolboxes.
Finally, the performance of the system in converting speech from one voice to another is summarized, tabulated and discussed. Drawbacks and shortcomings are identified and examined. Methods for evaluating the transformations produced by the voice conversion system are studied, and subjective testing is the method used to evaluate the results obtained in this project. The report concludes with a brief look at speech-to-speech translation, an application in which voice conversion has served as an invaluable tool.
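
The source-filter idea named in the description can be illustrated with a minimal linear prediction sketch. This is not the project's code: the report used Matlab and existing toolboxes, whereas this example is plain Python, and every function name (`lp_analysis`, `lp_residual`, `lp_synthesis`, `toy_voiced_signal`), the test signal and the model order are assumptions chosen only for illustration. It estimates predictor coefficients with the autocorrelation method, inverse-filters the signal to obtain a residual (the "source"), and resynthesizes the signal through the all-pole filter (the "system").

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lp_analysis(x, order):
    """Estimate predictor coefficients (autocorrelation method with the
    Levinson-Durbin recursion) so that x[n] ~ sum_k a[k] * x[n - 1 - k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[0] is unused; a[j] multiplies x[n - j]
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:        # degenerate (e.g. all-zero) input
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n - 1 - k]."""
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e


def lp_synthesis(e, a):
    """All-pole synthesis: y[n] = e[n] + sum_k a[k] * y[n - 1 - k]."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y


def toy_voiced_signal(length=400, period=50):
    """Hypothetical test input: an impulse train through a 2-pole resonator."""
    x = []
    for n in range(length):
        excitation = 1.0 if n % period == 0 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(excitation + 1.3 * x1 - 0.8 * x2)
    return x


signal = toy_voiced_signal()
coeffs = lp_analysis(signal, order=2)          # "system" estimate
residual = lp_residual(signal, coeffs)         # "source" estimate
resynth = lp_synthesis(residual, coeffs)       # analysis-synthesis round trip
```

Feeding the unmodified residual back through the unmodified filter reproduces the input up to rounding. A conversion system would presumably alter the coefficients, the residual, or both before resynthesis, but how the project does so is not stated in this record.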