Time domain modification of synthesis speech

Concatenative synthesis is becoming increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modificati...

Bibliographic Details
Main Author:	Long, Hai
Other Authors:	Foo Say Wei
Format:	Final Year Project
Language:	English
Published:	2015
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/62155
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-62155
record_format	dspace
spelling	sg-ntu-dr.10356-621552023-07-07T15:56:15Z Time domain modification of synthesis speech Long, Hai Foo Say Wei School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Concatenative synthesis is becoming increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modification are very important and greatly affect the quality of the synthesized speech. The average magnitude difference function (AMDF) is a time-domain pitch detection method with high accuracy and low computational complexity. It calculates a difference signal between the waveform and its time-delayed copy at various time delays. Pitch is extracted from the difference signal by seeking the first minimum. A spectral pre-flattener, center clipping, can increase the reliability of AMDF. The accuracy can be further enhanced by applying probabilistic error correction after the rough estimate from AMDF. Time-domain pitch-synchronous overlap-add (TD-PSOLA) is a popular time-domain pitch and duration modification method. It decomposes the signal into a series of short-time signals and modifies the short-time waveforms according to the desired pitch and time scale factors. Finally, the synthesized speech is obtained by applying an overlap-add method. Instead of a constant pitch scale factor, a pitch scale function is used to change the tone in Mandarin Chinese. The pitch scale function is derived from the four lexical tone models of Mandarin Chinese and determined experimentally. Bachelor of Engineering 2015-02-10T08:51:43Z 2015-02-10T08:51:43Z 2006 2006 Final Year Project (FYP) http://hdl.handle.net/10356/62155 en Nanyang Technological University 74 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Long, Hai Time domain modification of synthesis speech
description	Concatenative synthesis is becoming increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modification are very important and greatly affect the quality of the synthesized speech. The average magnitude difference function (AMDF) is a time-domain pitch detection method with high accuracy and low computational complexity. It calculates a difference signal between the waveform and its time-delayed copy at various time delays. Pitch is extracted from the difference signal by seeking the first minimum. A spectral pre-flattener, center clipping, can increase the reliability of AMDF. The accuracy can be further enhanced by applying probabilistic error correction after the rough estimate from AMDF. Time-domain pitch-synchronous overlap-add (TD-PSOLA) is a popular time-domain pitch and duration modification method. It decomposes the signal into a series of short-time signals and modifies the short-time waveforms according to the desired pitch and time scale factors. Finally, the synthesized speech is obtained by applying an overlap-add method. Instead of a constant pitch scale factor, a pitch scale function is used to change the tone in Mandarin Chinese. The pitch scale function is derived from the four lexical tone models of Mandarin Chinese and determined experimentally.
author2	Foo Say Wei
author_facet	Foo Say Wei Long, Hai
format	Final Year Project
author	Long, Hai
author_sort	Long, Hai
title	Time domain modification of synthesis speech
title_short	Time domain modification of synthesis speech
title_full	Time domain modification of synthesis speech
title_fullStr	Time domain modification of synthesis speech
title_full_unstemmed	Time domain modification of synthesis speech
title_sort	time domain modification of synthesis speech
publishDate	2015
url	http://hdl.handle.net/10356/62155
_version_	1772827489331052544
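
The pitch detection step summarized in the description (center clipping, then AMDF, then a search for the first minimum) can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names, the clipping ratio of 0.3, the 50 to 400 Hz search range, and the depth threshold used to decide what counts as the "first minimum" are all assumptions chosen for illustration, and the probabilistic error correction mentioned in the description is not included.

```python
import math


def center_clip(frame, ratio=0.3):
    """Center clipping: zero out samples whose magnitude is below
    ratio * peak and shift the remaining samples toward zero.
    The ratio of 0.3 is an assumed value, not taken from the project."""
    peak = max(abs(v) for v in frame)
    limit = ratio * peak
    clipped = []
    for v in frame:
        if v > limit:
            clipped.append(v - limit)
        elif v < -limit:
            clipped.append(v + limit)
        else:
            clipped.append(0.0)
    return clipped


def amdf(frame, min_lag, max_lag):
    """Average magnitude difference between the frame and its delayed
    copy, for every lag in [min_lag, max_lag]. Requires max_lag < len(frame)."""
    n = len(frame)
    values = {}
    for lag in range(min_lag, max_lag + 1):
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag))
        values[lag] = total / (n - lag)
    return values


def first_minimum(values, depth=0.4):
    """Return the first lag that is a local minimum and lies below
    depth * max(AMDF). The depth threshold is an assumption; if no lag
    qualifies, fall back to the global minimum."""
    lags = sorted(values)
    ceiling = depth * max(values.values())
    for prev, lag, nxt in zip(lags, lags[1:], lags[2:]):
        if (values[lag] < ceiling
                and values[lag] < values[prev]
                and values[lag] <= values[nxt]):
            return lag
    return min(lags, key=lambda k: values[k])


def estimate_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the pitch period (in samples) and frequency (in Hz) of one frame."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    values = amdf(center_clip(frame), min_lag, max_lag)
    lag = first_minimum(values)
    return lag, sample_rate / lag


if __name__ == "__main__":
    # Synthetic test tone: 100 Hz sine sampled at 8 kHz, 100 ms long.
    fs = 8000
    tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(800)]
    period, f0 = estimate_pitch(tone, fs)
    print(period, f0)
```

The depth threshold is there because a raw "first local minimum" search can lock onto shallow dips caused by harmonics; real speech would likely need a tuned threshold and the error-correction stage the description refers to.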

Similar Items