Time domain modification of synthesis speech

Concatenative synthesis has become increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modificati...

Bibliographic Details
Main Author: Long, Hai
Other Authors: Foo Say Wei
Format: Final Year Project
Language: English
Published: 2015
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/62155
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62155
record_format dspace
spelling sg-ntu-dr.10356-621552023-07-07T15:56:15Z Time domain modification of synthesis speech Long, Hai Foo Say Wei School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Bachelor of Engineering 2015-02-10T08:51:43Z 2015-02-10T08:51:43Z 2006 2006 Final Year Project (FYP) http://hdl.handle.net/10356/62155 en Nanyang Technological University 74 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
spellingShingle DRNTU::Engineering::Electrical and electronic engineering
Long, Hai
Time domain modification of synthesis speech
description Concatenative synthesis has become increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modification are very important and greatly affect the quality of the synthesized speech. The average magnitude difference function (AMDF) is a time-domain pitch detection method with high accuracy and low computational complexity. It calculates a difference signal between the waveform and its time-delayed copy at various time delays. Pitch is extracted from the difference signal by seeking the first minimum. A spectral pre-flattener, center clipping, can increase the reliability of AMDF. The accuracy can be further enhanced by applying a probabilistic error correction after the rough estimation by AMDF. TD-PSOLA (time-domain pitch-synchronous overlap-add) is a popular time-domain pitch and duration modification method. It decomposes the signal into a series of short-time signals and modifies the short-time waveforms according to the desired pitch and time scale factors. Finally, the synthesized speech is obtained by applying an overlap-add method. Instead of a constant pitch scale factor, a pitch scale function is used to change the tone in Mandarin Chinese. The pitch scale function is derived from the four lexical tone models of Mandarin Chinese and determined experimentally.
author2 Foo Say Wei
author_facet Foo Say Wei
Long, Hai
format Final Year Project
author Long, Hai
author_sort Long, Hai
title Time domain modification of synthesis speech
title_short Time domain modification of synthesis speech
title_full Time domain modification of synthesis speech
title_fullStr Time domain modification of synthesis speech
title_full_unstemmed Time domain modification of synthesis speech
title_sort time domain modification of synthesis speech
publishDate 2015
url http://hdl.handle.net/10356/62155
_version_ 1772827489331052544
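
The AMDF pitch detection outlined in the description above (center clipping, a difference signal over a range of delays, then the first minimum) can be sketched as follows. This is only a minimal Python illustration, not the project's implementation: the function names, the 30% clipping ratio, the 50 to 400 Hz search range, and the depth threshold used to decide what counts as the "first minimum" are all assumptions, and the probabilistic error correction and any voiced/unvoiced decision are omitted.

```python
import numpy as np


def center_clip(x, ratio=0.3):
    """Center clipping: zero samples below ratio * peak, shift the rest toward zero.

    The 0.3 ratio is an assumed, commonly used value, not taken from the record.
    """
    x = np.asarray(x, dtype=float)
    c = ratio * np.max(np.abs(x))
    return np.where(x > c, x - c, np.where(x < -c, x + c, 0.0))


def amdf(x, max_lag):
    """Average magnitude difference between x and its delayed copy, for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x) - max_lag  # same number of terms at every lag
    if n <= 0:
        raise ValueError("frame must be longer than max_lag")
    return np.array(
        [np.mean(np.abs(x[:n] - x[lag:lag + n])) for lag in range(max_lag + 1)]
    )


def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0, clip_ratio=0.3, depth=0.4):
    """Rough pitch estimate in Hz for one frame assumed to be voiced.

    `depth` is a hypothetical threshold: a local minimum only counts if it is
    below depth * (largest AMDF value in the search range).
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    d = amdf(center_clip(frame, clip_ratio), max_lag)

    # Seek the first sufficiently deep local minimum in the search range.
    thresh = depth * np.max(d[min_lag:max_lag + 1])
    for lag in range(max(min_lag, 1), max_lag):
        if d[lag] < thresh and d[lag] <= d[lag - 1] and d[lag] <= d[lag + 1]:
            return fs / lag

    # Fallback: no clear dip found, take the global minimum in the range.
    lag = min_lag + int(np.argmin(d[min_lag:max_lag + 1]))
    return fs / lag


if __name__ == "__main__":
    fs = 8000
    t = np.arange(480) / fs  # one 60 ms frame
    frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
    print(estimate_pitch(frame, fs))
```

Taking the first deep minimum rather than the global one is what keeps the estimate from jumping to a multiple of the true period, since the AMDF of a periodic signal also dips at two, three, or more periods.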