Time domain modification of synthesis speech
Concatenative synthesis has become increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modificati...
Main Author: | Long, Hai |
---|---|
Other Authors: | Foo Say Wei |
Format: | Final Year Project |
Language: | English |
Published: | 2015 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/62155 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-62155 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-62155 2023-07-07T15:56:15Z Time domain modification of synthesis speech Long, Hai Foo Say Wei School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Concatenative synthesis has become increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. In the modification process, precise pitch detection and modification are very important and greatly affect the quality of the synthesized speech. The average magnitude difference function (AMDF) is a time-domain pitch detection method with high accuracy and low computational complexity. It calculates a difference signal between the waveform and its time-delayed copy at various time delays. The pitch is extracted from the difference signal by seeking the first minimum. A spectral-flattening preprocessing step, center clipping, can increase the reliability of AMDF. The accuracy can be further enhanced by applying a probabilistic error correction after the rough estimate from AMDF. TD-PSOLA is a popular time-domain pitch and duration modification method. It decomposes the signal into a series of short-time signals and modifies the short-time waveforms according to the desired pitch and time scale factors. Finally, the synthesized speech is obtained by applying an overlap-add method. Instead of a constant pitch scale factor, a pitch scale function is used to change the tone in Mandarin Chinese. The pitch scale function is derived from the four lexical tone models of Mandarin Chinese and determined experimentally. Bachelor of Engineering 2015-02-10T08:51:43Z 2015-02-10T08:51:43Z 2006 2006 Final Year Project (FYP) http://hdl.handle.net/10356/62155 en Nanyang Technological University 74 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
description |
Concatenative synthesis has become increasingly popular because of its high
naturalness and ease of implementation. In concatenative synthesis, prerecorded
samples are modified to synthesize the desired speech. In the modification
process, precise pitch detection and modification are very important and
greatly affect the quality of the synthesized speech.
The average magnitude difference function (AMDF) is a time-domain pitch
detection method with high accuracy and low computational complexity. It
calculates a difference signal between the waveform and its time-delayed copy
at various time delays. The pitch is extracted from the difference signal by
seeking the first minimum. A spectral-flattening preprocessing step, center
clipping, can increase the reliability of AMDF. The accuracy can be further
enhanced by applying a probabilistic error correction after the rough estimate
from AMDF.
TD-PSOLA is a popular time-domain pitch and duration modification method. It
decomposes the signal into a series of short-time signals and modifies the
short-time waveforms according to the desired pitch and time scale factors.
Finally, the synthesized speech is obtained by applying an overlap-add method.
Instead of a constant pitch scale factor, a pitch scale function is used to
change the tone in Mandarin Chinese. The pitch scale function is derived from
the four lexical tone models of Mandarin Chinese and determined experimentally. |
author2 |
Foo Say Wei |
format |
Final Year Project |
author |
Long, Hai |
title |
Time domain modification of synthesis speech |
publishDate |
2015 |
url |
http://hdl.handle.net/10356/62155 |
_version_ |
1772827489331052544 |
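
The pitch detection step summarized in the description (center clipping, then AMDF, then a search for the first minimum) might be sketched as follows. This is a minimal illustration and not the project's implementation: the function names, the clipping ratio of 0.3, the depth threshold used to skip shallow dips, and the 8 kHz test tone are all assumptions made for this example, and the probabilistic error correction mentioned in the description is left out.

```python
import math


def center_clip(frame, ratio=0.3):
    """Zero samples below ratio * peak magnitude and shift the rest toward zero.

    The ratio is an assumed value, not one taken from the record.
    """
    limit = ratio * max(abs(s) for s in frame)
    return [s - limit if s > limit else s + limit if s < -limit else 0.0
            for s in frame]


def amdf(frame, min_lag, max_lag):
    """Mean absolute difference between the frame and its delayed copy, per lag."""
    n = len(frame) - max_lag  # use the same number of terms for every lag
    return {lag: sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
            for lag in range(min_lag, max_lag + 1)}


def estimate_period(frame, min_lag, max_lag, depth=0.3):
    """Return the lag (in samples) of the first sufficiently deep local minimum.

    The depth threshold is an assumption added here so that shallow dips
    are not mistaken for the pitch period.
    """
    d = amdf(center_clip(frame), min_lag, max_lag)
    threshold = depth * max(d.values())
    for lag in range(min_lag + 1, max_lag):
        if d[lag] < threshold and d[lag] <= d[lag - 1] and d[lag] <= d[lag + 1]:
            return lag
    return min(d, key=d.get)  # fall back to the global minimum


if __name__ == "__main__":
    sample_rate = 8000  # Hz, assumed for the demonstration
    # Synthetic 100 Hz tone: one period spans 80 samples at 8 kHz.
    tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(400)]
    period = estimate_period(tone, min_lag=20, max_lag=160)
    print("estimated pitch:", sample_rate / period, "Hz")
```

The lag range of 20 to 160 samples corresponds to a search between 400 Hz and 50 Hz at the assumed sample rate. Real speech frames would normally need a voiced/unvoiced decision before a period estimate is trusted.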