Time domain modification of synthesis speech


Bibliographic details
Main author: Long, Hai
Other authors: Foo Say Wei
Format: Final Year Project
Language: English
Published: 2015
Online access: http://hdl.handle.net/10356/62155
Institution: Nanyang Technological University
Description
Summary: Concatenative synthesis has become increasingly popular because of its high naturalness and ease of implementation. In concatenative synthesis, prerecorded samples are modified to synthesize the desired speech. Precise pitch detection and modification are essential to this process and strongly affect the quality of the synthesized speech. The average magnitude difference function (AMDF) is a time-domain pitch detection method with high accuracy and low computational complexity. It computes a difference signal between the waveform and a time-delayed copy of itself at various delays, and the pitch period is extracted from the difference signal by seeking the first minimum. Center clipping, applied as a spectral flattener before AMDF, increases its reliability. Accuracy can be further improved by applying probabilistic error correction after the rough AMDF estimate. TD-PSOLA (time-domain pitch-synchronous overlap-add) is a popular time-domain method for modifying pitch and duration. It decomposes the signal into a series of short-time signals and modifies each short-time waveform according to the desired pitch and time scale factors. The synthesized speech is then obtained by overlap-add. Instead of a constant pitch scale factor, a pitch scale function is used to change the tone in Mandarin Chinese. The pitch scale function is derived from models of the four lexical tones of Mandarin Chinese and determined experimentally.
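
The AMDF step described in the summary can be illustrated with a minimal Python sketch. This is not the project's code: the function names, the 30% clipping ratio, the 60 to 400 Hz search range, and the depth threshold used to pick the first minimum are all assumptions made for illustration. The probabilistic error correction mentioned in the summary is not included.

```python
import math

def center_clip(x, ratio=0.3):
    """Center clipping: zero small samples, shift the rest toward zero.
    The ratio of 0.3 is an assumed value, not taken from the record."""
    limit = ratio * max(abs(v) for v in x)
    out = []
    for v in x:
        if v > limit:
            out.append(v - limit)
        elif v < -limit:
            out.append(v + limit)
        else:
            out.append(0.0)
    return out

def amdf(x, max_lag):
    """Average magnitude difference between x and its delayed copy,
    for lags 0..max_lag, using the same window length for every lag."""
    n = len(x) - max_lag
    return [sum(abs(x[i] - x[i + lag]) for i in range(n)) / n
            for lag in range(max_lag + 1)]

def first_minimum(d, min_lag, max_lag, depth=0.3):
    """Bottom of the first dip that falls below depth * max(d) in the
    search range. The depth rule is an assumed heuristic."""
    threshold = depth * max(d[min_lag:max_lag + 1])
    for lag in range(min_lag, max_lag):
        if d[lag] < threshold and d[lag] <= d[lag + 1]:
            return lag
    return None  # no clear dip: treat the frame as unvoiced

def estimate_pitch(x, fs, f_min=60.0, f_max=400.0, ratio=0.3):
    """Rough pitch estimate in Hz for one frame, or None if none is found."""
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    if len(x) <= max_lag:
        raise ValueError("frame too short for the requested pitch range")
    d = amdf(center_clip(x, ratio), max_lag)
    lag = first_minimum(d, min_lag, max_lag)
    return None if lag is None else fs / lag

# Usage: a synthetic 200 Hz sine sampled at 8 kHz (50 ms frame).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(estimate_pitch(frame, fs))
```

On this synthetic frame the first AMDF dip sits at a lag of 40 samples, which corresponds to 200 Hz. Real speech would need framing, a voiced/unvoiced decision, and some smoothing across frames, which this sketch leaves out.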