Mandarin speech synthesis
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2015 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/62164 |
Institution: | Nanyang Technological University |
Summary: | Synthetic speech has developed steadily over the past few decades. The objective
of this project is to develop a Chinese text-to-speech system using concatenative
synthesis. Concatenative synthesis plays back pre-recorded samples of natural
speech, such as words or syllables. As Mandarin Chinese is a tonal language in which
each character is pronounced as one syllable, the syllable is chosen as the basic synthesis unit
in this project. A Chinese speech database containing more than 1300 Chinese
syllables was pre-recorded and used for the concatenative speech synthesis.
The Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) technique
was adopted so that pre-recorded speech samples can be concatenated smoothly,
with good control over pitch and duration. A natural speech sentence
was pre-recorded and compared with the concatenated synthetic speech. To
make the concatenated synthetic speech sound as close as possible to the natural speech,
TD-PSOLA was used to change the duration of each character in the
concatenated sentence so that it almost matches the duration of the same
character in the natural utterance. The pitch of each word in the concatenated sentence
was also modified using TD-PSOLA so that it fits the pitch contour of the corresponding word in the
natural speech sentence.
In the modification process, precise pitch detection is crucial. An autocorrelation-based
pitch detection method was developed to obtain an accurate pitch contour for each
monosyllabic speech unit. |
---|
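
The summary mentions autocorrelation-based pitch detection without giving details. As a rough illustration only, and not the project's actual implementation, the sketch below estimates the fundamental frequency of a single voiced frame by finding the lag at which the normalised autocorrelation peaks. The function name `estimate_pitch_autocorr`, the 60-500 Hz search range, and the `voicing_threshold` value are all assumptions chosen for the example.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f_min=60.0, f_max=500.0,
                            voicing_threshold=0.3):
    """Estimate the fundamental frequency (Hz) of one speech frame.

    Returns None if the frame is too short for the longest lag searched,
    is silent, or has no autocorrelation peak above `voicing_threshold`
    (an assumed, hypothetical cut-off for treating a frame as voiced).
    """
    n = len(frame)
    lag_min = max(1, int(sample_rate / f_max))  # shortest period searched
    lag_max = int(sample_rate / f_min)          # longest period searched
    if n <= lag_max:
        return None

    # Remove the DC offset so it does not dominate the correlation.
    mean = sum(frame) / n
    x = [s - mean for s in frame]

    energy = sum(s * s for s in x)  # autocorrelation at lag 0
    if energy == 0.0:
        return None

    best_lag, best_score = None, voicing_threshold
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        score = r / energy  # normalised autocorrelation at this lag
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag


if __name__ == "__main__":
    # Synthetic test tone standing in for a voiced frame (assumed values).
    sr = 16000
    tone = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(1024)]
    print(estimate_pitch_autocorr(tone, sr))
```

Applying such an estimator to successive overlapping frames of a syllable would give a pitch contour of the kind the summary describes. A practical detector would likely also need smoothing and handling of octave errors, which this sketch omits.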