Computer speaks Mandarin
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2011 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/45847 |
Institution: | Nanyang Technological University |
Summary: | The objective of this project is to develop a Chinese Text-to-Speech technique using concatenative synthesis. Concatenative synthesis is a method that utilizes pre-recorded samples of natural syllables to generate any desired synthesized speech.
In this project, the Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) technique is adopted. This approach allows pre-recorded syllable samples to be concatenated and provides flexibility in controlling the duration and pitch of each syllable. The Chinese syllable is chosen as the basic synthesis unit, and more than 1300 Chinese syllables were pre-recorded for this project.
A natural speech sentence is pre-recorded and serves as a guideline for the pitch and duration of the concatenated speech. TD-PSOLA is then used to modify the pitch and duration of the concatenated speech so that the quality of the synthesized version remains as close as possible to that of the natural speech.
TD-PSOLA allows the sample syllables to be stretched or compressed so that they fit the target duration. The pitch can also be modified to produce the desired pitch contour, using the natural speech as a guideline. |
---|
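The duration and pitch modification described in the summary can be illustrated with a minimal TD-PSOLA sketch. This is not the project's implementation: the function names `hann` and `td_psola`, the constant `time_scale` and `pitch_scale` factors, and the assumption that pitch marks are already known are all hypothetical choices made for illustration. In the project, the target duration and pitch contour would instead be taken from the pre-recorded natural sentence.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, marks, time_scale=1.0, pitch_scale=1.0):
    """Overlap-add two-period grains at re-spaced synthesis marks.

    signal      -- list of float samples
    marks       -- sorted sample indices of pitch marks (assumed given, >= 2)
    time_scale  -- > 1 lengthens the output, < 1 shortens it
    pitch_scale -- > 1 raises the pitch, < 1 lowers it
    """
    out_len = int(round(len(signal) * time_scale))
    out = [0.0] * out_len
    norm = [0.0] * out_len          # summed window weights, for normalisation
    t_syn = marks[0] * time_scale   # first synthesis mark
    end = marks[-1] * time_scale    # last position covered by analysis marks
    while t_syn <= end:
        # Map the synthesis instant back to the analysis time axis and
        # pick the nearest analysis pitch mark.
        t_ana = t_syn / time_scale
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_ana))
        # Local pitch period, taken from the neighbouring mark.
        if k + 1 < len(marks):
            period = marks[k + 1] - marks[k]
        else:
            period = marks[k] - marks[k - 1]
        window = hann(2 * period + 1)
        centre = int(round(t_syn))
        # Copy a windowed grain of two periods, centred on the mark.
        for j, w in enumerate(window):
            src = marks[k] - period + j
            dst = centre - period + j
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                out[dst] += w * signal[src]
                norm[dst] += w
        # Grain spacing sets the output pitch period.
        t_syn += period / pitch_scale
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


# Usage: an 80-sample-period sawtooth with a pitch mark once per period.
sig = [((n % 80) / 80.0) - 0.5 for n in range(1600)]
pitch_marks = list(range(80, 1520, 80))
stretched = td_psola(sig, pitch_marks, time_scale=1.5)   # longer, same pitch
raised = td_psola(sig, pitch_marks, pitch_scale=2.0)     # same length, period halves
```

Grains are repeated or skipped to change the duration, while the spacing between grains sets the pitch period, so the two controls are independent. A practical system would also need a pitch-marking step and special handling of unvoiced segments, both omitted here.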