Computer speaks Mandarin

Bibliographic Details
Main Author: Chuah, Ree Gann.
Other Authors: Foo Say Wei
Format: Final Year Project
Language: English
Published: 2011
Subjects:
Online Access: http://hdl.handle.net/10356/45847
Institution: Nanyang Technological University
Description
Summary: The objective of this project is to develop a Chinese text-to-speech technique using concatenative synthesis, a method that joins pre-recorded samples of natural syllables to generate any desired synthesized speech. The Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) technique is adopted. This approach allows pre-recorded syllable samples to be concatenated and provides flexibility in controlling the duration and pitch of each syllable. The Chinese syllable is chosen as the basic synthesis unit, and more than 1300 Chinese syllables were pre-recorded for this project. A natural speech sentence is also pre-recorded and serves as a guideline for the pitch and duration of the concatenated speech. TD-PSOLA is then used to modify the pitch and duration of the concatenated speech so that the quality of the synthesized version remains as close as possible to that of the natural speech: each sample syllable can be stretched or compressed to fit the target duration, and its pitch can be modified to follow the desired pitch contour taken from the natural speech.
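
The pitch and duration modification described in the summary can be illustrated with a minimal TD-PSOLA sketch. This is not the project's implementation, only an illustration under simplifying assumptions: the input is fully voiced with a constant, known pitch period, and the pitch and duration changes are constant factors rather than a contour copied from a natural recording. The function name `td_psola` and its parameters `period`, `pitch_scale`, and `time_scale` are hypothetical, and dividing by the summed window weights is one common normalization choice.

```python
import numpy as np

def td_psola(x, period, pitch_scale=1.0, time_scale=1.0):
    """Illustrative TD-PSOLA for a voiced signal with a constant pitch period.

    x           : 1-D array of samples
    period      : pitch period of x in samples (assumed known and constant)
    pitch_scale : >1 raises the pitch, <1 lowers it
    time_scale  : >1 lengthens the signal, <1 shortens it
    """
    x = np.asarray(x, dtype=float)
    if len(x) <= 2 * period:
        raise ValueError("signal must be longer than two pitch periods")

    # Analysis pitch marks: one per period (assumption, no pitch detection here).
    marks = np.arange(period, len(x) - period, period)

    # Spacing of the synthesis marks sets the new pitch.
    new_period = max(1, int(round(period / pitch_scale)))
    out_len = int(round(len(x) * time_scale))

    out = np.zeros(out_len + 2 * period)
    norm = np.zeros_like(out)
    window = np.hanning(2 * period + 1)  # two-period Hann window

    t = period
    while t < out_len:
        # Map the synthesis time back to the input and pick the nearest mark.
        src = t / time_scale
        m = marks[np.argmin(np.abs(marks - src))]
        # Extract a windowed two-period grain and overlap-add it at t.
        out[t - period:t + period + 1] += x[m - period:m + period + 1] * window
        norm[t - period:t + period + 1] += window
        t += new_period

    # Normalize by the summed window weights to even out the overlap gain.
    out = out / np.maximum(norm, 1e-8)
    return out[:out_len]
```

For example, calling `td_psola(x, period=80, pitch_scale=1.5, time_scale=2.0)` on a signal sampled at 8 kHz with a 100 Hz fundamental would double its duration and place the grains about 53 samples apart, which raises the pitch to roughly 150 Hz. A practical system would replace the fixed `period` with detected pitch marks and derive per-syllable scaling factors from the reference recording.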