Mandarin speech synthesis
Synthetic speech has been developed steadily for the past few decades. The objective of the project is to develop a Chinese Text-to-Speech system using concatenative synthesis. Concatenative synthesis involves playing pre-recorded samples of natural speech such as the word or syllable. As Mandari...
Main Author: | Teoh, Xueli |
---|---|
Other Authors: | Foo Say Wei |
Format: | Final Year Project |
Language: | English |
Published: | 2015 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/62164 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-62164 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-62164 2023-07-07T15:53:06Z Mandarin speech synthesis Teoh, Xueli Foo Say Wei School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Bachelor of Engineering 2015-02-12T02:04:43Z 2015-02-12T02:04:43Z 2008 2008 Final Year Project (FYP) http://hdl.handle.net/10356/62164 en Nanyang Technological University 95 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Teoh, Xueli Mandarin speech synthesis |
description |
Synthetic speech has been developed steadily over the past few decades. The objective
of this project is to develop a Chinese text-to-speech system using concatenative
synthesis. Concatenative synthesis plays back pre-recorded samples of natural
speech, such as words or syllables. As Mandarin Chinese is a tonal language in which
each character is pronounced as one syllable, the syllable is chosen as the basic
synthesis unit in this project. A Chinese speech database containing more than 1300
Chinese syllables was pre-recorded and used for the concatenative speech synthesis.
The Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) technique
was adopted so that pre-recorded speech samples can be concatenated smoothly,
with good control over pitch and duration. A natural speech sentence
was pre-recorded and compared with the concatenated synthetic speech. To
make the concatenated synthetic speech sound as close as possible to the natural speech,
TD-PSOLA is used to change the duration of each character in the
concatenated sentence so that it is almost the same as that of the
character in the natural utterance. The pitch of each word in the concatenated sentence
is also modified using TD-PSOLA so that it fits the pitch contour of the word in the
natural speech sentence.
In the modification process, precise pitch detection is crucial. A pitch detector based on
the autocorrelation method is developed to obtain the pitch contour of each
monosyllabic speech unit accurately. |
author2 |
Foo Say Wei |
author_facet |
Foo Say Wei Teoh, Xueli |
format |
Final Year Project |
author |
Teoh, Xueli |
author_sort |
Teoh, Xueli |
title |
Mandarin speech synthesis |
title_short |
Mandarin speech synthesis |
title_full |
Mandarin speech synthesis |
title_fullStr |
Mandarin speech synthesis |
title_full_unstemmed |
Mandarin speech synthesis |
title_sort |
mandarin speech synthesis |
publishDate |
2015 |
url |
http://hdl.handle.net/10356/62164 |
_version_ |
1772825350120669184 |
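
The description above mentions pitch detection based on autocorrelation. As an illustration only, the following is a minimal sketch of that general idea and not the project's actual implementation, which this record does not give. The function name `autocorr_pitch`, the 60 to 500 Hz search range, and the synthetic test tone are all assumptions made for the example.

```python
import math


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the largest normalized autocorrelation inside the
    lag range implied by [f_min, f_max]. Returns 0.0 when the frame is
    silent or too short to search. (Hypothetical helper for illustration.)
    """
    n = len(frame)
    mean = sum(frame) / n if n else 0.0
    x = [s - mean for s in frame]           # remove DC offset
    energy = sum(s * s for s in x)          # autocorrelation at lag 0

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))
    if energy == 0.0 or lag_max <= lag_min:
        return 0.0

    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased estimate: fewer terms at larger lags, which favours the
        # first pitch period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r

    return sample_rate / best_lag if best_lag else 0.0


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (50 ms frame).
sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
print(autocorr_pitch(frame, sr))
```

Applying such a function to successive short frames of a syllable would give a rough pitch contour. A practical detector would normally add a voicing decision and some smoothing across frames, neither of which is shown here.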