Instantaneous amplitude model for speech analysis, synthesis and coding

Bibliographic Details
Main Author:	Ng, Ling Kok.
Other Authors:	Li, Gang
Format:	Theses and Dissertations
Language:	English
Published:	2008
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Online Access:	http://hdl.handle.net/10356/13255

Description
Summary:	This thesis presents a new model for speech signal representation. In this model, the speech signal is decomposed into a sum of sub-signals, each characterized by one constant frequency and two Instantaneous Amplitudes (IAs). The IAs are then parameterized with a polynomial. Instead of the simple peak-picking algorithm, an iterative frequency estimation algorithm is proposed, which yields a higher resolution. The model avoids the difficulty of dealing with time-varying phases and makes it easy to carry out an optimization procedure so that the synthetic speech is as close as possible to the original. For verification, the classical sinusoidal model proposed by McAulay and Quatieri is built and used as a yardstick for performance comparison with the IA model. Experiments show that the synthetic speech produced with the developed technique is of excellent quality and almost indistinguishable perceptually from the original speech. With the same parameterization complexity, the IA model provides better synthetic speech, in terms of both perceptual quality and the synthetic waveform, and requires fewer parameters than the classical sinusoidal model. A coding algorithm is developed to code the parameters of the IA model. A fixed bit rate of 40 kbps is achieved for excellent-quality synthetic speech sampled at 16 kHz.
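The summary does not give the model equations, so the following is only a minimal sketch of one plausible reading. It assumes each sub-signal has the form a_k(t)·cos(2πf_k t) + b_k(t)·sin(2πf_k t), with the two amplitudes a_k and b_k being low-order polynomials over an analysis frame. Under that assumption, once the frequencies are fixed, the polynomial coefficients enter linearly and can be fitted by ordinary least squares, which may be what makes the optimization step easy. The function names, frame length, polynomial order and test frequencies are all hypothetical, and the thesis's iterative frequency estimation is not reproduced here (the frequencies are taken as known).

```python
import numpy as np

def ia_basis(n_samples, freqs_hz, fs, order):
    """Columns u^p*cos(2*pi*f*t) and u^p*sin(2*pi*f*t) for each frequency f."""
    n = np.arange(n_samples)
    t = n / fs
    # Normalised time in [-1, 1] keeps the polynomial terms well conditioned.
    u = 2.0 * n / (n_samples - 1) - 1.0
    cols = []
    for f in freqs_hz:
        c = np.cos(2 * np.pi * f * t)
        s = np.sin(2 * np.pi * f * t)
        for p in range(order + 1):
            cols.append(u**p * c)  # contributes to the assumed amplitude a_k(t)
            cols.append(u**p * s)  # contributes to the assumed amplitude b_k(t)
    return np.column_stack(cols)

def fit_ia_frame(frame, freqs_hz, fs, order=2):
    """Least-squares fit of the polynomial amplitude coefficients for one frame."""
    basis = ia_basis(len(frame), freqs_hz, fs, order)
    coeffs, *_ = np.linalg.lstsq(basis, frame, rcond=None)
    return coeffs, basis @ coeffs

# Toy frame (not speech): 20 ms at 16 kHz, two components with linear amplitudes.
fs = 16000
n_samples = 320
freqs_hz = [200.0, 450.0]  # assumed known; hypothetical values
t = np.arange(n_samples) / fs
u = 2.0 * np.arange(n_samples) / (n_samples - 1) - 1.0
frame = (1.0 + 0.5 * u) * np.cos(2 * np.pi * 200.0 * t) \
      + (0.3 - 0.2 * u) * np.sin(2 * np.pi * 450.0 * t)

coeffs, synth = fit_ia_frame(frame, freqs_hz, fs, order=2)
print("max reconstruction error:", np.max(np.abs(frame - synth)))
```

Because the toy frame lies exactly in the span of the basis, the reconstruction error should be at the level of floating-point round-off. Real speech would leave a residual that depends on the frequency estimates and the polynomial order.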
