Instantaneous amplitude model for speech analysis, synthesis and coding

Bibliographic Details
Main Author:	Ng, Ling Kok.
Other Authors:	Li, Gang
Format:	Theses and Dissertations
Language:	English
Published:	2008
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Online Access:	http://hdl.handle.net/10356/13255
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-13255
record_format	dspace
spelling	sg-ntu-dr.10356-13255; 2023-07-04T16:01:20Z; Instantaneous amplitude model for speech analysis, synthesis and coding; Ng, Ling Kok.; Li, Gang; School of Electrical and Electronic Engineering; DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing; Master of Engineering; 2008-08-13T06:21:38Z; 2008-10-20T07:21:46Z; 1999; Thesis; http://hdl.handle.net/10356/13255; en; 145 p.; application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
description	This thesis presents a new model for speech signal representation. In this model, the speech signal is decomposed into a sum of sub-signals, each characterized by one constant frequency and two Instantaneous Amplitudes (IAs). Each IA is then parameterized by a polynomial. In place of the simple peak-picking algorithm, an iterative frequency estimation algorithm is proposed, which yields higher frequency resolution. The model avoids the difficulty of dealing with time-varying phases and makes it easy to carry out an optimization procedure, so that the synthetic speech can be made as close to the original as possible. For verification purposes, the classical sinusoidal model proposed by McAulay and Quatieri is built and used as a yardstick for performance comparison with the IA model. Experiments show that the synthetic speech produced by the developed technique is of excellent quality and almost perceptually indistinguishable from the original speech. With the same parameterization complexity, the IA model provides better synthetic speech, in terms of both perceptual quality and synthetic waveform, and requires fewer parameters than the classical sinusoidal model. A coding algorithm is developed to code the IA model parameters. A fixed bit rate of 40 kbps is achieved for excellent-quality synthetic speech sampled at 16 kHz.
author2	Li, Gang
format	Theses and Dissertations
author	Ng, Ling Kok.
title	Instantaneous amplitude model for speech analysis, synthesis and coding
publishDate	2008
url	http://hdl.handle.net/10356/13255
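The following is a minimal sketch of why the description says the IA model makes optimization easy. It is not the thesis's code. It rests on three assumptions. First, the two IAs of each sub-signal are taken to be the cosine and sine amplitudes, s(n) = Σ_k [a_k(n) cos(ω_k n) + b_k(n) sin(ω_k n)]. Second, a_k and b_k are polynomials in normalized time. Third, the constant frequencies ω_k (radians per sample) have already been estimated. Under these assumptions no phase track is needed and a frame is linear in the polynomial coefficients, so an ordinary least-squares fit recovers them. The function names (`synthesize`, `fit_ia_coeffs`) and the coefficient ordering are hypothetical.

```python
import math


def _basis(n, frame_len, freqs, order):
    """Basis values at sample n (assumed IA form): t**p * cos(w*n) and
    t**p * sin(w*n) for each frequency w (rad/sample) and power p."""
    # Centered, normalized time keeps the polynomial terms well conditioned.
    t = (n - (frame_len - 1) / 2) / frame_len
    row = []
    for w in freqs:
        for p in range(order + 1):
            row.append(t ** p * math.cos(w * n))  # contributes to a_k(n) cos(w_k n)
            row.append(t ** p * math.sin(w * n))  # contributes to b_k(n) sin(w_k n)
    return row


def synthesize(coeffs, freqs, order, frame_len):
    """Sum over k of a_k(n) cos(w_k n) + b_k(n) sin(w_k n), with the
    polynomial coefficients laid out in the same order as _basis."""
    return [sum(c * b for c, b in zip(coeffs, _basis(n, frame_len, freqs, order)))
            for n in range(frame_len)]


def _solve(a, y):
    """Solve the square system a x = y by Gaussian elimination with partial pivoting."""
    m = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def fit_ia_coeffs(x, freqs, order):
    """Least-squares IA polynomial coefficients for fixed frequencies.

    With the frequencies held constant the model is linear in the
    coefficients, so the fit follows from the normal equations."""
    frame_len = len(x)
    rows = [_basis(n, frame_len, freqs, order) for n in range(frame_len)]
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * xn for r, xn in zip(rows, x)) for i in range(k)]
    return _solve(gram, rhs)


if __name__ == "__main__":
    freqs = [0.2 * math.pi, 0.5 * math.pi]  # assumed, already-estimated frequencies
    coeffs_true = [1.0, 0.5, -0.3, 0.2, 0.8, -0.4, 0.1, 0.6]  # linear IAs (order 1)
    frame = synthesize(coeffs_true, freqs, order=1, frame_len=160)
    est = fit_ia_coeffs(frame, freqs, order=1)
    print("max coefficient error:", max(abs(e - t) for e, t in zip(est, coeffs_true)))
```

Each frame needs only a single linear solve, so the synthesis error can be minimized directly instead of being approximated by matching spectral peaks. The normal equations are used here only to keep the sketch short. They square the condition number of the problem, so a QR- or SVD-based solver would be the sturdier choice in practice.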
