Wavelet analysis of speaker-dependent speech features

Bibliographic Details
Main Author:	Wong, Jocelynn Olida
Format:	text
Language:	English
Published:	Animo Repository, 2001
Subjects:	Wavelets (Mathematics); Speech processing systems; Automatic speech recognition; Voice frequency
Online Access:	https://animorepository.dlsu.edu.ph/etd_masteral/3206
Institution:	De La Salle University

id	oai:animorepository.dlsu.edu.ph:etd_masteral-10044
record_format	eprints
institution	De La Salle University
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
language	English
topic	Wavelets (Mathematics); Speech processing systems; Automatic speech recognition; Voice frequency
description	Speaker-dependent speech features are usually estimated with the Short-Time Fourier Transform (STFT). However, because speech signals are non-stationary, the fixed-size window used by the STFT cannot provide adequate time-frequency resolution. In this study, a Discrete Wavelet Transform (DWT) algorithm using an order-3 B-spline wavelet as its basis function was used to analyze speech signals. At each decomposition level of the wavelet transform, the time resolution is halved and the frequency resolution is doubled, which addresses the time-frequency resolution problem. Algorithms for extracting speaker-dependent speech features were also developed. To obtain the energy feature of speech, the energy equation was extended to compute energy across all scales. To obtain the fundamental pitch frequency, the pitch period was measured by locating glottal closure instants (GCI) in voiced sounds within the scales of the wavelet transform. Rather than using all scales for pitch period estimation, the first algorithm correlates the first two adjacent scales, while the second uses only one scale. Analysis of these algorithms showed that the energy matrix produced by the energy vector extraction algorithm characterizes the intensity of the speaker's voice across time. Overall estimation error rates of 2.4% for the first algorithm and 7.5% for the second were obtained.
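The description above mentions per-scale energy and pitch period measurement on wavelet scales. The following is a minimal illustrative sketch of those two ideas, not the thesis's algorithms. It assumes a Haar wavelet in place of the order-3 B-spline wavelet named in the record, and the function names (haar_step, dwt_details, energy_per_scale, peak_spacing) and the threshold parameter are hypothetical choices made for this example.

```python
import math

def haar_step(signal):
    """One Haar analysis level: return (approximation, detail), each half length.
    Assumption: Haar is used here only for brevity, not the B-spline wavelet."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def dwt_details(signal, levels):
    """Return the detail coefficients for scales 1..levels and the final approximation."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx

def energy_per_scale(details):
    """Sum of squared detail coefficients at each scale (one energy value per scale)."""
    return [sum(c * c for c in d) for d in details]

def peak_spacing(coeffs, threshold):
    """Mean distance (in coefficients) between local maxima of |coeffs| that reach
    the threshold; a crude stand-in for spacing between glottal closure instants."""
    mags = [abs(c) for c in coeffs]
    peaks = [i for i in range(1, len(mags) - 1)
             if mags[i] >= threshold and mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]]
    if len(peaks) < 2:
        return None
    return (peaks[-1] - peaks[0]) / (len(peaks) - 1)
```

Under these assumptions, a spacing measured at scale j corresponds to roughly spacing × 2^j samples of the original signal, and dividing the sampling rate by that period gives a pitch frequency estimate. Computing energy_per_scale on successive frames and stacking the results would give a time-by-scale energy matrix.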
format	text
author	Wong, Jocelynn Olida
title	Wavelet analysis of speaker-dependent speech features
publisher	Animo Repository
publishDate	2001
url	https://animorepository.dlsu.edu.ph/etd_masteral/3206
_version_	1712575126700032000

Similar Items