Wavelet analysis of speaker-dependent speech features

Bibliographic Details
Main Author: Wong, Jocelynn Olida
Format: text
Language: English
Published: Animo Repository 2001
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/3206
Institution: De La Salle University
Description
Summary: Speaker-dependent speech features are usually estimated using the Short-Time Fourier Transform (STFT). However, because speech signals are non-stationary, the fixed-size window function used by the STFT cannot provide adequate time-frequency resolution. In this study, a Discrete Wavelet Transform (DWT) algorithm was used to analyze speech signals, with an Order-3 B-Spline wavelet as its basis function. At each decomposition level of the wavelet transform, the time resolution is halved and the frequency resolution is doubled, which addresses the time-frequency resolution problem. Algorithms for extracting speaker-dependent speech features were also developed. To obtain the energy feature of speech, the energy equation was extended to include the computation of energy across all scales. To obtain the fundamental pitch frequency, the pitch period was measured by locating glottal closure instants (GCI) in voiced sounds within the scales of the wavelet transform. Rather than using all scales for pitch period estimation, two algorithms were designed: the first correlates the first two adjacent scales of the wavelet transform, and the second uses only one scale. Analysis of these algorithms showed that the energy matrix produced by the energy vector extraction algorithm characterizes the intensity of the speaker's voice across time. Overall estimation error rates of 2.4% for the first algorithm and 7.5% for the second were obtained.
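
The idea of computing energy across all scales can be illustrated with a minimal Python sketch. It is not the thesis implementation. For brevity it substitutes an orthonormal Haar wavelet for the Order-3 B-Spline wavelet named in the summary, and the function names (haar_dwt, scale_energies, energy_matrix), the frame length, and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def haar_dwt(signal, levels):
    """Dyadic Haar DWT (a stand-in wavelet, not the thesis's B-spline).

    Returns a list of detail-coefficient arrays, one per level, plus the
    final approximation. Each level halves the number of samples.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        approx = (even + odd) / np.sqrt(2.0)         # low-pass branch
    return details, approx


def scale_energies(signal, levels):
    """Energy per scale: one value per detail level, then the approximation."""
    details, approx = haar_dwt(signal, levels)
    bands = details + [approx]
    return np.array([np.sum(band ** 2) for band in bands])


def energy_matrix(signal, frame_len, levels):
    """Stack per-frame scale energies into a (levels + 1, n_frames) matrix."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.stack([scale_energies(f, levels) for f in frames], axis=1)


# Hypothetical test signal: a 100 Hz tone with rising amplitude at 8 kHz.
fs = 8000
t = np.arange(1024) / fs
x = np.linspace(0.1, 1.0, t.size) * np.sin(2 * np.pi * 100 * t)

E = energy_matrix(x, frame_len=256, levels=3)
print(E.shape)        # rows are scales, columns are time frames
print(E.sum(axis=0))  # total energy per frame
```

Because the Haar transform is orthonormal, each column of the matrix sums to the energy of the corresponding frame, so the columns track how signal intensity changes over time while the rows show how that energy is distributed across scales. A B-spline wavelet would change the filter coefficients and the energy bookkeeping but not the overall structure of the computation.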