Wavelet analysis of speaker-dependent speech features

Bibliographic Details
Main Author: Wong, Jocelynn Olida
Format: text
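Language: English
Published: Animo Repository, 2001
Subjects: Wavelets (Mathematics); Speech processing systems; Automatic speech recognition; Voice frequency
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/3206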
Institution: De La Salle University
Collection: DLSU Institutional Repository (Master's Theses), De La Salle University Library
Country: Philippines

Description

Speaker-dependent speech features are usually estimated with the Short-Time Fourier Transform (STFT). Because speech signals are non-stationary, however, the fixed-size window used by the STFT cannot provide adequate time-frequency resolution for all components of the signal. In this study, a Discrete Wavelet Transform (DWT) algorithm, using an order-3 B-spline wavelet as its basis function, was used to analyze speech signals. At each successive decomposition level, the time resolution is halved and the frequency resolution is doubled, so the time-frequency trade-off adapts to the frequency band being analyzed instead of being fixed as in the STFT.

Algorithms for extracting speaker-dependent speech features were also developed. To obtain the energy feature, the energy equation was extended to compute the energy across all scales of the wavelet transform. The energy matrix produced by this energy vector extraction algorithm was observed to characterize the intensity of the speaker's voice over time.

To obtain the fundamental (pitch) frequency, the pitch period was measured by locating the glottal closure instants (GCIs) of voiced sounds in the scales of the wavelet transform. Rather than using all the scales, two pitch-period estimation algorithms were developed: the first correlates the first two adjacent scales of the wavelet transform, while the second uses only one scale. Overall estimation error rates of 2.4% for the first algorithm and 7.5% for the second were obtained.
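
The abstract gives neither the filter coefficients nor the extended energy equation. The Python sketch below shows one plausible reading, under assumptions that are ours rather than the thesis's: the order-3 B-spline is taken to be the quadratic B-spline, whose smoothing filter is [1, 3, 3, 1]/8, paired with a two-tap difference filter [2, -2] as the wavelet filter (similar to the quadratic spline wavelet of Mallat and Zhong); the transform is the undecimated "à trous" variant, so every scale stays sample-aligned with the input (the thesis's DWT may instead be decimated); and the energy matrix `E[m][j]` is taken to be the sum of squared scale-(j+1) coefficients in non-overlapping frame m. All function names are hypothetical.

```python
"""Sketch: undecimated dyadic (a trous) wavelet transform and an energy matrix.

Assumptions, not taken from the thesis: quadratic (order-3) B-spline smoothing
filter [1, 3, 3, 1] / 8, two-tap difference wavelet filter [2, -2], clamped
borders, and non-overlapping analysis frames.
"""
from typing import List, Sequence, Tuple

Taps = Sequence[Tuple[int, float]]  # (offset, coefficient) pairs

SMOOTH: Taps = [(-1, 0.125), (0, 0.375), (1, 0.375), (2, 0.125)]  # assumed
DIFF: Taps = [(0, 2.0), (1, -2.0)]  # assumed difference (wavelet) filter


def dilated_filter(x: Sequence[float], taps: Taps, step: int) -> List[float]:
    """y[n] = sum_k c_k * x[n - k*step], with indices clamped to the signal."""
    last = len(x) - 1
    return [
        sum(c * x[min(max(n - k * step, 0), last)] for k, c in taps)
        for n in range(len(x))
    ]


def dyadic_wavelet_transform(x: Sequence[float], levels: int) -> List[List[float]]:
    """Return detail signals W_1..W_levels, each as long as x."""
    approx = list(x)
    details: List[List[float]] = []
    for j in range(levels):
        step = 2 ** j  # level j+1 spreads the filter taps 2**j samples apart
        details.append(dilated_filter(approx, DIFF, step))
        approx = dilated_filter(approx, SMOOTH, step)
    return details


def energy_matrix(details: List[List[float]], frame_len: int) -> List[List[float]]:
    """E[m][j]: energy of scale j+1 within non-overlapping frame m."""
    n_frames = len(details[0]) // frame_len
    return [
        [sum(v * v for v in d[m * frame_len:(m + 1) * frame_len]) for d in details]
        for m in range(n_frames)
    ]


if __name__ == "__main__":
    # Toy input: 64 samples of silence, then an impulse every 16 samples.
    sig = [0.0] * 64 + [1.0 if n % 16 == 0 else 0.0 for n in range(64)]
    for m, row in enumerate(energy_matrix(dyadic_wavelet_transform(sig, 3), 32)):
        print(m, [round(e, 3) for e in row])
```

On the toy input, the first 32-sample frame has zero energy at every scale while the frames containing impulses do not. With real speech, `frame_len` would be set to the analysis frame length, which the abstract does not state.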
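
The abstract names the two GCI-based pitch-period algorithms but not their internals. The Python sketch below is an illustration under stated assumptions, not the thesis's method: the two-scale algorithm treats a GCI as a local maximum of the product W1·W2 of the first two detail scales, and the one-scale algorithm uses |W1| alone; candidates must exceed a hypothetical relative threshold `alpha` and lie at least `min_gap` samples apart; and the pitch period is the median GCI-to-GCI interval. The transform assumes a quadratic B-spline smoothing filter [1, 3, 3, 1]/8 and a [2, -2] difference filter in an undecimated dyadic scheme. `percent_error` is one possible error measure; this record does not define how the reported 2.4% and 7.5% error rates were computed, and the sketch does not reproduce them.

```python
"""Sketch: pitch-period estimation from glottal closure instants (GCIs).

Assumptions (illustrative only): quadratic B-spline smoothing filter, [2, -2]
difference filter, GCIs as peaks of W1*W2 (two scales) or |W1| (one scale),
relative threshold `alpha`, minimum spacing `min_gap` in samples.
"""
from statistics import median
from typing import List, Sequence, Tuple

Taps = Sequence[Tuple[int, float]]  # (offset, coefficient) pairs
SMOOTH: Taps = [(-1, 0.125), (0, 0.375), (1, 0.375), (2, 0.125)]  # assumed
DIFF: Taps = [(0, 2.0), (1, -2.0)]  # assumed difference (wavelet) filter


def dilated_filter(x: Sequence[float], taps: Taps, step: int) -> List[float]:
    """y[n] = sum_k c_k * x[n - k*step], with indices clamped to the signal."""
    last = len(x) - 1
    return [sum(c * x[min(max(n - k * step, 0), last)] for k, c in taps)
            for n in range(len(x))]


def first_two_scales(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Detail signals W1, W2 of an undecimated dyadic wavelet transform."""
    w1 = dilated_filter(x, DIFF, 1)
    w2 = dilated_filter(dilated_filter(x, SMOOTH, 1), DIFF, 2)
    return w1, w2


def detect_gcis(x: Sequence[float], two_scales: bool = True,
                alpha: float = 0.5, min_gap: int = 20) -> List[int]:
    """Sample indices of candidate glottal closure instants."""
    w1, w2 = first_two_scales(x)
    if two_scales:
        # A sharp closure gives same-sign extrema on adjacent scales,
        # so the product W1*W2 is large and positive there.
        score = [a * b for a, b in zip(w1, w2)]
    else:
        score = [abs(a) for a in w1]  # single-scale variant
    peak = max(score)
    if peak <= 0:
        return []
    gcis: List[int] = []
    for n in range(1, len(score) - 1):
        s = score[n]
        if s < alpha * peak or not (s > score[n - 1] and s >= score[n + 1]):
            continue  # not a local maximum above the threshold
        if gcis and n - gcis[-1] < min_gap:
            if s > score[gcis[-1]]:
                gcis[-1] = n  # keep the stronger of two close candidates
            continue
        gcis.append(n)
    return gcis


def pitch_period(gcis: Sequence[int], fs: float) -> float:
    """Median interval between consecutive GCIs, in seconds."""
    if len(gcis) < 2:
        raise ValueError("need at least two GCIs")
    return median([b - a for a, b in zip(gcis, gcis[1:])]) / fs


def percent_error(estimate: float, reference: float) -> float:
    """Relative error in percent (one possible definition, assumed here)."""
    return 100.0 * abs(estimate - reference) / reference


if __name__ == "__main__":
    fs = 8000.0
    # Toy "voiced" input: one impulse every 80 samples (F0 = 100 Hz at 8 kHz).
    x = [1.0 if n % 80 == 40 else 0.0 for n in range(800)]
    for two in (True, False):
        t0 = pitch_period(detect_gcis(x, two_scales=two), fs)
        print("two scales" if two else "one scale", t0, percent_error(t0, 0.01))
```

On this toy 100 Hz impulse train, both variants place a GCI at every impulse and return a 10 ms period. Real voiced speech would need a tuned threshold and minimum spacing, plus a voicing decision, since GCIs occur only in voiced segments.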