Robust speech features and acoustic models for speech recognition

Bibliographic Details
Main Author:	Xiao, Xiong
Other Authors:	Chng Eng Siong
Format:	Theses and Dissertations
Language:	English
Published:	2010
Subjects:	DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access:	https://hdl.handle.net/10356/20733
Institution:	Nanyang Technological University
Contributor:	Li Haizhou
Affiliation:	School of Computer Engineering; Emerging Research Lab
Degree:	Doctor of Philosophy (SCE)
Citation:	Xiao, X. (2009). Robust speech features and acoustic models for speech recognition. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/20733
DOI:	10.32657/10356/20733
Extent:	194 p. (PDF)
Collection:	DR-NTU, NTU Library

Abstract

This thesis examines techniques for improving the robustness of automatic speech recognition (ASR) systems against noise distortion. The problem is important because ASR performance degrades dramatically in adverse environments, which greatly limits the deployment of speech recognition applications in realistic conditions. To this end, we examine two approaches: feature compensation and discriminative model training. The degradation in recognition performance is mainly caused by the statistical mismatch between acoustic models trained on clean speech and noisy test features. To reduce this feature-model mismatch, we propose normalizing the temporal structure of both the training and the test features. The temporal structure of a feature is represented by the power spectral density (PSD) function of its trajectory, that is, the sequence of its values over successive frames. We normalize this temporal structure by applying an equalizing filter to each feature trajectory; we call this filter the temporal structure normalization (TSN) filter. Compared with other temporal filters used in speech recognition, the TSN filter has the advantage of adapting to changing environments. The TSN filter can also be viewed as a feature normalization technique that normalizes the PSD function of the features, whereas other normalization methods, such as histogram equalization (HEQ), normalize their probability density function (p.d.f.). Experiments show that the TSN filter outperforms other state-of-the-art temporal filters on both the small-vocabulary Aurora-2 task and the large-vocabulary Aurora-4 task.
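The abstract describes the TSN filter only at a high level. The sketch below is an illustration under stated assumptions, not the thesis's implementation: we assume the reference PSD is the average periodogram over training trajectories of one feature dimension, that the filter's magnitude response is the square root of the ratio of the reference PSD to the utterance's PSD (so the filtered PSD moves toward the reference), that the taps come from a Hann-windowed inverse DFT normalized to unity DC gain, and that trajectories are already mean-normalized. All function names (`psd`, `reference_psd`, `tsn_taps`, `apply_taps`, `tsn`, `heq`) and parameters (`n_fft`, `half`) are hypothetical. For contrast, `heq` shows histogram equalization with an assumed standard-normal target, which normalizes the p.d.f. of the values rather than their PSD.

```python
# Hypothetical sketch (not the thesis's code): TSN-style PSD equalization of one
# feature trajectory, contrasted with HEQ, which equalizes the p.d.f. instead.
import cmath
import math
from statistics import NormalDist

EPS = 1e-10  # floor that keeps near-empty PSD bins from dividing by zero


def psd(x, n_fft=64):
    """Periodogram of a mean-removed trajectory, sampled at n_fft frequency bins."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spectrum = []
    for k in range(n_fft):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n_fft)
                  for t, c in enumerate(centered))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def reference_psd(trajectories, n_fft=64):
    """Assumed reference: the average PSD over training trajectories of one dimension."""
    psds = [psd(x, n_fft) for x in trajectories]
    return [sum(col) / len(psds) for col in zip(*psds)]


def tsn_taps(psd_ref, psd_utt, half=10):
    """Symmetric FIR taps with |H| ~ sqrt(psd_ref / psd_utt) and unity DC gain."""
    n_fft = len(psd_ref)
    mag = [math.sqrt(max(r, EPS) / max(u, EPS)) for r, u in zip(psd_ref, psd_utt)]
    mag[0] = 1.0  # the DC bin is ~0 after mean removal, so its ratio is meaningless
    taps = []
    for m in range(-half, half + 1):
        # Cosine-only inverse DFT of the real magnitude response gives real, even taps.
        val = sum(mag[k] * math.cos(2 * math.pi * k * m / n_fft)
                  for k in range(n_fft)) / n_fft
        hann = 0.5 * (1 + math.cos(math.pi * m / (half + 1)))  # taper the truncation
        taps.append(val * hann)
    dc_gain = sum(taps)
    return [c / dc_gain for c in taps]


def apply_taps(x, taps):
    """Filter x with symmetric taps; same-length output, zero-padded at the edges."""
    half = len(taps) // 2
    n = len(x)
    return [sum(c * x[t + j - half] for j, c in enumerate(taps) if 0 <= t + j - half < n)
            for t in range(n)]


def tsn(x, psd_ref, half=10):
    """Equalize one utterance's trajectory so its PSD moves toward psd_ref."""
    return apply_taps(x, tsn_taps(psd_ref, psd(x, len(psd_ref)), half))


def heq(x):
    """Histogram equalization: map each value's empirical CDF rank to a standard normal."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return out
```

In use, `tsn(x, reference_psd(train_trajs))` (hypothetical variable names) would filter a test trajectory `x` of one feature dimension toward the training PSD; when the utterance's PSD already equals the reference, the taps collapse to a unit impulse and `x` comes back unchanged. A raw periodogram is a noisy PSD estimate, so a practical system would likely smooth it. HEQ, by contrast, is a memoryless monotonic mapping: it reshapes the distribution of values but ignores how they evolve over time, which is the distinction the abstract draws between p.d.f. and PSD normalization.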
