Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA

Achieving good accuracy has long been one of the challenges in automatic speech recognition (ASR) design. In this study, spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement...

Bibliographic Details
Main Author:	Orillo, John William F.
Format:	text
Language:	English
Published:	Animo Repository 2012
Online Access:	https://animorepository.dlsu.edu.ph/etd_masteral/4302
Institution:	De La Salle University

id	oai:animorepository.dlsu.edu.ph:etd_masteral-11140
record_format	eprints
spelling	oai:animorepository.dlsu.edu.ph:etd_masteral-11140 2024-08-09T06:51:08Z Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA Orillo, John William F. 2012-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4302 Master's Theses English Animo Repository
institution	De La Salle University
building	De La Salle University Library
continent	Asia
country	Philippines
content_provider	De La Salle University Library
collection	DLSU Institutional Repository
language	English
description	Achieving good accuracy has long been one of the challenges in automatic speech recognition (ASR) design. In this study, spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement was first modelled in MATLAB. The system starts with framing, windowing, and an FFT of the input speech signal. Noise is then estimated by averaging the first 8 output frames of the FFT. The estimated noise magnitude spectrum is subtracted from that of the original speech signal. To further enhance the speech, noise flooring is included in the design: the noise estimate is multiplied by a factor beta (b) and substituted for the original speech during silence periods. The hardware model was written in VHDL and closely followed the MATLAB design. Once the VHDL design was realized, it was implemented on the FPGA. Both the MATLAB and FPGA models are evaluated in terms of the correlation between the original clean speech and the enhanced speech, and in terms of recognition accuracy. After several tests, it was concluded that the optimum beta (b) for the spectral subtraction is 0.01. In MATLAB, the average correlation obtained for SNRs from -3.4 to 34.8 dB is 83.7%, while 80.28% was recorded for the FPGA. The average recognition rates for MATLAB and the FPGA are 45.58% and 48.5%, respectively. The system tolerates background noise of 0 to 68.6 dB while maintaining a recognition accuracy of 75% or above.
format	text
author	Orillo, John William F.
title	Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA
publisher	Animo Repository
publishDate	2012
url	https://animorepository.dlsu.edu.ph/etd_masteral/4302
_version_	1808616431942631424
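
The enhancement chain outlined in the description (framing, windowing, FFT, a noise estimate averaged over the first 8 frames, magnitude subtraction, and noise flooring with beta = 0.01) can be sketched roughly as follows. This is a minimal Python illustration, not the thesis's MATLAB or VHDL code: the frame length, hop size, Hann window, naive DFT, and every function name are assumptions, and the flooring rule used here (never let a bin drop below beta times the noise estimate) is only one common reading of the abstract.

```python
import cmath
import math


def split_frames(signal, frame_len=64, hop=32):
    """Cut the signal into overlapping frames (sizes are assumed values)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Hann window of length n (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft_magnitudes(frame):
    """Magnitude spectrum via a naive DFT (stands in for the FFT stage)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


def spectral_subtract(signal, beta=0.01, noise_frames=8,
                      frame_len=64, hop=32):
    """Return (enhanced magnitude spectra per frame, noise estimate)."""
    window = hann(frame_len)
    mags = [dft_magnitudes([s * w for s, w in zip(frame, window)])
            for frame in split_frames(signal, frame_len, hop)]
    if len(mags) < noise_frames:
        raise ValueError("signal too short for the noise estimate")

    # Noise estimate: per-bin average of the first `noise_frames` frames,
    # which are assumed to contain background noise only.
    lead = mags[:noise_frames]
    noise = [sum(column) / noise_frames for column in zip(*lead)]

    # Subtract the noise magnitude; where the result would fall below the
    # floor, substitute beta * noise instead (noise flooring).
    enhanced = [[max(m - n, beta * n) for m, n in zip(frame_mag, noise)]
                for frame_mag in mags]
    return enhanced, noise
```

With these assumed sizes, a 640-sample input yields 19 frames of 64 bins each. The sketch returns magnitudes only; how the thesis handles phase or resynthesis is not stated in the record, so that step is left out.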

Similar Items