Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA

Achieving good accuracy has been one of the challenges in automatic speech recognition (ASR) design. In this study, spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement...

Bibliographic Details
Main Author: Orillo, John William F.
Format: text
Language: English
Published: Animo Repository 2012
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4302
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_masteral-11140
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_masteral-11140 2024-08-09T06:51:08Z 2012-01-01T08:00:00Z Master's Theses
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
description Achieving good accuracy has been one of the challenges in automatic speech recognition (ASR) design. In this study, spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement was first modelled in MATLAB. Processing starts with framing, windowing, and an FFT of the input speech signal. Noise is then estimated by averaging the first 8 output frames of the FFT. The estimated noise magnitude spectrum is subtracted from that of the original speech signal. To enhance the speech further, noise flooring is included in the design: the noise estimate is multiplied by a factor beta (b), and the result is substituted for the original speech during silent periods. The hardware model was written in VHDL, closely following the MATLAB design, and once realized it was implemented on the FPGA. Both the MATLAB and FPGA models were evaluated in terms of the correlation between the original clean speech and the enhanced speech, and in terms of recognition accuracy. After several tests, it was concluded that the optimum beta (b) for the spectral subtraction is 0.01. For SNRs from -3.4 to 34.8 dB, the average correlation obtained was 83.7% in MATLAB and 80.28% on the FPGA. The average recognition rates for MATLAB and the FPGA were 45.58% and 48.5%, respectively. The system tolerates background noise of 0 to 68.6 dB while maintaining a recognition accuracy of 75% and above.
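The processing chain described in the abstract (framing, windowing, FFT, noise estimate from the first 8 frames, magnitude subtraction, noise flooring with beta) can be sketched roughly as follows. This is only an illustrative Python sketch, not the thesis's MATLAB or VHDL design. The frame length, the hop size, the Hann window, and all function names are assumptions, since the abstract does not specify them. The flooring rule used here (replace any bin that falls below beta times the noise estimate) is one common reading of the abstract's description and may differ from the actual implementation.

```python
import cmath
import math

FRAME_LEN = 64     # assumed frame length; not given in the abstract
HOP = 32           # assumed 50% overlap; not given in the abstract
NOISE_FRAMES = 8   # from the abstract: average of the first 8 FFT frames
BETA = 0.01        # from the abstract: reported optimum floor factor


def hann(n):
    """Hann window of length n (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(frame):
    """Naive DFT, used here only to keep the sketch dependency-free."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_spectra(signal):
    """Split the signal into overlapping windowed frames and transform each."""
    window = hann(FRAME_LEN)
    spectra = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = [signal[start + i] * window[i] for i in range(FRAME_LEN)]
        spectra.append(dft(frame))
    return spectra


def estimate_noise(spectra):
    """Average the magnitude of the first NOISE_FRAMES frames, bin by bin."""
    n = min(NOISE_FRAMES, len(spectra))
    return [sum(abs(spectra[f][k]) for f in range(n)) / n
            for k in range(FRAME_LEN)]


def spectral_subtract(spectra, noise_mag, beta=BETA):
    """Subtract the noise magnitude per bin, flooring at beta * noise."""
    enhanced = []
    for spec in spectra:
        out = []
        for k, x in enumerate(spec):
            mag = abs(x) - noise_mag[k]
            floor = beta * noise_mag[k]
            if mag < floor:
                mag = floor  # assumed flooring rule, see the note above
            # Keep the noisy phase; only the magnitude is modified.
            out.append(cmath.rect(mag, cmath.phase(x)))
        enhanced.append(out)
    return enhanced
```

A typical call sequence under these assumptions would be `spectra = frame_spectra(samples)`, then `noise = estimate_noise(spectra)`, then `spectral_subtract(spectra, noise)`. The sketch assumes the recording begins with noise only, so that the first 8 frames are a usable noise estimate.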
format text
author Orillo, John William F.
title Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA
publisher Animo Repository
publishDate 2012
url https://animorepository.dlsu.edu.ph/etd_masteral/4302
_version_ 1808616431942631424