Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA
Achieving good accuracy has been one of the challenges in automatic speech recognition (ASR) design. In this study, a spectral subtraction speech enhancement stage is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement...
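
The full description in the record below outlines the enhancement step: the noise magnitude spectrum is estimated by averaging the first 8 FFT output frames, that estimate is subtracted from the speech spectrum, and a noise floor of beta times the noise estimate (beta = 0.01) is substituted during silence. A minimal Python sketch of that step follows. It assumes the per-frame magnitude spectra are already available as lists of floats. The function names `estimate_noise` and `spectral_subtract`, the data layout, and the rule that treats a bin as "silence" when the subtracted value falls below the floor are illustrative assumptions, not the thesis's MATLAB or VHDL code.

```python
def estimate_noise(mag_frames, n_noise_frames=8):
    """Average the first n_noise_frames magnitude spectra, bin by bin.

    The count of 8 comes from the record's description; treating those
    leading frames as speech-free is an assumption of this sketch.
    """
    lead = mag_frames[:n_noise_frames]
    if not lead:
        raise ValueError("need at least one frame to estimate the noise")
    n_bins = len(lead[0])
    return [sum(frame[k] for frame in lead) / len(lead) for k in range(n_bins)]


def spectral_subtract(mag_frames, n_noise_frames=8, beta=0.01):
    """Subtract the noise estimate from every frame, then apply a noise floor.

    Assumed flooring rule: when the subtracted magnitude drops below
    beta * noise, the bin is replaced by beta * noise.
    """
    noise = estimate_noise(mag_frames, n_noise_frames)
    enhanced = []
    for frame in mag_frames:
        out = []
        for mag, noise_mag in zip(frame, noise):
            diff = mag - noise_mag
            floor = beta * noise_mag
            out.append(diff if diff > floor else floor)
        enhanced.append(out)
    return enhanced


# Toy input: 8 noise-only frames followed by one frame with energy in bin 2.
toy_frames = [[1.0, 1.0, 1.0, 1.0] for _ in range(8)] + [[1.0, 1.0, 5.0, 1.0]]
toy_enhanced = spectral_subtract(toy_frames)
print(toy_enhanced[-1])
```

In the toy input, the bins that hold only noise collapse to the floor value while the bin carrying signal keeps its magnitude minus the noise estimate. A full front end would still need framing, windowing, an FFT, and resynthesis with the noisy phase, none of which is shown here.
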
Main Author: | Orillo, John William F. |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository 2012 |
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/4302 |
Institution: | De La Salle University |
id | oai:animorepository.dlsu.edu.ph:etd_masteral-11140 |
---|---|
record_format | eprints |
spelling | oai:animorepository.dlsu.edu.ph:etd_masteral-11140 2024-08-09T06:51:08Z Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA Orillo, John William F. 2012-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_masteral/4302 Master's Theses English Animo Repository |
institution | De La Salle University |
building | De La Salle University Library |
continent | Asia |
country | Philippines |
content_provider | De La Salle University Library |
collection | DLSU Institutional Repository |
language | English |
description | Achieving good accuracy has been one of the challenges in automatic speech recognition (ASR) design. In this study, a spectral subtraction speech enhancement stage is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement was first modelled in MATLAB. The system starts with framing, windowing, and an FFT of the input speech signal. Noise is then estimated by averaging the first 8 output frames of the FFT. The estimated noise magnitude spectrum is subtracted from that of the original speech signal. To further enhance the speech, noise flooring is included in the design: the noise estimate is multiplied by a factor beta (b), and the result is substituted for the original speech during silence periods. The hardware model was written in VHDL and closely followed the MATLAB design. Once the VHDL design was complete, it was implemented on the FPGA. Both the MATLAB and FPGA models are evaluated in terms of the correlation between the original clean speech and the enhanced speech, and in terms of recognition accuracy. After several tests, it was concluded that the optimum beta (b) for the spectral subtraction is 0.01. In MATLAB, the average correlation obtained for SNRs from -3.4 to 34.8 dB is 83.7%, while 80.28% was recorded for the FPGA. The average recognition rates for the MATLAB and FPGA models are 45.58% and 48.5%, respectively. The system tolerates background noise within 0 to 68.6 dB while keeping recognition accuracy at 75% and above. |
format | text |
author | Orillo, John William F. |
title | Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA |
publisher | Animo Repository |
publishDate | 2012 |
url | https://animorepository.dlsu.edu.ph/etd_masteral/4302 |
_version_ | 1808616431942631424 |
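
The description evaluates both models by the correlation between the original clean speech and the enhanced speech, reported as a percentage. The record does not say which correlation measure was used, so the sketch below assumes the Pearson correlation coefficient over time-aligned samples, scaled to percent. The function name `correlation_percent` and the equal-length requirement are assumptions made for illustration.

```python
import math


def correlation_percent(clean, enhanced):
    """Pearson correlation between two equal-length signals, in percent.

    Assumption: the signals are already time-aligned sample by sample.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    if len(clean) < 2:
        raise ValueError("need at least two samples")

    mean_clean = sum(clean) / len(clean)
    mean_enh = sum(enhanced) / len(enhanced)

    # Accumulate the covariance and both variances (unnormalised).
    cov = 0.0
    var_clean = 0.0
    var_enh = 0.0
    for c, e in zip(clean, enhanced):
        dc = c - mean_clean
        de = e - mean_enh
        cov += dc * de
        var_clean += dc * dc
        var_enh += de * de

    denom = math.sqrt(var_clean * var_enh)
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return 100.0 * cov / denom


# A scaled copy of a signal is perfectly correlated with the original.
print(correlation_percent([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Because the Pearson coefficient ignores gain and offset, a figure computed this way measures how well the enhanced waveform tracks the shape of the clean one, not how close their levels are.
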