Multichannel equalization applied to speech dereverberation
Speech signals acquired by a distant microphone inside an enclosed space are often degraded by reverberation. Reverberation results from the multipath propagation of a sound wave from its source to receivers. Reverberation can cause a detrimental effect on the perceived quality as well as the intelli...
Main Author: | Rajan Sobhana Rashobh |
---|---|
Other Authors: | Andy Khong Wai Hoong |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2015 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/62174 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-62174 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-62174 2023-07-04T17:15:36Z Multichannel equalization applied to speech dereverberation Rajan Sobhana Rashobh Andy Khong Wai Hoong School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Speech signals acquired by a distant microphone inside an enclosed space are often degraded by reverberation. Reverberation results from the multipath propagation of a sound wave from its source to receivers. Reverberation can cause a detrimental effect on the perceived quality as well as the intelligibility of the speech signals. This results in performance degradation of systems such as hands-free telephony, hearing aids, and automatic speech/speaker recognition systems. One of the popular approaches to mitigate the effects of reverberation is to achieve channel equalization via a two-stage process in which acoustic impulse responses (AIRs) are first estimated using blind channel identification (BCI) techniques, after which the received signals are filtered using inverse filters computed from the estimated AIRs. This thesis focuses on speech dereverberation employing BCI and inverse filtering. A typical AIR is often non-minimum phase and its direct inversion will result in an unstable inverse filter. Multichannel equalization (MCEQ) algorithms developed for use with a microphone array are employed for the equalization of such non-minimum phase AIRs. Existing MCEQ algorithms achieve equalization in the time domain; this thesis first proposes a generalized framework that allows one to achieve equalization in different transform domains. This is motivated by the fact that, when equalization is performed in different domains, the inherent properties of the transforms can be exploited to achieve better equalization performance. 
Noting that the computational complexity of the non-adaptive MCEQ algorithm is proportional to the AIR order, a set of adaptive time-domain MCEQ algorithms is proposed to achieve equalization of high-order AIRs with reduced complexity. These algorithms iteratively estimate the inverse filters by minimizing a cost function. To improve the convergence as well as equalization performance, the sparsity of the desired equalized response is taken into account in the cost function and update equation. Although the time-domain adaptive algorithms reduce complexity, they suffer from slow convergence. To overcome this limitation, complexity reduction in the frequency domain is exploited. The proposed algorithm, which achieves equalization in each frequency bin, is derived from the proposed generalized framework for MCEQ. It is shown that the proposed algorithm significantly reduces the complexity involved in MCEQ and exhibits higher robustness to channel estimation errors. To further reduce the processing time of the proposed frequency-domain MCEQ algorithm, adaptive filtering techniques are introduced. To achieve convergence in a single step, an optimal step size is derived for the proposed adaptive algorithm. Finally, a frequency-domain adaptive BCI algorithm is proposed for the estimation of unknown channels. The proposed algorithm exploits the spatial diversity of a multichannel system and estimates the AIRs based on the cross-relation among the channels. To gain more insight into its performance, the misconvergence problem is analyzed and, based on this analysis, a penalty term derived from a sparseness constraint is added to the cost function for noise robustness. DOCTOR OF PHILOSOPHY (EEE) 2015-02-25T01:35:50Z 2015-02-25T01:35:50Z 2015 2015 Thesis Rajan Sobhana Rashobh. (2015). Multichannel equalization applied to speech dereverberation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/62174 10.32657/10356/62174 en 211 p. 
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Rajan Sobhana Rashobh Multichannel equalization applied to speech dereverberation |
description |
Speech signals acquired by a distant microphone inside an enclosed space are often degraded by reverberation. Reverberation results from the multipath propagation of a sound wave from its source to receivers. Reverberation can cause a detrimental effect on the perceived quality as well as the intelligibility of the speech signals. This results in performance degradation of systems such as hands-free telephony, hearing aids, and automatic speech/speaker recognition systems. One of the popular approaches to mitigate the effects of reverberation is to achieve channel equalization via a two-stage process in which acoustic impulse responses (AIRs) are first estimated using blind channel identification (BCI) techniques, after which the received signals are filtered using inverse filters computed from the estimated AIRs. This thesis focuses on speech dereverberation employing BCI and inverse filtering. A typical AIR is often non-minimum phase and its direct inversion will result in an unstable inverse filter. Multichannel equalization (MCEQ) algorithms developed for use with a microphone array are employed for the equalization of such non-minimum phase AIRs. Existing MCEQ algorithms achieve equalization in the time domain; this thesis first proposes a generalized framework that allows one to achieve equalization in different transform domains. This is motivated by the fact that, when equalization is performed in different domains, the inherent properties of the transforms can be exploited to achieve better equalization performance.
Noting that the computational complexity of the non-adaptive MCEQ algorithm is proportional to the AIR order, a set of adaptive time-domain MCEQ algorithms is proposed to achieve equalization of high-order AIRs with reduced complexity. These algorithms iteratively estimate the inverse filters by minimizing a cost function. To improve the convergence as well as equalization performance, the sparsity of the desired equalized response is taken into account in the cost function and update equation. Although the time-domain adaptive algorithms reduce complexity, they suffer from slow convergence. To overcome this limitation, complexity reduction in the frequency domain is exploited. The proposed algorithm, which achieves equalization in each frequency bin, is derived from the proposed generalized framework for MCEQ. It is shown that the proposed algorithm significantly reduces the complexity involved in MCEQ and exhibits higher robustness to channel estimation errors. To further reduce the processing time of the proposed frequency-domain MCEQ algorithm, adaptive filtering techniques are introduced. To achieve convergence in a single step, an optimal step size is derived for the proposed adaptive algorithm.
Finally, a frequency-domain adaptive BCI algorithm is proposed for the estimation of unknown channels. The proposed algorithm exploits the spatial diversity of a multichannel system and estimates the AIRs based on the cross-relation among the channels. To gain more insight into its performance, the misconvergence problem is analyzed and, based on this analysis, a penalty term derived from a sparseness constraint is added to the cost function for noise robustness. |
author2 |
Andy Khong Wai Hoong |
author_facet |
Andy Khong Wai Hoong Rajan Sobhana Rashobh |
format |
Theses and Dissertations |
author |
Rajan Sobhana Rashobh |
author_sort |
Rajan Sobhana Rashobh |
title |
Multichannel equalization applied to speech dereverberation |
title_short |
Multichannel equalization applied to speech dereverberation |
title_full |
Multichannel equalization applied to speech dereverberation |
title_fullStr |
Multichannel equalization applied to speech dereverberation |
title_full_unstemmed |
Multichannel equalization applied to speech dereverberation |
title_sort |
multichannel equalization applied to speech dereverberation |
publishDate |
2015 |
url |
https://hdl.handle.net/10356/62174 |
_version_ |
1772827258889699328 |
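
Illustrative note (not part of the catalogued record or the thesis): the abstract states that a non-minimum phase AIR has no stable direct inverse, and that multichannel equalization with a microphone array is used instead. The sketch below shows that idea with the classical time-domain multichannel least-squares inverse, which is a generic textbook formulation and is not claimed to be any of the algorithms proposed in the thesis. The two three-tap channels, the function names `convolution_matrix` and `multichannel_inverse`, and the choice of a zero-delay target are all assumptions made for this example; it also assumes the channels are known exactly and share no common zeros.

```python
import numpy as np


def convolution_matrix(h, n_cols):
    """Build H such that H @ g equals np.convolve(h, g) for len(g) == n_cols."""
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + n_cols - 1, n_cols))
    for j in range(n_cols):
        H[j:j + len(h), j] = h
    return H


def multichannel_inverse(h1, h2, delay=0):
    """Least-squares inverse filters g1, g2 with h1*g1 + h2*g2 ~= delayed impulse.

    Assumes two equal-length channels with no common zeros (hypothetical setup).
    With inverse filters of length L - 1 the stacked system is square.
    """
    L = len(h1)
    Li = L - 1
    H = np.hstack([convolution_matrix(h1, Li), convolution_matrix(h2, Li)])
    d = np.zeros(L + Li - 1)
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g[:Li], g[Li:]


# Two made-up channels; each has a zero on or outside the unit circle,
# so neither has a stable single-channel inverse.
h1 = np.array([1.0, -2.0, 0.5])
h2 = np.array([1.0, 0.5, -1.5])

g1, g2 = multichannel_inverse(h1, h2)

# Overall response of the two-channel system after inverse filtering.
equalized = np.convolve(h1, g1) + np.convolve(h2, g2)
target = np.zeros(len(equalized))
target[0] = 1.0

print("largest zero magnitude of h1:", np.max(np.abs(np.roots(h1))))
print("equalized response:", np.round(equalized, 6))
```

In this toy setting the summed response should match a unit impulse to numerical precision, even though each channel on its own is non-minimum phase. With estimated rather than exact channels the solution is only approximate, which is the robustness issue the abstract refers to.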