Blind source separation of speech mixtures
Main Author: | Liu, Benxu |
---|---|
Other Authors: | Andy Khong Wai Hoong |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2015 |
Online Access: | http://hdl.handle.net/10356/63274 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-63274 |
---|---|
record_format | dspace |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
topic | DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing |
Abstract:

This thesis addresses three well-known problems in blind source separation (BSS) of speech mixtures: blind source recovery (BSR) in instantaneous underdetermined BSS, mixing matrix estimation in convolutive underdetermined BSS, and residual crosstalk suppression in convolutive determined BSS. For BSR in instantaneous underdetermined BSS (UBSS), the temporal structure of the source signals, described by an autoregressive (AR) model, is exploited to improve separation performance. Existing UBSS algorithms ignore this temporal structure and introduce artifacts into the separated signals. To address this problem, the first proposed algorithm estimates the AR coefficients of the sources so that their temporal structure is preserved. The AR coefficients are estimated by applying linear prediction to partially separated sources obtained from conventional sparseness-based algorithms. The estimated coefficients are then used to reduce the original mixing problem to one of mixing the AR model inputs of the source signals, which is solved with the conventional minimum $L_1$-norm algorithm. Simulations show that the proposed method achieves higher separation performance than the minimum $L_1$-norm algorithm. In the second algorithm, the AR model (again obtained by applying linear prediction to partially separated sources from a conventional sparseness-based algorithm) is combined with the original mixing equation to form a state-space model, which is then solved with the Kalman filter to refine the source estimates. Simulation results show that the proposed sparseness-based AR-Kalman (SPARK) algorithm is more effective than conventional sparseness-based algorithms. Both algorithms serve as post-processing stages for existing non-linear sparseness-based algorithms.
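The state-space idea behind SPARK can be illustrated with a short, dependency-free Python sketch. According to the abstract, the third algorithm reuses the same idea. This is not the thesis's implementation. It assumes AR(1) sources with known excitation variances, a known mixing matrix, near-noiseless instantaneous mixtures, and "partially separated" sources that are simulated as the true sources plus noise. The helper names (`ar1_coefficient`, `kalman_separate`, `matmul`, etc.) are hypothetical. Order-1 linear prediction estimates each AR coefficient, and a Kalman filter then tracks N = 3 sources from only M = 2 mixtures.

```python
import random


# Tiny dense-matrix helpers on lists of lists, so the sketch has no third-party dependencies.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def madd(X, Y, sign=1.0):
    """Element-wise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (only used for the small M x M matrix)."""
    n = len(X)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for c in range(n):
        p = max(range(c, n), key=lambda rr: abs(aug[rr][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for rr in range(n):
            if rr != c:
                f = aug[rr][c]
                aug[rr] = [v - f * w for v, w in zip(aug[rr], aug[c])]
    return [row[n:] for row in aug]


def ar1_coefficient(s_hat):
    """Order-1 linear prediction (autocorrelation method): a = r(1) / r(0)."""
    r0 = sum(v * v for v in s_hat)
    r1 = sum(s_hat[t] * s_hat[t - 1] for t in range(1, len(s_hat)))
    return r1 / r0


def kalman_separate(X, A, a, q, r=1e-4):
    """Kalman filter for x(t) = A s(t) + v(t) with s_i(t) = a_i s_i(t-1) + e_i(t).

    Assumptions (hypothetical): AR(1) sources, known excitation variances q,
    known M x N mixing matrix A, sensor-noise variance r.
    """
    N, M = len(a), len(A)
    F = [[a[i] if i == j else 0.0 for j in range(N)] for i in range(N)]
    Q = [[q[i] if i == j else 0.0 for j in range(N)] for i in range(N)]
    R = [[r if i == j else 0.0 for j in range(M)] for i in range(M)]
    At = transpose(A)
    s = [[0.0] for _ in range(N)]                                       # state mean
    P = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]  # state covariance
    estimates = []
    for x in X:
        s = matmul(F, s)                                   # predict with the AR(1) model
        P = madd(matmul(matmul(F, P), transpose(F)), Q)
        S = madd(matmul(matmul(A, P), At), R)              # innovation covariance, M x M
        K = matmul(matmul(P, At), inverse(S))              # Kalman gain, N x M
        innovation = madd([[v] for v in x], matmul(A, s), sign=-1.0)
        s = madd(s, matmul(K, innovation))                 # update with the mixture sample
        P = madd(P, matmul(matmul(K, A), P), sign=-1.0)
        estimates.append([row[0] for row in s])
    return estimates


# Toy demo: three AR(1) sources, two instantaneous mixtures (underdetermined).
random.seed(0)
N, T = 3, 1000
a_true = [0.95, -0.9, 0.7]
S = [[0.0] * N]
for t in range(1, T):
    S.append([a_true[i] * S[-1][i] + random.gauss(0.0, 1.0) for i in range(N)])
A = [[1.0, 0.6, 0.3], [0.2, 0.7, 1.0]]
X = [[sum(A[m][i] * s_t[i] for i in range(N)) for m in range(2)] for s_t in S]
# Stand-in for a sparseness-based first pass: true sources plus estimation noise.
S_partial = [[v + random.gauss(0.0, 0.5) for v in s_t] for s_t in S]
a_hat = [ar1_coefficient([s_t[i] for s_t in S_partial]) for i in range(N)]
S_est = kalman_separate(X, A, a_hat, q=[1.0] * N)
```

The innovation covariance is only M x M, so the update stays well posed even though the system is underdetermined. The AR prior supplies the information that the two mixtures alone cannot.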
The third proposed algorithm exploits the temporal structure of the sources to obtain a linear BSR solution for UBSS that does not require the source signals to be sparse. Assuming that the source signals are uncorrelated and can be modeled as AR processes, the algorithm estimates the source AR coefficients from the mixtures, given the mixing matrix. Combining the source AR model with the original mixing equation yields a state-space model that converts the UBSS problem into a determined one. The Kalman filter is then applied to obtain a linear source estimate in the minimum mean-squared error (MMSE) sense. Simulation results using both synthetic AR signals and speech utterances show that the proposed algorithm achieves better separation performance than conventional sparseness-based UBSS algorithms.

For mixing matrix estimation in convolutive underdetermined BSS (CUBSS), conventional algorithms assume that the source signals are W-disjoint orthogonal in the time-frequency (TF) domain. This assumption requires every TF point of the received mixtures to be a single-source point (SSP), which does not always hold. A preprocessing technique is therefore proposed to estimate the single-source confidence (SSC) of each TF point. Only TF points with high SSC are passed to existing algorithms, which yields a more accurate mixing matrix estimate at lower computational cost. Simulation and experimental results show that the proposed preprocessing improves the performance of existing CUBSS algorithms.
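The abstract does not define how the SSC is computed. The sketch below uses a hypothetical stand-in rather than the thesis's measure. It scores each TF point of a two-channel STFT (assumed already computed, one frequency bin at a time) by the eigenvalue ratio lambda_max(R) / trace(R) of a local 2 x 2 spatial covariance R. This ratio reaches 1 when a single source dominates the neighbourhood. Points above a threshold are kept, and the direction of the active source's mixing column is read off the channel ratio X2/X1 at those points. The function name `ssc_2ch`, the window half-width `w` and the 0.995 threshold are all illustrative assumptions.

```python
import cmath
import math


def ssc_2ch(frames, t, w=2):
    """Hypothetical single-source confidence of frame t at one frequency bin.

    frames: list of (X1, X2) complex STFT values of a 2-channel mixture at a single
    frequency bin, one tuple per frame. R is accumulated over frames t-w..t+w and the
    score lambda_max(R) / trace(R) equals 1 when R has rank one (one active source).
    """
    lo, hi = max(0, t - w), min(len(frames), t + w + 1)
    r11 = r22 = 0.0
    r12 = 0j
    for x1, x2 in frames[lo:hi]:
        r11 += abs(x1) ** 2
        r22 += abs(x2) ** 2
        r12 += x1 * x2.conjugate()
    tr = r11 + r22
    if tr == 0.0:
        return 0.0
    det = r11 * r22 - abs(r12) ** 2
    # Largest eigenvalue of the 2 x 2 Hermitian matrix R, in closed form.
    lam_max = 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))
    return lam_max / tr


# Toy bin: source 1 alone in frames 0-19, both sources active in frames 20-39.
h1 = (1.0, 0.8 * cmath.exp(-0.5j))        # mixing column of source 1 at this bin
h2 = (0.6 * cmath.exp(1.0j), 1.0)         # mixing column of source 2 at this bin
frames = []
for t in range(40):
    s1 = cmath.exp(0.7j * t)
    s2 = cmath.exp(2.3j * t) if t >= 20 else 0.0
    frames.append((h1[0] * s1 + h2[0] * s2, h1[1] * s1 + h2[1] * s2))

scores = [ssc_2ch(frames, t) for t in range(len(frames))]
selected = [t for t, c in enumerate(scores) if c > 0.995]   # keep high-confidence points
# At single-source points X2/X1 equals h1[1]/h1[0], so averaging it recovers that direction.
ratio = sum(frames[t][1] / frames[t][0] for t in selected) / len(selected)
```

Discarding low-confidence points removes the TF points where two sources overlap. In this toy, those overlapping points would otherwise bias the direction estimate.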
Finally, existing algorithms for suppressing residual crosstalk in the outputs of BSS algorithms employ the Wiener filter. In the context of BSS, the Wiener filter is shown to be optimal in the maximum likelihood (ML) sense only for normally distributed signals. The distribution of speech signals is therefore modeled with a Gaussian mixture model (GMM), and an ML post-filter is derived using the expectation-maximization (EM) algorithm. The GMM is shown to introduce a probabilistic sample weight that emphasizes speech segments free of crosstalk components in the BSS output, which yields a better estimate of the post-filter. Simulation results show that the proposed post-filter achieves better crosstalk suppression than the Wiener filter for BSS.
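The abstract does not give the post-filter's form, so the sketch below is not the thesis's post-filter. It is a hypothetical single-coefficient toy. Real-valued samples y(t) = s(t) + g r(t), a known crosstalk reference r(t) and an unknown coupling g are all assumptions. The toy only shows the mechanism described above. Under a Gaussian model for s(t), the ML estimate of g is ordinary least squares, the Wiener-type solution. Under a GMM, EM turns it into weighted least squares with per-sample weights w(t) = sum_k gamma_k(t) / var_k computed from the component responsibilities. The function names and the two-component initialisation are illustrative.

```python
import math
import random


def ls_coupling(y, r):
    """Least-squares (Wiener-type) estimate of g in y = s + g*r; ML if s is Gaussian."""
    return sum(a * b for a, b in zip(y, r)) / sum(b * b for b in r)


def gmm_em_coupling(y, r, var=(0.01, 1.0), pi=(0.5, 0.5), iters=30):
    """ML estimate of g when s is modelled by a zero-mean, 2-component GMM (via EM).

    The E-step gives a weight w(t) = sum_k gamma_k(t) / var_k per sample;
    the M-step for g is then a weighted least-squares fit. Illustrative only.
    """
    var, pi = list(var), list(pi)
    g = ls_coupling(y, r)  # start from the Gaussian (unweighted) solution
    for _ in range(iters):
        resp, w = [], []
        for yt, rt in zip(y, r):
            e = yt - g * rt
            lik = [p / math.sqrt(2.0 * math.pi * v) * math.exp(-0.5 * e * e / v)
                   for p, v in zip(pi, var)]
            total = sum(lik) or 1e-300
            gam = [lk / total for lk in lik]
            resp.append(gam)
            w.append(sum(gk / v for gk, v in zip(gam, var)))
        # M-step: weighted LS for g, then update the GMM variances and priors.
        g = (sum(wt * yt * rt for wt, yt, rt in zip(w, y, r))
             / sum(wt * rt * rt for wt, rt in zip(w, r)))
        for k in range(len(var)):
            nk = max(sum(gam[k] for gam in resp), 1e-12)
            var[k] = max(sum(gam[k] * (yt - g * rt) ** 2
                             for gam, yt, rt in zip(resp, y, r)) / nk, 1e-8)
            pi[k] = nk / len(y)
    return g


# Toy data: sparse "speech" s (low-level 70% of the time) plus crosstalk 0.5 * r.
random.seed(1)
n = 2000
s = [random.gauss(0.0, 1.0) if random.random() < 0.3 else random.gauss(0.0, 0.05)
     for _ in range(n)]
r = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [st + 0.5 * rt for st, rt in zip(s, r)]
g_ls = ls_coupling(y, r)
g_gmm = gmm_em_coupling(y, r)
```

In this toy, the samples that the GMM attributes to its low-variance component receive the largest weights. On data this sparse, the weighted estimate is typically much closer to the true coupling than the unweighted one.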
School: | School of Electrical and Electronic Engineering |
---|---|
Degree: | Doctor of Philosophy (EEE) |
Contributors: | Liu, Benxu; Andy Khong Wai Hoong |
Deposited: | 2015-05-12T03:35:51Z |
Record timestamp: | 2023-07-04T16:18:56Z |
Citation: | Liu, B. (2015). Blind source separation of speech mixtures. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/63274 |
Extent: | 170 p. (application/pdf) |