Blind source separation of speech mixtures

Bibliographic Details
Main Author: Liu, Benxu
Other Authors: Andy Khong Wai Hoong
Format: Theses and Dissertations
Language: English
Published: 2015
Subjects: DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Online Access: http://hdl.handle.net/10356/63274
Institution: Nanyang Technological University
id sg-ntu-dr.10356-63274
record_format dspace
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
topic DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
description This thesis addresses three well-known problems in blind source separation (BSS) of speech mixtures: blind source recovery (BSR) in instantaneous underdetermined BSS, mixing matrix estimation in convolutive underdetermined BSS, and residual crosstalk suppression in convolutive determined BSS.
For BSR in instantaneous underdetermined BSS (UBSS), the temporal structure of the source signals, described using the autoregressive (AR) model, is exploited to improve separation performance. Existing UBSS algorithms ignore this temporal structure and introduce artifacts into the separated results. To address this problem, the first proposed algorithm estimates the source AR coefficients so as to preserve the temporal structure. The AR coefficients are estimated by applying linear prediction to the partially separated sources obtained from conventional sparseness-based algorithms. The estimated AR coefficients are then used to reduce the original mixing problem to a mixing problem of the AR model inputs of the source signals, which is solved using the conventional minimum $L_1$-norm algorithm. Simulations show that the proposed method achieves higher separation performance than the minimum $L_1$-norm algorithm.
In the second algorithm, the AR model (again obtained by applying linear prediction to the partially separated sources from a conventional sparseness-based algorithm) is combined with the original mixing equation to form a state-space model, which is then solved using the Kalman filter to obtain a refined source estimate. Simulation results demonstrate the effectiveness of the proposed sparseness-based AR-Kalman (SPARK) algorithm relative to conventional sparseness-based algorithms. Both algorithms serve as postprocessing techniques for existing nonlinear sparseness-based algorithms.
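The state-space idea behind the second and third algorithms can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: the helper names `estimate_ar` and `ar_kalman_separate` are hypothetical, AR coefficients are fitted by least-squares linear prediction, each source's state holds its last p samples, the mixtures are modeled as x[n] = A s[n] plus a small white measurement noise whose variance `meas_var` is an assumed regularizer, and only the forward Kalman filter (no smoother) is run.

```python
import numpy as np


def estimate_ar(s, p):
    """Least-squares linear prediction: fit s[n] ~ sum_k a[k] s[n-k] + e[n].

    Returns the AR coefficients a (length p) and the innovation variance.
    Hypothetical helper for illustration only.
    """
    s = np.asarray(s, dtype=float)
    # Column k-1 holds s[n-k] for n = p, ..., len(s)-1.
    X = np.column_stack([s[p - k:len(s) - k] for k in range(1, p + 1)])
    y = s[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, float(np.mean((y - X @ a) ** 2))


def ar_kalman_separate(x, A, ar_coefs, ar_vars, meas_var=1e-6):
    """Kalman-filter estimate of J AR(p) sources from M instantaneous mixtures.

    x: (M, N) mixtures, A: (M, J) mixing matrix (M < J is allowed),
    ar_coefs: J arrays of length p, ar_vars: J innovation variances.
    State: [s_1[n], ..., s_1[n-p+1], ..., s_J[n], ..., s_J[n-p+1]].
    meas_var is an assumed small measurement-noise variance.
    """
    M, N = x.shape
    J = A.shape[1]
    p = len(ar_coefs[0])
    d = J * p
    F = np.zeros((d, d))  # block-diagonal companion matrices
    Q = np.zeros((d, d))  # innovation enters only the newest sample of each source
    H = np.zeros((M, d))  # mixtures see only the newest sample of each source
    for j in range(J):
        b = j * p
        F[b, b:b + p] = ar_coefs[j]                   # AR recursion for source j
        F[b + 1:b + p, b:b + p - 1] = np.eye(p - 1)   # shift older samples down
        Q[b, b] = ar_vars[j]
        H[:, b] = A[:, j]
    R = meas_var * np.eye(M)

    m = np.zeros(d)  # state mean
    P = np.eye(d)    # state covariance (arbitrary initial prior)
    s_hat = np.zeros((J, N))
    for n in range(N):
        # Predict with the AR dynamics.
        m = F @ m
        P = F @ P @ F.T + Q
        # Update with the current mixture sample.
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T  # Kalman gain P H^T S^{-1}
        m = m + K @ (x[:, n] - H @ m)
        P = P - K @ H @ P
        s_hat[:, n] = m[::p]  # newest sample of each source
    return s_hat
```

As a usage example, with three AR(2) sources and a known 2 × 3 mixing matrix, fitting the AR models and running `ar_kalman_separate` should give a lower mean-squared error than the pseudo-inverse estimate `np.linalg.pinv(A) @ x`, because the filter also exploits each source's predictability from its own past. In the thesis the AR coefficients come from partially separated sources (or, in the third algorithm, from the mixtures given the mixing matrix); fitting them to the true sources in a quick test is an oracle shortcut.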
In the third proposed algorithm, the source temporal structure is exploited to develop a linear BSR solution for UBSS that does not require the source signals to be sparse. Assuming that the source signals are mutually uncorrelated and can each be modeled by an AR process, the algorithm estimates the source AR coefficients from the mixtures, given the mixing matrix. The UBSS problem is then converted into a determined problem by combining the source AR model with the original mixing equation to form a state-space model, and the Kalman filter is applied to obtain a linear source estimate in the minimum mean-squared error (MMSE) sense. Simulation results using both synthetic AR signals and speech utterances show that the proposed algorithm achieves better separation performance than conventional sparseness-based UBSS algorithms.
For estimating the mixing matrix in convolutive underdetermined BSS (CUBSS), conventional algorithms assume that the source signals are W-disjoint in the time-frequency (TF) domain. This assumption requires every TF point of the received mixtures to be a single-source point (SSP), which does not always hold. A preprocessing technique is proposed to estimate the single-source confidence (SSC) of each TF point; only TF points with high SSC values are then passed to existing algorithms, yielding a more accurate mixing matrix estimate at reduced computational complexity. Simulation and experimental results show that the proposed preprocessing improves the performance of existing CUBSS algorithms.
Finally, existing algorithms for suppressing residual crosstalk in the outputs of BSS algorithms employ the Wiener filter. In the context of BSS, the Wiener filter is shown to be optimal in the maximum likelihood (ML) sense only for normally distributed signals.
Since speech signals are generally not Gaussian, their distribution is instead modeled using a Gaussian mixture model (GMM), and an ML post-filter is derived using the expectation-maximization (EM) algorithm. The GMM is shown to introduce a probabilistic sample weight that emphasizes speech segments free of crosstalk components in the BSS output, which results in a better estimate of the post-filter. Simulation results show that the proposed post-filter achieves better crosstalk suppression than the Wiener filter for BSS.
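The contrast between a Wiener post-filter and a GMM-based ML post-filter can be illustrated with a simplified crosstalk model. The sketch below is an assumption-laden illustration, not the thesis's derivation: it assumes the residual crosstalk in one BSS output y is a causal FIR-filtered copy of a reference signal r (for example, the other BSS output), so y[n] = s[n] + (h * r)[n] with s the target speech. The least-squares (Wiener) fit of h is the ML estimate when s is white Gaussian; the EM variant instead models s as a zero-mean K-component GMM and alternates component responsibilities with a weighted least-squares update of h. The function names, the FIR length L, and the initialization are hypothetical choices.

```python
import numpy as np


def lag_matrix(r, L):
    """Columns are r delayed by 0..L-1 samples (zero-padded at the start)."""
    N = len(r)
    return np.column_stack([np.concatenate([np.zeros(k), r[:N - k]]) for k in range(L)])


def wiener_crosstalk_filter(y, r, L):
    """Least-squares (Wiener) estimate of the leakage filter h in y ~ s + h * r."""
    h, *_ = np.linalg.lstsq(lag_matrix(r, L), y, rcond=None)
    return h


def gmm_em_crosstalk_filter(y, r, L, K=2, n_iter=50):
    """ML estimate of h when the target s = y - h * r is a zero-mean K-component GMM.

    Each iteration: E-step responsibilities, then weighted least squares for h,
    then GMM weights/variances from the new residual. Returns h and the
    per-sample weights w from the final E-step. Hypothetical helper.
    """
    R = lag_matrix(r, L)
    h = wiener_crosstalk_filter(y, r, L)      # start from the Gaussian (LS) solution
    e = y - R @ h
    sig2 = np.var(e) * np.logspace(-2, 0, K)  # spread initial component variances
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        logp = np.log(pi) - 0.5 * np.log(2 * np.pi * sig2) - e[:, None] ** 2 / (2 * sig2)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Per-sample weight: expected precision of the target at that sample.
        w = gamma @ (1.0 / sig2)
        # M-step for h: weighted least squares.
        Rw = R * w[:, None]
        h = np.linalg.solve(R.T @ Rw, Rw.T @ y)
        e = y - R @ h
        # M-step for the GMM parameters given the new residual.
        nk = gamma.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        sig2 = np.maximum((gamma * e[:, None] ** 2).sum(axis=0) / nk, 1e-10)
    return h, w
```

In this formulation the weight w[n] is the expected precision of the target at sample n, so it is large where the residual is small and the leakage path is observed most cleanly; for a bursty, heavy-tailed target this typically yields a more accurate h than the least-squares fit. The cleaned output would then be `y - lag_matrix(r, L) @ h`.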
School: School of Electrical and Electronic Engineering
Degree: Doctor of Philosophy (EEE)
Type: Thesis
Date available: 2015-05-12T03:35:51Z
Record timestamp: 2023-07-04T16:18:56Z
Extent: 170 p.
File format: application/pdf
Citation: Liu, B. (2015). Blind source separation of speech mixtures. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/63274