Multi-task learning for end-to-end noise-robust bandwidth extension

Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in...

Full description

Saved in:
Bibliographic Details
Main Authors: Hou, Nana, Xu, Chenglin, Zhou, Joey Tianyi, Chng, Eng Siong, Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language:English
Published: 2020
Subjects:
Online Access:https://hdl.handle.net/10356/144855
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in practice when signals are corrupted by noise. To alleviate such problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension, that jointly optimizes a mask-based speech enhancement and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra, therefore, requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters.