Multi-task learning for end-to-end noise-robust bandwidth extension

Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in...

Bibliographic Details
Main Authors: Hou, Nana, Xu, Chenglin, Zhou, Joey Tianyi, Chng, Eng Siong, Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020
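Subjects: Engineering::Computer science and engineering; Speech Enhancement; Noise-robust Bandwidth Extension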
Online Access: https://hdl.handle.net/10356/144855
Institution: Nanyang Technological University
id sg-ntu-dr.10356-144855
record_format dspace
spelling sg-ntu-dr.10356-144855
2020-12-05T20:10:27Z
Multi-task learning for end-to-end noise-robust bandwidth extension
Hou, Nana
Xu, Chenglin
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
School of Computer Science and Engineering
Interspeech 2020
Air Traffic Management Research Institute
Engineering::Computer science and engineering
Speech Enhancement
Noise-robust Bandwidth Extension
Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean and noise-free. The use of such extension techniques is greatly limited in practice when signals are corrupted by noise. To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement module and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra and therefore requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters.
National Research Foundation (NRF)
Published version
This work was supported by Air Traffic Management Research Institute of Nanyang Technological University, Human-Robot Interaction Phase 1 (Grant No. 192 25 00054), National Research Foundation (NRF) Singapore under the National Robotics Programme; AI Speech Lab (Award No. AISG100E-2018-006), NRF Singapore under the AI Singapore Programme; Human Robot Collaborative AI for AME (Grant No. A18A2b0046), NRF Singapore; Neuromorphic Computing Programme (Grant No. A1687b0033), RIE2020 Advanced Manufacturing and Engineering Programmatic Grant. The work by H. Li is also partly supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (University Allowance, EXC 2077, University of Bremen, Germany).
2020-11-30T08:15:09Z
2020-11-30T08:15:09Z
2020
Conference Paper
Hou, N., Xu, C., Zhou, J. T., Chng, E. S., & Li, H. (2020). Multi-task learning for end-to-end noise-robust bandwidth extension. Interspeech 2020, 4069-4073.
https://hdl.handle.net/10356/144855
4069
4073
en
© 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA).
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Speech Enhancement
Noise-robust Bandwidth Extension
spellingShingle Engineering::Computer science and engineering
Speech Enhancement
Noise-robust Bandwidth Extension
Hou, Nana
Xu, Chenglin
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
Multi-task learning for end-to-end noise-robust bandwidth extension
description Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean and noise-free. The use of such extension techniques is greatly limited in practice when signals are corrupted by noise. To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement module and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra and therefore requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Hou, Nana
Xu, Chenglin
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
format Conference or Workshop Item
author Hou, Nana
Xu, Chenglin
Zhou, Joey Tianyi
Chng, Eng Siong
Li, Haizhou
author_sort Hou, Nana
title Multi-task learning for end-to-end noise-robust bandwidth extension
title_short Multi-task learning for end-to-end noise-robust bandwidth extension
title_full Multi-task learning for end-to-end noise-robust bandwidth extension
title_fullStr Multi-task learning for end-to-end noise-robust bandwidth extension
title_full_unstemmed Multi-task learning for end-to-end noise-robust bandwidth extension
title_sort multi-task learning for end-to-end noise-robust bandwidth extension
publishDate 2020
url https://hdl.handle.net/10356/144855
_version_ 1688665520307437568
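
The description in this record mentions jointly optimizing a speech enhancement module and a bandwidth extension module with multi-task learning, and reports relative improvements in PESQ (higher is better) and LSD (lower is better). The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes the joint objective is a weighted sum of two per-task losses, uses mean squared error as a stand-in for whatever losses the paper uses, and assumes the usual convention for relative improvement. The function names and the weight `alpha` are hypothetical.

```python
def mse(estimate, target):
    """Mean squared error between two equal-length sample sequences."""
    if len(estimate) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def multitask_loss(enhanced, clean_narrowband, extended, clean_wideband, alpha=0.5):
    """Weighted sum of an enhancement loss and a bandwidth-extension loss.

    alpha is a hypothetical task weight in [0, 1]; the record does not
    state how the paper balances the two tasks.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    enhancement_loss = mse(enhanced, clean_narrowband)  # denoising task
    extension_loss = mse(extended, clean_wideband)      # bandwidth-extension task
    return alpha * enhancement_loss + (1.0 - alpha) * extension_loss


def relative_improvement(baseline, proposed, higher_is_better=True):
    """Relative improvement in percent over a baseline score.

    For metrics where lower is better (such as LSD), the sign is flipped
    so that a positive result always means the proposed system is better.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = proposed - baseline if higher_is_better else baseline - proposed
    return 100.0 * delta / baseline


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    print(multitask_loss([1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [0.0, 0.0]))
    print(relative_improvement(2.0, 2.5))
    print(relative_improvement(2.0, 1.5, higher_is_better=False))
```

Keeping the two task losses separate until the final weighted sum makes it easy to log each one during training and to see which task dominates the gradient.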