Multi-task learning for end-to-end noise-robust bandwidth extension
Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in...
Main Authors: | Hou, Nana; Xu, Chenglin; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
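Subjects: | Engineering::Computer science and engineering; Speech Enhancement; Noise-robust Bandwidth Extension |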
Online Access: | https://hdl.handle.net/10356/144855 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-144855 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-144855
2020-12-05T20:10:27Z
Multi-task learning for end-to-end noise-robust bandwidth extension
Hou, Nana; Xu, Chenglin; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou
School of Computer Science and Engineering
Interspeech 2020
Air Traffic Management Research Institute
Engineering::Computer science and engineering; Speech Enhancement; Noise-robust Bandwidth Extension
Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in practice when signals are corrupted by noise. To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra and therefore requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters.
National Research Foundation (NRF)
Published version
This work was supported by Air Traffic Management Research Institute of Nanyang Technological University, Human-Robot Interaction Phase 1 (Grant No. 192 25 00054), National Research Foundation (NRF) Singapore under the National Robotics Programme; AI Speech Lab (Award No. AISG100E-2018-006), NRF Singapore under the AI Singapore Programme; Human Robot Collaborative AI for AME (Grant No. A18A2b0046), NRF Singapore; Neuromorphic Computing Programme (Grant No. A1687b0033), RIE2020 Advanced Manufacturing and Engineering Programmatic Grant. The work by H. Li is also partly supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (University Allowance, EXC 2077, University of Bremen, Germany).
2020-11-30T08:15:09Z 2020-11-30T08:15:09Z
2020
Conference Paper
Hou, N., Xu, C., Zhou, J. T., Chng, E. S., & Li, H. (2020). Multi-task learning for end-to-end noise-robust bandwidth extension. Interspeech 2020, 4069-4073. https://hdl.handle.net/10356/144855
4069 4073
en
© 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA).
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Speech Enhancement; Noise-robust Bandwidth Extension |
description |
Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean without noise. The use of such extension techniques is greatly limited in practice when signals are corrupted by noise. To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra and therefore requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering; Hou, Nana; Xu, Chenglin; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
format |
Conference or Workshop Item |
author |
Hou, Nana; Xu, Chenglin; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
author_sort |
Hou, Nana |
title |
Multi-task learning for end-to-end noise-robust bandwidth extension |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/144855 |
_version_ |
1688665520307437568 |
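
The description field reports results in terms of log-spectral distortion (LSD) but the record does not state the formula. As a rough illustration only, the sketch below computes LSD under one commonly used definition: the root-mean-square difference between the log power spectra of a reference and an estimated signal, taken per frame and then averaged over frames. The function names, the frame length, the hop size, and the `eps` floor are assumptions made for this example and are not taken from the paper, whose exact analysis settings may differ.

```python
import numpy as np


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram (frames x bins) from a Hann-windowed framed FFT.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distortion(reference, estimate, eps=1e-10):
    """Mean over frames of the RMS log10-power difference across bins.

    Lower is better; identical signals give 0. eps (an assumption) guards
    against taking the log of zero.
    """
    ref = np.log10(stft_power(reference) + eps)
    est = np.log10(stft_power(estimate) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wideband = rng.standard_normal(16000)   # stand-in for a reference signal
    attenuated = 0.5 * wideband             # stand-in for an imperfect estimate
    print(log_spectral_distortion(wideband, wideband))
    print(log_spectral_distortion(wideband, attenuated))
```

Because LSD is a distance, lower values are better, so a relative improvement in LSD corresponds to a reduction of the score, whereas for PESQ it corresponds to an increase.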