Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the...
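The task in the first sentence can be made concrete with a minimal sketch. It assumes a 16 kHz wideband signal and a 4 kHz narrowband cutoff; these are illustrative values, since the record does not give the paper's sampling rates. An ideal FFT mask splits the signal into a low band, which stands in for the narrowband input, and a high band, which is the part a bandwidth-extension model has to predict. `split_bands` is a hypothetical helper. Practical pipelines usually create narrowband input with an anti-aliasing filter and downsampling rather than an ideal mask.

```python
import numpy as np

def split_bands(wideband: np.ndarray, sample_rate: int, cutoff_hz: float):
    """Split a signal into low-band and high-band parts with an ideal FFT mask.

    The low band plays the role of the narrowband input; the high band is the
    missing high-frequency content a bandwidth-extension model must predict.
    """
    spectrum = np.fft.rfft(wideband)
    freqs = np.fft.rfftfreq(len(wideband), d=1.0 / sample_rate)
    low_mask = freqs < cutoff_hz  # keep bins below the cutoff
    low = np.fft.irfft(spectrum * low_mask, n=len(wideband))
    high = np.fft.irfft(spectrum * ~low_mask, n=len(wideband))
    return low, high

# Hypothetical example: 1 s of 16 kHz audio with a 1 kHz and a 6 kHz component.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
low, high = split_bands(x, sr, cutoff_hz=4000.0)
# low holds the 1 kHz component; high holds the 6 kHz component.
```

Because the two masks are complementary and the inverse FFT is linear, `low + high` reconstructs the original signal.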
Main Authors: | Hou, Nana; Xu, Chenglin; Pham, Van Tung; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering; Speech Enhancement; Speech Bandwidth Extension |
Online Access: | https://hdl.handle.net/10356/144854 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-144854 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-144854 2020-12-05T20:10:23Z
Speaker and phoneme-aware speech bandwidth extension with residual dual-path network
Hou, Nana; Xu, Chenglin; Pham, Van Tung; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou
School of Computer Science and Engineering
Interspeech 2020
Air Traffic Management Research Institute
Engineering::Computer science and engineering; Speech Enhancement; Speech Bandwidth Extension
Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.
National Research Foundation (NRF)
Published version
This work was supported by Air Traffic Management Research Institute of Nanyang Technological University, Human-Robot Interaction Phase 1 (Grant No. 192 25 00054), National Research Foundation (NRF) Singapore under the National Robotics Programme; AI Speech Lab (Award No. AISG-100E-2018-006), NRF Singapore under the AI Singapore Programme; Human Robot Collaborative AI for AME (Grant No. A18A2b0046), NRF Singapore; Neuromorphic Computing Programme (Grant No. A1687b0033), RIE 2020 AME Programmatic Grant. The work by H. Li is also partly supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (University Allowance, EXC 2077, University of Bremen, Germany).
2020-11-30T07:17:30Z 2020-11-30T07:17:30Z 2020
Conference Paper
Hou, N., Xu, C., Pham, V. T., Zhou, J. T., Chng, E. S., & Li, H. (2020). Speaker and phoneme-aware speech bandwidth extension with residual dual-path network. Interspeech 2020, 4064-4068. https://hdl.handle.net/10356/144854
4064 4068 en
© 2020 International Speech Communication Association (ISCA). All rights reserved. This paper was published in Interspeech 2020 and is made available with permission of International Speech Communication Association (ISCA).
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Speech Enhancement; Speech Bandwidth Extension |
description |
Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters. |
author2 |
School of Computer Science and Engineering |
format |
Conference or Workshop Item |
author |
Hou, Nana; Xu, Chenglin; Pham, Van Tung; Zhou, Joey Tianyi; Chng, Eng Siong; Li, Haizhou |
author_sort |
Hou, Nana |
title |
Speaker and phoneme-aware speech bandwidth extension with residual dual-path network |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/144854 |
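The abstract above reports results in log-spectral distortion (LSD) and signal-to-noise ratio (SNR). Below is a minimal sketch of both metrics. It assumes commonly used definitions: LSD as the frame-averaged RMS difference of log10 power spectra, and SNR as reference energy over error energy in dB. The record does not state the paper's STFT settings or log scaling, so `n_fft=512`, `hop=256`, the `eps` floor and all function names are hypothetical choices.

```python
import numpy as np

def power_spectrogram(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Hann-windowed power spectrogram, shape (frames, n_fft // 2 + 1); assumes len(x) >= n_fft."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_spectral_distortion(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Per-frame RMS difference of log10 power spectra, averaged over frames (lower is better)."""
    log_ref = np.log10(power_spectrogram(reference) + eps)  # eps avoids log10(0)
    log_est = np.log10(power_spectrogram(estimate) + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """SNR in dB, treating (reference - estimate) as noise (higher is better)."""
    noise = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2)))

# Illustrative signals only, not data from the paper.
sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 5000 * t)
estimate = 0.9 * reference  # a uniformly attenuated "reconstruction"

print(log_spectral_distortion(reference, reference))  # 0.0 for a perfect estimate
print(snr_db(reference, estimate))  # about 20 dB: the error is 10% of the amplitude
```

Since lower LSD and higher SNR are better, a relative improvement over a baseline would typically be computed as (baseline - proposed) / baseline for LSD and (proposed - baseline) / baseline for SNR; the record itself does not spell out this convention.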