Domain adversarial training for speech enhancement
The performance of deep learning approaches to speech enhancement degrades significantly in the face of a mismatch between training and testing conditions. In this paper, we propose a domain adversarial training technique for unsupervised domain transfer that (1) overcomes domain mismatch, and (2) provides a solution...
Main Authors: | Hou, Nana; Xu, Chenglin; Chng, Eng Siong; Li, Haizhou |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
Subjects: | Engineering::Computer science and engineering; Domain Adversarial Training; Speech Enhancement |
Online Access: | https://hdl.handle.net/10356/144786 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-144786 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-1447862020-11-28T20:10:37Z Domain adversarial training for speech enhancement Hou, Nana Xu, Chenglin Chng, Eng Siong Li, Haizhou School of Computer Science and Engineering 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Air Traffic Management Research Institute Temasek Laboratories @ NTU Engineering::Computer science and engineering Domain Adversarial Training Speech Enhancement The performance of deep learning approaches to speech enhancement degrades significantly in face of mismatch between training and testing. In this paper, we propose a domain adversarial training technique for unsupervised domain transfer, that 1) overcomes domain mismatch, and 2) provides a solution to the scenario where we only have noisy speech data, and we don’t have clean-noisy parallel data in the new domain. Specifically, our method includes two parts that are jointly trained, 1) an enhancement net to map noisy speech to clean speech by indirectly estimating a mask with a spectrum approximation loss, and 2) a domain predictor to distinguish between domains. As the proposed approach is able to adapt to a new domain only with noisy speech data in target domain, we call it an unsupervised learning technique. Experiments suggest that our approach delivers voice quality comparable with other supervised learning techniques that require clean-noisy parallel data. Accepted version This research is supported by Temasek Laboratories@NTU, Nanyang Technological University, Singapore. 2020-11-24T06:30:28Z 2020-11-24T06:30:28Z 2019 Conference Paper Hou, N., Xu, C., Chng, E. S., & Li, H. (2019). Domain adversarial training for speech enhancement. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 667-672. doi:10.1109/APSIPAASC47483.2019.9023218 https://hdl.handle.net/10356/144786 10.1109/APSIPAASC47483.2019.9023218 667 672 en © 2019 IEEE. 
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/APSIPAASC47483.2019.9023218 application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering Domain Adversarial Training Speech Enhancement |
spellingShingle |
Engineering::Computer science and engineering Domain Adversarial Training Speech Enhancement Hou, Nana Xu, Chenglin Chng, Eng Siong Li, Haizhou Domain adversarial training for speech enhancement |
description |
The performance of deep learning approaches to speech enhancement degrades significantly in the face of a mismatch between training and testing conditions. In this paper, we propose a domain adversarial training technique for unsupervised domain transfer that (1) overcomes domain mismatch, and (2) provides a solution for the scenario where only noisy speech data, and no clean-noisy parallel data, is available in the new domain. Specifically, our method includes two jointly trained parts: (1) an enhancement net that maps noisy speech to clean speech by indirectly estimating a mask with a spectrum approximation loss, and (2) a domain predictor that distinguishes between domains. As the proposed approach can adapt to a new domain using only noisy speech data from the target domain, we call it an unsupervised learning technique. Experiments suggest that our approach delivers voice quality comparable to that of supervised learning techniques that require clean-noisy parallel data. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Hou, Nana Xu, Chenglin Chng, Eng Siong Li, Haizhou |
format |
Conference or Workshop Item |
author |
Hou, Nana Xu, Chenglin Chng, Eng Siong Li, Haizhou |
author_sort |
Hou, Nana |
title |
Domain adversarial training for speech enhancement |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/144786 |
_version_ |
1688665609598926848 |