Domain adversarial training for speech enhancement

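The performance of deep learning approaches to speech enhancement degrades significantly in the face of a mismatch between training and testing conditions. In this paper, we propose a domain adversarial training technique for unsupervised domain transfer that 1) overcomes domain mismatch, and 2) provides a solution to the scenario in which only noisy speech data are available in the new domain, with no clean-noisy parallel data. Specifically, our method consists of two jointly trained parts: 1) an enhancement network that maps noisy speech to clean speech by indirectly estimating a mask with a spectrum approximation loss, and 2) a domain predictor that distinguishes between domains. Because the proposed approach can adapt to a new domain using only noisy speech data from the target domain, we call it an unsupervised learning technique. Experiments suggest that our approach delivers voice quality comparable to that of supervised learning techniques that require clean-noisy parallel data.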
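
The record does not give the paper's network architecture or training schedule. As an illustration only, here is a minimal NumPy sketch of the general scheme the abstract describes: a shared encoder feeds (1) a mask head trained with a spectrum approximation loss on source-domain noisy/clean pairs and (2) a domain predictor trained on noisy frames from both domains. The encoder is updated with a reversed, scaled domain gradient, which is the gradient reversal idea from domain-adversarial neural networks (DANN); whether the paper implements the adversarial update exactly this way is an assumption. The single-layer encoder, the tanh/sigmoid activations, the weight `lam`, the learning rate, the synthetic data, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_freq, n_hidden, seed=0):
    # Hypothetical single-layer model: shared encoder W_e, mask head W_m,
    # and a logistic domain predictor (w_d, b_d).
    rng = np.random.default_rng(seed)
    return {
        "W_e": rng.normal(0.0, 0.1, (n_freq, n_hidden)),
        "W_m": rng.normal(0.0, 0.1, (n_hidden, n_freq)),
        "w_d": rng.normal(0.0, 0.1, n_hidden),
        "b_d": 0.0,
    }


def losses_and_grads(p, xs, s, xt):
    """xs, s: noisy and clean source magnitude spectra, shape (Ns, F).
    xt: noisy target magnitude spectra, shape (Nt, F); no clean target is used."""
    # Enhancement branch (source only): estimate a mask, apply it to the noisy
    # spectrum, and compare the result with the clean spectrum.
    hs = np.tanh(xs @ p["W_e"])
    m = sigmoid(hs @ p["W_m"])
    err = m * xs - s
    l_enh = np.mean(err ** 2)  # spectrum approximation loss
    d_a = (2.0 * err / err.size) * xs * m * (1.0 - m)  # dL_enh / d(mask logits)
    g_wm = hs.T @ d_a
    g_we_enh = xs.T @ ((d_a @ p["W_m"].T) * (1.0 - hs ** 2))

    # Domain branch (both domains): label 0 = source frame, 1 = target frame.
    x_all = np.vstack([xs, xt])
    y = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])
    h_all = np.tanh(x_all @ p["W_e"])
    z = h_all @ p["w_d"] + p["b_d"]
    l_dom = np.mean(np.logaddexp(0.0, z) - y * z)  # binary cross-entropy on logits
    d_z = (sigmoid(z) - y) / len(y)
    g_wd = h_all.T @ d_z
    g_bd = np.sum(d_z)
    g_we_dom = x_all.T @ (np.outer(d_z, p["w_d"]) * (1.0 - h_all ** 2))

    grads = {"W_m": g_wm, "w_d": g_wd, "b_d": g_bd,
             "W_e_enh": g_we_enh, "W_e_dom": g_we_dom}
    return l_enh, l_dom, grads


def dat_step(p, xs, s, xt, lr=0.1, lam=0.1):
    """One joint update (rebinds the entries of p). The mask head and the domain
    predictor descend their own losses; the encoder descends L_enh - lam * L_dom,
    i.e. it receives the domain gradient with its sign reversed."""
    l_enh, l_dom, g = losses_and_grads(p, xs, s, xt)
    p["W_m"] = p["W_m"] - lr * g["W_m"]
    p["w_d"] = p["w_d"] - lr * g["w_d"]
    p["b_d"] = p["b_d"] - lr * g["b_d"]
    p["W_e"] = p["W_e"] - lr * (g["W_e_enh"] - lam * g["W_e_dom"])
    return l_enh, l_dom


if __name__ == "__main__":
    # Synthetic magnitudes standing in for real STFT features (hypothetical data).
    rng = np.random.default_rng(1)
    clean = np.abs(rng.normal(size=(64, 16)))
    xs = clean + np.abs(rng.normal(0.0, 0.5, clean.shape))  # source: clean + mild noise
    # Target frames with heavier noise; their clean versions are never used.
    xt = np.abs(rng.normal(size=(64, 16))) + np.abs(rng.normal(0.0, 1.0, (64, 16)))
    params = init_params(16, 8)
    for step in range(200):
        l_enh, l_dom = dat_step(params, xs, clean, xt)
    print(f"enhancement loss {l_enh:.4f}, domain loss {l_dom:.4f}")
```

Target frames enter only through the domain loss, which is what makes the adaptation unsupervised in the abstract's sense: no clean target speech is needed. Setting `lam=0` reduces the sketch to ordinary supervised mask training on the source domain, while larger values push the encoder harder toward features the domain predictor cannot separate.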

Bibliographic Details
Main Authors: Hou, Nana; Xu, Chenglin; Chng, Eng Siong; Li, Haizhou
Affiliation: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2019 (conference proceedings); deposited in DR-NTU in 2020
Online Access: https://hdl.handle.net/10356/144786
Institution: Nanyang Technological University

Publication Details
Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Pages: 667-672
DOI: 10.1109/APSIPAASC47483.2019.9023218
Research units: Air Traffic Management Research Institute; Temasek Laboratories @ NTU
Subject: Engineering::Computer science and engineering
Keywords: Domain Adversarial Training; Speech Enhancement
Version: Accepted version (application/pdf)
Repository: DR-NTU, NTU Library (deposited 2020-11-24)
Funding: This research is supported by Temasek Laboratories@NTU, Nanyang Technological University, Singapore.
Citation: Hou, N., Xu, C., Chng, E. S., & Li, H. (2019). Domain adversarial training for speech enhancement. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 667-672. doi:10.1109/APSIPAASC47483.2019.9023218
Copyright: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/APSIPAASC47483.2019.9023218