Correlation-based frequency warping for voice conversion
Main Authors: | Tian, Xiaohai; Wu, Zhizheng; Lee, Siu-Wa; Chng, Eng Siong |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2014 (deposited in DR-NTU in 2018) |
Subjects: | DRNTU::Engineering::Computer science and engineering; Speech Synthesis; Voice Conversion |
Online Access: | https://hdl.handle.net/10356/89598 http://hdl.handle.net/10220/47053 |
Institution: | Nanyang Technological University |
Conference: | The 9th International Symposium on Chinese Spoken Language Processing (ISCSLP 2014) |
Affiliations: | School of Computer Science and Engineering; NTU-UBC Research Centre of Excellence in Active Living for the Elderly |
Funding: | NRF (National Research Foundation, Singapore) |
Citation: | Tian, X., Wu, Z., Lee, S.-W., & Chng, E. S. (2014). Correlation-based frequency warping for voice conversion. The 9th International Symposium on Chinese Spoken Language Processing, 211-215. |
DOI: | 10.1109/ISCSLP.2014.6936725 (published version: http://dx.doi.org/10.1109/ISCSLP.2014.6936725) |
Version: | Accepted version; 5 p.; application/pdf |
Rights: | © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Repository: | DR-NTU, NTU Library, Singapore (record sg-ntu-dr.10356-89598) |
Abstract:
Frequency warping (FW) based voice conversion aims to modify the frequency axis of source spectra towards that of the target. In previous works, the optimal warping function was calculated by minimizing the spectral distance between the converted and target spectra without considering the spectral shape. Nevertheless, speaker timbre and identity depend greatly on the peaks and valleys of the vocal tract spectrum. In this paper, we propose a method to define the warping function by maximizing the correlation between the converted and target spectra. Unlike conventional warping methods, the correlation-based optimization is not determined by the magnitude of the spectra. Instead, both spectral peaks and valleys are considered in the optimization process, which also improves the performance of amplitude scaling. Experiments were conducted on the VOICES database, and the results show that, after amplitude scaling, the proposed method reduced the mel-spectral distortion from 5.85 dB to 5.60 dB. Subjective listening tests also confirmed the effectiveness of the proposed method.
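This record does not describe the warping-function family or the search procedure used in the paper. As a toy illustration only, the sketch below assumes a log-magnitude spectrum sampled on a uniform grid over [0, pi], a one-parameter bilinear (first-order all-pass) warping, a grid search over that parameter, and least-squares gain-plus-offset amplitude scaling. All function names are hypothetical. The sketch shows why a correlation criterion ignores spectral magnitude: Pearson correlation does not change when the converted spectrum is multiplied by a positive gain or shifted by a constant offset.

```python
# Hypothetical toy sketch of correlation-based frequency warping.
# Assumptions (not from the paper): bilinear warping, grid search,
# log-magnitude spectra on a uniform grid, gain + offset amplitude scaling.
import numpy as np


def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) warping of normalized frequency on [0, pi].

    alpha = 0 is the identity; |alpha| < 1 keeps the mapping monotonic
    and fixes the endpoints 0 and pi.
    """
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def warp_spectrum(log_spec, alpha):
    """Read the source log spectrum at warped frequencies (linear interpolation)."""
    omega = np.linspace(0.0, np.pi, len(log_spec))
    return np.interp(bilinear_warp(omega, alpha), omega, log_spec)


def pearson(x, y):
    """Pearson correlation; unchanged if x becomes gain * x + offset with gain > 0."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def amplitude_scale(converted, target):
    """Toy amplitude scaling: least-squares gain and offset mapping converted onto target."""
    design = np.column_stack([converted, np.ones_like(converted)])
    (gain, offset), *_ = np.linalg.lstsq(design, target, rcond=None)
    return gain * converted + offset


def correlation_warp(source, target, alphas):
    """Pick the warping parameter whose warped source correlates best with the target."""
    scores = [pearson(warp_spectrum(source, a), target) for a in alphas]
    return float(alphas[int(np.argmax(scores))])


# Toy data: three "formant" peaks; the target is a warped, rescaled, offset copy.
omega = np.linspace(0.0, np.pi, 257)
source = sum(np.exp(-((omega - c) / 0.1) ** 2) for c in (0.5, 1.2, 2.0))
target = 0.5 * warp_spectrum(source, 0.2) + 3.0
alphas = np.round(np.linspace(-0.4, 0.4, 81), 2)

alpha_hat = correlation_warp(source, target, alphas)
converted = amplitude_scale(warp_spectrum(source, alpha_hat), target)
print(alpha_hat, np.max(np.abs(converted - target)))
```

For least-squares gain-and-offset scaling, the mean squared error that remains is var(target) * (1 - r^2), where r is the correlation between the warped source and the target. Under these toy assumptions, maximizing correlation is therefore the same as minimizing the distance left after amplitude scaling. This is one way to read the abstract's remark that the correlation criterion also helps amplitude scaling; it is not a claim about the paper's own derivation.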