Correlation-based frequency warping for voice conversion

Frequency warping (FW) based voice conversion aims to modify the frequency axis of source spectra towards that of the target. In previous works, the optimal warping function was calculated by minimizing the spectral distance of converted and target spectra without considering the spectral shape. Nev...

Bibliographic Details
Main Authors: Tian, Xiaohai; Wu, Zhizheng; Lee, Siu-Wa; Chng, Eng Siong
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering; Speech Synthesis; Voice Conversion
Online Access: https://hdl.handle.net/10356/89598
http://hdl.handle.net/10220/47053
Institution: Nanyang Technological University
id sg-ntu-dr.10356-89598
record_format dspace
spelling sg-ntu-dr.10356-89598 2020-03-07T11:48:46Z
Correlation-based frequency warping for voice conversion
Tian, Xiaohai
Wu, Zhizheng
Lee, Siu-Wa
Chng, Eng Siong
School of Computer Science and Engineering
The 9th International Symposium on Chinese Spoken Language Processing
NTU-UBC Research Centre of Excellence in Active Living for the Elderly
DRNTU::Engineering::Computer science and engineering
Speech Synthesis
Voice Conversion
NRF (Natl Research Foundation, S’pore)
Accepted version
2018-12-18T06:16:37Z 2019-12-06T17:29:15Z 2018-12-18T06:16:37Z 2019-12-06T17:29:15Z 2014-09-01 2014
Conference Paper
Tian, X., Wu, Z., Lee, S.-W., & Chng, E. S. (2014). Correlation-based frequency warping for voice conversion. The 9th International Symposium on Chinese Spoken Language Processing, 211-215. doi:10.1109/ISCSLP.2014.6936725
https://hdl.handle.net/10356/89598
http://hdl.handle.net/10220/47053
10.1109/ISCSLP.2014.6936725
187517
en
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/ISCSLP.2014.6936725].
5 p.
application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
Speech Synthesis
Voice Conversion
description Frequency warping (FW) based voice conversion aims to modify the frequency axis of source spectra towards that of the target. In previous work, the optimal warping function was calculated by minimizing the spectral distance between the converted and target spectra without considering the spectral shape. Nevertheless, speaker timbre and identity depend greatly on the vocal tract peaks and valleys of the spectrum. In this paper, we propose a method to define the warping function by maximizing the correlation between the converted and target spectra. Unlike conventional warping methods, the correlation-based optimization is not determined by the magnitude of the spectra. Instead, both spectral peaks and valleys are considered in the optimization process, which also improves the performance of amplitude scaling. Experiments were conducted on the VOICES database, and the results show that, after amplitude scaling, the proposed method reduced the mel-spectral distortion from 5.85 dB to 5.60 dB. Subjective listening tests also confirmed the effectiveness of the proposed method.
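The idea of choosing a warping function by maximizing correlation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state how the warping function is parameterized or optimized, so the sketch assumes a single linear scaling factor `alpha`, a grid search over candidate values, and Pearson correlation as the objective. All function names and the toy spectra are hypothetical.

```python
import numpy as np


def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum along a linearly scaled frequency axis.

    Output bin k reads the input at position alpha * k using linear
    interpolation; positions past the last bin reuse the last value.
    A single linear factor is an assumption made for illustration only.
    """
    bins = np.arange(len(spectrum), dtype=float)
    return np.interp(alpha * bins, bins, spectrum)


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


def best_alpha(source, target, alphas):
    """Grid search for the factor whose warped source correlates best with the target."""
    scores = [pearson(warp_spectrum(source, a), target) for a in alphas]
    i = int(np.argmax(scores))
    return float(alphas[i]), scores[i]


# Toy spectra: three Gaussian "formant" peaks on a 256-bin axis (synthetic data).
bins = np.arange(256, dtype=float)


def toy_spectrum(centers, gain=1.0):
    return gain * sum(np.exp(-0.5 * ((bins - c) / 6.0) ** 2) for c in centers)


source = toy_spectrum([40, 100, 180])
# The target has its peaks at 1.25x the source positions, plus a gain and an offset.
target = toy_spectrum([50, 125, 225], gain=3.0) + 0.5

alphas = np.round(np.arange(0.60, 1.41, 0.05), 2)
alpha, score = best_alpha(source, target, alphas)
print(f"selected alpha = {alpha:.2f}, correlation = {score:.3f}")
```

Because Pearson correlation is unchanged by a positive gain or a constant offset applied to either spectrum, the selected factor depends on where the peaks and valleys fall rather than on overall magnitude. That leaves the level mismatch to be handled by a separate amplitude scaling step, which is the division of labor the abstract describes.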
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Tian, Xiaohai
Wu, Zhizheng
Lee, Siu-Wa
Chng, Eng Siong
format Conference or Workshop Item
author Tian, Xiaohai
Wu, Zhizheng
Lee, Siu-Wa
Chng, Eng Siong
author_sort Tian, Xiaohai
title Correlation-based frequency warping for voice conversion
title_short Correlation-based frequency warping for voice conversion
title_full Correlation-based frequency warping for voice conversion
title_fullStr Correlation-based frequency warping for voice conversion
title_full_unstemmed Correlation-based frequency warping for voice conversion
title_sort correlation-based frequency warping for voice conversion
publishDate 2018
url https://hdl.handle.net/10356/89598
http://hdl.handle.net/10220/47053
_version_ 1681038118827851776