An automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity

Voice conversion aims to modify one speaker's voice characteristics so that the speech sounds as if it were spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversio...

Bibliographic Details
Main Authors: Huang, Dong-Yan; Xie, Lei; Zhang, Shaofei; Lee, Yvonne Siu Wa; Wu, Jie; Ming, Huaiping; Tian, Xiaohai; Ding, Chuang; Li, Mei; Nguyen, Quy Hy; Dong, Minghui; Chng, Eng Siong; Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2019
Subjects: Engineering::Computer science and engineering; Voice Conversion; Objective Measures
Online Access: https://hdl.handle.net/10356/89623
http://hdl.handle.net/10220/49691
Institution: Nanyang Technological University
id sg-ntu-dr.10356-89623
record_format dspace
spelling sg-ntu-dr.10356-89623 2020-03-07T11:48:46Z
An automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity
Huang, Dong-Yan
Xie, Lei
Zhang, Shaofei
Lee, Yvonne Siu Wa
Wu, Jie
Ming, Huaiping
Tian, Xiaohai
Ding, Chuang
Li, Mei
Nguyen, Quy Hy
Dong, Minghui
Chng, Eng Siong
Li, Haizhou
School of Computer Science and Engineering
9th ISCA Speech Synthesis Workshop
Engineering::Computer science and engineering
Voice Conversion
Objective Measures
Voice conversion aims to modify one speaker's voice characteristics so that the speech sounds as if it were spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a large amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that the automatic evaluation results match the subjective listening results quite well. We further use the strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge (VCC2016).
Published version
2019-08-20T04:39:20Z 2019-12-06T17:29:47Z 2019-08-20T04:39:20Z 2019-12-06T17:29:47Z 2016-09-01 2016
Conference Paper
Huang, D.-Y., Xie, L., Lee, Y. S. W., Wu, J., Ming, H., Tian, X., … Li, H. (2016). An automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. 9th ISCA Speech Synthesis Workshop. doi:10.21437/SSW.2016-8
https://hdl.handle.net/10356/89623
http://hdl.handle.net/10220/49691
10.21437/SSW.2016-8
200458
en
© 2016 International Speech Communication Association (ISCA). All rights reserved. This paper was published in the 9th ISCA Speech Synthesis Workshop and is made available with permission of the International Speech Communication Association (ISCA).
8 p.
application/pdf
institution Nanyang Technological University
building NTU Library
country Singapore
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Voice Conversion
Objective Measures
description Voice conversion aims to modify one speaker's voice characteristics so that the speech sounds as if it were spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. The evaluation of voice conversion approaches, usually through time-intensive subjective listening tests, requires a large amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that the automatic evaluation results match the subjective listening results quite well. We further use the strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge (VCC2016).
author2 School of Computer Science and Engineering
format Conference or Workshop Item
author Huang, Dong-Yan
Xie, Lei
Zhang, Shaofei
Lee, Yvonne Siu Wa
Wu, Jie
Ming, Huaiping
Tian, Xiaohai
Ding, Chuang
Li, Mei
Nguyen, Quy Hy
Dong, Minghui
Chng, Eng Siong
Li, Haizhou
author_sort Huang, Dong-Yan
title An automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity
title_sort automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity
publishDate 2019
url https://hdl.handle.net/10356/89623
http://hdl.handle.net/10220/49691
_version_ 1681048436025065472