An automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity

Bibliographic Details
Main Authors: Huang, Dong-Yan, Xie, Lei, Zhang, Shaofei, Lee, Yvonne Siu Wa, Wu, Jie, Ming, Huaiping, Tian, Xiaohai, Ding, Chuang, Li, Mei, Nguyen, Quy Hy, Dong, Minghui, Chng, Eng Siong, Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2019
Online Access:https://hdl.handle.net/10356/89623
http://hdl.handle.net/10220/49691
Institution: Nanyang Technological University
Description
Summary: Voice conversion aims to modify the characteristics of one speaker's voice so that it sounds as if it were spoken by another speaker, without changing the linguistic content. The task has attracted considerable attention, and various approaches have been proposed over the past two decades. Evaluating voice conversion approaches usually relies on time-consuming subjective listening tests, which require a large amount of human labor. This paper proposes an automatic voice conversion evaluation strategy based on perceptual background noise distortion and speaker similarity. Experimental results show that our automatic evaluation results match the subjective listening results quite well. We further use the strategy to select the best converted samples from multiple voice conversion systems, and our submission achieves promising results in the Voice Conversion Challenge 2016 (VCC2016).
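The summary does not say how the two measures are computed or combined. The following is a purely illustrative sketch under stated assumptions. It assumes speaker similarity is the cosine similarity between fixed-length speaker embeddings of a converted sample and the target speaker, and that a perceptual background-noise distortion score (lower is better) is already available for each sample. The weighted-sum combination, the `weight` parameter, and all function names are hypothetical and are not the authors' method. The sketch shows how such scores could rank candidates from several systems, and how agreement with listening-test scores could be checked with a Pearson correlation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be nonzero")
    return dot / (norm_a * norm_b)


def combined_score(speaker_sim, noise_distortion, weight=0.5):
    """Hypothetical combination: reward similarity, penalize distortion."""
    return weight * speaker_sim - (1 - weight) * noise_distortion


def select_best(candidates, target_embedding, weight=0.5):
    """Pick the candidate with the highest combined score.

    candidates: dict mapping a system name to a tuple
    (speaker_embedding, noise_distortion_score). Both inputs are
    assumed to come from some external feature extractor.
    """
    best_name, best_score = None, -math.inf
    for name, (embedding, distortion) in candidates.items():
        sim = cosine_similarity(embedding, target_embedding)
        score = combined_score(sim, distortion, weight)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def pearson(xs, ys):
    """Pearson correlation, e.g. automatic scores vs. listening-test scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy inputs invented for illustration only, not data from the paper.
    target = [1.0, 0.0]
    systems = {"sys_A": ([0.9, 0.1], 0.2), "sys_B": ([0.5, 0.5], 0.1)}
    print(select_best(systems, target))
```

In the toy run, `sys_A` wins because its higher speaker similarity outweighs its slightly larger distortion score at `weight=0.5`. Changing `weight` shifts the trade-off between the two criteria.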