Spoofing detection from a feature representation perspective

Spoofing detection, which discriminates the spoofed speech from the natural speech, has gained much attention recently. Low-dimensional features that are used in speaker recognition/verification are also used in spoofing detection. Unfortunately, they don't capture sufficient information requir...

Full description

Saved in:
Bibliographic Details
Main Authors: Tian, Xiaohai, Wu, Zhizheg, Xiao, Xiong, Chng, Eng Siong, Li, Haizhou
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language:English
Published: 2018
Subjects:
Online Access:https://hdl.handle.net/10356/89643
http://hdl.handle.net/10220/47063
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Spoofing detection, which discriminates the spoofed speech from the natural speech, has gained much attention recently. Low-dimensional features that are used in speaker recognition/verification are also used in spoofing detection. Unfortunately, they don't capture sufficient information required for spoofing detection. In this work, we investigate the use of high-dimensional features for spoofing detection, that maybe more sensitive to the artifacts in the spoofed speech. Six types of high-dimensional feature are employed. For each kind of feature, four different representations are extracted, i.e. the original high-dimensional feature, corresponding low-dimensional feature, the low- and the high-frequency regions of the original high-dimensional feature. Dynamic features are also calculated to assess the effectiveness of the temporal information to detect the artifacts across frames. A neural network-based classifier is adopted to handle the high-dimensional features. Experimental results on the standard ASVspoof 2015 corpus suggest that high-dimensional features and dynamic features are useful for spoofing attack detection. A fusion of them has been shown to achieve 0.0% the equal error rates for nine of ten attack types.