Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition

Bibliographic Details
Main Authors:	Guo, Lili; Wang, Longbiao; Dang, Jianwu; Chng, Eng Siong; Nakagawa, Seiichi
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering; Speech Emotion Recognition; Magnitude Spectrogram
Online Access:	https://hdl.handle.net/10356/162646
Institution:	Nanyang Technological University

Description
Summary:	Complete acoustic features include both magnitude and phase information. However, traditional speech emotion recognition methods focus only on the magnitude information and ignore the phase data, so they inevitably miss some information. This study explores the accurate extraction and effective use of phase features for speech emotion recognition. First, the way speech emotion is reflected in the phase spectrum is analyzed, and a quantitative analysis shows that phase data contain information that can be used to distinguish emotions. A dynamic relative phase (DRP) feature extraction method is then proposed to address the difficulty that the original relative phase (RP) has in determining the base frequency, and to further alleviate the dependence of the phase on the clipping position of the frame. Finally, a single-channel model (SCM) and a multi-channel model with an attention mechanism (MCMA) are constructed to effectively integrate the phase and magnitude information. Introducing phase information captures more complete acoustic features, which enriches the emotional representations. Experiments were conducted on the Emo-DB and IEMOCAP databases. The results demonstrate the effectiveness of the proposed DRP for speech emotion recognition, as well as the complementarity between the phase and magnitude information.