Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition
The complete acoustic features include magnitude and phase information. However, traditional speech emotion recognition methods focus only on the magnitude information and ignore the phase data, inevitably missing some information. This study explores the accurate extraction and effective use of phase features for speech emotion recognition.
Saved in: DR-NTU (Nanyang Technological University Library)
Main Authors: Guo, Lili; Wang, Longbiao; Dang, Jianwu; Chng, Eng Siong; Nakagawa, Seiichi
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2022
Subjects: Engineering::Computer science and engineering; Speech Emotion Recognition; Magnitude Spectrogram
Online Access: https://hdl.handle.net/10356/162646
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-162646
record_format: dspace
spelling: sg-ntu-dr.10356-1626462022-11-02T01:23:25Z Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition Guo, Lili Wang, Longbiao Dang, Jianwu Chng, Eng Siong Nakagawa, Seiichi School of Computer Science and Engineering Engineering::Computer science and engineering Speech Emotion Recognition Magnitude Spectrogram The complete acoustic features include magnitude and phase information. However, traditional speech emotion recognition methods focus only on the magnitude information and ignore the phase data, inevitably missing some information. This study explores the accurate extraction and effective use of phase features for speech emotion recognition. First, the reflection of speech emotion in the phase spectrum is analyzed, and a quantitative analysis shows that phase data contain information that can be used to distinguish emotions. A dynamic relative phase (DRP) feature extraction method is then proposed to solve the problem that the original relative phase (RP) has difficulty determining the base frequency, and to further alleviate the dependence of the phase on the clipping position of the frame. Finally, a single-channel model (SCM) and a multi-channel model with an attention mechanism (MCMA) are constructed to effectively integrate the phase and magnitude information. By introducing phase information, more complete acoustic features are captured, which enriches the emotional representations. The experiments were conducted using the Emo-DB and IEMOCAP databases. Experimental results demonstrate the effectiveness of the proposed DRP for speech emotion recognition, as well as the complementarity between the phase and magnitude information. This work was supported by the National Key R&D Program of China (Grant No. 2018YFB1305200), by the National Natural Science Foundation of China (Grant No. 61771333), and by the Tianjin Municipal Science and Technology Project, China (Grant No. 18ZXZNGX00330). Additionally, we would like to acknowledge the financial support provided by the China Scholarship Council (No. 201906250176) during a visit by Lili Guo to Nanyang Technological University. 2022-11-02T01:23:24Z 2022-11-02T01:23:24Z 2022 Journal Article Guo, L., Wang, L., Dang, J., Chng, E. S. & Nakagawa, S. (2022). Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition. Speech Communication, 136, 118-127. https://dx.doi.org/10.1016/j.specom.2021.11.005 0167-6393 https://hdl.handle.net/10356/162646 10.1016/j.specom.2021.11.005 2-s2.0-85121962474 136 118 127 en Speech Communication © 2021 Elsevier B.V. All rights reserved.
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering; Speech Emotion Recognition; Magnitude Spectrogram
spellingShingle: Engineering::Computer science and engineering Speech Emotion Recognition Magnitude Spectrogram Guo, Lili Wang, Longbiao Dang, Jianwu Chng, Eng Siong Nakagawa, Seiichi Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition
description: The complete acoustic features include magnitude and phase information. However, traditional speech emotion recognition methods focus only on the magnitude information and ignore the phase data, inevitably missing some information. This study explores the accurate extraction and effective use of phase features for speech emotion recognition. First, the reflection of speech emotion in the phase spectrum is analyzed, and a quantitative analysis shows that phase data contain information that can be used to distinguish emotions. A dynamic relative phase (DRP) feature extraction method is then proposed to solve the problem that the original relative phase (RP) has difficulty determining the base frequency, and to further alleviate the dependence of the phase on the clipping position of the frame. Finally, a single-channel model (SCM) and a multi-channel model with an attention mechanism (MCMA) are constructed to effectively integrate the phase and magnitude information. By introducing phase information, more complete acoustic features are captured, which enriches the emotional representations. The experiments were conducted using the Emo-DB and IEMOCAP databases. Experimental results demonstrate the effectiveness of the proposed DRP for speech emotion recognition, as well as the complementarity between the phase and magnitude information.
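As context for the magnitude/phase distinction the abstract rests on: both spectrograms come from the same short-time Fourier transform of the speech signal, with magnitude as the complex modulus and phase as the complex argument. The sketch below (plain NumPy; the function name and framing parameters are illustrative, and the paper's RP/DRP normalization itself is not reproduced here) shows the extraction step that conventional pipelines keep only half of:

```python
import numpy as np

def magnitude_and_phase(signal, frame_len=512, hop=256):
    """Compute magnitude and phase spectrograms of a 1-D signal via a
    Hann-windowed short-time Fourier transform (STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)      # one-sided complex spectrum
    return np.abs(spectrum), np.angle(spectrum)  # magnitude, phase (radians)

# Toy usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
mag, phase = magnitude_and_phase(np.sin(2 * np.pi * 440 * t))
```

Note that the raw phase returned here depends on where each frame starts relative to the waveform, which is exactly the frame clipping-position sensitivity the relative-phase family of features is designed to remove.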
author2: School of Computer Science and Engineering
author_facet: School of Computer Science and Engineering; Guo, Lili; Wang, Longbiao; Dang, Jianwu; Chng, Eng Siong; Nakagawa, Seiichi
format: Article
author: Guo, Lili; Wang, Longbiao; Dang, Jianwu; Chng, Eng Siong; Nakagawa, Seiichi
author_sort: Guo, Lili
title: Learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition
title_sort: learning affective representations based on magnitude and dynamic relative phase information for speech emotion recognition
publishDate: 2022
url: https://hdl.handle.net/10356/162646
_version_: 1749179218203246592