Speech emotion recognition using deep neural networks on multilingual databases

Bibliographic Details
Main Authors: Ahmad Qadri, Syed Asif, Gunawan, Teddy Surya, Wani, Taiba Majid, Ambikairajah, Eliathamby, Kartiwi, Mira, Ihsanto, Eko
Format: Book Chapter
Language: English
Published: Springer 2021
Online Access: http://irep.iium.edu.my/88878/1/Paper_110.pdf
http://irep.iium.edu.my/88878/7/88878_Speech%20emotion%20recognition.pdf
http://irep.iium.edu.my/88878/13/88878_Speech%20emotion%20recognition%20using%20deep%20neural_SCOPUS.pdf
http://irep.iium.edu.my/88878/
https://link.springer.com/book/10.1007%2F978-3-030-70917-4
Institution: Universiti Islam Antarabangsa Malaysia
Description
Summary: The research community's growing interest in human-computer interaction (HCI) has made systems that deduce and identify the emotional aspects of a speech signal an active research topic. Speech Emotion Recognition (SER) has made automated, intelligent analysis of human utterances a reality. Typically, an SER system extracts features from speech signals, such as pitch frequency, formant features, and energy-related and spectral features, and then applies a classifier to infer the underlying emotion. The success of an SER system depends chiefly on selecting appropriate emotional feature extraction techniques. In this paper, the Mel-frequency Cepstral Coefficient (MFCC) and the Teager Energy Operator (TEO), along with a newly proposed feature fusion of MFCC and TEO referred to as Teager-MFCC (TMFCC), are examined over a multilingual database consisting of English, German, and Hindi. Deep neural networks are used to classify four emotions: happy, sad, angry, and neutral. Evaluation results show that the proposed TMFCC fusion, with a recognition rate of 92.7%, outperforms both TEO and MFCC, whose recognition rates are 88.5% and 90.0%, respectively.
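
For readers unfamiliar with the Teager Energy Operator named in the summary, the following is a minimal sketch of its standard discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. This record does not describe how the chapter combines TEO with MFCC to form TMFCC, so the sketch covers the operator only. The function name `teager_energy` and the test-tone values are hypothetical, chosen for illustration.

```python
import numpy as np


def teager_energy(x):
    """Standard discrete Teager Energy Operator.

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for every interior sample,
    so the output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative check on a pure tone (values are assumptions, not from the paper).
# For x[n] = A * cos(w * n + phi) the operator yields the constant A^2 * sin(w)^2,
# which is why it is read as tracking both amplitude and frequency.
amplitude, omega = 0.5, 0.3
n = np.arange(200)
tone = amplitude * np.cos(omega * n + 0.1)

psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

On the test tone, every element of `psi` equals `expected` up to floating-point error. On real speech the output varies sample by sample and would normally be computed per frame before any further feature extraction.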