The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
Human speech indirectly reflects the speaker's mental state or emotions. Artificial Intelligence (AI)-based techniques for recognizing emotion from speech may prove transformative. In this study, we introduce a robust method for emotion recognition from human speech using...
Main Authors: Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim
Format: Article
Language: English
Published: Tech Science Press, 2022
Subjects: BF Psychology; Q Science (General); TA Engineering (General). Civil engineering (General)
Online Access: http://eprints.sunway.edu.my/2250/1/28.pdf
http://eprints.sunway.edu.my/2250/
https://doi.org/10.32604/cmc.2023.031177
Institution: Sunway University
id: my.sunway.eprints.2250
record_format: eprints
spelling: my.sunway.eprints.2250 2023-06-16T01:38:32Z
http://eprints.sunway.edu.my/2250/
The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim
BF Psychology; Q Science (General); TA Engineering (General). Civil engineering (General)
Human speech indirectly reflects the speaker's mental state or emotions. Artificial Intelligence (AI)-based techniques for recognizing emotion from speech may prove transformative. In this study, we introduce a robust method for emotion recognition from human speech that combines an effective preprocessing technique with a deep learning-based mixed model consisting of a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). About 2800 audio files were extracted from the Toronto Emotional Speech Set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven emotion classes were considered: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel-Frequency Cepstral Coefficients (MFCC) were used as emotion features, and these features yielded 97.5% accuracy with the mixed LSTM+CNN model. The mixed model was found to outperform the usual state-of-the-art models in emotion recognition from speech, indicating that it could be effectively utilized in advanced research on sound processing.
Tech Science Press, 2022-09-22. Article. PeerReviewed. text. en. cc_by_4.
http://eprints.sunway.edu.my/2250/1/28.pdf
Citation: Uddin, Mohammad Amaz and Chowdury, Mohammad Salah Uddin and Khandaker, Mayeen Uddin and Tamam, Nissren and Sulieman, Abdelmoneim (2022) The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition. Computers, Materials & Continua, 74 (1). pp. 1709-1722. ISSN 1546-2226. https://doi.org/10.32604/cmc.2023.031177
institution: Sunway University
building: Sunway Campus Library
collection: Institutional Repository
continent: Asia
country: Malaysia
content_provider: Sunway University
content_source: Sunway Institutional Repository
url_provider: http://eprints.sunway.edu.my/
language: English
topic: BF Psychology; Q Science (General); TA Engineering (General). Civil engineering (General)
spellingShingle: BF Psychology; Q Science (General); TA Engineering (General). Civil engineering (General); Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim; The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
description: Human speech indirectly reflects the speaker's mental state or emotions. Artificial Intelligence (AI)-based techniques for recognizing emotion from speech may prove transformative. In this study, we introduce a robust method for emotion recognition from human speech that combines an effective preprocessing technique with a deep learning-based mixed model consisting of a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). About 2800 audio files were extracted from the Toronto Emotional Speech Set (TESS) database for this study. A high-pass filter and a Savitzky-Golay filter were used to obtain noise-free, smooth audio data. Seven emotion classes were considered: Angry, Disgust, Fear, Happy, Neutral, Pleasant-surprise, and Sad. Energy, fundamental frequency, and Mel-Frequency Cepstral Coefficients (MFCC) were used as emotion features, and these features yielded 97.5% accuracy with the mixed LSTM+CNN model. The mixed model was found to outperform the usual state-of-the-art models in emotion recognition from speech, indicating that it could be effectively utilized in advanced research on sound processing.
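The preprocessing and feature-extraction pipeline named in the abstract (high-pass filtering, Savitzky-Golay smoothing, and MFCC features) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the function names, the first-order pre-emphasis used as the high-pass step, and all frame, window, and filterbank parameters are assumptions chosen for demonstration.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Simple first-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def savgol_smooth(x, window=11, polyorder=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit per centred window."""
    half = window // 2
    pos = np.arange(-half, half + 1)
    A = np.vander(pos, polyorder + 1, increasing=True)
    kernel = np.linalg.pinv(A)[0]  # weights that evaluate the fit at the window centre
    return np.convolve(x, kernel, mode="same")  # kernel is symmetric, so conv == corr

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Frame -> Hann window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # triangular mel filterbank between 0 Hz and the Nyquist frequency
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    logmel = np.log(spec @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T
```

In practice, library implementations such as `scipy.signal.savgol_filter` and `librosa.feature.mfcc` would typically replace these hand-rolled versions; the sketch only makes the record's described pipeline concrete.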
format: Article
author: Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim
author_facet: Uddin, Mohammad Amaz; Chowdury, Mohammad Salah Uddin; Khandaker, Mayeen Uddin *; Tamam, Nissren; Sulieman, Abdelmoneim
author_sort: Uddin, Mohammad Amaz
title: The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
title_short: The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
title_full: The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
title_fullStr: The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
title_full_unstemmed: The Efficacy of Deep Learning-Based Mixed Model for Speech Emotion Recognition
title_sort: efficacy of deep learning-based mixed model for speech emotion recognition
publisher: Tech Science Press
publishDate: 2022
url: http://eprints.sunway.edu.my/2250/1/28.pdf
http://eprints.sunway.edu.my/2250/
https://doi.org/10.32604/cmc.2023.031177
_version_: 1769846265551519744