Speech emotion recognition using WaveNet

Speech emotion recognition is known to be a challenging and complex task for machine learning models. Two challenges in speech emotion recognition are that 1) human emotions are hard to distinguish and 2) emotion can only be detected at specific moments in an utter...

Bibliographic Details
Main Author: Nurul Sabrina Mohammed Riduwan
Other Authors: Jagath C Rajapakse
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/156592
Institution: Nanyang Technological University
id sg-ntu-dr.10356-156592
record_format dspace
spelling sg-ntu-dr.10356-156592 2022-04-21T00:28:52Z Speech emotion recognition using WaveNet Nurul Sabrina Mohammed Riduwan Jagath C Rajapakse School of Computer Science and Engineering ASJagath@ntu.edu.sg Engineering::Computer science and engineering Bachelor of Engineering (Computer Science) 2022-04-21T00:28:52Z 2022-04-21T00:28:52Z 2022 Final Year Project (FYP) Nurul Sabrina Mohammed Riduwan (2022). Speech emotion recognition using WaveNet. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156592 https://hdl.handle.net/10356/156592 en SCSE21-0421 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description Speech emotion recognition is known to be a challenging and complex task for machine learning models. Two challenges in speech emotion recognition are that 1) human emotions are hard to distinguish and 2) emotion can only be detected at specific moments in an utterance. This paper proposes a Speech Emotion Recognition (SER) architecture inspired by the WaveNet architecture. The architecture relies neither on tedious pre-processing nor on recurrent layers. The novelty of our approach lies in using both speech waveforms and audio features as inputs, causal dilated convolutions for capturing temporal dependencies, and a self-attention mechanism. Self-attention lets the inputs interact with each other, so that the model attends closely to the valuable parts of the input and learns the connections between them. We demonstrate improved SER performance with our model on the EMO-DB dataset over existing baseline models. Index Terms: speech emotion recognition, self-attention, deep learning, computational paralinguistics
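The description names two building blocks, causal dilated convolutions and self-attention, without showing them. The following is a minimal sketch of both in plain Python, offered only as an illustration and not as the author's implementation. All function names are hypothetical, the filter weights are fixed rather than learned, and the attention uses identity projections (queries, keys and values are the input frames themselves) in place of the learned projection matrices a trained model would use. WaveNet's gated activations and residual connections are omitted.

```python
import math


def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated 1-D convolution (illustrative, fixed weights).

    y[t] = sum_k weights[k] * x[t - k * dilation], where samples before
    the start of the signal count as zero. Output t never depends on
    samples later than t, which is what makes the filter causal.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past samples seen by a stack of causal dilated layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries = keys = values = frames. Each output frame is a
    weighted average of all input frames, weighted by similarity.
    """
    d = len(frames[0])
    outputs = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in frames]
        attn = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(attn, frames)) for j in range(d)])
    return outputs
```

With a kernel size of 2 and dilations doubling as 1, 2, 4, 8, `receptive_field(2, [1, 2, 4, 8])` gives 16 samples from only four layers, which is how stacked dilations cover long temporal context without recurrent layers.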
author2 Jagath C Rajapakse
author_facet Jagath C Rajapakse
Nurul Sabrina Mohammed Riduwan
format Final Year Project
author Nurul Sabrina Mohammed Riduwan
author_sort Nurul Sabrina Mohammed Riduwan
title Speech emotion recognition using WaveNet
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/156592
_version_ 1731235748223385600