Emotion analysis from speech

Speech is the first form of communication that humans use instinctively, and our emotions are often expressed through it. Emotion in speech helps us form interpersonal connections, and emotions are conveyed through specific acoustic patterns. Speech emotion recognition systems extract those acoustic features to identify emotions in utterances and analyse the link between the features and their respective emotions. Various techniques exist for speech emotion recognition, such as deep neural networks and hidden Markov models. In this report, we focus on deep learning techniques that infer emotion from speech, using models from an existing work that approaches the task as an image classification problem. We focus on three networks: AlexNet, a Fully Convolutional Network with Global Average Pooling, and a Residual Network (ResNet). As the first two networks were previously trained on the IEMOCAP corpus, ResNet is also trained on it to compare the models' performance. The three models are then retrained on a down-sampled IEMOCAP corpus and the THAI SER corpus. The models were evaluated using k-fold cross-validation, in line with publications using the same approach. The models from Ng [1] are used as a benchmark for the ResNet model implemented here. From the experiments conducted, no single model achieved high accuracy across the different corpora. The Stability Training implemented in [1] was updated with tuning of the α parameter and the addition of environmental noise. Of the three models, the Fully Convolutional Network achieved a 0.9% increase in accuracy over its result in [1], surpassing the benchmark accuracy of AlexNet by 0.2%.
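The abstract's approach of treating speech emotion recognition as image classification can be sketched roughly: convert the waveform into a log-spectrogram "image" and feed that 2-D array to a CNN such as AlexNet, FCN, or ResNet. The sketch below is illustrative only and is not the report's implementation; the function name is hypothetical, and crude frequency-band pooling stands in for a proper mel filter bank (which a real pipeline would build with a library such as librosa).

```python
import numpy as np

def log_spectrogram_image(signal, frame_len=256, hop=128, n_bins=40):
    """Frame the waveform, take a magnitude FFT per frame, pool the
    linear frequency bins into n_bins bands, and take the log.
    Returns a (n_bins, n_frames) array usable as a CNN input "image"."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        mag = np.abs(np.fft.rfft(frame))       # magnitude spectrum
        bands = np.array_split(mag, n_bins)    # crude stand-in for mel bands
        frames.append([b.mean() for b in bands])
    return np.log(np.array(frames).T + 1e-8)   # log compresses dynamic range

# one second of a synthetic 8 kHz tone in place of a real utterance
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
img = log_spectrogram_image(sig)
print(img.shape)  # (40, 61): the "image" a CNN classifier would consume
```

In the actual work, such spectrogram images are what AlexNet-, FCN-, and ResNet-style networks classify into emotion labels.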


Bibliographic Details
Main Author: Mus'ifah Amran
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/153198
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-153198
record_format dspace
spelling sg-ntu-dr.10356-1531982021-11-16T05:15:00Z Emotion analysis from speech Mus'ifah Amran Chng Eng Siong School of Computer Science and Engineering ASESChng@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Speech is the first form of communication that humans use instinctively, and our emotions are often expressed through it. Emotion in speech helps us form interpersonal connections, and emotions are conveyed through specific acoustic patterns. Speech emotion recognition systems extract those acoustic features to identify emotions in utterances and analyse the link between the features and their respective emotions. Various techniques exist for speech emotion recognition, such as deep neural networks and hidden Markov models. In this report, we focus on deep learning techniques that infer emotion from speech, using models from an existing work that approaches the task as an image classification problem. We focus on three networks: AlexNet, a Fully Convolutional Network with Global Average Pooling, and a Residual Network (ResNet). As the first two networks were previously trained on the IEMOCAP corpus, ResNet is also trained on it to compare the models' performance. The three models are then retrained on a down-sampled IEMOCAP corpus and the THAI SER corpus. The models were evaluated using k-fold cross-validation, in line with publications using the same approach. The models from Ng [1] are used as a benchmark for the ResNet model implemented here. From the experiments conducted, no single model achieved high accuracy across the different corpora. The Stability Training implemented in [1] was updated with tuning of the α parameter and the addition of environmental noise. Of the three models, the Fully Convolutional Network achieved a 0.9% increase in accuracy over its result in [1], surpassing the benchmark accuracy of AlexNet by 0.2%.
Bachelor of Engineering (Computer Science) 2021-11-16T03:26:08Z 2021-11-16T03:26:08Z 2021 Final Year Project (FYP) Mus'ifah Amran (2021). Emotion analysis from speech. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153198 https://hdl.handle.net/10356/153198 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Mus'ifah Amran
Emotion analysis from speech
description Speech is the first form of communication that humans use instinctively, and our emotions are often expressed through it. Emotion in speech helps us form interpersonal connections, and emotions are conveyed through specific acoustic patterns. Speech emotion recognition systems extract those acoustic features to identify emotions in utterances and analyse the link between the features and their respective emotions. Various techniques exist for speech emotion recognition, such as deep neural networks and hidden Markov models. In this report, we focus on deep learning techniques that infer emotion from speech, using models from an existing work that approaches the task as an image classification problem. We focus on three networks: AlexNet, a Fully Convolutional Network with Global Average Pooling, and a Residual Network (ResNet). As the first two networks were previously trained on the IEMOCAP corpus, ResNet is also trained on it to compare the models' performance. The three models are then retrained on a down-sampled IEMOCAP corpus and the THAI SER corpus. The models were evaluated using k-fold cross-validation, in line with publications using the same approach. The models from Ng [1] are used as a benchmark for the ResNet model implemented here. From the experiments conducted, no single model achieved high accuracy across the different corpora. The Stability Training implemented in [1] was updated with tuning of the α parameter and the addition of environmental noise. Of the three models, the Fully Convolutional Network achieved a 0.9% increase in accuracy over its result in [1], surpassing the benchmark accuracy of AlexNet by 0.2%.
author2 Chng Eng Siong
author_facet Chng Eng Siong
Mus'ifah Amran
format Final Year Project
author Mus'ifah Amran
author_sort Mus'ifah Amran
title Emotion analysis from speech
title_short Emotion analysis from speech
title_full Emotion analysis from speech
title_fullStr Emotion analysis from speech
title_full_unstemmed Emotion analysis from speech
title_sort emotion analysis from speech
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/153198
_version_ 1718368031460032512