MACHINE SPEECH CHAIN WITH EMOTION RECOGNITION

Humans communicate with appropriate emotion in speech to convey the intended meaning, so speech recognition and synthesis systems must be able to understand and express those emotions. Producing a good system requires speech data with genuine emotions, but this type of data is difficult to obtain. The machine speech chain uses unpaired data to continue training speech recognition and speech synthesis models that were previously trained on paired data. Because unpaired data is more abundant than paired data, the machine speech chain could be used to recognize emotion in speech, a task for which training data is difficult to obtain. This final project uses speech data with natural emotion and speech data with various emotions to evaluate the machine speech chain for speech emotion recognition and for speech recognition on emotional speech. Character Error Rate (CER) is used to evaluate speech recognition, while accuracy and F1 score are used to evaluate speech emotion recognition. A model trained with 50% of the paired neutral-emotion speech data and 22% of the paired non-neutral emotional speech data lowered its CER from 37.552% to 34.523% when trained further with unpaired neutral-emotion speech data, and from 37.552% to 33.749% when trained further with the combined unpaired speech data. Accuracy on non-neutral emotions increased by 2.18% to 53.51%, but the F1 scores tended to worsen, ranging from a rise of 20.6% to a decrease of 23.4%. Together, these two metrics indicate that the model is biased toward the majority class.
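The abstract reports Character Error Rate (CER) for speech recognition and accuracy with F1 for speech emotion recognition, and reads rising accuracy alongside worsening F1 as majority-class bias. The record does not include the thesis code, so the Python sketch below only illustrates the standard definitions of these metrics under that reading; the function names and toy labels are hypothetical.

# Illustrative only: standard CER and accuracy / macro-F1 definitions,
# not the evaluation code used in the thesis.

def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level Levenshtein distance / number of reference characters."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

def accuracy_and_macro_f1(y_true, y_pred):
    """Accuracy plus macro-averaged F1; a large gap between the two is a
    common symptom of a classifier biased toward the majority class."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return accuracy, sum(f1_scores) / len(f1_scores)

if __name__ == "__main__":
    # One substituted character out of four reference characters: CER = 0.25.
    print(character_error_rate("pagi", "lagi"))
    # Toy imbalanced set: predicting only "neutral" keeps accuracy at 0.8
    # while macro F1 drops to about 0.30, mirroring the bias noted above.
    y_true = ["neutral"] * 8 + ["angry", "sad"]
    y_pred = ["neutral"] * 10
    print(accuracy_and_macro_f1(y_true, y_pred))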


Bibliographic Details
Main Author: Pradia Naufal, Akeyla
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/82051
Institution: Institut Teknologi Bandung
Keywords: speech recognition, speech emotion recognition, machine speech chain, unpaired data