MACHINE SPEECH CHAIN WITH EMOTION RECOGNITION

Bibliographic Details
Main Author: Pradia Naufal, Akeyla
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/82051
Institution: Institut Teknologi Bandung
Description
Summary: Humans convey meaning in speech by using appropriate emotion, so speech recognition and synthesis systems must be able to understand and convey those emotions. Building a good system requires speech data with real emotions, but this type of data is difficult to obtain. A machine speech chain uses unpaired data to continue training speech recognition and speech synthesis models that were previously trained on paired data. Because unpaired data is more abundant than paired data, a machine speech chain could be used to recognize emotions in speech for which training data is difficult to obtain. This paper uses speech data with natural emotion and speech data with various emotions to measure the usefulness of the machine speech chain in speech emotion recognition and in speech recognition from emotional speech. Character Error Rate (CER) is used to evaluate speech recognition, and accuracy and F1 score are used to evaluate speech emotion recognition. For the model trained with 50% of the paired neutral-emotion speech data and 22% of the paired non-neutral emotional speech data, CER fell from 37.552% to 34.523% after further training with unpaired neutral-emotion speech data, and from 37.552% to 33.749% after further training with the combined unpaired speech data. Accuracy on non-neutral emotions increased by 2.18% to 53.51%, but the F1 score showed a worsening trend, with changes ranging from a rise of 20.6% to a decrease of 23.4%. Taken together, these two metrics indicate that the model is biased towards the majority class.
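The three metrics named in the summary can be illustrated with a minimal Python sketch. It is not the author's evaluation code. It assumes CER is the character-level edit distance divided by the reference length, and it computes F1 per class, since the summary does not say how F1 was averaged. The function names, the emotion labels, and the toy data are all hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true: list, y_pred: list) -> dict:
    """F1 score for each label; 0.0 when a label has no TP, FP, or FN."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy data: 8 neutral and 2 angry utterances, with a model
# that always predicts the majority class.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

print(cer("abcd", "abxd"))           # one substitution over 4 characters: 0.25
print(accuracy(y_true, y_pred))      # 0.8
print(f1_per_class(y_true, y_pred))  # F1 for "angry" is 0.0
```

In this toy case accuracy is 0.8 even though the minority class is never predicted and its F1 is 0.0. A gap of this kind between accuracy and F1 is the usual sign of a model biased towards the majority class, which is the pattern the summary describes.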