MACHINE SPEECH CHAIN WITH EMOTION RECOGNITION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/82051 |
Institution: | Institut Teknologi Bandung |
Summary: | Humans express emotion in speech to convey the intended meaning, so
speech recognition and speech synthesis systems must be able to recognize and
convey emotions appropriately. Building such systems requires speech data with
real emotions, but this type of data is difficult to obtain.
A machine speech chain uses unpaired data to continue training speech recognition
and speech synthesis models that were previously trained on paired data. Because
unpaired data is more abundant than paired data, a machine speech chain could be
used to recognize emotions in speech for which paired training data is difficult to
obtain. This work uses speech data with natural emotions and speech data with
various emotions to evaluate the machine speech chain for speech emotion
recognition and for speech recognition of emotional speech. Speech recognition is
evaluated with Character Error Rate (CER), and speech emotion recognition is
evaluated with accuracy and F1 score.
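The speech-chain idea above can be sketched as one pass over unpaired data. This is a minimal, framework-free sketch assuming the commonly described formulation: speech-only utterances are transcribed by ASR and reconstructed by TTS (the reconstruction loss trains TTS), while text-only sentences are synthesized by TTS and transcribed back by ASR (the recognition loss trains ASR). The names `speech_chain_epoch`, `ChainLosses`, `speech_loss`, and `text_loss` are hypothetical, the models are plain callables, optimizer steps are left out, and the emotion-recognition component used in this work is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-ins: real systems use spectrogram tensors and token sequences.
Speech = List[float]
Text = str


@dataclass
class ChainLosses:
    """Losses from each half of the loop, one entry per unpaired sample."""
    tts_losses: List[float] = field(default_factory=list)  # from speech-only data
    asr_losses: List[float] = field(default_factory=list)  # from text-only data


def speech_chain_epoch(
    asr: Callable[[Speech], Text],
    tts: Callable[[Text], Speech],
    speech_loss: Callable[[Speech, Speech], float],
    text_loss: Callable[[Text, Text], float],
    unpaired_speech: Sequence[Speech],
    unpaired_text: Sequence[Text],
) -> ChainLosses:
    losses = ChainLosses()
    # Speech-only data: ASR transcribes, then TTS reconstructs the audio from
    # that transcript; the reconstruction loss is the signal for training TTS.
    for speech in unpaired_speech:
        transcript = asr(speech)
        reconstructed = tts(transcript)
        losses.tts_losses.append(speech_loss(speech, reconstructed))
    # Text-only data: TTS synthesizes audio, then ASR transcribes it back;
    # the recognition loss is the signal for training ASR.
    for text in unpaired_text:
        synthesized = tts(text)
        recognized = asr(synthesized)
        losses.asr_losses.append(text_loss(text, recognized))
    # Optimizer steps are deliberately omitted from this sketch.
    return losses
```

Passing toy callables (for example, a TTS that maps characters to their code points and an ASR that inverts that mapping) is enough to check the data flow; in a real system each loss would be backpropagated only into the model its half of the loop is meant to train.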
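CER is commonly defined as (S + D + I) / N, where S, D, and I are the character substitutions, deletions, and insertions in a minimum edit alignment between the reference and the recognized text, and N is the number of reference characters. The sketch below computes it with a Levenshtein distance. It assumes that spaces count as characters and that corpus-level CER sums edits and reference lengths over all utterances; the summary does not say how CER was aggregated, and the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def corpus_cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return edits / total_chars


print(edit_distance("kitten", "sitting"))  # → 3
print(corpus_cer(["abcd"], ["abxd"]))      # → 0.25
```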
It was found that the model initially trained on 50% of the paired neutral-emotion
speech data and 22% of the paired non-neutral-emotion speech data reduced its CER
from 37.552% to 34.523% when further trained on unpaired neutral-emotion speech
data, and from 37.552% to 33.749% when further trained on the combined unpaired
speech data. Accuracy on non-neutral emotions increased by 2.18% to 53.51%, but
F1 scores tended to worsen, with changes ranging from a rise of 20.6% to a decrease
of 23.4%. Together, these two metrics indicate that the model is biased towards the
majority class. |
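Rising accuracy combined with a worsening F1 score is a typical sign of a classifier that favors the majority class. The small illustration below assumes macro-averaged F1 (the summary does not state which averaging was used) and uses hypothetical emotion labels: a model that predicts "neutral" for every utterance scores well on accuracy over a neutral-dominated test set, yet its macro F1 is low because every minority class gets an F1 of zero.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of utterances whose predicted label matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 = 2TP / (2TP + FP + FN) for each label seen in either sequence."""
    pairs = list(zip(y_true, y_pred))
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so minority classes count equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical test set dominated by the majority (neutral) class.
y_true = ["neutral"] * 8 + ["angry", "happy"]
y_pred = ["neutral"] * 10  # the model always predicts the majority class
print(accuracy(y_true, y_pred))            # → 0.8
print(round(macro_f1(y_true, y_pred), 3))  # → 0.296
```

Macro averaging is used here because it weights each emotion equally, which is what exposes the bias; a support-weighted F1 would be dominated by the neutral class and hide it.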