Emotion recognition in Filipino speech: EMOTICON

Bibliographic Details
Main Authors: Chua, Joan L., De Guia, Oliver S., Li, Carlson C., Rojas, Joanna Fatima B.
Format: text
Language: English
Published: Animo Repository 2009
Online Access:https://animorepository.dlsu.edu.ph/etd_bachelors/14627
Institution: De La Salle University
Description
Summary: Accurate recognition of emotions in speech greatly benefits speech interfaces between humans and computers. It adds to the appeal of electronic systems by contributing to the user's perception of the system's intelligence and adaptability. However, feature extraction and classification algorithms remain open issues in emotion recognition, and existing systems lose accuracy when applied to other languages such as Filipino. This paper proposes a system capable of recognizing different emotional states from Filipino-language utterances. The system identifies acoustic features that correlate with the following emotional states: happiness, sadness, anger, fear, surprise, disgust, and neutral. Algorithms from existing emotion recognition systems served as a guide in selecting the algorithms and features expected to yield higher accuracy. The emotional classifier was implemented using a linear search to locate the K nearest neighbors: it computes the Euclidean distances between feature vectors and classifies the input's emotion based on its nearest neighbors. The system extracted a minimal acoustic feature set that uniquely identified each emotion. Pitch, energy, duration, and formants were the acoustic features extracted; among these, pitch and energy were chosen as the minimal feature set based on the tests conducted. Using good-quality speech samples and the minimal feature set, the system achieved a recognition accuracy of 40.12%.
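
The summary describes the classifier concretely enough to sketch: a K-nearest-neighbors classifier implemented as a linear search over Euclidean distances between feature vectors, using the minimal pitch-and-energy feature set. The sketch below is a hypothetical reconstruction, not the thesis code; the feature values, the choice of k, and the helper names (euclidean, classify_emotion) are illustrative assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors, e.g. [pitch, energy]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_emotion(sample, training_set, k=3):
    """Classify `sample` by a linear search for its k nearest neighbors.

    `training_set` is a list of (feature_vector, emotion_label) pairs;
    the predicted label is the majority vote among the k closest vectors.
    """
    # Linear search: compute the distance from the sample to every training vector.
    distances = [(euclidean(sample, vec), label) for vec, label in training_set]
    distances.sort(key=lambda pair: pair[0])
    # Majority vote over the k nearest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical training data: [mean pitch (Hz), mean energy] per utterance.
    training = [
        ([310.0, 0.82], "happiness"),
        ([180.0, 0.35], "sadness"),
        ([295.0, 0.90], "anger"),
        ([250.0, 0.55], "neutral"),
    ]
    # Prints the predicted emotion for a hypothetical test utterance.
    print(classify_emotion([300.0, 0.85], training, k=3))
```

In this scheme, adding a new emotion class requires no retraining, only more labeled (feature vector, label) pairs, though the linear search makes each classification O(n) in the size of the training set.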