Mood recognition through facial expression and speech cues

Bibliographic Details
Main Author: Tan, Jonathan Tian-Ci
Other Authors: Seet Gim Lee, Gerald
Format: Final Year Project
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/61007
Institution: Nanyang Technological University
Description
Summary: Interpreting emotions is important for comprehending one's environment, and knowing the emotional state of those around us can influence decision-making. Previous work has shown that both facial expressions and vocal data can be used to determine a person's emotions. However, those systems relied on a unimodal approach focused on a single cue, their recognition accuracy was low, and they depended heavily on specific hardware. This project presents a multimodal approach that integrates information from facial expressions and speech to recognize five emotions: anger, fear, neutral, happy, and sad. Several studies have shown that combining modes improves overall classification accuracy compared with using individual modes alone. Audio-visual data was collected from six participants in a controlled environment. The participants followed a fixed script to portray their emotions, so the data captured "posed" rather than spontaneous emotions. The facial-expression and voice features were extracted manually to support supervised machine learning, and methods such as Principal Component Analysis and Support Vector Machines were used to classify the emotions. The project was implemented in both C++ and MATLAB, and a working real-time MATLAB program was deployed on both Windows and Ubuntu. After training and optimization, the speech features achieved 94.3% classification accuracy, the facial expressions achieved 100% accuracy, and the combined data likewise produced an emotion classifier with 100% accuracy. The classifiers were speaker-dependent: the training and evaluation data came from the same group of speakers. The findings suggest that a multimodal system does not necessarily yield more accurate results; instead, adding more channels of feedback within the same mode may provide a more accurate model for distinguishing emotions.
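
The abstract names Principal Component Analysis and Support Vector Machines but does not reproduce the implementation. As a rough illustration only, the following C++ sketch shows how such a classification pipeline could look. It assumes OpenCV (which the thesis does not name), a 95% retained-variance threshold, an RBF kernel, and placeholder random data standing in for the real fused facial and voice features; none of these specifics come from the thesis itself.

// Hypothetical sketch: PCA dimensionality reduction followed by a multiclass
// SVM, assuming the facial and voice features have already been extracted
// into one row per sample. OpenCV is an assumption; the abstract only states
// that C++ and MATLAB were used.
#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>

int main() {
    // Placeholder training data: 60 samples x 40 fused features (CV_32F),
    // with labels 0..4 for anger, fear, neutral, happy, sad.
    cv::Mat features(60, 40, CV_32F);
    cv::Mat labels(60, 1, CV_32S);
    cv::randu(features, cv::Scalar(0.0), cv::Scalar(1.0));
    cv::randu(labels, cv::Scalar(0), cv::Scalar(5));

    // Project the features onto the principal components that retain 95% of
    // the variance (the retained-variance threshold is an assumption).
    cv::PCA pca(features, cv::Mat(), cv::PCA::DATA_AS_ROW, 0.95);
    cv::Mat reduced = pca.project(features);

    // Multiclass SVM (C_SVC) with an RBF kernel, trained on the reduced data.
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER, 1000, 1e-6));
    svm->train(reduced, cv::ml::ROW_SAMPLE, labels);

    // Classify a new sample by projecting it with the same PCA basis.
    cv::Mat sample = features.row(0);
    float predicted = svm->predict(pca.project(sample));
    (void)predicted;  // emotion index 0..4
    return 0;
}

In the actual project, the fused feature vectors extracted from the recordings would replace the random matrix, and a held-out evaluation set from the same six speakers would be used to obtain the reported speaker-dependent accuracies.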