Robust representation and recognition of facial emotions

Facial emotion detection under natural conditions is an interesting topic with a wide range of potential applications, such as human-computer interaction. Although research in this field has made significant progress, challenges remain in real-world, unconstrained situations. One essenti...

Bibliographic Details
Main Author: Shojaeilangari, Seyedehsamaneh
Other Authors: Teoh Eam Khwang
Format: Theses and Dissertations
Language: English
Published: 2015
Subjects:
Online Access: https://hdl.handle.net/10356/62922
Institution: Nanyang Technological University
id sg-ntu-dr.10356-62922
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering::Control and instrumentation
description Facial emotion detection under natural conditions is an interesting topic with a wide range of potential applications, such as human-computer interaction. Although research in this field has made significant progress, challenges remain in real-world, unconstrained situations. One essential challenge is to find pose-invariant spatio-temporal volumetric features that allow a video sequence to be analyzed efficiently. Another important issue is how to deal with noisy and imperfect data recorded in uncontrolled environments, where illumination variations, partial occlusion, and head movements occur. The focus of this research is to develop a robust system for recognizing facial expression as a dynamic event in natural situations. Two strategies are proposed to address the challenges of uncontrolled environments. Robust representation framework: we propose a novel spatio-temporal descriptor based on optical flow (OF) components that is highly distinctive and pose-invariant. Robust recognition framework: we explore the effectiveness of sparse representation obtained by supervised learning of a set of basis vectors (a dictionary). Extreme Sparse Learning (ESL) is proposed to jointly learn a dictionary and a nonlinear classification model to robustly detect facial expressions in real-world natural situations. The proposed approach combines the discriminative power of the Extreme Learning Machine (ELM) with the reconstruction property of sparse representation to deal with noisy signals and imperfect data recorded in natural settings.
Since facial feature extraction performance depends heavily on facial pose, we propose a novel spatio-temporal descriptor that is robust to facial pose variations. However, the feature encoding may fail in the presence of extreme head pose variations, where some parts of the face are not visible in the recorded images. To address this problem, and also to deal with illumination variations and occlusion, we follow the idea of sparse representation, in which noisy data can be reconstructed from the clean data provided by the dictionary. While the sparse representation approach can enhance noisy data using a dictionary learned from clean data, this is not sufficient, because the end goal is to recognize the facial expression correctly. In a sparse-representation-based classification task, the desired dictionary should have both representational ability and discriminative power. Since separating classifier training from dictionary learning may leave the learned dictionary sub-optimal for the classification task, we propose to learn the dictionary and the classification model jointly. In other words, in contrast with most existing schemes, which update the dictionary and the classifier parameters alternately by iteratively solving each sub-problem, we propose to solve for them simultaneously. This joint dictionary learning and classifier training can be expected to yield a dictionary that is both reconstructive and discriminative for a robust recognition system. To the best of our knowledge, this is the only work that attempts to simultaneously learn the sparse representation of the signal and train a nonlinear classifier to be discriminative for sparse codes. The proposed method jointly learns a single dictionary and an optimal nonlinear classifier.
We have performed extensive experiments on both acted and spontaneous emotion databases to evaluate the effectiveness of the proposed feature extraction and classification schemes under different scenarios. Our results clearly demonstrate the robustness of the proposed emotion recognition framework, especially in challenging scenarios that involve illumination changes, occlusion, and head pose variations.
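The decoupled, two-stage pipeline that the description contrasts with joint learning can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the thesis's Extreme Sparse Learning method: it sparse-codes signals over a fixed dictionary using ISTA (a standard iterative soft-thresholding solver, chosen here as an assumption) and then trains a basic Extreme Learning Machine on the resulting codes. All function names, parameter values, and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np


def sparse_code(X, D, lam=0.1, n_iter=200):
    """ISTA: for each column x of X, approximately minimise
    0.5 * ||x - D a||^2 + lam * ||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = A - D.T @ (D @ A - X) / L  # gradient step on the quadratic term
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A


def elm_train(A, Y, n_hidden=50, reg=1e-3, seed=0):
    """Basic ELM: random, untrained hidden layer; output weights are
    fitted by ridge-regularised least squares. Y is one-hot, shape
    (n_classes, n_samples)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, A.shape[0]))
    b = rng.standard_normal((n_hidden, 1))
    H = np.tanh(W @ A + b)  # hidden activations, (n_hidden, n_samples)
    beta = np.linalg.solve(H @ H.T + reg * np.eye(n_hidden), H @ Y.T)
    return W, b, beta


def elm_predict(A, W, b, beta):
    """Return the index of the highest-scoring class for each column of A."""
    H = np.tanh(W @ A + b)
    return np.argmax(beta.T @ H, axis=0)


def demo(seed=0):
    """Synthetic two-class data: each class draws its atoms from its own
    half of a random dictionary (an assumption made for illustration)."""
    rng = np.random.default_rng(seed)
    n_dim, n_atoms, n_per_class = 30, 40, 50
    D = rng.standard_normal((n_dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
    labels = np.repeat([0, 1], n_per_class)
    A_true = np.zeros((n_atoms, labels.size))
    for j, c in enumerate(labels):
        atoms = rng.choice(np.arange(20 * c, 20 * c + 20), size=3, replace=False)
        A_true[atoms, j] = rng.uniform(1.0, 2.0, size=3)
    X = D @ A_true + 0.05 * rng.standard_normal((n_dim, labels.size))  # noisy signals

    A = sparse_code(X, D)      # stage 1: sparse codes over a fixed dictionary
    Y = np.eye(2)[labels].T    # one-hot targets, (n_classes, n_samples)
    W, b, beta = elm_train(A, Y)  # stage 2: classifier trained on the codes
    pred = elm_predict(A, W, b, beta)
    return X, D, A, labels, pred


if __name__ == "__main__":
    X, D, A, labels, pred = demo()
    print("training accuracy:", np.mean(pred == labels))
    print("relative reconstruction error:",
          np.linalg.norm(X - D @ A) / np.linalg.norm(X))
```

In this two-stage form the dictionary never sees the class labels, which is the limitation that the joint formulation described above is meant to remove.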
author2 Teoh Eam Khwang
format Theses and Dissertations
author Shojaeilangari, Seyedehsamaneh
title Robust representation and recognition of facial emotions
publishDate 2015
url https://hdl.handle.net/10356/62922
_version_ 1772827412631912448
spelling sg-ntu-dr.10356-62922 2023-07-04T16:31:49Z Robust representation and recognition of facial emotions Shojaeilangari, Seyedehsamaneh Teoh Eam Khwang Yau Wei Yun School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Control and instrumentation DOCTOR OF PHILOSOPHY (EEE) 2015-05-04T02:18:13Z 2015-05-04T02:18:13Z 2014 2014 Thesis Shojaeilangari, S. (2014). Robust representation and recognition of facial emotions. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/62922 10.32657/10356/62922 en 139 p. application/pdf