Real-time multimodal affect recognition in laughter episodes
Main Author: Santos, Jose Miguel
Format: text
Language: English
Published: Animo Repository, 2013
Subjects: Emotion recognition; Laughter
Online Access: https://animorepository.dlsu.edu.ph/etd_masteral/4383
Institution: De La Salle University

id: oai:animorepository.dlsu.edu.ph:etd_masteral-11221
record_format: eprints
building: De La Salle University Library
continent: Asia
country: Philippines
content_provider: De La Salle University Library
collection: DLSU Institutional Repository

description:
Emotion recognition is a widely studied subject due to its importance in human interaction and decision making. Recognizing emotion in laughter is particularly important because laughter can signal non-basic affective states such as distress, anxiety, and boredom. Existing systems, however, are unable to classify the emotion of laughter in real time. This research proposes a real-time multimodal affect recognition system for laughter episodes that uses facial expressions and voiced laughter as modalities.
The system takes a video stream as input, either from a web camera with an attached microphone for audio or from a video file. Because laughter takes place over a period of time rather than in a single frame, the system segments the stream into windows 1.62 seconds in length. Within each window, image and audio data are extracted, and the facial action units (AUs) at the apex of the window are detected. At the end of each window, the pitch and MFCC values of the audio collected within the window are computed, and decision-level fusion is applied to the audio and face features. The resulting features are then passed to the emotion recognition model, which produces the final valence and arousal values for the window.
The emotion recognition model achieved correlation coefficients of 0.68 for valence and 0.61 for arousal on the Semaine corpus, and 0.75 for valence and 0.83 for arousal on the Pinoy Laughter 2 corpus. The overhead of the whole emotion recognition process is 610.98 ms; this overhead is difficult to eliminate completely because of the number of processing steps required to perform emotion recognition.
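
The pipeline described in the abstract (fixed 1.62-second windows, AU detection at the window apex, pitch and MFCC extraction, decision-level fusion, and a per-window valence/arousal output) can be sketched in code. The Python sketch below is illustrative only: the window length and feature types come from the abstract, while the frame rate, audio sample rate, the choice of the middle frame as the apex, the averaging fusion rule, and every helper function are assumptions made here, not the thesis's actual implementation.

```python
# Minimal sketch of the windowed multimodal pipeline described in the abstract.
# Assumptions (not taken from the thesis): the 25 fps frame rate, the 16 kHz
# audio rate, the middle frame standing in for the laughter apex, the simple
# averaging fusion rule, and all helper functions below are placeholders.
import numpy as np

WINDOW_SECONDS = 1.62   # window length stated in the abstract
VIDEO_FPS = 25          # assumed video frame rate
AUDIO_SR = 16_000       # assumed audio sample rate


def detect_action_units(apex_frame):
    """Placeholder AU detector: returns a vector of facial action unit activations."""
    return np.zeros(17)  # dummy output so the sketch runs end to end


def extract_pitch_and_mfcc(audio_window):
    """Placeholder audio front end: returns pitch and MFCC statistics for one window."""
    return np.zeros(14)  # e.g. 1 pitch value + 13 MFCCs (dummy output)


def predict_from_face(au_vector):
    """Placeholder face-only regressor: AU vector -> (valence, arousal)."""
    return np.zeros(2)


def predict_from_audio(audio_features):
    """Placeholder audio-only regressor: pitch/MFCC features -> (valence, arousal)."""
    return np.zeros(2)


def process_window(frames, audio_window):
    """Run one 1.62 s window: detect AUs at the apex, extract audio features,
    then fuse the two per-modality predictions at the decision level."""
    apex_frame = frames[len(frames) // 2]        # middle frame as a stand-in for the apex
    au_vector = detect_action_units(apex_frame)
    audio_features = extract_pitch_and_mfcc(audio_window)

    face_pred = predict_from_face(au_vector)
    audio_pred = predict_from_audio(audio_features)

    # Decision-level fusion: combine the two (valence, arousal) predictions.
    # A plain average is assumed here; the abstract does not state the fusion rule.
    valence, arousal = (face_pred + audio_pred) / 2.0
    return float(valence), float(arousal)


def run(stream_frames, stream_audio):
    """Segment the stream into consecutive 1.62 s windows and emit one
    (valence, arousal) pair per window."""
    frames_per_window = int(WINDOW_SECONDS * VIDEO_FPS)    # 40 frames
    samples_per_window = int(WINDOW_SECONDS * AUDIO_SR)    # 25920 samples
    results = []
    for start in range(0, len(stream_frames) - frames_per_window + 1, frames_per_window):
        frames = stream_frames[start:start + frames_per_window]
        a0 = int(start / VIDEO_FPS * AUDIO_SR)
        audio = stream_audio[a0:a0 + samples_per_window]
        results.append(process_window(frames, audio))
    return results
```

As a usage note, ten seconds of 25 fps frames and 16 kHz audio passed to run() would produce six windows and six (valence, arousal) pairs; in a real system the per-window cost (the roughly 611 ms overhead reported in the abstract) would come from actual AU detection, pitch/MFCC extraction, and model inference rather than from these placeholders.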