Mood recognition through vocal prosody recognition

Bibliographic Details
Main Author: Wong, Yi Ben.
Other Authors: Seet Gim Lee, Gerald
Format: Final Year Project
Language: English
Published: 2013
Online Access: http://hdl.handle.net/10356/53290
Institution: Nanyang Technological University
Description
Summary: Mood recognition through vocal prosody is designed to predict a person's mood from speech profiles. Existing speech-based applications such as Microsoft's "Speech to Text", iOS's "Siri", and Android's "S Voice" execute actions ordered by users, but they do not offer a mood recognition function. This project therefore seeks to develop a software package for recognizing mood from human speech. Speaker-Dependent and Speaker-Independent modes were investigated to develop a Real-Time Emotion Recognition System. Speech databases were collected and studied to improve the emotion recognition system, since the speech database is one of the factors that defines the quality of an emotion recognition model. In addition, a process for handling the speech database was proposed to improve accuracy, and several experiments were completed toward this improvement. The speech database was reviewed by other users to verify the quality of the recordings in terms of expressing moods. Experimental results showed that the Speaker-Dependent mode provides higher accuracy than the Speaker-Independent mode, and similar research was found to support this finding. The number of emotions used in the emotion recognition system also affects the accuracy of recognizing mood from speech. Emotion-basis data division was found to give better accuracy than Speaker-basis data division when processing the speech database to train the emotion recognition model. Human evaluation of the speech database showed that listeners are less accurate at predicting others' emotions across cultural backgrounds.
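
To make the data-division comparison concrete, below is a minimal sketch in Python of one plausible reading of the two strategies; the toy database, speaker and emotion names, and 13-dimensional feature vectors are all hypothetical stand-ins, not the thesis's actual data or method. Emotion-basis division is read here as splitting the utterances of each emotion class between training and test sets, while Speaker-basis division holds out whole speakers (which also mirrors the Speaker-Independent evaluation mentioned above).

import random
from collections import defaultdict

random.seed(0)

# Toy speech database: (speaker_id, emotion_label, feature_vector).
# Speakers, emotions, and the 13-dim "MFCC-like" features are
# hypothetical stand-ins for the thesis's actual speech database.
speakers = ["s1", "s2", "s3", "s4"]
emotions = ["happy", "sad", "angry"]
samples = [(spk, emo, [random.random() for _ in range(13)])
           for spk in speakers for emo in emotions for _ in range(10)]

def emotion_basis_split(data, test_frac=0.2):
    # Split within each emotion class, so every emotion appears in
    # both train and test; the same speakers may occur in both sets.
    by_emotion = defaultdict(list)
    for rec in data:
        by_emotion[rec[1]].append(rec)
    train, test = [], []
    for recs in by_emotion.values():
        random.shuffle(recs)
        cut = int(len(recs) * test_frac)
        test.extend(recs[:cut])
        train.extend(recs[cut:])
    return train, test

def speaker_basis_split(data, test_speakers):
    # Hold out whole speakers: test utterances come only from voices
    # the model never saw in training (speaker-independent setting).
    train = [rec for rec in data if rec[0] not in test_speakers]
    test = [rec for rec in data if rec[0] in test_speakers]
    return train, test

train_e, test_e = emotion_basis_split(samples)
train_s, test_s = speaker_basis_split(samples, test_speakers={"s4"})
print(len(train_e), len(test_e))   # 96 24: every emotion in both sets
print(len(train_s), len(test_s))   # 90 30: speaker s4 entirely held out

Under this reading, the speaker-basis split is the harder condition, since test utterances come from voices the model never saw, which is consistent with the abstract's finding that the Speaker-Dependent mode achieves higher accuracy.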