MODELING PROSODIC FEATURE FOR EMOTION RECOGNITION IN INDONESIAN SPOKEN LANGUAGE

Bibliographic Details
Main Author:	Mirza Fathan Al Arsyad, M
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/72129
Institution:	Institut Teknologi Bandung

Description

Spoken language conveys information not only through words but also through fundamental frequency (pitch), intensity (volume), speaking rate and rhythm, and timbre, collectively known as prosody. This project develops a neural-network-based emotion classification system for Indonesian speech. Prosody has long been studied as a speech feature, and prosodic features have been modeled for many purposes, including automatic speech recognition, emotion identification, dialogue act classification, and other analyses of conversation. The emotion recognition model is built on a corpus of segmented Indonesian speech collected from various audio sources. Features are extracted from the corpus using feature sets such as eGeMAPS and INTERSPEECH 2009, and a neural network is then trained on the extracted features. Trained on prosodic features from the eGeMAPS feature set, the system achieves an F-measure of 0.39, which serves as the baseline for the study. The best result, an F-measure of 0.568, is obtained by reducing the eGeMAPS set to 14 prosodic features and applying SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance in the dataset. The study also explores other methods for improving the network, such as dropout layers and early stopping, as well as principal component analysis (PCA) to reduce the dimensionality of the dataset.
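The description credits SMOTE with handling class imbalance in the emotion dataset. The sketch below shows how SMOTE produces synthetic minority-class samples: each new sample is a random point on the line segment between a minority-class feature vector and one of its k nearest minority-class neighbours. It is an illustrative sketch under stated assumptions, not the project's code (which this record does not include). The function names `smote` and `balance`, the default `k=5`, and the toy data are hypothetical; in practice a library implementation such as the `SMOTE` class in imbalanced-learn would typically be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic SMOTE samples from a list of
    minority-class feature vectors (e.g. per-utterance prosodic features)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: x + gap * (neighbour - x), with gap drawn from [0, 1).
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Hypothetical helper: oversample every class up to the size of the
    largest class; original samples come first, synthetic ones are appended."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(list(x))
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [list(x) for x in features], list(labels)
    for label, samples in by_class.items():
        needed = target - len(samples)
        if needed > 0:
            out_x.extend(smote(samples, needed, k=k, seed=seed))
            out_y.extend([label] * needed)
    return out_x, out_y
```

For example, with three toy "angry" vectors and five "neutral" vectors (made-up two-dimensional data, not from the corpus), `balance(X, y, k=2)` appends two synthetic "angry" vectors so that both classes have five samples. SMOTE is normally applied only to the training split, so that synthetic samples do not leak into the data used to compute the F-measure.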

