MODELING PROSODIC FEATURE FOR EMOTION RECOGNITION IN INDONESIAN SPOKEN LANGUAGE

Bibliographic Details
Main Author: Mirza Fathan Al Arsyad, M
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/72129
Institution: Institut Teknologi Bandung
Record ID: id-itb.:72129
Record timestamp: 2023-03-06T09:11:18Z
Keywords: emotion, speech, prosody, neural network
Library: Institut Teknologi Bandung Library
Collection: Digital ITB
Country: Indonesia

Description

In spoken language, information is conveyed not only through words but also through fundamental frequency (pitch), intensity (volume), speaking rate and rhythm, and timbre, which are collectively known as prosody. This study develops a neural network-based model to classify emotions in Indonesian speech. Prosody has long been studied as a speech feature, and prosodic features have been modeled for many purposes, including automatic speech recognition, emotion identification, dialogue act classification, and other analyses of conversational speech. To build the emotion recognition model, a corpus of segmented Indonesian speech drawn from various audio sources is used. Features are extracted from the corpus with feature sets such as eGeMAPS and INTERSPEECH 2009, and a neural network-based model is then trained on the extracted prosodic features. The system using the eGeMAPS feature set achieves an F-measure of 0.39, which serves as the baseline for the study. The highest F-measure, 0.568, was obtained with the eGeMAPS feature set reduced to 14 prosodic features and with SMOTE applied to handle class imbalance in the dataset. Other optimization methods were also explored, including dropout layers, early stopping, and principal component analysis (PCA) to reduce the dimensionality of the dataset.
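The abstract credits SMOTE with the largest gain and reports results as F-measure scores. The thesis code is not part of this record, so the following is only a minimal, dependency-free Python sketch of those two ideas, assuming each utterance is represented as a plain list of floats (for example, the 14 reduced prosodic features). The names `smote`, `balance_with_smote`, and `macro_f1` are hypothetical helpers, and macro averaging is an assumption: the record does not say how the F-measure was averaged across emotion classes.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE sketch (not the thesis implementation).

    Each synthetic point lies on the segment between a randomly chosen
    minority-class sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F-measure)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for prosodic features (illustrative only).
    X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
         [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [5.5, 5.5]]
    y = ["sad"] * 3 + ["happy"] * 5
    Xb, yb = balance_with_smote(X, y, k=2)
    print(Counter(yb))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Oversampling should be applied to the training split only, so that synthetic points derived from test utterances never leak into training. For real experiments, library versions such as `imblearn.over_sampling.SMOTE` and `sklearn.metrics.f1_score(..., average="macro")` would normally replace these hand-written helpers.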