MODELING PROSODIC FEATURE FOR EMOTION RECOGNITION IN INDONESIAN SPOKEN LANGUAGE
Spoken language conveys information not only through words but also through fundamental frequency (pitch), intensity (volume), speaking rate and rhythm, and timbre, collectively known as prosody. A neural network-based model is developed to create an emotion classification syst...
Main Author: | Mirza Fathan Al Arsyad, M |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/72129 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:72129 |
---|---|
spelling |
id-itb.:72129; 2023-03-06T09:11:18Z; MODELING PROSODIC FEATURE FOR EMOTION RECOGNITION IN INDONESIAN SPOKEN LANGUAGE; Mirza Fathan Al Arsyad, M; Indonesia; Final Project; emotion, speech, prosody, neural network; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/72129; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Spoken language conveys information not only through words but also through fundamental frequency (pitch), intensity (volume), speaking rate and rhythm, and timbre, collectively known as prosody. This work develops a neural network-based model for classifying emotion in Indonesian speech. Prosody has long been studied as a speech feature, and prosodic features have been modeled for many purposes, including automatic speech recognition, emotion identification, and dialogue act classification, as well as in other studies of prosody in conversation.
The emotion recognition model is built on a corpus of segmented Indonesian speech drawn from various audio sources. Features are extracted from the corpus using feature sets such as eGeMAPS and INTERSPEECH 2009, and a neural network-based model is then trained on the extracted features.
The baseline system, trained on the eGeMAPS feature set, achieves an F-measure of 0.39. The best result, an F-measure of 0.568, is obtained with the eGeMAPS feature set reduced to 14 prosodic features and with SMOTE oversampling applied to handle class imbalance in the dataset. Other neural network training techniques were also explored, such as dropout layers and early stopping, along with principal component analysis to reduce the dimensionality of the dataset. |
format |
Final Project |
author |
Mirza Fathan Al Arsyad, M |
title |
MODELING PROSODIC FEATURE FOR EMOTION RECOGNITION IN INDONESIAN SPOKEN LANGUAGE |
url |
https://digilib.itb.ac.id/gdl/view/72129 |
_version_ |
1822992440969134080 |
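The abstract reports that the best result relied on SMOTE to handle class imbalance. The record does not describe the implementation, so the following is only a minimal standard-library sketch of the general SMOTE idea: each synthetic sample is interpolated between a minority-class sample and one of its nearest same-class neighbours. The function names `smote_oversample` and `balance`, the neighbour count `k`, and the toy data are assumptions made for illustration and are not taken from the thesis.

```python
import math
import random


def smote_oversample(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points from one minority class (SMOTE-style).

    Each synthetic point lies on the line segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Sort the other samples by Euclidean distance and keep the k closest.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = list(features), list(labels)
    for label, rows in by_class.items():
        if len(rows) < target:
            extra = smote_oversample(rows, target - len(rows), k, seed)
            out_x.extend(extra)
            out_y.extend([label] * len(extra))
    return out_x, out_y


# Toy two-dimensional data with hypothetical class labels (not from the thesis).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
balanced_X, balanced_y = balance(X, y)
```

In a sketch like this, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data used to compute the F-measure.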