Multilanguage speech-based gender classification using time-frequency features and SVM classifier
Main Authors:
Format: Book Chapter
Language: English
Published: Springer, 2021
Online Access:
http://irep.iium.edu.my/86116/15/Presentation%20Schedule%20iCITES2020%202nd.pdf
http://irep.iium.edu.my/86116/21/86116_Multilanguage%20speech-based%20gender%20classification.pdf
http://irep.iium.edu.my/86116/27/86116_Multilanguage%20speech-based%20gender%20classification_SCOPUS.pdf
http://irep.iium.edu.my/86116/
https://icites2020.ump.edu.my/index.php/en/
Institution: Universiti Islam Antarabangsa Malaysia
Summary: Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI). Being unparalleled in complexity, human speech is very hard to perceive automatically. Gender is one of the most crucial characteristics of speech, and pitch is often used to classify it. However, pitch alone is not a reliable cue for gender classification because, in numerous cases, the pitch of female and male speakers is nearly the same. In this paper, we propose a time-frequency method for gender classification based on the speech signal. Techniques such as framing, the Fast Fourier Transform (FFT), auto-correlation, filtering, power calculation, speech frequency analysis, and feature extraction and formation are applied to the speech samples. Classification is performed with the Support Vector Machine (SVM) algorithm on features derived from frequency- and time-domain processing. The SVM is trained on two speech databases, Berlin Emo-DB and IITKGP-SEHSC, from which a total of 400 speech samples are evaluated. Accuracies of 83% and 81% were observed for IITKGP-SEHSC and Berlin Emo-DB, respectively.
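As a rough illustration of the kind of pipeline the summary describes (framing, FFT, auto-correlation, power calculation, and SVM classification), a minimal Python sketch is given below. It is not the authors' implementation: the frame length, hop size, feature set (mean log frame power, spectral centroid, and an autocorrelation-based pitch estimate), and SVM settings are all assumptions, and it uses NumPy, SciPy, and scikit-learn rather than whatever toolchain the chapter used.

```python
# Illustrative sketch only: frame-based time-frequency features + SVM,
# loosely following the pipeline outlined in the abstract. Frame sizes,
# the feature set, and the SVM settings are assumptions, not the paper's.
import numpy as np
from scipy.io import wavfile
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def extract_features(path, frame_ms=25, hop_ms=10):
    """Per-file features: mean log frame power, mean spectral centroid,
    and an autocorrelation-based pitch estimate (all assumed choices)."""
    sr, x = wavfile.read(path)
    x = x.astype(np.float64)
    if x.ndim > 1:                         # mix stereo down to mono
        x = x.mean(axis=1)
    x /= (np.abs(x).max() + 1e-12)         # normalise amplitude
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)

    power = (frames ** 2).mean(axis=1)            # time-domain frame power
    spec = np.abs(np.fft.rfft(frames, axis=1))    # frequency-domain (FFT)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    centroid = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-12)

    # crude pitch estimate from the autocorrelation of the loudest frame
    f = frames[power.argmax()]
    ac = np.correlate(f, f, mode="full")[frame_len - 1:]
    lo, hi = int(sr / 400), int(sr / 60)          # search roughly 60-400 Hz
    pitch = sr / (lo + ac[lo:hi].argmax())

    return [np.log(power.mean() + 1e-12), centroid.mean(), pitch]

def train_gender_svm(wav_paths, labels):
    """Hypothetical usage: wav_paths / labels would come from a
    gender-labelled corpus such as Berlin Emo-DB or IITKGP-SEHSC."""
    X = np.array([extract_features(p) for p in wav_paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25)
    clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
    return clf, accuracy_score(y_te, clf.predict(X_te))
```

In practice the features would be computed per utterance and the SVM's kernel parameters tuned by cross-validation; the sketch only mirrors the broad time-frequency-plus-SVM structure of the method, not its reported results.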