Multilanguage speech-based gender classification using time-frequency features and SVM classifier


Bibliographic Details
Main Authors: Wani, Taiba, Gunawan, Teddy Surya, Mansor, Hasmah, Ahmad Qadri, Syed Asif, Sophian, Ali, Ambikairajah, Eliathamby, Ihsanto, Eko
Format: Book Chapter
Language: English
Published: Springer 2021
Online Access:http://irep.iium.edu.my/86116/15/Presentation%20Schedule%20iCITES2020%202nd.pdf
http://irep.iium.edu.my/86116/21/86116_Multilanguage%20speech-based%20gender%20classification.pdf
http://irep.iium.edu.my/86116/27/86116_Multilanguage%20speech-based%20gender%20classification_SCOPUS.pdf
http://irep.iium.edu.my/86116/
https://icites2020.ump.edu.my/index.php/en/
Institution: Universiti Islam Antarabangsa Malaysia
Description
Summary: Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI). Because speech is unparalleled in complexity, perceiving it is very hard. The most crucial characteristic of speech is gender, and pitch is often used to classify it. However, pitch is not a reliable basis for gender classification because in numerous cases the pitch of female and male speakers is nearly the same. In this paper, we propose a time-frequency method for classifying gender based on the speech signal. Various techniques, including framing, the Fast Fourier Transform (FFT), auto-correlation, filtering, power calculations, speech frequency analysis, and feature extraction and formation, are applied to the speech samples. Classification is performed on features derived from frequency- and time-domain processing using the Support Vector Machine (SVM) algorithm. The SVM is trained on two speech databases, Berlin Emo-DB and IITKGP-SEHSC, from which a total of 400 speech samples are evaluated. Accuracies of 83% and 81% were observed for IITKGP-SEHSC and Berlin Emo-DB, respectively.
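
The summary names framing, auto-correlation, and power calculations as processing steps but gives no implementation details. The following is a minimal, illustrative sketch of how such time-domain features might be computed. It is not the authors' code. The function names, the frame length and hop size, the 60-400 Hz pitch search range, and the synthetic sine-wave test signals are all assumptions made for the example, and the FFT-based features and the SVM stage are left out.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    The estimate is the lag with the largest autocorrelation value
    inside an assumed search range of fmin..fmax.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag

def short_time_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def extract_features(signal, sample_rate, frame_len=400, hop=200):
    """Return [mean pitch in Hz, mean power] averaged over all frames."""
    frames = frame_signal(signal, frame_len, hop)
    pitches = [autocorr_pitch(f, sample_rate) for f in frames]
    powers = [short_time_power(f) for f in frames]
    return [sum(pitches) / len(frames), sum(powers) / len(frames)]

def synthetic_tone(freq, sample_rate=8000, duration=0.1):
    """Pure sine wave standing in for a voiced speech segment (assumption)."""
    count = int(sample_rate * duration)
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)]

# Two toy signals with different fundamental frequencies.
low_voice = extract_features(synthetic_tone(100.0), 8000)
high_voice = extract_features(synthetic_tone(200.0), 8000)
print(low_voice, high_voice)
```

In a full pipeline, feature vectors like these, together with frequency-domain features, would be passed to an SVM classifier (for example, `sklearn.svm.SVC` from scikit-learn) trained on labeled recordings. The toy signals here differ only in pitch, which the summary itself notes is not sufficient on real speech.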