Multilanguage speech-based gender classification using time-frequency features and SVM classifier

Bibliographic Details
Main Authors: Wani, Taiba; Gunawan, Teddy Surya; Mansor, Hasmah; Ahmad Qadri, Syed Asif; Sophian, Ali; Ambikairajah, Eliathamby; Ihsanto, Eko
Format: Book Chapter
Language: English
Published: Springer 2021
Online Access: http://irep.iium.edu.my/86116/15/Presentation%20Schedule%20iCITES2020%202nd.pdf
http://irep.iium.edu.my/86116/21/86116_Multilanguage%20speech-based%20gender%20classification.pdf
http://irep.iium.edu.my/86116/27/86116_Multilanguage%20speech-based%20gender%20classification_SCOPUS.pdf
http://irep.iium.edu.my/86116/
https://icites2020.ump.edu.my/index.php/en/
Description
Summary: Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI). Because human speech is unparalleled in complexity, perceiving it is very hard. The most crucial characteristic of speech is gender, and pitch is often used to classify it. However, pitch is not a reliable basis for gender classification because, in numerous cases, the pitch of female and male speakers is nearly the same. In this paper, we propose a time-frequency method for classifying gender based on the speech signal. Various techniques, including framing, Fast Fourier Transform (FFT), auto-correlation, filtering, power calculations, speech frequency analysis, and feature extraction and formation, are applied to the speech samples. Classification is performed with the Support Vector Machine (SVM) algorithm on features derived from frequency- and time-domain processing. The SVM is trained on two speech databases, Berlin Emo-DB and IITKGP-SEHSC, from which a total of 400 speech samples are evaluated. Accuracies of 83% and 81% were observed for IITKGP-SEHSC and Berlin Emo-DB, respectively.
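
As a rough illustration of the kind of processing the summary lists (framing, auto-correlation, FFT, power calculation), the following NumPy sketch computes three per-utterance features. It is not the authors' implementation: the function names, the 30 ms frame and 10 ms hop, the 60-400 Hz pitch search range, and the choice of spectral centroid as the frequency-domain feature are all assumptions made for this example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the strongest autocorrelation peak.

    The 60-400 Hz search range is an assumed, typical range for speech.
    """
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency (Hz) of a Hann-windowed frame."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float((freqs * spec).sum() / (spec.sum() + 1e-12))

def extract_features(x, sr, frame_ms=30, hop_ms=10):
    """Return [mean pitch, mean spectral centroid, mean frame power]."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    pitches = np.array([pitch_autocorr(f, sr) for f in frames])
    centroids = np.array([spectral_centroid(f, sr) for f in frames])
    power = (frames ** 2).mean(axis=1)
    return np.array([pitches.mean(), centroids.mean(), power.mean()])
```

Feature vectors of this kind, one per labelled speech sample, could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`; that step is omitted here because the record does not state the kernel or parameters used.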