Investigation of speech disfluencies classification on different threshold selection techniques using energy feature extraction / Raseeda Hamzah and Nursuriati Jamil
Filled pause and Elongation are the two types of speech disfluencies that need more suitable acoustical features to be classified correctly since they are always being misclassified. This work concentrates on developing an accurate and robust energy feature extraction for modelling filled pause and...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
Penerbit UiTM
2019
|
Subjects: | |
Online Access: | https://ir.uitm.edu.my/id/eprint/43819/1/43819.pdf https://ir.uitm.edu.my/id/eprint/43819/ https://mjoc.uitm.edu.my |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Universiti Teknologi Mara |
Language: | English |
Summary: | Filled pause and Elongation are the two types of speech disfluencies that need more suitable acoustical features to be classified correctly since they are always being misclassified. This work concentrates on developing an accurate and robust energy feature extraction for modelling filled pause and elongation by investigating different energy features using local maxima points of the speech energy. Method: In this paper, we extracted peak values from each frame of a voiced signal by implementing different thresholding techniques to classify filled pause and elongation. These energy features are evaluated by using statistical naïve Bayes classifier to see the contribution on the classification processes. Various samples of sustained syllables and filled pauses of spontaneous speech were extracted from Malaysian Parliamentary Debate Database of the year 2008. A naïve Bayes was used as a classifier. We performed F-measure evaluation to investigate the significant differences in mean of filled pause and elongation samples. Results: Results revealed that our proposed LM-E has increase the classification with up to 71% and 75% F-measure for elongation and filled pause. Conclusion: The best achieved accuracies in both filled pause and elongation classification were varied depending on the types of thresholding techniques applied during the local maxima of speech energy extraction. The most contributed thresholding technique is our proposed technique which is by using the adaptive height as the threshold that extracts the local maxima of the speech energy (LM-E). |
---|