Sound classification using sound spectrum features and convolutional neural networks

Bibliographic Details
Main Authors: Tan, Ki In, Yean, Seanglidet, Lee, Bu-Sung
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/177694
Institution: Nanyang Technological University
Description
Summary: This paper proposes an alternative approach to sound classification that uses sound spectrum features instead of Mel-Frequency Cepstral Coefficients (MFCC). In line with the crowdsourced data collection application NoiseCapture, the data are kept in the form of the post-processed sound spectrum rather than raw audio files, to protect the privacy of volunteers. Under these circumstances, MFCC, which requires audio processing, cannot be obtained directly from the sound spectrum data stored in the application, nor can it make full use of the features in that data. Because the sound spectrum does not undergo further feature transformation, it retains audio features from the audio file and should therefore be classifiable when passed into a model trained on sound spectrum data. Hence, this study evaluates whether the sound spectrum can be used as a replacement for MFCC, especially when the audio file is inaccessible. The UrbanSound8K dataset and a mix of deep learning and machine learning models were used for the comparison. Experimental results show that the sound spectrum achieves comparable results with a Convolutional Neural Network (CNN), with better predictions than its MFCC counterpart. Further comparisons indicate that sound spectrum data needs more fine-tuning when non-CNN models are used for sound classification, owing to the shape of the input features.
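To make the idea of a "sound spectrum" feature concrete, the sketch below shows one way such a feature grid could be built: per-frame magnitude spectra in decibels, stacked into a time-by-frequency array of the kind a CNN can take as input. This is only an illustration under stated assumptions and not the authors' pipeline. The function names, frame length, hop size, and the synthetic test tone are all hypothetical, and the naive DFT is used purely to keep the example dependency-free. MFCC extraction would add further steps on top of a spectrum like this (a mel filterbank, a logarithm, and a discrete cosine transform), which is the extra transformation the summary refers to.

```python
import cmath
import math


def frame_spectrum_db(frame):
    """Magnitude spectrum in dB for one frame (naive DFT, non-negative frequencies)."""
    n = len(frame)
    levels = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        magnitude = abs(acc) / n
        # Clamp to avoid log10(0) for empty bins.
        levels.append(20.0 * math.log10(max(magnitude, 1e-12)))
    return levels


def spectrum_features(signal, frame_len=64, hop=32):
    """Stack per-frame spectra into a time x frequency grid (list of lists)."""
    return [frame_spectrum_db(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Hypothetical input: a 1 kHz tone sampled at 8 kHz (values chosen for illustration).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

features = spectrum_features(tone)
n_frames, n_bins = len(features), len(features[0])

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(n_bins), key=lambda k: features[0][k])
print(n_frames, n_bins, peak_bin * sample_rate / 64)
```

The resulting grid has one row per frame and one column per frequency bin. In this toy setup the strongest bin sits at the tone's frequency, which is the kind of structure a convolutional model could pick up directly from the spectrum.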