A benchmark comparison of perceptual models for soundscapes on a large-scale augmented soundscape dataset

Bibliographic Details
Main Authors: Ooi, Kenneth, Watcharasupat, Karn N., Lam, Bhan, Ong, Zhen-Ting, Gan, Woon-Seng
Other Authors: School of Electrical and Electronic Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/164983
https://www.ica2022korea.org/data/Proceedings_A14.pdf
https://ica2022korea.org/
Institution: Nanyang Technological University
Description
Summary: Psychoacoustic indicators and spectrogram-based features are standard inputs to perceptual models for soundscape analysis. However, existing models in the soundscape literature are trained on different collections of input parameters and often on mutually exclusive datasets, which complicates fair comparisons of model performance and of generalizability to unseen data and soundscapes. Hence, we use the ARAUS dataset, a large-scale, publicly available dataset of 25,440 responses to unique augmented soundscapes, as a common benchmark for comparing the relative performance of a selection of input and model types used in previous soundscape studies, as well as deep neural network architectures commonly used for other acoustic tasks. The different model types were first used in a regression task to predict the “Eventfulness” ratings (as defined in ISO/TS 12913-3) that participants gave in response to the augmented soundscapes. Subsequently, we compared their performance as classifiers that assign soundscapes to the quadrants formed by the Pleasantness-Eventfulness axes of the ISO/TS 12913-3 circumplex model. To facilitate unbiased comparisons, we used the two parts that make up the ARAUS dataset: a five-fold cross-validation set of 25,200 responses and an independent test set of 240 responses.
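
The quadrant classification mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the assumption that both ratings are already centred on zero, and the handling of values lying exactly on an axis are all hypothetical choices made for illustration. The quadrant names follow the usual reading of the circumplex model (pleasant and eventful as "vibrant", pleasant and uneventful as "calm", annoying and eventful as "chaotic", annoying and uneventful as "monotonous").

```python
def circumplex_quadrant(pleasantness: float, eventfulness: float) -> str:
    """Map a (pleasantness, eventfulness) pair to a circumplex quadrant.

    Assumption: both values are centred on zero, so the sign of each
    value decides the side of the axis. Values exactly on an axis are
    assigned to the non-negative side (an arbitrary illustrative choice).
    """
    if pleasantness >= 0:
        return "vibrant" if eventfulness >= 0 else "calm"
    return "chaotic" if eventfulness >= 0 else "monotonous"


if __name__ == "__main__":
    # Hypothetical ratings, not taken from the ARAUS dataset.
    examples = [(0.6, 0.4), (0.6, -0.4), (-0.6, 0.4), (-0.6, -0.4)]
    for p, e in examples:
        print(p, e, circumplex_quadrant(p, e))
```

A regression model that predicts both axes could be turned into a quadrant classifier by passing its two outputs through such a function, although the summary does not state whether the compared models were built this way.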