Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
Format: | Thesis-Doctor of Philosophy |
---|---|
Language: | English |
Published: | Nanyang Technological University, 2024 |
Institution: | Nanyang Technological University |
Online Access: | https://hdl.handle.net/10356/179452 |
Summary:

The field of soundscape analysis and design is a nascent one, with the framework defined in ISO 12913-1:2014 calling for an understanding of an "acoustic environment as perceived or experienced and/or understood by a person or people, in context". Consequently, one method to improve soundscape quality is soundscape augmentation, whereby sounds are added to an existing soundscape via electroacoustic means to modify its perception. However, determining optimal or appropriate sounds to effect such perceptual changes requires listeners to be physically present at a location to evaluate performance subjectively. Subjective evaluations are known to be the main bottleneck of soundscape studies in terms of time and resources, so sidestepping this requirement would be crucial for soundscape analysis and design: urban planners and soundscape architects could then iterate through their ideas faster.
Therefore, the overarching aim of this thesis is to provide insight into the following question: To what extent can we remove the human participant from the evaluation process by utilising appropriate design and modelling approaches? To achieve this, we (1) craft a large benchmark dataset of human responses to perceptual attributes of a representative variety of soundscapes in public urban environments, which can be used to train generalisable models; (2) develop, from this dataset, probabilistic models comprising deep neural networks that capture the subjectivity in human evaluations of soundscapes; and (3) integrate such models into a real-life soundscape augmentation system that requires no human input to run.
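Item (2) can be made concrete by looking at the training loss. The abstract does not state which probabilistic loss the thesis uses; a common choice, assumed here purely for illustration, is the Gaussian negative log-likelihood (NLL), in which the network predicts both a mean and a log-variance for each soundscape, so that disagreement among listeners is modelled rather than averaged away. The sketch below contrasts it with the mean squared error (MSE); the function names and the example ratings are hypothetical.

```python
import math
from typing import Sequence


def mse_loss(y_true: Sequence[float], mu: Sequence[float]) -> float:
    """Mean squared error between observed ratings and point predictions."""
    return sum((y - m) ** 2 for y, m in zip(y_true, mu)) / len(y_true)


def gaussian_nll_loss(
    y_true: Sequence[float], mu: Sequence[float], log_var: Sequence[float]
) -> float:
    """Mean Gaussian negative log-likelihood (illustrative probabilistic loss).

    For each sample the model outputs a mean `mu` and a log-variance
    `log_var`. A large predicted variance down-weights the squared error
    but is penalised by the `log_var` term, so the variance is pushed
    towards the actual spread of the listeners' ratings.
    """
    total = 0.0
    for y, m, lv in zip(y_true, mu, log_var):
        total += 0.5 * (math.log(2 * math.pi) + lv + (y - m) ** 2 / math.exp(lv))
    return total / len(y_true)


# Hypothetical example: four listeners rate the same soundscape, and the
# model predicts a single mean of 0.3 for it.
ratings = [0.2, 0.6, -0.1, 0.5]
mean_pred = [0.3] * len(ratings)
print(mse_loss(ratings, mean_pred))  # approximately 0.075
```

With the log-variance fixed at 0, the Gaussian NLL reduces to half the MSE plus a constant, so MSE can be read as the special case that assumes every soundscape elicits the same spread of ratings.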
The significance of these contributions is apparent given the dearth of publicly available, large-scale benchmark datasets in the existing soundscape literature, which has stymied the adoption of deep learning models in soundscape research because such models typically require large datasets. Nonetheless, recent advances in deep learning models for acoustic tasks outside the field of soundscape research suggest that they are applicable to soundscape analysis as well, which this thesis also demonstrates.
Highlights of the thesis include: the benchmark dataset, which is the largest soundscape dataset with perceptual labels in the literature (25,440 data samples); a probabilistic loss function that yields statistically significant improvements (up to 7.8%) over a standard mean squared error loss in the prediction of "pleasantness" as defined in ISO 12913; a modular architecture that separates the masker and gain inputs, allowing for more efficient masker selection in an automated masker selection system; a multimodal extension of that modular architecture that yields significant improvements (up to 2.8%) over a model using purely acoustic information; and an in-situ validation of the automated masker selection system using acoustic information only, which showed a significant improvement in the perceived pleasantness (up to 23.4% of the possible range in raw ratings and 15.0% as defined by ISO 12913) of the soundscapes at pavilions in green spaces exposed to road traffic noise.
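The ISO 12913 "pleasantness" cited in these results is defined in ISO/TS 12913-3:2019 as a projection of eight perceptual attribute ratings (pleasant, chaotic, vibrant, uneventful, calm, annoying, eventful, monotonous) onto a pleasantness axis, normalised to [-1, 1]. The following sketch computes it, assuming ratings on a 1 to 5 scale and assuming this is the formulation the thesis refers to; the function name and dictionary keys are illustrative, not taken from the thesis.

```python
import math
from typing import Mapping

# cos 45 degrees weights the four "diagonal" attributes (calm, chaotic,
# vibrant, monotonous) when projecting them onto the pleasantness axis.
COS45 = math.cos(math.radians(45))
# Largest attainable value of the weighted sum below for 1-5 ratings:
# 4 + 2 * 4 * cos 45 = 4 + sqrt(32), so the result lies in [-1, 1].
NORM = 4 + math.sqrt(32)


def iso_pleasantness(r: Mapping[str, float]) -> float:
    """Pleasantness in [-1, 1] from attribute ratings on a 1-5 scale."""
    raw = (
        (r["pleasant"] - r["annoying"])
        + COS45 * (r["calm"] - r["chaotic"])
        + COS45 * (r["vibrant"] - r["monotonous"])
    )
    return raw / NORM


# Hypothetical ratings from one listener. "eventful" and "uneventful" are
# included for completeness but do not enter the pleasantness axis.
ratings = {
    "pleasant": 4, "annoying": 2, "calm": 4, "chaotic": 2,
    "vibrant": 3, "monotonous": 3, "eventful": 3, "uneventful": 3,
}
print(round(iso_pleasantness(ratings), 3))  # prints 0.354
```

Because calm, chaotic, vibrant and monotonous sit at 45 degrees to the pleasantness axis in the ISO circumplex, they contribute with weight cos 45° rather than fully, while eventful and uneventful are orthogonal to that axis and drop out.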