Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation

Main Author: Ooi, Kenneth Wen Rui
Other Authors: Gan Woon Seng
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
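Subjects: Engineering; Physics; Soundscape; Soundscape augmentation; Soundscape intervention; Auditory masking; Deep neural network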
Online Access: https://hdl.handle.net/10356/179452
Institution: Nanyang Technological University
id sg-ntu-dr.10356-179452
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering
Physics
Soundscape
Soundscape augmentation
Soundscape intervention
Auditory masking
Deep neural network
description The field of soundscape analysis and design is a nascent one, with the framework defined in ISO 12913-1:2014 calling for the understanding of an "acoustic environment as perceived or experienced and/or understood by a person or people, in context". Consequently, one method to improve soundscape quality is soundscape augmentation, whereby sounds are added to an existing soundscape via electroacoustic means to modify its perception. However, determining optimal or appropriate sounds to effect such perceptual changes requires listeners to be physically present at a location to evaluate performance subjectively. Subjective evaluations are known to be the main bottleneck of soundscape studies in terms of time and resources, so being able to sidestep this requirement is crucial in soundscape analysis and design: urban planners and soundscape architects could then iterate faster through their ideas. Therefore, the overarching aim of this thesis is to provide insight into the following question: To what extent can we remove the human participant from the evaluation process by utilising appropriate design and modelling approaches? To achieve this, we (1) craft a large benchmark dataset of human responses to perceptual attributes of a representative variety of soundscapes in public urban environments that can be used to train generalisable models, (2) develop probabilistic models, comprising deep neural networks, from the dataset to capture the subjectivity in human evaluations of soundscapes, and (3) integrate such models in a real-life soundscape augmentation system requiring no human input to run. The significance of these contributions is apparent given the dearth of publicly available, large-scale benchmark datasets in existing soundscape literature, which has stymied the adoption of deep learning models in soundscape research due to their typical need for large datasets.
Nonetheless, recent advances in deep learning models for acoustic tasks outside the field of soundscape research suggest their applicability in soundscape analysis as well, which this thesis also demonstrates. Highlights of the thesis include: the benchmark dataset, which is the largest soundscape dataset with perceptual labels in the literature (25,440 data samples); a probabilistic loss function allowing for statistically significant improvements (up to 7.8%) over a standard mean squared error loss function in the prediction of "pleasantness" as defined in ISO 12913; a modular architecture that separates masker and gain inputs for more efficient masker selection in an automated masker selection system; a multimodal expansion of that modular architecture allowing for significant improvements (up to 2.8%) over a model using purely acoustic information; and an in-situ validation of the automated masker selection system with acoustic-only information, showing a significant improvement in the perceived pleasantness (up to 23.4% of the possible range in raw ratings and 15.0% as defined by ISO 12913) of the soundscapes at pavilions in green spaces exposed to road traffic noise.
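As a rough illustration of two quantities named in the description, the sketch below computes a normalised "pleasantness" score from six attribute ratings, following the commonly used ISO/TS 12913-3 projection for 5-point scales, and contrasts a squared-error loss with a Gaussian negative log-likelihood. The Gaussian form is an assumption chosen for illustration only; the record does not state which probabilistic loss the thesis uses. All function and parameter names are hypothetical.

```python
import math

COS45 = math.cos(math.pi / 4)


def iso_pleasantness(pleasant, annoying, calm, chaotic, vibrant, monotonous):
    """Project six 1-to-5 attribute ratings onto the pleasantness axis.

    Dividing by (4 + sqrt(32)) scales the result to the range [-1, 1].
    """
    raw = (
        (pleasant - annoying)
        + COS45 * (calm - chaotic)
        + COS45 * (vibrant - monotonous)
    )
    return raw / (4 + math.sqrt(32))


def squared_error(target, mean):
    """Standard squared error for a single point prediction."""
    return (target - mean) ** 2


def gaussian_nll(target, mean, variance):
    """Negative log-likelihood of `target` under N(mean, variance).

    Assumption: a Gaussian output distribution, used here only as one
    example of a loss that models spread in listener responses.
    """
    return 0.5 * (math.log(2 * math.pi * variance) + (target - mean) ** 2 / variance)


if __name__ == "__main__":
    # Hypothetical ratings from one listener on a 1-to-5 scale.
    target = iso_pleasantness(4, 2, 4, 2, 3, 3)
    predicted_mean = 0.1
    print("target pleasantness:", target)
    print("squared error:", squared_error(target, predicted_mean))
    # The same mean is scored differently depending on the predicted spread.
    for variance in (0.01, 0.25):
        print("variance", variance, "NLL:", gaussian_nll(target, predicted_mean, variance))
```

Unlike the squared error, the likelihood-based loss depends on the predicted variance, so a model is penalised for being confidently wrong and can express how much listeners disagree about a given soundscape.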
spelling sg-ntu-dr.10356-179452 2024-09-04T07:56:36Z
Ooi, Kenneth Wen Rui
Gan Woon Seng
School of Electrical and Electronic Engineering
Digital Signal Processing Laboratory
EWSGAN@ntu.edu.sg
Doctor of Philosophy
2024-08-01T06:25:25Z
2024
Thesis-Doctor of Philosophy
Ooi, K. W. R. (2024). Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179452
10.32657/10356/179452
en
COT-V4-2020-1
GCP205559654
10.21979/N9/9OTEVX
10.21979/N9/0KYIAU
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University