Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation

The field of soundscape analysis and design is a nascent one, with the framework defined in ISO 12913-1:2014 calling for the understanding of an "acoustic environment as perceived or experienced and/or understood by a person or people, in context". Consequently, one method to improve sound...

Bibliographic Details
Main Author:	Ooi, Kenneth Wen Rui
Other Authors:	Gan Woon Seng
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Engineering Physics Soundscape Soundscape augmentation Soundscape intervention Auditory masking Deep neural network
Online Access:	https://hdl.handle.net/10356/179452
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-179452
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering Physics Soundscape Soundscape augmentation Soundscape intervention Auditory masking Deep neural network
spellingShingle	Engineering Physics Soundscape Soundscape augmentation Soundscape intervention Auditory masking Deep neural network Ooi, Kenneth Wen Rui Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
description	The field of soundscape analysis and design is a nascent one, with the framework defined in ISO 12913-1:2014 calling for the understanding of an "acoustic environment as perceived or experienced and/or understood by a person or people, in context". Consequently, one method to improve soundscape quality is soundscape augmentation, whereby sounds are added to an existing soundscape via electroacoustic means to modify its perception. However, determining optimal or appropriate sounds to effect such perceptual changes requires listeners to be physically present at a location for the subjective evaluation of performance. Subjective evaluations are known to be the main bottleneck of soundscape studies in terms of time and resources, so being able to sidestep this requirement is crucial in the field of soundscape analysis and design, because urban planners and soundscape architects could then iterate faster through their ideas. Therefore, the overarching aim of this thesis is to provide insight into the following question: To what extent can we remove the human participant from the evaluation process by utilising appropriate design and modelling approaches? To achieve this, we (1) craft a large benchmark dataset of human responses to perceptual attributes of a representative variety of soundscapes in public urban environments that can be used to train generalisable models, (2) develop probabilistic models from the dataset comprising deep neural networks that capture the subjectivity in human evaluations of soundscapes, and (3) integrate such models in a real-life soundscape augmentation system requiring no human input to run. The significance of these contributions is apparent given the dearth of publicly available, large-scale benchmark datasets in existing soundscape literature, which has stymied the adoption of deep learning models in soundscape research due to their typical need for large datasets. Nonetheless, recent advances in deep learning models for acoustic tasks outside the field of soundscape research point to their applicability in soundscape analysis as well, which this thesis will also demonstrate. Highlights of the thesis include the benchmark dataset being the largest soundscape dataset with perceptual labels in the literature (25,440 data samples), a probabilistic loss function allowing for statistically significant improvements (up to 7.8%) over a standard loss function using the mean squared error in the prediction of "pleasantness" as defined in ISO 12913, a modular architecture allowing for the separation of masker and gain inputs for more efficient masker selection in an automated masker selection system, a multimodal expansion on that modular architecture allowing for significant improvements (up to 2.8%) over a model using purely acoustic information, and an in-situ validation of the automated masker selection system with acoustic-only information showing a significant improvement in the perceived pleasantness (up to 23.4% of the possible range in raw ratings and 15.0% as defined by ISO 12913) of the soundscapes at pavilions in green spaces exposed to road traffic noise.
author2	Gan Woon Seng
author_facet	Gan Woon Seng Ooi, Kenneth Wen Rui
format	Thesis-Doctor of Philosophy
author	Ooi, Kenneth Wen Rui
author_sort	Ooi, Kenneth Wen Rui
title	Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
title_short	Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
title_full	Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
title_fullStr	Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
title_full_unstemmed	Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
title_sort	artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/179452
_version_	1814047079037665280
spelling	sg-ntu-dr.10356-179452 2024-09-04T07:56:36Z Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation Ooi, Kenneth Wen Rui Gan Woon Seng School of Electrical and Electronic Engineering Digital Signal Processing Laboratory EWSGAN@ntu.edu.sg Engineering Physics Soundscape Soundscape augmentation Soundscape intervention Auditory masking Deep neural network
Doctor of Philosophy 2024-08-01T06:25:25Z 2024-08-01T06:25:25Z 2024 Thesis-Doctor of Philosophy Ooi, K. W. R. (2024). Artificial intelligence for urban soundscape augmentation: a benchmark dataset, probabilistic models, and real-life validation. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/179452 https://hdl.handle.net/10356/179452 10.32657/10356/179452 en COT-V4-2020-1 GCP205559654 10.21979/N9/9OTEVX 10.21979/N9/0KYIAU This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University


Similar Items