Attention-based sound classification pipeline with sound spectrum

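Research on urban soundscapes and their impact is gaining prominence with regard to livable environments. Machine learning models have been used extensively to classify sounds, where the input sound data, commonly a waveform, must be collected across its full frequency spectrum. However, in an application like NoiseCapture, the sound spectrum is divided into 23 frequency bands, so some information or features are lost. Given the recent success in training a deep learning model to classify sounds from a limited sound spectrum, we developed a pipeline that maximizes classification performance on sound-spectrum input using an attention-based model. Using data from ESC-50, we found that transformers improve accuracy over conventional neural networks by 22.5%; however, the limited frequency bands of the NoiseCapture sound spectrum impair model accuracy, necessitating data augmentation. The data pipeline is analyzed in a case study of Singapore, where sound labels curated to fit the local context are used to train the model, improving base transformer accuracy by 12.7%.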
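
To make the information loss concrete, the NumPy sketch below pools a full-resolution FFT power spectrum into 23 bands. NoiseCapture's actual band edges are not given in this record, so the sketch assumes 23 logarithmically spaced bands over a hypothetical 100 Hz to 16 kHz range; `log_band_edges` and `pool_to_bands` are illustrative names, not part of NoiseCapture or the authors' code, and white noise stands in for a real recording.

```python
import numpy as np


def log_band_edges(f_min: float, f_max: float, n_bands: int) -> np.ndarray:
    """Return n_bands + 1 log-spaced band edges in Hz (hypothetical layout, not NoiseCapture's)."""
    return np.geomspace(f_min, f_max, n_bands + 1)


def pool_to_bands(power: np.ndarray, freqs: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Sum a (n_frames, n_bins) power spectrum into bands and return band levels in dB."""
    n_bands = len(edges) - 1
    # Band index for each FFT bin: b such that edges[b] <= f < edges[b + 1];
    # bins below the first edge get -1 and bins at or above the last edge get n_bands.
    idx = np.searchsorted(edges, freqs, side="right") - 1
    out = np.zeros((power.shape[0], n_bands))
    for b in range(n_bands):
        in_band = idx == b
        out[:, b] = power[:, in_band].sum(axis=1)  # an empty band sums to 0
    # A small floor keeps log10 finite if a band received no bins.
    return 10.0 * np.log10(out + 1e-12)


# Usage: 1 s of synthetic white noise stands in for a real recording.
sr, n_fft = 44100, 4096
rng = np.random.default_rng(0)
signal = rng.standard_normal(sr)
n_frames = len(signal) // n_fft
frames = signal[: n_frames * n_fft].reshape(n_frames, n_fft)
spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

edges = log_band_edges(100.0, 16000.0, 23)  # hypothetical 100 Hz to 16 kHz range
bands = pool_to_bands(spec, freqs, edges)
print(spec.shape, "->", bands.shape)  # (10, 2049) -> (10, 23)
```

Each frame's 2,049 FFT bins collapse to 23 band levels, and any detail within a band (for example, a narrow tonal component) can no longer be separated from its neighbours.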

Bibliographic Details
Main Authors: Tan, Ki In; Yean, Seanglidet; Lee, Bu-Sung
Other Authors: College of Computing and Data Science
Format: Conference or Workshop Item
Language: English
Published: 2024
Subjects: Computer and Information Science; Sound spectrum; Transformer
Online Access: https://hdl.handle.net/10356/177690
Institution: Nanyang Technological University
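
The abstract describes feeding band-limited spectrum input to an attention-based (transformer) model and relying on data augmentation, but this record does not specify the architecture or the augmentation. The NumPy sketch below is therefore only an illustration under stated assumptions: a single-head scaled dot-product self-attention layer with a residual connection, mean-pooled over time into class logits (not a full transformer: no multi-head attention, layer normalization, or feed-forward block), untrained random weights, synthetic 23-band levels over a hypothetical 40 frames, and SpecAugment-style masking as a stand-in augmentation. `mask_augment` and `attention_classify` are hypothetical names; only the 50-class output size is taken from ESC-50.

```python
import numpy as np


def mask_augment(x: np.ndarray, max_bands: int, max_frames: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Hypothetical SpecAugment-style masking on a (n_frames, n_bands) array.

    Fills one random run of up to max_bands bands and one random run of up to
    max_frames frames with the array mean; returns a new array.
    """
    x = x.copy()
    fill = x.mean()
    n_frames, n_bands = x.shape
    w = rng.integers(0, max_bands + 1)
    b0 = rng.integers(0, n_bands - w + 1)
    x[:, b0:b0 + w] = fill
    t = rng.integers(0, max_frames + 1)
    t0 = rng.integers(0, n_frames - t + 1)
    x[t0:t0 + t, :] = fill
    return x


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_classify(x: np.ndarray, p: dict) -> tuple[np.ndarray, np.ndarray]:
    """One single-head self-attention layer over time frames, mean-pooled to class logits."""
    h = x @ p["W_in"]                               # (T, d): project bands to model width
    q, k, v = h @ p["W_q"], h @ p["W_k"], h @ p["W_v"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, T) scaled dot-product weights
    ctx = attn @ v + h                              # residual connection
    return ctx.mean(axis=0) @ p["W_out"], attn      # (n_classes,), (T, T)


rng = np.random.default_rng(0)
T, n_bands, d, n_classes = 40, 23, 32, 50  # 50 output classes as in ESC-50
p = {  # untrained random weights, for shape-checking only
    "W_in": rng.normal(0.0, n_bands ** -0.5, (n_bands, d)),
    "W_q": rng.normal(0.0, d ** -0.5, (d, d)),
    "W_k": rng.normal(0.0, d ** -0.5, (d, d)),
    "W_v": rng.normal(0.0, d ** -0.5, (d, d)),
    "W_out": rng.normal(0.0, d ** -0.5, (d, n_classes)),
}

x = rng.normal(60.0, 10.0, (T, n_bands))  # synthetic band levels in dB
x_aug = mask_augment(x, max_bands=4, max_frames=8, rng=rng)
# Per-band standardization; the epsilon guards bands that masking made constant.
x_norm = (x_aug - x_aug.mean(axis=0)) / (x_aug.std(axis=0) + 1e-6)
logits, attn = attention_classify(x_norm, p)
print(logits.shape, attn.shape)  # (50,) (40, 40)
```

Each row of `attn` shows how strongly one time frame draws on every other frame, which is the mechanism an attention-based model can use to compensate for coarse frequency resolution with temporal context. Training the weights, for example on ESC-50 clips reduced to 23 bands, is outside the scope of this sketch.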
id sg-ntu-dr.10356-177690
record_format dspace
spelling sg-ntu-dr.10356-177690
School: School of Computer Science and Engineering
Conference: 2023 IEEE Sensors Applications Symposium (SAS)
Funding: Nanyang Technological University; National Research Foundation (NRF)
Version: Submitted/Accepted version
Acknowledgement: We would like to acknowledge the funding support from Nanyang Technological University – URECA Undergraduate Research Programme for this research project. This research/project is supported by the Catalyst: Strategic Fund from Government Funding, administered by the Ministry of Business Innovation & Employment, New Zealand under contract C09X1923, as well as the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative.
Date available: 2024-05-29T01:57:24Z
Date issued: 2023
Type: Conference Paper
Citation: Tan, K. I., Yean, S. & Lee, B. (2023). Attention-based sound classification pipeline with sound spectrum. 2023 IEEE Sensors Applications Symposium (SAS). https://dx.doi.org/10.1109/SAS58821.2023.10254193
ISBN: 9798350323078
DOI: 10.1109/SAS58821.2023.10254193
Scopus ID: 2-s2.0-85174073579
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/SAS58821.2023.10254193.
File format: application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Sound spectrum
Transformer
description Research on urban soundscapes and their impact is gaining prominence with regard to livable environments. Machine learning models have been used extensively to classify sounds, where the input sound data, commonly a waveform, must be collected across its full frequency spectrum. However, in an application like NoiseCapture, the sound spectrum is divided into 23 frequency bands, so some information or features are lost. Given the recent success in training a deep learning model to classify sounds from a limited sound spectrum, we developed a pipeline that maximizes classification performance on sound-spectrum input using an attention-based model. Using data from ESC-50, we found that transformers improve accuracy over conventional neural networks by 22.5%; however, the limited frequency bands of the NoiseCapture sound spectrum impair model accuracy, necessitating data augmentation. The data pipeline is analyzed in a case study of Singapore, where sound labels curated to fit the local context are used to train the model, improving base transformer accuracy by 12.7%.
author2 College of Computing and Data Science
format Conference or Workshop Item
author Tan, Ki In
Yean, Seanglidet
Lee, Bu-Sung
title Attention-based sound classification pipeline with sound spectrum
publishDate 2024
url https://hdl.handle.net/10356/177690
_version_ 1814047407707521024