Attention-based sound classification pipeline with sound spectrum

Urban soundscape research and the study of its impact are gaining prominence in the context of a livable environment. Machine learning models have been used extensively to classify sounds, typically requiring that the input sound data, commonly in waveform, be collected across its full frequency spectrum. However, i...

Bibliographic Details
Main Authors:	Tan, Ki In, Yean, Seanglidet, Lee, Bu-Sung
Other Authors:	College of Computing and Data Science
Format:	Conference or Workshop Item
Language:	English
Published:	2024
Subjects:	Computer and Information Science Sound spectrum Transformer
Online Access:	https://hdl.handle.net/10356/177690
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-177690
record_format	dspace
spelling	sg-ntu-dr.10356-1776902024-05-29T01:57:24Z Attention-based sound classification pipeline with sound spectrum Tan, Ki In Yean, Seanglidet Lee, Bu-Sung College of Computing and Data Science School of Computer Science and Engineering 2023 IEEE Sensors Applications Symposium (SAS) Computer and Information Science Sound spectrum Transformer Urban soundscape research and their impact study are gaining more prominence with regard to a livable environment. Machine learning models have been used extensively to classify sounds where the input sound data, commonly in wave form, needs to be collected in its full frequency spectrum. However, in an application like NoiseCapture, the sound spectrum is divided into 23 frequency bands and thus some information or features are lost. Given the recent success in training a deep learning model to classify sounds with a limited sound spectrum, we developed a pipeline for maximizing the performance of sound spectrum input with attention-based model. Using data from ESC-50, we discover that the use of transformers improve accuracy over the conventional neural networks by 22.5%; however the limited frequency bands in NoiseCapture sound spectrum impairs the model accuracy, necessitating the use of data augmentation. The data pipeline is analyzed for our case study of Singapore, where selected sound labels, curated to fit the local context, are used to train the model, resulting in an improvement in base transformer accuracy by 12.7%. Nanyang Technological University National Research Foundation (NRF) Submitted/Accepted version We would like to acknowledge the funding support from Nanyang Technological University – URECA Undergraduate Research Programme for this research project. 
This research/project is supported by the Catalyst: Strategic Fund from Government Funding, administered by the Ministry of Business Innovation & Employment, New Zealand under contract C09X1923, as well as the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative. 2024-05-29T01:57:24Z 2024-05-29T01:57:24Z 2023 Conference Paper Tan, K. I., Yean, S. & Lee, B. (2023). Attention-based sound classification pipeline with sound spectrum. 2023 IEEE Sensors Applications Symposium (SAS). https://dx.doi.org/10.1109/SAS58821.2023.10254193 9798350323078 https://hdl.handle.net/10356/177690 10.1109/SAS58821.2023.10254193 2-s2.0-85174073579 en © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/SAS58821.2023.10254193. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Sound spectrum Transformer
spellingShingle	Computer and Information Science Sound spectrum Transformer Tan, Ki In Yean, Seanglidet Lee, Bu-Sung Attention-based sound classification pipeline with sound spectrum
description	Urban soundscape research and the study of its impact are gaining prominence in the context of a livable environment. Machine learning models have been used extensively to classify sounds, typically requiring that the input sound data, commonly in waveform, be collected across its full frequency spectrum. However, in an application like NoiseCapture, the sound spectrum is divided into 23 frequency bands, and thus some information or features are lost. Given the recent success in training a deep learning model to classify sounds with a limited sound spectrum, we developed a pipeline for maximizing the performance of sound spectrum input with an attention-based model. Using data from ESC-50, we find that the use of transformers improves accuracy over conventional neural networks by 22.5%; however, the limited frequency bands in the NoiseCapture sound spectrum impair model accuracy, necessitating the use of data augmentation. The data pipeline is analyzed for our case study of Singapore, where selected sound labels, curated to fit the local context, are used to train the model, resulting in an improvement in base transformer accuracy of 12.7%.
author2	College of Computing and Data Science
author_facet	College of Computing and Data Science Tan, Ki In Yean, Seanglidet Lee, Bu-Sung
format	Conference or Workshop Item
author	Tan, Ki In Yean, Seanglidet Lee, Bu-Sung
author_sort	Tan, Ki In
title	Attention-based sound classification pipeline with sound spectrum
title_short	Attention-based sound classification pipeline with sound spectrum
title_full	Attention-based sound classification pipeline with sound spectrum
title_fullStr	Attention-based sound classification pipeline with sound spectrum
title_full_unstemmed	Attention-based sound classification pipeline with sound spectrum
title_sort	attention-based sound classification pipeline with sound spectrum
publishDate	2024
url	https://hdl.handle.net/10356/177690
_version_	1814047407707521024
