Attention-based sound classification pipeline with sound spectrum
Main Authors: | Tan, Ki In; Yean, Seanglidet; Lee, Bu-Sung |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Sound spectrum; Transformer |
Online Access: | https://hdl.handle.net/10356/177690 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-177690 |
---|---|
record_format |
dspace |
school |
College of Computing and Data Science; School of Computer Science and Engineering |
conference |
2023 IEEE Sensors Applications Symposium (SAS) |
funding |
Nanyang Technological University; National Research Foundation (NRF) |
version |
Submitted/Accepted version |
acknowledgement |
We would like to acknowledge the funding support from Nanyang Technological University – URECA Undergraduate Research Programme for this research project. This research/project is supported by the Catalyst: Strategic Fund from Government Funding, administered by the Ministry of Business Innovation & Employment, New Zealand under contract C09X1923, as well as the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative. |
date_accessioned |
2024-05-29T01:57:24Z |
date_available |
2024-05-29T01:57:24Z |
date_issued |
2023 |
type |
Conference Paper |
citation |
Tan, K. I., Yean, S. & Lee, B. (2023). Attention-based sound classification pipeline with sound spectrum. 2023 IEEE Sensors Applications Symposium (SAS). https://dx.doi.org/10.1109/SAS58821.2023.10254193 |
doi |
10.1109/SAS58821.2023.10254193 |
isbn |
9798350323078 |
scopus_eid |
2-s2.0-85174073579 |
rights |
© 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/SAS58821.2023.10254193. |
file_format |
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science; Sound spectrum; Transformer |
description |
Research on urban soundscapes and their impact is gaining prominence in the pursuit of livable environments. Machine learning models have been used extensively to classify sounds, but they typically require input sound data, commonly in waveform, captured across the full frequency spectrum. However, applications such as NoiseCapture divide the sound spectrum into 23 frequency bands, so some information and features are lost. Building on recent success in training deep learning models to classify sounds from a limited sound spectrum, we developed a pipeline that maximizes classification performance on sound spectrum input using an attention-based model. Using data from ESC-50, we find that transformers improve accuracy over conventional neural networks by 22.5%; however, the limited frequency bands of the NoiseCapture sound spectrum impair model accuracy, necessitating data augmentation. We analyze the data pipeline in a case study of Singapore, where sound labels selected and curated to fit the local context are used to train the model, improving base transformer accuracy by 12.7%. |
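The description notes that NoiseCapture reduces the sound spectrum to 23 frequency bands. As a hedged illustration of what such a reduction involves, the sketch below collapses the FFT power spectrum of each audio frame into 23 base-10 third-octave bands (nominal centres 100 Hz to 16 kHz). The band layout is an assumption about NoiseCapture, and the function names (`third_octave_levels`, `spectrum_sequence`), frame length, hop size and uncalibrated dB scale are hypothetical choices for illustration, not the paper's method.

```python
import numpy as np

# ASSUMPTION: 23 base-10 third-octave bands, centres 1000 * 10**(n/10) for n = -10..12
# (about 100 Hz to 16 kHz). Intended to approximate NoiseCapture's layout; not from the paper.
BAND_INDICES = np.arange(-10, 13)
CENTRE_FREQS = 1000.0 * 10.0 ** (BAND_INDICES / 10.0)
LOWER_EDGES = CENTRE_FREQS * 10.0 ** (-1.0 / 20.0)
UPPER_EDGES = CENTRE_FREQS * 10.0 ** (1.0 / 20.0)


def third_octave_levels(frame: np.ndarray, sample_rate: int) -> np.ndarray:
    """Hypothetical helper: one frame's power spectrum summed into 23 band levels (uncalibrated dB)."""
    if sample_rate / 2 < UPPER_EDGES[-1]:
        raise ValueError("sample rate too low to cover the top band")
    window = np.hanning(len(frame))               # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(frame * window)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    levels = np.empty(len(CENTRE_FREQS))
    for i, (lo, hi) in enumerate(zip(LOWER_EDGES, UPPER_EDGES)):
        in_band = (freqs >= lo) & (freqs < hi)
        # All FFT bins inside the band collapse into a single number here.
        levels[i] = 10.0 * np.log10(power[in_band].sum() + 1e-12)
    return levels


def spectrum_sequence(signal: np.ndarray, sample_rate: int,
                      frame_len: int = 4096, hop: int = 2048) -> np.ndarray:
    """Hypothetical helper: waveform -> (n_frames, 23) matrix of band levels.

    Raises ValueError (from np.stack) if the signal is shorter than one frame.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.stack([third_octave_levels(signal[s:s + frame_len], sample_rate)
                     for s in starts])
```

For a 5-second clip at 44.1 kHz (the ESC-50 clip format), each row of the returned matrix could serve as one token in a transformer's input sequence; the paper's actual feature extraction and model may differ. The per-band summation is where fine spectral detail inside each band is discarded, which is the kind of information loss the description attributes to the 23-band format.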
author2 |
College of Computing and Data Science |
format |
Conference or Workshop Item |
author |
Tan, Ki In; Yean, Seanglidet; Lee, Bu-Sung |
author_sort |
Tan, Ki In |
title |
Attention-based sound classification pipeline with sound spectrum |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/177690 |