Audio intelligent monitoring at the edge (AIME) for polyphonic sound sources
Urban sound monitoring remains imperative for controlling and mitigating noise pollution in cities. With advances in artificial intelligence (AI) and edge computing, the development of intelligent machine listening systems for real-time noise monitoring has...
Saved in:

Main Author: | Lim, Victor
---|---
Other Authors: | Gan Woon Seng
Format: | Final Year Project
Language: | English
Published: | Nanyang Technological University, 2024
Subjects: | Engineering; Audio; Machine learning; Sound event detection; Urban sound
Online Access: | https://hdl.handle.net/10356/176730
Institution: | Nanyang Technological University
id
sg-ntu-dr.10356-176730
record_format |
dspace |
spelling
sg-ntu-dr.10356-176730 2024-05-24T15:50:46Z
Audio intelligent monitoring at the edge (AIME) for polyphonic sound sources
Lim, Victor
Gan Woon Seng, School of Electrical and Electronic Engineering, EWSGAN@ntu.edu.sg
Engineering; Audio; Machine learning; Sound event detection; Urban sound
Bachelor's degree. Final Year Project (FYP). Deposited 2024-05-20T01:56:55Z; published 2024.
Lim, V. (2024). Audio intelligent monitoring at the edge (AIME) for polyphonic sound sources. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/176730
en. A3059-231. application/pdf. Nanyang Technological University.
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering; Audio; Machine learning; Sound event detection; Urban sound
description |
Urban sound monitoring remains imperative for controlling and mitigating noise pollution in cities. With advances in artificial intelligence (AI) and edge computing, the development of intelligent machine listening systems for real-time noise monitoring has become a prominent focus of modern sound monitoring systems.
Urban sound is characterized by a multitude of sound sources, including vehicular traffic, industrial activities, construction work, and human activity. These sources frequently overlap, creating complex polyphonic environments in which multiple sounds occur simultaneously and exposing the limitations of traditional monitoring systems.
In this project, we address the challenges posed by polyphonic urban sound environments using deep learning models for audio tagging and sound event detection. Model development focuses primarily on the SINGA:PURA dataset, a strongly labelled polyphonic urban sound dataset with spatiotemporal context recorded in Singapore.
We explore transfer learning and pre-trained audio embeddings together with a Convolutional Recurrent Neural Network (CRNN) architecture to perform sound event detection and audio tagging, leveraging both the strong and the weak labels in this openly available dataset. By ensembling multiple specialized models, we improve the robustness and accuracy of the system. Additionally, we explore quantization techniques to improve efficiency and enable deployment of our sound event detection models in resource-constrained environments.
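As a concrete illustration of the approach described above, the sketch below shows a small CRNN that outputs frame-level event probabilities (sound event detection, trained against strong labels) and clip-level tag probabilities (audio tagging, trained against weak labels). The layer sizes, the 64-band log-mel input, the 14-class output, and the max-pooling aggregation are illustrative assumptions, not the exact configuration used in this project.

```python
# Minimal CRNN sketch for polyphonic sound event detection and audio tagging.
# All sizes (64 mel bands, 14 classes, two conv blocks) are illustrative assumptions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels: int = 64, n_classes: int = 14):
        super().__init__()
        # CNN front end learns local time-frequency patterns from a log-mel spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool along frequency only, keeping frame resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # Bidirectional GRU models temporal context across frames.
        self.gru = nn.GRU(64 * (n_mels // 4), 128, batch_first=True, bidirectional=True)
        self.frame_head = nn.Linear(2 * 128, n_classes)

    def forward(self, x):
        # x: (batch, 1, frames, n_mels) log-mel spectrogram
        z = self.cnn(x)                        # (batch, 64, frames, n_mels // 4)
        z = z.permute(0, 2, 1, 3).flatten(2)   # (batch, frames, 64 * n_mels // 4)
        z, _ = self.gru(z)
        frame_probs = torch.sigmoid(self.frame_head(z))  # strong labels: per-frame events
        clip_probs = frame_probs.max(dim=1).values       # weak labels: clip-level tags
        return frame_probs, clip_probs

model = CRNN()
frame_probs, clip_probs = model(torch.randn(2, 1, 500, 64))
print(frame_probs.shape, clip_probs.shape)  # torch.Size([2, 500, 14]) torch.Size([2, 14])
```

For the quantization step mentioned above, one common option (assumed here for illustration, not necessarily the scheme used in the report) is post-training dynamic quantization of the Linear and GRU layers to int8 before deploying on a resource-constrained edge device:

```python
# Dynamic quantization shrinks weights to int8 and dequantizes on the fly at inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear, nn.GRU}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "crnn_int8.pt")  # smaller artifact for edge deployment
```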
author2 |
Gan Woon Seng |
format |
Final Year Project |
author |
Lim, Victor |
title |
Audio intelligent monitoring at the edge (AIME) for polyphonic sound sources |
publisher |
Nanyang Technological University |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/176730 |
_version_ |
1806059787483348992 |