Audio intelligent monitoring at the edge (AIME) for polyphonic sound sources

Bibliographic Details
Main Author: Lim, Victor
Other Authors: Gan Woon Seng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/176730
Institution: Nanyang Technological University
Description
Summary: Sound monitoring remains imperative for controlling and mitigating noise pollution, especially in urban areas. With advances in artificial intelligence (AI) and edge computing, the development of intelligent machine listening systems for real-time noise monitoring has become a prominent focus of modern sound monitoring systems. Urban sound is characterized by a multitude of sources, including vehicular traffic, industrial activities, construction work, and human activities. The overlapping nature of these sources often creates complex polyphonic environments, where multiple sounds occur simultaneously, exposing the limitations of traditional monitoring systems. In this project, we aim to address the challenges posed by polyphonic urban sound environments through deep learning models for audio tagging and sound event detection. Development of these sound models centres on the SINGA:PURA dataset, a strongly labelled polyphonic urban sound dataset with spatiotemporal context recorded in Singapore. We explore transfer learning and pre-trained audio embeddings, together with a Convolutional Recurrent Neural Network (CRNN) architecture, to perform sound event detection and audio tagging, leveraging the strong and weak labels from the openly available dataset. We further enhance the robustness and accuracy of our system with an ensemble that combines the predictions of multiple specialized models. Additionally, we explore quantization techniques to improve efficiency and enable the deployment of our sound event detection models in resource-constrained environments.
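The record itself contains no code, but a minimal PyTorch sketch of the CRNN approach the summary describes may help. The layer sizes, mel-band count, and 14-class output below are illustrative placeholders, not the project's actual configuration: convolutional layers extract local spectro-temporal features from a log-mel spectrogram, a bidirectional GRU models longer-range temporal context, and a per-frame sigmoid head lets several classes be active in the same frame, which is what makes the model suitable for polyphonic scenes.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a convolutional recurrent network for polyphonic
    sound event detection on log-mel spectrogram input."""

    def __init__(self, n_mels: int = 64, n_classes: int = 14):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 4)),   # pool frequency only, keep time resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time, n_mels) log-mel spectrogram
        z = self.cnn(x)                       # (batch, ch, time, mels')
        z = z.permute(0, 2, 1, 3).flatten(2)  # (batch, time, ch * mels')
        z, _ = self.gru(z)
        # Sigmoid (not softmax) so overlapping events can co-occur per frame.
        return torch.sigmoid(self.head(z))    # (batch, time, n_classes)

model = CRNN()
probs = model(torch.randn(2, 1, 100, 64))     # -> shape (2, 100, 14)
```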
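One simple way to realize the ensemble the summary mentions is to average frame-level probabilities across models before thresholding. The sketch below reuses the hypothetical CRNN above and assumes all members share input and output shapes; the project's actual fusion strategy is not specified in the record.

```python
import torch

@torch.no_grad()
def ensemble_predict(models, mel_batch, threshold=0.5):
    """Average frame-level class probabilities across models, then
    threshold to obtain binary event activity per frame and class."""
    probs = torch.stack([m(mel_batch) for m in models]).mean(dim=0)
    return probs >= threshold          # (batch, time, n_classes) bool mask

# Example: combine three independently trained sketch models.
models = [CRNN().eval() for _ in range(3)]
detections = ensemble_predict(models, torch.randn(1, 1, 100, 64))
```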
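For the quantization step, one common post-training option in PyTorch is dynamic quantization of the linear and recurrent layers; whether the project used this or another scheme (static quantization, quantization-aware training) is not stated, so the following is only a sketch against the CRNN defined above.

```python
import torch

# Post-training dynamic quantization: Linear and GRU weights are stored
# in int8 and dequantized on the fly at inference time, shrinking the
# model and speeding up CPU inference on resource-constrained edge devices.
quantized_model = torch.quantization.quantize_dynamic(
    CRNN().eval(),                      # the sketch model defined above
    {torch.nn.Linear, torch.nn.GRU},    # layer types to quantize
    dtype=torch.qint8,
)
```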