Sound event detection with human and emergency sounds
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/153220 |
Institution: | Nanyang Technological University |
Summary: | Sound Event Detection (SED) is the task of recognizing the sound events in an audio clip, together with their respective onset and offset timestamps. This thesis explores a variety of models and techniques in order to develop an effective SED system. This includes investigating the impact of different audio feature types, data augmentation techniques, network architectures and automatic threshold optimisation on the performance of the system. Additionally, this thesis proposes frame-wise prediction pre-processing and post-processing methods, in order to address the issues with existing SED systems and develop a system that is able to analyse clips with long audio durations. Unlike previous works, which use standard datasets, such as those from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges, as the development dataset, this project uses a novel dataset consisting of human and emergency sounds extracted from AudioSet. As the dataset is novel, there is no state-of-the-art baseline available for comparison. As such, the dataset of the DCASE 2017 Task 4 is used to compare the performance of our best-performing models, which are determined based on the project dataset, with the state-of-the-art performance. In our experiments, we successfully developed a well-performing SED system for our novel dataset, with the systems using our proposed prediction processing method consistently outperforming those that do not. Additionally, by using the knowledge gained from our experiments with the novel project dataset, we developed a system which outperforms the previous state-of-the-art model for the DCASE 2017 Task 4 Challenge. |
---|