Classification of sound using machine learning

The soundscape of urban parks and cities is composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments caused by sound pollution make soundscape analysis a valuable topic of study in cities such as Singapore. However, the need to collect sound dat...

Bibliographic Details
Main Author: Tan, Ki In
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/165777
Institution: Nanyang Technological University
id sg-ntu-dr.10356-165777
record_format dspace
spelling sg-ntu-dr.10356-165777
2023-04-14T15:37:39Z
Classification of sound using machine learning
Tan, Ki In
Lee Bu Sung, Francis
School of Computer Science and Engineering
EBSLEE@ntu.edu.sg
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Bachelor of Engineering (Computer Engineering)
2023-04-13T04:08:26Z
2023
Final Year Project (FYP)
Tan, K. I. (2023). Classification of sound using machine learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165777
en
SCSE22-0634
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
description The soundscape of urban parks and cities is composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments caused by sound pollution make soundscape analysis a valuable topic of study in cities such as Singapore. However, collecting sound data for research and analysis is hampered by the difficulty of recruiting volunteers and of annotating samples manually and accurately. Using machine learning to predict and annotate environmental sounds can help with data collection. In my previous work, the sound spectrum was used to avoid the risk of storing audio recordings, which may expose personal information of the data collection volunteer. Existing work showed that a deep learning model trained on the sound spectrum can predict with high accuracy. However, the model displayed signs of overfitting. This project explores ways to improve the accuracy of the existing sound spectrum data pipeline while limiting the overfitting found in the models used in the pipeline. An investigation of the pipeline's limitations found that the Convolutional Neural Network (CNN) architecture limited model performance because it cannot capture global relationships among sound spectrum features. A literature review and evaluation of transformer models for sound classification found the Audio Spectrogram Transformer (AST) most suitable to replace the CNN, because its performance remained stable even when its batch size was reduced for training in a resource-constrained environment. Training AST on the sound spectrum yielded an accuracy of 62.30%, compared with the CNN's 46.50%, but the performance improvement was limited.
Class reduction and data augmentation techniques were then tested; both improved the accuracy of AST, with class reduction raising it to as much as 91.25%, but at the cost of a limited and less suitable set of class labels for evaluation. Ultimately, the use of state-of-the-art models and other performance-improving techniques in the sound spectrum data pipeline improved overall model accuracy, and future work is likely to investigate fine-tuning the data pipeline for deployment.
author2 Lee Bu Sung, Francis
author_facet Lee Bu Sung, Francis
Tan, Ki In
format Final Year Project
author Tan, Ki In
author_sort Tan, Ki In
title Classification of sound using machine learning
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/165777
_version_ 1764208013692895232