Classification of sound using machine learning
The soundscape of urban parks and cities is composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments from sound pollution make soundscape analysis a valuable study topic in cities like Singapore. However, the need to collect sound dat...
Main Author: | Tan, Ki In |
---|---|
Other Authors: | Lee Bu Sung, Francis |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
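Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |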
Online Access: | https://hdl.handle.net/10356/165777 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-165777 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-165777 2023-04-14T15:37:39Z Classification of sound using machine learning Tan, Ki In Lee Bu Sung, Francis School of Computer Science and Engineering EBSLEE@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Bachelor of Engineering (Computer Engineering) 2023-04-13T04:08:26Z 2023-04-13T04:08:26Z 2023 Final Year Project (FYP) Tan, K. I. (2023). Classification of sound using machine learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165777 https://hdl.handle.net/10356/165777 en SCSE22-0634 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
spellingShingle |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Tan, Ki In Classification of sound using machine learning |
description |
The soundscape of urban parks and cities is composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments from sound pollution make soundscape analysis a valuable study topic in cities like Singapore. However, collecting sound data for research and analysis is hampered by the difficulty of recruiting volunteers and manually annotating samples accurately. Using machine learning to predict and annotate environmental sounds can help in data collection. In my previous work, the sound spectrum was used instead of stored audio recordings to avoid the risk that audio data exposes personal information of the data collection volunteer. Existing work showed that a deep learning model trained on sound spectrum data can predict with high accuracy. However, the model displayed signs of overfitting. This project aims to explore various ways to improve the accuracy of the existing sound spectrum data pipeline while limiting the overfitting issue found in the models used in the pipeline. In investigating the limitations of the data pipeline, the Convolutional Neural Network (CNN) model architecture used was found to limit model performance, due to its inability to capture global relationships among sound spectrum features. A literature review and evaluation of transformer models for sound classification found the Audio Spectrogram Transformer (AST) most suitable to replace the CNN, because its performance remained stable even when its batch size was reduced for training in a resource-constrained environment. Training AST on sound spectrum data yielded an accuracy of 62.30%, compared to the CNN’s 46.50%, but the performance improvement was limited. Class reduction and data augmentation techniques were then tested; both improved the accuracy of AST, with class reduction raising it to as much as 91.25%, but at the cost of having limited and less suitable class labels for evaluation. Ultimately, the use of state-of-the-art models and other performance-improving techniques in the sound spectrum data pipeline successfully improved overall model accuracy, and future work is likely to investigate fine-tuning of the data pipeline for deployment. |
author2 |
Lee Bu Sung, Francis |
author_facet |
Lee Bu Sung, Francis Tan, Ki In |
format |
Final Year Project |
author |
Tan, Ki In |
author_sort |
Tan, Ki In |
title |
Classification of sound using machine learning |
title_short |
Classification of sound using machine learning |
title_full |
Classification of sound using machine learning |
title_fullStr |
Classification of sound using machine learning |
title_full_unstemmed |
Classification of sound using machine learning |
title_sort |
classification of sound using machine learning |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/165777 |
_version_ |
1764208013692895232 |