Classification of sound using machine learning

The soundscapes of urban parks and cities are composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments from sound pollution make soundscape analysis a valuable study topic in cities such as Singapore. However, the need to collect sound dat...

Bibliographic Details
Main Author:	Tan, Ki In
Other Authors:	Lee Bu Sung, Francis
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access:	https://hdl.handle.net/10356/165777
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-165777
record_format	dspace
spelling	sg-ntu-dr.10356-165777 2023-04-14T15:37:39Z Classification of sound using machine learning Tan, Ki In Lee Bu Sung, Francis School of Computer Science and Engineering EBSLEE@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Bachelor of Engineering (Computer Engineering) 2023-04-13T04:08:26Z 2023-04-13T04:08:26Z 2023 Final Year Project (FYP) Tan, K. I. (2023). Classification of sound using machine learning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165777 https://hdl.handle.net/10356/165777 en SCSE22-0634 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Tan, Ki In Classification of sound using machine learning
description	The soundscapes of urban parks and cities are composed of a variety of natural and man-made noises. The benefits brought by urban parks and the health ailments from sound pollution make soundscape analysis a valuable study topic in cities such as Singapore. However, collecting sound data for research and analysis is hampered by the difficulty of recruiting volunteers and manually annotating samples accurately. Using machine learning to predict and annotate environmental sounds can help in data collection. In my previous work, sound spectrum data was used instead of audio recordings to avoid the risk of storing audio that may expose personal information of the data collection volunteers. Existing work showed that sound spectrum data can be used to train a deep learning model that predicts with high accuracy. However, the model displayed signs of overfitting. This project aims to explore various ways to improve the accuracy of the existing sound spectrum data pipeline while limiting the overfitting found in the models used in the pipeline. An investigation of the limitations of the data pipeline found that the Convolutional Neural Network (CNN) architecture limited model performance, due to its inability to capture global relationships among sound spectrum features. A literature review and evaluation of transformer models for sound classification found the Audio Spectrogram Transformer (AST) most suitable to replace the CNN, because its performance remained stable even when its batch size was reduced for training in a resource-constrained environment. Training AST on sound spectrum data yielded an accuracy of 62.30%, compared to the CNN’s 46.50%, but the performance improvement was limited. Class reduction and data augmentation techniques were then tested; both improved the accuracy of AST, with class reduction improving it up to 91.25%, but at the cost of limited and less suitable class labels for evaluation. Ultimately, the use of state-of-the-art models and other performance-improving techniques in the sound spectrum data pipeline has improved overall model accuracy, and future work is likely to investigate fine-tuning the data pipeline for deployment.
author2	Lee Bu Sung, Francis
author_facet	Lee Bu Sung, Francis Tan, Ki In
format	Final Year Project
author	Tan, Ki In
author_sort	Tan, Ki In
title	Classification of sound using machine learning
title_short	Classification of sound using machine learning
title_full	Classification of sound using machine learning
title_fullStr	Classification of sound using machine learning
title_full_unstemmed	Classification of sound using machine learning
title_sort	classification of sound using machine learning
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/165777
_version_	1764208013692895232

Similar Items