Comparison of 1D VS 2D convolutional neural networks for bird sound detection

An automatic acoustic detection system can assist bird naturalists in monitoring bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring encounters practical li...

Bibliographic Details
Main Author:	Tan, Pei Hong
Format:	Thesis
Language:	English
Published:	2022
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf http://eprints.utm.my/id/eprint/99590/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
Institution:	Universiti Teknologi Malaysia

id	my.utm.99590
record_format	eprints
spelling	my.utm.99590 2023-03-08T03:36:50Z http://eprints.utm.my/id/eprint/99590/ Comparison of 1D VS 2D convolutional neural networks for bird sound detection Tan, Pei Hong TK Electrical engineering. Electronics Nuclear engineering 2022 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf Tan, Pei Hong (2022) Comparison of 1D VS 2D convolutional neural networks for bird sound detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
language	English
topic	TK Electrical engineering. Electronics Nuclear engineering
description	An automatic acoustic detection system can assist bird naturalists in monitoring bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring has practical limitations: it requires manual configuration, depends heavily on sound libraries, and is less accurate and less robust. In recent years, various machine learning techniques have been proposed, and detailed performance evaluations have been conducted to determine how feasibly an automatic acoustic detection system can be achieved. In this paper, we propose a 1D convolutional neural network (CNN) architecture for bird sound detection and compare it with the Bulbul 2D CNN architecture, the winner of the Bird Audio Detection (BAD) challenge. The proposed 1D CNN learns a representation directly from the raw audio recordings. Its preprocessing phase divides the audio signal into overlapping frames using a sliding window, so it can handle audio streams of any duration. The size of each frame matches the input layer of the 1D CNN. In contrast, the preprocessing phase of the Bulbul 2D CNN architecture uses two feature extraction methods, the STFT spectrogram and the Mel-scaled spectrogram, to capture the amplitude of a signal as it changes over time and across frequencies. The performance of the proposed 1D CNN model in detecting bird sound was assessed on the warblrb10k dataset, and the experimental results show that it achieves lower accuracy than the Bulbul 2D CNN model. Several previous state-of-the-art 1D CNN approaches have been shown to outperform most approaches that use handcrafted features or 2D representations as input. Owing to time constraints, several steps that could have improved the accuracy of the 1D CNN model were not carried out, such as aggregating the predictions for all audio frames belonging to the same recording with a majority rule or sum rule to determine the final bird-presence prediction for the whole recording; this contributed to the low accuracy of the 1D CNN model in this paper.
format	Thesis
author	Tan, Pei Hong
title	Comparison of 1D VS 2D convolutional neural networks for bird sound detection
publishDate	2022
url	http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf http://eprints.utm.my/id/eprint/99590/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
_version_	1761616357759123456
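
The abstract mentions two steps that are easy to picture in code: splitting a recording into overlapping frames with a sliding window, and combining per-frame predictions into one decision per recording with a majority rule or sum rule. The following is a minimal sketch of both, not the thesis's implementation (the abstract states that the aggregation step was not carried out). The function names, the frame length, the hop length, the zero-padding of the last frame, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1D signal into overlapping frames of equal length.

    The tail is zero-padded so the last frame is full (an assumption;
    the abstract does not say how the final partial frame is handled).
    """
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + int(np.ceil((len(signal) - frame_len) / hop_len))
    padded_len = (n_frames - 1) * hop_len + frame_len
    padded = np.zeros(padded_len, dtype=np.float32)
    padded[: len(signal)] = signal
    return np.stack(
        [padded[i * hop_len : i * hop_len + frame_len] for i in range(n_frames)]
    )


def majority_rule(frame_probs, threshold=0.5):
    """Bird present if more than half of the frames vote 'bird'."""
    votes = np.asarray(frame_probs) >= threshold
    return bool(votes.sum() * 2 > len(votes))


def sum_rule(frame_probs):
    """Bird present if the summed 'bird' probability over all frames
    exceeds the summed 'no bird' probability."""
    p = np.asarray(frame_probs, dtype=np.float64)
    return bool(p.sum() > (1.0 - p).sum())


# Hypothetical example: 11 samples, frames of 4 samples, hop of 2.
frames = frame_signal(np.arange(11), frame_len=4, hop_len=2)

# Hypothetical per-frame 'bird' probabilities from some classifier.
probs = [0.51, 0.51, 0.0]
recording_vote = majority_rule(probs)
recording_sum = sum_rule(probs)
```

The two rules can disagree: with the hypothetical probabilities above, two of three frames narrowly vote for a bird, so the majority rule reports presence, while the sum rule does not because the third frame is confidently negative.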
