Comparison of 1D VS 2D convolutional neural networks for bird sound detection

Automatic acoustic detection systems are useful for assisting bird naturalists in monitoring bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring encounters practical li...

Bibliographic Details
Main Author: Tan, Pei Hong
Format: Thesis
Language: English
Published: 2022
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf
http://eprints.utm.my/id/eprint/99590/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
Institution: Universiti Teknologi Malaysia
id my.utm.99590
record_format eprints
spelling my.utm.99590 2023-03-08T03:36:50Z http://eprints.utm.my/id/eprint/99590/ Comparison of 1D VS 2D convolutional neural networks for bird sound detection Tan, Pei Hong TK Electrical engineering. Electronics Nuclear engineering 2022 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf Tan, Pei Hong (2022) Comparison of 1D VS 2D convolutional neural networks for bird sound detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
institution Universiti Teknologi Malaysia
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description Automatic acoustic detection systems are useful for assisting bird naturalists in monitoring bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring faces practical limitations: it requires manual configuration, depends heavily on sound libraries, and is less accurate and less robust. In recent years, various machine learning techniques have been proposed, and detailed performance evaluations have been conducted to determine how feasible an automatic acoustic detection system is. In this paper, we propose a 1D convolutional neural network (CNN) architecture for bird sound detection and compare it with the Bulbul 2D CNN architecture, the winner of the Bird Audio Detection (BAD) challenge. The proposed 1D CNN learns a representation directly from the raw audio recordings. Its preprocessing phase divides the audio signal into overlapping frames using a sliding window, so it can handle audio streams of any duration. The size of each frame matches the input layer of the 1D CNN. By contrast, the preprocessing phase of the Bulbul 2D CNN architecture uses two feature extraction methods, the STFT spectrogram and the Mel-scaled spectrogram, to capture how the amplitude of a signal changes over time and across frequencies. The performance of the proposed 1D CNN model in detecting bird sound was assessed on the warblrb10k dataset, and the experimental results show that it achieves lower accuracy than the Bulbul 2D CNN model. Previous studies have shown that several state-of-the-art 1D CNN approaches outperform most approaches that use handcrafted features or 2D representations as input. Due to time constraints, several steps that could have improved the accuracy of the 1D CNN model could not be carried out, such as aggregating the predictions for all audio frames belonging to the same recording with a majority rule or sum rule to determine the final prediction of bird presence for the whole recording. This omission contributed to the low accuracy of the 1D CNN model in this paper.
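The sliding-window framing and the frame-level aggregation mentioned in the description can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function names, the frame length, the hop size, the zero-padding of the last frame, and the 0.5 threshold are all assumptions made for the example, and the per-frame probabilities stand in for the output of a trained 1D CNN.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping fixed-length frames (sliding window).

    The final frame is zero-padded so every frame has the same size.
    frame_len and hop are hypothetical values, not those used in the thesis.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while True:
        chunk = list(samples[start:start + frame_len])
        chunk += [0.0] * (frame_len - len(chunk))  # pad a short tail frame
        frames.append(chunk)
        if start + frame_len >= len(samples):
            break  # this frame already reaches the end of the signal
        start += hop
    return frames


def majority_rule(frame_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Recording-level decision: bird present if more than half the frames vote 'bird'."""
    if not frame_probs:
        raise ValueError("need at least one frame prediction")
    votes = sum(1 for p in frame_probs if p >= threshold)
    return votes * 2 > len(frame_probs)


def sum_rule(frame_probs: Sequence[float]) -> bool:
    """Recording-level decision: compare summed 'bird' and 'no bird' probabilities."""
    if not frame_probs:
        raise ValueError("need at least one frame prediction")
    bird = sum(frame_probs)
    no_bird = sum(1.0 - p for p in frame_probs)
    return bird > no_bird


# Hypothetical per-frame 'bird' probabilities for one recording.
probs = [0.9, 0.4, 0.4]
print(len(frame_signal(list(range(10)), frame_len=4, hop=2)))
print(majority_rule(probs), sum_rule(probs))
```

With these made-up probabilities the two rules disagree: only one of three frames crosses the threshold, so the majority rule reports no bird, while the single confident frame is enough for the sum rule to report a bird. The choice of rule can therefore change the recording-level result.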
format Thesis
author Tan, Pei Hong
title Comparison of 1D VS 2D convolutional neural networks for bird sound detection
publishDate 2022
url http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf
http://eprints.utm.my/id/eprint/99590/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773
_version_ 1761616357759123456