Comparison of 1D VS 2D convolutional neural networks for bird sound detection
An automatic acoustic detection system can help bird naturalists monitor bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring encounters practical li...
Main Author: | Tan, Pei Hong |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | TK Electrical engineering. Electronics Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf http://eprints.utm.my/id/eprint/99590/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773 |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.99590 |
---|---|
record_format |
eprints |
spelling |
my.utm.99590 2023-03-08T03:36:50Z http://eprints.utm.my/id/eprint/99590/ Comparison of 1D VS 2D convolutional neural networks for bird sound detection Tan, Pei Hong TK Electrical engineering. Electronics Nuclear engineering 2022 Thesis NonPeerReviewed application/pdf en http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf Tan, Pei Hong (2022) Comparison of 1D VS 2D convolutional neural networks for bird sound detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering. http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
TK Electrical engineering. Electronics Nuclear engineering |
description |
An automatic acoustic detection system can help bird naturalists monitor bird species and overall ecosystem health. Many birds are most easily detected by their sounds, so passive acoustic monitoring is the most appropriate approach. However, acoustic monitoring faces practical limitations: it requires manual configuration, depends heavily on sound libraries, and offers limited accuracy and robustness. In recent years, various machine learning techniques have been proposed, and detailed performance evaluations have been conducted to determine how feasible automatic acoustic detection is. In this paper, we propose a 1D convolutional neural network (CNN) architecture for bird sound detection and compare it with the Bulbul 2D CNN architecture, the winner of the Bird Audio Detection (BAD) challenge. The proposed 1D CNN learns a representation directly from the raw audio recordings. Its preprocessing phase divides the audio signal into overlapping frames using a sliding window, so it can handle audio streams of any duration. The size of each frame matches the input layer of the 1D CNN. In contrast, the preprocessing phase of the Bulbul 2D CNN architecture uses two feature extraction methods, the STFT spectrogram and the Mel-scaled spectrogram, to capture the amplitude of a signal as it changes over time and across frequencies. The performance of the proposed 1D CNN model in detecting bird sound was assessed on the warblrb10k dataset, and the experimental results show that it achieves a lower accuracy than the Bulbul 2D CNN model. Several previous state-of-the-art 1D CNN approaches have been shown to outperform most approaches that use handcrafted features or 2D representations as input. Due to time constraints, several steps that promise higher accuracy for the 1D CNN model could not be carried out, such as aggregating the predictions for all audio frames belonging to the same recording with a majority rule or sum rule to determine the final prediction of bird presence for the whole recording. This contributed to the low accuracy of the 1D CNN model in this paper. |
format |
Thesis |
author |
Tan, Pei Hong |
title |
Comparison of 1D VS 2D convolutional neural networks for bird sound detection |
title_sort |
comparison of 1d vs 2d convolutional neural networks for bird sound detection |
publishDate |
2022 |
url |
http://eprints.utm.my/id/eprint/99590/1/TanPeiHongMSKE2022.pdf http://eprints.utm.my/id/eprint/99590/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:149773 |
_version_ |
1761616357759123456 |
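
The abstract describes two steps of the 1D CNN pipeline: framing the raw audio with a sliding window, and aggregating per-frame predictions into one decision per recording with a majority rule or sum rule. The following is a minimal Python sketch of those two steps, not the thesis's implementation. The function names, the zero-padding of the last frame, the frame length, the hop size, and the 0.5 threshold are all assumptions made for illustration, and the per-frame probabilities stand in for the output of a trained model.

```python
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1D signal into overlapping fixed-length frames.

    Frames start every `hop` samples; hop < frame_len gives overlap.
    The last frame is zero-padded (an assumption) so that every frame
    has the size expected by the network's input layer.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames: List[List[float]] = []
    start = 0
    while True:
        chunk = list(samples[start:start + frame_len])
        chunk += [0.0] * (frame_len - len(chunk))  # pad a short final frame
        frames.append(chunk)
        if start + frame_len >= len(samples):
            break  # this frame reached the end of the signal
        start += hop
    return frames


def majority_rule(frame_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Bird present if more than half of the frames individually say so."""
    votes = sum(1 for p in frame_probs if p >= threshold)
    return votes * 2 > len(frame_probs)


def sum_rule(frame_probs: Sequence[float]) -> bool:
    """Bird present if the summed 'present' scores exceed the summed 'absent' scores."""
    present = sum(frame_probs)
    absent = sum(1.0 - p for p in frame_probs)
    return present > absent


if __name__ == "__main__":
    signal = [float(i) for i in range(10)]          # stand-in for raw audio samples
    frames = frame_signal(signal, frame_len=4, hop=2)
    print(len(frames), frames[0], frames[-1])

    # Hypothetical per-frame "bird present" probabilities from a model.
    probs = [0.9, 0.4, 0.45, 0.4]
    print(majority_rule(probs), sum_rule(probs))
```

The two rules can disagree: with the hypothetical probabilities above, only one frame in four crosses the threshold, so the majority rule reports no bird, while the single confident frame lifts the summed score enough for the sum rule to report a bird.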