DEVELOPMENT OF A BIRD SONG IDENTIFICATION MODEL USING A CONVOLUTIONAL NEURAL NETWORK (CNN)
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/74796 |
Institution: | Institut Teknologi Bandung |
Summary: | Birds are a highly diverse group of animals. However, this diversity is
threatened with extinction by human activities and natural phenomena. Identifying
the bird species that still exist in an ecosystem is a first step toward protecting
this diversity. In this final project, a Convolutional Neural Network (CNN) model
was built to automate the identification of bird species in bird song recordings.
In the experiments, 24 CNN models were built (3 architectures × 4 feature
extraction techniques × 2 data types). The three architectures are AlexNet,
DenseNet, and VGG. The four feature extraction techniques are mel-spectrogram,
harmonic component-based mel-spectrogram, percussive component-based
mel-spectrogram, and MFCC. The two types of data are clean data and raw data.
Of these 24 models, three were then selected for hyperparameter tuning.
Based on the experimental analysis, the best configuration uses the DenseNet
architecture, percussive component-based mel-spectrogram feature extraction, and
clean data as model input. The models were evaluated with two testing methods:
testing with chunk data (method 1) and testing with complete data (method 2).
The best configuration reaches an accuracy of 85.67% with method 1 and 90.42%
with method 2. |
---|
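
The count of 24 models in the summary matches every combination of the three architectures, four feature extraction techniques, and two data types. The following is a minimal sketch of how such an experiment grid could be enumerated. The dictionary keys, the short feature labels, and the function name `build_experiment_grid` are hypothetical and chosen for illustration only; the record does not describe how the author organized the experiments.

```python
from itertools import product

# Values taken from the summary; the short labels are illustrative.
ARCHITECTURES = ["AlexNet", "DenseNet", "VGG"]
FEATURES = [
    "mel-spectrogram",
    "harmonic mel-spectrogram",
    "percussive mel-spectrogram",
    "MFCC",
]
DATA_TYPES = ["clean", "raw"]


def build_experiment_grid():
    """Return one configuration dict per (architecture, feature, data) combination.

    Assumption: the 24 models are the full Cartesian product of the three
    factors, which is consistent with 3 * 4 * 2 = 24.
    """
    return [
        {"architecture": arch, "feature": feat, "data": data}
        for arch, feat, data in product(ARCHITECTURES, FEATURES, DATA_TYPES)
    ]


if __name__ == "__main__":
    grid = build_experiment_grid()
    print(len(grid))  # 24
    for config in grid:
        print(config)
```

Under this assumption, the best configuration reported in the summary corresponds to the entry with `DenseNet`, `percussive mel-spectrogram`, and `clean`.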