Animal hunt: bioacoustics animal recognition application
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175290
Institution: Nanyang Technological University
Summary: This project aims to create a bioacoustics classification model that can be used for real-time identification of animals based on their sounds in mobile applications. The first part of the project will focus on developing a bioacoustics classification model for the backend of the application. The second part will emphasize the deployment of the model and the optimization of its inference performance for edge devices.
To build effective bioacoustics classification models, a substantial amount of labelled data is often required. The primary challenge for many bioacoustics tasks lies in the scarcity of training data, especially for rare and endangered species. Challenges arise not only from the scarcity of data but also from its quality: many datasets are only weakly labelled and are often plagued by background noise and overlapping vocalizations from different species.
To address these data limitations, this study reframes the bioacoustics classification task as a few-shot learning problem, relying primarily on transfer learning through pre-trained global bird embedding models such as BirdNET and Perch, which are known for their strong generalization to non-bird taxa. The performance of their embeddings was evaluated on three diverse datasets specific to Singapore.
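To illustrate the few-shot transfer-learning setup described above, here is a minimal sketch: a frozen pre-trained model supplies clip embeddings, and only a small linear classifier is fitted on the handful of labelled clips. The `embed_clips` helper is a hypothetical stand-in for a BirdNET or Perch embedding extractor; the record does not specify the actual interfaces used.

```python
# Minimal sketch of few-shot classification on frozen bird-model embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def embed_clips(clips: np.ndarray) -> np.ndarray:
    """Placeholder: map (n_clips, n_samples) audio to (n_clips, dim) embeddings
    using a frozen pre-trained model such as BirdNET or Perch."""
    raise NotImplementedError

def few_shot_probe(train_audio, train_labels, test_audio, test_labels):
    # The embedding model stays frozen; only this linear probe is trained,
    # which is what makes a few labelled clips per species sufficient.
    X_train = embed_clips(train_audio)
    X_test = embed_clips(test_audio)
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return accuracy_score(test_labels, clf.predict(X_test))
```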
We also propose a pipeline for deriving an annotated dataset for supervised learning using MixIT, a sound separation model designed to isolate background noise and overlapping vocalizations, and RIBBIT, a bioacoustics detection tool. RIBBIT can not only identify the output channel containing the isolated target vocalizations but also generate strongly labelled data by providing temporal information about the audio events within each recording.
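A hedged sketch of how such a separation-then-detection pipeline could be wired together is shown below. The `mixit_separate` and `ribbit_score` helpers are hypothetical wrappers around the MixIT and RIBBIT tools named above, whose real interfaces differ; the sketch only shows the logic of picking the best-scoring channel and converting its detections into timestamped labels.

```python
# Sketch: choose the MixIT output channel with the strongest RIBBIT response
# and turn its detections into strong (timestamped) labels.
from dataclasses import dataclass

@dataclass
class Detection:
    start_s: float   # onset of the audio event within the recording
    end_s: float     # offset of the audio event
    score: float     # detection score for the event

def mixit_separate(recording):
    """Placeholder: run MixIT and return a list of separated channels."""
    raise NotImplementedError

def ribbit_score(channel) -> list[Detection]:
    """Placeholder: run RIBBIT on one channel, returning scored events."""
    raise NotImplementedError

def strong_labels(recording, species):
    channels = mixit_separate(recording)
    scored = [(ch, ribbit_score(ch)) for ch in channels]
    # The channel whose detections score highest is taken to hold the
    # isolated target vocalizations.
    best_channel, detections = max(
        scored, key=lambda item: max((d.score for d in item[1]), default=0.0)
    )
    # Each detection's start/end times become one strongly labelled example.
    return [(species, d.start_s, d.end_s) for d in detections]
```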
Our findings demonstrate that these large-scale acoustic bird classifiers outperform general audio event detection models on bioacoustics classification tasks, and that their performance can be improved further by applying sound separation to the classifier training data, mitigating the shortage of high-quality training data.