DEVELOPMENT OF BAND-SPLIT RNN AND HYBRID TRANSFORMER DEMUCS FOR MUSIC SOURCE SEPARATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/85018 |
Institution: | Institut Teknologi Bandung |
Summary: | In recent years, several models have been developed for music source separation
(MSS). The current state-of-the-art models are Hybrid Transformer Demucs (HT
Demucs) and Band-Split RNN (BSRNN). Recent research shows that the pre-trained
HT Demucs model can separate six sources (drums, bass, guitar, piano, vocals, and
other) when tested on the MoisesDB dataset, but that it scores relatively low on the
guitar, piano, and other sources compared with the bass, drums, and vocals sources,
as measured by the utterance-level Signal-to-Distortion Ratio (uSDR). However, no
research has yet reported the performance of the BSRNN model in separating
these six sources.
This thesis investigates the performance of the BSRNN and HT Demucs models in
separating six sources. To this end, BSRNN and HT Demucs models were developed
for six-source separation using the MoisesDB dataset, then evaluated and analyzed
to determine which model is best for six-source separation. Experimental results
show that the HT Demucs model outperforms the BSRNN model on all sources, as
measured by utterance-level SDR (uSDR) and chunk-level SDR (cSDR): the HT
Demucs model averaged 6.26 dB and 5.88 dB respectively, while the BSRNN model
averaged 5.52 dB and 5.38 dB.
Additionally, the trained HT Demucs model outperformed the pre-trained HT
Demucs model on the piano and other sources by 1 dB and 0.3 dB
respectively. |
---|
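
The summary reports results in uSDR without stating how the metric is computed. The sketch below shows one common formulation, assumed here rather than taken from the thesis: the ratio of reference energy to error energy over a whole track, summed across channels and samples, in dB. The function name `usdr`, the nested-list input layout (channels, then samples), and the stabilising constant `eps` are illustrative assumptions.

```python
import math


def usdr(reference, estimate, eps=1e-8):
    """Utterance-level SDR in dB for one source of one track.

    reference, estimate: lists of channels, each a list of float samples.
    eps is an assumed small constant that avoids division by zero
    and log(0) on silent or perfectly separated tracks.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate need the same channel count")

    signal_energy = 0.0
    error_energy = 0.0
    for ref_channel, est_channel in zip(reference, estimate):
        if len(ref_channel) != len(est_channel):
            raise ValueError("channels must have the same number of samples")
        for ref_sample, est_sample in zip(ref_channel, est_channel):
            signal_energy += ref_sample ** 2
            error_energy += (ref_sample - est_sample) ** 2

    # Energy is accumulated over the whole track before taking the ratio,
    # which is what makes the score "utterance-level".
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def mean_usdr(pairs, eps=1e-8):
    """Average uSDR over an iterable of (reference, estimate) track pairs."""
    scores = [usdr(reference, estimate, eps) for reference, estimate in pairs]
    if not scores:
        raise ValueError("at least one track pair is required")
    return sum(scores) / len(scores)
```

Under this formulation, an estimate whose amplitude is 10% below the reference on every sample scores close to 20 dB. Chunk-level SDR differs in that it is computed on short chunks of each track and then aggregated, so the two metrics are not directly interchangeable.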