HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/85320 |
Institution: | Institut Teknologi Bandung |
Summary: | Song lyrics play a crucial role in music, conveying meaning and emotion to
listeners. However, aligning lyrics with the rhythm of the music is a significant
challenge. This study develops a model for aligning Indonesian song
lyrics using artificial intelligence approaches.
This study adopts forced alignment, a technique widely used to align
automatic speech recognition output with audio. Forced alignment places
phonemes, words, or phrases at their corresponding positions on the audio
timeline. However, its application to music and to the Indonesian language
remains very limited. This study therefore explores how speech
processing technology can be used to align lyric text with the rhythm of
Indonesian songs.
The research involves several stages, from web scraping to collect a
dataset of Indonesian songs to applying the SEMMA methodology (Sample,
Explore, Modify, Model, Assess) to develop the forced alignment model.
The results indicate that the proposed approach, which combines phoneme
translation and transfer learning with a Hidden Markov Model - Gaussian
Mixture Model (HMM-GMM), outperforms commonly used
forced alignment models such as NeMo Forced Aligner (NFA) and Massively
Multilingual Speech – Forced Alignment (MMS-FA).
On the Mean Absolute Error (MAE) metric, the proposed model achieved
an average of 947.86 milliseconds, and on the Segment Error Rate (SER)
metric it reached 0.0016 (about 0.16%). These results show
that the developed model aligns Indonesian song lyrics more accurately than
the NFA model (MAE = 1742.46 milliseconds, SER = 0.0740) and the MMS-FA
model (MAE = 1945.82 milliseconds, SER = 0.1609). |
---|
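To make the reported MAE figures easier to interpret, the sketch below shows one way such a timing error could be computed and how the three reported values compare. It assumes that MAE is the mean absolute difference between predicted and reference onset times in milliseconds; the summary does not say whether this is measured per word, per line, or per phoneme, and SER is not defined there, so it is not sketched. The function names and the sample onset times are hypothetical; only the three MAE values come from the summary.

```python
from statistics import mean


def mean_absolute_error_ms(predicted_ms, reference_ms):
    """Mean absolute timing difference in milliseconds (assumed MAE definition)."""
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference must have the same length")
    if not predicted_ms:
        raise ValueError("at least one aligned segment is required")
    return mean(abs(p - r) for p, r in zip(predicted_ms, reference_ms))


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` lowers the error relative to `baseline`."""
    return (baseline - proposed) / baseline


# Hypothetical onset times (ms) for three lyric segments, for illustration only.
predicted = [500.0, 1800.0, 3100.0]
reference = [450.0, 2000.0, 3000.0]
example_mae = mean_absolute_error_ms(predicted, reference)

# MAE values as reported in the summary (milliseconds).
reported_mae = {"proposed": 947.86, "NFA": 1742.46, "MMS-FA": 1945.82}

for name in ("NFA", "MMS-FA"):
    reduction = relative_reduction(reported_mae[name], reported_mae["proposed"])
    print(f"MAE reduction vs {name}: {reduction:.1%}")
```

Under this reading, the proposed model's error is roughly half that of either baseline, although an average offset close to one second is still large relative to the duration of a single sung word.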