HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD

Bibliographic Details
Main Author: Stefanus
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85320
Institution: Institut Teknologi Bandung
Description
Summary: Song lyrics play a crucial role in music, conveying meaning and emotion to listeners. However, aligning lyrics with the rhythm of the music is a significant challenge. This study develops a model for aligning Indonesian song lyrics using artificial intelligence approaches. It adopts forced alignment, a technique widely used to align automatic speech recognition output with audio by placing phonemes, words, or phrases at their corresponding positions on a timeline. Applications of this technique to music and to the Indonesian language remain very limited. This study therefore explores how speech processing technology can be used to align lyric text with the rhythm of Indonesian music. The research proceeds in several stages, from web scraping to collect a dataset of Indonesian songs to applying the SEMMA methodology (Sample, Explore, Modify, Model, Assess) to develop the forced alignment model. The results indicate that the proposed approach, which combines phoneme translation and transfer learning with a Hidden Markov Model - Gaussian Mixture Model (HMM-GMM), outperforms commonly used forced alignment models such as NeMo Forced Aligner (NFA) and Massively Multilingual Speech – Forced Alignment (MMS-FA). The proposed model achieved an average Mean Average Error (MAE) of 947.86 milliseconds and a Segment Error Rate (SER) of 0.0016 (about 0.16%). It therefore aligns Indonesian song lyrics more accurately than the NFA model (MAE = 1742.46 milliseconds, SER = 0.0740) and the MMS-FA model (MAE = 1945.82 milliseconds, SER = 0.1609).
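
The MAE figures above are timing errors in milliseconds. As a rough illustration of how such a metric is commonly computed, the sketch below averages the absolute difference between predicted and reference onset times. This is an assumption: the summary does not define its "Mean Average Error" or state which boundaries (word, phoneme, or line) it measures. The function name and the sample onsets are hypothetical and are not taken from the thesis.

```python
def mean_absolute_error_ms(predicted_ms, reference_ms):
    """Average absolute difference between predicted and reference onsets, in ms.

    Assumed definition for illustration only; the thesis may compute its
    MAE differently.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("sequences must have the same length")
    if not predicted_ms:
        raise ValueError("sequences must not be empty")
    total = sum(abs(p - r) for p, r in zip(predicted_ms, reference_ms))
    return total / len(predicted_ms)


# Hypothetical onsets in milliseconds (made-up values, not thesis data).
reference = [0, 500, 1200, 2000]
predicted = [100, 450, 1300, 1900]
mae = mean_absolute_error_ms(predicted, reference)
print(f"MAE: {mae} ms")
```

Under this definition a lower value means the predicted timestamps sit closer to the reference annotation, which is the sense in which the proposed model's 947.86 ms compares favorably with the two baselines.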