HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD

Song lyrics play a crucial role in music, providing deep meaning and emotion to listeners. However, aligning lyrics with the rhythm of music is a significant challenge. This study focuses on developing a model for aligning Indonesian song lyrics using artificial intelligence approaches. This st...

Bibliographic Details
Main Author: Stefanus
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/85320
Institution: Institut Teknologi Bandung
id id-itb.:85320
spelling id-itb.:85320 2024-08-20T10:21:46Z
keywords lyric alignment, music rhythm, forced alignment, speech processing, Indonesian language, artificial intelligence
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Song lyrics play a crucial role in music, conveying meaning and emotion to listeners. However, aligning lyrics with the rhythm of the music is a significant challenge. This study develops a model for aligning Indonesian song lyrics using artificial intelligence. It adopts forced alignment, a technique widely used to align automatic speech recognition output with audio by placing phonemes, words, or phrases on the corresponding timeline. The application of this technique to music, and to the Indonesian language in particular, remains very limited. The study therefore explores how speech processing technology can be used to align lyric text with the rhythm of Indonesian songs. The research proceeds in several stages, from web scraping to collect a dataset of Indonesian songs to applying the SEMMA methodology (Sample, Explore, Modify, Model, Assess) to develop the forced alignment model. The results indicate that the proposed approach, which combines phoneme translation with transfer learning on a Hidden Markov Model - Gaussian Mixture Model (HMM-GMM), outperforms commonly used forced alignment models such as NeMo Forced Aligner (NFA) and Massively Multilingual Speech – Forced Alignment (MMS-FA). The proposed model achieved a Mean Absolute Error (MAE) of 947.86 milliseconds on average and a Segment Error Rate (SER) of 0.0016 (about 0.16%). It thus aligns Indonesian song lyrics more accurately than the NFA model (MAE = 1742.46 milliseconds, SER = 0.0740) and the MMS-FA model (MAE = 1945.82 milliseconds, SER = 0.1609).
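The MAE figures in the description can be read as the average absolute gap between predicted and reference timestamps. The sketch below is illustrative only and is not taken from the thesis: it assumes MAE is computed over paired timestamps in milliseconds, and the function names and toy timestamps are hypothetical. The only numbers drawn from the record are the three reported MAE values, used to show how much lower the proposed model's error is relative to each baseline.

```python
from statistics import mean


def mean_absolute_error_ms(predicted_ms, reference_ms):
    """Average absolute timing gap (ms) between paired timestamps.

    Assumption: one predicted timestamp per reference timestamp,
    in the same order. The thesis may define the pairing differently.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference must have equal length")
    if not predicted_ms:
        raise ValueError("at least one timestamp pair is required")
    return mean(abs(p - r) for p, r in zip(predicted_ms, reference_ms))


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


# MAE values (ms) as reported in the record's description.
REPORTED_MAE_MS = {"proposed": 947.86, "NFA": 1742.46, "MMS-FA": 1945.82}

# Hypothetical toy timestamps, for illustration only.
predicted = [0.0, 1250.0, 2900.0]
reference = [100.0, 1000.0, 3000.0]
toy_mae = mean_absolute_error_ms(predicted, reference)

reduction_vs_nfa = relative_reduction(
    REPORTED_MAE_MS["NFA"], REPORTED_MAE_MS["proposed"]
)
reduction_vs_mms = relative_reduction(
    REPORTED_MAE_MS["MMS-FA"], REPORTED_MAE_MS["proposed"]
)

if __name__ == "__main__":
    print(f"toy MAE: {toy_mae:.2f} ms")
    print(f"MAE reduction vs NFA: {reduction_vs_nfa:.1%}")
    print(f"MAE reduction vs MMS-FA: {reduction_vs_mms:.1%}")
```

Under these assumptions, the reported MAE of the proposed model is a little under half that of either baseline. The SER metric is left out of the sketch because the record does not state how segments or segment errors are defined.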
format Theses
author Stefanus
title HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD
url https://digilib.itb.ac.id/gdl/view/85320
_version_ 1822999130973143040