HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD
Song lyrics play a crucial role in music, providing deep meaning and emotion to listeners. However, aligning lyrics with the rhythm of music is a significant challenge. This study focuses on developing a model for aligning Indonesian song lyrics using artificial intelligence approaches. This st...
Main Author: | Stefanus |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/85320 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:85320 |
---|---|
spelling |
id-itb.:85320 2024-08-20T10:21:46Z HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD Stefanus Indonesia Theses lyric alignment, music rhythm, forced alignment, speech processing, Indonesian language, artificial intelligence INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/85320 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
Song lyrics play a crucial role in music, conveying meaning and emotion to
listeners. However, aligning lyrics with the rhythm of the music is a significant
challenge. This study develops a model for aligning Indonesian song lyrics
using artificial intelligence approaches.
This study adopts forced alignment techniques, which are widely used to align
automatic speech recognition output with audio. Forced alignment places
phonemes, words, or phrases at their corresponding positions on a timeline.
However, the technique has seen very limited application to music and to the
Indonesian language. This study therefore explores how speech processing
technology can be used to align lyric text with the rhythm of Indonesian
songs.
The research proceeds in several stages, from collecting a dataset of
Indonesian songs through web scraping to applying the SEMMA methodology
(Sample, Explore, Modify, Model, Assess) to develop the forced alignment model.
The results indicate that the proposed approach, which combines phoneme
translation and transfer learning with the Hidden Markov Model - Gaussian
Mixture Model (HMM-GMM), outperforms commonly used forced alignment models
such as NeMo Forced Aligner (NFA) and Massively Multilingual Speech – Forced
Alignment (MMS-FA).
On the Mean Absolute Error (MAE) metric, the proposed model achieved an
average of 947.86 milliseconds, and on the Segment Error Rate (SER) metric it
reached 0.0016 (about 0.16%). These results show that the developed model
aligns Indonesian song lyrics more accurately than the NFA model
(MAE=1742.46 milliseconds, SER=0.0740) and the MMS-FA model
(MAE=1945.82 milliseconds, SER=0.1609). |
format |
Theses |
author |
Stefanus |
title |
HARMONIZATION OF WORD AND MELODY: INDONESIAN SONG LYRICS ALIGNMENT USING PHONEME REPRESENTATION METHOD |
url |
https://digilib.itb.ac.id/gdl/view/85320 |
_version_ |
1822999130973143040 |
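
The MAE figures in the description are most naturally read as the mean absolute difference between predicted and reference timestamps. The record does not state which boundaries the thesis compares (word, line, or phoneme onsets), so the sketch below simply assumes paired onset times in milliseconds. The function name `mean_absolute_error_ms` and the sample timestamps are hypothetical and are not taken from the thesis. SER is left out because the record does not define how it is computed.

```python
from statistics import mean


def mean_absolute_error_ms(predicted_ms, reference_ms):
    """Mean absolute difference between paired timestamps, in milliseconds.

    Assumption: both sequences hold onset times in milliseconds and are
    paired by position (predicted_ms[i] belongs to reference_ms[i]).
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("predicted and reference must have the same length")
    if not predicted_ms:
        raise ValueError("need at least one timestamp pair")
    return mean(abs(p - r) for p, r in zip(predicted_ms, reference_ms))


# Hypothetical onsets for four lyric segments (illustrative values only).
reference_ms = [500, 1200, 2000, 3100]
predicted_ms = [600, 1000, 2500, 3100]

# Absolute errors are 100, 200, 500 and 0 ms, so the mean is 200 ms.
mae = mean_absolute_error_ms(predicted_ms, reference_ms)
print(f"MAE = {mae} ms")
```

Read this way, an MAE of 947.86 milliseconds means the predicted boundaries sit a little under one second from the reference on average, compared with roughly 1.7 to 1.9 seconds for the two baseline aligners.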