Music generation with deep learning techniques

This research paper studies the development and performance of a Text-to-Music Transformer model. The main objective is to investigate the generative potential of multimodal transformation, where textual input is converted into musical scores in MIDI format. A comprehensive literature review of existing music synthesis methods forms the basis of this study. The textual dataset is created in a novel way by using CLaMP to select the top 30 textual descriptors of the music. A pre-trained RoBERTa model and Octuple tokenizers are used to process the text and the musical scores, respectively. The music transformer is then built on a Fast Transformer base that infuses the textual information into the generated sequences. Embeddings, linear layers, and cross-entropy losses are used for all six musical attributes, with hyperparameter tuning to promote coherent and varied musical outputs. The generated music was evaluated through musical analysis and a user study. The results show that the transformer model can generate music that is either melodious or expressive of the textual prompt.
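The pipeline described above lends itself to a compact sketch: per-attribute embeddings are summed into one token embedding, a projected text embedding conditions the sequence, a causal Transformer models the sequence, and one linear head with a cross-entropy loss per attribute produces the six outputs. The snippet below is a minimal illustration under assumptions, not the author's code: the vocabulary sizes, model width, and names (TextToMusicSketch, multi_attribute_loss) are hypothetical, and a standard PyTorch nn.TransformerEncoder with a causal mask stands in for the Fast Transformer base.

```python
# Minimal sketch of a text-conditioned, multi-attribute music transformer.
# Assumptions (not from the paper): vocabulary sizes, model width, layer count,
# and the use of nn.TransformerEncoder with a causal mask as a stand-in for the
# Fast Transformer base. In the real pipeline RoBERTa would supply text_emb and
# the Octuple tokenizer would supply the integer attribute columns.
import torch
import torch.nn as nn

ATTR_VOCABS = [64, 128, 96, 128, 64, 32]  # hypothetical vocab size per musical attribute
D_MODEL = 256

class TextToMusicSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # one embedding table per musical attribute; their sum is the token embedding
        self.embeds = nn.ModuleList(nn.Embedding(v, D_MODEL) for v in ATTR_VOCABS)
        self.text_proj = nn.Linear(768, D_MODEL)  # project a 768-d text embedding
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        # one linear output head per musical attribute
        self.heads = nn.ModuleList(nn.Linear(D_MODEL, v) for v in ATTR_VOCABS)

    def forward(self, tokens, text_emb):
        # tokens: (batch, seq, 6) integer attributes; text_emb: (batch, 768)
        x = sum(emb(tokens[..., i]) for i, emb in enumerate(self.embeds))
        x = x + self.text_proj(text_emb).unsqueeze(1)            # infuse textual information
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(x, mask=mask)                          # causal self-attention
        return [head(h) for head in self.heads]                  # per-attribute logits

def multi_attribute_loss(logits, targets):
    # cross-entropy summed over all six musical attributes
    ce = nn.CrossEntropyLoss()
    return sum(ce(l.reshape(-1, l.size(-1)), targets[..., i].reshape(-1))
               for i, l in enumerate(logits))

# toy usage with random data
model = TextToMusicSketch()
tokens = torch.stack([torch.randint(0, v, (2, 16)) for v in ATTR_VOCABS], dim=-1)
logits = model(tokens, torch.randn(2, 768))
print(multi_attribute_loss(logits, tokens))
```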

Bibliographic Details
Main Author: Low, Paul Solomon Si En
Other Authors: Alexei Sourin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science; Deep learning; Music generation; Transformers; Music to text; Neural networks
Online Access:https://hdl.handle.net/10356/175113
id sg-ntu-dr.10356-175113
record_format dspace
last_modified 2024-04-26T15:40:29Z
school School of Computer Science and Engineering
supervisor Alexei Sourin (assourin@ntu.edu.sg)
degree Bachelor's degree
project_code SCSE23-0041
date_available 2024-04-22T00:37:31Z
citation Low, P. S. S. E. (2024). Music generation with deep learning techniques. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175113
file_format application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Deep learning
Music generation
Transformers
Music to text
Neural networks
author2 Alexei Sourin
format Final Year Project
author Low, Paul Solomon Si En
title Music generation with deep learning techniques
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175113
_version_ 1800916396731793408