Music generation with deep learning techniques
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175381
Institution: Nanyang Technological University
Summary: This project studies various deep learning models for music generation and investigates a novel approach that uses image content as input. We use Gemini, a large language model, to generate textual captions describing an image's content and emotional tone. These captions are then fed into the existing MusicGen framework, which was originally designed for text-based music generation. While our evaluation shows promise, with the generated music thematically and emotionally aligned with the corresponding image, melodic structure remains a challenge. This suggests potential limitations in using plain text captions as input to MusicGen.
Our findings pave the way for further exploration of alternative representations that could directly translate image features into musical elements. This could involve delving into image processing techniques or developing specialised music generation models that handle image data more effectively. Overall, this project demonstrates the potential of image-based music generation and highlights the need for future research in this exciting area.
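The two-stage pipeline the summary describes (image → Gemini caption → MusicGen prompt) can be sketched as a simple composition of two functions. Everything below is an illustrative assumption rather than the project's actual code: in the real system, the captioning stage would call a vision-capable Gemini model and the music stage would invoke MusicGen with the caption as its text prompt; here both stages are stand-in stubs so the sketch is self-contained.

```python
# Hypothetical sketch of the image-to-music pipeline from the summary.
# Function names and stub implementations are assumptions for
# illustration only, not the project's actual implementation.

from typing import Callable

def image_to_music(image_path: str,
                   caption_fn: Callable[[str], str],
                   music_fn: Callable[[str], bytes]) -> bytes:
    """Two-stage pipeline: image -> text caption -> music audio."""
    # Stage 1: describe the image's content and emotional tone in text
    # (in the project, this role is played by Gemini).
    caption = caption_fn(image_path)
    # Stage 2: use the caption as a text prompt for music generation
    # (in the project, this role is played by MusicGen).
    return music_fn(caption)

# Stub stand-ins so the sketch runs without API keys or a GPU.
def fake_caption(path: str) -> str:
    return f"A melancholic sunset over the sea ({path})"

def fake_music(prompt: str) -> bytes:
    return f"AUDIO<{prompt}>".encode()

audio = image_to_music("beach.jpg", fake_caption, fake_music)
```

Decoupling the two stages like this is what lets the project reuse an unmodified text-to-music model: only the captioning stage touches the image, which is also why the summary flags plain text captions as a possible bottleneck for melodic structure.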