Transformers as feature extractors in emotion-based music visualization

Bibliographic Details
Main Author: Sim, Clodia Xin Ni
Other Authors: Alexei Sourin
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175170
Description
Abstract: Cross-modal similarity learning revolves around the feature embeddings of the target modalities. With advances in deep neural networks, feature extraction has grown increasingly sophisticated. Convolutional Neural Networks (CNNs) and Residual Networks (ResNets) have proven to be strong feature extractors in both computer vision and music analysis, both of which are crucial to music visualization. However, the emergence of transformers raises the question of whether such networks remain the best choice for these tasks. This project first surveys existing work on music visualization, then studies the use of emotion dimensions such as valence and arousal to quantify emotions. It also explores how audio signals and spectrograms can be used to analyse the emotions evoked by a piece of music. Ultimately, this project proposes using transformers as feature extractors, leading to better music visualizations through cross-modal similarity learning. The experiments conducted show that transformers outperform state-of-the-art approaches.
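
For illustration only, the PyTorch sketch below shows one way a transformer encoder could extract a fixed-size embedding from a mel spectrogram and score cross-modal similarity against a visual embedding. It is not drawn from the project itself; the module name, hyperparameters, and pooling choice are all assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AudioTransformerEncoder(nn.Module):
        # Hypothetical module: encodes a mel spectrogram into one embedding vector.
        def __init__(self, n_mels=128, d_model=128, n_heads=8, n_layers=4, embed_dim=256):
            super().__init__()
            self.proj = nn.Linear(n_mels, d_model)  # each spectrogram time frame becomes a token
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d_model, embed_dim)

        def forward(self, mel):                      # mel: (batch, n_mels, time)
            tokens = self.proj(mel.transpose(1, 2))  # (batch, time, d_model)
            encoded = self.encoder(tokens)           # self-attention across time frames
            pooled = encoded.mean(dim=1)             # average pooling over time
            return F.normalize(self.head(pooled), dim=-1)  # unit-length embedding

    audio_encoder = AudioTransformerEncoder()
    mel = torch.randn(2, 128, 400)                   # stand-in batch of 2 mel spectrograms
    audio_emb = audio_encoder(mel)                   # shape: (2, 256)

    visual_emb = F.normalize(torch.randn(2, 256), dim=-1)  # stand-in visual embeddings
    similarity = audio_emb @ visual_emb.t()          # pairwise cosine similarities
    print(similarity.shape)                          # torch.Size([2, 2])

In a full cross-modal pipeline, embeddings like these would typically be trained with a contrastive objective so that music and visuals evoking similar valence-arousal values land close together in the shared embedding space.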