Controllable music : supervised learning of disentangled representations for music generation

Controllability, despite being a much-desired property of a generative model, remains an ill-defined concept that is difficult to measure. In the context of neural music generation, a controllable system often implies an intuitive interaction between human agents and the neural model, allowing the relatively opaque neural model to be controlled by a human in a semantically understandable manner. In this work, we aim to tackle controllable music generation in the raw audio domain, which has been attempted significantly less often than in the symbolic domain. Specifically, we focus on controlling multiple continuous, potentially interdependent timbral attributes of a musical note using a variational autoencoder (VAE) framework, along with the groundwork research needed to support this goal. The work consists of three main parts. The first formulates the concept of controllability and how to evaluate the latent manifold of a deep generative model in the presence of multiple interdependent attributes. The second focuses on the development of a composite latent space architecture for VAEs, allowing the encoding of interdependent attributes while retaining an easily sampled, disentangled prior. Proof-of-concept work for this part was performed on several standard vision disentanglement learning datasets. Finally, the last part applies the composite latent space model to music generation in the raw audio domain and discusses the evaluation of the model against the criteria defined in the first part of the project. All in all, given the relatively uncharted nature of controllable generation in the raw audio domain, this project provides foundational work for the evaluation of controllable generation as a whole, and a promising proof of concept for musical audio generation with timbral control using variational autoencoders.

Saved in:
Bibliographic Details
Main Author: Watcharasupat, Karn N.
Other Authors: Gan Woon Seng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/153200
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-153200
record_format dspace
spelling sg-ntu-dr.10356-1532002023-07-07T18:35:28Z Controllable music : supervised learning of disentangled representations for music generation Watcharasupat, Karn N. Gan Woon Seng School of Electrical and Electronic Engineering Center for Music Technology, Georgia Institute of Technology Centre for Information Sciences and Systems Alexander Lerch alexander.lerch@gatech.edu; EWSGAN@ntu.edu.sg Engineering::Electrical and electronic engineering::Electronic systems::Signal processing Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Controllability, despite being a much-desired property of a generative model, remains an ill-defined concept that is difficult to measure. In the context of neural music generation, a controllable system often implies an intuitive interaction between human agents and the neural model, allowing the relatively opaque neural model to be controlled by a human in a semantically understandable manner. In this work, we aim to tackle controllable music generation in the raw audio domain, which has been attempted significantly less often than in the symbolic domain. Specifically, we focus on controlling multiple continuous, potentially interdependent timbral attributes of a musical note using a variational autoencoder (VAE) framework, along with the groundwork research needed to support this goal. The work consists of three main parts. The first formulates the concept of controllability and how to evaluate the latent manifold of a deep generative model in the presence of multiple interdependent attributes. The second focuses on the development of a composite latent space architecture for VAEs, allowing the encoding of interdependent attributes while retaining an easily sampled, disentangled prior. Proof-of-concept work for this part was performed on several standard vision disentanglement learning datasets. Finally, the last part applies the composite latent space model to music generation in the raw audio domain and discusses the evaluation of the model against the criteria defined in the first part of the project. All in all, given the relatively uncharted nature of controllable generation in the raw audio domain, this project provides foundational work for the evaluation of controllable generation as a whole, and a promising proof of concept for musical audio generation with timbral control using variational autoencoders. Bachelor of Engineering (Electrical and Electronic Engineering) 2021-11-16T02:07:17Z 2021-11-16T02:07:17Z 2021 Final Year Project (FYP) Watcharasupat, K. N. (2021). Controllable music : supervised learning of disentangled representations for music generation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/153200 https://hdl.handle.net/10356/153200 en CY3001-211 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
spellingShingle Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Watcharasupat, Karn N.
Controllable music : supervised learning of disentangled representations for music generation
description Controllability, despite being a much-desired property of a generative model, remains an ill-defined concept that is difficult to measure. In the context of neural music generation, a controllable system often implies an intuitive interaction between human agents and the neural model, allowing the relatively opaque neural model to be controlled by a human in a semantically understandable manner. In this work, we aim to tackle controllable music generation in the raw audio domain, which has been attempted significantly less often than in the symbolic domain. Specifically, we focus on controlling multiple continuous, potentially interdependent timbral attributes of a musical note using a variational autoencoder (VAE) framework, along with the groundwork research needed to support this goal. The work consists of three main parts. The first formulates the concept of controllability and how to evaluate the latent manifold of a deep generative model in the presence of multiple interdependent attributes. The second focuses on the development of a composite latent space architecture for VAEs, allowing the encoding of interdependent attributes while retaining an easily sampled, disentangled prior. Proof-of-concept work for this part was performed on several standard vision disentanglement learning datasets. Finally, the last part applies the composite latent space model to music generation in the raw audio domain and discusses the evaluation of the model against the criteria defined in the first part of the project. All in all, given the relatively uncharted nature of controllable generation in the raw audio domain, this project provides foundational work for the evaluation of controllable generation as a whole, and a promising proof of concept for musical audio generation with timbral control using variational autoencoders.
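To illustrate the general idea described in the abstract, the sketch below shows a minimal supervised VAE with a composite latent space: the latent vector is split into an attribute-regularized part, tied to continuous attribute labels, and a freely sampled residual part with a standard Gaussian prior. All class names, dimensions, and the simple MSE-based attribute regularizer are hypothetical assumptions for illustration; they do not reflect the actual architecture, audio representation, or loss used in the project.

```python
# Illustrative sketch only (not the project's architecture): a VAE whose
# latent space is split into an attribute-regularized part and a free part.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompositeLatentVAE(nn.Module):
    def __init__(self, input_dim=128, attr_dim=3, free_dim=13, hidden=256):
        super().__init__()
        self.attr_dim = attr_dim
        latent_dim = attr_dim + free_dim
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),        # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, input_dim),
        )

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decoder(z), mu, logvar, z


def loss_fn(model, x, attrs, beta=1.0, gamma=10.0):
    """ELBO plus a supervised penalty tying the first `attr_dim` latent
    dimensions to the given continuous attribute labels."""
    x_hat, mu, logvar, z = model(x)
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    attr_reg = F.mse_loss(z[:, : model.attr_dim], attrs)          # supervision
    return recon + beta * kl + gamma * attr_reg


# Toy usage with random data: a batch of 8 inputs and 3 continuous attributes.
model = CompositeLatentVAE()
x = torch.randn(8, 128)
attrs = torch.rand(8, 3)
loss = loss_fn(model, x, attrs)
loss.backward()
```

Under this toy setup, "control" amounts to setting the attribute dimensions of the latent vector by hand before decoding, while the free dimensions are drawn from the standard Gaussian prior.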
author2 Gan Woon Seng
author_facet Gan Woon Seng
Watcharasupat, Karn N.
format Final Year Project
author Watcharasupat, Karn N.
author_sort Watcharasupat, Karn N.
title Controllable music : supervised learning of disentangled representations for music generation
title_short Controllable music : supervised learning of disentangled representations for music generation
title_full Controllable music : supervised learning of disentangled representations for music generation
title_fullStr Controllable music : supervised learning of disentangled representations for music generation
title_full_unstemmed Controllable music : supervised learning of disentangled representations for music generation
title_sort controllable music : supervised learning of disentangled representations for music generation
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/153200
_version_ 1772827989035188224