
Deep learning methods for music style transfer

Abstract : Recently, deep learning methods have enabled transforming musical material in a data-driven manner. The focus of this thesis is on a family of tasks which we refer to as (one-shot) music style transfer, where the goal is to transfer the style of one musical piece or fragment onto another.

In the first part of this work, we focus on supervised methods for symbolic music accompaniment style transfer, aiming to transform a given piece by generating a new accompaniment for it in the style of another piece. The method we have developed is based on supervised sequence-to-sequence learning using recurrent neural networks (RNNs) and leverages a synthetic parallel (pairwise aligned) dataset generated for this purpose using existing accompaniment generation software. We propose a set of objective metrics to evaluate the performance on this new task and we show that the system is successful in generating an accompaniment in the desired style while following the harmonic structure of the input.

In the second part, we investigate a more basic question: the role of positional encodings (PE) in music generation using Transformers. In particular, we propose stochastic positional encoding (SPE), a novel form of PE capturing relative positions while being compatible with a recently proposed family of efficient Transformers. We demonstrate that SPE allows for better extrapolation beyond the training sequence length than the commonly used absolute PE.

Finally, in the third part, we turn from symbolic music to audio and address the problem of timbre transfer. Specifically, we are interested in transferring the timbre of an audio recording of a single musical instrument onto another such recording while preserving the pitch content of the latter. We present a novel method for this task, based on an extension of the vector-quantized variational autoencoder (VQ-VAE), along with a simple self-supervised learning strategy designed to obtain disentangled representations of timbre and pitch. As in the first part, we design a set of objective metrics for the task. We show that the proposed method is able to outperform existing ones.
Contributor : ABES STAR
Submitted on : Tuesday, December 21, 2021 - 5:19:07 PM
Last modification on : Wednesday, December 22, 2021 - 3:06:10 AM
Long-term archiving on : Wednesday, March 23, 2022 - 10:20:00 AM


Version validated by the jury (STAR)


  • HAL Id : tel-03499991, version 1



Ondřej Cífka. Deep learning methods for music style transfer. Artificial Intelligence [cs.AI]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT029⟩. ⟨tel-03499991⟩


