VaryNote: A Method to Automatically Vary the Number of Notes in Symbolic Music

CMMR 2023

1University of Anonymous 1University of Anonymous2


VaryNote is a method to automatically vary the number of notes in MIDI music.

Abstract

Automatically varying the number of notes in symbolic music can help music creators embellish simple tunes or reduce complex music to its core idea. In this paper, we formulate the problem of varying music complexity and propose a method that can preserve harmonic structure while varying the number of notes. Our method, VaryNote, combines an autoencoder architecture with a masking mechanism to control the number of notes in the generated music. To train the weights of the pitch autoencoder, we present a novel surrogate divergence that combines the loss of pitch reconstructions with chord predictions end-to-end. We evaluate our results by plotting chord recognition accuracy as the number of notes increases and decreases, by analysing absolute and relative musical features with a probabilistic framework, and by conducting human surveys. The survey results indicate that participants prefer VaryNote output with 1.5 and 1.9 times the number of notes over the original music, suggesting that VaryNote can be a useful tool in music generation applications.




VaryNote

Architecture

The problem is to conditionally generate music based on r, the output-input ratio of the number of notes. A straightforward approach is to first apply representation learning to the music and then reconstruct it conditioned on r, similar to autoencoder-style models in machine learning. In this section, we introduce a novel autoencoder, named VaryNote, which consists of two parts. The first is a pitch autoencoder: the encoder compresses a piece of music into a latent representation, and the decoder reconstructs music from that representation. The second is a threshold mask that controls the sparsity of the output music. To train the weights of the pitch autoencoder, we define a novel divergence that combines the error on reconstruction with the error on symbolic chord predictions.
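As a rough illustration of how a threshold mask could tie the sparsity of the output to r, the sketch below keeps only the strongest decoder activations. The page does not specify how the threshold is chosen, so the top-k rule, the function name `threshold_mask`, and the piano-roll layout (rows as time steps, columns as pitches) are all assumptions made for this example rather than the authors' implementation.

```python
def threshold_mask(scores, n_input_notes, r):
    """Binarize a grid of decoder activations, keeping about r * n_input_notes cells.

    scores: list of rows (time steps), each a list of per-pitch activations.
    n_input_notes: number of notes in the input music.
    r: desired output-input ratio of the number of notes.
    Assumption: the threshold is the k-th largest activation (a top-k rule).
    """
    flat = [s for row in scores for s in row]
    # Target note count, clamped to the number of available cells.
    k = max(0, min(len(flat), round(r * n_input_notes)))
    if k == 0:
        return [[0 for _ in row] for row in scores]
    # The k-th largest activation acts as the threshold.
    # Ties at the threshold may let through slightly more than k notes.
    threshold = sorted(flat, reverse=True)[k - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


# Hypothetical activations for 2 time steps and 3 pitches; the input had 2 notes.
scores = [[0.9, 0.1, 0.4],
          [0.2, 0.8, 0.3]]
denser = threshold_mask(scores, n_input_notes=2, r=1.5)   # keeps 3 notes
sparser = threshold_mask(scores, n_input_notes=2, r=0.5)  # keeps 1 note
```

With this rule, r above 1 adds the next most likely notes and r below 1 removes the least likely ones, which matches the embellish and reduce use cases described in the abstract.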



Figure 1: During training, VaryNote combines an MSE loss and a softmax cross-entropy loss. The mask requires an output-input ratio r. During training, r can either be fixed, or the model can be trained without masking and the mask applied only at inference. During inference, r controls the number of notes.
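The combination of the two losses in Figure 1 can be sketched for a single training example as follows. This is a minimal illustration under stated assumptions: the page does not say how the two terms are weighted, so the `chord_weight` parameter and all function names here are hypothetical, and plain Python lists stand in for model outputs.

```python
import math


def mse(pred, target):
    """Mean squared error between reconstructed and target pitch activations."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against the index of the true chord."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def combined_loss(pitch_pred, pitch_target, chord_logits, chord_label,
                  chord_weight=1.0):
    """Reconstruction error plus weighted chord-prediction error.

    chord_weight is an assumed hyperparameter, not taken from the source.
    """
    return (mse(pitch_pred, pitch_target)
            + chord_weight * softmax_cross_entropy(chord_logits, chord_label))


# Hypothetical values: two pitch cells and two chord classes.
loss = combined_loss([0.5, 0.5], [1.0, 0.0], [0.0, 0.0], chord_label=0)
```

Summing the two terms lets one backward pass update the autoencoder on both objectives, which is one way to read the "end-to-end" training described in the abstract.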


Human Evaluation Results

To verify the practical value of VaryNote, we conducted a human survey on preference and perceived complexity. The results in Figure 2 indicate that participants prefer VaryNote output with 1.5 and 1.9 times the number of notes over the original music. Figure 2 also indicates that perceived complexity increases with higher note multiples, with one exception: output with 1.5 times the number of notes was rated as more complex than output with 1.9 times the number of notes.

Human Evaluation Results - Comparison between Original Music and VaryNote Output


Figure 2: Human survey results for preference and complexity. Participants were asked to rate the VaryNote output for preference on a scale of 1-5, where 1 is the lowest appeal and 5 is the highest. Participants also rated complexity on a scale of 1-5, where 1 is the lowest complexity and 5 is the highest. There were 30 participants in total; 11 of the 30 self-reported knowing how to play an instrument.


To download more listening examples click here.

BibTeX

@article{varynote2023,
  author    = {Juan Huerta and Bo Liu and Peter Stone},
  title     = {VaryNote: A Method to Automatically Vary the Number of Notes in Symbolic Music},
  journal   = {???},
  year      = {2022},
}