xxx model card

Dear reviewers, this is one of the pretrained/finetuned versions of xxx. You can freely download it by sending an anonymized request that respects the secret condition mentioned in the paper.

xxx is an LMM (Large Music Model) designed for symbolic music generation. It is trained on multitrack music of all genres and is able to generate music with pleasant melodies and complex harmonies. In its pretrained state, it can only generate music continuations autoregressively. It can be used for research purposes, or by musicians looking for inspiration in co-creative settings, even though being limited to continuation reduces its usefulness in real-world conditions.

Model Details

Model Description

xxx is trained on multitrack music of all genres from the MetaMIDI Dataset (MMD). It has a vocabulary of 30k tokens learned with Byte Pair Encoding (BPE).

We provide three variants: 125M, 500M and 1.5B parameters.

  • Model type: causal autoregressive Transformer
  • Backbone model: Mistral
  • Music genres: All 🎶
  • License: Apache 2.0

Uses

Direct Use

xxx is designed for autoregressive symbolic music generation. It generates the continuation of a music prompt.

Downstream Use

The model can serve as a base to be finetuned on data subsets or for more specific tasks, such as generating complete new tracks to accompany a given portion of music. A minimal finetuning sketch is shown below.
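The sketch below is a hypothetical illustration of such a finetuning setup with the Hugging Face Trainer, not the authors' recipe: my_token_dataset and my_data_collator are placeholders for your own pipeline of token sequences produced with the same miditok tokenizer, and the hyperparameters are only a starting point (see Training Details below for the values used during pretraining).

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Hypothetical finetuning sketch: start from a pretrained checkpoint and
# continue training it on your own tokenized MIDI data.
model = AutoModelForCausalLM.from_pretrained("l1v6RWdpLP/pe5z8-500m", torch_dtype="auto")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pe5z8-500m-finetuned", num_train_epochs=20, bf16=True),
    train_dataset=my_token_dataset,   # placeholder: dataset yielding token id sequences
    data_collator=my_data_collator,   # placeholder: pads and batches the sequences
)
trainer.train()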

Out-of-Scope Use

The model is causal and autoregressive: it generates the notes that follow a given prompt. It is not intended to generate variations of a portion of music conditioned on both past and future notes.

Bias and limitations

While xxx can generate realistic music and help musicians in their creative process, it presents risks that must be acknowledged. The generated samples may reflect biases present in the training data. Although the model has been trained on a large amount of data covering eclectic genres, it suffered from overfitting; consequently, the generated results might be biased towards the training data. Additionally, the original dataset contains a high proportion of Western-style music, which may not fairly represent the diversity and cultural breadth of music in general.

How to Get Started with the Model

To use the model, you only need the transformers and miditok packages (both can be installed with pip). pe5z8 can be used with the miditok tokenizer and the Hugging Face generation features.

import torch
from transformers import AutoModelForCausalLM
from miditok import TSD
from symusic import Score

torch.set_default_device("cuda")

# Load the model weights and the trained tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("l1v6RWdpLP/pe5z8-500m", torch_dtype="auto")
tokenizer = TSD.from_pretrained("l1v6RWdpLP/pe5z8-500m")

# Tokenize the MIDI prompt
input_midi = Score("path/to/file.mid")
input_tokens = tokenizer(input_midi)

# Generate the continuation (generate expects a batched tensor of token ids)
generated_token_ids = model.generate(torch.tensor([input_tokens.ids]), max_length=200)

# Decode the generated token ids back to a MIDI file
generated_midi = tokenizer(generated_token_ids[0].tolist())
generated_midi.dump_midi("path/to/extended.mid")

Training Details

Training Data

The model has been trained on a subset of the MetaMIDI dataset (ISMIR paper). The dataset contains more than 436k MIDI files. Among them, some are corrupted and many are covers of the same song. We therefore deduplicated the dataset in order to keep one MIDI file per song (the best one in each case), and ended up with 38,644 MIDI files. We performed data augmentation on these files by increasing or decreasing pitches (by octaves), velocities and durations, resulting in 314,203 files. Finally, we trained the tokenizer with Byte Pair Encoding (BPE) to build a vocabulary of 30k tokens. The resulting training set contains 1.3 billion tokens.
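As an illustration, the tokenizer training step could look like the sketch below. It assumes MidiTok's TSD class and its vocabulary-training method; the exact method name and arguments vary across MidiTok versions (train in recent releases, learn_bpe in older ones), and the file paths are placeholders.

from pathlib import Path
from miditok import TSD

# Hypothetical sketch of the BPE vocabulary training (paths are placeholders).
tokenizer = TSD()  # base TSD tokenizer, before vocabulary training
midi_paths = list(Path("path/to/augmented_midis").glob("**/*.mid"))

# Learn a 30k-token BPE vocabulary from the MIDI corpus
# (this method is named learn_bpe in older MidiTok releases).
tokenizer.train(vocab_size=30_000, model="BPE", files_paths=midi_paths)

# Save it so it can later be reloaded with TSD.from_pretrained
tokenizer.save_pretrained("path/to/tokenizer")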

Training Procedure

  • Training regime: All finetuned models were trained for 20 epochs, with gradient checkpointing, DeepSpeed ZeRO-2 and input sequences of up to 2048 tokens (an illustrative ZeRO-2 configuration is sketched after this list). All the details can be found in the logs and source code files.
  • Hardware: the models were trained on up to eight A100 SXM4 80GB GPUs with bf16 mixed precision.
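
The dictionary below is a minimal sketch of what a ZeRO-2 DeepSpeed configuration for the Hugging Face Trainer can look like. It is illustrative, not the exact configuration used: the "auto" values are resolved by the Trainer from its TrainingArguments, and the dict would be passed as TrainingArguments(deepspeed=ds_zero2_config, gradient_checkpointing=True, bf16=True, ...).

# Illustrative ZeRO-2 DeepSpeed configuration (not the exact one used for training).
ds_zero2_config = {
    "zero_optimization": {"stage": 2},         # shard optimizer states and gradients
    "bf16": {"enabled": "auto"},               # bf16 mixed precision
    "train_micro_batch_size_per_gpu": "auto",  # filled in by the Trainer
    "gradient_accumulation_steps": "auto",
}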

Training hyperparameters

The following hyperparameters were used during training (a rough mapping to transformers TrainingArguments is sketched after the list):

  • learning_rate: 6e-05
  • train_batch_size: 64 sequences
  • eval_batch_size: 128 sequences
  • seed: 444
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
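
As a convenience, these values map roughly onto Hugging Face TrainingArguments as sketched below. This is an assumption for illustration, not the authors' exact training script: the output directory is hypothetical, and the batch sizes are the reported values in sequences, so treating them as per-device values is an assumption.

from transformers import TrainingArguments

# Rough, hypothetical mapping of the reported hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="runs/pe5z8",                  # hypothetical output directory
    learning_rate=6e-5,
    per_device_train_batch_size=64,           # reported: 64 sequences (split across devices is an assumption)
    per_device_eval_batch_size=128,           # reported: 128 sequences
    seed=444,
    optim="adamw_torch",                      # AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    num_train_epochs=20,
    bf16=True,                                # see Training Procedure above
    gradient_checkpointing=True,
)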

Environmental impact

Redacted for reviewing.
