sr5434/sdxl-drums · Hugging Face

An SDXL v1.0 LoRA for generating drum spectrograms. Sample code:

from diffusers import DiffusionPipeline, AutoencoderKL
import torch

# Use the fp16-fixed SDXL VAE so decoding works in half precision
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, vae=vae).to("cuda")
# Load the LoRA weights on top of the base model
pipe.load_lora_weights("sr5434/sdxl-drums", adapter_name="drums")
# The pipeline returns an output object; the generated spectrogram is the first entry of .images
img = pipe("0-to-5-seconds of drumming in latin brazilian baiao with 95 beat rythm, version 145, 4-4").images[0]

I trained it for 800 steps on a single T4. The per-device batch size was 10 with 12 gradient accumulation steps, giving an effective batch size of 120. I used the 8-bit Adam optimizer with a learning rate of 1e-4, a cosine schedule, and 15 warmup steps.
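The numbers above can be sanity-checked with a short sketch. It reproduces the effective batch size and shows what the learning rate would look like over the run, assuming the schedule is a linear warmup followed by a half-cosine decay to zero (the common shape in training scripts; the exact scheduler implementation used here is not stated). The function names `effective_batch_size` and `cosine_lr` are made up for this illustration.

```python
import math

# Values stated in the model card.
PER_DEVICE_BATCH = 10
NUM_DEVICES = 1          # a single T4
GRAD_ACCUM_STEPS = 12
TRAIN_STEPS = 800
WARMUP_STEPS = 15
PEAK_LR = 1e-4


def effective_batch_size(per_device, devices, accum_steps):
    """Samples contributing to one optimizer update."""
    return per_device * devices * accum_steps


def cosine_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TRAIN_STEPS):
    """Assumed schedule: linear warmup, then half-cosine decay to zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    for step in (0, 15, 400, 800):
        print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
```

Under this assumption the learning rate climbs to 1e-4 by step 15 and then falls smoothly toward zero at step 800. If the 800 steps are optimizer updates, the run sees roughly 800 × 120 = 96,000 samples in total.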

