Stable Audio 3 Medium: Arabic Maqam LoRA

A LoRA fine-tune of Stable Audio 3 Medium (1.4B parameters) specialized in Arabic Maqam music generation.

Model Details

  • Base model: stabilityai/stable-audio-3-medium
  • LoRA rank: 64, alpha: 128
  • Target modules: self_attn.to_qkv, self_attn.to_out, cross_attn.to_q, cross_attn.to_kv, cross_attn.to_out, ff.ff.0.proj, ff.ff.2
  • Trainable parameters: ~73M (4.8% of 1.4B total)
  • Training data: 100 expert-labeled Maqam recordings, curated and annotated by Maqam specialists
  • Training: H100 80GB, bf16 mixed precision, rectified flow loss
  • Sample rate: 44.1 kHz stereo
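
As a rough illustration of what the rank and alpha settings above imply, the following sketch computes the LoRA scaling factor (alpha / rank, the standard LoRA convention) and the number of parameters LoRA adds to a single linear layer. The layer width of 1536 is a hypothetical placeholder chosen for illustration, not a value taken from this model.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params_per_linear(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Values from this card.
RANK, ALPHA = 64, 128
scaling = lora_scaling(ALPHA, RANK)

# Hypothetical layer width, used only for illustration.
HIDDEN = 1536
added = lora_params_per_linear(HIDDEN, HIDDEN, RANK)

print(f"scaling = {scaling}")
print(f"extra parameters for one {HIDDEN}x{HIDDEN} projection = {added:,}")
```

Summing the per-layer count over every target module in every transformer block gives the total trainable parameter count.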

Usage

import torch
import torchaudio
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond_inpaint
from peft import PeftModel

# Load base model
model, model_config = get_pretrained_model("stabilityai/stable-audio-3-medium")
model = model.cuda()

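# Load the LoRA adapter on top of the base diffusion transformer
model.model.model = PeftModel.from_pretrained(
    model.model.model,
    "motiftechnologies/stable-audio-3-maqam-lora"
)
# merge_and_unload() returns the merged base module; reassign it so the
# PEFT wrapper is dropped and inference runs on plain merged weights
model.model.model = model.model.model.merge_and_unload()
model = model.eval()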

# Generate
with torch.no_grad():
    audio = generate_diffusion_cond_inpaint(
        model,
        steps=50,
        cfg_scale=6.0,
        conditioning=[{"prompt": "Maqam Bayati on <oud>, <qanun>, <ney>; slow tempo; taqsim form", "seconds_total": 30}],
        sample_size=44100 * 30,
        device="cuda",
        inpaint_mask=torch.zeros(1, 44100 * 30, device="cuda"),
    )

torchaudio.save("output.wav", audio.squeeze(0).cpu(), 44100)
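
The prompt in the example above appears to follow a fixed pattern: maqam name, instrument tags in angle brackets, tempo, then form. The helper below is a minimal sketch that assembles a conditioning entry in that pattern and derives the matching sample count. The pattern is inferred from the single example on this card, and the function names `build_prompt` and `build_conditioning` are hypothetical, not part of any library.

```python
SAMPLE_RATE = 44100  # sample rate listed on this card


def build_prompt(maqam, instruments, tempo, form):
    """Format a prompt like the one used in the usage example."""
    tags = ", ".join(f"<{name}>" for name in instruments)
    return f"Maqam {maqam} on {tags}; {tempo} tempo; {form} form"


def build_conditioning(maqam, instruments, tempo, form, seconds):
    """Return (conditioning list, sample_size) for a clip lasting `seconds`."""
    conditioning = [{
        "prompt": build_prompt(maqam, instruments, tempo, form),
        "seconds_total": seconds,
    }]
    return conditioning, SAMPLE_RATE * seconds


conditioning, sample_size = build_conditioning(
    "Bayati", ["oud", "qanun", "ney"], "slow", "taqsim", seconds=30
)
```

The returned values can be passed as the `conditioning` and `sample_size` arguments of the generation call, which keeps the clip length and the `seconds_total` field in sync.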

Training

Fine-tuned by Motif Technologies.
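
The rectified flow loss listed under Model Details can be sketched in a few lines. This is a generic, dependency-free illustration of the objective (linear interpolation between data and Gaussian noise, with the model regressing the constant velocity between them), not the training code used for this adapter. The time convention (t = 0 is data, t = 1 is noise) is an assumption, and `predict_velocity` is a hypothetical stand-in for the network.

```python
import random


def rectified_flow_pair(x0, noise, t):
    """Return the interpolated sample x_t and the target velocity (noise - x0)."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def rectified_flow_loss(predict_velocity, x0, rng):
    """One-sample loss: draw noise and t, then compare predicted and true velocity."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    t = rng.random()
    x_t, target = rectified_flow_pair(x0, noise, t)
    return mse(predict_velocity(x_t, t), target)


# Toy usage: a predictor that always outputs zeros, on a 4-sample "clip".
rng = random.Random(0)
loss = rectified_flow_loss(lambda x_t, t: [0.0] * len(x_t), [0.1, -0.2, 0.3, 0.0], rng)
```

In a real training loop the lists would be latent audio tensors and the predictor would also receive the text conditioning.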
