How to use ACE-Step/acestep-v15-xl-sft with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-audio", model="ACE-Step/acestep-v15-xl-sft", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ACE-Step/acestep-v15-xl-sft", trust_remote_code=True, dtype="auto")
```
Recommended caption style for fine-tuning
#2
by wnmurphy - opened
Outstanding work.
I'm interested in fine-tuning and have a question about how to structure my inputs (captions). Were the audio models trained on input that looks like the automatically generated captions?
I'm wondering whether I'll get better results by sticking to that format (i.e. BPM and key kept separate from the caption rather than in it) or by writing captions that contain everything, including BPM and key, plus much more detail than the generated captions provide.
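To make the two options concrete, here is a minimal sketch of the same track captioned both ways. The field names (`caption`, `bpm`, `key`), the helper functions, and the example track are placeholders made up for illustration; they are not the actual ACE-Step dataset schema or training format.

```python
from dataclasses import dataclass


@dataclass
class TrackInfo:
    """Hypothetical per-track metadata; not the ACE-Step schema."""
    description: str
    bpm: int
    key: str


def separated_sample(track: TrackInfo) -> dict:
    """Option A: short caption, with BPM and key in their own fields."""
    return {"caption": track.description, "bpm": track.bpm, "key": track.key}


def inline_sample(track: TrackInfo) -> dict:
    """Option B: BPM and key folded into a single caption string."""
    caption = f"{track.description}, {track.bpm} BPM, key of {track.key}"
    return {"caption": caption}


track = TrackInfo(
    description="Warm lo-fi hip hop with dusty drums and a mellow Rhodes",
    bpm=84,
    key="A minor",
)

sample_a = separated_sample(track)
sample_b = inline_sample(track)

print(sample_a)
print(sample_b)
```

Option A mirrors the layout described for the generated captions; option B is the "everything in the caption" alternative, which could be extended with more descriptive detail.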
Also, I'll keep looking for it, but could you point me to where the prompt for the LM's caption generation lives in the repo?
Much appreciated.