How to use cmp-nct/acestep-v15-xl-sft with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-to-audio", model="cmp-nct/acestep-v15-xl-sft", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("cmp-nct/acestep-v15-xl-sft", trust_remote_code=True, dtype="auto")
ACE-Step 1.5 XL — SFT (4B DiT)
Project | Hugging Face | ModelScope | Space Demo | Discord | Tech Report
Model Details
This is the XL (4B) SFT variant of ACE-Step 1.5: a supervised fine-tuned model with roughly 4B parameters. Compared with the XL base model, SFT trades some diversity for higher audio quality, and it supports CFG (classifier-free guidance) for fine-grained control over prompt adherence.
XL Architecture
| Parameter | Value |
|---|---|
| DiT Decoder hidden_size | 2560 |
| DiT Decoder layers | 32 |
| DiT Decoder attention heads | 32 |
| Encoder hidden_size | 2048 |
| Encoder layers | 8 |
| Total params | ~4B |
| Weights size (bf16) | ~18.8 GB |
| Inference steps | 50 (with CFG) |
GPU Requirements
| VRAM | Support |
|---|---|
| ≥12 GB | With CPU offload + INT8 quantization |
| ≥16 GB | With CPU offload |
| ≥20 GB | Without offload |
| ≥24 GB | Full quality (XL + 4B LM) |
All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
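The VRAM table above can be read as a simple threshold lookup. The sketch below is illustrative only: `recommend_mode` is a hypothetical helper, not part of the ACE-Step codebase, and the strings it returns just echo the table rows.

```python
def recommend_mode(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the matching row of the table above.

    Hypothetical helper for illustration; not part of ACE-Step.
    """
    # Tiers are ordered from largest to smallest so the first match wins.
    tiers = [
        (24, "Full quality (XL + 4B LM)"),
        (20, "Without offload"),
        (16, "With CPU offload"),
        (12, "With CPU offload + INT8 quantization"),
    ]
    for min_gb, mode in tiers:
        if vram_gb >= min_gb:
            return mode
    return "Below the listed minimum (12 GB)"


for gb in (12, 16, 24):
    print(gb, "GB:", recommend_mode(gb))
```

How offload and quantization are actually enabled depends on the ACE-Step launcher and is not shown here.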
Key Features
- 💰 Commercial-Ready: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
- 📚 Safe Training Data: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
- 🎯 CFG Support: Adjust prompt adherence with the guidance scale.
- 🔮 Highest Quality: SFT + 4B parameters = the highest quality variant.
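To make the guidance scale mentioned above concrete, here is a minimal sketch of the standard classifier-free guidance combination, `uncond + scale * (cond - uncond)`. It assumes the textbook formulation; ACE-Step's own implementation may differ in detail. `apply_cfg` is a hypothetical name, and plain lists stand in for model outputs.

```python
from typing import List, Sequence


def apply_cfg(
    cond: Sequence[float],
    uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine conditional and unconditional predictions (textbook CFG).

    Illustrative only; not taken from the ACE-Step source.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    # scale = 0 returns the unconditional prediction, scale = 1 the conditional
    # one, and scale > 1 pushes the result further toward the prompt.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


print(apply_cfg([1.0, 2.0], [0.0, 1.0], guidance_scale=3.0))
```

Higher scales generally follow the prompt more closely at some cost in diversity, which matches the quality and diversity trade-off listed in the Model Zoo.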
Quick Start
    # Install ACE-Step
    git clone https://github.com/ace-step/ACE-Step-1.5.git
    cd ACE-Step-1.5
    pip install -e .

    # Download this model
    huggingface-cli download ACE-Step/acestep-v15-xl-sft --local-dir ./checkpoints/acestep-v15-xl-sft

    # Run with Gradio UI
    python acestep --config-path acestep-v15-xl-sft
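The download step above can also be scripted from Python. This is a sketch that assumes the `huggingface_hub` package is installed; `checkpoint_dir` and `download_model` are hypothetical helper names, and the target directory mirrors the `--local-dir` used in the CLI command.

```python
from pathlib import Path

REPO_ID = "ACE-Step/acestep-v15-xl-sft"  # repo id taken from the CLI command above


def checkpoint_dir(repo_id: str, root: str = "./checkpoints") -> Path:
    """Return <root>/<model name>, mirroring the --local-dir layout above."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID) -> Path:
    """Download a snapshot of the repo into the local checkpoint directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Usage (downloads the full checkpoint, so it is left commented out):
# download_model()
```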
Model Zoo
XL (4B) DiT Models
| DiT Model | CFG | Steps | Quality | Diversity | Tasks | Hugging Face | ModelScope |
|---|---|---|---|---|---|---|---|
| acestep-v15-xl-base | ✅ | 50 | High | High | All (extract, lego, complete) | Link | Link |
| acestep-v15-xl-sft | ✅ | 50 | Very High | Medium | Standard | This repo | Link |
| acestep-v15-xl-turbo | ❌ | 8 | Very High | Medium | Standard | Link | Link |
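One rough way to compare the cost of the three XL variants is to count DiT forward passes per generation. The sketch below assumes that CFG evaluates a conditional and an unconditional branch at every step. That is the usual formulation, but it is not confirmed here for ACE-Step. `forward_passes` is a hypothetical helper.

```python
def forward_passes(steps: int, uses_cfg: bool) -> int:
    """Estimate model evaluations for one generation.

    Assumption: CFG costs two evaluations per step (conditional + unconditional).
    """
    return steps * (2 if uses_cfg else 1)


# (steps, uses_cfg) pairs taken from the XL DiT table above.
variants = {
    "acestep-v15-xl-base": (50, True),
    "acestep-v15-xl-sft": (50, True),
    "acestep-v15-xl-turbo": (8, False),
}

for name, (steps, cfg) in variants.items():
    print(f"{name}: {forward_passes(steps, cfg)} forward passes")
```

Under that assumption the turbo variant needs far fewer evaluations than the SFT variant, at the price of losing guidance-scale control.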
LM Models (all compatible with XL)
| LM Model | Params | Audio Understanding | Composition | Hugging Face | ModelScope |
|---|---|---|---|---|---|
| acestep-5Hz-lm-0.6B | 0.6B | Medium | Medium | Link | Link |
| acestep-5Hz-lm-1.7B | 1.7B | Medium | Medium | Included in main | Included in main |
| acestep-5Hz-lm-4B | 4B | Strong | Strong | Link | Link |
Acknowledgements
This project is co-led by ACE Studio and StepFun.
Citation
    @misc{gong2026acestep,
      title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
      author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
      howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
      year={2026},
      note={GitHub repository}
    }
Paper: arXiv:2602.00744