Demodokos Foundry
Local AI music, speech, editing, mixing, and automation

Your GPU is the studio.

Generate music, create lifelike speech, clone voices, separate stems, repair sections, record, mix, master, and automate full audio workflows. Foundry runs locally on Windows with an NVIDIA GPU, so your scripts, voices, songs, and client audio stay on your machine.

No cloud generation / No credit meters / Voice cloning included / Built for private production

50+ music languages
10 speech languages
50 x 5 emotion control
200+ DSP presets
120+ commands

Music

Songs, vocals, structure, style, and language control.

Create full tracks, extend ideas, transform references, patch weak sections, and generate multilingual songs locally.

Speech

Expressive narration that does not sound like filler.

Generate speech in 10 languages, direct emotion line by line, clone voices from short samples, and build multi-speaker scenes.

Studio

A real production workspace, not just a prompt field.

Record, arrange, stem-split, patch, crossfade, process with DSP, mix on a DAW-style timeline, and export final audio.

Local creative agent

Give the boring preparation work to the assistant.

Use the built-in agent for lyrics, music briefs, script preparation, speaker extraction, emotion planning, literature summaries, narration segmentation, batch workflows, and repeatable production tasks.

foundry analyze manuscript.pdf
foundry extract speakers and emotions
foundry generate narration
foundry compose intro music
foundry mix and export

ACE-Step 1.5 XL — SFT (4B DiT)

Model Details

This is the XL (4B) SFT variant of ACE-Step 1.5: a supervised fine-tuned model with ~4B parameters. SFT provides higher audio quality than the XL base model (Very High vs. High in the Model Zoo table) and supports CFG (Classifier-Free Guidance), whose guidance scale controls how closely generation follows the prompt.
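
To make the role of the guidance scale concrete, here is a minimal sketch of the standard classifier-free guidance combination step. It is an illustration only: the source does not state which guidance formula ACE-Step uses internally, and the function name `apply_cfg` and the plain-list inputs are assumptions made for this example.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], guidance_scale: float) -> List[float]:
    """Combine conditional and unconditional predictions (standard CFG).

    Hypothetical helper, not part of ACE-Step: it applies
    uncond + guidance_scale * (cond - uncond) element by element.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# guidance_scale = 0 ignores the prompt, 1 reproduces the conditional
# prediction, and larger values push further in the prompt's direction.
```

With this formulation, raising the scale trades diversity for prompt adherence, which is the control the model card refers to.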

XL Architecture

Parameter | Value
DiT Decoder hidden_size | 2560
DiT Decoder layers | 32
DiT Decoder attention heads | 32
Encoder hidden_size | 2048
Encoder layers | 8
Total params | ~4B
Weights size (bf16) | ~18.8 GB
Inference steps | 50 (with CFG)

GPU Requirements

VRAM | Support
≥12 GB | With CPU offload + INT8 quantization
≥16 GB | With CPU offload
≥20 GB | Without offload
≥24 GB | Full quality (XL + 4B LM)

All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
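
The VRAM tiers above can be read as a simple lookup. The sketch below encodes the table as data and returns the most capable configuration a given amount of VRAM allows. The names `VRAM_TIERS` and `recommended_config` are hypothetical, and returning `None` below 12 GB only means the table lists no configuration for that case.

```python
from typing import Optional

# (minimum VRAM in GB, configuration) pairs copied from the GPU
# Requirements table, ordered from the most to the least demanding tier.
VRAM_TIERS = [
    (24, "Full quality (XL + 4B LM)"),
    (20, "Without offload"),
    (16, "With CPU offload"),
    (12, "With CPU offload + INT8 quantization"),
]


def recommended_config(vram_gb: float) -> Optional[str]:
    """Return the highest tier whose minimum VRAM is satisfied, else None."""
    for min_gb, config in VRAM_TIERS:
        if vram_gb >= min_gb:
            return config
    return None  # the table lists no configuration below 12 GB
```

For example, a 16 GB card falls into the "With CPU offload" tier, while a 24 GB card reaches the full-quality tier.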

Key Features

  • 💰 Commercial-Ready: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
  • 📚 Safe Training Data: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
  • 🎯 CFG Support: Fine-tune prompt adherence with guidance scale control.
  • 🔮 Highest Quality: SFT + 4B parameters = the highest quality variant.

Quick Start

# Install ACE-Step
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
pip install -e .

# Download this model
huggingface-cli download ACE-Step/acestep-v15-xl-sft --local-dir ./checkpoints/acestep-v15-xl-sft

# Run with Gradio UI
python acestep --config-path acestep-v15-xl-sft
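
The download step can also be scripted from Python. This is a sketch, assuming the `huggingface_hub` package is installed; `snapshot_download` is that library's function for fetching a whole repository, while `checkpoint_dir` and `download_checkpoint` are hypothetical helper names introduced here. The repository ID and target directory mirror the command above.

```python
from pathlib import Path

REPO_ID = "ACE-Step/acestep-v15-xl-sft"  # same repository as the CLI command


def checkpoint_dir(repo_id: str, root: str = "./checkpoints") -> Path:
    """Map a repo ID to the local directory layout used in the Quick Start."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str = REPO_ID, root: str = "./checkpoints") -> Path:
    """Download the full repository snapshot into the checkpoint directory."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Usage (requires network access and enough disk space for the weights):
#   download_checkpoint()
```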

Model Zoo

XL (4B) DiT Models

DiT Model | CFG | Steps | Quality | Diversity | Tasks | Hugging Face | ModelScope
acestep-v15-xl-base | | 50 | High | High | All (extract, lego, complete) | Link | Link
acestep-v15-xl-sft | Yes | 50 | Very High | Medium | Standard | This repo | Link
acestep-v15-xl-turbo | | 8 | Very High | Medium | Standard | Link | Link
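
When choosing between the XL variants programmatically, the table can be held as plain data. The sketch below is illustrative only: `DiTVariant`, `XL_VARIANTS`, and `variants_within` are hypothetical names, and the values are copied from the rows above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiTVariant:
    name: str
    steps: int       # inference steps listed in the table
    quality: str
    diversity: str
    tasks: str


XL_VARIANTS = [
    DiTVariant("acestep-v15-xl-base", 50, "High", "High", "All (extract, lego, complete)"),
    DiTVariant("acestep-v15-xl-sft", 50, "Very High", "Medium", "Standard"),
    DiTVariant("acestep-v15-xl-turbo", 8, "Very High", "Medium", "Standard"),
]


def variants_within(max_steps: int) -> List[DiTVariant]:
    """Return the variants whose listed step count fits the given budget."""
    return [v for v in XL_VARIANTS if v.steps <= max_steps]


def fastest_variant() -> DiTVariant:
    """Return the variant with the fewest listed inference steps."""
    return min(XL_VARIANTS, key=lambda v: v.steps)
```

Under a tight step budget only the turbo variant qualifies, while a 50-step budget admits all three.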

LM Models (all compatible with XL)

LM Model | Params | Audio Understanding | Composition | Hugging Face | ModelScope
acestep-5Hz-lm-0.6B | 0.6B | Medium | Medium | Link | Link
acestep-5Hz-lm-1.7B | 1.7B | Medium | Medium | Included in main | Included in main
acestep-5Hz-lm-4B | 4B | Strong | Strong | Link | Link

Acknowledgements

This project is co-led by ACE Studio and StepFun.

Citation

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}