Demodokos Foundry

Local AI music, speech, editing, mixing, and automation

Your GPU is the studio.

Generate music, create lifelike speech, clone voices, separate stems, repair sections, record, mix, master, and automate full audio workflows. Foundry runs locally on Windows with an NVIDIA GPU, so your scripts, voices, songs, and client audio stay on your machine.

No cloud generation / No credit meters / Voice cloning included / Built for private production

50+ music languages
10 speech languages
50 x 5 emotion control
200+ DSP presets
120+ commands

Music

Songs, vocals, structure, style, and language control.

Create full tracks, extend ideas, transform references, patch weak sections, and generate multilingual songs locally.

Speech

Expressive narration that does not sound like filler.

Generate speech in 10 languages, direct emotion line by line, clone voices from short samples, and build multi-speaker scenes.

Studio

A real production workspace, not just a prompt field.

Record, arrange, stem-split, patch, crossfade, process with DSP, mix on a DAW-style timeline, and export final audio.

Local creative agent

Give the boring preparation work to the assistant.

Use the built-in agent for lyrics, music briefs, script preparation, speaker extraction, emotion planning, literature summaries, narration segmentation, batch workflows, and repeatable production tasks.

foundry analyze manuscript.pdf
foundry extract speakers and emotions
foundry generate narration
foundry compose intro music
foundry mix and export

Pick your workflow

YouTubers and faceless channels

Voiceovers, hooks, background music, repeatable production, no per-video credit anxiety.

Game developers

NPC dialogue, character voices, batch generation, ambience, score, and engine-ready export.

Audiobooks and podcasts

Long-form narration, multi-speaker scenes, private manuscripts, music beds, and final export.

Speech studios by language

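English, German (Deutsch), French (Français), Spanish (Español), Italian (Italiano), Chinese (中文), Japanese (日本語), Russian (Русский), Portuguese (Português), Korean (한국어)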

ACE-Step 1.5 XL — SFT (4B DiT)

Model Details

This is the XL (4B) SFT variant of ACE-Step 1.5: a supervised fine-tuned model with ~4B parameters. Compared with the XL base model, SFT trades some output diversity for higher audio quality, and it supports CFG (classifier-free guidance), so prompt adherence can be tuned with the guidance scale.
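As a reference point for what the guidance scale controls, the textbook classifier-free guidance rule (Ho & Salimans, 2022) is shown below. This is an assumption about the general technique only: the card does not say whether the ACE-Step DiT predicts noise or velocity, or whether it applies a modified guidance rule, so f_θ here stands for "whatever the network predicts".

```latex
% Textbook classifier-free guidance; not necessarily ACE-Step's exact rule.
% f_\theta    : DiT prediction (noise or velocity; unspecified in the card)
% x_t         : noisy latent at step t
% c           : conditioning (text prompt, lyrics, ...)
% \varnothing : dropped (unconditional) conditioning
% w           : guidance scale
\tilde{f}_\theta(x_t, t, c) = f_\theta(x_t, t, \varnothing)
  + w \left( f_\theta(x_t, t, c) - f_\theta(x_t, t, \varnothing) \right)
```

With w = 1 this reduces to the plain conditional prediction; larger w pushes samples harder toward the prompt. Under this formulation each step needs both a conditional and an unconditional prediction, so if guidance is applied at every step, a 50-step CFG run costs roughly 100 network evaluations (which may be batched), versus 8 for the turbo variant, which the Model Zoo lists without CFG.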

XL Architecture

Parameter	Value
DiT Decoder hidden_size	2560
DiT Decoder layers	32
DiT Decoder attention heads	32
Encoder hidden_size	2048
Encoder layers	8
Total params	~4B
Weights size (bf16)	~18.8 GB
Inference steps	50 (with CFG)

GPU Requirements

VRAM	Supported configuration
≥12 GB	With CPU offload + INT8 quantization
≥16 GB	With CPU offload
≥20 GB	Without offload
≥24 GB	Full quality (XL + 4B LM)

All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
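The VRAM tiers in the GPU Requirements table can be turned into a quick local check. This is a hypothetical sketch: `xl_sft_vram_tier` and `mib_to_gb` are illustrative names, not part of ACE-Step, and the thresholds are copied from the table. It rounds MiB to the nearest GB because `nvidia-smi` often reports slightly less than a card's marketed size (a 24 GB card may report 24564 MiB, which plain integer division would turn into 23).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of ACE-Step): map total GPU VRAM to the
# XL SFT configuration tier listed in the GPU Requirements table.

# Convert MiB (as printed by nvidia-smi) to whole GB, rounding to nearest.
mib_to_gb() {
  echo $(( ($1 + 512) / 1024 ))
}

# Print the documented configuration for a given VRAM size in whole GB.
xl_sft_vram_tier() {
  local vram_gb=$1
  if [[ ! $vram_gb =~ ^[0-9]+$ ]]; then
    echo "usage: xl_sft_vram_tier <vram_gb>" >&2
    return 1
  fi
  # Thresholds follow the table, checked from the largest tier down.
  if (( vram_gb >= 24 )); then
    echo "full quality (XL + 4B LM)"
  elif (( vram_gb >= 20 )); then
    echo "without offload"
  elif (( vram_gb >= 16 )); then
    echo "CPU offload"
  elif (( vram_gb >= 12 )); then
    echo "CPU offload + INT8 quantization"
  else
    echo "below the documented 12 GB minimum"
  fi
}
```

To check the first GPU: `xl_sft_vram_tier "$(mib_to_gb "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)")"`. Total VRAM is only an upper bound; memory already held by the desktop or other processes reduces what is actually available.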

Key Features

💰 Commercial-Ready: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
📚 Safe Training Data: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
🎯 CFG Support: Fine-tune prompt adherence with guidance scale control.
🔮 Highest Quality: Combining SFT with 4B parameters makes this the highest-quality variant.

Quick Start

# Install ACE-Step
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
pip install -e .

# Download this model
huggingface-cli download ACE-Step/acestep-v15-xl-sft --local-dir ./checkpoints/acestep-v15-xl-sft

# Run with Gradio UI
python acestep --config-path acestep-v15-xl-sft

Model Zoo

XL (4B) DiT Models

DiT Model	CFG	Steps	Quality	Diversity	Tasks	Hugging Face	ModelScope
`acestep-v15-xl-base`	✅	50	High	High	All (extract, lego, complete)	Link	Link
`acestep-v15-xl-sft`	✅	50	Very High	Medium	Standard	This repo	Link
`acestep-v15-xl-turbo`	❌	8	Very High	Medium	Standard	Link	Link
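The XL DiT table can also be encoded as a small lookup, for example to pick the step count in a batch script or to print the matching download command. This is a hypothetical sketch: the function names are illustrative, and it assumes the base and turbo repos follow the same `ACE-Step/<model-id>` naming as the SFT download in Quick Start, which the table (showing only "Link") does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the XL DiT Model Zoo table:
# prints the documented step count and CFG support for a model ID.
xl_dit_defaults() {
  case "$1" in
    acestep-v15-xl-base|acestep-v15-xl-sft) echo "steps=50 cfg=yes" ;;
    acestep-v15-xl-turbo)                   echo "steps=8 cfg=no" ;;
    *) echo "unknown XL DiT model: $1" >&2; return 1 ;;
  esac
}

# Prints (does not run) a download command following the Quick Start pattern.
# ASSUMPTION: every XL variant lives under the ACE-Step/ organization on
# Hugging Face, as the SFT repo does.
xl_download_cmd() {
  local model_id=$1
  xl_dit_defaults "$model_id" >/dev/null || return 1
  printf 'huggingface-cli download ACE-Step/%s --local-dir ./checkpoints/%s\n' \
    "$model_id" "$model_id"
}
```

Printing the command rather than running it keeps the helper free of side effects, so the command can be reviewed before starting a multi-gigabyte download.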

LM Models (all compatible with XL)

LM Model	Params	Audio Understanding	Composition	Hugging Face	ModelScope
`acestep-5Hz-lm-0.6B`	0.6B	Medium	Medium	Link	Link
`acestep-5Hz-lm-1.7B`	1.7B	Medium	Medium	Included in main	Included in main
`acestep-5Hz-lm-4B`	4B	Strong	Strong	Link	Link

Acknowledgements

This project is co-led by ACE Studio and StepFun.

Citation

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}

Format: Safetensors
Model size: 5B params
Tensor type: BF16

Paper for cmp-nct/acestep-v15-xl-sft

ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation (paper 2602.00744, published Jan 31)