Demodokos Foundry
Local AI music, speech, editing, mixing, and automation

Your GPU is the studio.

Generate music, create lifelike speech, clone voices, separate stems, repair sections, record, mix, master, and automate full audio workflows. Foundry runs locally on Windows with an NVIDIA GPU, so your scripts, voices, songs, and client audio stay on your machine.

No cloud generation / No credit meters / Voice cloning included / Built for private production
50+ Music languages
10 Speech languages
50 x 5 Emotion control
200+ DSP presets
120+ Commands

Music

Songs, vocals, structure, style, and language control.

Create full tracks, extend ideas, transform references, patch weak sections, and generate multilingual songs locally.

Speech

Expressive narration that does not sound like filler.

Generate speech in 10 languages, direct emotion line by line, clone voices from short samples, and build multi-speaker scenes.

Studio

A real production workspace, not just a prompt field.

Record, arrange, stem-split, patch, crossfade, process with DSP, mix on a DAW-style timeline, and export final audio.

Local creative agent

Give the boring preparation work to the assistant.

Use the built-in agent for lyrics, music briefs, script preparation, speaker extraction, emotion planning, literature summaries, narration segmentation, batch workflows, and repeatable production tasks.

foundry analyze manuscript.pdf
foundry extract speakers and emotions
foundry generate narration
foundry compose intro music
foundry mix and export

Model

This model is hosted for Demodokos Foundry, but it can be used for other purposes. It offers a stable download location and custom quantizations not available elsewhere.

ACE-Step 1.5 XL - Base (4B DiT)

Project | Hugging Face | ModelScope | Space Demo | Discord | Tech Report

Model Details

This is the XL (4B) Base variant of ACE-Step 1.5, which uses a larger DiT decoder with ~4B parameters for higher audio quality. It is the foundation model supporting all tasks: text-to-music, cover, repaint, extract, lego, and complete.

XL Architecture

Parameter | Value
DiT Decoder hidden_size | 2560
DiT Decoder layers | 32
DiT Decoder attention heads | 32
Encoder hidden_size | 2048
Encoder layers | 8
Total params | ~4B
Weights size (bf16) | ~18.8 GB
Inference steps | 50 (with CFG)
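
As a small worked example of the decoder numbers above, the sketch below derives the per-head width of the DiT decoder. It assumes the usual multi-head attention convention in which the hidden size is split evenly across heads; the model card does not state the head dimension itself, and the class and field names are hypothetical, not taken from the ACE-Step code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTDecoderConfig:
    """Hypothetical container for the decoder values listed in the table."""

    hidden_size: int = 2560
    num_layers: int = 32
    num_attention_heads: int = 32

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is divided evenly across attention heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by the head count")
        return self.hidden_size // self.num_attention_heads


if __name__ == "__main__":
    config = DiTDecoderConfig()
    print(config.head_dim)  # 2560 // 32 = 80
```

Under that assumption, each of the 32 heads works on an 80-dimensional slice of the 2560-wide hidden state.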

GPU Requirements

VRAM | Support
≥12 GB | With CPU offload + INT8 quantization
≥16 GB | With CPU offload
≥20 GB | Without offload
≥24 GB | Full quality (XL + 4B LM)

All LM models (0.6B / 1.7B / 4B) are fully compatible with XL.
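
The VRAM tiers above can be read as a simple threshold lookup. The following is a minimal sketch of that reading; the function name `recommend_setup` is hypothetical and is not part of ACE-Step or Foundry, and the thresholds are copied from the table rather than measured.

```python
def recommend_setup(vram_gb: float) -> str:
    """Return the support tier listed in the table for a given amount of VRAM."""
    # (minimum VRAM in GB, tier description), checked from largest to smallest.
    tiers = [
        (24, "Full quality (XL + 4B LM)"),
        (20, "Without offload"),
        (16, "With CPU offload"),
        (12, "With CPU offload + INT8 quantization"),
    ]
    for min_gb, description in tiers:
        if vram_gb >= min_gb:
            return description
    return "Below the listed minimum of 12 GB"


if __name__ == "__main__":
    for gb in (8, 12, 16, 20, 24):
        print(f"{gb} GB -> {recommend_setup(gb)}")
```

If PyTorch is installed, `torch.cuda.get_device_properties(0).total_memory / 1024**3` is one way to obtain the value to pass in.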

Key Features

  • 💰 Commercial-Ready: Trained on legally compliant datasets. Generated music can be used for commercial purposes.
  • 📚 Safe Training Data: Licensed music, royalty-free/public domain, and synthetic (MIDI-to-Audio) data.
  • 🎯 Full Task Support: Text2Music, Cover, Repaint, Extract, Lego, Complete.
  • 🔮 Higher Quality: 4B parameters provide richer audio quality compared to the 2B variants.

Quick Start

# Install ACE-Step
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
pip install -e .

# Download this model
huggingface-cli download ACE-Step/acestep-v15-xl-base --local-dir ./checkpoints/acestep-v15-xl-base

# Run with Gradio UI
python acestep --config-path acestep-v15-xl-base
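
The download step can also be done from Python. This is a sketch rather than an official ACE-Step recipe: it assumes the third-party `huggingface_hub` package is installed and uses its `snapshot_download` function with the same repo id and target directory as the command above. The helper names `checkpoint_dir` and `download_checkpoint` are hypothetical.

```python
from pathlib import Path

REPO_ID = "ACE-Step/acestep-v15-xl-base"


def checkpoint_dir(repo_id: str, root: str = "./checkpoints") -> Path:
    """Mirror the CLI layout: <root>/<model name without the org prefix>."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(repo_id: str = REPO_ID, root: str = "./checkpoints") -> Path:
    """Download every file of the repo into the local checkpoint directory."""
    # Imported here so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Needs network access and enough free disk space for the weights.
    print(download_checkpoint())
```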

Model Zoo

XL (4B) DiT Models

DiT Model | CFG | Steps | Quality | Diversity | Tasks | Hugging Face | ModelScope
acestep-v15-xl-base | ✅ | 50 | High | High | All (extract, lego, complete) | This repo | Link
acestep-v15-xl-sft | ✅ | 50 | Very High | Medium | Standard | Link | Link
acestep-v15-xl-turbo | ❌ | 8 | Very High | Medium | Standard | Link | Link

2B DiT Models

DiT Model | CFG | Steps | Hugging Face | ModelScope
acestep-v15-turbo (default) | ❌ | 8 | Link | Link
acestep-v15-sft | ✅ | 50 | Link | Link
acestep-v15-base | ✅ | 50 | Link | Link

LM Models (all compatible with XL)

LM Model | Params | Audio Understanding | Composition | Hugging Face | ModelScope
acestep-5Hz-lm-0.6B | 0.6B | Medium | Medium | Link | Link
acestep-5Hz-lm-1.7B | 1.7B | Medium | Medium | Included in main | Included in main
acestep-5Hz-lm-4B | 4B | Strong | Strong | Link | Link

Acknowledgements

This project is co-led by ACE Studio and StepFun.

Citation

@misc{gong2026acestep,
    title={ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
    author={Junmin Gong and Yulin Song and Wenxiao Zhao and Sen Wang and Shengyuan Xu and Jing Guo},
    howpublished={\url{https://github.com/ace-step/ACE-Step-1.5}},
    year={2026},
    note={GitHub repository}
}

Safetensors | Model size: 5B params | Tensor type: BF16