HT-Demucs 6-stem: ONNX (with guitar + piano)

The first ONNX export of the 6-stem htdemucs_6s variant on the Hugging Face Hub. Adds guitar and piano stems on top of the standard 4 (drums / bass / other / vocals). Runs in onnxruntime on CPU out of the box, and on CoreML / CUDA / DirectML with a one-line provider change. No PyTorch required at inference.

If you need guitar or piano isolation, this is the only off-the-shelf ONNX model on the Hub that provides it.


TL;DR

pip install onnxruntime numpy soundfile

# 258 MB fp32 model, all 6 stems:
python infer.py your-song.mp3 ./out/

# 136 MB fp16weights variant (same runtime cost):
python infer.py your-song.mp3 ./out/ --small

# Just the guitar stem:
python infer.py your-song.mp3 ./out/ --stems guitar

The repo contains:

  • htdemucs_6s.onnx: 258 MB, opset 17, parity-verified vs PyTorch fp32.
  • htdemucs_6s_fp16weights.onnx: 136 MB, fp16-stored weights, same runtime memory / latency.
  • infer.py: pure-numpy reference inference (~200 lines, no torch).
  • requirements.txt: three small packages, no PyTorch.
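
A possible reading of "fp16-stored weights, same runtime memory / latency" is that the weights sit in the file as float16 and are widened back to float32 before the kernels run. That mechanism is an assumption here, not something this card spells out. Under that assumption, the sketch below uses random stand-in weights (not the real model's) to show the storage halving and the size of the rounding error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights (assumption: small, roughly Gaussian values; not the real model).
weights_fp32 = (rng.standard_normal(1_000_000) * 0.05).astype(np.float32)

# What would be written to disk in an fp16-weights file.
stored = weights_fp32.astype(np.float16)

# What the runtime would compute with after widening back to float32.
runtime = stored.astype(np.float32)

size_ratio = stored.nbytes / weights_fp32.nbytes           # storage halves per tensor
max_err = float(np.max(np.abs(runtime - weights_fp32)))    # rounding error from fp16

print(f"size ratio: {size_ratio}, max rounding error: {max_err:.2e}")
```

The shipped file is 136 MB rather than exactly half of 258 MB, so some tensors presumably stay in fp32; the sketch only illustrates the per-tensor effect.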

What stems do I get?

SOURCES = ("drums", "bass", "other", "vocals", "guitar", "piano")

Output tensor: stems, shape (1, 6, 2, 343980), in that exact stem order. The first four stems are the same as in the 4-stem model, but their separation behavior differs slightly: the extra guitar and piano heads change what "other" learns to keep.
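
As a minimal sketch of how the stem order can be used, the helper below turns a raw (1, 6, 2, T) output array into a dictionary keyed by stem name. The function name split_stems is hypothetical and is not part of infer.py; the fake output stands in for a real model result.

```python
import numpy as np

SOURCES = ("drums", "bass", "other", "vocals", "guitar", "piano")

def split_stems(stems: np.ndarray) -> dict:
    """Map a (1, 6, 2, T) model output to {stem name: (2, T) array}.

    Hypothetical helper for illustration; relies only on the documented order.
    """
    if stems.ndim != 4 or stems.shape[:3] != (1, len(SOURCES), 2):
        raise ValueError(f"unexpected output shape {stems.shape}")
    return {name: stems[0, i] for i, name in enumerate(SOURCES)}

# Stand-in for a real model output: stem i is filled with the constant i.
fake = np.zeros((1, 6, 2, 343980), dtype=np.float32)
for i in range(len(SOURCES)):
    fake[0, i] += i

by_name = split_stems(fake)
print(by_name["guitar"].shape)  # stereo guitar stem, (channels, samples)
```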


Quality

Parity vs PyTorch fp32 (random input, 7.8 s segment):

  • htdemucs_6s.onnx max abs diff: 2.42 × 10⁻⁴
  • htdemucs_6s_fp16weights.onnx max abs diff (vs fp32 weights): 1.06 × 10⁻⁴

Both well within the 1e-3 publish threshold.
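
The parity metric itself is simple to reproduce. The sketch below shows the max-abs-diff check against the 1e-3 threshold using synthetic arrays. In a real check, the two arrays would be the PyTorch and ONNX outputs for the same input; the arrays and the perturbation size here are made up for illustration.

```python
import numpy as np

PUBLISH_THRESHOLD = 1e-3  # threshold quoted in this card

def max_abs_diff(reference, candidate) -> float:
    """Largest element-wise absolute difference between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError("shapes differ; outputs are not comparable")
    return float(np.max(np.abs(reference - candidate)))

rng = np.random.default_rng(0)
# Synthetic stand-ins: a "reference" output and a slightly perturbed "export" output.
ref = rng.standard_normal((1, 6, 2, 1024)).astype(np.float32)
noise = rng.uniform(-2e-4, 2e-4, size=ref.shape).astype(np.float32)
cand = ref + noise

diff = max_abs_diff(ref, cand)
passes = diff < PUBLISH_THRESHOLD
print(f"max abs diff = {diff:.2e}, passes = {passes}")
```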

Stem-specific SDR (informal; the official paper covers in-depth eval):

Stem     SDR (MUSDB18-HQ, approx.)
drums    ~9.5 dB
bass     ~9.0 dB
other    ~5.5 dB (lower because the model now also predicts guitar + piano)
vocals   ~8.5 dB
guitar   extracted-track-quality (no public SDR baseline on MUSDB)
piano    extracted-track-quality (no public SDR baseline on MUSDB)

If you care about absolute drums/vocals SDR, prefer htdemucs-ft-onnx. If you specifically need guitar or piano isolation, this is the model.


Performance

Single 7.8 s segment, Apple M4 Pro CPU:

Variant                        RAM       Latency   RTF
htdemucs_6s.onnx (fp32)        ~1.1 GB   ~1.6 s    0.20
htdemucs_6s_fp16weights.onnx   ~1.1 GB   ~1.6 s    0.20

CUDA / DirectML / CoreML EPs are typically ≥ 5× faster on real GPUs.
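
For reference, the real-time factor (RTF) in the table is processing time divided by audio duration. The short sketch below reproduces that arithmetic from the figures in this card. The three-minute estimate is a rough extrapolation that ignores chunk overlap and file I/O.

```python
SAMPLE_RATE = 44_100        # Hz, from the input spec
SEGMENT_SAMPLES = 343_980   # samples per segment, from the input spec

segment_seconds = SEGMENT_SAMPLES / SAMPLE_RATE  # 7.8 s

def real_time_factor(latency_s: float, audio_s: float) -> float:
    """RTF < 1 means faster than real time."""
    return latency_s / audio_s

rtf = real_time_factor(1.6, segment_seconds)  # about 0.2 with the ~1.6 s latency above

# Rough CPU estimate for a 3-minute song, ignoring overlap between chunks.
song_seconds = 180.0
estimated_seconds = song_seconds * rtf

print(f"segment = {segment_seconds:.1f} s, RTF = {rtf:.3f}, "
      f"3-minute song = {estimated_seconds:.0f} s (rough)")
```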


Quick start

Python

import soundfile as sf
import infer

audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = infer.separate(audio.T, sr,
                       model_path=infer.DEFAULT_MODEL,
                       providers=["CPUExecutionProvider"])
sf.write("guitar.wav", stems["guitar"].T, sr)
sf.write("piano.wav",  stems["piano"].T,  sr)

CLI

python infer.py your-song.mp3 ./out/                          # all 6 stems
python infer.py your-song.mp3 ./out/ --stems guitar piano     # guitar + piano only
python infer.py your-song.mp3 ./out/ --providers coreml       # macOS arm64
python infer.py your-song.mp3 ./out/ --providers cuda         # Linux + NVIDIA
python infer.py your-song.mp3 ./out/ --small                  # 136 MB variant
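
The "one-line provider change" comes down to the list of execution-provider names handed to onnxruntime. The sketch below shows one way short aliases like the ones above could be resolved to provider names with a CPU fallback. The alias table and the resolve_providers helper are assumptions for illustration, not infer.py's actual code; only the provider strings themselves are onnxruntime's own.

```python
# Alias table is hypothetical; the right-hand provider names are onnxruntime's.
ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "directml": "DmlExecutionProvider",
}

def resolve_providers(requested, available):
    """Keep requested providers that are installed, then fall back to CPU."""
    chosen = []
    for alias in requested:
        name = ALIASES.get(alias.lower())
        if name is None:
            raise ValueError(f"unknown provider alias: {alias!r}")
        if name in available and name not in chosen:
            chosen.append(name)
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# In real use, `available` would come from onnxruntime.get_available_providers().
available = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
print(resolve_providers(["coreml"], available))
print(resolve_providers(["cuda"], available))  # CUDA not installed: CPU only
```

Ending the list with the CPU provider lets a session fall back to CPU for any operator the accelerator does not support.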

Mobile / Web

// iOS / Swift: bundle either the 258 MB or the 136 MB model
import Foundation
import onnxruntime_objc

let env = try ORTEnv(loggingLevel: ORTLoggingLevel.warning)
let opts = try ORTSessionOptions()
let session = try ORTSession(env: env,
    modelPath: Bundle.main.path(forResource: "htdemucs_6s_fp16weights",
                                 ofType: "onnx")!,
    sessionOptions: opts)

// Browser
import * as ort from "onnxruntime-web";
const sess = await ort.InferenceSession.create(
  "htdemucs_6s_fp16weights.onnx",
  { executionProviders: ["wasm"] },
);
// audioBuffer: Float32Array of length 2 * 343980, all left samples followed by all right samples
const t = new ort.Tensor("float32", audioBuffer, [1, 2, 343980]);
const out = await sess.run({ mix: t });   // out.stems is (1, 6, 2, 343980)

For a turnkey browser demo with file-picker + chunked overlap-add, see demucs-onnx browser-demo.


Input / output spec

Tensor   Name    Shape               Dtype     Notes
Input    mix     (1, 2, 343980)      float32   Stereo, 44.1 kHz, 7.8 s segment. Values in [-1, 1].
Output   stems   (1, 6, 2, 343980)   float32   Stems in order [drums, bass, other, vocals, guitar, piano].
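
To make the input spec concrete, here is a minimal sketch that shapes an arbitrary (channels, samples) array into the (1, 2, 343980) float32 tensor the model expects. The helper name prepare_segment is hypothetical. It assumes the audio is already at 44.1 kHz and already within [-1, 1]; it does no resampling or level adjustment.

```python
import numpy as np

SEGMENT_SAMPLES = 343_980  # fixed segment length from the spec

def prepare_segment(audio) -> np.ndarray:
    """Return a (1, 2, 343980) float32 tensor from mono or stereo audio.

    Hypothetical helper: duplicates mono to stereo, crops long input,
    zero-pads short input. Assumes 44.1 kHz audio with values in [-1, 1].
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:                      # (samples,) -> (1, samples)
        audio = audio[None, :]
    if audio.shape[0] == 1:                  # mono -> duplicated stereo
        audio = np.repeat(audio, 2, axis=0)
    if audio.shape[0] != 2:
        raise ValueError("expected mono or stereo audio")
    audio = audio[:, :SEGMENT_SAMPLES]       # crop if too long
    pad = SEGMENT_SAMPLES - audio.shape[1]
    if pad > 0:                              # zero-pad if too short
        audio = np.pad(audio, ((0, 0), (0, pad)))
    return audio[None, ...]                  # add the batch dimension

mix = prepare_segment(np.zeros(1000))
print(mix.shape, mix.dtype)
```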

For longer audio, chunk with overlap-add; see infer.py::separate.
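
As a rough illustration of overlap-add chunking, the sketch below splits a long stereo signal into fixed-size segments, runs a per-segment function, and blends the results with a triangular weight. It is not the code in infer.py: the overlap_add function, the 25% overlap, the triangular window, and the run_segment callable are all assumptions for illustration. The demo uses a tiny segment size and an identity "model" so it runs without the ONNX file.

```python
import numpy as np

def overlap_add(mix, run_segment, segment=343_980, overlap=0.25, n_stems=6):
    """Blend per-segment outputs into a full-length (n_stems, 2, T) array.

    mix: (2, T) array. run_segment: callable mapping a (1, 2, segment) array
    to a (1, n_stems, 2, segment) array (hypothetical stand-in for the model).
    """
    mix = np.asarray(mix, dtype=np.float32)
    channels, total = mix.shape
    hop = max(1, int(segment * (1 - overlap)))
    # Triangular weight: chunk centres count more than chunk edges.
    weight = np.minimum(np.arange(1, segment + 1),
                        np.arange(segment, 0, -1)).astype(np.float32)
    weight /= weight.max()
    out = np.zeros((n_stems, channels, total), dtype=np.float32)
    norm = np.zeros(total, dtype=np.float32)
    for start in range(0, total, hop):
        chunk = mix[:, start:start + segment]
        length = chunk.shape[1]
        padded = np.zeros((channels, segment), dtype=np.float32)
        padded[:, :length] = chunk           # zero-pad the final short chunk
        est = run_segment(padded[None])[0]   # (n_stems, 2, segment)
        out[:, :, start:start + length] += est[:, :, :length] * weight[:length]
        norm[start:start + length] += weight[:length]
        if start + segment >= total:         # the end of the signal is covered
            break
    return out / norm                        # normalise by the summed weights

# Demo with an identity "model": every stem is a copy of the input chunk.
identity = lambda x: np.repeat(x[:, None], 6, axis=1)
rng = np.random.default_rng(0)
mix = rng.uniform(-1, 1, size=(2, 350)).astype(np.float32)
stems = overlap_add(mix, identity, segment=100)
print(stems.shape)
```

With the identity function, each stem should come back equal to the input, which is a quick way to check that the weighting and normalisation cancel out.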


Tooling: demucs-onnx Python package

This model can be run via the open-source demucs-onnx Python package on PyPI. It auto-downloads from this repo on first use.

pip install demucs-onnx

# 6-stem mode: all 6 stems, single session:
demucs-onnx separate song.mp3 stems/ --model htdemucs_6s

# Just guitar + piano:
demucs-onnx separate song.mp3 stems/ --model htdemucs_6s --stems guitar piano

# Python API:
python -c "from demucs_onnx import separate_stem; \
  guitar = separate_stem('song.mp3', 'guitar')"

To re-export your own fine-tune:

pip install 'demucs-onnx[export]'
demucs-onnx export htdemucs_6s out/htdemucs_6s.onnx

How it was built

The export pipeline lives in the open-source demucs-onnx package at demucs_onnx/export/. It applies the same four patches that make htdemucs_ft exportable:

  1. Complex-typed torch.stft outputs → Conv1d with sin/cos kernels.
  2. model.segment fractions.Fraction → plain float.
  3. random.randrange in transformer pos-embedding → hardcoded shift=0.
  4. aten::_native_multi_head_attention (no ONNX symbolic) → drop-in nn.MultiheadAttention.forward built from Linear/bmm/softmax.

The 6-stem head is wider than the 4-stem one, but the surgery is identical, with no new blockers. Parity came in at 2.42 × 10⁻⁴ on the first try.
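
The idea behind the first patch can be shown in plain numpy: a complex STFT splits into two real-valued outputs computed from windowed cosine and sine kernels, which is what a strided Conv1d with fixed weights would compute. The sketch below is an illustration of that identity, not the package's export code. The stft_real function and the toy n_fft=64 and hop=16 values are assumptions.

```python
import numpy as np

def stft_real(signal, n_fft=64, hop=16):
    """Real-valued STFT: returns (real, imag), each of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
    k = np.arange(n_fft // 2 + 1)[:, None]       # frequency bins
    n = np.arange(n_fft)[None, :]                # sample positions inside a frame
    angle = 2 * np.pi * k * n / n_fft
    cos_kernel = np.cos(angle) * window          # real-part kernels, (bins, n_fft)
    sin_kernel = -np.sin(angle) * window         # imaginary-part kernels
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slicing frames with a stride is what a Conv1d with stride=hop does implicitly.
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return frames @ cos_kernel.T, frames @ sin_kernel.T

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)
real, imag = stft_real(signal)

# Reference: complex FFT of the same windowed frames.
window = np.hanning(65)[:-1]
ref = np.stack([np.fft.rfft(signal[i * 16:i * 16 + 64] * window)
                for i in range(real.shape[0])])
matches = np.allclose(real, ref.real) and np.allclose(imag, ref.imag)
print(real.shape, matches)
```

Because both outputs are ordinary real tensors, nothing complex-typed needs to pass through the exporter.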


Related work

Sibling ONNX repos from the same export pipeline:

Repo                                         Stems        Use when
htdemucs-ft-onnx                             4 (bag)      Best SDR on the standard 4 stems.
htdemucs-onnx                                4 (single)   Fastest 4-stem startup.
htdemucs-6s-onnx (this)                      6            You need guitar or piano as a stem.
htdemucs-ft-{drums,bass,other,vocals}-onnx   1            Fastest single-stem inference.

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.


Skip the infrastructure: use the StemSplit API

Don't want to ship a 258 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead: same model under the hood, hosted for you, with credits.


License & attribution

This repo is MIT-licensed, matching the original HT-Demucs.

@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}
  • Original PyTorch model: facebookresearch/demucs
  • ONNX export, parity verification, and packaging by StemSplit
  • Search keywords: htdemucs 6 stem onnx, htdemucs_6s onnx, guitar isolation onnx, piano isolation onnx, demucs 6-stem mobile, stem separation guitar onnx.