HT-Demucs FT — Full 4-Stem Bag, ONNX

The first complete ONNX export of HT-Demucs FT on the Hugging Face Hub. Four parity-verified ONNX models (drums, bass, other, vocals) plus a ~250-line numpy aggregator that runs the full 4-stem separation in pure onnxruntime. No PyTorch required at inference. Runs on CPU / CoreML / CUDA / DirectML.

This repo is the convenience drop — all 4 specialist sub-models of htdemucs_ft in one place, with a working bag-inference script. If you only need one stem in production, the individual stem-specialist repos below are ~75% smaller and ~4× faster per song.

TL;DR

pip install onnxruntime numpy soundfile
python bag_infer.py your-song.mp3 ./out/
# writes out/drums.wav, out/bass.wav, out/other.wav, out/vocals.wav

That's it. The 4 .onnx files (316 MB each, ~1.26 GB total) live alongside the script.

Quality

Median per-stem SDR on the MUSDB18-HQ test split (50 songs), BSS Eval v4 via museval. Identical to the official PyTorch htdemucs_ft — the bag's per-stem output IS the corresponding specialist's output (the weight matrix is one-hot per stem).

Stem	SDR (dB)	Rank in our 2026 benchmark
vocals	9.19	#1 (highest open-source vocal SDR)
drums	10.11	#2 (mdx_extra_q leads at 11.49)
bass	10.38	#2 (mdx_extra_q leads at 11.42)
other	6.34	#2 (mdx_extra_q leads at 7.67)
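To sanity-check a stem against a MUSDB18-HQ reference yourself, the sketch below computes a plain whole-signal SDR in numpy. It is not museval: BSS Eval v4 scores short windows and tolerates some filtering of the reference, so expect numbers that differ from the table above. `simple_sdr` is a hypothetical helper name.

```python
import numpy as np


def simple_sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Whole-signal SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Quick sanity check only; this is NOT museval's BSS Eval v4 metric.
    """
    s = np.asarray(reference, dtype=np.float64)
    s_hat = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(s ** 2)
    error_energy = np.sum((s - s_hat) ** 2)
    # eps keeps the ratio finite for silent references or perfect estimates.
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))
```

For example, `simple_sdr(true_vocals, stems["vocals"])`, where `true_vocals` is the reference track with the same `(2, samples)` shape.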

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.

ONNX vs PyTorch parity: verified to < 1e-3 max abs diff on every stem during export. See the Day 1 spike report for the full engineering writeup.
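A parity check like this reduces to an element-wise comparison of the two outputs for the same input. A minimal sketch, assuming you already hold the ONNX output and the PyTorch output as numpy arrays (the helper names are hypothetical, not part of the export tooling):

```python
import numpy as np


def max_abs_diff(a, b) -> float:
    """Largest element-wise |a - b|, computed in float64."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.max(np.abs(a - b)))


def assert_parity(onnx_out, torch_out, tol: float = 1e-3) -> float:
    """Raise if the ONNX output drifts by tol or more from the PyTorch reference."""
    diff = max_abs_diff(onnx_out, torch_out)
    # `not diff < tol` also fails on NaN, which `diff >= tol` would let through.
    if not diff < tol:
        raise AssertionError(f"max abs diff {diff:.3e} >= tol {tol:.0e}")
    return diff
```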

Performance

Measured on an Apple M4 Pro; the NVIDIA rows are extrapolated rather than measured:

Mode	Hardware	Per 3-min song	Notes
One specialist (`htdemucs-ft-drums-onnx`)	M4 Pro CPU	~22 s	4× faster, 75% smaller — use this if you only need one stem
Full bag (this repo)	M4 Pro CPU	~88 s	RTF ~0.5. 4 sub-models × N chunks.
Full bag	M4 Pro CPU (8 threads)	~60 s	With `OMP_NUM_THREADS=8` and SessionOptions tuned
Full bag	NVIDIA L4 CUDA	~6 s	Extrapolated from per-specialist CUDA numbers
Full bag	NVIDIA T4 CUDA	~16 s	Extrapolated
PyTorch full bag	M4 Pro MPS	~47 s	Beats ONNX on the M4 Pro CPU only because MPS runs on the GPU; the extrapolated ONNX CUDA figures above are lower still.
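The RTF column is wall-clock time divided by audio length: 88 s / 180 s ≈ 0.49 for the CPU bag. The sketch below shows that arithmetic together with one way to pin onnxruntime's CPU threading through `SessionOptions`. The thread values are illustrative assumptions, not the exact settings behind the ~60 s row, and `make_cpu_session` is a hypothetical helper.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = wall-clock processing time / audio duration (< 1 means faster than real time)."""
    return processing_s / audio_s


def make_cpu_session(model_path: str, threads: int = 8):
    """Hypothetical helper: a CPU InferenceSession with explicit threading."""
    import onnxruntime as ort  # imported here so real_time_factor works without onnxruntime

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = threads  # threads used inside a single op (matmuls, convs)
    opts.inter_op_num_threads = 1        # graph ops run one after another (see execution_mode)
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    return ort.InferenceSession(model_path, sess_options=opts, providers=["CPUExecutionProvider"])


print(f"{real_time_factor(88, 180):.2f}")  # 0.49
```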

Tooling — `demucs-onnx` Python package

This bag is also packaged in the open-source demucs-onnx Python package on PyPI. It auto-downloads each specialist from the matching HF repo on first use, so you don't even need to manually fetch the four .onnx files.

pip install demucs-onnx

# Full 4-stem separation (auto-downloads ~1.26 GB on first run)
demucs-onnx separate song.mp3 stems/

# From Python
python -c "from demucs_onnx import separate; stems = separate('song.mp3')"

The same package is also the canonical tool for exporting htdemucs to ONNX yourself. It bundles workarounds for the four export blockers (complex STFT, fractions.Fraction, random.randrange, aten::_native_multi_head_attention), so a vanilla torch.onnx.export works on your own demucs checkpoints.
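For the complex-STFT blocker, a common workaround is to express the STFT with real-valued operations only: frame the signal, window it, and multiply by precomputed cosine and sine DFT bases. The numpy sketch below illustrates that idea. It is not necessarily the rewrite demucs-onnx ships, it omits the center padding and normalization that `torch.stft` can apply, the 4096/1024 defaults are an assumption about HT-Demucs' STFT settings, and `real_stft` is a hypothetical name.

```python
import numpy as np


def real_stft(x: np.ndarray, n_fft: int = 4096, hop: int = 1024):
    """STFT of a mono signal returned as (real, imag) float arrays, no complex dtype.

    Only framing, elementwise multiplies and real matmuls are used, which is
    the kind of graph that exports cleanly.
    """
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft + 1)[:-1]                    # periodic Hann, like torch.hann_window
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * window                                # (n_frames, n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]                  # one-sided frequency bins
    n = np.arange(n_fft)[None, :]
    angle = 2.0 * np.pi * k * n / n_fft                     # (n_bins, n_fft)
    real = frames @ np.cos(angle).T                         # Re X[k] =  sum x[n] cos(2*pi*k*n/N)
    imag = -(frames @ np.sin(angle).T)                      # Im X[k] = -sum x[n] sin(2*pi*k*n/N)
    return real, imag
```

The two outputs match `np.fft.rfft` applied to each windowed frame, and can be stacked along a channel axis wherever the model expects a complex spectrogram.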

pip install "demucs-onnx[export]"
demucs-onnx export htdemucs_ft out/   # writes 4 .onnx files

Common use cases

Karaoke makers: summing out/drums.wav, out/bass.wav and out/other.wav (roughly, the mix minus out/vocals.wav) gives an instrumental karaoke track, and out/vocals.wav is the acapella, both from one pass.
DAW stem export: drop the 4 .wav files into Ableton / Logic / Reaper as separate channels for remixing.
DJ stems software: load all 4 stems as live-mixable tracks.
AI music apps: feed each stem into downstream models (drum transcription, bassline-to-MIDI, vocal pitch correction).
Acapella sampling: clean isolated vocals at the highest SDR available in open source.
Mobile / on-device separation: replaces a 1+ GB PyTorch install with onnxruntime's 50 MB binary on iOS / Android.
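For the karaoke case, an instrumental is the sum of the three non-vocal stems. A minimal numpy sketch, assuming `stems` is the dict returned by `bag_infer.separate_all` (each value shaped `(2, samples)`); `karaoke_pair` is a hypothetical helper, and the peak normalization is our own choice to avoid clipping:

```python
import numpy as np


def karaoke_pair(stems: dict) -> tuple:
    """Return (instrumental, acapella) from a dict of (2, samples) stem arrays."""
    instrumental = stems["drums"] + stems["bass"] + stems["other"]
    acapella = stems["vocals"]
    # The sum of three stems can exceed [-1, 1] and clip as PCM; scale down only if needed.
    peak = float(np.max(np.abs(instrumental))) if instrumental.size else 0.0
    if peak > 1.0:
        instrumental = instrumental / peak
    return instrumental, acapella
```

Write the results with `sf.write("instrumental.wav", instrumental.T, sr)`, transposing back to soundfile's `(samples, channels)` layout and using the song's sample rate `sr`.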

Quick start

Python — as a library

import bag_infer

stems = bag_infer.separate_all("your-song.mp3")
# stems: dict[str, numpy.ndarray (2, samples)]
#   stems["drums"], stems["bass"], stems["other"], stems["vocals"]

Python — with execution provider control

import soundfile as sf
import bag_infer

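# MP3 decoding requires libsndfile >= 1.1.0
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)  # (samples, channels)
stems = bag_infer.separate(
    audio.T, sr,  # transpose to (channels, samples)
    providers=["CPUExecutionProvider"],  # or "CoreMLExecutionProvider", etc.
)
for name, stem in stems.items():
    sf.write(f"{name}.wav", stem.T, sr)  # stem is (channels, samples); soundfile wants (samples, channels)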

CLI

python bag_infer.py your-song.mp3 ./out/
python bag_infer.py your-song.mp3 ./out/ --providers cuda
python bag_infer.py your-song.mp3 ./out/ --providers coreml
python bag_infer.py your-song.mp3 ./out/ --providers dml
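The `--providers` flag takes short names. How bag_infer.py maps them internally is not documented here; the sketch below assumes a simple alias table onto onnxruntime's provider names, keeps only providers your onnxruntime build actually reports, and always falls back to CPU. `resolve_providers` and `PROVIDER_ALIASES` are hypothetical names.

```python
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "dml": "DmlExecutionProvider",  # DirectML
}


def resolve_providers(requested, available):
    """Map short CLI names to onnxruntime provider names, keeping only installed ones.

    CPUExecutionProvider is always appended last as a fallback.
    """
    resolved = []
    for name in requested:
        full = PROVIDER_ALIASES.get(name.lower(), name)
        if full in available and full not in resolved:
            resolved.append(full)
    if "CPUExecutionProvider" not in resolved:
        resolved.append("CPUExecutionProvider")
    return resolved
```

In practice you would pass `onnxruntime.get_available_providers()` as `available`, then hand the result to `onnxruntime.InferenceSession(path, providers=...)`.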

Web / mobile

Each specialist is a vanilla onnxruntime model; just load all 4 sessions and reuse the aggregation logic in bag_infer.py::separate. See the individual stem repos for platform-specific snippets: drums · bass · other · vocals.

How aggregation works

The htdemucs_ft bag uses a one-hot weight matrix for combining the 4 sub-models — model 0's drums output is used directly as the bag's drums stem, model 1's bass output is the bag's bass stem, and so on. No weighted-sum aggregation needed.

That means:

The bag's drums stem == the drums specialist's drums output (bit-exact in fp32)
Same for bass, other, vocals
So you can ship only the specialists you need and get identical per-stem quality to the full bag at 1/4 the size

bag_infer.py runs all 4 specialists and keeps the relevant row from each; the aggregation step itself is ~30 lines of numpy.
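In general, a bag averages each source over its sub-models using per-source weights; with a one-hot matrix that average collapses to picking the diagonal, which is why the result is bit-exact. A numpy sketch of the weighted average, assuming each specialist's full `(4, 2, samples)` output has been stacked into one array (bag_infer.py may simply index the diagonal instead). `aggregate_bag` and `ONE_HOT` are hypothetical names.

```python
import numpy as np

SOURCES = ["drums", "bass", "other", "vocals"]


def aggregate_bag(outputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted per-source average over sub-models.

    outputs: (n_models, n_sources, channels, samples), each sub-model's full output.
    weights: (n_models, n_sources); weights[m, s] is how much model m counts for source s.
    Returns (n_sources, channels, samples).
    """
    weighted = outputs * weights[:, :, None, None]
    totals = weights.sum(axis=0)[:, None, None]
    return weighted.sum(axis=0) / totals


# htdemucs_ft: specialist m is trusted only for source m, i.e. an identity matrix.
ONE_HOT = np.eye(len(SOURCES), dtype=np.float32)
```

With `ONE_HOT`, each output stem is `x * 1 + 0 + 0 + 0` divided by 1, so no rounding is introduced.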

Input / output spec per sub-model

Tensor	Name	Shape	Dtype	Notes
Input	`mix`	`(1, 2, 343980)`	float32	Stereo audio, 44.1 kHz, 7.8 s segment.
Output	`stems`	`(1, 4, 2, 343980)`	float32	`[drums, bass, other, vocals]`. Use only the specialist's target row.
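Calling a single specialist follows directly from this spec: feed `mix`, read `stems`, keep the target row. A minimal sketch; `run_specialist` and `STEM_INDEX` are hypothetical names, with the row order taken from the bag indices in the file table below:

```python
import numpy as np

SEGMENT = 343_980  # 7.8 s at 44.1 kHz, fixed by the exported graph
STEM_INDEX = {"drums": 0, "bass": 1, "other": 2, "vocals": 3}


def run_specialist(session, chunk: np.ndarray, stem: str) -> np.ndarray:
    """Run one specialist on a (2, SEGMENT) chunk and return its target stem, (2, SEGMENT).

    `session` is anything exposing onnxruntime's InferenceSession.run(output_names, input_feed).
    """
    x = np.ascontiguousarray(chunk, dtype=np.float32)[None]  # (1, 2, SEGMENT)
    (stems,) = session.run(["stems"], {"mix": x})            # (1, 4, 2, SEGMENT)
    return stems[0, STEM_INDEX[stem]]
```

With onnxruntime installed, `session = onnxruntime.InferenceSession("htdemucs_ft_vocals.onnx", providers=["CPUExecutionProvider"])` followed by `run_specialist(session, chunk, "vocals")` returns the vocals for one 7.8 s chunk.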

For longer audio, the bag script handles overlap-add chunking.
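Overlap-add lets a fixed-length model handle audio of any length: slide a window with some overlap, zero-pad the final chunk, and cross-fade overlapping predictions with a weight window. The sketch below is a generic version in the spirit of demucs' own chunked inference. The 25% overlap and triangular window are assumptions, not a reading of bag_infer.py, and `chunked_separate` is a hypothetical name; `run_model` would wrap an onnxruntime session mapping `mix` to `stems`.

```python
import numpy as np

SEGMENT = 343_980  # fixed input length of each exported specialist


def chunked_separate(mix, run_model, segment=SEGMENT, overlap=0.25):
    """Overlap-add separation of arbitrary-length audio.

    mix:       (channels, samples) float32 array.
    run_model: callable mapping (1, channels, segment) -> (1, 4, channels, segment).
    Returns (4, channels, samples).
    """
    channels, length = mix.shape
    stride = max(1, int(segment * (1.0 - overlap)))
    # Triangular cross-fade: weights peak mid-chunk and shrink (but never hit 0) at the edges.
    pos = np.arange(segment)
    weight = np.minimum(pos + 1, segment - pos).astype(np.float32)
    weight /= weight.max()

    out = np.zeros((4, channels, length), dtype=np.float32)
    total = np.zeros(length, dtype=np.float32)
    for start in range(0, length, stride):
        chunk = mix[:, start:start + segment]
        valid = chunk.shape[1]
        # The graph expects exactly `segment` samples, so zero-pad the tail chunk.
        padded = np.zeros((channels, segment), dtype=np.float32)
        padded[:, :valid] = chunk
        pred = run_model(padded[None])[0]  # (4, channels, segment)
        out[:, :, start:start + valid] += pred[:, :, :valid] * weight[:valid]
        total[start:start + valid] += weight[:valid]
        if start + segment >= length:  # this chunk reached the end of the song
            break
    return out / np.maximum(total, 1e-8)
```

For the bag, run this once per specialist and keep that specialist's target row from the result.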

Files in this repo

File	Size	Purpose
`htdemucs_ft_drums.onnx`	316 MB	Drums specialist (bag index 0)
`htdemucs_ft_bass.onnx`	316 MB	Bass specialist (bag index 1)
`htdemucs_ft_other.onnx`	316 MB	Other specialist (bag index 2)
`htdemucs_ft_vocals.onnx`	316 MB	Vocals specialist (bag index 3)
`bag_infer.py`	7 KB	Pure numpy aggregator. No torch.
`requirements.txt`	<1 KB	`onnxruntime`, `numpy`, `soundfile`.
`README.md`		This file

Total: ~1.26 GB. If that's too big, use individual stem repos.

Related work

Repo	Stem	Use when
`htdemucs-ft-drums-onnx`	drums	Only need drums (1/4 size, 1/4 latency)
`htdemucs-ft-bass-onnx`	bass	Only need bass
`htdemucs-ft-other-onnx`	other	Only need the "other" stem (everything that isn't drums, bass or vocals)
`htdemucs-ft-vocals-onnx`	vocals	Only need vocals (#1 open-source vocal SDR)

PyTorch versions for HF Inference Endpoints: htdemucs-ft-pytorch and its 4 sibling specialist repos.

Skip the infrastructure — use the StemSplit API

Don't want to ship 1.26 GB of .onnx files in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead — same models under the hood, hosted for you, with credits and a dashboard.

Or use the no-code tools that ship this same model family:

License & attribution

MIT-licensed, matching the original HT-Demucs.

@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}

Original PyTorch model: facebookresearch/demucs
ONNX export, parity verification, and packaging by StemSplit
Search keywords: htdemucs onnx, demucs onnx, htdemucs bag onnx, demucs ios, demucs android, music source separation onnx, 4-stem separation onnx, stem separation mobile, onnxruntime music separation
