HT-Demucs FT — Instrumental / Other Specialist, ONNX

Melodic / instrumental specialist from HT-Demucs FT — everything that isn't vocals, drums, or bass. ONNX runtime, no PyTorch needed.

This repo packages sub-model 2 of the htdemucs_ft 4-bag ensemble as a single 316 MB .onnx file plus a ~150-line numpy reference inference script. The export matches the original PyTorch model to within a maximum absolute difference of 1e-3.

Want all 4 stems in one drop-in package? Use the full bag repo: StemSplitio/htdemucs-ft-onnx.

TL;DR

pip install onnxruntime numpy soundfile
python infer.py your-song.mp3 ./out/
# writes ./out/other.wav at 44.1 kHz stereo

That's it. No PyTorch, no CUDA setup, no GPU server.

Quality

Metric (MUSDB18-HQ test, 50 songs)	Value	Source
Median other SDR	6.34 dB	StemSplitio/stem-separation-benchmark-2026
Rank among open-source separators on other	2nd (mdx_extra_q leads at 7.67)	same
ONNX vs PyTorch max abs diff	< 1e-3	verified during export (see Day 1 spike report)
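
The parity figure in the last row is a maximum absolute difference between two output tensors. A minimal sketch of such a check, assuming both outputs are already available as NumPy arrays; the function name `max_abs_diff` and the toy arrays are illustrative and not part of this repo:

```python
import numpy as np


def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two same-shaped arrays."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the comparison itself adds no rounding error.
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))


# Toy stand-ins for the PyTorch and ONNX outputs (hypothetical data).
rng = np.random.default_rng(0)
torch_out = rng.standard_normal((1, 4, 2, 1024)).astype(np.float32)
onnx_out = torch_out + np.float32(1e-4)

parity_ok = max_abs_diff(torch_out, onnx_out) < 1e-3
```

In a real check, both arrays would come from running the same input segment through the PyTorch checkpoint and the ONNX session.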

Performance

Runtime	Hardware	Per 7.8-s segment	Per 3-min song
onnxruntime CPU EP	Apple M4 Pro	~1.6 s	~22 s
PyTorch CPU	Apple M4 Pro	~2.1 s	~29 s
onnxruntime CUDA EP	NVIDIA L4	~0.4 s	~5 s (extrapolated)
onnxruntime DirectML EP	RTX 4090	~0.2 s	~2 s (extrapolated)

Real-time factor on M4 Pro CPU: about 0.20 (1.6 s of compute per 7.8 s segment). Roughly 1.31× faster than PyTorch CPU on the same hardware (2.1 s vs 1.6 s per segment).
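
For reference, the arithmetic behind those two figures can be reproduced from the per-segment timings in the table. The constant names are illustrative, and the timings are the approximate values quoted above:

```python
SEGMENT_SECONDS = 7.8     # audio duration of one model segment
ORT_CPU_SECONDS = 1.6     # approximate onnxruntime CPU time per segment (M4 Pro)
TORCH_CPU_SECONDS = 2.1   # approximate PyTorch CPU time per segment (M4 Pro)

# Real-time factor: compute time divided by audio duration (below 1 is faster than real time).
rtf = ORT_CPU_SECONDS / SEGMENT_SECONDS

# Speedup of onnxruntime over PyTorch on the same segment.
speedup = TORCH_CPU_SECONDS / ORT_CPU_SECONDS

print(f"RTF ~ {rtf:.3f}, speedup ~ {speedup:.2f}x")
```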

Tooling — `demucs-onnx` Python package

This model can also be run (and re-exported) via the open-source demucs-onnx Python package on PyPI. It auto-downloads from this repo on first use.

pip install demucs-onnx

# Single specialist (this repo)
demucs-onnx separate song.mp3 stems/ --stem other

# Or via the Python API
python -c "from demucs_onnx import separate_stem; \
  audio = separate_stem('song.mp3', 'other')"

The same package is also the canonical tool for exporting htdemucs to ONNX yourself — it bundles all four blocker fixes (complex STFT, fractions.Fraction, random.randrange, aten::_native_multi_head_attention) so vanilla torch.onnx.export works on your own checkpoints.

pip install "demucs-onnx[export]"
demucs-onnx export htdemucs_ft other.onnx --stem other

Common use cases

Karaoke / instrumental tracks: extract the melodic layer of a backing track. The other stem excludes drums and bass as well as vocals, so add those stems for a full instrumental (pair with the vocals ONNX for clean round-tripping)
Sample-flipping: isolate guitar/keys/synth lines for chopping and remixing
Cover-song production: strip vocals and rebalance the instrumental bed
Music bed for video: remove vocals from licensed tracks for use under spoken word (check sync rights first)

Quick start

Python — minimal

import infer
other = infer.separate_other("your-song.mp3")
# other: numpy array (2, samples) at 44.1 kHz

Python — full control

import soundfile as sf
import infer

# Optional execution providers: CPU is the default and most portable.
# On other hardware, pass onnxruntime's CoreMLExecutionProvider (macOS),
# CUDAExecutionProvider (NVIDIA), or DmlExecutionProvider (Windows DX12).
# Note: soundfile decodes MP3 only with libsndfile 1.1.0 or newer; convert to WAV first otherwise.
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = infer.separate(audio.T, sr, providers=["CPUExecutionProvider"])
sf.write("other.wav", stems[infer.SOURCES.index("other")].T, sr)

CLI

python infer.py your-song.mp3 ./out/
python infer.py your-song.mp3 ./out/ --providers cuda    # NVIDIA
python infer.py your-song.mp3 ./out/ --providers coreml  # macOS
python infer.py your-song.mp3 ./out/ --providers dml     # Windows
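
The short names passed to `--providers` correspond to onnxruntime execution providers. How `infer.py` resolves them is not shown here; the sketch below is one possible mapping, using a hypothetical helper `resolve_providers` that always appends the CPU provider as a fallback. The provider strings themselves are onnxruntime's standard names.

```python
# Hypothetical short-name table; the right-hand strings are onnxruntime's
# standard execution provider names.
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "dml": "DmlExecutionProvider",
}


def resolve_providers(names: list[str]) -> list[str]:
    """Map short names to provider strings, keeping order and ending with CPU."""
    resolved: list[str] = []
    for name in names:
        key = name.strip().lower()
        if key not in PROVIDER_ALIASES:
            raise ValueError(f"unknown provider alias: {name!r}")
        provider = PROVIDER_ALIASES[key]
        if provider not in resolved:
            resolved.append(provider)
    # CPU goes last so there is still a fallback if the accelerator is unavailable.
    if "CPUExecutionProvider" not in resolved:
        resolved.append("CPUExecutionProvider")
    return resolved
```

A list built this way can be passed as the `providers` argument of `onnxruntime.InferenceSession`, which tries the providers in the order given.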

Mobile (iOS / Swift)

import onnxruntime_objc

let env = try ORTEnv(loggingLevel: .warning)
let opts = try ORTSessionOptions()
try opts.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
let session = try ORTSession(env: env,
                              modelPath: Bundle.main.path(forResource: "htdemucs_ft_other", ofType: "onnx")!,
                              sessionOptions: opts)
// audio: 1 × 2 × 343980 Float32 buffer, then session.run(...).

Mobile (Android / Kotlin)

import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

val env = OrtEnvironment.getEnvironment()
val opts = OrtSession.SessionOptions().apply { addNnapi() }
val session = env.createSession(modelPath, opts)

Web (onnxruntime-web)

import * as ort from "onnxruntime-web";
const session = await ort.InferenceSession.create("htdemucs_ft_other.onnx", {
  executionProviders: ["wasm"],
  graphOptimizationLevel: "all",
});
const tensor = new ort.Tensor("float32", audioBuffer, [1, 2, 343980]);
const out = await session.run({ mix: tensor });
// out.stems.data is a flat Float32Array with dims [1, 4, 2, 343980]; the other stem is index 2 on the second axis.

Input / output spec

Tensor	Name	Shape	Dtype	Notes
Input	`mix`	`(1, 2, 343980)`	float32	Stereo audio, 44.1 kHz, 7.8 s segment. Values in [-1, 1].
Output	`stems`	`(1, 4, 2, 343980)`	float32	`[drums, bass, other, vocals]` order. Use only row 2 (`other`); the remaining three rows are weakly predicted by-products of this specialist model.
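
Runtimes that return the output as a flat buffer, as in the web example, lay the `(1, 4, 2, 343980)` tensor out in row-major order, so each stem occupies one contiguous block. A small sketch of the offset arithmetic, using the shape and stem order from the table; the helper name `stem_slice` is illustrative:

```python
SOURCES = ["drums", "bass", "other", "vocals"]  # stem order from the spec table
CHANNELS = 2
SEGMENT_SAMPLES = 343980


def stem_slice(stem: str) -> tuple[int, int]:
    """Return (start, end) indices of one stem inside the flat output buffer."""
    block = CHANNELS * SEGMENT_SAMPLES       # elements per stem
    start = SOURCES.index(stem) * block      # batch size is 1, so no batch offset
    return start, start + block


start, end = stem_slice("other")
# Within the block, channel 0 comes first, then channel 1.
channel_0 = (start, start + SEGMENT_SAMPLES)
channel_1 = (start + SEGMENT_SAMPLES, end)
```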

For longer audio, chunk with overlap-add — see infer.py::separate for a working ~60-line implementation.
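
The chunking idea can be sketched as follows. This is not the code in `infer.py`; it is a minimal illustration that assumes a 25% overlap, a triangular cross-fade window, and a caller-supplied `run_segment` function (hypothetical) that maps a `(channels, segment)` array to a same-shaped array.

```python
import numpy as np

SEGMENT = 343980  # samples per model segment (7.8 s at 44.1 kHz)


def overlap_add(mix, run_segment, segment=SEGMENT, overlap=0.25):
    """Process (channels, samples) audio in overlapping chunks and blend them."""
    channels, total = mix.shape
    hop = int(segment * (1.0 - overlap))
    # Triangular weights: largest mid-chunk, small but nonzero at the edges.
    window = np.minimum(np.arange(1, segment + 1),
                        np.arange(segment, 0, -1)).astype(np.float32)
    out = np.zeros((channels, total), dtype=np.float32)
    weight = np.zeros(total, dtype=np.float32)
    for start in range(0, total, hop):
        chunk = mix[:, start:start + segment]
        valid = chunk.shape[1]
        # Zero-pad the final chunk to the fixed length the model expects.
        padded = np.pad(chunk, ((0, 0), (0, segment - valid)))
        result = run_segment(padded)[:, :valid]
        out[:, start:start + valid] += result * window[:valid]
        weight[start:start + valid] += window[:valid]
    # Divide by the accumulated weights so overlapping regions keep unit gain.
    return out / weight


# Usage with an identity "model" standing in for the ONNX session.
demo = np.random.default_rng(0).uniform(-1, 1, size=(2, 3500)).astype(np.float32)
restored = overlap_add(demo, lambda seg: seg, segment=1000)
```

With the real model, `run_segment` would wrap a session call on the `mix` input and return row 2 of the `stems` output.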

Related repos

Sibling stem-specialist ONNX repos from the same export:

Repo	Stem	Use when
`htdemucs-ft-drums-onnx`	drums	Drum extraction, beat transcription
`htdemucs-ft-bass-onnx`	bass	Bassline transcription, mix rebalancing
`htdemucs-ft-other-onnx`	other	Karaoke instrumentals, sample-flipping
`htdemucs-ft-vocals-onnx`	vocals	#1 open-source vocal SDR — karaoke, acapella, vocal removal
`htdemucs-ft-onnx`	all 4	Full 4-stem separation in one repo

PyTorch versions for HF Inference Endpoints: htdemucs-ft-pytorch, htdemucs-ft-other-pytorch.

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.

Skip the infrastructure — use the StemSplit API

Don't want to ship a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead — same model under the hood, hosted for you, with credits and a dashboard.

Files in this repo

File	Size	Purpose
`htdemucs_ft_other.onnx`	316 MB	The exported model. Opset 17. Passes `onnx.checker`.
`infer.py`	~6 KB	Pure numpy + onnxruntime reference. No torch.
`requirements.txt`	<1 KB	`onnxruntime`, `numpy`, `soundfile`.
`README.md`	-	This file.

License & attribution

This repo is MIT-licensed, matching the original HT-Demucs.

@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}

Original PyTorch model: facebookresearch/demucs
ONNX export, parity verification, and packaging by StemSplit
Search keywords: instrumental extractor onnx, karaoke maker, music minus vocals, htdemucs other onnx
