HT-Demucs FT: Bass Specialist, ONNX
Bass extraction specialist from HT-Demucs FT, packaged as ONNX. ~1.31× faster than PyTorch CPU, no PyTorch required at inference.
This repo packages sub-model 1 of the
htdemucs_ft 4-bag ensemble
as a single 316 MB .onnx file plus a ~150-line numpy reference inference
script. Verified against the original PyTorch model to a maximum absolute
difference below 1e-3.
Want all 4 stems in one drop-in package? Use the full bag repo:
StemSplitio/htdemucs-ft-onnx.
TL;DR
pip install onnxruntime numpy soundfile
python infer.py your-song.mp3 ./out/
# writes ./out/bass.wav at 44.1 kHz stereo
That's it. No PyTorch, no CUDA setup, no GPU server.
Quality
| Metric (MUSDB18-HQ test, 50 songs) | Value | Source |
|---|---|---|
| Median bass SDR | 10.38 dB | StemSplitio/stem-separation-benchmark-2026 |
| Rank among open-source separators on bass | 2nd (mdx_extra_q leads at 11.42) | same |
| ONNX vs PyTorch max abs diff | < 1e-3 | verified during export (see Day 1 spike report) |
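The export-time parity check itself is not part of this repo. As a minimal sketch of the metric in the last row, assuming the PyTorch and ONNX outputs for the same input have already been saved as same-shaped numpy arrays (the function names here are hypothetical, not part of infer.py):

```python
import numpy as np

def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest element-wise absolute difference between two same-shaped outputs."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the subtraction itself adds no float32 rounding.
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(diff)))

def is_parity_ok(reference: np.ndarray, candidate: np.ndarray,
                 tolerance: float = 1e-3) -> bool:
    """True when the two outputs agree to within the stated tolerance."""
    return max_abs_diff(reference, candidate) < tolerance
```

The default tolerance mirrors the 1e-3 bound quoted in the table.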
Performance
| Runtime | Hardware | Per 7.8-s segment | Per 3-min song |
|---|---|---|---|
| onnxruntime CPU EP | Apple M4 Pro | ~1.6 s | ~22 s |
| PyTorch CPU | Apple M4 Pro | ~2.1 s | ~29 s |
| onnxruntime CUDA EP | NVIDIA L4 | ~0.4 s | ~5 s (extrapolated) |
| onnxruntime DirectML EP | RTX 4090 | ~0.2 s | ~2 s (extrapolated) |
Real-time factor on M4 Pro CPU: 0.20. Roughly 1.31× faster than PyTorch CPU on the same hardware.
Tooling: demucs-onnx Python package
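Both figures follow from the per-segment timings in the table. A short worked calculation, using only those approximate numbers (so expect results that match the quoted values only to rounding):

```python
# Approximate per-segment timings from the table (Apple M4 Pro, 7.8 s segment).
SEGMENT_SECONDS = 7.8
ORT_CPU_SECONDS = 1.6
TORCH_CPU_SECONDS = 2.1

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds

rtf = real_time_factor(ORT_CPU_SECONDS, SEGMENT_SECONDS)  # 1.6 / 7.8, roughly 0.2
speedup = TORCH_CPU_SECONDS / ORT_CPU_SECONDS             # 2.1 / 1.6, roughly 1.31

print(f"real-time factor: {rtf:.3f}")
print(f"speedup over PyTorch CPU: {speedup:.2f}x")
```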
This model can also be run (and re-exported) via the open-source
demucs-onnx Python package
on PyPI. It auto-downloads from this repo on first use.
pip install demucs-onnx
# Single specialist (this repo)
demucs-onnx separate song.mp3 stems/ --stem bass
# Or via the Python API
python -c "from demucs_onnx import separate_stem; \
audio = separate_stem('song.mp3', 'bass')"
The same package is also the canonical tool for exporting htdemucs
to ONNX yourself. It bundles fixes for all four export blockers (complex STFT,
fractions.Fraction, random.randrange,
aten::_native_multi_head_attention) so vanilla torch.onnx.export
works on your own checkpoints.
pip install "demucs-onnx[export]"
demucs-onnx export htdemucs_ft bass.onnx --stem bass
Common use cases
- Bassline transcription: MIDI / tab generation from any recording
- Mix rebalancing: isolate and re-EQ the bass bus on a finished mix
- Music education: learn basslines by hearing them isolated
- Sub-bass mastering reference: compare your low end against pro mixes
Quick start
Python: minimal
import infer
bass = infer.separate_bass("your-song.mp3")
# bass: numpy array (2, samples) at 44.1 kHz
Python: full control
import soundfile as sf
import infer
# Optional execution providers β CPU is the default and most portable.
# Swap to "coreml" on macOS, "cuda" on NVIDIA, "dml" on Windows DX12.
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = infer.separate(audio.T, sr, providers=["CPUExecutionProvider"])
sf.write("bass.wav", stems[infer.SOURCES.index("bass")].T, sr)
CLI
python infer.py your-song.mp3 ./out/
python infer.py your-song.mp3 ./out/ --providers cuda # NVIDIA
python infer.py your-song.mp3 ./out/ --providers coreml # macOS
python infer.py your-song.mp3 ./out/ --providers dml # Windows
Mobile (iOS / Swift)
import onnxruntime_objc
let env = try ORTEnv(loggingLevel: .warning)
let opts = try ORTSessionOptions()
try opts.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
let session = try ORTSession(env: env,
modelPath: Bundle.main.path(forResource: "htdemucs_ft_bass", ofType: "onnx")!,
sessionOptions: opts)
// audio: 1 × 2 × 343980 Float32 buffer, then session.run(...).
Mobile (Android / Kotlin)
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
val env = OrtEnvironment.getEnvironment()
val opts = OrtSession.SessionOptions().apply { addNnapi() }
val session = env.createSession(modelPath, opts)
Web (onnxruntime-web)
import * as ort from "onnxruntime-web";
const session = await ort.InferenceSession.create("htdemucs_ft_bass.onnx", {
executionProviders: ["wasm"],
graphOptimizationLevel: "all",
});
const tensor = new ort.Tensor("float32", audioBuffer, [1, 2, 343980]);
const out = await session.run({ mix: tensor });
// out.stems.data is a flat Float32Array with dims [1, 4, 2, 343980]; stem index 1 is bass.
Input / output spec
| Tensor | Name | Shape | Dtype | Notes |
|---|---|---|---|---|
| Input | mix | (1, 2, 343980) | float32 | Stereo audio, 44.1 kHz, 7.8 s segment. Values in [-1, 1]. |
| Output | stems | (1, 4, 2, 343980) | float32 | [drums, bass, other, vocals] order. Use only row 1 (bass); the other 3 rows are weakly predicted by-products of the bass specialist. |
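For a clip that fits in one segment, the fixed shapes above mean the audio has to be padded on the way in and trimmed on the way out. A minimal numpy sketch of that bookkeeping, assuming the audio is already at 44.1 kHz; `to_model_input` and `pick_bass` are hypothetical helper names, not functions from infer.py:

```python
import numpy as np

SEGMENT_SAMPLES = 343980  # 7.8 s at 44.1 kHz, from the input spec
BASS_INDEX = 1            # position of "bass" in [drums, bass, other, vocals]

def to_model_input(audio: np.ndarray) -> np.ndarray:
    """Shape a (channels, samples) clip into a (1, 2, 343980) float32 batch.

    Mono is duplicated to stereo, short clips are zero-padded on the right,
    and long clips are truncated (chunk those with overlap-add instead).
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[np.newaxis, :]
    if audio.shape[0] == 1:
        audio = np.repeat(audio, 2, axis=0)
    if audio.shape[0] != 2:
        raise ValueError(f"expected 1 or 2 channels, got {audio.shape[0]}")
    audio = audio[:, :SEGMENT_SAMPLES]
    pad = SEGMENT_SAMPLES - audio.shape[1]
    audio = np.pad(audio, ((0, 0), (0, pad)))
    return audio[np.newaxis, ...]

def pick_bass(stems: np.ndarray, valid_samples: int) -> np.ndarray:
    """Take the bass row of a (1, 4, 2, 343980) output and drop the padding."""
    return stems[0, BASS_INDEX, :, :valid_samples]
```

With an onnxruntime session loaded from htdemucs_ft_bass.onnx, the padded array would be passed as the `mix` input and `pick_bass` applied to the `stems` output.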
For longer audio, chunk with overlap-add β see infer.py::separate for a
working ~60-line implementation.
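As a rough illustration of the idea (not a copy of infer.py::separate, which may differ in window shape and overlap), here is a minimal overlap-add sketch. It assumes a hypothetical `run_segment` callable that maps one (2, segment) float32 chunk to the bass estimate of the same shape, for example a thin wrapper around an onnxruntime session; the 25% overlap and triangular window are illustrative choices.

```python
import numpy as np

SEGMENT_SAMPLES = 343980  # model segment length from the spec above

def overlap_add(mix, run_segment, segment=SEGMENT_SAMPLES, overlap=0.25):
    """Separate a long (2, samples) mix by windowed overlap-add.

    run_segment: hypothetical callable, (2, segment) float32 -> (2, segment).
    """
    channels, length = mix.shape
    stride = max(1, int(segment * (1.0 - overlap)))
    # Triangular weights, strictly positive so every sample gets nonzero weight.
    up = np.arange(1, segment + 1)
    weight = np.minimum(up, up[::-1]).astype(np.float32)
    weight /= weight.max()

    out = np.zeros((channels, length), dtype=np.float32)
    norm = np.zeros(length, dtype=np.float32)
    for start in range(0, length, stride):
        chunk = mix[:, start:start + segment]
        valid = chunk.shape[1]
        # Zero-pad the final chunk up to the fixed segment length.
        padded = np.pad(chunk, ((0, 0), (0, segment - valid))).astype(np.float32)
        result = run_segment(padded)
        out[:, start:start + valid] += result[:, :valid] * weight[:valid]
        norm[start:start + valid] += weight[:valid]
    # Dividing by the summed weights undoes the window where chunks overlap.
    return out / norm
```

Weighting each chunk toward its centre and normalising by the accumulated weights cross-fades neighbouring chunks, which avoids audible seams at chunk boundaries.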
Related repos
Sibling stem-specialist ONNX repos from the same export:
| Repo | Stem | Use when |
|---|---|---|
| htdemucs-ft-drums-onnx | drums | Drum extraction, beat transcription |
| htdemucs-ft-bass-onnx | bass | Bassline transcription, mix rebalancing |
| htdemucs-ft-other-onnx | other | Karaoke instrumentals, sample-flipping |
| htdemucs-ft-vocals-onnx | vocals | #1 open-source vocal SDR: karaoke, acapella, vocal removal |
| htdemucs-ft-onnx | all 4 | Full 4-stem separation in one repo |
PyTorch versions for HF Inference Endpoints:
htdemucs-ft-pytorch,
htdemucs-ft-bass-pytorch.
Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.
Skip the infrastructure: use the StemSplit API
Don't want to ship a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead. It runs the same model under the hood, hosted for you, with credits and a dashboard.
- stemsplit.io
- Developer docs
- API reference
- Guides & recipes
Or use the no-code tools that ship the same model family:
- Stem Splitter
Files in this repo
| File | Size | Purpose |
|---|---|---|
| htdemucs_ft_bass.onnx | 316 MB | The exported model. Opset 17. Passes onnx.checker. |
| infer.py | ~6 KB | Pure numpy + onnxruntime reference. No torch. |
| requirements.txt | <1 KB | onnxruntime, numpy, soundfile. |
| README.md | | This file. |
License & attribution
This repo is MIT-licensed, matching the original HT-Demucs.
@inproceedings{rouard2023hybrid,
title = {Hybrid Transformers for Music Source Separation},
author = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
booktitle = {ICASSP},
year = {2023}
}
- Original PyTorch model: facebookresearch/demucs
- ONNX export, parity verification, and packaging by StemSplit
- Search keywords: bass extraction onnx, bass isolation, bassline extractor, htdemucs bass onnx