HT-Demucs FT: Bass Specialist, ONNX

Bass extraction specialist from HT-Demucs FT, packaged as ONNX. ~1.31× faster than PyTorch on CPU, with no PyTorch required at inference.

This repo packages sub-model 1 of the htdemucs_ft 4-bag ensemble as a single 316 MB .onnx file plus a ~150-line numpy reference inference script. The export was verified against the original PyTorch model, with a maximum absolute output difference below 1e-3.

Want all 4 stems in one drop-in package? Use the full bag repo: StemSplitio/htdemucs-ft-onnx.


TL;DR

pip install onnxruntime numpy soundfile
python infer.py your-song.mp3 ./out/
# writes ./out/bass.wav at 44.1 kHz stereo

That's it. No PyTorch, no CUDA setup, no GPU server.


Quality

| Metric (MUSDB18-HQ test, 50 songs) | Value | Source |
|---|---|---|
| Median bass SDR | 10.38 dB | StemSplitio/stem-separation-benchmark-2026 |
| Rank among open-source separators on bass | 2nd (mdx_extra_q leads at 11.42 dB) | same benchmark |
| ONNX vs PyTorch max abs diff | < 1e-3 | verified during export (see Day 1 spike report) |
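For readers who want to reproduce the two kinds of numbers in this table, here is a minimal numpy sketch. It assumes the plain energy-ratio definition of SDR; the benchmark's exact SDR variant is not stated here, so treat `sdr_db` as illustrative. Both function names are hypothetical helpers and are not part of infer.py.

```python
import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Plain SDR in dB: 10 * log10(||ref||^2 / ||ref - est||^2)."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    signal = np.sum(ref ** 2)
    distortion = np.sum((ref - est) ** 2)
    # eps guards against division by zero and log(0) on silent or perfect inputs.
    return float(10.0 * np.log10((signal + eps) / (distortion + eps)))


def max_abs_diff(a: np.ndarray, b: np.ndarray) -> float:
    """Largest element-wise absolute difference, the parity metric in the table."""
    return float(np.max(np.abs(a.astype(np.float64) - b.astype(np.float64))))


# Synthetic demo: the added noise has one tenth the amplitude of the signal,
# so the SDR should land close to 20 dB.
rng = np.random.default_rng(0)
reference = rng.standard_normal((2, 44100)).astype(np.float32)
estimate = reference + 0.1 * rng.standard_normal((2, 44100)).astype(np.float32)
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
print(f"max abs diff: {max_abs_diff(reference, estimate):.4f}")
```

For a parity check, `a` and `b` would be the ONNX and PyTorch outputs for the same input segment, and the result would be compared against the 1e-3 threshold.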

Performance

| Runtime | Hardware | Per 7.8 s segment | Per 3-min song |
|---|---|---|---|
| onnxruntime CPU EP | Apple M4 Pro | ~1.6 s | ~22 s |
| PyTorch CPU | Apple M4 Pro | ~2.1 s | ~29 s |
| onnxruntime CUDA EP | NVIDIA L4 | ~0.4 s | ~5 s (extrapolated) |
| onnxruntime DirectML EP | RTX 4090 | ~0.2 s | ~2 s (extrapolated) |

Real-time factor on M4 Pro CPU: ~0.20 (about 1.6 s of compute per 7.8 s segment). Roughly 1.31× faster than PyTorch CPU on the same hardware.
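The real-time factor and speedup follow directly from the per-segment timings in the table. The short sketch below reproduces that arithmetic; the variable names are illustrative, and the timings are the approximate values quoted above rather than fresh measurements.

```python
SAMPLE_RATE = 44100        # Hz, from the input spec
SEGMENT_SAMPLES = 343980   # samples per model segment, from the input spec

# Approximate per-segment timings on Apple M4 Pro, taken from the table above.
ONNX_CPU_SECONDS = 1.6
TORCH_CPU_SECONDS = 2.1

segment_seconds = SEGMENT_SAMPLES / SAMPLE_RATE           # audio duration of one segment
real_time_factor = ONNX_CPU_SECONDS / segment_seconds     # compute time / audio time
speedup = TORCH_CPU_SECONDS / ONNX_CPU_SECONDS            # PyTorch time / ONNX time

print(f"segment length:   {segment_seconds:.2f} s")
print(f"real-time factor: {real_time_factor:.3f}")
print(f"speedup:          {speedup:.4f}x")
```

A real-time factor below 1 means a segment is processed faster than it plays back.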


Tooling: demucs-onnx Python package

This model can also be run (and re-exported) via the open-source demucs-onnx Python package on PyPI. It auto-downloads from this repo on first use.

pip install demucs-onnx

# Single specialist (this repo)
demucs-onnx separate song.mp3 stems/ --stem bass

# Or via the Python API
python -c "from demucs_onnx import separate_stem; \
  audio = separate_stem('song.mp3', 'bass')"

The same package is also the canonical tool for exporting htdemucs to ONNX yourself. It bundles fixes for all four export blockers (complex STFT, fractions.Fraction, random.randrange, aten::_native_multi_head_attention), so vanilla torch.onnx.export works on your own checkpoints.

pip install "demucs-onnx[export]"
demucs-onnx export htdemucs_ft bass.onnx --stem bass

Common use cases

  • Bassline transcription: MIDI / tab generation from any recording
  • Mix rebalancing: isolate and re-EQ the bass bus on a finished mix
  • Music education: learn basslines by hearing them isolated
  • Sub-bass mastering reference: compare your low end against pro mixes

Quick start

Python: minimal

import infer
bass = infer.separate_bass("your-song.mp3")
# bass: numpy array (2, samples) at 44.1 kHz

Python: full control

import soundfile as sf
import infer

# Optional execution providers: CPU is the default and most portable.
# Swap to "coreml" on macOS, "cuda" on NVIDIA, "dml" on Windows DX12.
audio, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
stems = infer.separate(audio.T, sr, providers=["CPUExecutionProvider"])
sf.write("bass.wav", stems[infer.SOURCES.index("bass")].T, sr)

CLI

python infer.py your-song.mp3 ./out/
python infer.py your-song.mp3 ./out/ --providers cuda    # NVIDIA
python infer.py your-song.mp3 ./out/ --providers coreml  # macOS
python infer.py your-song.mp3 ./out/ --providers dml     # Windows
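The short names passed to --providers presumably map onto onnxruntime's full execution provider names. The sketch below shows one way such a mapping with a CPU fallback could look. The alias table and the `resolve_providers` helper are assumptions for illustration, not infer.py's actual code; only the full provider names themselves are real onnxruntime identifiers.

```python
# Hypothetical alias table: short CLI names to onnxruntime provider names.
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "dml": "DmlExecutionProvider",
}


def resolve_providers(requested, available):
    """Map requested names to full provider names, drop unavailable ones,
    and always end with the CPU provider as a fallback."""
    resolved = []
    for name in requested:
        # Unknown names are passed through unchanged so full names also work.
        full = PROVIDER_ALIASES.get(name.lower(), name)
        if full in available and full not in resolved:
            resolved.append(full)
    if "CPUExecutionProvider" not in resolved:
        resolved.append("CPUExecutionProvider")
    return resolved


# In real use, `available` would come from onnxruntime.get_available_providers().
available = ["CoreMLExecutionProvider", "CPUExecutionProvider"]
print(resolve_providers(["cuda", "coreml"], available))
# prints ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```

Keeping the CPU provider last means a machine without the requested accelerator still runs, just more slowly.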

Mobile (iOS / Swift)

import onnxruntime_objc

let env = try ORTEnv(loggingLevel: .warning)
let opts = try ORTSessionOptions()
try opts.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
let session = try ORTSession(env: env,
                              modelPath: Bundle.main.path(forResource: "htdemucs_ft_bass", ofType: "onnx")!,
                              sessionOptions: opts)
// audio: 1 × 2 × 343980 Float32 buffer, then session.run(...).

Mobile (Android / Kotlin)

import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession

val env = OrtEnvironment.getEnvironment()
val opts = OrtSession.SessionOptions().apply { addNnapi() }
val session = env.createSession(modelPath, opts)

Web (onnxruntime-web)

import * as ort from "onnxruntime-web";
const session = await ort.InferenceSession.create("htdemucs_ft_bass.onnx", {
  executionProviders: ["wasm"],
  graphOptimizationLevel: "all",
});
const tensor = new ort.Tensor("float32", audioBuffer, [1, 2, 343980]);
const out = await session.run({ mix: tensor });
// out.stems.data is a Float32Array (1, 4, 2, 343980); use row 1 for bass.

Input / output spec

| Tensor | Name | Shape | Dtype | Notes |
|---|---|---|---|---|
| Input | mix | (1, 2, 343980) | float32 | Stereo audio, 44.1 kHz, 7.8 s segment. Values in [-1, 1]. |
| Output | stems | (1, 4, 2, 343980) | float32 | [drums, bass, other, vocals] order. Use only row 1 (bass); the other 3 rows are weakly predicted by-products of the bass specialist. |
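As a rough illustration of the shapes in this table, the sketch below pads a short clip to one model segment and picks the bass row out of an output tensor. The helper names are hypothetical and the output here is a zero-filled stand-in; with a real session, the output would come from something like `session.run(None, {"mix": x})[0]`.

```python
import numpy as np

SEGMENT = 343980   # samples per segment (7.8 s at 44.1 kHz)
BASS_INDEX = 1     # position of bass in [drums, bass, other, vocals]


def to_model_input(audio: np.ndarray) -> np.ndarray:
    """Convert (samples, channels) audio to a (1, 2, SEGMENT) float32 tensor.

    Mono is duplicated to stereo; audio is truncated or zero-padded to SEGMENT.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[:, None]
    if audio.shape[1] == 1:
        audio = np.repeat(audio, 2, axis=1)
    audio = audio[:SEGMENT, :2].T                    # (2, n) with n <= SEGMENT
    padded = np.zeros((2, SEGMENT), dtype=np.float32)
    padded[:, : audio.shape[1]] = audio
    return padded[None]                              # add the batch axis


def bass_from_output(stems: np.ndarray, valid_samples: int) -> np.ndarray:
    """Take the bass row of a (1, 4, 2, SEGMENT) output and trim the padding."""
    return stems[0, BASS_INDEX, :, :valid_samples]


# Demo with one second of silence and a stand-in for the model output.
clip = np.zeros((44100, 2), dtype=np.float32)
x = to_model_input(clip)
fake_output = np.zeros((1, 4, 2, SEGMENT), dtype=np.float32)
bass = bass_from_output(fake_output, clip.shape[0])
print(x.shape, bass.shape)
```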

For longer audio, chunk with overlap-add; see infer.py::separate for a working ~60-line implementation.
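To make the overlap-add idea concrete, here is a minimal numpy sketch. It is not the code in infer.py: the 25% overlap, the triangular weighting, and the `run_segment` callback are all assumptions chosen for illustration. `run_segment` stands in for whatever runs the ONNX session on one segment and returns the bass row.

```python
import numpy as np

SEGMENT = 343980  # samples per model segment


def overlap_add(mix, run_segment, segment=SEGMENT, overlap=0.25):
    """Process (channels, n) audio in overlapping segments and blend the results.

    run_segment takes a (channels, segment) array and returns one of the same shape.
    """
    channels, n = mix.shape
    hop = max(1, int(segment * (1.0 - overlap)))
    # Triangular weights, strictly positive, peaking in the middle of a segment.
    idx = np.arange(segment)
    weight = np.minimum(idx + 1, segment - idx).astype(np.float32)

    out = np.zeros((channels, n), dtype=np.float32)
    norm = np.zeros(n, dtype=np.float32)
    for start in range(0, n, hop):
        chunk = mix[:, start:start + segment]
        valid = chunk.shape[1]
        # Zero-pad the final chunk up to the fixed segment length.
        padded = np.zeros((channels, segment), dtype=np.float32)
        padded[:, :valid] = chunk
        estimate = run_segment(padded)
        out[:, start:start + valid] += estimate[:, :valid] * weight[:valid]
        norm[start:start + valid] += weight[:valid]
    # Every sample is covered by at least one segment, so norm is never zero.
    return out / norm


# Sanity check with an identity "model": the blended output matches the input.
rng = np.random.default_rng(0)
mix = rng.standard_normal((2, 3500)).astype(np.float32)
result = overlap_add(mix, lambda seg: seg, segment=1000)
print(np.allclose(result, mix, atol=1e-5))
```

The weighting fades each segment in and out so that discontinuities at segment edges are averaged away instead of producing clicks.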


Related repos

Sibling stem-specialist ONNX repos from the same export:

| Repo | Stem | Use when |
|---|---|---|
| htdemucs-ft-drums-onnx | drums | Drum extraction, beat transcription |
| htdemucs-ft-bass-onnx | bass | Bassline transcription, mix rebalancing |
| htdemucs-ft-other-onnx | other | Karaoke instrumentals, sample-flipping |
| htdemucs-ft-vocals-onnx | vocals | #1 open-source vocal SDR: karaoke, acapella, vocal removal |
| htdemucs-ft-onnx | all 4 | Full 4-stem separation in one repo |

PyTorch versions for HF Inference Endpoints: htdemucs-ft-pytorch, htdemucs-ft-bass-pytorch.

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.


Skip the infrastructure: use the StemSplit API

Don't want to ship a 316 MB model in your app, manage a GPU pool, or write overlap-add chunking? Use the StemSplit API instead: same model under the hood, hosted for you, with credits and a dashboard.

Or use the no-code tools that ship the same model family:


Files in this repo

| File | Size | Purpose |
|---|---|---|
| htdemucs_ft_bass.onnx | 316 MB | The exported model. Opset 17. Passes onnx.checker. |
| infer.py | ~6 KB | Pure numpy + onnxruntime reference. No torch. |
| requirements.txt | <1 KB | onnxruntime, numpy, soundfile. |
| README.md | | This file. |

License & attribution

This repo is MIT-licensed, matching the original HT-Demucs.

@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}
  • Original PyTorch model: facebookresearch/demucs
  • ONNX export, parity verification, and packaging by StemSplit
  • Search keywords: bass extraction onnx, bass isolation, bassline extractor, htdemucs bass onnx