htdemucs-onnx (YouStem build)

ONNX export of HTDemucs (Hybrid Transformer Demucs v4) for in-browser music source separation. Splits a track into 4 stems: drums, bass, other, vocals.

This build powers YouStem, a Chrome/Brave extension that separates stems entirely on-device via WebGPU (onnxruntime-web). No audio ever leaves the user's machine.

Provenance and license

This model was reconverted from the official MIT-licensed HTDemucs weights published by Meta in facebookresearch/demucs, using the demucs Python package (demucs.pretrained.get_model("htdemucs")).

The weights are numerically identical to the upstream MIT release; only the graph was exported to ONNX with the STFT/iSTFT externalised (see below).

What is different from a plain demucs export

The short-time Fourier transform (STFT) and its inverse are not part of this graph. They are computed in JavaScript by the host application. The model:

  • takes the raw waveform and a pre-computed complex spectrogram as inputs;
  • returns the two HTDemucs branches (frequency mask + time waveform) separately, so the application performs the iSTFT and the final sum.

This keeps the ONNX graph free of FFT operators (which are awkward in onnxruntime-web) while remaining numerically equivalent to the reference model.
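The host's final step can be sketched as below. This is illustrative Python rather than the extension's JavaScript. `combine_branches` and its `[source][channel][sample]` nested-list layout are hypothetical, and the iSTFT itself is not shown (upstream demucs performs it in `HTDemucs._ispec`). The sketch assumes the `freq` output has already been inverted and cropped to the 343980-sample segment, so that the two branches line up sample for sample.

```python
def combine_branches(freq_wave, time_wave):
    """Final HTDemucs sum, done on the host: stem = iSTFT(freq) + time.

    Both arguments are nested lists indexed [source][channel][sample].
    freq_wave is assumed (hypothetical layout) to be the `freq` output already
    passed through the host's iSTFT and cropped to the segment length.
    """
    if len(freq_wave) != len(time_wave):
        raise ValueError("branches disagree on the number of sources")
    stems = []
    for f_src, t_src in zip(freq_wave, time_wave):
        if len(f_src) != len(t_src):
            raise ValueError("branches disagree on the number of channels")
        channels = []
        for f_ch, t_ch in zip(f_src, t_src):
            if len(f_ch) != len(t_ch):
                raise ValueError("branches disagree on the number of samples")
            # Sample-wise sum of the frequency-branch and time-branch audio.
            channels.append([a + b for a, b in zip(f_ch, t_ch)])
        stems.append(channels)
    return stems
```

With the source order listed under Outputs, index 3 of the result is the vocals stem.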

Inputs

name  shape              dtype    meaning
mix   [1, 2, 343980]     float32  raw stereo waveform, 44.1 kHz, 7.8 s segment
mag   [1, 4, 2048, 336]  float32  complex STFT as channels [L.real, L.imag, R.real, R.imag], un-normalised (the model normalises internally)
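On the host side, both inputs are plain row-major float32 buffers. The Python sketch below (a stand-in for the extension's JavaScript) shows the index arithmetic: channel `c`, bin `f`, frame `t` of `mag` lands at `(c * 2048 + f) * 336 + t`. `pack_mix` and `pack_mag` are hypothetical helper names, and `array("f")` plays the role of a `Float32Array`.

```python
from array import array


def pack_mix(left, right):
    """Row-major float32 data for `mix` [1, 2, N]: all left samples, then all right."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return array("f", list(left) + list(right))


def pack_mag(spec_left, spec_right):
    """Row-major float32 data for `mag` [1, 4, F, T].

    spec_left / spec_right are [freq][frame] lists of complex STFT values.
    Channel order follows the Inputs table: L.real, L.imag, R.real, R.imag.
    """
    n_freq, n_frames = len(spec_left), len(spec_left[0])
    out = array("f", [0.0]) * (4 * n_freq * n_frames)  # zero-filled float32 buffer
    planes = [
        (spec_left, False),   # channel 0: L.real
        (spec_left, True),    # channel 1: L.imag
        (spec_right, False),  # channel 2: R.real
        (spec_right, True),   # channel 3: R.imag
    ]
    for c, (spec, imag) in enumerate(planes):
        for f in range(n_freq):
            base = (c * n_freq + f) * n_frames  # offset of row (c, f)
            for t in range(n_frames):
                z = spec[f][t]
                out[base + t] = z.imag if imag else z.real
    return out
```

In onnxruntime-web the resulting data would be wrapped as, for example, `new ort.Tensor('float32', data, [1, 4, 2048, 336])` before being passed to `session.run`; that wiring is not shown here.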

Outputs

name  shape                 dtype    meaning
freq  [1, 4, 4, 2048, 336]  float32  frequency branch, complex-as-channels mask per source
time  [1, 4, 2, 343980]     float32  time branch, waveform per source

Source order: ['drums', 'bass', 'other', 'vocals'].
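On the way out, the same row-major arithmetic splits the flat output buffers back into stems in the source order above. This is a Python sketch with hypothetical helper names. It assumes that the four channels of each `freq` source use the same [L.real, L.imag, R.real, R.imag] order as `mag`, which this card does not state explicitly.

```python
SOURCES = ["drums", "bass", "other", "vocals"]


def unpack_time(data, n_samples):
    """Split flat `time` data [1, 4, 2, N] into {source: [left, right]}."""
    if len(data) != len(SOURCES) * 2 * n_samples:
        raise ValueError("unexpected length for the time output")
    stems = {}
    for s, name in enumerate(SOURCES):
        stems[name] = [
            list(data[(s * 2 + ch) * n_samples:(s * 2 + ch + 1) * n_samples])
            for ch in range(2)
        ]
    return stems


def unpack_freq(data, n_freq, n_frames):
    """Split flat `freq` data [1, 4, 4, F, T] into {source: [left, right]},
    each channel a [freq][frame] list of complex values.

    Assumption: per-source channel order is L.real, L.imag, R.real, R.imag.
    """
    plane = n_freq * n_frames
    if len(data) != len(SOURCES) * 4 * plane:
        raise ValueError("unexpected length for the freq output")
    stems = {}
    for s, name in enumerate(SOURCES):
        channels = []
        for ch in range(2):  # audio channel: 0 = left, 1 = right
            re0 = (s * 4 + 2 * ch) * plane  # start of the real plane
            im0 = re0 + plane                # imaginary plane follows it
            channels.append([
                [complex(data[re0 + f * n_frames + t], data[im0 + f * n_frames + t])
                 for t in range(n_frames)]
                for f in range(n_freq)
            ])
        stems[name] = channels
    return stems
```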

Spectrogram parameters

sample_rate=44100, n_fft=4096, hop_length=1024, segment=7.8 s (343980 samples), freq_bins=2048, frames=336.
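These numbers fit together as follows: with hop_length = n_fft / 4 = 1024, a 343980-sample segment gives ceil(343980 / 1024) = 336 frames, and dropping the Nyquist bin of a 4096-point FFT leaves 2048 bins. The Python sketch below computes one channel's spectrogram with that framing. It is written to follow the padding and cropping of upstream demucs' `HTDemucs._spec` as we understand it: reflect-pad by 1.5 hops plus the remainder, centre-pad by n_fft / 2, then keep frames 2 to 2 + 336. `htdemucs_stft`, `frame_count` and `reflect_pad` are hypothetical helpers, not code from this repository or the extension, and the recursive radix-2 FFT is for clarity rather than speed. The `scale` argument defaults to 1.0 because the card describes `mag` as un-normalised. Upstream demucs' `spectro` helper passes `normalized=True` to `torch.stft`, which scales by 1/sqrt(n_fft), and this card does not say which convention the export expects.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def reflect_pad(x, left, right):
    """Mirror padding that excludes the edge sample (torch/numpy 'reflect')."""
    if left >= len(x) or right >= len(x):
        raise ValueError("reflect padding must be shorter than the signal")
    return x[left:0:-1] + x + x[-2:-2 - right:-1]


def frame_count(length, hop):
    """Number of frames kept per segment: ceil(length / hop)."""
    return -(-length // hop)


def htdemucs_stft(x, n_fft=4096, scale=1.0):
    """One-channel complex STFT laid out as [freq][frame] (sketch).

    Intended to mirror demucs' HTDemucs._spec framing: n_fft // 2 bins
    (Nyquist dropped) and ceil(len(x) / hop) frames.
    """
    if n_fft & (n_fft - 1):
        raise ValueError("this radix-2 FFT needs a power-of-two n_fft")
    hop = n_fft // 4
    le = frame_count(len(x), hop)
    pad = hop // 2 * 3
    # Stage 1: reflect-pad to le * hop + 2 * pad samples.
    x = reflect_pad(list(x), pad, pad + le * hop - len(x))
    # Stage 2: centre padding, as torch.stft(center=True, pad_mode="reflect").
    x = reflect_pad(x, n_fft // 2, n_fft // 2)
    # Periodic Hann window (the torch.hann_window default).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    n_bins = n_fft // 2
    spec = [[0j] * le for _ in range(n_bins)]
    for t in range(le):
        start = (t + 2) * hop  # the first two of le + 4 frames are discarded
        bins = fft([x[start + i] * window[i] for i in range(n_fft)])
        for f in range(n_bins):
            spec[f][t] = bins[f] * scale
    return spec
```

Run on each channel of a 343980-sample segment, this yields two 2048 × 336 grids of complex values, i.e. the contents of the `mag` input (slowly, in pure Python).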

Specs

  • Opset 18, 100% standard ONNX operators (no custom domains).
  • ~166 MB, float32.

Citation

@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle={ICASSP 2023},
  year={2023}
}