Silverdaw Rhythm Quality Pack β 4-stem BS-RoFormer (ONNX)
A host-STFT ONNX export of a 4-stem BS-RoFormer music source-separation model
(bass / drums / other / vocals), used by Silverdaw
as its optional higher-quality drums + bass model. Silverdaw runs the model
once and keeps the drums and bass outputs (vocals come from a dedicated
vocal model; other is reconstructed as the residual).
The torch.stft / torch.istft are stripped out of the graph: the ONNX core
consumes a precomputed STFT and returns the masked per-stem spectrogram, and the
host application runs the STFT and per-stem iSTFT. This keeps the graph portable
(standard ONNX ops, opset 17, no contrib operators) so it runs on the CPU and on
DirectX 12 GPUs via ONNX Runtime's DirectML execution provider.
Files
| File | Size | Notes |
|---|---|---|
bs_roformer_4stem_rhythm_fp16.onnx |
~257 MB | fp16 weights, single self-contained graph (no external .onnx.data). |
I/O contract
- Inputs β
spec_real,spec_imag, eachfloat32 [1, 2, 1025, 801]= (batch, audio channel, frequency bin, time frame): the real and imaginary parts oftorch.stft(audio, n_fft=2048, hop_length=441, win_length=2048, window=hann, center=True, normalized=False). - Outputs β
out_spec_real,out_spec_imag, eachfloat32 [1, 4, 2, 1025, 801]= (batch, stem, channel, freq, frame): the masked spectrogram per stem. Stem order:drums, bass, other, vocals. The DC bin is zeroed in-graph. - Window β traced at a fixed 8 s chunk (
T = 801frames, 352,800 samples at 44.1 kHz). The rotary embedding caches by sequence length, so the time axis is fixed; callers chunk the audio and overlap-add. Reconstruct each stem with the inverse STFT (same parameters) and a windowed overlap-add normalised by an accumulated window counter.
Provenance & license
License: MIT. This export combines three MIT-licensed sources:
- Weights β Β© ZFTurbo, from
ZFTurbo/Music-Source-Separation-Training(themodel_bs_roformer_ep_17_sdr_9.65684-stem checkpoint, configconfig_bs_roformer_384_8_2_485100, trained on MUSDB18-HQ; release v1.0.12). - Architecture β BS-RoFormer /
lucidrains/BS-RoFormer(MIT). - Export pipeline β the host-STFT ONNX wrapper is derived from the MIT
elicwhite/bs-roformer-webbuild pipeline.
Note on training data: the model was trained on MUSDB18-HQ, whose dataset terms are research-oriented. Whether dataset terms reach through to the resulting weights is an unsettled, industry-wide question; this export treats the trained weights as governed by the trainer's chosen MIT license, the same posture taken by other openly-distributed separation models (e.g. htdemucs).
Intended use
Designed for offline (non-real-time) stem separation inside a desktop DAW. It is not a real-time audio-callback model β run it on a worker/background thread.