HT-Demucs FT — Instrumental / Other Specialist (PyTorch)

Melodic / instrumental specialist from HT-Demucs FT — everything that isn't vocals, drums, or bass.

This is the sub-model at index 2 (zero-based) of the four-model htdemucs_ft bag by Défossez et al. (Meta AI), extracted as a standalone ~160 MB model. It produces the other stem with the same quality as the full ensemble at roughly 1/4 the compute cost: median SDR 6.34 dB on MUSDB18-HQ, which ranks 2nd among all models in our 2026 benchmark, behind mdx_extra_q at 7.67 dB.

Want all 4 stems in one request? Use the full ensemble: StemSplitio/htdemucs-ft-pytorch

Want a hosted REST API with credits and a dashboard? Use the StemSplit API.

Why this model

Property	This model	Full `htdemucs_ft` bag
Disk size	~160 MB	~640 MB
Per-3-min-song latency (M4 Pro MPS)	~22 s (RTF 0.12)	~47 s (RTF 0.26)
Instrumental / Other SDR on MUSDB18-HQ	6.34 dB	6.34 dB (identical — the bag's `other` output IS this sub-model's output)
Other stems returned	None (focused)	All 4

If you only need the other stem in production, this is strictly faster and smaller than the full ensemble with identical other quality — ~2.6× faster wall time in our smoke tests on M4 Pro MPS.
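
The RTF (real-time factor) values in the table appear to be latency divided by audio duration, so lower is faster. A minimal sketch of that arithmetic, using the table's figures for a 180-second song (the function name is illustrative, not part of any library):

```python
def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return latency_s / audio_s


SONG_S = 3 * 60  # the 3-minute reference song from the table

print(round(real_time_factor(22, SONG_S), 2))  # this model: 0.12
print(round(real_time_factor(47, SONG_S), 2))  # full bag:   0.26
```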

Common use cases

Karaoke / instrumental tracks: extract the melodic layer of a backing track. The other stem excludes drums and bass as well as vocals, so for a full music-minus-vocals mix, add the drums and bass stems or subtract the output of our htdemucs-ft-vocals-pytorch model from the original
Sample-flipping: isolate guitar/keys/synth lines for chopping and remixing
Cover-song production: remove vocals and rebalance the instrumental bed
Music-bed for video: strip vocals from licensed tracks for under-spoken-word use (check your sync rights first)

Quick start (Python)

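import base64, io, json
import soundfile as sf
from huggingface_hub import InferenceClient

# Send the song as base64 inside a JSON body
with open("your-song.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

client = InferenceClient(model="StemSplitio/htdemucs-ft-other-pytorch")
# post() returns the raw response bytes, so parse the JSON before indexing.
# post() is deprecated in newer huggingface_hub releases; if it is missing,
# pin an older version or call the endpoint over plain HTTP instead.
result = json.loads(client.post(json={"inputs": audio_b64}))

# The "other" stem comes back as a base64-encoded audio file
wav, sr = sf.read(io.BytesIO(base64.b64decode(result["other"])))
sf.write("out_other.wav", wav, sr)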
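
Going by the example above, the response maps a stem name to base64-encoded audio. The following is a minimal sketch of a defensive decoder using only the standard library; `decode_stem` is a hypothetical helper name, and the `"other"` key is taken from the example rather than from a documented schema.

```python
import base64
import json


def decode_stem(response_body, stem: str = "other") -> bytes:
    """Return the decoded audio bytes for one stem.

    Accepts the raw response (bytes or str) or an already-parsed dict.
    """
    if isinstance(response_body, (bytes, str)):
        data = json.loads(response_body)
    else:
        data = response_body
    if stem not in data:
        raise KeyError(f"response has no {stem!r} stem; keys: {sorted(data)}")
    return base64.b64decode(data[stem])
```

The returned bytes can be wrapped in `io.BytesIO` and passed to `sf.read` exactly as in the quick start.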

Or run locally without Hugging Face at all:

import torch, soundfile as sf
from demucs.apply import apply_model
from demucs.audio import convert_audio
from demucs.pretrained import get_model

bag = get_model("htdemucs_ft")  # downloads all four fine-tuned sub-models
model = bag.models[2].eval()  # the other specialist

# Reading MP3 through soundfile needs libsndfile 1.1.0 or newer
wav, sr = sf.read("your-song.mp3", dtype="float32", always_2d=True)
wav = torch.from_numpy(wav.T).contiguous()  # (samples, channels) -> (channels, samples)
# Resample and remix to the model's expected format, then add a batch dimension
wav = convert_audio(wav, sr, bag.samplerate, bag.audio_channels).unsqueeze(0)

device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
with torch.no_grad():
    stems = apply_model(model, wav, device=device)[0]

# bag.sources == ["drums", "bass", "other", "vocals"]; pick the other row
sf.write("out_other.wav", stems[bag.sources.index("other")].T.cpu().numpy(), bag.samplerate)
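
Because the other stem is everything that isn't vocals, drums, or bass, a full instrumental needs the drums and bass rows as well. One way to get it is to run the whole bag and sum every non-vocal row. This is a sketch under assumptions: `mix_stems` is a hypothetical helper, and it expects a NumPy array shaped (n_sources, channels, samples) ordered like `bag.sources`.

```python
import numpy as np


def mix_stems(stems: np.ndarray, sources: list, exclude=("vocals",)) -> np.ndarray:
    """Sum every stem except those named in `exclude`.

    stems: array shaped (n_sources, channels, samples), ordered like `sources`.
    Returns an array shaped (channels, samples).
    """
    keep = [i for i, name in enumerate(sources) if name not in exclude]
    if not keep:
        raise ValueError("nothing left to mix after exclusions")
    return stems[keep].sum(axis=0)
```

With the full ensemble this would look like `mix_stems(apply_model(bag, wav)[0].cpu().numpy(), bag.sources)`. The summed signal can exceed the original peak level, so check for clipping before writing it out.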

Deploy on Hugging Face Inference Endpoints

Click Deploy → Inference Endpoints above, pick a GPU instance, and HF will spin up a container running handler.py.

Hardware	Latency for 3-min song
NVIDIA L4	~3 s
NVIDIA T4 small	~7 s
CPU x4 (basic)	~48 s

(Roughly 2.6× faster than the full-bag latency, since we run only this specialist sub-model. Cloud GPU numbers extrapolated from M4 Pro measurements.)

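# Stream the JSON body over stdin: a base64-encoded song usually exceeds the
# shell's argument-length limit, and GNU base64 wraps lines unless stripped.
{ printf '{"inputs": "'; base64 < your-song.mp3 | tr -d '\n'; printf '"}'; } |
  curl -X POST https://<your-endpoint>.endpoints.huggingface.cloud \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    --data-binary @-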
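
The same call can be made from Python without extra dependencies. This is a minimal sketch using the standard library: `build_request` is a hypothetical helper, the endpoint URL and token are placeholders you supply, and the `"inputs"` body shape is taken from the curl example above.

```python
import base64
import json
import urllib.request


def build_request(endpoint_url: str, token: str, audio_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl call above."""
    body = json.dumps(
        {"inputs": base64.b64encode(audio_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending it needs a live endpoint, so it is left commented out here:
# with urllib.request.urlopen(build_request(url, token, audio), timeout=300) as resp:
#     stems = json.loads(resp.read())
```

Building the request separately from sending it keeps the payload logic easy to check before you spend endpoint time on it.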

Related models from StemSplit

Repo	Stem	When to use
`htdemucs-ft-pytorch`	all 4	When you need vocals + drums + bass + other in one request
`htdemucs-ft-vocals-pytorch`	vocals	Best vocal SDR in our benchmark (9.19 dB) — karaoke, acapella
`htdemucs-ft-drums-pytorch`	drums	Drum extraction, beat transcription, sample-pack creation
`htdemucs-ft-bass-pytorch`	bass	Bassline transcription, mix rebalancing
`htdemucs-ft-other-pytorch`	other / instrumental	Karaoke instrumentals, sample-flipping, music-bed extraction

Full benchmark across every popular open-source separator: StemSplitio/stem-separation-benchmark-2026.

License & attribution

This repo is MIT-licensed, matching the original HT-Demucs.

Original authors (please cite if you use this model in research):

@inproceedings{rouard2023hybrid,
  title     = {Hybrid Transformers for Music Source Separation},
  author    = {Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle = {ICASSP},
  year      = {2023}
}

Original model: facebookresearch/demucs
Packaging by StemSplit
