---
title: Drum Sample Extractor
emoji: 🥁
colorFrom: gray
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
---
Drum Sample Extractor
A custom FastAPI + browser workstation for extracting, reviewing, and semantically supervising reusable drum samples from an audio file.
The pipeline uses Spleeter as the lightweight default source-separation backend when it is available, falls back to full-mix processing when optional separation dependencies are missing, and keeps Demucs as an explicit quality backend. It then detects onsets, classifies hits, clusters similar transients, chooses representative samples, and optionally synthesizes alternate samples. Exports include WAVs, MIDI, target-stem reconstruction, full-context reproduced audio, manifests, selected-only packs, and complete ZIP sample packs. The interactive layer stores user corrections as replayable semantic state beside each run manifest.
Current status
The project is usable as a local/Hugging Face Space application. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.
Implemented:
- Custom web frontend in `web/`, served by `app.py`.
- FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, run history, and SSE progress.
- Timed pipeline runner in `pipeline_runner.py`.
- Per-stage timing in every `manifest.json`.
- Two clustering modes:
  - `batch_quality`: all-pairs mel/NCC similarity plus agglomerative clustering.
  - `online_preview`: prototype-based incremental assignment intended for near-realtime preview.
- Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.
- Run history panel indexing `.runs/*/output/manifest.json`.
- Individual review WAVs for every detected hit under `review/hits/`.
- Click-to-audition workflow for waveform onsets, detected hit rows, and representative sample rows.
- Interactive supervised state in `supervised_state.py`:
  - persisted `supervision_state.json`,
  - hit/cluster confidence,
  - outlier-first review queue,
  - constraints,
  - event log,
  - suggestions,
  - undo stack.
- Clean, fixed, non-scrolling workstation UI: explicit top-bar upload button, whole-app drag/drop overlay, collapsed left/right/bottom tool panels by default, large center waveform/sample workspace, bottom dock for review/edit tools, and an explicit Start here flow.
- Immediate browser-side waveform rendering on file selection, before backend extraction starts.
- Waveform-based real progress visualization during extraction using backend `progress` events; no ETA or time-progress guessing.
- Visible API/runtime error banner in the UI, plus backend coercion for browser-form parameter values such as `subdivision="16"`.
- Supervision UI:
  - selected-hit actions,
  - move hit to cluster,
  - pull hit into a new cluster,
  - accept/favorite hit,
  - suppress hit as bleed,
  - lock/unlock cluster,
  - suggestion inbox with exact diff previews,
  - cluster explanation drawer,
  - force-onset waveform mode,
  - restore suppressed hits,
  - edited sample-pack export,
  - constraint/event log.
- Spleeter source-separation backend selected by default, with `spleeter:4stems`, `spleeter:2stems`, and `spleeter:5stems` support.
- Optional Demucs backend for explicit higher-quality separation; Spleeter failures now fall back to full-mix processing when fallback is enabled.
- True per-card checkbox selection and selected-only export under `selected/`.
- Persisted `draw another` card action that pins the next representative hit for the cluster.
- Immediate trim/extend card edits that rewrite preview WAVs under `overrides/hits/` and persist to supervised state.
- Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, remaining work, and interactive UX.
- Legacy Gradio apps preserved in `legacy/` for reference only.
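The `online_preview` mode is described above as prototype-based incremental assignment. A minimal sketch of that general technique follows, assuming each hit has already been reduced to a fixed-length feature vector. The name `assign_online`, the cosine similarity measure, the running-mean prototype update, and the `threshold` value are all illustrative assumptions, not the project's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_online(vectors, threshold=0.9):
    """Assign each vector to its most similar prototype, or open a new cluster.

    Hypothetical helper: each prototype is the running mean of its members.
    """
    prototypes = []  # one mean vector per cluster
    counts = []      # number of members per cluster
    labels = []
    for vec in vectors:
        sims = [cosine(vec, p) for p in prototypes]
        best = max(range(len(sims)), key=sims.__getitem__) if sims else -1
        if best >= 0 and sims[best] >= threshold:
            # Fold the new member into the matched prototype's running mean.
            n = counts[best]
            prototypes[best] = [(p * n + x) / (n + 1) for p, x in zip(prototypes[best], vec)]
            counts[best] = n + 1
            labels.append(best)
        else:
            # Nothing is similar enough: this vector seeds a new cluster.
            prototypes.append(list(vec))
            counts.append(1)
            labels.append(len(prototypes) - 1)
    return labels, prototypes
```

Because each hit is compared only against the current prototypes, the cost per hit grows with the number of clusters rather than the number of hits, which is what makes this style of assignment attractive for a preview path compared with all-pairs similarity.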
Not fully complete yet:
- No true cached feature-vector local reclustering yet.
- No cluster merge/split/relabel workflow beyond move/pull-to-new-cluster.
- No frontend TypeScript build/test harness yet.
- Spleeter progress is coarse-grained; Demucs progress exposes chunk-level work where available.
- Demucs remains offline/batch by design and is treated as the higher-cost explicit quality backend.
See:
- `docs/FEATURES.md`
- `docs/TASKS.md`
- `docs/PROGRESS.md`
- `docs/API.md`
- `docs/interactive-ux/README.md`
- `docs/REMAINING_WORK.md`
- `docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md`
- `docs/FIXED_WORKSTATION_UI.md`
- `docs/REPRODUCED_AUDIO_AND_PARAMETERS.md`
- `docs/CLEAN_DEFAULT_UI.md`
- `docs/IMMEDIATE_WAVEFORM_AND_REAL_PROGRESS.md`
- `docs/API_ERRORS_AND_PARAMETER_VALIDATION.md`
- `docs/UPLOAD_ERROR_AND_RUNTIME_FALLBACK.md`
Run locally
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
Open http://127.0.0.1:7860.
For fast iteration, use the default automatic flow. To bypass source separation entirely, open Advanced, use Fast preview, or set:
- Separation engine = `none`
- Stem = `all`
- Clustering mode = `online_preview`
That uses the full mix and the near-realtime clustering path. The default engine is Spleeter. Install it separately with pip install -r requirements-spleeter.txt in an environment compatible with Spleeter/TensorFlow. If Spleeter is unavailable and fallback is enabled, the app falls back to full-mix processing so the UI still works. Choose Demucs explicitly under Expert controls for slower quality separation.
Run checks
python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py
node --check web/app.js
python3 scripts/test_sse_and_review_hits.py
python3 scripts/test_interactive_supervision.py
python3 scripts/test_supervised_export_and_force_onset.py
python3 scripts/test_progress_contract.py
python3 scripts/test_param_validation_and_api_errors.py
python3 scripts/test_selected_export_card_actions.py
Run benchmarks
python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json
The benchmark uses synthetic drum fixtures and stem=all so the DSP stages are measured without Demucs model download/runtime noise.
API example
curl http://127.0.0.1:7860/api/config
curl -F 'file=@song.wav' \
-F 'params={"separation_backend":"spleeter","spleeter_model":"spleeter:4stems","stem":"drums","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
http://127.0.0.1:7860/api/jobs
Then poll the returned job id:
curl http://127.0.0.1:7860/api/jobs/<job-id>
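The same polling step can be scripted. The sketch below uses only the standard library and the `/api/jobs/<job-id>` endpoint shown above. The shape of the job JSON is not documented here, so the `"status"` field and the terminal values `"done"` and `"error"` are assumptions; adjust them to whatever `docs/API.md` specifies.

```python
import json
import time
import urllib.request


def fetch_job(base_url, job_id):
    """GET /api/jobs/<job-id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch, interval=1.0, max_polls=600, done=("done", "error")):
    """Poll until the job reports a terminal status.

    ASSUMPTION: the job JSON carries a "status" field whose terminal
    values are listed in `done`; both are hypothetical.
    """
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in done:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A call might look like `wait_for_job(job_id, lambda j: fetch_job("http://127.0.0.1:7860", j))`. Passing the fetch function in keeps the loop testable without a running server.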
Read supervised state:
curl http://127.0.0.1:7860/api/jobs/<job-id>/state
Move a hit into a target cluster:
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/hits/hit%3A00003/move \
-H 'Content-Type: application/json' \
-d '{"target_cluster_id":"cluster:0"}'
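Hit ids contain a colon, which is why the path above uses `hit%3A00003` rather than `hit:00003`. When building these URLs in Python, `urllib.parse.quote` with `safe=""` produces the same encoding. The helper name `move_hit_url` is only illustrative.

```python
from urllib.parse import quote


def move_hit_url(base_url, job_id, hit_id):
    """Build the move endpoint URL with each path segment percent-encoded."""
    # safe="" forces ":" and "/" to be encoded inside a single path segment.
    job = quote(job_id, safe="")
    hit = quote(hit_id, safe="")
    return f"{base_url}/api/jobs/{job}/hits/{hit}/move"
```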
Export selected cards only:
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/export-selected \
-H 'Content-Type: application/json' \
-d '{"labels":["kick_0","snare_0"],"synthesize":true}'
Draw another representative for a card:
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/draw
Trim/extend the current representative preview:
curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/edit \
-H 'Content-Type: application/json' \
-d '{"start_offset_ms":-8,"tail_offset_ms":24}'
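To make the millisecond offsets concrete, the sketch below converts them to sample indices. It assumes that a negative `start_offset_ms` moves the start earlier and a positive `tail_offset_ms` moves the end later, and that the result is clamped to the bounds of the source audio. The server's exact semantics may differ, and `apply_offsets` is a hypothetical helper.

```python
def apply_offsets(start, end, sample_rate, start_offset_ms, tail_offset_ms, total_samples):
    """Shift a [start, end) sample window by millisecond offsets.

    ASSUMPTION: negative start offsets extend the head, positive tail
    offsets extend the tail; the window is clamped to the audio length.
    """
    new_start = start + round(start_offset_ms * sample_rate / 1000)
    new_end = end + round(tail_offset_ms * sample_rate / 1000)
    new_start = max(0, min(new_start, total_samples))
    # Never let the window invert, even with extreme offsets.
    new_end = max(new_start, min(new_end, total_samples))
    return new_start, new_end
```

Under these assumptions, at 44.1 kHz the request above would move the start about 353 samples earlier and the end about 1058 samples later.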
List active/completed runs:
curl http://127.0.0.1:7860/api/jobs
Important files
| Path | Purpose |
|---|---|
| `app.py` | FastAPI app, static UI serving, job API, run history, artifact downloads, supervised editing endpoints |
| `pipeline_runner.py` | Timed extraction pipeline, Spleeter/Demucs/none separation backends, real progress contract, disk source/stem/context cache, batch/online clustering routing |
| `sample_extractor.py` | Core DSP/sample extraction implementation, including chunk-progress callback support for Demucs stem extraction |
| `supervised_state.py` | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
| `supervised_export.py` | Renders edited semantic state into supervised and selected-only WAV/MIDI/reconstruction/ZIP artifacts |
| `web/` | Custom no-build browser frontend with clean fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, immediate uploaded waveform rendering, real-progress waveform tinting, source/stem/reproduced preview transport, common/advanced parameter separation, collapsed sidebars/bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
| `scripts/benchmark_subprocesses.py` | Synthetic benchmark runner for stage timings |
| `scripts/test_interactive_supervision.py` | Smoke test for supervised state endpoints |
| `scripts/test_supervised_export_and_force_onset.py` | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
| `scripts/test_param_validation_and_api_errors.py` | Regression test for browser-style parameter coercion and visible API error details |
| `scripts/test_selected_export_card_actions.py` | Smoke test for selected-only export, draw-next persistence, and immediate preview timing edits |
| `docs/interactive-ux/` | Supplied interactive UX docs aligned to current implementation |
| `docs/` | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
| `legacy/` | Previous Gradio apps retained for reference |
Optional Spleeter backend
Spleeter is the default selected backend because it is much lighter than Demucs for the common path. It is not pinned into requirements.txt because TensorFlow/Spleeter compatibility depends on the Python environment. Use:
pip install -r requirements-spleeter.txt
Leave allow_backend_fallback=true for normal use so missing or failing Spleeter installs automatically fall back to full-mix processing. Disable fallback only when debugging Spleeter itself.
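The fallback rule can be pictured with a small sketch: if the requested backend cannot be imported and fallback is allowed, processing continues on the full mix (`none`). The function `resolve_backend` and its `is_available` hook are hypothetical; only the backend names `spleeter`, `demucs`, and `none` and the `allow_backend_fallback` flag come from this README. The real check in `pipeline_runner.py` may also catch runtime failures, which this sketch does not model.

```python
import importlib.util


def resolve_backend(requested, allow_backend_fallback=True, is_available=None):
    """Return the separation backend to use, or "none" for full-mix processing."""
    if is_available is None:
        # Default probe: is a top-level module with this name importable?
        is_available = lambda name: importlib.util.find_spec(name) is not None
    if requested == "none" or is_available(requested):
        return requested
    if allow_backend_fallback:
        return "none"  # keep the UI working on the unseparated mix
    raise RuntimeError(f"separation backend {requested!r} is not installed")
```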
Output per run
Each run is stored under .runs/<job-id>/output/:
- `stem.wav`
- `reconstruction.wav`
- `reconstruction.mid`
- `sample-pack.zip`
- `samples/*.wav`
- `review/hits/*.wav`
- `manifest.json`
- `supervision_state.json`
- `supervised/manifest.json` after edited export
- `supervised/sample-pack.zip` after edited export
- `selected/sample-pack.zip` after selected-card export
- `overrides/hits/*.wav` after immediate card trim/extend edits
- `supervised/samples/*.wav` after edited export
- `supervised/reconstruction.mid` after edited export
- `supervised/reconstruction.wav` after edited export
- `source.wav`, `context_bed.wav`, and `target_reconstruction.wav` for source/stem/reproduced A/B previews
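Because every run keeps its manifest at `.runs/<job-id>/output/manifest.json`, the run history can be rebuilt by globbing that pattern. The sketch below shows one way to do it with the standard library. `index_runs` is a hypothetical helper, and it makes no assumption about the keys inside each manifest.

```python
import json
from pathlib import Path


def index_runs(runs_dir=".runs"):
    """Return (job_id, manifest) pairs for every readable run manifest."""
    runs = []
    for manifest_path in sorted(Path(runs_dir).glob("*/output/manifest.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written manifests
        # Layout is <runs_dir>/<job-id>/output/manifest.json.
        job_id = manifest_path.parent.parent.name
        runs.append((job_id, manifest))
    return runs
```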
Generated runtime directories are ignored by git:
- `.runs/`
- `.cache/`
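The disk cache under `.cache/` is keyed by source digest and extraction settings. One common way to derive such a key is sketched below, assuming SHA-256 and a canonical JSON encoding of the settings. The actual hash, field set, and key layout used by `pipeline_runner.py` are not specified in this README, and `cache_key` is a hypothetical name.

```python
import hashlib
import json


def cache_key(source_bytes, settings):
    """Derive a stable cache key from audio bytes plus extraction settings."""
    source_digest = hashlib.sha256(source_bytes).hexdigest()
    # sort_keys makes the key independent of dict insertion order.
    settings_blob = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    combined = f"{source_digest}:{settings_blob}".encode("utf-8")
    return hashlib.sha256(combined).hexdigest()
```

With a key like this, re-running the same file with the same backend and stem reuses the cached stem, while changing any setting yields a different key and a fresh extraction.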
Automatic default workflow
The default UI is now intentionally simple:
- Drop or upload an audio file.
- The waveform renders immediately in the browser.
- Upload and extraction start automatically.
- Automatic tuning chooses practical onset sensitivity and sample-group bounds after the source/stem is available.
- Sample cards appear in grouped columns as soon as their WAVs are written.
- The user can audition, dismiss, draw another candidate, or trim/extend a card. Draw and timing choices are persisted as semantic overrides and affect selected/edited exports.
Advanced parameters, run history, raw tables, and supervised semantic editing remain available in collapsed panels, but they are no longer required for the common path.
See docs/AUTOMATIC_CARD_FLOW_UI.md.
Reference-style UI update
The web UI now follows the supplied Sample Extractor reference: waveform-first canvas, grouped sample columns, persistent right settings panel, compact export bar, and a bottom selection/tools bar. Drop/upload still starts processing automatically.