metadata
title: Drum Sample Extractor
emoji: 🥁
colorFrom: gray
colorTo: pink
sdk: docker
app_port: 7860
pinned: false

Drum Sample Extractor

A custom FastAPI + browser workstation for extracting, reviewing, and semantically supervising reusable drum samples from an audio file.

The pipeline uses Spleeter as the lightweight source-separation default when it is available, falls back to full-mix processing when optional separation dependencies are missing, and keeps Demucs as an explicit higher-quality backend. It then detects onsets, classifies hits, clusters similar transients, chooses representative samples, and optionally synthesizes alternate samples. Exports include WAVs, MIDI, target-stem reconstruction, full-context reproduced audio, manifests, selected-only packs, and complete ZIP sample packs. The interactive layer stores user corrections as replayable semantic state beside each run manifest.

Current status

The project is usable locally or as a Hugging Face Space. Gradio is no longer the active UI; the active app is a custom FastAPI backend plus a no-build browser frontend.

Implemented:

  • Custom web frontend in web/, served by app.py.

  • FastAPI job API with upload, polling, safe artifact downloads, config, health, cache clearing, run history, and SSE progress.

  • Timed pipeline runner in pipeline_runner.py.

  • Per-stage timing in every manifest.json.

  • Two clustering modes:

    • batch_quality: all-pairs mel/NCC similarity plus agglomerative clustering.
    • online_preview: prototype-based incremental assignment intended for near-realtime preview.
  • Disk cache for decoded full-mix/stem outputs keyed by source digest and extraction settings.

  • Run history panel indexing .runs/*/output/manifest.json.

  • Individual review WAVs for every detected hit under review/hits/.

  • Click-to-audition workflow for waveform onsets, detected hit rows, and representative sample rows.

  • Interactive supervised state in supervised_state.py:

    • persisted supervision_state.json,
    • hit/cluster confidence,
    • outlier-first review queue,
    • constraints,
    • event log,
    • suggestions,
    • undo stack.
  • Clean, fixed, non-scrolling workstation UI: explicit top-bar upload button, whole-app drag/drop overlay, collapsed left/right/bottom tool panels by default, large center waveform/sample workspace, bottom dock for review/edit tools, and an explicit Start here flow.

  • Immediate browser-side waveform rendering on file selection, before backend extraction starts.

  • Waveform-based real progress visualization during extraction using backend progress events; no ETA or time-progress guessing.

  • Visible API/runtime error banner in the UI, plus backend coercion for browser-form parameter values such as subdivision="16".

  • Supervision UI:

    • selected-hit actions,
    • move hit to cluster,
    • pull hit into a new cluster,
    • accept/favorite hit,
    • suppress hit as bleed,
    • lock/unlock cluster,
    • suggestion inbox with exact diff previews,
    • cluster explanation drawer,
    • force-onset waveform mode,
    • restore suppressed hits,
    • edited sample-pack export,
    • constraint/event log.
  • Spleeter source-separation backend selected by default, with spleeter:4stems, spleeter:2stems, and spleeter:5stems support.

  • Optional Demucs backend for explicit higher-quality separation; Spleeter failures now fall back to full-mix processing when fallback is enabled.

  • True per-card checkbox selection and selected-only export under selected/.

  • Persisted "draw another" card action that pins the next representative hit for the cluster.

  • Immediate trim/extend card edits that rewrite preview WAVs under overrides/hits/ and persist to supervised state.

  • Documentation for features, progress, tasks, API, timing, hit review, realtime suitability, UI, remaining work, and interactive UX.

  • Legacy Gradio apps preserved in legacy/ for reference only.
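The online_preview mode listed above is described only as prototype-based incremental assignment. The following is a minimal sketch of that general technique, not the project's implementation. It assumes cosine similarity over fixed-length feature vectors and a hypothetical threshold of 0.9; the class name, feature format, and threshold are all illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlinePrototypeClusterer:
    """Assign each new hit to the nearest prototype, or start a new cluster."""

    def __init__(self, threshold=0.9):  # hypothetical threshold
        self.threshold = threshold
        self.prototypes = []  # running-mean feature vector per cluster
        self.counts = []      # number of hits assigned to each cluster

    def assign(self, features):
        best, best_sim = -1, -1.0
        for idx, proto in enumerate(self.prototypes):
            sim = cosine(features, proto)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= self.threshold:
            # Update the running mean so the prototype tracks its members.
            n = self.counts[best] + 1
            self.prototypes[best] = [
                p + (f - p) / n for p, f in zip(self.prototypes[best], features)
            ]
            self.counts[best] = n
            return best
        # No prototype is similar enough: open a new cluster.
        self.prototypes.append(list(features))
        self.counts.append(1)
        return len(self.prototypes) - 1


clusterer = OnlinePrototypeClusterer(threshold=0.9)
labels = [clusterer.assign(v) for v in ([1.0, 0.0], [0.99, 0.05], [0.0, 1.0])]
```

Each hit is compared only against the current prototypes, so the cost per hit grows with the number of clusters rather than the number of hits. That is the usual reason to prefer this approach over all-pairs clustering for a preview.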

Not fully complete yet:

  • No true cached feature-vector local reclustering yet.
  • No cluster merge/split/relabel workflow beyond move/pull-to-new-cluster.
  • No frontend TypeScript build/test harness yet.
  • Spleeter progress is coarse-grained; Demucs progress exposes chunk-level work where available.
  • Demucs remains offline/batch by design and is treated as the higher-cost explicit quality backend.

See:

  • docs/FEATURES.md
  • docs/TASKS.md
  • docs/PROGRESS.md
  • docs/API.md
  • docs/interactive-ux/README.md
  • docs/REMAINING_WORK.md
  • docs/SUPERVISED_EXPORT_AND_FORCE_ONSET.md
  • docs/FIXED_WORKSTATION_UI.md
  • docs/REPRODUCED_AUDIO_AND_PARAMETERS.md
  • docs/CLEAN_DEFAULT_UI.md
  • docs/IMMEDIATE_WAVEFORM_AND_REAL_PROGRESS.md
  • docs/API_ERRORS_AND_PARAMETER_VALIDATION.md
  • docs/UPLOAD_ERROR_AND_RUNTIME_FALLBACK.md

Run locally

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860

Open http://127.0.0.1:7860.

For fast iteration, use the default automatic flow. To bypass source separation entirely, open Advanced, use Fast preview, or set:

  • Separation engine = none
  • Stem = all
  • Clustering mode = online_preview

That uses the full mix and the near-realtime clustering path.

The default engine is Spleeter. Install it separately with pip install -r requirements-spleeter.txt in an environment compatible with Spleeter/TensorFlow. If Spleeter is unavailable and fallback is enabled, the app falls back to full-mix processing so the UI still works. Choose Demucs explicitly under Expert controls for slower, higher-quality separation.
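The fast-preview settings above could be sent as a job's params payload along these lines. The key names follow the curl example in the API section of this README. Treating the UI labels "none" and "all" as the literal API values is an assumption.

```python
import json

# Key names mirror the curl example in this README.
# The values "none" and "all" are taken from the UI labels above;
# whether the API accepts them verbatim is an assumption.
fast_preview_params = {
    "separation_backend": "none",
    "stem": "all",
    "clustering_mode": "online_preview",
}

# JSON string suitable for the multipart "params" form field.
params_field = json.dumps(fast_preview_params)
```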

Run checks

python3 -m py_compile app.py pipeline_runner.py sample_extractor.py supervised_state.py supervised_export.py scripts/*.py
node --check web/app.js
python3 scripts/test_sse_and_review_hits.py
python3 scripts/test_interactive_supervision.py
python3 scripts/test_supervised_export_and_force_onset.py
python3 scripts/test_progress_contract.py
python3 scripts/test_param_validation_and_api_errors.py
python3 scripts/test_selected_export_card_actions.py

Run benchmarks

python3 scripts/benchmark_subprocesses.py --runs 2 --bars 4 --output docs/benchmark-subprocesses.json

The benchmark uses synthetic drum fixtures and stem=all, so the DSP stages are measured without source-separation model download or runtime noise.
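Per-stage timings of the kind this benchmark reports could be collected with a small context manager. This is a sketch under assumptions: StageTimer and the stage names are hypothetical and do not come from pipeline_runner.py.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self):
        self.timings = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed


timer = StageTimer()
with timer.stage("onset_detection"):  # illustrative stage name
    sum(i * i for i in range(10_000))
with timer.stage("clustering"):       # illustrative stage name
    sorted(range(1_000), reverse=True)
```

The resulting timer.timings dictionary is plain data, so it could be written into a JSON manifest as-is.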

API example

curl http://127.0.0.1:7860/api/config

curl -F 'file=@song.wav' \
  -F 'params={"separation_backend":"spleeter","spleeter_model":"spleeter:4stems","stem":"drums","clustering_mode":"online_preview","target_min":4,"target_max":12}' \
  http://127.0.0.1:7860/api/jobs

Then poll the returned job id:

curl http://127.0.0.1:7860/api/jobs/<job-id>
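The same polling step can be scripted. The sketch below uses only the standard library. The "status" key and the terminal values "done" and "error" are assumptions about the response shape; check docs/API.md for the actual fields.

```python
import json
import time
import urllib.request


def fetch_job(base_url, job_id):
    """GET /api/jobs/<job-id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, base_url="http://127.0.0.1:7860", fetch=fetch_job,
                 interval=1.0, timeout=600.0,
                 terminal=("done", "error")):  # assumed status values
    """Poll until the job reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(base_url, job_id)
        if job.get("status") in terminal:  # "status" key is an assumption
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
        time.sleep(interval)
```

The fetch parameter is injectable so the loop can be exercised without a running server.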

Read supervised state:

curl http://127.0.0.1:7860/api/jobs/<job-id>/state

Move a hit into a target cluster:

curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/hits/hit%3A00003/move \
  -H 'Content-Type: application/json' \
  -d '{"target_cluster_id":"cluster:0"}'
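The hit id in the URL above is percent-encoded (hit:00003 becomes hit%3A00003). When building such URLs in a script, urllib.parse.quote can do the encoding. The helper name and the job id "abc123" below are hypothetical.

```python
from urllib.parse import quote


def hit_move_url(base_url, job_id, hit_id):
    """Build the move endpoint URL with percent-encoded path segments."""
    # safe="" also escapes "/" so each id stays a single path segment.
    job_segment = quote(job_id, safe="")
    hit_segment = quote(hit_id, safe="")
    return f"{base_url}/api/jobs/{job_segment}/hits/{hit_segment}/move"


# "abc123" is a placeholder job id for illustration only.
url = hit_move_url("http://127.0.0.1:7860", "abc123", "hit:00003")
```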

Export selected cards only:

curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/export-selected \
  -H 'Content-Type: application/json' \
  -d '{"labels":["kick_0","snare_0"],"synthesize":true}'

Draw another representative for a card:

curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/draw

Trim/extend the current representative preview:

curl -X POST http://127.0.0.1:7860/api/jobs/<job-id>/samples/kick_0/edit \
  -H 'Content-Type: application/json' \
  -d '{"start_offset_ms":-8,"tail_offset_ms":24}'
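The README does not spell out how the two offsets map to sample positions. One plausible reading, sketched below, is that a negative start_offset_ms moves the start earlier and a positive tail_offset_ms extends the end. The function name and the clamping behavior are assumptions, not the project's code.

```python
def apply_preview_edit(start, end, sample_rate,
                       start_offset_ms=0.0, tail_offset_ms=0.0, total=None):
    """Return new (start, end) sample indices after a trim/extend edit.

    Assumed semantics: offsets are in milliseconds, a negative start offset
    moves the start earlier, and a positive tail offset extends the end.
    """
    new_start = start + round(start_offset_ms * sample_rate / 1000)
    new_end = end + round(tail_offset_ms * sample_rate / 1000)
    new_start = max(0, new_start)        # never before the start of the audio
    if total is not None:
        new_end = min(total, new_end)    # never past the end of the audio
    if new_end <= new_start:
        raise ValueError("edit would produce an empty sample")
    return new_start, new_end


# The payload above, applied to a hit spanning 1.0 s to 1.1 s at 48 kHz.
bounds = apply_preview_edit(48_000, 52_800, 48_000,
                            start_offset_ms=-8, tail_offset_ms=24)
```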

List active/completed runs:

curl http://127.0.0.1:7860/api/jobs

Important files

| Path | Purpose |
| --- | --- |
| app.py | FastAPI app, static UI serving, job API, run history, artifact downloads, supervised editing endpoints |
| pipeline_runner.py | Timed extraction pipeline, Spleeter/Demucs/none separation backends, real progress contract, disk source/stem/context cache, batch/online clustering routing |
| sample_extractor.py | Core DSP/sample extraction implementation, including chunk-progress callback support for Demucs stem extraction |
| supervised_state.py | Persistent semantic state, confidence, constraints, events, suggestions, force-onset, restore, undo |
| supervised_export.py | Renders edited semantic state into supervised and selected-only WAV/MIDI/reconstruction/ZIP artifacts |
| web/ | Custom no-build browser frontend with clean fixed non-scrolling workstation layout, explicit upload/whole-page drag-drop, immediate uploaded waveform rendering, real-progress waveform tinting, source/stem/reproduced preview transport, common/advanced parameter separation, collapsed sidebars/bottom dock, sample-card grid, hidden-audio audition, add-onset mode, and edited export |
| scripts/benchmark_subprocesses.py | Synthetic benchmark runner for stage timings |
| scripts/test_interactive_supervision.py | Smoke test for supervised state endpoints |
| scripts/test_supervised_export_and_force_onset.py | Smoke test for force-onset, restore, suggestion diffs, and edited exports |
| scripts/test_param_validation_and_api_errors.py | Regression test for browser-style parameter coercion and visible API error details |
| scripts/test_selected_export_card_actions.py | Smoke test for selected-only export, draw-next persistence, and immediate preview timing edits |
| docs/interactive-ux/ | Supplied interactive UX docs aligned to current implementation |
| docs/ | Review, timing, API, UI, feature, task, progress, and remaining-work documentation |
| legacy/ | Previous Gradio apps retained for reference |
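Browser forms submit every value as a string, which is why the backend coerces values such as subdivision="16". A minimal sketch of that kind of coercion follows. The schema, function names, and accepted boolean spellings are hypothetical; the real rules live in the backend and are covered by scripts/test_param_validation_and_api_errors.py.

```python
# Hypothetical schema: parameter name -> target type.
SCHEMA = {"subdivision": int, "target_min": int, "target_max": int,
          "synthesize": bool}

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def coerce_value(key, value, kind):
    """Coerce one browser-style value to its declared type or raise ValueError."""
    if kind is bool:
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
        raise ValueError(f"{key}: expected a boolean, got {value!r}")
    if kind is int:
        if isinstance(value, bool):
            raise ValueError(f"{key}: expected an integer, got {value!r}")
        try:
            return int(str(value).strip())
        except ValueError:
            raise ValueError(f"{key}: expected an integer, got {value!r}") from None
    return value


def coerce_params(raw, schema=SCHEMA):
    """Coerce known keys; pass unknown keys through unchanged."""
    return {k: coerce_value(k, v, schema[k]) if k in schema else v
            for k, v in raw.items()}


params = coerce_params({"subdivision": "16", "synthesize": "true", "stem": "drums"})
```

Raising a ValueError that names the offending key makes it straightforward to surface the problem in a visible error banner instead of failing deep inside the pipeline.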

Optional Spleeter backend

Spleeter is the default selected backend because it is much lighter than Demucs for the common path. It is not pinned in requirements.txt because TensorFlow/Spleeter compatibility depends on the Python environment. Install it with:

pip install -r requirements-spleeter.txt

Leave allow_backend_fallback=true for normal use so that missing or failing Spleeter installs automatically fall back to full-mix processing. Disable fallback only when debugging Spleeter itself.
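A minimal sketch of the fallback decision described in this README, where an unavailable backend resolves to full-mix processing if allow_backend_fallback is enabled. The function names are hypothetical, and checking availability with importlib is an assumption about how the app detects a missing install.

```python
import importlib.util

# Backend name -> importable top-level module.
_BACKEND_MODULES = {"spleeter": "spleeter", "demucs": "demucs"}


def backend_available(name):
    """Return True if the backend's Python package can be imported."""
    if name == "none":
        return True
    return importlib.util.find_spec(_BACKEND_MODULES[name]) is not None


def resolve_backend(requested, allow_backend_fallback=True,
                    is_available=backend_available):
    """Pick the backend to run, falling back to full-mix ("none") if allowed."""
    if requested == "none" or is_available(requested):
        return requested
    if allow_backend_fallback:
        return "none"  # process the full mix so the UI still works
    raise RuntimeError(f"separation backend {requested!r} is not available")
```

This covers only the missing-install case. A backend that imports but fails at runtime would need the same fallback applied around the separation call itself.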

Output per run

Each run is stored under .runs/<job-id>/output/:

  • stem.wav
  • reconstruction.wav
  • reconstruction.mid
  • sample-pack.zip
  • samples/*.wav
  • review/hits/*.wav
  • manifest.json
  • supervision_state.json
  • supervised/manifest.json after edited export
  • supervised/sample-pack.zip after edited export
  • selected/sample-pack.zip after selected-card export
  • overrides/hits/*.wav after immediate card trim/extend edits
  • supervised/samples/*.wav after edited export
  • supervised/reconstruction.mid after edited export
  • supervised/reconstruction.wav after edited export
  • source.wav, context_bed.wav, and target_reconstruction.wav for source/stem/reproduced A/B previews
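Given this layout, a run-history index can be built by globbing for manifests. The sketch below shows one way to do it. index_runs and the shape of the returned entries are hypothetical, and nothing is assumed about the fields inside manifest.json.

```python
import json
from pathlib import Path


def index_runs(runs_dir=".runs"):
    """List runs that have a readable output/manifest.json."""
    entries = []
    for manifest_path in sorted(Path(runs_dir).glob("*/output/manifest.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written manifests
        entries.append({
            # .runs/<job-id>/output/manifest.json -> <job-id>
            "job_id": manifest_path.parent.parent.name,
            "manifest_path": str(manifest_path),
            "manifest": manifest,
        })
    return entries
```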

Generated runtime directories are ignored by git:

  • .runs/
  • .cache/
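The disk cache under .cache/ is keyed by source digest and extraction settings. One common way to derive such a key is sketched below. The choice of SHA-256, the canonical JSON encoding, and the function names are assumptions, not the project's actual scheme.

```python
import hashlib
import json


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_key(source_digest, settings):
    """Combine the source digest with a canonical encoding of the settings."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{source_digest}:{canonical}".encode("utf-8")).hexdigest()
```

Only settings that change the decoded or separated audio should go into the key. Including unrelated parameters would cause needless cache misses.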

Automatic default workflow

The default UI is now intentionally simple:

  1. Drop or upload an audio file.
  2. The waveform renders immediately in the browser.
  3. Upload and extraction start automatically.
  4. Automatic tuning chooses practical onset sensitivity and sample-group bounds after the source/stem is available.
  5. Sample cards appear in grouped columns as soon as their WAVs are written.
  6. The user can audition, dismiss, draw another candidate, or trim/extend a card. Draw and timing choices are persisted as semantic overrides and affect selected/edited exports.

Advanced parameters, run history, raw tables, and supervised semantic editing remain available in collapsed panels, but they are no longer required for the common path.

See docs/AUTOMATIC_CARD_FLOW_UI.md.

Reference-style UI update

The web UI now follows the supplied Sample Extractor reference: waveform-first canvas, grouped sample columns, persistent right settings panel, compact export bar, and a bottom selection/tools bar. Drop/upload still starts processing automatically.