Phase 0: Adding v0 documents
Files changed:
- README.md (+121 −1)
- docs/phase_0/planning_documentation.md (+386 −0)
---

README.md (changed)
# KeyArrange

**Upload a song. Get sheet music you can actually play.**

[Live Demo](#demo) · [Sample Outputs](#sample-outputs)

> **Status:** The end-to-end pipeline is actively being built. See the [Roadmap](#roadmap) for what's done and what's next.

---

## The Problem

I play piano at an intermediate level – arpeggios, chord progressions, songs I've learned by feel. But I have no ear training. When I hear a song I want to learn, I can't transcribe it myself, and I can't find a playable arrangement for most of it.

The tools that exist don't solve this:

| Tool | What it does | Why it falls short |
|---|---|---|
| Synthesia / YouTube tutorials | Walks you through notes visually | You follow, you don't learn. No sheet music. |
| AnthemScore / Transcribe! | Full polyphonic transcription | Outputs every instrument. Unplayable by one person. |
| MuseScore built-in import | Converts MIDI you already have | Assumes you already have a clean source. |
| Online MIDI converters | Audio → raw MIDI | Raw transcription dump. No arrangement, no playability logic. |

None of them answer the actual question: *I heard this song. Can I play it on piano tonight?*

KeyArrange does. It takes any pop song as audio input, separates the musical roles, and produces a two-hand beginner–intermediate piano arrangement – physically playable, sheet music included.

---

## Demo

> Upload an MP3. Download a PDF and MIDI you can open in any DAW, or print and play.

```
[ Choose file ] → [ Arrange ] → [ Download PDF ] [ Download MIDI ]
```

*Live demo coming with the v1 deploy.*

### Sample Outputs

| Song | Original | Arranged MIDI | Sheet Music PDF |
|---|---|---|---|
| Coming soon | – | – | – |

---

## What "playable" actually means

This is the core design constraint the whole project is built around. Playable doesn't mean "renders correctly in a DAW." It means a real pianist can sit down and play it.

- Max hand span: one octave (~9 white keys)
- Max notes per hand at any moment: 3
- Note density matched to tempo – no sixteenth-note runs at 160 BPM
- Clear left-hand / right-hand separation – no ambiguous voicings
- Repeated notes spaced so a human can re-strike them

The quick test: can a pianist sight-read it at moderate tempo, clearly assign hands, and not need to rearrange anything? If yes, it passes.

---

## Stack

- **[Demucs](https://github.com/facebookresearch/demucs)** (Facebook Research) – source separation into vocals, bass, drums, and other
- **[Basic Pitch](https://github.com/spotify/basic-pitch)** (Spotify) – lightweight polyphonic audio-to-MIDI transcription
- **[music21](https://web.mit.edu/music21/)** – chord analysis, key detection, voice leading
- **[pretty_midi](https://github.com/craffel/pretty-midi)** – MIDI read/write
- **[MuseScore CLI](https://musescore.org)** – sheet music rendering
- **[FastAPI](https://fastapi.tiangolo.com)** – web API
- **Python 3.11+**

---

## Roadmap

**v1 – in progress**
- [ ] End-to-end pipeline: audio → MIDI
- [ ] Three core playability transforms (density, span, note cap)
- [ ] Web UI with MIDI download

**v2 – planned**
- [ ] Chord-aware left-hand voicing (root + third + fifth from chord analysis)
- [ ] MuseScore PDF rendering
- [ ] Before/after example gallery

**v3 – planned**
- [ ] Beat tracking with madmom for better metric-strength scoring
- [ ] Melody smoothing – strip ornaments and melisma from the vocal transcription
- [ ] Difficulty score on output

**Later**
- [ ] Fine-tuned arrangement model on the POP909 dataset
- [ ] Difficulty levels (beginner / intermediate / advanced)
- [ ] Style options (ballad, jazz voicings)

---

## Project Structure

```
KeyArrange/
├── README.md
├── DECISIONS.md             – key tradeoffs and why
├── src/keyarrange/
│   ├── audio/               – loading, normalization
│   ├── separation/          – Demucs wrapper
│   ├── analysis/            – tempo, beat, key
│   ├── structure/           – transcription, chord estimation
│   ├── piano/               – arrangement engine
│   ├── render/              – MIDI + PDF output
│   ├── api/                 – FastAPI app
│   └── cli.py
├── web/                     – single-page frontend
├── tests/
└── examples/sample_outputs/
```

---

## License

MIT. Audio files are user-provided. Output MIDI is a derivative arrangement. No original audio is stored or redistributed.
---

docs/phase_0/planning_documentation.md (added)
# Phase 0: Planning

All Phase 0 planning notes in one place for now. Once things settle, each section gets split into its own `.md` in the repo. Fast iteration first, structure later.

---

## 01_project_scope.md

**Goal:** Take a pop song as audio input and produce a piano arrangement a real beginner–intermediate pianist can sit down and play. Not just something that renders correctly – something actually performable.

**What we're balancing:** Musicality, physical playability, and faithfulness to the song's identity.

**Input**
- Audio file (MP3 / WAV)
- Optional: tempo hints, key hints

**Output**
- MIDI file (primary)
- Sheet music PDF (Week 2 milestone)
- Web UI with live demo (MVP – not deferred)

**Deferred to later phases**
- Difficulty levels (beginner / intermediate / advanced)
- Style preferences (ballad, jazz)
- Real-time live transcription

---

## 02_human_playable.md

This is probably the most important definition in the project. "Human-playable" means a beginner–intermediate pianist can physically perform the output without superhuman reach, speed, or memory.

**Not the same as** (a common pitfall to avoid):
- MIDI that merely sounds correct
- Perfect transcription of all instruments
- Virtuoso-level arrangements

**Physical constraints**
- Max hand span ~9–10 white keys (about an octave)
- No repeated octave leaps at fast tempos
- No impossible note clusters for a single hand

**Temporal constraints**
- Note density must match tempo (no eighth-note spam at 180+ BPM)
- Repeated notes must be realistically re-strikeable

**Cognitive constraints**
- Clear melody vs. accompaniment separation
- Predictable rhythmic patterns
- ≤ 3 notes per hand at a time

**Quick check:** Can a pianist sight-read it at moderate tempo, clearly assign hands, and not need to rearrange voicings? Then it's human-playable.

---

## 03_quality_bar.md

The bar isn't perfection – it's "would a pianist actually use this?" Two sides to it:

**Musical**
- Preserve the main vocal melody
- Preserve the harmonic rhythm (chord changes)
- Should feel like a piano arrangement, not a transcription dump

**Playability**
- Playable by a single pianist
- Stick to triads – avoid dense cluster chords
- Prefer broken chords / arpeggios over sustained pads

---

## 04_non_goals.md

Explicitly out of scope for v1:
- Perfect transcription of every instrument
- Jazz-grade reharmonization
- Orchestral reduction
- Real-time live transcription

All interesting problems, but each one would balloon scope. Keeping them out of v1 is what lets us actually ship something.

---

## 05_tracks_and_roles.md

The arrangement isn't one stream of notes – it's distinct musical roles. Here's how they map to stems and hands:

**Melody track**
- Source: Demucs vocals stem → transcribed via Basic Pitch
- Monophonic, lightly ornamented
- Default: right hand

**Chord / Harmony track**
- Derived from chord analysis on the "other" stem (guitars, keys)
- Simplified to triads – root + third + fifth within one octave
- Default: left hand

**Bass function**
- Source: Demucs bass stem → used to infer chord roots and movement
- Not transcribed note-for-note – used to drive left-hand voicing decisions
- Merged into the left-hand harmony to reduce cognitive load

**Drums**
- Discarded entirely

The key design insight: Demucs separates by *musical role*, not by pitch. Those roles map cleanly onto pianist roles – that's why stem separation happens before transcription, not after.

---

## 06_representation_choices.md

MIDI is the obvious choice – it's the only format that cleanly separates *what* is played from *how it sounds*.

**Why it works**
- Editable, inspectable, programmatic
- Maps directly to piano keys

**Accepted limitations**
- No articulation nuance by default
- No lyric alignment by default

Both can be layered on later if needed.

---

## 07_evaluation_metrics.md

Some of this is measurable, some comes down to gut feel. Both matter.

**Objective (automated)**
- Notes per second per hand
- Max chord size per hand
- Hand span distance over time
- Number of leaps per hand in a time window
- Playability violations flagged vs. auto-corrected

**Subjective**
- Does this feel playable?
- Does this sound like the song?
- Would a pianist keep this arrangement?

---

## 08_licensing_and_ethics.md

- Audio is user-provided
- Output MIDI is a derivative arrangement, not a copy
- No redistribution of original audio
- The project is technical / educational – not a piracy tool

---

## 09_system_architecture.md

**Pipeline overview**

```
Audio Input (MP3/WAV)
        ↓
Source Separation        [Demucs → vocals, bass, other, drums]
        ↓
Per-Stem Transcription   [Basic Pitch → raw MIDI per stem]
        ↓
Arrangement Engine       [melody extract → hand split → density reduce → voicing simplify → span check]
        ↓
Validation               [playability report + auto-correction]
        ↓
MIDI Output + PDF Render [pretty_midi + MuseScore CLI]
        ↓
Web API                  [FastAPI → serves MIDI + PDF to frontend]
```

It's a linear pipeline, but each stage is its own module. The key design rule: every step writes its output to disk before the next one runs. No hidden state, nothing coupled. This makes debugging and swapping out algorithms significantly easier.
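The write-everything-to-disk rule can be sketched in a few lines. This is only an illustration of the pattern – the stage names, JSON artifacts, and `run_stage` helper are assumptions, not the project's actual API:

```python
import json
import tempfile
from pathlib import Path

def run_stage(name, fn, in_path, workdir):
    """Run one pipeline stage: read the previous stage's artifact from disk,
    compute, and write a new JSON artifact. Nothing is handed over in memory,
    so any stage can be re-run or inspected in isolation."""
    data = json.loads(Path(in_path).read_text()) if in_path else None
    out_path = Path(workdir) / f"{name}.json"
    out_path.write_text(json.dumps(fn(data), indent=2))
    return out_path

# Toy two-stage run: "analysis" emits a tempo, "structure" consumes it.
work = tempfile.mkdtemp()
p1 = run_stage("analysis", lambda _: {"bpm": 120}, None, work)
p2 = run_stage("structure", lambda d: {"bpm": d["bpm"], "beats_per_bar": 4}, p1, work)
```

Because each artifact is a plain file, swapping an algorithm only requires that the replacement reads and writes the same artifact shape.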
**Modules**

| Module | Path | Responsibility |
|---|---|---|
| Audio I/O | `audio/` | Load MP3/WAV, convert to mono, normalize sample rate |
| Separation | `separation/` | Run Demucs, extract and save stems |
| Analysis | `analysis/` | Tempo detection, beat tracking, key estimation → timing grid + metadata (JSON) |
| Structure | `structure/` | Per-stem transcription, chord estimation, bar/section segmentation → symbolic musical events |
| Piano Logic | `piano/` | Arrangement engine: note scoring, density reduction, hand assignment, span enforcement, voicing simplification |
| Rendering | `render/` | Symbolic events → MIDI + optional MuseScore PDF |
| API | `api/` | FastAPI app wrapping the pipeline, file upload/download endpoints |

**Arrangement engine internals**

The piano logic module is a transform chain – each pass takes and returns a note list:

```
raw_midi → [melody_extractor] → [hand_splitter] → [density_reducer] → [voicing_simplifier] → [span_checker]
```

Each transform is independently testable. Note prioritization within transforms is score-based:

```
score = w1*(metric_strength) + w2*(duration) + w3*(harmonic_function) + w4*(melodic_peak)
```

When a constraint forces note reduction, the highest-scoring notes survive. The weights are tunable – this is where musical judgment gets encoded without being magic.
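A minimal sketch of the score-based pruning. The weights and feature values below are placeholders, not tuned values from the project, and the features are assumed to be pre-computed and normalized to [0, 1] by earlier stages:

```python
def note_score(note, w=(1.0, 0.5, 0.8, 0.6)):
    """Weighted sum of the four priority features from the formula above."""
    w1, w2, w3, w4 = w
    return (w1 * note["metric_strength"] + w2 * note["duration"]
            + w3 * note["harmonic_function"] + w4 * note["melodic_peak"])

def keep_top(notes, limit):
    """When a constraint forces reduction, keep the `limit` highest-scoring notes."""
    return sorted(notes, key=note_score, reverse=True)[:limit]

# Four simultaneous notes; a 3-note cap must drop one of them.
chord = [
    {"pitch": 60, "metric_strength": 1.0, "duration": 1.00, "harmonic_function": 1.0, "melodic_peak": 0.0},
    {"pitch": 64, "metric_strength": 1.0, "duration": 0.50, "harmonic_function": 0.5, "melodic_peak": 0.0},
    {"pitch": 67, "metric_strength": 0.5, "duration": 0.25, "harmonic_function": 0.5, "melodic_peak": 0.0},
    {"pitch": 72, "metric_strength": 0.5, "duration": 0.25, "harmonic_function": 0.2, "melodic_peak": 1.0},
]
kept = keep_top(chord, 3)  # the inner G (pitch 67) scores lowest and is dropped
```

Note how the melodic-peak weight rescues the top note even though its duration is short – exactly the kind of judgment the weights are meant to encode.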
**Data flow rules**
- No hidden state
- Every stage writes its output to disk
- Intermediate formats are human-readable (JSON / CSV) where possible
- Any single module can be swapped out independently

**Error handling** – a degraded output beats no output.
- Invalid audio → fail fast
- Uncertain musical estimates → warn, don't crash
- Always produce something, even if simplified

---

## 10_cli_philosophy.md

The idea: drop into any stage of the pipeline without re-running everything before it.

- Each stage is runnable independently
- Every command produces inspectable output
- Intermediate artifacts are saved to disk

```
input audio → separation → analysis → structure → piano logic → MIDI → PDF
```

The CLI is the internal tool. The web UI is the product.

---

## 11_repository_structure.md

```
KeyArrange/
├── README.md            – product-first: what, who, demo link, then architecture
├── DECISIONS.md         – key tradeoffs and why
├── pyproject.toml
├── requirements.txt
├── .gitignore
├── docs/
│   └── phase_0/
│       ├── project_scope.md
│       ├── human_playable.md
│       ├── quality_bar.md
│       ├── non_goals.md
│       ├── tracks_and_roles.md
│       ├── representation_choices.md
│       ├── evaluation_metrics.md
│       ├── licensing_and_ethics.md
│       ├── cli_philosophy.md
│       └── system_architecture.md
├── src/
│   └── keyarrange/
│       ├── __init__.py
│       ├── audio/
│       ├── separation/
│       ├── analysis/
│       ├── structure/
│       ├── piano/
│       ├── render/
│       ├── api/
│       └── cli.py
├── web/                 – minimal frontend (single-page upload + download)
├── tests/
│   └── test_smoke.py
└── examples/
    └── sample_outputs/  – before/after audio + MIDI examples for the README
```

- `docs/` mirrors project phases
- `src/` is production code only
- `web/` is the product face – not an afterthought
- `DECISIONS.md` surfaces engineering judgment, not just engineering output
- `examples/` feeds the README demo – non-technical visitors land here first

---

## 12_decisions.md

Explicit tradeoffs made and why. This file grows with the project.

**Stem separation before transcription, not after.**
Transcribing a full mix means Basic Pitch is working against overlapping kick drum, bass, vocals, and rhythm guitar simultaneously. Polyphonic transcription on a blended signal is an unsolved problem. Separating first gives each transcription step a cleaner, semantically meaningful signal.

**Vocals → right hand, bass stem → left-hand chord roots.**
The bass stem in pop music almost always outlines chord roots and movement – exactly what a beginner left hand should play. The bass is not transcribed note-for-note; it's used to infer harmonic movement and drive a generated voicing pattern. This is more musical and more playable than a literal transcription.

**Rule-based arrangement engine with score-based note prioritization.**
Rule-based gives deterministic, testable, explainable output. Every constraint from `human_playable.md` maps to a concrete function. An LLM layer can be added later for edge cases that are hard to encode as rules, but it's not load-bearing in v1.

**Fixed pitch threshold as hand-split fallback.**
Notes above MIDI 60 (middle C) → right hand, below → left. Simple, explainable, and almost always musically correct for pop songs. The stem-based approach is primary; this is the safety net.
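The fallback rule is small enough to state as code. One detail the note leaves open is which hand gets middle C itself; treating it as right-hand (`>=`) is an assumption here:

```python
def split_hand(pitch: int, threshold: int = 60) -> str:
    """Fallback hand assignment by fixed pitch threshold: middle C
    (MIDI 60) and above go to the right hand, everything below to the
    left. Used only when stem-based assignment is unavailable."""
    return "right" if pitch >= threshold else "left"

print(split_hand(72), split_hand(48))  # right left
```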
**Web UI is core, not deferred.**
A working URL someone can visit beats a CLI that requires setup. Non-technical stakeholders – including founders evaluating the project – need to be able to use it in 30 seconds. The product loop must close before quality is optimized.

---

## 13_phase_overview.md

Each phase has a hard boundary – what it's allowed to solve, and what it isn't. This is what prevents scope creep once building starts.

---

### Phase 0 – Planning & Intent
**Purpose:** Lock in terminology, constraints, and architecture before writing a line of code.
**Output:** This documentation. Frozen vocabulary and constraints.

---

### Phase 1 – MVP (Days 1–2)

The goal is one thing: audio in, playable MIDI out, web UI up, end-to-end, no crashes. Quality is secondary. Proving the pipeline closes is primary.

**Day 1 – Pipeline skeleton**

*Morning:* Set up the environment (Demucs, Basic Pitch, pretty_midi, music21). Wire the first two stages: run Demucs on the input audio and extract the vocals and bass stems, then run Basic Pitch on each stem separately and save the results as two raw MIDI files. Write a `load_midi(path) → list[Note]` abstraction that every downstream stage will use. Test on one song you know well.

*Afternoon:* Right hand = vocal MIDI as-is. Left hand = bass MIDI quantized to quarter notes (snap to the nearest beat, discard duration detail for now). Merge both hands into a single two-track MIDI file. Play it back. At the end of Day 1 there should be something that plays back and is recognizably the song.
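The `Note` abstraction and the quarter-note quantization could look like this. The field names and the one-beat forced duration are assumptions for illustration – the project's actual `Note` type may differ:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI pitch number
    start: float     # onset time in seconds
    duration: float  # length in seconds

def quantize_to_beats(notes: list[Note], bpm: float) -> list[Note]:
    """Snap each onset to the nearest quarter-note beat and force a
    one-beat duration, discarding the transcribed duration detail."""
    beat = 60.0 / bpm
    return [replace(n,
                    start=round(n.start / beat) * beat,
                    duration=beat)
            for n in notes]

# Two slightly-off bass notes at 120 BPM (beat = 0.5 s).
bass = [Note(36, 0.07, 0.31), Note(43, 0.48, 0.22)]
left_hand = quantize_to_beats(bass, bpm=120)  # onsets snap to 0.0 and 0.5
```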
**Day 2 – Three essential transforms + web UI**

*Morning:* Implement the transform chain, each pass as its own function (note list in, note list out):

1. **Density reducer** – for each 500 ms window, if the note count exceeds `120 / BPM`, keep only the longest-duration notes until under the limit
2. **Span enforcer** – for simultaneous notes in one hand spanning more than 9 semitones, drop the lowest in the right hand or the highest in the left
3. **Note cap** – reduce each hand to ≤ 3 simultaneous notes, dropping the shortest-duration notes first
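The span enforcer is the simplest of the three to sketch. Pitches here are plain MIDI numbers for one onset group; a real implementation would first group notes by onset:

```python
def enforce_span(pitches: list[int], hand: str, max_span: int = 9) -> list[int]:
    """Drop notes until the simultaneous pitches fit under one hand:
    the right hand keeps its top notes, the left hand its bottom notes."""
    pitches = sorted(pitches)
    while pitches and pitches[-1] - pitches[0] > max_span:
        if hand == "right":
            pitches.pop(0)   # drop the lowest note
        else:
            pitches.pop()    # drop the highest note
    return pitches

# A C-major triad plus a high C: a 12-semitone spread no single hand should hold.
print(enforce_span([60, 64, 67, 72], hand="right"))  # [64, 67, 72]
```

Keeping the top notes in the right hand (and the bottom notes in the left) preserves the melody line and the bass line respectively, which is what a pianist would do by instinct.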
*Afternoon:* Minimal web UI – a FastAPI backend with a single file-upload endpoint plus a single-page frontend (file picker, upload button, MIDI download). Deploy to Hugging Face Spaces or Railway. The CLI stays as an internal dev tool. Test end-to-end on 2–3 songs.

**Not in scope:** Output quality, advanced voicings, sheet music.

---

### Phase 2 – Musical Quality (Week 2)

Phase 1 closes the loop. Phase 2 makes the output actually sound like an intentional arrangement.

**Chord-aware left hand.** Use music21's chord detection on the bass and "other" stems together to identify the chord at each beat. Generate the left hand from the chord symbol – a root-position triad (root + third + fifth) voiced within one octave – rather than from the raw bass transcription. This is the single highest-impact improvement in the project.
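The chord-symbol-to-triad step is mechanical once the chord is known. This sketch handles only plain major/minor symbols like `C` and `Am` – real chord labels (sevenths, slash chords) are richer, and the `octave_base` choice of C3 is an assumption:

```python
# Pitch classes for the twelve roots; a hypothetical helper, not project code.
PITCH_CLASSES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                 "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def triad(symbol: str, octave_base: int = 48) -> list[int]:
    """Root-position triad for a major/minor chord symbol, voiced from
    `octave_base` (C3 here) so it sits comfortably under the left hand."""
    minor = symbol.endswith("m")
    root_name = symbol[:-1] if minor else symbol
    root = octave_base + PITCH_CLASSES[root_name]
    third = root + (3 if minor else 4)   # minor or major third
    return [root, third, root + 7]       # root + third + fifth

print(triad("C"))   # [48, 52, 55]
print(triad("Am"))  # [57, 60, 64]
```

Every triad this produces spans at most 7 semitones, so it automatically satisfies the hand-span constraint.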
**MuseScore PDF rendering.** Convert the output MIDI to MusicXML via music21, call MuseScore's CLI headlessly, and render a PDF. Serve it alongside the MIDI in the web UI. A rendered sheet-music PDF of a song someone just uploaded is the "wow" demo moment. It makes the project immediately legible to any visitor, musical or not.
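The render call could look roughly like this. MuseScore's CLI infers the export format from the `-o` output extension; the binary name varies by install (`mscore`, `musescore3`, etc.), so it's a parameter here, and the subprocess call is shown but not executed:

```python
import subprocess

def musescore_cmd(score_path: str, pdf_path: str, binary: str = "mscore") -> list[str]:
    """Build the headless MuseScore invocation: input score, -o output file;
    the .pdf extension selects the PDF exporter."""
    return [binary, score_path, "-o", pdf_path]

def render_pdf(score_path: str, pdf_path: str, binary: str = "mscore") -> None:
    """Run MuseScore; raises CalledProcessError if rendering fails."""
    subprocess.run(musescore_cmd(score_path, pdf_path, binary), check=True)

cmd = musescore_cmd("arrangement.musicxml", "arrangement.pdf")
```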
**Before/after examples.** Add 3–4 song examples to `examples/sample_outputs/` with the original audio clip, the raw transcription MIDI, and the final arranged MIDI side by side. This is the most effective way to communicate what the system actually does.

**Output:** Measurably better arrangements + sheet music output + an updated README with live examples.

---

### Phase 3 – DSP & Signal Quality (Week 3)

Phase 2 gives us good structure. Phase 3 makes the notes themselves cleaner.

**Beat tracking with madmom.** Replace simple tempo-based timing with proper downbeat detection via madmom's pre-trained RNN. Accurate beat positions make metric-strength scoring significantly more reliable and improve every timing-dependent transform.

**Melody smoothing.** After vocal transcription, compute the melodic contour and apply a smoothing pass: remove ornaments, grace notes, and melisma (rapid pitch changes on one syllable). Keep only notes above a minimum duration threshold relative to tempo. This makes the right hand cleaner without losing the melodic shape.
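The duration-threshold part of the smoothing pass is easy to sketch. Dropping everything shorter than a fraction of a beat is a crude stand-in for full ornament/melisma removal, and the `min_fraction` value is an assumed starting point, not a tuned one:

```python
def smooth_melody(notes, bpm, min_fraction=0.25):
    """Keep only notes at least `min_fraction` of a beat long.
    `notes` are (pitch, start_sec, duration_sec) tuples; at min_fraction
    0.25, anything shorter than a sixteenth note at the given tempo goes."""
    beat = 60.0 / bpm
    return [n for n in notes if n[2] >= min_fraction * beat]

melody = [(72, 0.00, 0.50),   # held melody note
          (74, 0.50, 0.06),   # grace-note flick, too short to keep
          (72, 0.56, 0.44)]
clean = smooth_melody(melody, bpm=120)  # threshold = 0.125 s
```

A fuller version would also merge the dropped note's duration into its neighbor so the melodic line stays continuous.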
**Dynamic tempo-aware density scaling.** Replace the fixed density threshold with `max_notes_per_beat = 120 / BPM`. This correctly tightens constraints at fast tempos and relaxes them at slow ones.

**Output:** Cleaner melody, better rhythm, tighter playability enforcement.

---

### Phase 4 – AI-Assisted Refinement (Later / Stretch)

Research-grade improvements for when the core pipeline is solid. The single-GPU constraint applies to any training.

**Chord recognition upgrade.** Swap music21's built-in key finding for a MIREX-benchmarked chord model from Hugging Face. Beat-level chord labels mean better left-hand voicing decisions throughout.

**Voice leading optimization.** Model the left-hand voicing problem as a shortest-path search over a chord graph, where edge weights penalize large leaps between successive chords. music21 has voice-leading utilities as a starting point.
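The shortest-path formulation fits in a small dynamic program. The cost function (total semitone movement) and the candidate voicings below are illustrative assumptions:

```python
def leap_cost(a, b):
    """Edge weight: total semitone movement between two voicings
    (assumes equal note counts; a real version would handle mismatches)."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

def best_voicing_path(candidates):
    """`candidates[i]` lists the allowed voicings for chord i. Classic DP
    shortest path: for each voicing, keep the cheapest path reaching it,
    then extend to the next chord. Returns (total_cost, voicing_sequence)."""
    paths = {tuple(v): (0, [v]) for v in candidates[0]}
    for chord_options in candidates[1:]:
        paths = {
            tuple(v): min(
                (cost + leap_cost(prev[-1], v), prev + [v])
                for cost, prev in paths.values()
            )
            for v in chord_options
        }
    return min(paths.values())

# C major, then two candidate F-major voicings: root position vs. the
# second inversion (F/C) that stays under the hand.
cost, voicings = best_voicing_path([
    [[48, 52, 55]],                  # C
    [[53, 57, 60], [48, 53, 57]],    # F root position vs. F/C
])
```

The search correctly prefers the F/C inversion (3 semitones of movement) over root position (15), which is exactly the "small leaps between successive chords" behavior being optimized for.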
**Fine-tuned arrangement model (ambitious).** The research framing: given a raw MIDI sequence, predict a simplified pianist version. Dataset: POP909 (pop songs paired with piano arrangements) + ATEPP (professional performance MIDI), both free and well-documented. A small transformer fine-tuned on POP909 fits on a single GPU and represents genuinely publishable-quality work if done well.

**LLM-assisted post-processing.** Use the Claude API to take a symbolic representation of the arrangement (chord symbols + melody contour as structured text) and suggest corrections to voicing or hand assignment. Effective for edge cases the rule engine misses, and fast to prototype.

---

### Phase 5 – Polish & Presentation

**Difficulty scoring.** After arrangement, compute a score from max span, average density, and rhythmic complexity. Display it to the user – it gives the output a concrete, communicable property.
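One way the three features could be blended into a single number. The normalization ceilings and weights here are illustrative guesses, not calibrated values:

```python
def difficulty_score(max_span, notes_per_sec, rhythmic_complexity,
                     weights=(0.4, 0.4, 0.2)):
    """Blend the three playability features into a 0–10 difficulty score.
    Each feature is first normalized against a rough ceiling."""
    span_n = min(max_span / 12, 1.0)          # an octave span ≈ the hard ceiling
    density_n = min(notes_per_sec / 8, 1.0)   # 8 notes/sec ≈ very fast playing
    rhythm_n = min(rhythmic_complexity, 1.0)  # assumed already in [0, 1]
    w1, w2, w3 = weights
    return round(10 * (w1 * span_n + w2 * density_n + w3 * rhythm_n), 1)

score = difficulty_score(max_span=9, notes_per_sec=3, rhythmic_complexity=0.4)
print(score)  # 5.3
```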
**README as product document.** Lead with: what this is, who it's for, a live demo link, and before/after audio examples. Architecture comes after. Most GitHub repos lead with architecture; this one leads with the problem.

**Example gallery.** 4–6 songs covering different tempos, keys, and feels. Shows range and gives visitors something to click.

**Output:** Polished repo, clear project narrative, live demo, sheet music output – something a non-technical founder can evaluate in under a minute.
|