Goonjan committed
Commit fda14b3 · 1 Parent(s): 6cf6008

Phase 0: Adding v0 documents

Files changed (2):
  1. README.md +121 -1
  2. docs/phase_0/planning_documentation.md +386 -0
README.md CHANGED
@@ -1,2 +1,122 @@
  # KeyArrange
- Convert a song into a human-playable piano arrangement.
+
+ **Upload a song. Get sheet music you can actually play.**
+
+ [Live Demo](#demo) · [Sample Outputs](#sample-outputs)
+
+ > **Status:** The end-to-end pipeline is actively being built. See the [Roadmap](#roadmap) for what's done and what's next.
+
+ ---
+
+ ## The Problem
+
+ I play piano at an intermediate level — arpeggios, chord progressions, songs I've learned by feel. But I have no ear training. When I hear a song I want to learn, I can't transcribe it myself, and I can't find a playable arrangement for most songs.
+
+ The tools that exist don't solve this:
+
+ | Tool | What it does | Why it falls short |
+ |---|---|---|
+ | Synthesia / YouTube tutorials | Walk you through the notes visually | You follow, you don't learn. No sheet music. |
+ | AnthemScore / Transcribe! | Full polyphonic transcription | Outputs every instrument. Unplayable by one person. |
+ | MuseScore built-in import | Converts MIDI you already have | Assumes you already have a clean source. |
+ | Online MIDI converters | Audio → raw MIDI | Raw transcription dump. No arrangement, no playability logic. |
+
+ None of them answer the actual question: *I heard this song. Can I play it on piano tonight?*
+
+ KeyArrange does. It takes any pop song as audio input, separates the musical roles, and produces a two-hand beginner–intermediate piano arrangement — physically playable, sheet music included.
+
+ ---
+
+ ## Demo
+
+ > Upload an MP3. Download a PDF and MIDI you can open in any DAW, or print and play.
+
+ ```
+ [ Choose file ] → [ Arrange ] → [ Download PDF ] [ Download MIDI ]
+ ```
+
+ *Live demo coming with the v1 deploy.*
+
+ ### Sample Outputs
+
+ | Song | Original | Arranged MIDI | Sheet Music PDF |
+ |---|---|---|---|
+ | Coming soon | — | — | — |
+
+ ---
+
+ ## What "playable" actually means
+
+ This is the core design constraint the whole project is built around. Playable doesn't mean "renders correctly in a DAW." It means a real pianist can sit down and play it.
+
+ - Max hand span: one octave (8 white keys)
+ - Max notes per hand at any moment: 3
+ - Note density matched to tempo — no sixteenth-note runs at 160 BPM
+ - Clear left hand / right hand separation — no ambiguous voicings
+ - Repeated notes spaced so a human can re-strike them
+
+ The quick test: can a pianist sight-read it at moderate tempo, clearly assign hands, and not need to rearrange anything? If yes, it passes.
+
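+ A minimal sketch of how the span and note-cap constraints could be checked mechanically (the `Note` shape here is illustrative, not the project's actual API):
+
+ ```python
+ # Hypothetical playability check over simple (pitch, start, end) notes.
+ from typing import NamedTuple
+
+ class Note(NamedTuple):
+     pitch: int    # MIDI pitch number
+     start: float  # seconds
+     end: float
+
+ MAX_SPAN_SEMITONES = 12  # one octave
+ MAX_NOTES_PER_HAND = 3
+
+ def violations(hand: list[Note]) -> list[str]:
+     problems = []
+     for note in hand:
+         # Everything sounding at this note's onset.
+         sounding = [n for n in hand if n.start <= note.start < n.end]
+         pitches = [n.pitch for n in sounding]
+         if max(pitches) - min(pitches) > MAX_SPAN_SEMITONES:
+             problems.append(f"span exceeds an octave at t={note.start:.2f}s")
+         if len(sounding) > MAX_NOTES_PER_HAND:
+             problems.append(f"{len(sounding)} simultaneous notes at t={note.start:.2f}s")
+     return problems
+ ```
+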
+ ---
+
+ ## Stack
+
+ - **[Demucs](https://github.com/facebookresearch/demucs)** (Facebook Research) — source separation into vocals, bass, drums, and other
+ - **[Basic Pitch](https://github.com/spotify/basic-pitch)** (Spotify) — lightweight polyphonic audio-to-MIDI transcription
+ - **[music21](https://web.mit.edu/music21/)** — chord analysis, key detection, voice leading
+ - **[pretty_midi](https://github.com/craffel/pretty-midi)** — MIDI read/write
+ - **[MuseScore CLI](https://musescore.org)** — sheet music rendering
+ - **[FastAPI](https://fastapi.tiangolo.com)** — web API
+ - **Python 3.11+**
+
+ ---
+
+ ## Roadmap
+
+ **v1 — in progress**
+ - [ ] End-to-end pipeline: audio → MIDI
+ - [ ] Three core playability transforms (density, span, note cap)
+ - [ ] Web UI with MIDI download
+
+ **v2 — planned**
+ - [ ] Chord-aware left hand voicing (root + third + fifth from chord analysis)
+ - [ ] MuseScore PDF rendering
+ - [ ] Before/after example gallery
+
+ **v3 — planned**
+ - [ ] Beat tracking with madmom for better metric strength scoring
+ - [ ] Melody smoothing — strip ornaments and melisma from vocal transcription
+ - [ ] Difficulty score on output
+
+ **Later**
+ - [ ] Fine-tuned arrangement model on the POP909 dataset
+ - [ ] Difficulty levels (beginner / intermediate / advanced)
+ - [ ] Style options (ballad, jazz voicings)
+
+ ---
+
+ ## Project Structure
+
+ ```
+ KeyArrange/
+ ├── README.md
+ ├── DECISIONS.md             ← key tradeoffs and why
+ ├── src/keyarrange/
+ │   ├── audio/               ← loading, normalization
+ │   ├── separation/          ← Demucs wrapper
+ │   ├── analysis/            ← tempo, beat, key
+ │   ├── structure/           ← transcription, chord estimation
+ │   ├── piano/               ← arrangement engine
+ │   ├── render/              ← MIDI + PDF output
+ │   ├── api/                 ← FastAPI app
+ │   └── cli.py
+ ├── web/                     ← single-page frontend
+ ├── tests/
+ └── examples/sample_outputs/
+ ```
+
+ ---
+
+ ## License
+
+ MIT. Audio files are user-provided. Output MIDI is a derivative arrangement. No original audio is stored or redistributed.
docs/phase_0/planning_documentation.md ADDED
@@ -0,0 +1,386 @@
+ # Phase 0: Planning
+
+ All Phase 0 planning notes in one place for now. Once things settle, each section gets split into its own `.md` in the repo. Fast iteration first, structure later.
+
+ ---
+
+ ## 01\_project\_scope.md
+
+ **Goal:** Take a pop song as audio input and produce a piano arrangement a real beginner–intermediate pianist can sit down and play. Not just something that renders correctly — something actually performable.
+
+ **What we're balancing:** musicality, physical playability, and faithfulness to the song's identity.
+
+ **Input**
+ - Audio file (MP3 / WAV)
+ - Optional: tempo hints, key hints
+
+ **Output**
+ - MIDI file (primary)
+ - Sheet music PDF (Week 2 milestone)
+ - Web UI with live demo (MVP — not deferred)
+
+ **Deferred to later phases**
+ - Difficulty levels (beginner / intermediate / advanced)
+ - Style preferences (ballad, jazz)
+ - Real-time live transcription
+
+ ---
+
+ ## 02\_human\_playable.md
+
+ This is probably the most important definition in the project. "Human-playable" means a beginner–intermediate pianist can physically perform the output without superhuman reach, speed, or memory.
+
+ **Not the same as** (common pitfalls to avoid):
+ - MIDI that merely sounds correct
+ - Perfect transcription of all instruments
+ - Virtuoso-level arrangements
+
+ **Physical constraints**
+ - Max hand span ~ one octave (8 white keys), a ninth at a stretch
+ - No repeated octave leaps at fast tempos
+ - No impossible note clusters for a single hand
+
+ **Temporal constraints**
+ - Note density must match tempo (no eighth-note spam at 180+ BPM)
+ - Repeated notes must be realistically re-strikeable
+
+ **Cognitive constraints**
+ - Clear melody vs. accompaniment separation
+ - Predictable rhythmic patterns
+ - ≤ 3 notes per hand at a time
+
+ **Quick check:** Can a pianist sight-read it at moderate tempo, clearly assign hands, and not need to rearrange voicings? → Human-playable.
+
+ ---
+
+ ## 03\_quality\_bar.md
+
+ The bar isn't perfection — it's "would a pianist actually use this?" Two sides to it:
+
+ **Musical**
+ - Preserve main vocal melody
+ - Preserve harmonic rhythm (chord changes)
+ - Should feel like a piano arrangement, not a transcription dump
+
+ **Playability**
+ - Playable by a single pianist
+ - Stick to triads — avoid dense cluster chords
+ - Prefer broken chords / arpeggios over sustained pads
+
+ ---
+
+ ## 04\_non\_goals.md
+
+ Explicitly out of scope for v1:
+ - Perfect transcription of every instrument
+ - Jazz-grade reharmonization
+ - Orchestral reduction
+ - Real-time live transcription
+
+ All interesting problems, but each one would balloon scope. Keeping them out of v1 is what lets us actually ship something.
+
+ ---
+
+ ## 05\_tracks\_and\_roles.md
+
+ The arrangement isn't one stream of notes — it's distinct musical roles. Here's how they map to stems and hands:
+
+ **Melody track**
+ - Source: Demucs vocals stem → transcribed via Basic Pitch
+ - Monophonic, lightly ornamented
+ - Default: right hand
+
+ **Chord / Harmony track**
+ - Derived from chord analysis on the "other" stem (guitars, keys)
+ - Simplified to triads — root + third + fifth within one octave
+ - Default: left hand
+
+ **Bass function**
+ - Source: Demucs bass stem → used to infer chord roots and movement
+ - Not transcribed note-for-note — used to drive left hand voicing decisions
+ - Merged into left-hand harmony to reduce cognitive load
+
+ **Drums**
+ - Discarded entirely
+
+ The key design insight: Demucs separates by *musical role*, not by pitch. Those roles map cleanly onto pianist roles — that's why stem separation is done before transcription, not after.
+
+ ---
+
+ ## 06\_representation\_choices.md
+
+ MIDI is the obvious choice — it's the only format that cleanly separates *what* is played from *how it sounds*.
+
+ **Why it works**
+ - Editable, inspectable, programmatic
+ - Maps directly to piano keys
+
+ **Accepted limitations**
+ - No articulation nuance by default
+ - No lyric alignment by default
+
+ Both can be layered on later if needed.
+
+ ---
+
+ ## 07\_evaluation\_metrics.md
+
+ Some of this is measurable, some comes down to gut feel. Both matter.
+
+ **Objective (automated)**
+ - Notes per second per hand
+ - Max chord size per hand
+ - Hand span distance over time
+ - Number of leaps per hand in a time window
+ - Playability violations flagged vs. auto-corrected
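+
+ A sketch of how the first two automated metrics might be computed, assuming simple `(pitch, start, end)` tuples per hand (illustrative only, not the project's settled representation):
+
+ ```python
+ # Illustrative objective-metric helpers over (pitch, start, end) tuples.
+ def notes_per_second(hand: list[tuple[int, float, float]]) -> float:
+     if not hand:
+         return 0.0
+     duration = max(e for _, _, e in hand) - min(s for _, s, _ in hand)
+     return len(hand) / duration if duration > 0 else 0.0
+
+ def max_chord_size(hand: list[tuple[int, float, float]]) -> int:
+     # Largest number of notes sounding together at any onset.
+     return max(
+         (sum(1 for _, s, e in hand if s <= onset < e) for _, onset, _ in hand),
+         default=0,
+     )
+ ```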
+
+ **Subjective**
+ - Does this feel playable?
+ - Does this sound like the song?
+ - Would a pianist keep this arrangement?
+
+ ---
+
+ ## 08\_licensing\_and\_ethics.md
+
+ - Audio is user-provided
+ - Output MIDI is a derivative arrangement, not a copy
+ - No redistribution of original audio
+ - Project is technical / educational — not a piracy tool
+
+ ---
+
+ ## 09\_system\_architecture.md
+
+ **Pipeline overview**
+
+ ```
+ Audio Input (MP3/WAV)
+   ↓
+ Source Separation        [Demucs → vocals, bass, other, drums]
+   ↓
+ Per-Stem Transcription   [Basic Pitch → raw MIDI per stem]
+   ↓
+ Arrangement Engine       [melody extract → hand split → density reduce → voicing simplify → span check]
+   ↓
+ Validation               [playability report + auto-correction]
+   ↓
+ MIDI Output + PDF Render [pretty_midi + MuseScore CLI]
+   ↓
+ Web API                  [FastAPI — serves MIDI + PDF to frontend]
+ ```
+
+ Linear pipeline, but each stage is its own module. The key design rule: every step writes its output to disk before the next one runs. No hidden state, no coupling between stages. That makes debugging and swapping out algorithms significantly easier.
+
+ **Modules**
+
+ | Module | Path | Responsibility |
+ |---|---|---|
+ | Audio I/O | `audio/` | Load MP3/WAV, convert to mono, normalize sample rate |
+ | Separation | `separation/` | Run Demucs, extract and save stems |
+ | Analysis | `analysis/` | Tempo detection, beat tracking, key estimation → timing grid + metadata (JSON) |
+ | Structure | `structure/` | Per-stem transcription, chord estimation, bar/section segmentation → symbolic musical events |
+ | Piano Logic | `piano/` | Arrangement engine — note scoring, density reduction, hand assignment, span enforcement, voicing simplification |
+ | Rendering | `render/` | Symbolic events → MIDI + optional MuseScore PDF |
+ | API | `api/` | FastAPI app wrapping the pipeline, file upload/download endpoints |
+
+ **Arrangement engine internals**
+
+ The piano logic module is a transform chain — each pass takes and returns a note list:
+
+ ```
+ raw_midi → [melody_extractor] → [hand_splitter] → [density_reducer] → [voicing_simplifier] → [span_checker]
+ ```
+
+ Each transform is independently testable. Note prioritization within transforms is score-based:
+
+ ```
+ score = w1*(metric_strength) + w2*(duration) + w3*(harmonic_function) + w4*(melodic_peak)
+ ```
+
+ When a constraint forces note reduction, the highest-scoring notes survive. Weights are tunable — this is where musical judgment gets encoded without being magic.
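+
+ In code, the scoring rule could look like this (a sketch; the weights and feature names are placeholders, and features are assumed normalized to [0, 1] upstream):
+
+ ```python
+ # Sketch of score-based note prioritization. Feature values are assumed
+ # to be precomputed upstream and normalized to [0, 1].
+ from dataclasses import dataclass
+
+ @dataclass
+ class ScoredNote:
+     pitch: int
+     metric_strength: float    # on-beat vs. off-beat
+     duration: float           # normalized note length
+     harmonic_function: float  # chord tone vs. passing tone
+     melodic_peak: float       # local contour maximum
+
+ WEIGHTS = (0.4, 0.2, 0.3, 0.1)  # w1..w4, tunable
+
+ def score(n: ScoredNote, w: tuple = WEIGHTS) -> float:
+     return (w[0] * n.metric_strength + w[1] * n.duration
+             + w[2] * n.harmonic_function + w[3] * n.melodic_peak)
+
+ def reduce_to(notes: list[ScoredNote], limit: int) -> list[ScoredNote]:
+     # When a constraint forces reduction, the highest-scoring notes survive.
+     return sorted(notes, key=score, reverse=True)[:limit]
+ ```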
+
+ **Data flow rules**
+ - No hidden state
+ - Every stage writes output to disk
+ - Intermediate formats are human-readable (JSON / CSV) where possible
+ - Any single module can be swapped out independently
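+
+ A minimal sketch of what the write-to-disk rule implies for a stage runner (stage names and paths are illustrative):
+
+ ```python
+ # Sketch: each stage reads its input artifact from disk and writes its
+ # output artifact before the next stage runs; no in-memory coupling.
+ import json
+ from pathlib import Path
+
+ def run_stage(name: str, func, in_path: Path, workdir: Path) -> Path:
+     data = json.loads(in_path.read_text())
+     result = func(data)
+     out_path = workdir / f"{name}.json"
+     out_path.write_text(json.dumps(result, indent=2))  # inspectable artifact
+     return out_path
+ ```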
+
+ **Error handling** — a degraded output beats no output.
+ - Invalid audio → fail fast
+ - Uncertain musical estimates → warn, don't crash
+ - Always produce something, even if simplified
+
+ ---
+
+ ## 10\_cli\_philosophy.md
+
+ The idea: drop into any stage of the pipeline without re-running everything before it.
+
+ - Each stage is runnable independently
+ - Every command produces inspectable output
+ - Intermediate artifacts are saved to disk
+
+ ```
+ input audio → separation → analysis → structure → piano logic → MIDI → PDF
+ ```
+
+ The CLI is the internal tool. The web UI is the product.
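+
+ A sketch of what per-stage subcommands could look like in `cli.py` (the subcommand names are illustrative, not settled):
+
+ ```python
+ # Sketch of a per-stage CLI: each subcommand runs one pipeline stage
+ # against artifacts already on disk.
+ import argparse
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(prog="keyarrange")
+     sub = parser.add_subparsers(dest="stage", required=True)
+     for stage in ("separate", "analyze", "transcribe", "arrange", "render"):
+         p = sub.add_parser(stage, help=f"run the {stage} stage only")
+         p.add_argument("input")  # artifact produced by the previous stage
+         p.add_argument("--workdir", default="work/")
+     args = parser.parse_args()
+     print(f"would run stage {args.stage!r} on {args.input}")
+
+ if __name__ == "__main__":
+     main()
+ ```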
+
+ ---
+
+ ## 11\_repository\_structure.md
+
+ ```
+ KeyArrange/
+ ├── README.md                ← product-first: what, who, demo link, then architecture
+ ├── DECISIONS.md             ← key tradeoffs and why
+ ├── pyproject.toml
+ ├── requirements.txt
+ ├── .gitignore
+ ├── docs/
+ │   └── phase_0/
+ │       ├── project_scope.md
+ │       ├── human_playable.md
+ │       ├── quality_bar.md
+ │       ├── non_goals.md
+ │       ├── tracks_and_roles.md
+ │       ├── representation_choices.md
+ │       ├── evaluation_metrics.md
+ │       ├── licensing_and_ethics.md
+ │       ├── cli_philosophy.md
+ │       └── system_architecture.md
+ ├── src/
+ │   └── keyarrange/
+ │       ├── __init__.py
+ │       ├── audio/
+ │       ├── separation/
+ │       ├── analysis/
+ │       ├── structure/
+ │       ├── piano/
+ │       ├── render/
+ │       ├── api/
+ │       └── cli.py
+ ├── web/                     ← minimal frontend (single-page upload + download)
+ ├── tests/
+ │   └── test_smoke.py
+ └── examples/
+     └── sample_outputs/      ← before/after audio + MIDI examples for README
+ ```
+
+ - `docs/` mirrors project phases
+ - `src/` is production code only
+ - `web/` is the product face — not an afterthought
+ - `DECISIONS.md` surfaces engineering judgment, not just engineering output
+ - `examples/` feeds the README demo — non-technical visitors land here first
+
+ ---
+
+ ## 12\_decisions.md
+
+ Explicit tradeoffs made and why. This file grows with the project.
+
+ **Stem separation before transcription, not after.**
+ Transcribing a full mix means Basic Pitch is working against overlapping kick drum, bass, vocals, and rhythm guitar simultaneously. Polyphonic transcription on a blended signal is an unsolved problem. Separating first gives each transcription step a cleaner, semantically meaningful signal.
+
+ **Vocals → right hand, bass stem → left hand chord roots.**
+ The bass stem in pop music almost always outlines chord roots and movement — exactly what a beginner left hand should play. The bass is not transcribed note-for-note; it's used to infer harmonic movement and drive a generated voicing pattern. This is more musical and more playable than a literal transcription.
+
+ **Rule-based arrangement engine with score-based note prioritization.**
+ Rule-based gives deterministic, testable, explainable output. Every constraint from `human_playable.md` maps to a concrete function. An LLM layer can be added later for edge cases that are hard to encode as rules, but it's not load-bearing in v1.
+
+ **Fixed pitch threshold as hand-split fallback.**
+ Notes above MIDI 60 → right hand, below → left. Simple, explainable, almost always musically correct for pop songs. The stem-based approach is primary; this is the safety net.
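+
+ The fallback is small enough to show in full (a sketch; a `Note` with a MIDI `pitch` field is assumed):
+
+ ```python
+ # Sketch of the fixed-threshold fallback: middle C (MIDI 60) splits hands.
+ def split_hands(notes, threshold: int = 60):
+     left = [n for n in notes if n.pitch < threshold]
+     right = [n for n in notes if n.pitch >= threshold]
+     return left, right
+ ```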
+
+ **Web UI is core, not deferred.**
+ A working URL someone can visit beats a CLI that requires setup. Non-technical stakeholders — including founders evaluating the project — need to be able to use it in 30 seconds. The product loop must close before quality is optimized.
+
+ ---
+
+ ## 13\_phase\_overview.md
+
+ Each phase has a hard boundary — what it's allowed to solve, what it isn't. This is what prevents scope creep once building starts.
+
+ ---
+
+ ### Phase 0 — Planning & Intent
+ **Purpose:** Lock in terminology, constraints, and architecture before writing a line of code.
+ **Output:** This documentation. Frozen vocabulary and constraints.
+
+ ---
+
+ ### Phase 1 — MVP (Days 1–2)
+
+ The goal is one thing: audio in, playable MIDI out, web UI up, end-to-end, no crashes. Quality is secondary. Proving the pipeline closes is primary.
+
+ **Day 1 — Pipeline skeleton**
+
+ *Morning:* Set up the environment (Demucs, Basic Pitch, pretty_midi, music21). Wire the first two stages: run Demucs on the input audio, extract the vocals and bass stems, run Basic Pitch on each stem separately, and save the results as two raw MIDI files. Write a `load_midi(path) → list[Note]` abstraction that every downstream stage will use (a sketch follows). Test on one song you know well.
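+
+ The abstraction could be as small as this, built on pretty_midi (the `Note` shape is an assumption, not settled API):
+
+ ```python
+ # Sketch of the shared load_midi abstraction on top of pretty_midi.
+ from typing import NamedTuple
+ import pretty_midi
+
+ class Note(NamedTuple):
+     pitch: int     # MIDI pitch number
+     start: float   # seconds
+     end: float
+     velocity: int
+
+ def load_midi(path: str) -> list[Note]:
+     pm = pretty_midi.PrettyMIDI(path)
+     notes = [
+         Note(n.pitch, n.start, n.end, n.velocity)
+         for inst in pm.instruments
+         if not inst.is_drum
+         for n in inst.notes
+     ]
+     return sorted(notes, key=lambda n: n.start)
+ ```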
+
+ *Afternoon:* Right hand = vocal MIDI as-is. Left hand = bass MIDI quantized to quarter notes (snap to the nearest beat, discard duration detail for now). Merge both hands into a single two-track MIDI file and play it back. At the end of Day 1 there should be something that plays back and is recognizably the song.
+
+ **Day 2 — Three essential transforms + web UI**
+
+ *Morning:* Implement the transform chain, each transform as its own function (note list in, note list out; a sketch of the density reducer follows the list):
+
+ 1. **Density reducer** — for each 500 ms window, if the note count exceeds `120 / BPM`, keep only the longest-duration notes until under the limit
+ 2. **Span enforcer** — for simultaneous notes in one hand spanning > 9 semitones, drop the lowest in the right hand or the highest in the left
+ 3. **Note cap** — reduce each hand to ≤ 3 simultaneous notes, dropping by shortest duration first
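+
+ As a sketch of transform 1 (reusing the `Note` shape from Day 1; the fixed-window iteration is illustrative):
+
+ ```python
+ # Sketch of the density reducer: cap notes per 500 ms window at
+ # 120 / BPM, keeping the longest-duration notes first.
+ def reduce_density(notes: list[Note], bpm: float, window: float = 0.5) -> list[Note]:
+     limit = max(1, round(120 / bpm))
+     kept, t = [], 0.0
+     end = max((n.end for n in notes), default=0.0)
+     while t < end:
+         in_window = [n for n in notes if t <= n.start < t + window]
+         in_window.sort(key=lambda n: n.end - n.start, reverse=True)
+         kept.extend(in_window[:limit])
+         t += window
+     return sorted(kept, key=lambda n: n.start)
+ ```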
+
+ *Afternoon:* Minimal web UI — a FastAPI backend with a single file-upload endpoint plus a single-page frontend (file picker, upload button, MIDI download). Deploy to Hugging Face Spaces or Railway. The CLI stays as an internal dev tool. Test end-to-end on 2–3 songs.
+
+ **Not in scope:** Output quality, advanced voicings, sheet music.
+
+ ---
+
+ ### Phase 2 — Musical Quality (Week 2)
+
+ Phase 1 closes the loop. Phase 2 makes the output actually sound like an intentional arrangement.
+
+ **Chord-aware left hand.** Use music21's chord detection on the bass + "other" stems together to identify the chord at each beat. Generate the left hand from the chord symbol — a root-position triad (root + third + fifth) voiced within one octave — rather than from the raw bass transcription. This is the single highest-impact improvement in the project.
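+
+ A sketch of the triad step using music21's `harmony.ChordSymbol` (per-beat chord symbols are assumed to come from the analysis stage):
+
+ ```python
+ # Sketch: turn a detected chord symbol into a root-position triad
+ # for the left hand, as MIDI pitch numbers.
+ from music21 import harmony
+
+ def left_hand_triad(symbol: str) -> list[int]:
+     chord = harmony.ChordSymbol(symbol)     # e.g. "C", "Am", "F"
+     root, third, fifth = chord.pitches[:3]  # chord tones, root position
+     return [p.midi for p in (root, third, fifth)]
+ ```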
+
+ **MuseScore PDF rendering.** Convert the output MIDI to MusicXML via music21, call MuseScore's CLI headlessly, and render a PDF. Serve it alongside the MIDI in the web UI. A rendered sheet-music PDF of a song someone just uploaded is the "wow" demo moment. It makes the project immediately legible to any visitor, musical or not.
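+
+ The render step is a few lines (a sketch; the MuseScore binary name varies by install: "mscore", "musescore3", etc.):
+
+ ```python
+ # Sketch of the MIDI → MusicXML → PDF render step.
+ import subprocess
+ from music21 import converter
+
+ score = converter.parse("arrangement.mid")          # MIDI → music21 stream
+ score.write("musicxml", fp="arrangement.musicxml")  # stream → MusicXML
+ subprocess.run(
+     ["mscore", "arrangement.musicxml", "-o", "arrangement.pdf"],
+     check=True,  # MuseScore's -o flag infers the format from the extension
+ )
+ ```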
+
+ **Before/after examples.** Add 3–4 song examples to `examples/sample_outputs/` with the original audio clip, raw transcription MIDI, and final arranged MIDI side by side. This is the most effective way to communicate what the system actually does.
+
+ **Output:** Measurably better arrangements + sheet music output + updated README with live examples.
+
+ ---
+
+ ### Phase 3 — DSP & Signal Quality (Week 3)
+
+ Phase 2 gives us good structure. Phase 3 makes the notes themselves cleaner.
+
+ **Beat tracking with madmom.** Replace simple tempo-based timing with proper downbeat detection via madmom's pre-trained RNN. Accurate beat positions make metric-strength scoring significantly more reliable and improve every timing-dependent transform.
+
+ **Melody smoothing.** After vocal transcription, compute the melodic contour and apply a smoothing pass: remove ornaments, grace notes, and melisma (rapid pitch changes on one syllable). Keep only notes above a minimum duration threshold relative to tempo. This makes the right hand cleaner without losing the melodic shape.
+
+ **Dynamic tempo-aware density scaling.** Replace the fixed density threshold with `max_notes_per_beat = 120 / BPM`. This correctly tightens constraints at fast tempos and relaxes them at slow ones.
+
+ **Output:** Cleaner melody, better rhythm, tighter playability enforcement.
+
+ ---
+
+ ### Phase 4 — AI-Assisted Refinement (Later / Stretch)
+
+ Research-grade improvements for when the core pipeline is solid. A single-GPU constraint applies to any training.
+
+ **Chord recognition upgrade.** Swap music21's built-in chord analysis for a MIREX-benchmarked chord recognition model from Hugging Face. Beat-level chord labels → better left hand voicing decisions throughout.
+
+ **Voice leading optimization.** Model the left hand voicing problem as a shortest-path search over a chord graph, where edge weights penalize large leaps between successive chords. music21 has voice-leading utilities as a starting point.
+
+ **Fine-tuned arrangement model (ambitious).** The research framing: given a raw MIDI sequence, predict a simplified pianist version. Dataset: POP909 (pop songs paired with piano arrangements) + ATEPP (professional performance MIDI), both free and well-documented. A small transformer fine-tuned on POP909 fits on a single GPU and represents genuinely publishable-quality work if done well.
+
+ **LLM-assisted post-processing.** Use the Claude API to take a symbolic representation of the arrangement (chord symbols + melody contour as structured text) and suggest corrections to voicing or hand assignment. Effective for edge cases the rule engine misses, and fast to prototype.
+
+ ---
+
+ ### Phase 5 — Polish & Presentation
+
+ **Difficulty scoring.** After arrangement, compute a score from max span, average density, and rhythmic complexity. Display it to the user — it gives the output a concrete, communicable property.
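+
+ One way the blend could look (a sketch; the weights and normalizations are placeholders):
+
+ ```python
+ # Sketch: blend normalized playability metrics into one difficulty score.
+ def difficulty(max_span: int, avg_density: float, rhythm_complexity: float) -> float:
+     span_term = min(max_span / 12, 1.0)         # 12 semitones = one octave
+     density_term = min(avg_density / 6.0, 1.0)  # notes per second, rough cap
+     return round(0.4 * span_term + 0.4 * density_term + 0.2 * rhythm_complexity, 2)
+ ```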
+
+ **README as product document.** Lead with: what this is, who it's for, a live demo link, and before/after audio examples. Architecture comes after. Most GitHub repos lead with architecture; this one leads with the problem.
+
+ **Example gallery.** 4–6 songs covering different tempos, keys, and feels. Shows range and gives visitors something to click.
+
+ **Output:** Polished repo, clear project narrative, live demo, sheet music output — something a non-technical founder can evaluate in under a minute.