Ashiedu committed
Commit aea85d3 · verified · 1 Parent(s): 4a3604c

Unsloth Model Card

Files changed (1)
  1. README.md +12 -336
README.md CHANGED
@@ -1,345 +1,21 @@
  ---
- license: apache-2.0
- task_categories:
- - audio-to-audio
- - text-to-audio
- - image-to-text
  tags:
- - music-generation
- - magenta
- - magenta-rt
- - onnx
- - burn
- - llama-cpp
- - performance-rnn
- - melody-rnn
- - drums-rnn
- - improv-rnn
- - polyphony-rnn
- - musicvae
- - groovae
- - piano-genie
- - ddsp
- - gansynth
- - nsynth
- - coconet
- - music-transformer
- - onsets-and-frames
- - spectrostream
- - musiccoca
- - synesthesia
- - directml
- - vulkan
- - wgpu
- - audio
- - midi
  language:
  - en
- library_name: onnxruntime
- base_model:
- - unsloth/gemma-3n-E2B-it
- - google/magenta-realtime
- ---
-
- # Synesthesia — AI Music Models
-
- ONNX and GGUF model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia),
- a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.
-
- Synesthesia brings together every open-weights model from **Magenta Classic** and
- **Magenta RT** under one repo, exportable to ONNX for local inference and continuously
- fine-tunable via free Google Colab notebooks.
-
- ---
-
- ## Inference Runtimes
-
- | Runtime | Models | Backend | Notes |
- |---------|--------|---------|-------|
- | **Burn wgpu** | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
- | **ORT + DirectML** | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
- | **llama.cpp + Vulkan** | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
- | **Magenta RT (JAX)** | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |
-
- Vulkan works on AMD without ROCm on Windows 11. All runtimes target the RX 6700 XT.
-
- ---
-
- ## Model Inventory
-
- ### Magenta RT (Real-Time Audio Generation)
-
- Magenta RT is composed of three components working as a pipeline:
- SpectroStream (audio codec), MusicCoCa (style embeddings), and an encoder-decoder
- transformer LLM — the only open-weights model supporting real-time continuous
- musical audio generation.
-
- It is an 800 million parameter autoregressive transformer trained on
- ~190k hours of stock music. It uses 38% fewer parameters
- than Stable Audio Open and 77% fewer than MusicGen Large.
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
- | MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
- | MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
- | MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
- | MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |
-
- **Finetuning:** Free Colab TPU v2-8 via `Magenta_RT_Finetune.ipynb`. Customize to
- your own audio catalog. Official Colab demos support live generation,
- finetuning, and live audio injection (audio injection = mixing user audio into the
- model's output and feeding it back as context for the next generation chunk).
-
- ---
-
- ### Magenta Classic — MIDI / Symbolic
-
- MusicRNN implements Magenta's LSTM-based language models:
- MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
- | MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
- | MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
- | MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
- | MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation |
- | MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE — melody, drum, trio loops | Latent interpolation, style morphing |
- | MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
- | MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
- | MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
- | MC-010 | Coconet | ONNX | Counterpoint by convolution — complete partial scores | Harmony / counterpoint filler |
-
- ---
-
- ### Magenta Classic — Audio / Timbre
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
- | MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
- | MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
- | MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
- | MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
- | MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
- | MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
-
- ---
-
- ### LLM / Vision Control
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |
-
- **Format tiers:**
- - `q4_k_m.gguf` — default (recommended, ~1.5GB)
- - `q2_k.gguf` — lite tier (fastest, smallest)
- - `f16.gguf` — full quality reference
-
- **Runtime:** `llama-cpp-v3` Rust crate with Vulkan backend.
- Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
-
- ---
-
- ## Repository Structure
-
- ```
- Ashiedu/Synesthesia/
-
- ├── manifest.json              ← authoritative model registry
-
- ├── magenta_rt/
- │   ├── llm/                   ← MRT-001: JAX checkpoint + ONNX export
- │   ├── spectrostream/
- │   │   ├── encoder_fp32.onnx
- │   │   ├── encoder_fp16.onnx
- │   │   ├── decoder_fp32.onnx
- │   │   └── decoder_fp16.onnx
- │   └── musiccoca/
- │       ├── text_fp32.onnx
- │       ├── text_fp16.onnx
- │       ├── audio_fp32.onnx
- │       └── audio_fp16.onnx
-
- ├── midi/
- │   ├── perfrnn/               ← MC-001: fp32 / fp16 / int8
- │   ├── melody_rnn/            ← MC-002
- │   ├── drums_rnn/             ← MC-003
- │   ├── improv_rnn/            ← MC-004
- │   ├── polyphony_rnn/         ← MC-005
- │   ├── musicvae/              ← MC-006: encoder + decoder
- │   ├── groovae/               ← MC-007
- │   ├── midime/                ← MC-008
- │   ├── music_transformer/     ← MC-009
- │   └── coconet/               ← MC-010
-
- ├── audio/
- │   ├── gansynth/              ← MA-001: fp32 / fp16
- │   ├── nsynth/                ← MA-002
- │   ├── ddsp/                  ← MA-003+004: encoder + decoder
- │   ├── piano_genie/           ← MA-005
- │   ├── onsets_and_frames/     ← MA-006
- │   └── spice/                 ← MA-007
-
- └── llm/
-     └── gemma3n_e2b/
-         ├── q4_k_m.gguf        ← LV-001: default
-         ├── q2_k.gguf
-         └── f16.gguf
- ```
-
- Each subdirectory contains a `README.md` with input/output shapes,
- export commands, and Burn compatibility status.
-
- ---
-
- ## Quality Tiers (ONNX models)
-
- | Tier | Suffix | VRAM est. | Use case |
- |------|--------|-----------|----------|
- | Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation |
- | **Half** | `_fp16.onnx` | Baseline | **Default — recommended for RX 6700 XT** |
- | Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) |
-
- ---
-
- ## Pulling Models in Rust
-
- ```rust
- use hf_hub::api::sync::Api;
-
- pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
-     let api = Api::new()?;
-     let repo = api.model("Ashiedu/Synesthesia".to_string());
-     // Downloads on first call; cached under ~/.cache/huggingface/hub/
-     Ok(repo.get(repo_path)?)
- }
-
- // Example
- let path = pull("midi/perfrnn/fp16.onnx")?;
- ```
-
- ## Pulling Models in Python
-
- ```python
- from huggingface_hub import snapshot_download, hf_hub_download
-
- # Pull everything
- snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
-
- # Pull one file
- hf_hub_download(
-     repo_id="Ashiedu/Synesthesia",
-     filename="midi/perfrnn/fp16.onnx",
-     local_dir="./models",
- )
- ```
-
  ---

- ## Export Workflow (Colab)
-
- All models are exported from Colab and pushed here. The generic workflow:
-
- ```python
- # 0. Read the HF token from Colab Secrets
- from google.colab import userdata
- HF_TOKEN = userdata.get("HF_TOKEN")
-
- # 1. Pull the existing checkpoint (if updating)
- from huggingface_hub import snapshot_download
- snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)
-
- # 2. Clone the Magenta sources
- # !git clone https://github.com/magenta/magenta
- # !git clone https://github.com/magenta/magenta-realtime
-
- # 3. Export to ONNX (varies per model — see each model's README)
- #    Magenta Classic: tf2onnx
- #    Magenta RT: JAX → ONNX via jax2onnx or flax export
- #    Gemma-3N: Unsloth → GGUF
-
- # 4. Quantize
- from onnxruntime.quantization import quantize_dynamic, QuantType
- import onnxconverter_common as occ, onnx

- fp32 = onnx.load("model.onnx")
- fp16 = occ.convert_float_to_float16(fp32, keep_io_types=True)
- onnx.save(fp16, "model_fp16.onnx")
- quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
-
- # 5. Push to HF
- from huggingface_hub import HfApi
- api = HfApi(token=HF_TOKEN)
- api.upload_file(
-     path_or_fileobj="model_fp16.onnx",
-     path_in_repo="midi/perfrnn/fp16.onnx",
-     repo_id="Ashiedu/Synesthesia",
-     commit_message="MC-001 Performance RNN fp16",
- )
- ```
-
- **Gemini on Colab:** Point Gemini at this README and the model's subdirectory
- README as context. Gemini can execute the export + push workflow without
- GitHub integration — it only needs Python and your HF token in Colab Secrets.
-
- ---
-
- ## Burn Compatibility Tracking
-
- CI attempts `burn-onnx ModelGen` on each exported model weekly.
- Models migrate from the ORT fallback to Burn as op coverage matures.
-
- | Model | Burn target | ORT fallback | Last checked |
- |-------|------------|--------------|-------------|
- | DDSP enc/dec | ✅ | ❌ | — |
- | GANSynth | ✅ | ❌ | — |
- | NSynth | ✅ | ❌ | — |
- | Piano Genie | ✅ | ❌ | — |
- | Performance RNN | 🔄 LSTM | ✅ | — |
- | Melody RNN | 🔄 LSTM | ✅ | — |
- | Drums RNN | 🔄 LSTM | ✅ | — |
- | Improv RNN | 🔄 LSTM | ✅ | — |
- | Polyphony RNN | 🔄 LSTM | ✅ | — |
- | MusicVAE | 🔄 BiLSTM | ✅ | — |
- | Coconet | 🔄 Conv | ✅ | — |
- | Music Transformer | 🔄 Attention | ✅ | — |
- | Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
- | SpectroStream | 🔄 Conv | ✅ | — |
- | MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
- | Gemma-3N | N/A — llama.cpp | ❌ | — |
-
- ---
-
- ## Training Philosophy
-
- **Train after the app works.** The interface ships first. Training data
- is determined by what the working app actually receives as input in practice.
- Fine-tune on your own audio and MIDI once the signal chain is wired.
-
- Tentative fine-tuning order once the app is functional:
- 1. Performance RNN — live MIDI from the Track Mixer
- 2. MusicVAE / GrooVAE — latent interpolation between patches
- 3. GANSynth — timbre generation from pitch + latent input
- 4. DDSP — resynthesis of GANSynth outputs
- 5. Magenta RT — full audio, conditioned on your own catalog
- 6. Gemma-3N — camera → mood/energy trained on your session recordings
-
- ---
-
- ## License
-
- - Codebase: Apache 2.0
- - Magenta Classic weights: Apache 2.0
- - Magenta RT weights: Apache 2.0 with additional [bespoke terms](https://github.com/magenta/magenta-realtime/blob/main/LICENSE)
- - Gemma-3N: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
-
- Individual model directories note any additional upstream license terms.
-
- ---
-
- ## Links
-
- - **App:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
- - **Magenta RT:** [magenta/magenta-realtime](https://github.com/magenta/magenta-realtime)
- - **Magenta Classic:** [magenta/magenta](https://github.com/magenta/magenta)
- - **HF Model Card:** [google/magenta-realtime](https://huggingface.co/google/magenta-realtime)
- - **Roadmap:** GitHub Issues — `lane:ml` label
  ---
+ base_model: unsloth/gemma-3n-e4b-unsloth-bnb-4bit
  tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - gemma3n
+ license: apache-2.0
  language:
  - en
  ---

+ # Uploaded finetuned model

+ - **Developed by:** Ashiedu
+ - **License:** apache-2.0
+ - **Finetuned from model:** unsloth/gemma-3n-e4b-unsloth-bnb-4bit

+ This gemma3n model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
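The new card does not show how to prompt the finetuned model. As a minimal sketch — assuming the standard Gemma `<start_of_turn>` chat format that gemma3n models inherit, and using a hypothetical helper name `build_prompt` (not part of the card) — the raw prompt string can be assembled as:

```python
# Sketch only: render a chat into a raw Gemma-style prompt string.
# In practice, prefer tokenizer.apply_chat_template so the exact
# template shipped with the model is used.

def build_prompt(messages: list[dict]) -> str:
    """Render [{'role': 'user'|'model', 'content': str}, ...] into raw text."""
    parts = ["<bos>"]
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to generate its turn
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Name one Magenta model."}])
```

The trailing `<start_of_turn>model\n` is what tells an instruction-tuned Gemma checkpoint that the next tokens belong to the assistant turn.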