Ashiedu committed
Commit aea85d3 · verified · 1 Parent(s): 4a3604c

Unsloth Model Card

Files changed (1)
  1. README.md +12 -336
README.md CHANGED
@@ -1,345 +1,21 @@
  ---
- license: apache-2.0
- task_categories:
- - audio-to-audio
- - text-to-audio
- - image-to-text
  tags:
- - music-generation
- - magenta
- - magenta-rt
- - onnx
- - burn
- - llama-cpp
- - performance-rnn
- - melody-rnn
- - drums-rnn
- - improv-rnn
- - polyphony-rnn
- - musicvae
- - groovae
- - piano-genie
- - ddsp
- - gansynth
- - nsynth
- - coconet
- - music-transformer
- - onsets-and-frames
- - spectrostream
- - musiccoca
- - synesthesia
- - directml
- - vulkan
- - wgpu
- - audio
- - midi
  language:
  - en
- library_name: onnxruntime
- base_model:
- - unsloth/gemma-3n-E2B-it
- - google/magenta-realtime
- ---
-
- # Synesthesia — AI Music Models
-
- ONNX and GGUF model weights for [Synesthesia](https://github.com/kryptodogg/synesthesia),
- a cyber-physical synthesizer, 3D/4D signal workstation, and multi-modal music AI app.
-
- Synesthesia brings together every open-weights model from **Magenta Classic** and
- **Magenta RT** under one repo, exportable to ONNX for local inference and continuously
- fine-tunable via free Google Colab notebooks.
-
- ---
-
- ## Inference Runtimes
-
- | Runtime | Models | Backend | Notes |
- |---------|--------|---------|-------|
- | **Burn wgpu** | DDSP, GANSynth, NSynth, Piano Genie | Vulkan / DX12 | Pure Rust, no ROCm required |
- | **ORT + DirectML** | RNN family, MusicVAE, Coconet, Onsets & Frames | DirectML | Fallback while Burn op coverage matures |
- | **llama.cpp + Vulkan** | Gemma-3N | Vulkan | Same stack as LM Studio, GGUF format |
- | **Magenta RT (JAX)** | Magenta RT LLM, SpectroStream, MusicCoCa | TPU / GPU | Free Colab TPU v2-8 for inference + finetuning |
-
- Vulkan works on AMD without ROCm on Windows 11. All runtimes target the RX 6700 XT.
-
- ---
-
- ## Model Inventory
-
- ### Magenta RT (Real-Time Audio Generation)
-
- Magenta RT is composed of three components working as a pipeline:
- SpectroStream (audio codec), MusicCoCa (style embeddings), and an encoder-decoder
- transformer LLM — the only open-weights model supporting real-time continuous
- musical audio generation.
-
- It is an 800 million parameter autoregressive transformer trained on
- ~190k hours of stock music. It uses 38% fewer parameters
- than Stable Audio Open and 77% fewer than MusicGen Large.
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MRT-001 | Magenta RT LLM | JAX / ONNX | Real-time stereo audio generation | Continuous live generation engine |
- | MRT-002 | SpectroStream Encoder | ONNX | Audio → discrete tokens (48kHz stereo, 25Hz, 64 RVQ) | Audio tokenizer |
- | MRT-003 | SpectroStream Decoder | ONNX | Tokens → 48kHz stereo audio | Audio detokenizer |
- | MRT-004 | MusicCoCa Text | ONNX | Text → 768-dim music embedding | Text prompt → style control |
- | MRT-005 | MusicCoCa Audio | ONNX | Audio → 768-dim music embedding | Audio prompt → style control |
-
- **Finetuning:** Free Colab TPU v2-8 via `Magenta_RT_Finetune.ipynb`. Customize to
- your own audio catalog. Official Colab demos support live generation,
- finetuning, and live audio injection (audio injection = mixing user audio into the
- model's output and feeding it back as context for the next generation chunk).
-
- ---
-
- ### Magenta Classic — MIDI / Symbolic
-
- MusicRNN implements Magenta's LSTM-based language models:
- MelodyRNN, DrumsRNN, ImprovRNN, and PerformanceRNN.
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MC-001 | Performance RNN | ONNX | Expressive MIDI performance generation | AI arpeggiator, live note generation |
- | MC-002 | Melody RNN | ONNX | Melody continuation (LSTM) | Melody continuation tool |
- | MC-003 | Drums RNN | ONNX | Drum pattern generation (LSTM) | Beat generation |
- | MC-004 | Improv RNN | ONNX | Chord-conditioned melody generation | Live improv over chord progressions |
- | MC-005 | Polyphony RNN | ONNX | Polyphonic music generation (BachBot) | Harmonic voice generation |
- | MC-006 | MusicVAE | ONNX enc+dec | Latent music VAE — melody, drum, trio loops | Latent interpolation, style morphing |
- | MC-007 | GrooVAE | ONNX enc+dec | Drum performance humanization | Humanize MIDI drums |
- | MC-008 | MidiMe | ONNX | Personalize MusicVAE in-session | User-adaptive latent space |
- | MC-009 | Music Transformer | ONNX | Long-form piano generation | Extended composition |
- | MC-010 | Coconet | ONNX | Counterpoint by convolution — complete partial scores | Harmony / counterpoint filler |
-
- ---
-
- ### Magenta Classic — Audio / Timbre
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | MA-001 | GANSynth | ONNX | GAN audio synthesis from NSynth timbres | GANHarp-style timbre instrument |
- | MA-002 | NSynth | ONNX | WaveNet neural audio synthesis | Sample-level timbre generation |
- | MA-003 | DDSP Encoder | ONNX | Audio → harmonic + noise params | Timbre analysis |
- | MA-004 | DDSP Decoder | ONNX | Harmonic params → audio | Timbre resynthesis |
- | MA-005 | Piano Genie | ONNX | 8-button → 88-key piano VQ-VAE | Accessible piano performance |
- | MA-006 | Onsets and Frames | ONNX | Polyphonic piano transcription (audio → MIDI) | Audio → MIDI transcription |
- | MA-007 | SPICE | ONNX | Pitch extraction from audio | Monophonic pitch tracking |
-
- ---
-
- ### LLM / Vision Control
-
- | ID | Model | Format | Task | Synesthesia Role |
- |----|-------|--------|------|-----------------|
- | LV-001 | Gemma-3N e2b-it | GGUF | Vision + text → structured JSON | Camera → mood/energy/key control |
-
- **Format tiers:**
- - `q4_k_m.gguf` — default (recommended, ~1.5GB)
- - `q2_k.gguf` — lite tier (fastest, smallest)
- - `f16.gguf` — full quality reference
-
- **Runtime:** `llama-cpp-v3` Rust crate with Vulkan backend.
- Same stack as LM Studio — no ROCm, no CUDA needed on Windows.
-
- ---
-
- ## Repository Structure
-
- ```
- Ashiedu/Synesthesia/
-
- ├── manifest.json              ← authoritative model registry
-
- ├── magenta_rt/
- │   ├── llm/                   ← MRT-001: JAX checkpoint + ONNX export
- │   ├── spectrostream/
- │   │   ├── encoder_fp32.onnx
- │   │   ├── encoder_fp16.onnx
- │   │   ├── decoder_fp32.onnx
- │   │   └── decoder_fp16.onnx
- │   └── musiccoca/
- │       ├── text_fp32.onnx
- │       ├── text_fp16.onnx
- │       ├── audio_fp32.onnx
- │       └── audio_fp16.onnx
-
- ├── midi/
- │   ├── perfrnn/               ← MC-001: fp32 / fp16 / int8
- │   ├── melody_rnn/            ← MC-002
- │   ├── drums_rnn/             ← MC-003
- │   ├── improv_rnn/            ← MC-004
- │   ├── polyphony_rnn/         ← MC-005
- │   ├── musicvae/              ← MC-006: encoder + decoder
- │   ├── groovae/               ← MC-007
- │   ├── midime/                ← MC-008
- │   ├── music_transformer/     ← MC-009
- │   └── coconet/               ← MC-010
-
- ├── audio/
- │   ├── gansynth/              ← MA-001: fp32 / fp16
- │   ├── nsynth/                ← MA-002
- │   ├── ddsp/                  ← MA-003+004: encoder + decoder
- │   ├── piano_genie/           ← MA-005
- │   ├── onsets_and_frames/     ← MA-006
- │   └── spice/                 ← MA-007
-
- └── llm/
-     └── gemma3n_e2b/
-         ├── q4_k_m.gguf        ← LV-001: default
-         ├── q2_k.gguf
-         └── f16.gguf
- ```
-
- Each subdirectory contains a `README.md` with input/output shapes,
- export commands, and Burn compatibility status.
-
- ---
-
- ## Quality Tiers (ONNX models)
-
- | Tier | Suffix | VRAM est. | Use case |
- |------|--------|-----------|----------|
- | Full | `_fp32.onnx` | ~2–4× Half | Reference quality, CI validation |
- | **Half** | `_fp16.onnx` | Baseline | **Default — recommended for RX 6700 XT** |
- | Lite | `_int8.onnx` | ~0.5× Half | Lowest latency (MIDI models only) |
-
- ---
-
- ## Pulling Models in Rust
-
- ```rust
- use hf_hub::api::sync::Api;
-
- pub fn pull(repo_path: &str) -> anyhow::Result<std::path::PathBuf> {
-     let api = Api::new()?;
-     let repo = api.model("Ashiedu/Synesthesia".to_string());
-     // Downloads on first call; cached under ~/.cache/huggingface/hub/
-     Ok(repo.get(repo_path)?)
- }
-
- // Example
- let path = pull("midi/perfrnn/fp16.onnx")?;
- ```
-
- ## Pulling Models in Python
-
- ```python
- from huggingface_hub import snapshot_download, hf_hub_download
-
- # Pull everything
- snapshot_download("Ashiedu/Synesthesia", local_dir="./models")
-
- # Pull one file
- hf_hub_download(
-     repo_id="Ashiedu/Synesthesia",
-     filename="midi/perfrnn/fp16.onnx",
-     local_dir="./models",
- )
- ```
-
  ---

- ## Export Workflow (Colab)
-
- All models are exported from Colab and pushed here. The generic workflow:
-
- ```python
- # 0. Read the HF token from Colab Secrets
- from google.colab import userdata
- HF_TOKEN = userdata.get("HF_TOKEN")
-
- # 1. Pull the existing checkpoint (if updating)
- from huggingface_hub import snapshot_download
- snapshot_download("Ashiedu/Synesthesia", local_dir="./models", token=HF_TOKEN)
-
- # 2. Clone the Magenta sources
- # !git clone https://github.com/magenta/magenta
- # !git clone https://github.com/magenta/magenta-realtime
-
- # 3. Export to ONNX (varies per model — see each model's README)
- #    Magenta Classic: tf2onnx
- #    Magenta RT: JAX → ONNX via jax2onnx or flax export
- #    Gemma-3N: Unsloth → GGUF
-
- # 4. Quantize
- from onnxruntime.quantization import quantize_dynamic, QuantType
- import onnxconverter_common as occ, onnx

- fp32 = onnx.load("model.onnx")
- fp16 = occ.convert_float_to_float16(fp32, keep_io_types=True)
- onnx.save(fp16, "model_fp16.onnx")
- quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)
-
- # 5. Push to HF
- from huggingface_hub import HfApi
- api = HfApi(token=HF_TOKEN)
- api.upload_file(
-     path_or_fileobj="model_fp16.onnx",
-     path_in_repo="midi/perfrnn/fp16.onnx",
-     repo_id="Ashiedu/Synesthesia",
-     commit_message="MC-001 Performance RNN fp16",
- )
- ```
-
- **Gemini on Colab:** Point Gemini at this README and the model's subdirectory
- README as context. Gemini can execute the export + push workflow without
- GitHub integration — it only needs Python and your HF token in Colab Secrets.
-
- ---
-
- ## Burn Compatibility Tracking
-
- CI attempts `burn-onnx ModelGen` on each exported model weekly.
- Models migrate from the ORT fallback to Burn as op coverage matures.
-
- | Model | Burn target | ORT fallback | Last checked |
- |-------|------------|--------------|-------------|
- | DDSP enc/dec | ✅ | ❌ | — |
- | GANSynth | ✅ | ❌ | — |
- | NSynth | ✅ | ❌ | — |
- | Piano Genie | ✅ | ❌ | — |
- | Performance RNN | 🔄 LSTM | ✅ | — |
- | Melody RNN | 🔄 LSTM | ✅ | — |
- | Drums RNN | 🔄 LSTM | ✅ | — |
- | Improv RNN | 🔄 LSTM | ✅ | — |
- | Polyphony RNN | 🔄 LSTM | ✅ | — |
- | MusicVAE | 🔄 BiLSTM | ✅ | — |
- | Coconet | 🔄 Conv | ✅ | — |
- | Music Transformer | 🔄 Attention | ✅ | — |
- | Onsets & Frames | 🔄 Conv+LSTM | ✅ | — |
- | SpectroStream | 🔄 Conv | ✅ | — |
- | MusicCoCa | 🔄 ViT+Transformer | ✅ | — |
- | Gemma-3N | N/A — llama.cpp | ❌ | — |
-
- ---
-
- ## Training Philosophy
-
- **Train after the app works.** The interface ships first. Training data
- is determined by what the working app actually receives as input in practice.
- Fine-tune on your own audio and MIDI once the signal chain is wired.
-
- Tentative fine-tuning order once the app is functional:
- 1. Performance RNN — live MIDI from the Track Mixer
- 2. MusicVAE / GrooVAE — latent interpolation between patches
- 3. GANSynth — timbre generation from pitch + latent input
- 4. DDSP — resynthesis of GANSynth outputs
- 5. Magenta RT — full audio, conditioned on your own catalog
- 6. Gemma-3N — camera → mood/energy trained on your session recordings
-
- ---
-
- ## License
-
- - Codebase: Apache 2.0
- - Magenta Classic weights: Apache 2.0
- - Magenta RT weights: Apache 2.0 with additional [bespoke terms](https://github.com/magenta/magenta-realtime/blob/main/LICENSE)
- - Gemma-3N: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
-
- Individual model directories note any additional upstream license terms.
-
- ---
-
- ## Links
-
- - **App:** [kryptodogg/synesthesia](https://github.com/kryptodogg/synesthesia)
- - **Magenta RT:** [magenta/magenta-realtime](https://github.com/magenta/magenta-realtime)
- - **Magenta Classic:** [magenta/magenta](https://github.com/magenta/magenta)
- - **HF Model Card:** [google/magenta-realtime](https://huggingface.co/google/magenta-realtime)
- - **Roadmap:** GitHub Issues — `lane:ml` label
  ---
+ base_model: unsloth/gemma-3n-e4b-unsloth-bnb-4bit
  tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - gemma3n
+ license: apache-2.0
  language:
  - en
  ---

+ # Uploaded finetuned model

+ - **Developed by:** Ashiedu
+ - **License:** apache-2.0
+ - **Finetuned from model:** unsloth/gemma-3n-e4b-unsloth-bnb-4bit

+ This gemma3n model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
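The new card does not show how to prompt the finetuned model. As a minimal sketch — assuming the standard Gemma `<start_of_turn>` chat format that gemma3n models inherit, and using a hypothetical helper name `build_prompt` (not part of the card) — the raw prompt string can be assembled as:

```python
# Sketch only: render a chat into a raw Gemma-style prompt string.
# In practice, prefer tokenizer.apply_chat_template so the exact
# template shipped with the model is used.

def build_prompt(messages: list[dict]) -> str:
    """Render [{'role': 'user'|'model', 'content': str}, ...] into raw text."""
    parts = ["<bos>"]
    for m in messages:
        parts.append(f"<start_of_turn>{m['role']}\n{m['content']}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to generate its turn
    return "".join(parts)

prompt = build_prompt([{"role": "user", "content": "Name one Magenta model."}])
```

The trailing `<start_of_turn>model\n` is what tells an instruction-tuned Gemma checkpoint that the next tokens belong to the assistant turn.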