language:
  - en
license: other
license_name: earthlyframes-collaborative-intelligence-license
license_link: >-
  https://github.com/brotherclone/white/blob/main/COLLABORATIVE_INTELLIGENCE_LICENSE.md
pipeline_tag: audio-classification
tags:
  - audio
  - music
  - onnx
  - chromatic
  - rainbow-table
base_model:
  - laion/larger_clap_music
  - microsoft/deberta-v3-base

Refractor CDM

Refractor CDM (Compact Disc Module) is a lightweight MLP calibration head that classifies full-mix audio recordings into one of nine "rainbow colors" — a chromatic taxonomy used in The Rainbow Table, an AI-assisted album series.

The CDM is a companion to the base Refractor ONNX model (a multimodal fusion network trained on short catalog segments). The base model works well for MIDI and short audio clips but predicts poorly on full-mix audio because CLAP embeddings are optimized for short segments. The CDM corrects this by training directly on chunked full-mix audio.

Model Details

Property	Value
Architecture	MLP with two hidden layers (1280 → 256 → 128 → 9)
Parameters	361,993
Input	CLAP audio (512-dim) + DeBERTa concept (768-dim) = 1280-dim
Output	Softmax probabilities over 9 colors (`color_probs`, shape `[batch, 9]`)
Format	ONNX (`refractor_cdm.onnx`, 1.4 MB)
Training data	3,450 chunks from 78 full-mix songs across all 9 colors
Loss	CrossEntropyLoss with label smoothing (0.1) + inverse-frequency class weights
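
As a quick sanity check on the table, the listed parameter count is consistent with three dense layers mapping 1280 → 256 → 128 → 9, counting weights and biases. The helper name below is illustrative and not part of the Refractor codebase, and float32 storage is an assumption.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 512-dim CLAP audio embedding + 768-dim DeBERTa concept embedding
input_dim = 512 + 768
sizes = [input_dim, 256, 128, 9]

n_params = mlp_param_count(sizes)
approx_bytes = n_params * 4  # assumption: weights stored as float32

print(n_params)                      # 361993
print(round(approx_bytes / 1e6, 1))  # 1.4
```

The byte estimate ignores ONNX graph overhead, but it lands close to the 1.4 MB file size listed above.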

Color Classes

Index  Color    CHROMATIC_TARGETS (temporal / spatial / ontological)
  0    Red      Past / Thing / Known
  1    Orange   Past / Thing / Imagined
  2    Yellow   Future / Place / Imagined
  3    Green    Future / Place / Forgotten
  4    Blue     Present / Person / Forgotten
  5    Indigo   Uniform / Uniform / Known+Forgotten [0.1, 0.4, 0.4]
  6    Violet   Present / Person / Known
  7    White    Uniform across all axes
  8    Black    Uniform across all axes

Targets are derived at runtime from `app/structures/concepts/chromatic_targets.py`,
which reads directly from the canonical `the_rainbow_table_colors` Pydantic model.
Earlier versions used hand-maintained copies of the targets that had diverged from the
canonical model for 7 of the 9 colors; this was corrected in April 2026
(fix-chromatic-targets-canonical-source).
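
To read a prediction, the index order in the table above can be applied to one row of `color_probs`. This is a minimal sketch: `COLOR_NAMES` and `top_color` are hypothetical names introduced here, and the probabilities are made up for illustration.

```python
# Index order follows the Color Classes table.
COLOR_NAMES = ["Red", "Orange", "Yellow", "Green", "Blue",
               "Indigo", "Violet", "White", "Black"]

def top_color(color_probs):
    """Return (name, probability) for the most likely color in one row."""
    if len(color_probs) != len(COLOR_NAMES):
        raise ValueError("expected 9 probabilities, one per color")
    idx = max(range(len(color_probs)), key=color_probs.__getitem__)
    return COLOR_NAMES[idx], color_probs[idx]

# Made-up probabilities, not real model output.
example = [0.02, 0.03, 0.10, 0.70, 0.05, 0.03, 0.03, 0.02, 0.02]
print(top_color(example))  # ('Green', 0.7)
```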

Validation Results

Evaluated on 78 labeled songs from `staged_raw_material`, using chunked scoring (30 s windows, 5 s stride) with confidence-weighted aggregation across chunks.

Color	Correct	Total	Accuracy
Red	11	12	91.7%
Orange	4	4	100.0%
Yellow	10	10	100.0%
Green	6	8	75.0%
Blue	11	11	100.0%
Indigo	10	11	90.9%
Violet	11	12	91.7%
White	9	10	90.0%
Overall	72	78	92.3%
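
The overall figure is song-weighted: total correct songs divided by total songs, not the mean of the per-color accuracies. The arithmetic can be reproduced from the table rows:

```python
# (correct, total) per color, copied from the validation table.
results = {
    "Red": (11, 12), "Orange": (4, 4), "Yellow": (10, 10), "Green": (6, 8),
    "Blue": (11, 11), "Indigo": (10, 11), "Violet": (11, 12), "White": (9, 10),
}

for color, (correct, total) in results.items():
    print(f"{color:<7}{100 * correct / total:6.1f}%")

total_correct = sum(c for c, _ in results.values())
total_songs = sum(t for _, t in results.values())
overall = 100 * total_correct / total_songs
print(f"Overall {total_correct}/{total_songs} = {overall:.1f}%")  # Overall 72/78 = 92.3%

# Unweighted mean of the per-color accuracies, for comparison.
macro = sum(100 * c / t for c, t in results.values()) / len(results)
print(f"Macro average = {macro:.1f}%")  # Macro average = 92.4%
```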

Usage

The CDM is used through the `Refractor` wrapper, which loads it automatically when `refractor_cdm.onnx` is present alongside `refractor.onnx`.

from training.refractor import Refractor

scorer = Refractor()  # CDM auto-detected

# `waveform` holds audio samples loaded beforehand at a 48 kHz sample rate.
result = scorer.score(
    audio_emb=scorer.prepare_audio(waveform, sr=48000),
    concept_emb=scorer.prepare_concept("A song about forgetting the future"),
)
# result: {"temporal": {...}, "spatial": {...}, "ontological": {...}, "confidence": 0.93}

For full-mix WAV files, use `chunk_audio` and `aggregate_chunk_scores` from `score_mix.py` to score overlapping windows and pool the results.
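
The windowing and pooling idea can be sketched in a few lines. This is not the code in `score_mix.py`: `window_starts` and `pool_chunk_probs` are hypothetical helpers, the 30 s window and 5 s stride are taken from the validation setup, and treating each chunk's confidence as its pooling weight is an assumption about what "confidence-weighted" means here.

```python
def window_starts(duration_s, window_s=30.0, stride_s=5.0):
    """Start times (seconds) of overlapping windows that fit in the track."""
    if duration_s <= window_s:
        return [0.0]  # short track: score it as a single chunk
    n = int((duration_s - window_s) // stride_s) + 1
    return [i * stride_s for i in range(n)]

def pool_chunk_probs(chunk_probs, confidences):
    """Confidence-weighted mean of per-chunk probability vectors."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    n_classes = len(chunk_probs[0])
    return [
        sum(w * p[k] for p, w in zip(chunk_probs, confidences)) / total
        for k in range(n_classes)
    ]

print(len(window_starts(60.0)))  # 7 (starts at 0, 5, ..., 30 s)

# Two chunks, two classes; the more confident chunk dominates the result.
pooled = pool_chunk_probs([[0.8, 0.2], [0.4, 0.6]], [0.9, 0.3])
print([round(p, 2) for p in pooled])  # [0.7, 0.3]
```

Weighting by confidence keeps ambiguous chunks, such as intros or sparse passages, from diluting the chunks the model is sure about.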

Training

# Phase 1 — extract CLAP + concept embeddings from staged_raw_material/
python training/extract_cdm_embeddings.py

# Phase 2 — train on Modal (A10G GPU)
modal run training/modal_train_refractor_cdm.py

# Validate
python training/validate_mix_scoring.py
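
The loss listed under Model Details combines label smoothing (0.1) with inverse-frequency class weights. The pure-Python sketch below illustrates both ideas. It is not the training script: the function names and class counts are hypothetical, the N / (K * n_c) weighting is one common convention, and PyTorch's `torch.nn.CrossEntropyLoss` (which accepts `weight` and `label_smoothing`) normalizes batches in its own way.

```python
import math

def inverse_frequency_weights(class_counts):
    """Weight each class by N / (K * n_c), so rare classes count more."""
    n_total = sum(class_counts)
    n_classes = len(class_counts)
    return [n_total / (n_classes * n) for n in class_counts]

def smoothed_weighted_ce(probs, target, weights, smoothing=0.1):
    """Cross-entropy against a label-smoothed target, scaled per class."""
    k = len(probs)
    loss = 0.0
    for c, p in enumerate(probs):
        # Smoothed target: eps / K on every class, plus 1 - eps on the true one.
        q = smoothing / k + (1.0 - smoothing if c == target else 0.0)
        loss -= weights[c] * q * math.log(p)
    return loss

# Hypothetical chunk counts for three classes, for illustration only.
weights = inverse_frequency_weights([600, 300, 100])
print([round(w, 3) for w in weights])  # [0.556, 1.111, 3.333]

# A confident correct prediction should cost less than a confident wrong one.
loss_good = smoothed_weighted_ce([0.90, 0.05, 0.05], target=0, weights=weights)
loss_bad = smoothed_weighted_ce([0.05, 0.90, 0.05], target=0, weights=weights)
print(loss_good < loss_bad)  # True
```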

Limitations

CLAP embeds audio through an internal window of at most ~10 s, so chunked scoring is essential for full-length tracks.
Green classification is the weakest at 75% (6 of 8 songs); two songs are near the Yellow/Violet boundary.
Training data is drawn from a single artist's catalog, so generalization to other music is untested.
The concept embedding path requires a DeBERTa-v3-base inference pass (~600 MB model).

Citation

Part of The Rainbow Table generative music pipeline. See brotherclone/white and earthlyframes/white-training-data.