Krea 2 Projector Explorations

Small, Krea-derived interpretability artifacts for Krea 2's text conditioning — the learned layer-mix ("multilayer feature aggregation") plus single-layer probes. Full toolkit, methods, figures, and write-up: github.com/fblissjr/krea-explorations.

Krea 2's text encoder is a frozen Qwen3-VL-4B. The DiT takes 12 selected encoder hidden-state layers [2,5,8,11,14,17,20,23,26,29,32,35] (select_layers), mixes them with cross-layer attention, and then applies a learned Linear(12 → 1) projector (txtfusion.projector). That projector's [1,12] weight is the model's own per-layer weighting, and it is identical in Raw and Turbo (cosine similarity 1.0):

layer	L2	L5	L8	L11	L14	L17	L20	L23	L26	L29	L32	L35
w	-0.05	-0.16	+0.37	+0.50	+0.71	+0.39	+0.40	-1.44	-0.51	-0.89	-0.61	+0.11

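The projector combines the layers contrastively ("mid plus, deep minus") rather than averaging them: the middle layers (L8–L20) get positive weights and the deeper layers (L23–L32) get negative ones.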
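To make the "not an average" point concrete, here is a minimal sketch in plain Python that applies the 12 weights from the table as a weighted sum. It assumes the projector has no bias term (the model card only mentions the [1,12] weight), and the `project` helper and toy `slots` are illustrative names and values, not part of Krea 2.

```python
# Layer indices and weights copied from the table above.
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]
W = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
     0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def project(slots, weights):
    """Linear(12 -> 1) with no bias (assumption): weighted sum of 12 slot vectors."""
    dim = len(slots[0])
    return [sum(w * s[d] for w, s in zip(weights, slots)) for d in range(dim)]


by_layer = dict(zip(SELECT_LAYERS, W))
mid = sum(w for layer, w in by_layer.items() if 8 <= layer <= 20)
deep = sum(w for layer, w in by_layer.items() if 23 <= layer <= 32)
total = sum(W)

# Toy input: every slot holds the same 2-d vector. A plain average would
# return that vector unchanged; the learned weights do not.
slots = [[1.0, 2.0] for _ in SELECT_LAYERS]
out = project(slots, W)

print(f"mid sum  (L8-L20):  {mid:+.2f}")
print(f"deep sum (L23-L32): {deep:+.2f}")
print(f"total:              {total:+.2f}")
print("projected toy input:", [round(x, 2) for x in out])
```

With the rounded table values, the mid block sums to a positive number, the deep block to a larger negative one, and the weights as a whole sum to a negative total rather than to 1, which is what an averaging layer would give.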

What we measured

These are characterizations of an open model's learned behavior, not of its architecture, which is public. Most are low-effort to reproduce. The full method and confidence levels are in the GitHub repo.

L20 is a learned directional attention hub. In the cross-layer attention, ~91–95% of content tokens route to layer 20. The effect is content-driven (not a padding artifact) and directional (not a magnitude sink). It holds across 5 prompts and on both Raw and Turbo. The token-side "refiner" blocks, by contrast, are diffuse, with no hub.
The projector-rebalance lever is a detail/intensity knob, not an attribute gate. Benign attributes (expression, "wet", blush) come through the aggregation and render with or without rebalancing; boosting the deep layers mainly shifts detail / contrast / intensity — consistent with the deep layers carrying fine detail.
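One plausible way to quantify a routing hub is to count, per token, which layer receives the largest attention weight. The sketch below does that in plain Python. It assumes the attention has already been reduced to one row of 12 weights per content token, and that "routes to" means argmax. Both are assumptions for illustration, and the repo's actual metric may differ. The rows are toy data, not measurements.

```python
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]


def hub_share(attn_rows, layers=SELECT_LAYERS):
    """Fraction of tokens whose largest attention weight lands on each layer."""
    counts = {layer: 0 for layer in layers}
    for row in attn_rows:
        top = max(range(len(row)), key=row.__getitem__)  # argmax over the 12 slots
        counts[layers[top]] += 1
    n = len(attn_rows)
    return {layer: c / n for layer, c in counts.items()}


def peaked_row(idx, peak=0.78, n=12):
    """Toy attention row: `peak` mass on slot `idx`, the rest spread evenly."""
    rest = (1.0 - peak) / (n - 1)
    return [peak if i == idx else rest for i in range(n)]


# Made-up example: 9 tokens peak on slot 6 (L20), 1 token peaks on slot 4 (L14).
rows = [peaked_row(6) for _ in range(9)] + [peaked_row(4)]
shares = hub_share(rows)
print({layer: s for layer, s in shares.items() if s > 0})
```

An argmax count says nothing about whether the hub is directional or a magnitude sink. Separating those would need the key and value vectors themselves, which this sketch does not model.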

Per-layer reweighting of Krea 2's conditioning was introduced by nova452/ComfyUI-ConditioningKrea2Rebalance and refined by huwhitememes/comfyui-krea2-conditioning.
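Because the projector is linear, scaling a slot before it is equivalent to scaling that slot's projector weight. The sketch below checks this on toy data in plain Python. It is not the implementation of either node credited above. The `scale_weights` and `scale_slots` helpers, the gain of 1.5, and the no-bias projector are assumptions for illustration.

```python
SELECT_LAYERS = list(range(2, 36, 3))  # [2, 5, 8, ..., 35]
W = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
     0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def project(slots, weights):
    """Linear(12 -> 1) with no bias (assumption): weighted sum of slot vectors."""
    dim = len(slots[0])
    return [sum(w * s[d] for w, s in zip(weights, slots)) for d in range(dim)]


def scale_weights(weights, gains):
    """Multiply each layer's projector weight by its gain (default 1.0)."""
    return [w * gains.get(layer, 1.0) for layer, w in zip(SELECT_LAYERS, weights)]


def scale_slots(slots, gains):
    """Multiply each layer's slot vector by its gain (default 1.0)."""
    return [[x * gains.get(layer, 1.0) for x in slot]
            for layer, slot in zip(SELECT_LAYERS, slots)]


# Hypothetical "boost the deep layers" setting.
deep_boost = {23: 1.5, 26: 1.5, 29: 1.5, 32: 1.5}

slots = [[float(i), 1.0] for i in range(12)]  # toy slot vectors
via_weights = project(slots, scale_weights(W, deep_boost))
via_slots = project(scale_slots(slots, deep_boost), W)
print(via_weights, via_slots)
```

The two results agree up to floating-point rounding, so a rebalance can be expressed either as a conditioning-side gain or as a projector-weight patch.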

Files

krea2_projector_original_weights.safetensors — a reference copy of the 12 learned projector weights above (the [1,12] tensor itself). Read-only reference, not a LoRA to apply.
solo/projector_solo_bNN_Lxx.safetensors — 12 diagnostic probes. Each is a projector .diff that, at strength 1, keeps one of the projector's 12 inputs and zeroes the other 11, so the DiT conditions on a single slot — useful to see what that slot contributes (deep slots render coherent images, shallow are noise, L14 carries text/structure, L35 alone is unusable).
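A solo probe can be pictured as a difference tensor that cancels every projector weight except one. The sketch below builds one in plain Python. It assumes a .diff patch is applied as weight + strength × diff and that the kept slot retains its learned weight. Both are assumptions about the files, and `solo_diff` and `apply_diff` are hypothetical helper names.

```python
W = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39,
     0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def solo_diff(weights, keep):
    """Diff that cancels every weight except the one at index `keep`."""
    return [0.0 if i == keep else -w for i, w in enumerate(weights)]


def apply_diff(weights, diff, strength=1.0):
    """Assumed patch rule: patched = weight + strength * diff."""
    return [w + strength * d for w, d in zip(weights, diff)]


keep = 4  # slot index 4 corresponds to L14 in select_layers
diff = solo_diff(W, keep)
full = apply_diff(W, diff, strength=1.0)
half = apply_diff(W, diff, strength=0.5)

print("strength 1.0:", full)
print("strength 0.5:", [round(x, 3) for x in half])
```

Under these assumptions, strength 1 leaves only the kept weight non-zero, and intermediate strengths fade the other 11 slots toward zero linearly.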

Important: the projector's 12 inputs are the attention-mixed slots (output of the 2 layerwise blocks), not pristine encoder layers — and because the cross-layer attention routes through L20, every slot already carries L20 content. So a "solo Lx" isolates the slot indexed by layer x, not a clean layer x. These are interpretability probes, not generation LoRAs (keeping one input by design gives a partial/degraded image).

Each solo/ file is a diffusion_model.txtfusion.projector.diff patch (one [1,12] tensor, ~300 bytes), loadable via the stock LoraLoaderModelOnly — no custom node. (ComfyUI calls the selected layers "taps".)
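Since each file holds a single small tensor, it can be inspected without extra dependencies. A safetensors file is an 8-byte little-endian header length, a JSON header, and then the raw tensor bytes. The sketch below builds a stand-in file in memory and reads it back using only the standard library. The F32 dtype and the stand-in values are assumptions, and the real files may use another dtype or carry extra metadata.

```python
import json
import struct

KEY = "diffusion_model.txtfusion.projector.diff"  # key name from the model card


def build_safetensors(name, values):
    """Build a minimal in-memory safetensors blob with one [1, N] F32 tensor."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {name: {"dtype": "F32", "shape": [1, len(values)],
                     "data_offsets": [0, len(data)]}}
    head = json.dumps(header, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(head)) + head + data


def read_tensor(blob, name):
    """Return (shape, values) for an F32 tensor stored in a safetensors blob."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len])
    meta = header[name]
    if meta["dtype"] != "F32":
        raise ValueError(f"this sketch only handles F32, got {meta['dtype']}")
    start, end = meta["data_offsets"]
    raw = blob[8 + header_len + start:8 + header_len + end]
    return meta["shape"], list(struct.unpack(f"<{(end - start) // 4}f", raw))


# Stand-in values, not the contents of any published file.
stand_in = [0.05, 0.16, -0.37, -0.5, 0.0, -0.39,
            -0.4, 1.44, 0.51, 0.89, 0.61, -0.11]
blob = build_safetensors(KEY, stand_in)
shape, values = read_tensor(blob, KEY)
print(shape, [round(v, 2) for v in values])
```

To inspect a downloaded file, read its bytes with `open(path, "rb").read()` and pass them to `read_tensor`, after checking the header for the actual key and dtype.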

License

These artifacts derive from Krea 2 and are covered by the Krea 2 Community License (see the base model linked above).

Model tree for fbjr/krea-explorations

Base model: krea/Krea-2-Raw (this model is listed as an adapter)