Krea 2 Projector Explorations
Small, Krea-derived interpretability artifacts for Krea 2's text conditioning: the learned layer mix ("multilayer feature aggregation") plus single-layer probes. Full toolkit, methods, figures, and write-up: github.com/fblissjr/krea-explorations.
Krea 2's text encoder is a frozen Qwen3-VL-4B. The DiT takes 12 selected encoder hidden-state layers
[2,5,8,11,14,17,20,23,26,29,32,35] (`select_layers`), mixes them with cross-layer attention, then collapses them with a
learned Linear(12 → 1) projector (`txtfusion.projector`). That matrix is the model's own per-layer
weighting, identical in Raw and Turbo (cosine similarity 1.0):
| layer | L2 | L5 | L8 | L11 | L14 | L17 | L20 | L23 | L26 | L29 | L32 | L35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| w | -0.05 | -0.16 | +0.37 | +0.50 | +0.71 | +0.39 | +0.40 | -1.44 | -0.51 | -0.89 | -0.61 | +0.11 |
The projector combines the layers contrastively ("mid plus, deep minus") rather than averaging them.
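The contrast with an average can be checked with a few lines of arithmetic. The sketch below is a minimal pure-Python illustration, not the model's code: it uses the rounded weights from the table, assumes the projector is a plain weighted sum over the 12 slots (any bias term is ignored), and the names `project`, `WEIGHTS`, and `identical_slots` are made up for this example.

```python
# Rounded projector weights copied from the table (slot order follows select_layers).
SELECT_LAYERS = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39, 0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def project(slots, weights=WEIGHTS):
    """Collapse 12 per-layer feature vectors into one: out[d] = sum_i w[i] * slots[i][d].

    Assumption: a bias term, if the real Linear has one, is left out.
    """
    dim = len(slots[0])
    return [sum(w * slot[d] for w, slot in zip(weights, slots)) for d in range(dim)]


positive = sum(w for w in WEIGHTS if w > 0)
negative = sum(w for w in WEIGHTS if w < 0)
total = positive + negative

# If every slot held the same vector, an average would return it unchanged.
# These weights instead scale it by their sum, which is negative.
identical_slots = [[1.0, 2.0] for _ in SELECT_LAYERS]
collapsed = project(identical_slots)

print(round(positive, 2), round(negative, 2), round(total, 2))
```

With the rounded table values, the positive weights sum to about +2.48 and the negative ones to about -3.66, so the total is near -1.18 rather than the 1.0 an average would give.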
What we measured
These are characterizations of an open model's learned behavior (not of its architecture, which is public); most are low-effort to reproduce. Full method and confidence levels are in the GitHub repo.
- L20 is a learned directional attention hub. In the cross-layer attention, ~91-95% of content tokens route to layer 20. The effect is content-driven (not a padding artifact) and directional (not a magnitude sink). It holds across 5 prompts and on both Raw and Turbo. The token-side "refiner" blocks, by contrast, are diffuse (no hub).
- The projector-rebalance lever is a detail/intensity knob, not an attribute gate. Benign attributes (expression, "wet", blush) come through the aggregation and render with or without rebalancing; boosting the deep layers mainly shifts detail / contrast / intensity, consistent with the deep layers carrying fine detail.
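One way to turn the hub observation into a number is sketched below. It assumes that "route to" means the slot receiving the largest cross-layer attention weight for a token, which the text here does not specify; the repo's actual metric may differ. The function `hub_share`, the helper `peaked_row`, and the toy data are all hypothetical.

```python
SELECT_LAYERS = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
HUB = SELECT_LAYERS.index(20)  # slot index of layer 20


def hub_share(attn, is_content, hub_slot):
    """Fraction of content tokens whose largest attention weight falls on hub_slot.

    attn: one row per token, each row holding 12 attention weights over the slots.
    is_content: one bool per token; False marks padding, which is excluded.
    """
    rows = [row for row, keep in zip(attn, is_content) if keep]
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row.index(max(row)) == hub_slot)
    return hits / len(rows)


def peaked_row(peak):
    """Toy attention row: most of the mass on one slot, the rest spread evenly."""
    return [0.7 if i == peak else 0.3 / 11 for i in range(12)]


# Made-up data for illustration only (not measured): 3 content tokens, 1 padding token.
toy_attn = [peaked_row(HUB), peaked_row(HUB), peaked_row(4), peaked_row(0)]
toy_mask = [True, True, True, False]
share = hub_share(toy_attn, toy_mask, HUB)
```

Masking padding before counting is what separates a content-driven hub from a padding artifact in this kind of check; it says nothing about direction versus magnitude, which needs a separate test.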
Per-layer reweighting of Krea 2's conditioning was introduced by nova452/ComfyUI-ConditioningKrea2Rebalance and refined by huwhitememes/comfyui-krea2-conditioning.
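As a rough picture of what per-layer reweighting does to the projector, the sketch below scales each weight by a gain. It is not the implementation of either node named above. It relies only on the projector being linear, in which case scaling slot i's input by a gain is equivalent to scaling weight i by the same gain. The function `rebalance` and the gain values are hypothetical.

```python
# Rounded projector weights from the table (slot order follows select_layers).
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39, 0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def rebalance(weights, gains):
    """Scale each projector weight by a per-slot gain (1.0 leaves a slot unchanged)."""
    if len(weights) != len(gains):
        raise ValueError("need one gain per slot")
    return [w * g for w, g in zip(weights, gains)]


# Hypothetical setting: leave L2..L20 alone and boost the deep slots L23..L35 by 25%.
gains = [1.0] * 7 + [1.25] * 5
boosted = rebalance(WEIGHTS, gains)
```

A gain above 1.0 on a negative weight makes it more negative, so "boosting" a deep slot strengthens its subtractive contribution instead of flipping its sign.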
Files
- `krea2_projector_original_weights.safetensors`: a reference copy of the 12 learned projector weights above (the `[1,12]` tensor itself). Read-only reference, not a LoRA to apply.
- `solo/projector_solo_bNN_Lxx.safetensors`: 12 diagnostic probes. Each is a projector `.diff` that, at strength 1, keeps one of the projector's 12 inputs and zeroes the other 11, so the DiT conditions on a single slot. This is useful for seeing what that slot contributes (deep slots render coherent images, shallow slots are noise, L14 carries text/structure, L35 alone is unusable).
Important: the projector's 12 inputs are the attention-mixed slots (output of the 2 layerwise blocks), not pristine encoder layers, and because the cross-layer attention routes through L20, every slot already carries L20 content. So a "solo Lx" isolates the slot indexed by layer x, not a clean layer x. These are interpretability probes, not generation LoRAs (keeping one input gives a partial/degraded image by design).
Each `solo/` file is a `diffusion_model.txtfusion.projector.diff` patch (one `[1,12]` tensor, ~300 bytes),
loadable via the stock LoraLoaderModelOnly with no custom node. (ComfyUI calls the selected layers "taps".)
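The arithmetic behind a solo probe can be sketched in a few lines. The example assumes a `.diff` patch is applied as `weight + strength * diff`, and it uses the rounded table weights rather than the stored tensor, so the numbers are illustrative. `solo_diff` and `apply_diff` are hypothetical helper names, not functions from ComfyUI or the repo.

```python
SELECT_LAYERS = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
WEIGHTS = [-0.05, -0.16, 0.37, 0.50, 0.71, 0.39, 0.40, -1.44, -0.51, -0.89, -0.61, 0.11]


def solo_diff(weights, keep_slot):
    """Diff that cancels every weight except weights[keep_slot] when added at strength 1."""
    return [0.0 if i == keep_slot else -w for i, w in enumerate(weights)]


def apply_diff(weights, diff, strength=1.0):
    """Assumed patch rule: patched = weight + strength * diff."""
    return [w + strength * d for w, d in zip(weights, diff)]


keep = SELECT_LAYERS.index(14)       # slot indexed by layer 14
diff = solo_diff(WEIGHTS, keep)
patched = apply_diff(WEIGHTS, diff)  # strength 1: only the kept slot survives
half = apply_diff(WEIGHTS, diff, strength=0.5)  # other slots are halved, not removed
```

Under this rule, a strength below 1 only attenuates the other 11 slots, so the single-slot reading applies at strength 1.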
License
These artifacts derive from Krea 2 and are covered by the Krea 2 Community License (see the base model linked above).
Model tree for fbjr/krea-explorations
Base model
krea/Krea-2-Raw