Krea-2-Engineer V1 (4B)

Follow me on X @BennyDaBall_OG !

Experimental β€” read this first. This is V1: a first, purely experimental swing at the Krea-2 line. It fine-tunes only the text encoder of Krea-2 β€” no DiT, no VAE, no diffusion backbone touched. The goal isn't better text rendering; it's a better image out the other end β€” richer composition, lighting, and material detail from the exact same prompt and seed. It is not a finished product. It's me changing the smallest possible part of the model and measuring whether the pictures got better.

Model Metadata

Field Value
License other β€” Krea 2 Community License Agreement (see LICENSE.pdf)
Base Model krea/Krea-2-Turbo (text encoder)
Architecture Qwen3-VL ~4B text tower (Qwen3VLModel), language tower only
Method SMART DoRA (r64 / a64), differentiable drift hinge
Use ComfyUI text encoder (CLIPLoader, type krea2) Β· chat GGUF companion
Format Merged bf16 safetensors
Status V1 β€” experimental, text-encoder-only

A text encoder is the part of a diffusion model that turns your words into the numbers the image model actually listens to. Krea-2-Engineer retrains just that part of Krea-2 β€” leaving the renderer, the VAE, and every other weight exactly as Krea shipped them β€” so the same prompt lands as a more composed, better-lit, more material-aware image. Same model, same sampler, same seed; the only thing that changed is how the prompt gets heard.

It's the model whispering to itself, which is legally distinct from sorcery.

Krea-2-Engineer V1 β€” base vs V1 A/B contact sheet

10 prompts, same seed per row. Left column: the stock Krea-2 encoder. Right column: Krea-2-Engineer V1. Nothing else in the pipeline changed.

What is Krea-2-Engineer?

It is a DoRA fine-tune of the Qwen3-VL text encoder that ships inside krea/Krea-2-Turbo, trained on the Z-Image-Engineer V7 corpus. The diffusion transformer, the VAE, and the rest of Krea-2 are untouched β€” this is a drop-in replacement for one component.

Because that encoder is a real Qwen3-VL model under the hood, it has a second life: you can also talk to it / run it as a prompt writer in LM Studio (a GGUF build is on the way β€” see below).

Key Use Cases

  • Drop-in image enhancement β€” swap it in for the stock Krea-2 text encoder in ComfyUI and render as normal. Same prompt, more composed image.
  • Prompt writer β€” it was trained on the Engineer prompt-rewrite corpus, so it can also expand a rock into a composed, material-aware prompt instead of the usual very detailed rock, best quality, please clap sludge.
  • A base to iterate on β€” V1 is a clean encoder-only baseline. DPO and image-grounded GRPO stages are the obvious next moves.

Under the Hood: SMART DoRA

The encoder was tuned with the same training system that built the Z-Image-Engineer line, ported to Krea's Qwen3-VL encoder. Two forces do the work, plus four light-touch regularizers.

The writer loss teaches the model to produce richer conditioning. The drift hinge is the anti-forgetting leash β€” a per-token cosine constraint that lets the encoder change freely as long as each token's representation stays inside a cone around the original Krea weights. It can learn, but it can't wander off and lobotomize the base model. Across this entire run the mean cosine to base held at 0.984 (margin 0.92) β€” a precise tune, not a blowout.

Regularizer What it Does Why it Matters
Entropic Watches the output-token entropy vs a knowledge-mass estimate Keeps the conditioning confident, not mushy
Holographic Shapes how information spreads across the encoder's depth Keeps the representation well-conditioned layer-to-layer
Topological Constrains the token-to-token manifold geometry Stops the token relationships from collapsing
Manifold Light variance penalty directly on the adapter matrices Keeps the DoRA weights healthy

All four run as frozen, fixed-shape penalties (their internal heads do not train) β€” they nudge the encoder without becoming a second thing to babysit.

The Refinement Pipeline

  1. Load the stock Krea-2 Qwen3-VL text encoder (language tower only; the vision tower is loaded but frozen and unused for T2I).
  2. Attach SMART DoRA to the text-tower projections (q/k/v/o/gate/up/down). Nothing else is trainable.
  3. Train one epoch on the V7 corpus: writer CE + 0.5Β·drift_hinge + SMART.
  4. Merge the adapter and export bf16 for transformers, plus a key-remapped krea2 encoder for ComfyUI.

Quick Start

ComfyUI (image enhancement β€” the main event)

Drop Krea2-Engineer-V1-bf16.safetensors into ComfyUI/models/text_encoders/, then in your Krea-2 Turbo workflow point the CLIPLoader at it with type: krea2. Everything else stays stock.

Verified Image Settings

Diffusion Model : krea2_turbo_fp8_scaled.safetensors   (UNETLoader, weight_dtype default)
Text Encoder    : Krea2-Engineer-V1-bf16.safetensors   (CLIPLoader, type: krea2)
VAE             : qwen_image_vae.safetensors
Resolution      : 1024 x 1024
Steps           : 8
CFG             : 1.0
Sampler         : euler
Scheduler       : simple
Negative        : (none β€” Turbo runs at CFG 1.0)

LM Studio (talk to it / prompt writer)

Because the encoder is a real Qwen3-VL model, it also runs as a chat / prompt-writer LLM. Grab the Q8_0 GGUF from the companion repo BennyDaBall/Krea-2-Engineer-V1-GGUF, load it in LM Studio, and paste the prompt-writer system prompt (in the companion card). Feed it a lazy seed like a rock and it returns one composed, cinematic image prompt β€” no tag soup.


Training Facts

I believe in open science, which is just a fancy way of saying "show the receipts."

Hardware

  • Trained locally on a single RTX 5090 (32GB).
  • PyTorch 2.10.0+cu130, CUDA 13.0, transformers 5.7, peft 0.19.1.

Dataset

  • Z-Image-Engineer V7 corpus.
  • Train rows: 74,540 Β· Eval rows: 1,400.

Configuration

Parameter Specification
Base text encoder krea/Krea-2-Turbo/text_encoder (Qwen3-VL ~4B)
Tokenizer krea/Krea-2-Turbo/tokenizer (Qwen2Tokenizer, ChatML)
Adapter SMART DoRA, rank 64 / alpha 64 / dropout 0.03, stable-dora
Target modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj (text tower only)
Conditioning tap penultimate hidden layer (hidden_states[-2])
Drift hinge cosine margin 0.92, weight 0.5, every 2 micro-steps
Schedule 1 epoch Β· 4,659 steps Β· batch 2 Γ— grad-accum 8 Β· lr 1e-4 cosine
Wall-clock ~15.5 h
Final eval loss 1.150 (from 1.317 at the first eval)
Final drift mean-cosine 0.984 (held inside the cone the entire run)

The vision tower of the Qwen3-VL encoder is never trained or used β€” this is a pure text-conditioning fine-tune.


GGUF β€” Chat / Prompt Writer

The Qwen3-VL text tower is repacked as a standard Qwen3 model and quantized, so it loads in stock LM Studio with no special build.

  • Companion repo: BennyDaBall/Krea-2-Engineer-V1-GGUF
  • Quant: Q8_0 (~4 GB, near-lossless). More quants on request.
  • Use it as a prompt enhancer β€” feed a lazy seed, get a composed cinematic prompt back, then paste that into your image workflow.

Verification & Proof

The A/B contact sheet above (evidence/krea2_ab_sheet_v1.png) is the honest evidence: 10 prompts, fixed seed per row, only the text encoder swapped between columns (stock Krea-2 encoder vs V1). No cherry-picking the sampler, no different seeds, no rewrite step β€” just the encoder. A precision-matched bf16 build of the un-trained base was used as the control so the difference is the training, not a quantization artifact.

A broader 50-prompt A/B sheet (portraits, landscapes, food, animals, products, architecture) is included at evidence/Krea2-Engineer-V1_AB_50.png β€” same controls, 100 renders, for the full picture.

This is V1. The differences are real but measured β€” a targeted lift, not a transformation. That's by design: the drift hinge deliberately keeps the encoder close to Krea's, so you do not get a full-epoch run nuking the base model's quality.


Disclaimer & Acknowledgements

This is an experimental, first-attempt, text-encoder-only fine-tune. It changes how prompts are encoded; it does not retrain Krea-2 itself, and it does not guarantee a perfect seed every single time. Diffusion is still diffusion. Use creative judgment locally.

This model is a Derivative of krea/Krea-2-Turbo β€” the text encoder has been modified (fine-tuned) by BennyDaBall. It is not an official Krea product and is not endorsed by Krea.

Thanks to:

  • Krea for releasing Krea-2 and its weights (krea/Krea-2-Turbo).
  • Tongyi-MAI and Qwen for the Qwen3-VL text-encoder backbone this all rides on.
  • The open-source maintainers behind ComfyUI, PEFT, Transformers, llama.cpp, and LM Studio.
  • My local power utility, which now classifies me as a small industrial facility.

License

Released under the Krea 2 Community License Agreement (full text in LICENSE.pdf; canonical: https://www.krea.ai/krea-2-licensing). As a Derivative of Krea-2, this model carries that license forward. In plain terms:

  • Commercial use is allowed only if your total company-wide annual revenue is under $1,000,000 USD (trailing twelve months). At or above that, you need an Enterprise License from Krea.
  • If you deploy it, you must implement reasonable content filtering, keep the "Krea" name prefix, ship the agreement + the notice below, state that the model was modified, and not relicense the Krea-derived weights under a more permissive license.

NOTICE: Krea 2 is licensed under the Krea 2 Community License Agreement. For more information, visit https://krea.ai/krea-2-licensing.


Built & trained locally with care by BennyDaBall.

Follow me on X @BennyDaBall_OG !

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for BennyDaBall/Krea-2-Engineer-V1

Base model

krea/Krea-2-Raw
Finetuned
(3)
this model