Krea 2 Turbo · Alis MLX (8-bit)

krea/Krea-2-Turbo, a 12.9B single-stream MMDiT text-to-image model, re-implemented for Apple MLX, validated as numerically faithful to the original PyTorch (cos 1.000000), and shipped as a near-lossless 8-bit build for Apple silicon.

Generated by this 8-bit MLX build (8-step Turbo, no guidance, 1024²). This is an independent, unofficial port — not an official Krea product and not endorsed by Krea.

Two builds are available; pick by size vs fidelity:
• this repo: 8-bit, 14.2 GB, near-lossless (vel-cos 0.99994)
• Krea-2-Turbo-Alis-MLX-mixed-4-8: 9.8 GB, smallest near-lossless build (down_proj + endpoints @8-bit, rest @4-bit)

The weight is not the point — the verification is

Every stage of this port was cross-checked against the original PyTorch reference (krea-ai/krea-2) before the next stage was built — float32, fixed seed, each pipeline fed its own raw-prompt inputs.

Stage	Metric vs PyTorch	Result
Text encoder (Qwen3-VL-4B)	hidden states, 12 tapped layers	cos 1.000000
Transformer (28-block DiT)	velocity field (rel-L2 3e-5)	cos 1.000000
VAE (Qwen-Image, decode)	pixel cos vs 🤗 diffusers	cos 0.9994
Full pipeline (end-to-end)	pixels, identical injected noise	cos 1.000000

Full end-to-end pixel cosine is 1.000000 (max|diff| 0.005, measured at 512²/8-step with identical injected noise; per-component parity is resolution-independent) — the MLX output is visually identical to the PyTorch reference. The transformer loads krea/Krea-2-Turbo/turbo.safetensors with zero remapping (all 430 tensor names match the module tree), and the VAE reuses mflux's already-validated QwenVAE. Raw numbers: VALIDATION_LOG.txt. (The VAE's 0.9994 is lower than the full pipeline's 1.0 only because it was tested on a random latent — an OOD torture test; on the real latents the pipeline produces, it rounds to 1.000000.)
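For readers who want to reproduce this kind of check, the three figures quoted above (cosine, rel-L2, max|diff|) can be computed along the following lines. This is a minimal NumPy sketch, not the repository's validation harness: the function name parity_metrics and the toy arrays are illustrative assumptions, and the real comparison would load saved PyTorch and MLX tensors instead.

```python
import numpy as np

def parity_metrics(ref: np.ndarray, test: np.ndarray) -> dict:
    """Compare two same-shaped tensors; accumulate in float64 for stability."""
    a = ref.astype(np.float64).ravel()
    b = test.astype(np.float64).ravel()
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    rel_l2 = float(np.linalg.norm(a - b) / np.linalg.norm(a))
    max_abs_diff = float(np.max(np.abs(a - b)))
    return {"cos": cos, "rel_l2": rel_l2, "max_abs_diff": max_abs_diff}

# Toy stand-ins (hypothetical data): a "reference" tensor and a "port"
# output that differs from it by a tiny perturbation.
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 64)).astype(np.float32)
port = ref + 1e-5 * rng.standard_normal(ref.shape).astype(np.float32)

metrics = parity_metrics(ref, port)
print(metrics)
```

Cosine alone is insensitive to a uniform scale error, which is one reason to report rel-L2 and max|diff| next to it.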

Architecture (what was ported)

Transformer — SingleStreamDiT, 12.9B: 28 blocks × width 6144, GQA (48 query / 12 KV heads, head-dim 128), per-head QK-RMSNorm, learned sigmoid output gate, SwiGLU (16384), 3-axis RoPE. A text_fusion module collapses the 12 tapped encoder layers (2 layerwise blocks → Linear(12→1) → 2 refiner blocks). Predicts the flow-matching velocity.
Text encoder — Qwen3-VL-4B-Instruct, text-only, pure-MLX. For text-only conditioning the mRoPE collapses to standard RoPE.
VAE — AutoencoderKLQwenImage (the Qwen-Image VAE), via mflux QwenVAE.
Sampler — flow-matching Euler, 8 steps, guidance 0 (distilled Turbo).
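The sampler entry above can be illustrated with a short sketch of a flow-matching Euler loop. It is written in NumPy for portability and rests on stated assumptions: a uniform time grid from t=1 (noise) to t=0 (image latent) and a hypothetical velocity_fn callable standing in for the transformer. The actual schedule and sign convention used by Krea 2 Turbo are not given in this card. With guidance 0 there is a single velocity evaluation per step and no classifier-free-guidance branch.

```python
import numpy as np

def euler_sample(velocity_fn, noise, steps=8):
    """Integrate dx/dt = v(x, t) from t=1 down to t=0 with plain Euler steps."""
    ts = np.linspace(1.0, 0.0, steps + 1)  # assumed uniform schedule
    x = noise.copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_cur)          # one model call per step
        x = x + (t_next - t_cur) * v       # the step size is negative here
    return x

# Toy check: on the straight path x_t = (1 - t) * data + t * noise the
# exact velocity is (noise - data), so Euler lands on `data`.
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 4))
noise = rng.standard_normal((4, 4))
sample = euler_sample(lambda x, t: noise - data, noise, steps=8)
```

On a perfectly straight path the number of steps does not matter; a real velocity field is only approximately straight, which is why the step count is a quality knob.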

Quickstart

Requires an Apple-silicon Mac (M1 or later) with ≥ 24 GB unified memory (32 GB+ recommended). A 16 GB machine will run out of memory at 1024²; use --width/--height 512 there. On macOS the command is python3 (not python). Source code, the web UI, and the full validation harness are on GitHub: github.com/avlp12/krea2_alis_mlx.

🖼️ Web UI — easiest (beginners start here)

python3 -m pip install mlx transformers "mflux>=0.18,<0.19" huggingface_hub gradio
hf download avlp12/Krea-2-Turbo-Alis-MLX-8bit --local-dir krea2-mlx
cd krea2-mlx
python3 app.py            # opens http://localhost:7860 — type a prompt, click Generate ✨

The first run also downloads the Qwen3-VL-4B encoder, VAE, and tokenizer from krea/Krea-2-Turbo (you accept Krea's license there), so give it a few minutes; only the 8-bit transformer lives in this repo. A 1024×1024 image takes ~50 s on an M3 Ultra (8 steps; slower chips take longer). An NSFW safety filter runs by default (redacts explicit outputs; disable with the UI toggle, --no-safety, or KREA2_DISABLE_SAFETY=1).

💡 The UI's Model dropdown switches between 8-bit and mixed-4/8 — the other build downloads on first use. (CLI: add --precision mixed-4-8.)

⌨️ Command line

python3 generate.py "a red fox in the snow, photorealistic" --out fox.png

Flags: --width/--height 512|768|1024, --steps 8, --seed 0, --num-images 2.

🐍 Python

from krea2.pipeline import Krea2Pipeline

pipe = Krea2Pipeline("transformer_8bit.safetensors", precision="8bit")
img = pipe.generate("a neon city street at night in the rain", width=1024, height=1024,
                    steps=8, seed=0)[0]
img.save("out.png")

Run the full-precision transformer instead (pulls turbo.safetensors from Krea):

python3 generate.py "a fox in the snow" --precision bf16

Quantization — and an honest note

The transformer is quantized sensitivity-graded: the 28 blocks' attention + SwiGLU matmuls (224 layers) at 8-bit, group-size 64; everything precision-critical stays bf16 — first / last projections, time-embeddings, the whole text_fusion module (incl. the Linear(12→1) projector), and all norms / modulation. The text encoder and VAE are bf16.
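To make "8-bit, group-size 64" concrete, here is a minimal NumPy sketch of group-wise affine quantization: each run of 64 weights gets its own scale and offset, and the weights are stored as integers in 0..255. It illustrates the general technique only. The helper names are hypothetical, and the rounding and packing details are assumptions, not a copy of MLX's quantizer.

```python
import numpy as np

def quantize_groups(w, bits=8, group_size=64):
    """Affine-quantize each run of `group_size` weights along the last axis.

    Assumes the last dimension is divisible by `group_size` and bits <= 8.
    """
    levels = 2**bits - 1
    g = w.reshape(-1, group_size).astype(np.float32)
    lo = g.min(axis=1, keepdims=True)               # per-group offset
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-12) / levels     # per-group step size
    q = np.clip(np.round((g - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize_groups(q, scale, lo, shape):
    """Reconstruct approximate float weights from integers, scales, offsets."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

# Hypothetical weight matrix standing in for one linear layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 256)).astype(np.float32)
q, scale, lo = quantize_groups(w)
w_hat = dequantize_groups(q, scale, lo, w.shape)

cos = float((w * w_hat).sum() / (np.linalg.norm(w) * np.linalg.norm(w_hat)))
max_err = float(np.abs(w - w_hat).max())
print(f"weight cos {cos:.6f}, max abs error {max_err:.5f}")
```

The reconstruction error per weight is bounded by half a step of its group, so small groups limit the damage a single outlier can do to its neighbours.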

Build	Size (transformer)	Velocity cos vs bf16 (mean / min)	Per-step latency @1024²
bf16 (reference)	25.6 GB	—	~5800 ms
8-bit · this release	14.2 GB	0.99994 / 0.99959	~5990 ms
mixed-4/8 (dynamic)	9.8 GB	0.99824 / 0.98710	~5990 ms
4-bit	8.2 GB	0.99760 / 0.98666	~5990 ms

Velocity cosine = mean / worst-case-min over 12 prompts × 8 denoising steps, on a fixed bf16 trajectory. Raw output: VALIDATION_LOG.txt.
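The mean / min aggregation described above could be sketched as follows. This is a NumPy illustration under assumptions: both models are taken to have been evaluated on the same latents from the bf16 run (which is how "fixed bf16 trajectory" is read here), the velocities are stacked into arrays of shape (prompts, steps, ...), and the function name and toy data are hypothetical.

```python
import numpy as np

def velocity_cosine_stats(v_ref, v_test):
    """Return (mean, min) cosine over every (prompt, step) pair.

    v_ref, v_test: arrays shaped (prompts, steps, ...) holding the velocity
    predicted by the reference and the quantized model on identical inputs.
    """
    p, s = v_ref.shape[:2]
    a = v_ref.reshape(p, s, -1).astype(np.float64)
    b = v_test.reshape(p, s, -1).astype(np.float64)
    cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))
    return float(cos.mean()), float(cos.min())

# Toy data: 12 prompts x 8 steps, with small noise standing in for
# quantization error.
rng = np.random.default_rng(0)
v_bf16 = rng.standard_normal((12, 8, 16, 32))
v_quant = v_bf16 + 0.01 * rng.standard_normal(v_bf16.shape)

mean_cos, min_cos = velocity_cosine_stats(v_bf16, v_quant)
print(f"velocity cos mean {mean_cos:.5f} / min {min_cos:.5f}")
```

Reporting the minimum alongside the mean exposes a single bad prompt or step that an average over 96 values would hide.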

Latency is essentially the same across bit-widths: the quantized builds measure ~5990 ms per step against ~5800 ms for bf16, so they are marginally slower, not faster. Generation is attention-bound, and the attention computation itself isn't quantized (only the projection matmuls around it are), so quantization gives no speedup here. On a big-RAM Mac, bf16 is the better choice (--precision bf16); this 8-bit build exists purely for a smaller download and portability, and it is near-lossless (velocity cos 0.99994). Quality was screened across multiple prompts by per-step velocity cosine on a fixed trajectory rather than by final-pixel cosine, because over an 8-step ODE final-pixel cosine conflates benign trajectory divergence with real degradation. The recipe was reviewed by a 3-lens pass (codex + an adversarial red-team agent + a from-scratch blank-slate agent).

License & attribution

This is a modified derivative of krea/Krea-2-Turbo, distributed under the Krea 2 Community License (a copy is in LICENSE; attribution in NOTICE). By using these weights you agree to that license. In particular:

Naming. Per the license, derivative model names begin with "Krea".
Commercial use is permitted only if your total annual revenue is under $1,000,000 USD; otherwise you need an enterprise license from Krea (opensource@krea.ai).
Content filtering. You must implement reasonable content-filtering safeguards in any deployment, and disclose AI-generated content where required by law. (The app ships a built-in pure-MLX NSFW filter — no PyTorch needed — on by default; if you disable it or deploy publicly the obligation is yours.)
Acceptable Use Policy. Your use must comply with Krea's Acceptable Use Policy, which the license incorporates by reference.
Not endorsed. Independent port; not an official Krea product and not endorsed by Krea.

You own the images you generate (subject to the license); Krea claims no ownership of outputs.

Credits

Krea.ai — Krea 2 (base model & reference code)
Qwen — Qwen-Image VAE & Qwen3-VL-4B text encoder
mflux — MLX diffusion framework (VAE reused from here)

📦 Source code, web UI & validation harness: github.com/avlp12/krea2_alis_mlx

Part of the Alis MLX line — see also avlp12/Krea-2-Turbo-Alis-MLX-mixed-4-8, avlp12/Lance-3B-Alis-MLX-Traced, avlp12/GLM-5.2-Alis-MLX-Dynamic-3.5bpw.

