Krea 2 Turbo · Alis MLX (8-bit)

krea/Krea-2-Turbo, a 12.9B single-stream MMDiT text-to-image model, re-implemented for Apple MLX and validated as numerically faithful (cos 1.000000) to the original PyTorch, then shipped as a near-lossless 8-bit build for Apple silicon.

[Image: pipeline]

[Image: samples]

Samples generated by this 8-bit MLX build (8-step Turbo, no guidance, 1024²). This is an independent, unofficial port: not an official Krea product and not endorsed by Krea.

Two builds are available; pick by size vs. fidelity:

  • this repo: 8-bit, 14.2 GB, near-lossless (vel-cos 0.99994)
  • Krea-2-Turbo-Alis-MLX-mixed-4-8: 9.8 GB, smallest near-lossless build (down_proj + endpoints @8-bit, rest @4-bit)


The weight is not the point; the verification is

Every stage of this port was cross-checked against the original PyTorch reference (krea-ai/krea-2) before the next stage was built: float32, fixed seed, each pipeline fed its own raw-prompt inputs.

[Image: validation]

| Stage | Metric vs PyTorch | Result |
|---|---|---|
| Text encoder (Qwen3-VL-4B) | hidden states, 12 tapped layers | cos 1.000000 |
| Transformer (28-block DiT) | velocity field (rel-L2 3e-5) | cos 1.000000 |
| VAE (Qwen-Image, decode) | pixel cos vs 🤗 diffusers | cos 0.9994 |
| Full pipeline (end-to-end) | pixels, identical injected noise | cos 1.000000 |

Full end-to-end pixel cosine is 1.000000 (max|diff| 0.005, measured at 512²/8-step with identical injected noise; per-component parity is resolution-independent), so the MLX output is visually identical to the PyTorch reference. The transformer loads krea/Krea-2-Turbo/turbo.safetensors with zero remapping (all 430 tensor names match the module tree), and the VAE reuses mflux's already-validated QwenVAE. Raw numbers: VALIDATION_LOG.txt. (The VAE's 0.9994 is lower than the full pipeline's 1.000000 only because it was tested on a random latent, an out-of-distribution torture test; on the real latents the pipeline produces, it rounds to 1.000000.)
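The three figures quoted above (cosine, relative L2, and max absolute difference) can be computed along these lines. This is a minimal NumPy sketch, not the project's validation harness: the function name parity_metrics and the toy arrays are assumptions made for illustration, standing in for real PyTorch and MLX outputs.

```python
import numpy as np

def parity_metrics(reference, candidate):
    """Compare two same-shaped tensors, e.g. a PyTorch output vs an MLX output."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    # Cosine similarity of the flattened tensors
    cos = float(ref @ cand / (np.linalg.norm(ref) * np.linalg.norm(cand)))
    # Relative L2 error: ||cand - ref|| / ||ref||
    rel_l2 = float(np.linalg.norm(cand - ref) / np.linalg.norm(ref))
    # Largest element-wise deviation
    max_abs = float(np.max(np.abs(cand - ref)))
    return {"cos": cos, "rel_l2": rel_l2, "max_abs_diff": max_abs}

# Toy data (hypothetical): a reference tensor and a slightly perturbed copy
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 16, 16))
cand = ref + 1e-5 * rng.standard_normal(ref.shape)
metrics = parity_metrics(ref, cand)
print(metrics)
```

Cosine alone ignores overall scale, which is why a magnitude-sensitive figure such as rel-L2 or max|diff| is useful alongside it.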


Architecture (what was ported)

  • Transformer (SingleStreamDiT, 12.9B): 28 blocks × width 6144, GQA (48 query / 12 KV heads, head-dim 128), per-head QK-RMSNorm, learned sigmoid output gate, SwiGLU (16384), 3-axis RoPE. A text_fusion module collapses the 12 tapped encoder layers (2 layerwise blocks → Linear(12→1) → 2 refiner blocks). Predicts the flow-matching velocity.
  • Text encoder: Qwen3-VL-4B-Instruct, text-only, pure-MLX. For text-only conditioning the mRoPE collapses to standard RoPE.
  • VAE: AutoencoderKLQwenImage (the Qwen-Image VAE), via mflux QwenVAE.
  • Sampler: flow-matching Euler, 8 steps, guidance 0 (distilled Turbo).
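To illustrate the sampler entry above, here is a minimal sketch of a flow-matching Euler loop in NumPy. It assumes the common convention x_t = (1 - t) * image + t * noise, integrated from t = 1 down to t = 0 on a uniform schedule; the real port may use a different (for example, shifted) timestep schedule. The names euler_flow_sample and toy_velocity are hypothetical, and the toy velocity field stands in for the transformer.

```python
import numpy as np

def euler_flow_sample(velocity_fn, noise, steps=8):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (sample) with Euler steps."""
    # Assumption: uniform timesteps; a real sampler may shift the schedule.
    timesteps = np.linspace(1.0, 0.0, steps + 1)
    x = noise
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t)        # the trained DiT would be called here
        x = x + (t_next - t) * v     # guidance 0: a single model call per step
    return x

# Toy velocity field: for the straight path x_t = (1 - t) * target + t * noise,
# the exact velocity is (x_t - target) / t.
target = np.full((4, 4), 0.5)

def toy_velocity(x, t):
    return (x - target) / t

noise = np.random.default_rng(0).standard_normal((4, 4))
sample = euler_flow_sample(toy_velocity, noise, steps=8)
print(np.abs(sample - target).max())
```

With guidance disabled, each step costs one forward pass, so an 8-step run is 8 transformer evaluations in total.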

Quickstart

Requires an Apple-silicon Mac (M1+) with ≥ 24 GB unified memory (32 GB+ recommended; a 16 GB machine will run out of memory at 1024², so use --width/--height 512 there). On macOS the command is python3 (not python). Source code, the web UI, and the full validation harness are on GitHub: github.com/avlp12/krea2_alis_mlx.

🖼️ Web UI (easiest; beginners start here)

python3 -m pip install mlx transformers "mflux>=0.18,<0.19" huggingface_hub gradio
hf download avlp12/Krea-2-Turbo-Alis-MLX-8bit --local-dir krea2-mlx
cd krea2-mlx
python3 app.py            # opens http://localhost:7860; type a prompt, then click Generate ✨

The first run also downloads the Qwen3-VL-4B encoder, VAE, and tokenizer from krea/Krea-2-Turbo (you accept Krea's license there), so give it a few minutes; only the 8-bit transformer lives in this repo. A 1024×1024 image takes ~50 s on an M3 Ultra (8 steps; slower chips take longer). An NSFW safety filter runs by default (redacts explicit outputs; disable with the UI toggle, --no-safety, or KREA2_DISABLE_SAFETY=1).

💡 The UI's Model dropdown switches between 8-bit and mixed-4/8; the other build downloads on first use. (CLI: add --precision mixed-4-8.)

⌨️ Command line

python3 generate.py "a red fox in the snow, photorealistic" --out fox.png

Flags: --width/--height 512|768|1024, --steps 8, --seed 0, --num-images 2.

🐍 Python

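from krea2.pipeline import Krea2Pipeline

# Load the 8-bit transformer weights shipped in this repo
pipe = Krea2Pipeline("transformer_8bit.safetensors", precision="8bit")
# generate() returns a sequence of images; index 0 is the first one
img = pipe.generate("a neon city street at night in the rain", width=1024, height=1024,
                    steps=8, seed=0)[0]
img.save("out.png")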
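To compare several seeds for one prompt, the call above can be wrapped in a small loop. The helper below is a hypothetical sketch (generate_variations is not part of the krea2 package); it uses only the generate() arguments shown in the snippet above and accepts any object exposing that method.

```python
def generate_variations(pipe, prompt, seeds, size=1024, steps=8):
    """Render one image per seed, keeping the prompt and settings fixed."""
    images = {}
    for seed in seeds:
        # generate() is indexed with [0], as in the Python snippet above
        images[seed] = pipe.generate(prompt, width=size, height=size,
                                     steps=steps, seed=seed)[0]
    return images

# Usage (assumes `pipe` was created as in the Python snippet above):
#   for seed, img in generate_variations(pipe, "a red fox in the snow", [0, 1, 2]).items():
#       img.save(f"fox_seed{seed}.png")
```

Loading the pipeline once and reusing it across seeds avoids reloading the weights for every image.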

Run the full-precision transformer instead (pulls turbo.safetensors from Krea):

python3 generate.py "a fox in the snow" --precision bf16

Quantization, and an honest note

The transformer is quantized in a sensitivity-graded way: the attention and SwiGLU matmuls of the 28 blocks (224 layers) are 8-bit with group-size 64, while everything precision-critical stays bf16 (first / last projections, time-embeddings, the whole text_fusion module including the Linear(12→1) projector, and all norms / modulation). The text encoder and VAE are bf16.
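As a rough illustration of what "8-bit, group-size 64" means, the sketch below applies group-wise affine quantization to a random weight matrix in NumPy. It shows the general technique only, under the assumption of a per-group scale and offset; it is not the kernel used by MLX or by this release, and the function names are hypothetical.

```python
import numpy as np

def quantize_groupwise(w, bits=8, group_size=64):
    """Affine-quantize each run of `group_size` weights along the last axis (bits <= 8)."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    g = w.reshape(rows, cols // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    levels = 2 ** bits - 1
    # One scale and one offset (w_min) per group; guard against flat groups
    scale = np.maximum((w_max - w_min) / levels, 1e-12)
    q = np.clip(np.round((g - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_groupwise(q, scale, w_min, shape):
    """Reconstruct approximate weights from codes, scales, and offsets."""
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 256)).astype(np.float32)
q, scale, w_min = quantize_groupwise(w, bits=8, group_size=64)
w_hat = dequantize_groupwise(q, scale, w_min, w.shape)

a, b = w.ravel().astype(np.float64), w_hat.ravel().astype(np.float64)
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"8-bit weight cosine: {cos:.6f}")
```

Smaller groups track local weight ranges more closely at the cost of storing more scales and offsets. If each group of 64 carries a 16-bit scale and a 16-bit offset (an assumption here), the overhead is about 0.5 bits per weight.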

| Build | Size (transformer) | Velocity cos vs bf16 (mean / min) | Per-step latency @1024² |
|---|---|---|---|
| bf16 (reference) | 25.6 GB | n/a | ~5800 ms |
| 8-bit · this release | 14.2 GB | 0.99994 / 0.99959 | ~5990 ms |
| mixed-4/8 (dynamic) | 9.8 GB | 0.99824 / 0.98710 | ~5990 ms |
| 4-bit | 8.2 GB | 0.99760 / 0.98666 | ~5990 ms |

Velocity cosine = mean / worst-case-min over 12 prompts × 8 denoising steps, on a fixed bf16 trajectory. Raw output: VALIDATION_LOG.txt.

Latency is nearly identical across bit-widths (the quantized builds run about 3% slower per step than bf16). Generation is attention-bound, and the attention computation itself is not quantized (only the projection weights are), so quantization gives no speedup here. On a big-RAM Mac, bf16 is the better choice (--precision bf16); this 8-bit build exists purely for a smaller download and portability, and it is near-lossless (velocity cos 0.99994). Quality was screened across multiple prompts by per-step velocity cosine on a fixed trajectory rather than by final-pixel cosine: over an 8-step ODE, final-pixel cosine conflates benign trajectory divergence with real degradation. The recipe was reviewed by a 3-lens pass (codex + an adversarial red-team agent + a from-scratch blank-slate agent).
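The fixed-trajectory screening described above can be sketched as follows: both models are fed the same latent at every step, and the latent is advanced with the reference velocity only, so per-step errors cannot compound into trajectory drift. This is a NumPy illustration under assumptions, not the project's harness; the linear toy "models", the uniform schedule, and the function names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def velocity_cosines(ref_velocity, test_velocity, noise, steps=8):
    """Per-step cosine between two velocity fields along the reference trajectory."""
    timesteps = np.linspace(1.0, 0.0, steps + 1)  # assumed uniform schedule
    x, scores = noise, []
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v_ref = ref_velocity(x, t)
        v_test = test_velocity(x, t)     # same latent x goes to both models
        scores.append(cosine(v_ref, v_test))
        x = x + (t_next - t) * v_ref     # advance with the reference only
    return scores

# Toy stand-ins: a "reference" linear field and a slightly perturbed copy
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 32)) / np.sqrt(32)
A_perturbed = A + 1e-3 * rng.standard_normal(A.shape)

def ref_fn(x, t):
    return x @ A

def quant_fn(x, t):
    return x @ A_perturbed

# 12 starting latents play the role of 12 prompts; 8 steps each
all_scores = np.array([
    velocity_cosines(ref_fn, quant_fn, rng.standard_normal((16, 32)))
    for _ in range(12)
])
print(f"mean {all_scores.mean():.5f} / min {all_scores.min():.5f}")
```

Reporting the minimum alongside the mean exposes a single bad prompt or step that an average would hide.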


License & attribution

This is a modified derivative of krea/Krea-2-Turbo, distributed under the Krea 2 Community License (a copy is in LICENSE; attribution in NOTICE). By using these weights you agree to that license. In particular:

  • Naming. Per the license, derivative model names begin with "Krea".
  • Commercial use is permitted only if your total annual revenue is under $1,000,000 USD; otherwise you need an enterprise license from Krea (opensource@krea.ai).
  • Content filtering. You must implement reasonable content-filtering safeguards in any deployment, and disclose AI-generated content where required by law. (The app ships a built-in pure-MLX NSFW filter, with no PyTorch needed, on by default; if you disable it or deploy publicly, the obligation is yours.)
  • Acceptable Use Policy. Your use must comply with Krea's Acceptable Use Policy, which the license incorporates by reference.
  • Not endorsed. Independent port; not an official Krea product and not endorsed by Krea.

You own the images you generate (subject to the license); Krea claims no ownership of outputs.

Credits

  • Krea.ai: Krea 2 (base model & reference code)
  • Qwen: Qwen-Image VAE & Qwen3-VL-4B text encoder
  • mflux: MLX diffusion framework (VAE reused from here)

📦 Source code, web UI & validation harness: github.com/avlp12/krea2_alis_mlx

Part of the Alis MLX line; see also avlp12/Krea-2-Turbo-Alis-MLX-mixed-4-8, avlp12/Lance-3B-Alis-MLX-Traced, avlp12/GLM-5.2-Alis-MLX-Dynamic-3.5bpw.
