Krea 2 Turbo · Alis MLX (8-bit)
krea/Krea-2-Turbo, a 12.9B single-stream MMDiT text-to-image model, re-implemented for Apple MLX and validated as numerically faithful (cos 1.000000) to the original PyTorch, then shipped as a near-lossless 8-bit build for Apple silicon.
Generated by this 8-bit MLX build (8-step Turbo, no guidance, 1024²). This is an independent, unofficial port: not an official Krea product and not endorsed by Krea.
Two builds are available; pick by size versus fidelity:
- this repo: 8-bit, 14.2 GB, near-lossless (vel-cos 0.99994)
- Krea-2-Turbo-Alis-MLX-mixed-4-8: 9.8 GB, smallest near-lossless build (down_proj + endpoints at 8-bit, rest at 4-bit)
The weight is not the point; the verification is
Every stage of this port was cross-checked against the original PyTorch reference
(krea-ai/krea-2) before the next stage was built:
float32, fixed seed, each pipeline fed its own raw-prompt inputs.
| Stage | Metric vs PyTorch | Result |
|---|---|---|
| Text encoder (Qwen3-VL-4B) | hidden states, 12 tapped layers | cos 1.000000 |
| Transformer (28-block DiT) | velocity field (rel-L2 3e-5) | cos 1.000000 |
| VAE (Qwen-Image, decode) | pixel cos vs 🤗 diffusers | cos 0.9994 |
| Full pipeline (end-to-end) | pixels, identical injected noise | cos 1.000000 |
Full end-to-end pixel cosine is 1.000000 (max|diff| 0.005, measured at 512²/8-step with
identical injected noise; per-component parity is resolution-independent), so the MLX output is
visually identical to the PyTorch reference. The transformer loads
krea/Krea-2-Turbo/turbo.safetensors with zero remapping (all 430 tensor names match the
module tree), and the VAE reuses mflux's already-validated QwenVAE. Raw numbers:
VALIDATION_LOG.txt. (The VAE's 0.9994 is lower than the full pipeline's
1.0 only because it was tested on a random latent, an out-of-distribution (OOD) torture test; on the real latents
the pipeline produces, it rounds to 1.000000.)
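The three numbers quoted above (cosine, rel-L2, max|diff|) can be computed along the following lines. This is a minimal NumPy sketch, not the repo's validation harness: the function name `parity_metrics` and the synthetic tensors are illustrative assumptions, and the real harness compares MLX outputs against PyTorch outputs rather than random arrays.

```python
import numpy as np

def parity_metrics(reference, candidate):
    """Compare two same-shaped tensors after flattening them to float64 vectors."""
    ref = np.asarray(reference, dtype=np.float64).ravel()
    cand = np.asarray(candidate, dtype=np.float64).ravel()
    diff = cand - ref
    return {
        # Cosine similarity: 1.0 means the two tensors point the same way.
        "cos": float(ref @ cand / (np.linalg.norm(ref) * np.linalg.norm(cand))),
        # Relative L2 error: size of the difference relative to the reference.
        "rel_l2": float(np.linalg.norm(diff) / np.linalg.norm(ref)),
        # Largest element-wise deviation.
        "max_abs_diff": float(np.max(np.abs(diff))),
    }

# Synthetic stand-ins for a reference output and a ported output (assumption).
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 16, 16)).astype(np.float32)
cand = ref + 1e-5 * rng.standard_normal(ref.shape).astype(np.float32)

metrics = parity_metrics(ref, cand)
print(metrics)
```

Cosine alone is insensitive to a uniform scale error, which is presumably why the table pairs it with rel-L2 for the transformer and with max|diff| for the full pipeline.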
Architecture (what was ported)
- Transformer: SingleStreamDiT, 12.9B: 28 blocks × width 6144, GQA (48 query / 12 KV heads, head-dim 128), per-head QK-RMSNorm, learned sigmoid output gate, SwiGLU (16384), 3-axis RoPE. A text_fusion module collapses the 12 tapped encoder layers (2 layerwise blocks → Linear(12→1) → 2 refiner blocks). Predicts the flow-matching velocity.
- Text encoder: Qwen3-VL-4B-Instruct, text-only, pure-MLX. For text-only conditioning the mRoPE collapses to standard RoPE.
- VAE: AutoencoderKLQwenImage (the Qwen-Image VAE), via mflux QwenVAE.
- Sampler: flow-matching Euler, 8 steps, guidance 0 (distilled Turbo).
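To make the sampler entry concrete, here is a minimal sketch of an 8-step flow-matching Euler loop. It is written under stated assumptions and is not the repo's code: the time convention (t = 1 is pure noise, t = 0 is the sample), the uniform timestep spacing, and the toy velocity function are all illustrative. The real pipeline may use a shifted schedule, and its velocity comes from the transformer conditioned on the text embedding.

```python
import numpy as np

def euler_flow_sample(velocity_fn, noise, steps=8):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (sample) with Euler steps."""
    timesteps = np.linspace(1.0, 0.0, steps + 1)  # assumed uniform schedule
    x = np.array(noise, dtype=np.float64)
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t_cur)       # one model evaluation per step
        x = x + (t_next - t_cur) * v    # Euler update; the step size is negative
    return x

# Toy setup: with x_t = (1 - t) * data + t * noise, the true velocity is the
# constant (noise - data), so Euler integration recovers `data` exactly.
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 4))
noise = rng.standard_normal((4, 4))

def toy_velocity(x, t):
    return noise - data

sample = euler_flow_sample(toy_velocity, noise, steps=8)
print(np.allclose(sample, data))
```

With guidance 0 there is presumably no second, unconditional forward pass per step, so each of the 8 steps costs a single transformer evaluation.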
Quickstart
Requires an Apple-silicon Mac (M1+) with ≥ 24 GB unified memory (32 GB+ recommended; 16 GB
will run out of memory at 1024², so use --width/--height 512). On macOS the commands are python3
(not python). Source code, the web UI, and the full validation harness are on GitHub:
github.com/avlp12/krea2_alis_mlx.
🖼️ Web UI: easiest (beginners start here)
python3 -m pip install mlx transformers "mflux>=0.18,<0.19" huggingface_hub gradio
hf download avlp12/Krea-2-Turbo-Alis-MLX-8bit --local-dir krea2-mlx
cd krea2-mlx
python3 app.py # opens http://localhost:7860; type a prompt, click Generate ✨
The first run also downloads the Qwen3-VL-4B encoder, VAE, and tokenizer from
krea/Krea-2-Turbo (you accept Krea's license there), so give it a few minutes; only the
8-bit transformer lives in this repo. A 1024×1024 image takes ~50 s on an M3 Ultra (8
steps; slower chips take longer). An NSFW safety filter runs by default (redacts explicit
outputs; disable with the UI toggle, --no-safety, or KREA2_DISABLE_SAFETY=1).
💡 The UI's Model dropdown switches between 8-bit and mixed-4/8; the other build
downloads on first use. (CLI: add --precision mixed-4-8.)
⌨️ Command line
python3 generate.py "a red fox in the snow, photorealistic" --out fox.png
Flags: --width/--height 512|768|1024, --steps 8, --seed 0, --num-images 2.
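For a seed sweep, the flags above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repo: it only builds argument lists from the flags listed here, and it assumes generate.py takes space-separated flag values and is run from the repo directory.

```python
import shlex

ALLOWED_SIZES = (512, 768, 1024)  # the sizes listed for --width/--height

def build_generate_command(prompt, out, seed=0, size=1024, steps=8):
    """Return an argv list for generate.py (hypothetical helper, not in the repo)."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}, got {size}")
    return [
        "python3", "generate.py", prompt,
        "--out", out,
        "--width", str(size), "--height", str(size),
        "--steps", str(steps),
        "--seed", str(seed),
    ]

# Three seeds at 512x512, each written to its own file.
commands = [
    build_generate_command(
        "a red fox in the snow, photorealistic", f"fox_{seed}.png", seed=seed, size=512
    )
    for seed in range(3)
]
for cmd in commands:
    print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Each list can be passed to `subprocess.run(cmd, check=True)`; keeping the prompt as a single list element avoids shell-quoting problems with commas and spaces.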
🐍 Python
from krea2.pipeline import Krea2Pipeline
pipe = Krea2Pipeline("transformer_8bit.safetensors", precision="8bit")
img = pipe.generate("a neon city street at night in the rain", width=1024, height=1024,
steps=8, seed=0)[0]
img.save("out.png")
Run the full-precision transformer instead (pulls turbo.safetensors from Krea):
python3 generate.py "a fox in the snow" --precision bf16
Quantization, and an honest note
The transformer is quantized in a sensitivity-graded way: the attention + SwiGLU matmuls of the 28 blocks
(224 layers) are stored at 8-bit, group-size 64. Everything precision-critical stays bf16:
first / last projections, time-embeddings, the whole text_fusion module (incl. the
Linear(12→1) projector), and all norms / modulation. The text encoder and VAE are bf16.
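As an illustration of what "8-bit, group-size 64" means, the sketch below quantizes each run of 64 weights with its own scale and offset. It is a plain NumPy approximation under stated assumptions: MLX has its own quantization routines, and this code does not reproduce their exact rounding, packing, or storage layout. The function names and the random weight matrix are illustrative.

```python
import numpy as np

def quantize_groupwise(w, bits=8, group_size=64):
    """Affine-quantize each run of `group_size` weights along the last axis (bits <= 8)."""
    rows, cols = w.shape
    assert cols % group_size == 0, "last axis must be a multiple of group_size"
    groups = w.reshape(rows, cols // group_size, group_size).astype(np.float32)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    levels = 2**bits - 1
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_groupwise(q, scale, w_min, shape):
    """Map integer codes back to floats using the per-group scale and offset."""
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

# Illustrative weight matrix (assumption): 8 rows, 256 columns, so 4 groups per row.
rng = np.random.default_rng(0)
w = (0.02 * rng.standard_normal((8, 256))).astype(np.float32)

q, scale, w_min = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, w_min, w.shape)
max_err = float(np.abs(w - w_hat).max())
print(q.shape, max_err)
```

Because each group has its own range, the rounding error is bounded by half a quantization step of that group, which is the intuition behind calling the 8-bit build near-lossless.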
| Build | Size (transformer) | Velocity cos vs bf16 (mean / min) | Per-step latency @1024² |
|---|---|---|---|
| bf16 (reference) | 25.6 GB | n/a | ~5800 ms |
| 8-bit · this release | 14.2 GB | 0.99994 / 0.99959 | ~5990 ms |
| mixed-4/8 (dynamic) | 9.8 GB | 0.99824 / 0.98710 | ~5990 ms |
| 4-bit | 8.2 GB | 0.99760 / 0.98666 | ~5990 ms |
Velocity cosine = mean / worst-case-min over 12 prompts × 8 denoising steps, on a fixed bf16
trajectory. Raw output: VALIDATION_LOG.txt.
Latency is nearly identical across precisions: generation is attention-bound (attention isn't
quantized), so quantization gives no speedup here. On a big-RAM Mac, bf16 is the better
choice (--precision bf16); this 8-bit build exists purely for a smaller download and
portability, and it's near-lossless (velocity cos 0.99994). Quality was screened by per-step
velocity cosine on a fixed trajectory across multiple prompts, not by final-pixel cosine, because an 8-step ODE makes
final-pixel cosine conflate benign trajectory divergence with real degradation. The recipe was
reviewed by a 3-lens pass (codex + an adversarial red-team agent + a from-scratch blank-slate agent).
License & attribution
This is a modified derivative of krea/Krea-2-Turbo,
distributed under the Krea 2 Community License (a copy is in
LICENSE; attribution in NOTICE). By using these weights you agree to that
license. In particular:
- Naming. Per the license, derivative model names begin with "Krea".
- Commercial use is permitted only if your total annual revenue is under $1,000,000 USD;
otherwise you need an enterprise license from Krea (opensource@krea.ai).
- Content filtering. You must implement reasonable content-filtering safeguards in any deployment, and disclose AI-generated content where required by law. (The app ships a built-in pure-MLX NSFW filter, with no PyTorch needed, on by default; if you disable it or deploy publicly, the obligation is yours.)
- Acceptable Use Policy. Your use must comply with Krea's Acceptable Use Policy, which the license incorporates by reference.
- Not endorsed. Independent port; not an official Krea product and not endorsed by Krea.
You own the images you generate (subject to the license); Krea claims no ownership of outputs.
Credits
- Krea.ai: Krea 2 (base model & reference code)
- Qwen: Qwen-Image VAE & Qwen3-VL-4B text encoder
- mflux: MLX diffusion framework (VAE reused from here)
📦 Source code, web UI & validation harness: github.com/avlp12/krea2_alis_mlx
Part of the Alis MLX line; see also avlp12/Krea-2-Turbo-Alis-MLX-mixed-4-8,
avlp12/Lance-3B-Alis-MLX-Traced, avlp12/GLM-5.2-Alis-MLX-Dynamic-3.5bpw.


