FLUX.2 Klein 4B WebGPU Low-Bit Bundle
This is the browser model bundle for the static FLUX.2 WebGPU app at
ryanhlewis/flux2-webgpu.
It is derived from black-forest-labs/FLUX.2-klein-4B and the ONNX WebGPU q4
bundle MarkShark2/flux2-klein-4b-onnx-webgpu-q4, and adds a custom low-bit
WebGPU transformer runtime payload used by that app.
Contents
| Component | Size |
|---|---|
| Custom low-bit transformer assets | 7.27 GB / 6.77 GiB |
| Upstream q4 text encoder and ONNX transformer fallback | 5.13 GB / 4.77 GiB |
| VAE decoder/encoder, tokenizer, config overlay | 0.18 GB / 0.17 GiB |
| Precomputed default prompt context/projection | 0.01 GB / 0.01 GiB |
| Total staged model repository payload | 12.58 GB / 11.72 GiB |
The default static UI uses the custom low-bit transformer runtime. Arbitrary prompts also require the q4 text encoder. The q4 ONNX transformer files are included only as a fallback path.
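The table lists each size in both decimal GB (10^9 bytes) and binary GiB (2^30 bytes). A minimal sketch of that conversion follows; `formatSize` is a hypothetical helper rather than part of the app, and the byte count is illustrative, chosen to match the first table row.

```javascript
// Convert a byte count into the "GB / GiB" form used in the table.
// GB is decimal (10^9 bytes); GiB is binary (2^30 bytes).
function formatSize(bytes) {
  const gb = bytes / 1e9;
  const gib = bytes / 2 ** 30;
  return `${gb.toFixed(2)} GB / ${gib.toFixed(2)} GiB`;
}

// Illustrative byte count only, not a measured file size.
console.log(formatSize(7.27e9)); // "7.27 GB / 6.77 GiB"
```

Because each row is rounded independently, the rows may differ from the stated total by 0.01 in the last digit.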
Static Browser Use
Serve the app files from the GitHub repository or the linked Static Space and set:
```js
globalThis.FLUX2_MODEL_BASE_URL = "https://huggingface.co/ryanhlewis/flux2-klein-4b-webgpu-lowbit/resolve/main";
globalThis.FLUX2_RUNTIME_BASE_URL = globalThis.FLUX2_MODEL_BASE_URL;
globalThis.FLUX2_CUSTOM_KERNEL_BASE_URL = `${globalThis.FLUX2_MODEL_BASE_URL}/custom_lowbit`;
```
No Python, Gradio server, CUDA, or ROCm runtime is required for generation in the Static Space. The user's browser downloads model files and runs inference locally through WebGPU.
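Since inference runs entirely in the browser, a page that embeds the app may want to check for WebGPU before starting a multi-gigabyte download. The following is a minimal sketch using the standard `navigator.gpu.requestAdapter()` call; `checkWebGPU` is a hypothetical helper name, and the app itself may perform its own detection differently.

```javascript
// Returns { ok, reason } describing whether WebGPU looks usable.
// `nav` defaults to the browser's navigator; passing it in keeps the
// function testable outside a browser. (Hypothetical helper, not app code.)
async function checkWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) {
    return { ok: false, reason: "navigator.gpu is not available" };
  }
  // requestAdapter() resolves to null when no suitable adapter exists.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { ok: false, reason: "no WebGPU adapter was returned" };
  }
  return { ok: true, reason: "WebGPU adapter found" };
}

// Log the result; a real page would gate the model download on `ok`.
checkWebGPU().then(({ ok, reason }) => console.log(ok, reason));
```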
Benchmarks
Benchmarks were run locally in headed Chrome/Edge with WebGPU on the development machine, after background preparation, with the result cache disabled, the default prompt, fixed seeds, the custom low-bit WebGPU backend, and exact full-resolution render/decode:
| Size | Browser elapsed | Transformer | VAE | Notes |
|---|---|---|---|---|
| 256x256 | 0.817s | 0.603s | 0.213s | exact |
| 512x512 | 4.438s | 2.922s | 1.514s | exact |
| 768x768 | 10.723s | 8.009s | 2.712s | exact |
| 1024x1024 | 45.434s | 24.108s | 21.324s | exact, VAE-heavy |
These are browser WebGPU numbers, not PyTorch/CUDA numbers. First load will also include model download and browser cache population time.
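To see where time goes at each resolution, the table rows can be reduced to a VAE share and a per-megapixel cost. This is a small sketch over the figures copied from the table above; the `breakdown` helper and its field names are illustrative only.

```javascript
// Rows copied from the benchmark table (all times in seconds).
const runs = [
  { size: 256, elapsed: 0.817, transformer: 0.603, vae: 0.213 },
  { size: 512, elapsed: 4.438, transformer: 2.922, vae: 1.514 },
  { size: 768, elapsed: 10.723, transformer: 8.009, vae: 2.712 },
  { size: 1024, elapsed: 45.434, transformer: 24.108, vae: 21.324 },
];

// Derive the VAE fraction of elapsed time and seconds per megapixel.
function breakdown(run) {
  const megapixels = (run.size * run.size) / 1e6;
  return {
    size: run.size,
    vaeShare: run.vae / run.elapsed,
    secondsPerMegapixel: run.elapsed / megapixels,
  };
}

for (const row of runs.map(breakdown)) {
  const pct = (row.vaeShare * 100).toFixed(0);
  const perMp = row.secondsPerMegapixel.toFixed(1);
  console.log(`${row.size}x${row.size}: VAE ${pct}% of elapsed, ${perMp} s/MP`);
}
```

On these numbers the VAE accounts for about a quarter of elapsed time at 256x256 and 768x768 but about 47% at 1024x1024, which is what the "VAE-heavy" note refers to.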
Example Outputs
Default robot prompt, seed 123, quality/custom WebGPU path:
Memory Expectations
WebGPU does not expose exact VRAM usage to JavaScript, so the app reports observable browser memory and known tensor/buffer sizes. The warmed custom transformer path accounts for roughly 3.7 GB of loaded GPU/buffer data. Exact 256x256 and 512x512 renders are realistic on an 8 GB WebGPU adapter when browser memory limits allow. Exact 768x768 and 1024x1024 renders are memory-heavy; the 1024x1024 benchmark completed with the browser heap at around 4.2-4.3 GiB during VAE decode.
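Although VRAM usage is not exposed, a page can still read the adapter's buffer limits and, in Chromium-based browsers, the JavaScript heap size. The sketch below is one way to collect those observable values; `snapshotMemory` is a hypothetical helper, not the app's own reporting code, and `performance.memory` is a non-standard Chromium extension whose figure may not match the browser heap numbers quoted above.

```javascript
const GIB = 2 ** 30;

// `adapter` is a GPUAdapter obtained from navigator.gpu.requestAdapter().
// `perf` defaults to the global performance object; performance.memory
// is non-standard and may be undefined outside Chromium-based browsers.
function snapshotMemory(adapter, perf = globalThis.performance) {
  const limits = adapter.limits;
  const heap = perf && perf.memory ? perf.memory.usedJSHeapSize : null;
  return {
    // Largest single GPUBuffer the adapter can support.
    maxBufferGiB: limits.maxBufferSize / GIB,
    // Largest storage-buffer binding visible to a shader.
    maxStorageBindingGiB: limits.maxStorageBufferBindingSize / GIB,
    // Used JS heap, or null when the browser does not report it.
    jsHeapGiB: heap === null ? null : heap / GIB,
  };
}
```

These limits apply per buffer or per binding, so they bound how weights can be split across buffers rather than the total amount of data that fits on the adapter.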
License
This bundle is derived from upstream FLUX.2 Klein 4B assets. Use and redistribution must comply with the upstream model license and applicable Hugging Face terms.

