FLUX.2 Klein 4B WebGPU Low-Bit Bundle

This is the browser model bundle for the static FLUX.2 WebGPU app at ryanhlewis/flux2-webgpu. It is derived from black-forest-labs/FLUX.2-klein-4B and the ONNX WebGPU q4 bundle by MarkShark2/flux2-klein-4b-onnx-webgpu-q4, with an additional custom low-bit WebGPU transformer runtime payload used by ryanhlewis/flux2-webgpu.

Component	Size
Custom low-bit transformer assets	7.27 GB / 6.77 GiB
Upstream q4 text encoder and ONNX transformer fallback	5.13 GB / 4.77 GiB
VAE decoder/encoder, tokenizer, config overlay	0.18 GB / 0.17 GiB
Precomputed default prompt context/projection	0.01 GB / 0.01 GiB
Total staged model repository payload	12.58 GB / 11.72 GiB
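The table lists each size in both decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes). A minimal sketch of that conversion follows; `formatSize` is a hypothetical helper, and the byte count passed to it is a round illustrative figure, not a measured file size.

```javascript
// Format a byte count as decimal GB (10^9 bytes) and binary GiB (2^30 bytes).
// Hypothetical helper for illustration; not part of the app.
function formatSize(bytes) {
  const gb = bytes / 1e9;
  const gib = bytes / 2 ** 30;
  return `${gb.toFixed(2)} GB / ${gib.toFixed(2)} GiB`;
}

// Illustrative input only: a round 7.27e9 bytes, not an exact file size.
console.log(formatSize(7.27e9));
```

The gap between the two units matters when comparing the payload against browser storage quotas or adapter memory, which are often quoted in binary units.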

The static UI defaults to the custom low-bit transformer runtime. Arbitrary prompts require the q4 text encoder. The q4 ONNX transformer files are included as a fallback path.

Static Browser Use

Serve the app files from the GitHub repository or the linked Static Space and set:

globalThis.FLUX2_MODEL_BASE_URL = "https://huggingface.co/ryanhlewis/flux2-klein-4b-webgpu-lowbit/resolve/main";
globalThis.FLUX2_RUNTIME_BASE_URL = globalThis.FLUX2_MODEL_BASE_URL;
globalThis.FLUX2_CUSTOM_KERNEL_BASE_URL = `${globalThis.FLUX2_MODEL_BASE_URL}/custom_lowbit`;
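All three globals derive from a single repository URL, so they can be set from one place. The sketch below assumes the app reads these globals at startup and that the assignment runs before the app scripts load; `fluxBaseUrls` is a hypothetical helper name, not something the app provides.

```javascript
// Derive the three base URLs from one model repository URL.
// Hypothetical helper; only the global names come from the instructions above.
function fluxBaseUrls(modelBaseUrl) {
  const base = modelBaseUrl.replace(/\/+$/, ""); // drop any trailing slashes
  return {
    FLUX2_MODEL_BASE_URL: base,
    FLUX2_RUNTIME_BASE_URL: base,
    FLUX2_CUSTOM_KERNEL_BASE_URL: `${base}/custom_lowbit`,
  };
}

// Assumption: this runs before the app scripts are loaded.
Object.assign(
  globalThis,
  fluxBaseUrls("https://huggingface.co/ryanhlewis/flux2-klein-4b-webgpu-lowbit/resolve/main/")
);
```

Stripping the trailing slash keeps the derived `custom_lowbit` path free of a double slash if the base URL is pasted with one.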

No Python, Gradio server, CUDA, or ROCm runtime is required for generation in the Static Space. The user's browser downloads model files and runs inference locally through WebGPU.
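Because inference runs in the browser, it can help to confirm that WebGPU is available before downloading several gigabytes of model files. The following is a minimal sketch using the standard `navigator.gpu.requestAdapter()` call; the `checkWebGPU` name and its status strings are hypothetical, and whether the app performs a similar check is not stated here.

```javascript
// Report WebGPU availability as a short status string.
// `nav` defaults to the browser's navigator; pass a stub to test outside a browser.
async function checkWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return "unsupported"; // browser has no WebGPU API
  const adapter = await nav.gpu.requestAdapter(); // resolves to null if no adapter
  return adapter ? "ready" : "no-adapter";
}

checkWebGPU().then((status) => console.log(`WebGPU status: ${status}`));
```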

Benchmarks

Local headed Chrome/Edge WebGPU benchmarks on the development machine, after background preparation, with result cache disabled, default prompt, fixed seeds, custom low-bit WebGPU backend, and exact full-resolution render/decode:

Size	Browser elapsed	Transformer	VAE	Notes
256x256	0.817s	0.603s	0.213s	exact
512x512	4.438s	2.922s	1.514s	exact
768x768	10.723s	8.009s	2.712s	exact
1024x1024	45.434s	24.108s	21.324s	exact, VAE-heavy

These are browser WebGPU numbers, not PyTorch/CUDA numbers. The first load also includes model download and browser cache population time.
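The "VAE-heavy" note on the 1024x1024 row can be checked directly from the table. The sketch below copies the benchmark rows and computes the VAE stage's share of browser elapsed time; `vaeShare` is a hypothetical helper. By this arithmetic the VAE stage is close to half of the elapsed time at 1024x1024 and roughly a quarter to a third at the smaller sizes.

```javascript
// Benchmark rows copied from the table above (times in seconds).
const runs = [
  { size: 256, elapsed: 0.817, transformer: 0.603, vae: 0.213 },
  { size: 512, elapsed: 4.438, transformer: 2.922, vae: 1.514 },
  { size: 768, elapsed: 10.723, transformer: 8.009, vae: 2.712 },
  { size: 1024, elapsed: 45.434, transformer: 24.108, vae: 21.324 },
];

// Percentage of browser elapsed time spent in the VAE stage.
function vaeShare(run) {
  return (100 * run.vae) / run.elapsed;
}

for (const run of runs) {
  console.log(`${run.size}x${run.size}: VAE ${vaeShare(run).toFixed(1)}% of elapsed`);
}
```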

Example Outputs

Default robot prompt, seed 123, quality/custom WebGPU path:

[Images: sample outputs at 256x256, 512x512, and 1024x1024]

Memory Expectations

WebGPU does not expose exact VRAM usage to JavaScript, so the app reports observable browser memory and known tensor/buffer sizes. The warmed custom transformer path accounts for roughly 3.7 GB of loaded GPU/buffer data. Exact 256x256 and 512x512 renders are realistic on an 8 GB WebGPU adapter when browser memory limits allow it. Exact 768x768 and 1024x1024 renders are memory-heavy; the 1024x1024 benchmark completed with the browser heap at around 4.2-4.3 GiB during VAE decode.
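Although total VRAM is not visible, a WebGPU adapter does report per-buffer limits that give a rough sense of what it can hold. The sketch below reads two standard `GPUSupportedLimits` fields from an adapter; `describeLimits` is a hypothetical helper, these values are limits on a single buffer or binding rather than total memory, and whether the app inspects them is not stated here.

```javascript
// Summarize buffer-related adapter limits in GiB.
// These are per-buffer and per-binding limits, not total VRAM.
function describeLimits(adapter) {
  const toGiB = (bytes) => (bytes / 2 ** 30).toFixed(2);
  const { maxBufferSize, maxStorageBufferBindingSize } = adapter.limits;
  return {
    maxBufferGiB: toGiB(maxBufferSize),
    maxStorageBindingGiB: toGiB(maxStorageBufferBindingSize),
  };
}

// Browser-only usage; skipped where WebGPU is unavailable.
if (globalThis.navigator && globalThis.navigator.gpu) {
  globalThis.navigator.gpu.requestAdapter().then((adapter) => {
    if (adapter) console.log(describeLimits(adapter));
  });
}
```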

License

This bundle is derived from upstream FLUX.2 Klein 4B assets. Use and redistribution must comply with the upstream model license and applicable Hugging Face terms.
