FLUX.2 Klein 4B WebGPU Low-Bit Bundle

This is the browser model bundle for the static FLUX.2 WebGPU app at ryanhlewis/flux2-webgpu. It is derived from black-forest-labs/FLUX.2-klein-4B and the ONNX WebGPU q4 bundle MarkShark2/flux2-klein-4b-onnx-webgpu-q4, and adds a custom low-bit WebGPU transformer runtime payload used by that app.

Contents

| Component | Size |
| --- | --- |
| Custom low-bit transformer assets | 7.27 GB / 6.77 GiB |
| Upstream q4 text encoder and ONNX transformer fallback | 5.13 GB / 4.77 GiB |
| VAE decoder/encoder, tokenizer, config overlay | 0.18 GB / 0.17 GiB |
| Precomputed default prompt context/projection | 0.01 GB / 0.01 GiB |
| Total staged model repository payload | 12.58 GB / 11.72 GiB |
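
Each size is listed in decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes). A minimal sketch of that conversion follows; the helper name `gbToGiB` is illustrative and not part of the app.

```javascript
// Convert decimal gigabytes (1 GB = 10^9 bytes) to binary gibibytes (1 GiB = 2^30 bytes).
// gbToGiB is a hypothetical helper for illustration, not an app API.
function gbToGiB(gb) {
  const bytes = gb * 1e9;
  return bytes / 2 ** 30;
}

// Figures taken from the component table above.
const customAssetsGiB = gbToGiB(7.27).toFixed(2);
const totalGiB = gbToGiB(12.58).toFixed(2);
console.log(`${customAssetsGiB} GiB custom assets, ${totalGiB} GiB total`);
```

The published figures are rounded to two decimals, so a row converted from its rounded GB value can differ from the listed GiB value in the last digit.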

The default static UI uses the custom low-bit transformer runtime. Arbitrary prompts require the q4 text encoder. The q4 ONNX transformer files are included as a fallback path.

Static Browser Use

Serve the app files from the GitHub repository or the linked Static Space and set:

globalThis.FLUX2_MODEL_BASE_URL = "https://huggingface.co/ryanhlewis/flux2-klein-4b-webgpu-lowbit/resolve/main";
globalThis.FLUX2_RUNTIME_BASE_URL = globalThis.FLUX2_MODEL_BASE_URL;
globalThis.FLUX2_CUSTOM_KERNEL_BASE_URL = `${globalThis.FLUX2_MODEL_BASE_URL}/custom_lowbit`;
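
The three values follow the Hugging Face `resolve/<revision>` URL pattern, so they can be derived from a repository id. The sketch below assumes the app reads these globals when its scripts load. `buildFlux2Urls` is a hypothetical helper; only the `FLUX2_*` names and the `custom_lowbit` subfolder come from the snippet above.

```javascript
// Hypothetical helper: derive the three base URLs from a Hugging Face repo id.
// Only the FLUX2_* global names and the custom_lowbit subfolder come from the snippet above.
function buildFlux2Urls(repoId, revision = "main") {
  const modelBase = `https://huggingface.co/${repoId}/resolve/${encodeURIComponent(revision)}`;
  return {
    FLUX2_MODEL_BASE_URL: modelBase,
    FLUX2_RUNTIME_BASE_URL: modelBase,
    FLUX2_CUSTOM_KERNEL_BASE_URL: `${modelBase}/custom_lowbit`,
  };
}

// Assumption: the globals must be set before the app scripts run.
Object.assign(globalThis, buildFlux2Urls("ryanhlewis/flux2-klein-4b-webgpu-lowbit"));
```

Passing a commit hash as `revision` instead of `main` would pin the bundle to a fixed snapshot, which may help keep browser caches consistent across updates.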

No Python, Gradio server, CUDA, or ROCm runtime is required for generation in the Static Space. The user's browser downloads model files and runs inference locally through WebGPU.
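
Because inference runs in the browser, it can be useful to confirm WebGPU support before starting the multi-gigabyte download. The following is a minimal sketch using the standard `navigator.gpu.requestAdapter()` call; the function names `hasWebGPU` and `describeWebGPU` are illustrative, and this is not necessarily how the app performs its own check.

```javascript
// Returns true when the WebGPU entry point exists on the given navigator-like object.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && nav.gpu);
}

// Requests an adapter and reports two limits that matter for large model buffers.
// Hypothetical helper for illustration, not an app API.
async function describeWebGPU(nav = globalThis.navigator) {
  if (!hasWebGPU(nav)) {
    return { supported: false, reason: "navigator.gpu is unavailable" };
  }
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) {
    return { supported: false, reason: "no suitable GPU adapter" };
  }
  return {
    supported: true,
    maxBufferSize: adapter.limits.maxBufferSize,
    maxStorageBufferBindingSize: adapter.limits.maxStorageBufferBindingSize,
  };
}

describeWebGPU().then((info) => console.log(info));
```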

Benchmarks

Local headed Chrome/Edge WebGPU benchmarks on the development machine, after background preparation, with result cache disabled, default prompt, fixed seeds, custom low-bit WebGPU backend, and exact full-resolution render/decode:

| Size | Browser elapsed | Transformer | VAE | Notes |
| --- | --- | --- | --- | --- |
| 256x256 | 0.817s | 0.603s | 0.213s | exact |
| 512x512 | 4.438s | 2.922s | 1.514s | exact |
| 768x768 | 10.723s | 8.009s | 2.712s | exact |
| 1024x1024 | 45.434s | 24.108s | 21.324s | exact, VAE-heavy |

These are browser WebGPU numbers, not PyTorch/CUDA numbers. First load will also include model download and browser cache population time.
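
To see where the time goes at each size, the benchmark rows can be broken down into the VAE's share of elapsed time and the time not attributed to either stage. This sketch only does arithmetic on the published figures; the field and function names are illustrative.

```javascript
// Timings in seconds, copied from the benchmark table above.
const benchmarks = [
  { size: 256, elapsed: 0.817, transformer: 0.603, vae: 0.213 },
  { size: 512, elapsed: 4.438, transformer: 2.922, vae: 1.514 },
  { size: 768, elapsed: 10.723, transformer: 8.009, vae: 2.712 },
  { size: 1024, elapsed: 45.434, transformer: 24.108, vae: 21.324 },
];

// Derive the VAE share of elapsed time, the time not attributed to either stage,
// and the elapsed time normalized per megapixel of output.
function summarize({ size, elapsed, transformer, vae }) {
  return {
    size,
    vaeShare: vae / elapsed,
    otherSeconds: elapsed - transformer - vae,
    secondsPerMegapixel: elapsed / ((size * size) / 1e6),
  };
}

for (const row of benchmarks.map(summarize)) {
  console.log(
    `${row.size}x${row.size}: VAE ${(row.vaeShare * 100).toFixed(1)}% of elapsed, ` +
      `${row.otherSeconds.toFixed(3)}s other, ${row.secondsPerMegapixel.toFixed(1)}s per megapixel`
  );
}
```

At 1024x1024 the VAE accounts for nearly half of the elapsed time, compared with roughly a quarter to a third at the smaller sizes, which matches the "VAE-heavy" note.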

Example Outputs

Default robot prompt, seed 123, quality/custom WebGPU path:

[Images: example outputs at 256, 512, and 1024]

Memory Expectations

WebGPU does not expose exact VRAM usage to JavaScript, so the app reports observable browser memory and known tensor/buffer sizes. The warmed custom transformer path accounts for roughly 3.7 GB of loaded GPU/buffer data. Exact 256 and 512 are realistic on an 8 GB WebGPU adapter when browser memory limits allow it. Exact 768 and 1024 are memory-heavy; the 1024 benchmark completed with browser heap around 4.2-4.3 GiB during VAE decode.
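
As an illustration of "observable browser memory", Chromium-based browsers expose the non-standard `performance.memory` object, which reports JavaScript heap usage but not GPU memory. The sketch below assumes that kind of source; whether the app uses this exact API is not stated here, and `readHeapGiB` is a hypothetical helper.

```javascript
const GIB = 2 ** 30;

// Reads the JS heap figures from a performance-like object.
// performance.memory is a non-standard Chromium API; it reports the JS heap, not VRAM.
// Returns null when the API is absent (for example in Firefox or Safari).
function readHeapGiB(perf = globalThis.performance) {
  const mem = perf && perf.memory;
  if (!mem) return null;
  return {
    usedGiB: mem.usedJSHeapSize / GIB,
    limitGiB: mem.jsHeapSizeLimit / GIB,
  };
}

const heap = readHeapGiB();
console.log(
  heap ? `JS heap: ${heap.usedGiB.toFixed(2)} GiB used` : "performance.memory is unavailable"
);
```

GPU buffer usage has to be tracked separately, for example by summing the byte sizes of the buffers the runtime allocates.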

License

This bundle is derived from upstream FLUX.2 Klein 4B assets. Use and redistribution must comply with the upstream model license and applicable Hugging Face terms.
