Lens (base) — MLX pre-quantized tiers (SceneWorks)

A native-MLX, pre-quantized re-host of the base Lens model (microsoft/Lens, MIT) for on-device Apple Silicon inference via the mlx-gen-lens provider in mlx-gen (SceneWorks). The heavy components are packed offline, so a tier loads directly: no transient dense copy of the weights is materialized and no quantization runs in the app (epic 8506, sc-8767).

Microsoft removed microsoft/Lens from the Hub. The base DiT here was recovered from the public, ungated re-package Comfy-Org/Lens (diffusion_models/lens_bf16.safetensors), whose keys match the diffusers LensTransformer2DModel state dict byte for byte. Base Lens and Lens-Turbo differ only in the DiT weights, so this re-host reuses the shared gpt-oss-20b text encoder, Flux.2 VAE, tokenizer, and scheduler from SceneWorks/lens-turbo-mlx.

Base Lens is undistilled, so it needs a higher step count (about 20–26) with CFG around 5.0; the lens id in mlx-gen-lens defaults to 20 steps at CFG 5.0. The distilled Turbo, by contrast, runs at 4 steps with guidance 1.0.
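To make the guidance numbers concrete, here is a minimal sketch of classifier-free guidance in its standard formulation. It is an illustration only: whether mlx-gen-lens combines predictions exactly this way is an assumption, and the `SAMPLING_DEFAULTS` table and the `lens-turbo` key are hypothetical names that merely mirror the step and guidance values quoted above.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Hypothetical lookup mirroring the values quoted in this card.
# The "lens-turbo" key is an assumed id, not taken from mlx-gen-lens.
SAMPLING_DEFAULTS = {
    "lens": {"steps": 20, "guidance": 5.0},
    "lens-turbo": {"steps": 4, "guidance": 1.0},
}

if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    for name, cfg in SAMPLING_DEFAULTS.items():
        print(name, cfg_combine(cond, uncond, cfg["guidance"]))
```

Under this formulation a scale of 1.0 returns the conditional prediction unchanged, which is consistent with the distilled Turbo not relying on guidance.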

Tiers

Each subdirectory is a full, self-contained turnkey snapshot (the diffusers multi-component tree — transformer/, text_encoder/, vae/, tokenizer/, scheduler/, model_index.json):

| Tier | Dir | What is packed |
|---|---|---|
| Q4 (default) | `q4/` | DiT + gpt-oss encoder MoE experts → MLX group-64 affine 4-bit |
| Q8 | `q8/` | DiT + gpt-oss encoder MoE experts → MLX group-64 affine 8-bit |
| bf16 | `bf16/` | dense mirror of the source (no quantization) |
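Because each tier lives in its own subdirectory, a single tier can be fetched without pulling the others. The sketch below assumes the repo id `SceneWorks/lens-mlx` and uses `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter; `tier_patterns` and `download_tier` are hypothetical helpers, not part of mlx-gen.

```python
import os

VALID_TIERS = ("q4", "q8", "bf16")


def tier_patterns(tier):
    """Return the allow-list pattern that selects one tier subdirectory."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {VALID_TIERS}")
    return [f"{tier}/*"]


def download_tier(tier="q4", repo_id="SceneWorks/lens-mlx"):
    """Download only one tier and return the local path of its snapshot tree.

    The repo id is an assumption; adjust it if the repository lives elsewhere.
    """
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(repo_id=repo_id, allow_patterns=tier_patterns(tier))
    return os.path.join(local_root, tier)


if __name__ == "__main__":
    print(download_tier("q4"))
```

The returned directory should contain the transformer/, text_encoder/, vae/, tokenizer/, and scheduler/ folders plus model_index.json described above.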

Two components are quantized (matching the load-time .quantize scope):

- DiT: img_in/txt_in/proj_out, plus every block's fused-QKV attention projections (img_qkv/txt_qkv/to_out.0/to_add_out) and SwiGLU MLPs. The timestep embedder, AdaLN modulations, and all norms stay full precision.
- gpt-oss-20b encoder MoE experts: the source ships these as MXFP4; the packed tiers store them as MLX group-64 affine Q4/Q8 (stacked experts.{gate_up,down}_proj.{weight,scales,biases}). The router, attention, embeddings, and norms stay dense.
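For readers unfamiliar with group-wise affine quantization, the following pure-Python sketch shows the idea behind the weight/scales/biases triplet: each run of 64 weights gets its own scale and bias, and each weight is stored as a small integer code. This is a simplified illustration, not the MLX kernel. The real entry point in MLX is `mx.quantize`, and this sketch is not expected to match its output bit for bit. The storage estimate assumes one 16-bit scale and one 16-bit bias per group.

```python
def quantize_group(values, bits):
    """Affine-quantize one group: code = round((v - bias) / scale)."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo  # lo plays the role of the per-group bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: v ~= code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row, bits=4, group_size=64):
    """Split a row into fixed-size groups and quantize each independently."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


def bits_per_weight(bits, group_size=64, meta_bits=16):
    """Effective storage per weight, assuming 16-bit scale and bias per group."""
    return bits + 2 * meta_bits / group_size


if __name__ == "__main__":
    row = [i / 127 - 0.5 for i in range(128)]
    for codes, scale, bias in quantize_row(row, bits=4):
        restored = dequantize_group(codes, scale, bias)
        print(len(codes), round(scale, 5), round(restored[0], 5))
    print(bits_per_weight(4), bits_per_weight(8))
```

Under the stated assumption, the per-group metadata adds half a bit per weight on top of the nominal 4 or 8 bits.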

The VAE (the shared Flux.2 decoder) always runs in f32 and is shipped dense in every tier.

The pack is byte-identical to what the load-time quantizer produces (bf16 cast, group 64), verified in-repo (mlx-gen-lens convert/quant byte-identity tests) and by an on-device render gate.
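A byte-identity claim like this can be spot-checked by hashing two files and comparing digests. The sketch below is a generic approach using only the standard library; it is not the in-repo mlx-gen-lens test, and the file paths passed in are whatever pair of artifacts you want to compare.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return sha256_file(path_a) == sha256_file(path_b)
```

Comparing digests rather than tensors catches differences in headers, key order, and padding as well as in the weight values themselves.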

License

MIT, inherited from microsoft/Lens. The shared text encoder is openai/gpt-oss-20b (Apache-2.0) and the VAE is black-forest-labs/FLUX.2-dev (Apache-2.0). This is a format re-host; all model weights and credit belong to the original authors (Microsoft Research; OpenAI; Black Forest Labs).
