Lens (base): MLX pre-quantized tiers (SceneWorks)

Native-MLX, pre-quantized re-host of the base Lens model (microsoft/Lens, MIT) for on-device Apple-Silicon inference via mlx-gen's mlx-gen-lens provider (SceneWorks). The heavy components are packed offline, so a tier loads directly with no transient dense copy of the weights and no in-app quantization (epic 8506, sc-8767).

Microsoft removed microsoft/Lens from the Hub; the base DiT here was recovered from the public ungated re-package Comfy-Org/Lens (diffusion_models/lens_bf16.safetensors), whose keys are byte-identical to the diffusers LensTransformer2DModel state dict. Base Lens and Lens-Turbo differ only in the DiT weights; this re-host reuses the shared gpt-oss-20b text encoder + Flux.2 VAE + tokenizer + scheduler from SceneWorks/lens-turbo-mlx.

Base Lens is undistilled, so use a higher step count (~20–26) with CFG ~5.0 (the mlx-gen-lens lens id defaults to 20 steps / CFG 5.0), unlike the distilled Turbo (4 steps / guidance 1.0).
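To make the step/guidance difference concrete, here is a minimal sketch of the two presets and of the standard classifier-free guidance combination. The names SamplingPreset, BASE, TURBO and cfg_combine are hypothetical and are not part of mlx-gen-lens; whether the provider applies guidance in exactly this form is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    """Hypothetical container for the step/guidance values quoted in the text."""
    steps: int
    guidance: float


# Values taken from the text; the names BASE and TURBO are illustrative only.
BASE = SamplingPreset(steps=20, guidance=5.0)
TURBO = SamplingPreset(steps=4, guidance=1.0)


def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

With scale 1.0 the formula reduces to the conditional prediction alone, which is consistent with the distilled Turbo setting needing no separate unconditional pass.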

Tiers

Each subdirectory is a full, self-contained turnkey snapshot (the diffusers multi-component tree: transformer/, text_encoder/, vae/, tokenizer/, scheduler/, model_index.json):

Tier          Dir     What is packed
Q4 (default)  q4/     DiT + gpt-oss encoder MoE experts → MLX group-64 affine 4-bit
Q8            q8/     DiT + gpt-oss encoder MoE experts → MLX group-64 affine 8-bit
bf16          bf16/   dense mirror of the source (no quantization)
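Because each tier directory is meant to be self-contained, a quick local check that a downloaded tier has every listed component can catch a partial download. The sketch below uses only the directory and file names given above; the function name missing_components and the repo_root argument are hypothetical.

```python
from pathlib import Path

# Component names as listed in the text above.
EXPECTED_DIRS = ("transformer", "text_encoder", "vae", "tokenizer", "scheduler")
EXPECTED_FILES = ("model_index.json",)
# Tier name -> subdirectory, as in the tier table.
TIERS = {"q4": "q4", "q8": "q8", "bf16": "bf16"}


def missing_components(repo_root, tier="q4"):
    """Return the expected entries that are absent from a tier directory."""
    tier_dir = Path(repo_root) / TIERS[tier]
    missing = [d for d in EXPECTED_DIRS if not (tier_dir / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (tier_dir / f).is_file()]
    return missing
```

An empty list means the layout matches the description; it says nothing about whether the weight files inside are complete or valid.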

Two components are quantized (matching the load-time .quantize scope):

  • DiT: img_in/txt_in/proj_out + every block's fused-QKV attention projections (img_qkv/txt_qkv/to_out.0/to_add_out) and SwiGLU MLPs. The timestep embedder, AdaLN modulations, and all norms stay full precision.
  • gpt-oss-20b encoder MoE experts: the source ships these as MXFP4; the packed tiers store them as MLX group-64 affine Q4/Q8 (stacked experts.{gate_up,down}_proj.{weight,scales,biases}). The router / attention / embeddings / norms stay dense.
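For readers unfamiliar with "group-64 affine" quantization, the following pure-Python sketch shows the textbook min/max form: each run of 64 weights gets its own scale and bias, and each weight is stored as a small integer code. It is an illustration only; the function names are hypothetical, MLX's actual kernels may differ in rounding and bit-packing details, and the 16-bit size assumed for scales and biases in bits_per_weight is an assumption.

```python
def quantize_group(values, bits):
    """Affine-quantize one group; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # A constant group has no range; any non-zero scale works then.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row, bits=4, group_size=64):
    """Split a row into fixed-size groups and quantize each independently."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


def bits_per_weight(bits, group_size=64, meta_bits=16):
    """Storage cost per weight, counting one scale and one bias per group."""
    return bits + 2 * meta_bits / group_size
```

Under these assumptions the per-group metadata adds 0.5 bit per weight, so the Q4 tier costs about 4.5 bits per quantized weight and the Q8 tier about 8.5.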

The VAE (the shared Flux.2 decoder) always runs in f32 and is shipped dense in every tier.

The pack is byte-identical to what the load-time quantizer produces (bf16 cast, group 64), verified in-repo (mlx-gen-lens convert/quant byte-identity tests) and by an on-device render gate.
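A byte-identity claim of this kind can be spot-checked locally by hashing two files. The sketch below is not the in-repo test, whose contents are not shown here; the helper names are hypothetical, and it assumes both files were written by a serializer that emits headers and metadata deterministically, since otherwise identical tensors can still produce different file bytes.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight shards are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files have the same size and the same SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap early exit before hashing
    return sha256_file(a) == sha256_file(b)
```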

License

MIT, inherited from microsoft/Lens. The shared text encoder is openai/gpt-oss-20b (Apache-2.0) and the VAE is black-forest-labs/FLUX.2-dev (Apache-2.0). This is a format re-host; all model weights and credit belong to the original authors (Microsoft Research; OpenAI; Black Forest Labs).
