SANA-WM_bidirectional Overlay

SGLang overlay metadata for running Efficient-Large-Model/SANA-WM_bidirectional with the diffusion runtime.

This repo contains only overlay metadata and a custom materializer; it does not contain model weights. SGLang downloads the source weights from the upstream NVIDIA/NVlabs model repo (Efficient-Large-Model/SANA-WM_bidirectional), then this overlay's materializer rewrites the source directory into the Diffusers-style layout consumed by ComposedPipelineBase.

The generated layout targets SanaWMTwoStagePipeline: stage 1 runs the SANA-WM dual-branch DiT, and stage 2 loads the upstream LTX-2 refiner from refiner/ before VAE decoding.

Usage

After the corresponding SGLang registry entry is merged and pinned:

sglang generate \
  --model-path Efficient-Large-Model/SANA-WM_bidirectional \
  --prompt "A drone flyover of a snow-capped mountain at sunrise" \
  --num-frames 321 \
  --save-output

For local testing before the built-in registry is available, either point directly at this overlay repo:

sglang generate \
  --model-path sjmshsh/SANA-WM_bidirectional-overlay \
  --prompt "A drone flyover of a snow-capped mountain at sunrise" \
  --num-frames 321 \
  --save-output

or register the source model through the environment:

export SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY='{
  "Efficient-Large-Model/SANA-WM_bidirectional": {
    "overlay_repo_id": "sjmshsh/SANA-WM_bidirectional-overlay",
    "overlay_revision": "main"
  }
}'
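When the launcher is itself a Python script, the same registry value can be built with json.dumps rather than hand-quoted in the shell. This is a minimal sketch: the environment variable name and the two keys come from the snippet above, while the helper name registry_env is hypothetical and not part of SGLang.

```python
import json
import os

# Same mapping as the shell export above: source repo id -> overlay location.
REGISTRY = {
    "Efficient-Large-Model/SANA-WM_bidirectional": {
        "overlay_repo_id": "sjmshsh/SANA-WM_bidirectional-overlay",
        "overlay_revision": "main",
    }
}


def registry_env(base_env=None):
    """Return a copy of the environment with the overlay registry set.

    Hypothetical helper for illustration; only the variable name and
    the JSON shape are taken from this README.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY"] = json.dumps(REGISTRY)
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when invoking sglang generate, which keeps the registry scoped to that one child process instead of the whole shell session.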

When adding this overlay to BUILTIN_MODEL_OVERLAY_REGISTRY in an SGLang PR, replace main with the pushed overlay commit SHA so the entry is pinned.

Source / overlay layout

| Source (NVlabs upstream) | Overlay output (Diffusers-style) |
| --- | --- |
| `config.yaml` | (parsed; drives synthesized configs below) |
| `dit/sana_wm_1600m_720p.safetensors` | `transformer/diffusion_pytorch_model.safetensors` |
| (synthesized) | `transformer/config.json` |
| `vae/*` | `vae/*` |
| (synthesized) | `scheduler/scheduler_config.json` |
| (downloaded from `google/gemma-2-2b-it`) | `text_encoder/`, `tokenizer/` |
| `refiner/transformer/*` | `refiner/transformer/*` |
| `refiner/connectors/*` | `refiner/connectors/*` |
| `refiner/text_encoder/*` | `refiner/text_encoder/*` |

The stage-1 text encoder (google/gemma-2-2b-it) is a gated repo. Accept the Gemma license on Hugging Face once before first run. The materializer pins it to revision 299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8. The stage-2 refiner text encoder is copied from the upstream SANA-WM source repo under refiner/text_encoder.
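The file-level part of the layout table can be expressed as a small path-mapping function. The sketch below is an illustration of that table only, not the contents of _overlay/materialize.py: the names RENAMES, PASSTHROUGH_PREFIXES, and plan_copies are hypothetical, and the synthesized configs and the Gemma download are deliberately left out.

```python
# Illustrative only: mirrors the layout table, not the real materializer.

# Source files that land under a different name in the overlay output.
RENAMES = {
    "dit/sana_wm_1600m_720p.safetensors":
        "transformer/diffusion_pytorch_model.safetensors",
}

# Source subtrees that keep their relative paths.
PASSTHROUGH_PREFIXES = (
    "vae/",
    "refiner/transformer/",
    "refiner/connectors/",
    "refiner/text_encoder/",
)


def plan_copies(source_files):
    """Map each source-relative path to its overlay-relative path.

    Files that are only parsed (config.yaml) or are not listed in the
    table are skipped rather than copied.
    """
    plan = {}
    for rel in source_files:
        if rel in RENAMES:
            plan[rel] = RENAMES[rel]
        elif rel.startswith(PASSTHROUGH_PREFIXES):
            plan[rel] = rel
    return plan
```

Keeping the mapping as data makes it easy to compare against the table when upstream restructures the source repo.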

Source revision

The materializer is tested against source commit 90e0ff3b8f1f9b54a92b4b707edeaa27073aec84 of Efficient-Large-Model/SANA-WM_bidirectional. The current SGLang overlay loader records this in _overlay/overlay_manifest.json, but the effective source revision is still controlled by the model download path or SGLang's registry/config. If upstream restructures the repo, update _overlay/overlay_manifest.json and _overlay/materialize.py, and bump materializer_version.
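Because the effective revision is decided outside the overlay, it can be useful to check which commit a downloaded source directory actually corresponds to. The sketch below relies on the Hugging Face cache convention of storing each download under a snapshots/<commit-sha>/ directory; it assumes the source directory is such a cache path, and the function names are hypothetical rather than part of SGLang or this overlay.

```python
from pathlib import Path

# Source commit the materializer was tested against (from this README).
TESTED_SOURCE_REVISION = "90e0ff3b8f1f9b54a92b4b707edeaa27073aec84"


def snapshot_revision(source_dir):
    """Return the commit SHA encoded in a Hugging Face cache path.

    Returns None when the path has no snapshots/<sha> component,
    for example a plain local checkout.
    """
    parts = Path(source_dir).parts
    if "snapshots" in parts:
        idx = parts.index("snapshots")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


def matches_tested_revision(source_dir):
    """True only if the path resolves to the tested source commit."""
    return snapshot_revision(source_dir) == TESTED_SOURCE_REVISION
```

A mismatch does not necessarily mean materialization will fail; it only signals that the source tree is not the one the materializer was tested against.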

Files

model_index.json — Diffusers-style pipeline class and component spec consumed by SGLang.
_overlay/overlay_manifest.json — declares the source repo id, required stage-1 and stage-2 source files, and the materializer script.
_overlay/materialize.py — translates the NVlabs layout into the Diffusers-style tree.

Notes

This overlay is intended for SGLang builds that include SanaWMTwoStagePipeline and the SANA-WM LTX-2 refiner stage.
The upstream model weights remain governed by the upstream model repo and any gated dependency licenses.
