SANA-WM_bidirectional Overlay
SGLang overlay metadata for running Efficient-Large-Model/SANA-WM_bidirectional with the diffusion runtime.
This repo contains overlay metadata and a custom materializer only. It does not contain model weights. SGLang downloads the source weights from the upstream NVIDIA/NVlabs model repo, then this overlay rewrites the source directory into the Diffusers-style layout consumed by ComposedPipelineBase.
The generated layout targets SanaWMTwoStagePipeline: stage 1 runs the SANA-WM dual-branch DiT, and stage 2 loads the upstream LTX-2 refiner from refiner/ before VAE decoding.
Usage
After the corresponding SGLang registry entry is merged and pinned:
sglang generate \
--model-path Efficient-Large-Model/SANA-WM_bidirectional \
--prompt "A drone flyover of a snow-capped mountain at sunrise" \
--num-frames 321 \
--save-output
For local testing before the built-in registry is available, either point directly at this overlay repo:
sglang generate \
--model-path sjmshsh/SANA-WM_bidirectional-overlay \
--prompt "A drone flyover of a snow-capped mountain at sunrise" \
--num-frames 321 \
--save-output
or register the source model through the environment:
export SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY='{
"Efficient-Large-Model/SANA-WM_bidirectional": {
"overlay_repo_id": "sjmshsh/SANA-WM_bidirectional-overlay",
"overlay_revision": "main"
}
}'
For an SGLang PR, replace main with the pushed overlay commit SHA in BUILTIN_MODEL_OVERLAY_REGISTRY.
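The environment registry value is plain JSON, so it can also be generated rather than hand-written, which makes it easier to swap main for a commit SHA. The following is a minimal sketch, not part of SGLang: the helper name build_overlay_registry is hypothetical, and only the environment variable name and the two JSON keys come from the example above.

```python
import json
import os

# Environment variable name as used in the shell example above.
ENV_VAR = "SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY"


def build_overlay_registry(source_repo: str, overlay_repo: str, revision: str = "main") -> str:
    """Return registry JSON mapping one source model repo to its overlay repo.

    Hypothetical helper for illustration; it only reproduces the JSON shape
    shown in the shell example.
    """
    registry = {
        source_repo: {
            "overlay_repo_id": overlay_repo,
            "overlay_revision": revision,
        }
    }
    return json.dumps(registry, indent=2)


registry_json = build_overlay_registry(
    "Efficient-Large-Model/SANA-WM_bidirectional",
    "sjmshsh/SANA-WM_bidirectional-overlay",
)

# Visible to this process and to any child process started from it.
os.environ[ENV_VAR] = registry_json
print(registry_json)
```

Passing a commit SHA as revision instead of the default gives the pinned form that a registry entry would normally use.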
Source / overlay layout
| Source (NVlabs upstream) | Overlay output (Diffusers-style) |
|---|---|
| config.yaml | (parsed; drives synthesized configs below) |
| dit/sana_wm_1600m_720p.safetensors | transformer/diffusion_pytorch_model.safetensors |
| (synthesized) | transformer/config.json |
| vae/* | vae/* |
| (synthesized) | scheduler/scheduler_config.json |
| (downloaded from google/gemma-2-2b-it) | text_encoder/*, tokenizer/* |
| refiner/transformer/* | refiner/transformer/* |
| refiner/connectors/* | refiner/connectors/* |
| refiner/text_encoder/* | refiner/text_encoder/* |
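The copy and rename rows of the table can be sketched as a small path-planning function. This is an illustrative sketch only and not the contents of _overlay/materialize.py: the names FILE_RENAMES, PASSTHROUGH_PREFIXES, and plan_copies are hypothetical, and the file names inside vae/ and refiner/ in the example are placeholders. Synthesized configs and the Gemma download are outside its scope.

```python
from pathlib import PurePosixPath

# Exact-file renames, taken from the table above.
FILE_RENAMES = {
    "dit/sana_wm_1600m_720p.safetensors": "transformer/diffusion_pytorch_model.safetensors",
}

# Directory prefixes that are copied through unchanged.
PASSTHROUGH_PREFIXES = (
    "vae",
    "refiner/transformer",
    "refiner/connectors",
    "refiner/text_encoder",
)


def plan_copies(source_files):
    """Map source-relative paths to overlay-relative paths.

    Files with no row in the table (for example config.yaml, which is
    parsed rather than copied) are left out of the plan.
    """
    plan = {}
    for src in source_files:
        if src in FILE_RENAMES:
            plan[src] = FILE_RENAMES[src]
            continue
        parts = PurePosixPath(src).parts
        for prefix in PASSTHROUGH_PREFIXES:
            prefix_parts = PurePosixPath(prefix).parts
            # Compare whole path components so "vae_extra/x" does not match "vae".
            if parts[: len(prefix_parts)] == prefix_parts:
                plan[src] = src
                break
    return plan


# Placeholder file names; the real directory contents are not listed here.
example = plan_copies([
    "config.yaml",
    "dit/sana_wm_1600m_720p.safetensors",
    "vae/config.json",
    "refiner/connectors/weights.safetensors",
])
for src, dst in example.items():
    print(f"{src} -> {dst}")
```

Matching on path components rather than string prefixes avoids accidentally copying sibling directories whose names merely start with a listed prefix.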
The stage-1 text encoder (google/gemma-2-2b-it) is a gated repo. Accept the Gemma license on Hugging Face once before first run. The materializer pins it to revision 299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8. The stage-2 refiner text encoder is copied from the upstream SANA-WM source repo under refiner/text_encoder.
Source revision
The materializer is tested against source commit 90e0ff3b8f1f9b54a92b4b707edeaa27073aec84 of Efficient-Large-Model/SANA-WM_bidirectional. The current SGLang overlay loader records this in _overlay/overlay_manifest.json, but the effective source revision is still controlled by the model download path or SGLang's registry/config. If upstream restructures the repo, update _overlay/overlay_manifest.json and _overlay/materialize.py, and bump materializer_version.
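Because the effective source revision is decided outside the overlay, a caller may want to compare the commit that was actually downloaded against the tested one. The sketch below assumes the resolved commit SHA is already known to the caller; the function name check_source_revision is hypothetical, and it does not read overlay_manifest.json, whose schema is not described here.

```python
# Source commit the materializer was tested against (from the text above).
TESTED_SOURCE_REVISION = "90e0ff3b8f1f9b54a92b4b707edeaa27073aec84"


def check_source_revision(resolved_revision: str, tested: str = TESTED_SOURCE_REVISION) -> bool:
    """Return True when the resolved commit matches the tested commit.

    Abbreviated SHAs of at least 7 characters are accepted as a prefix
    match; branch names such as "main" never match.
    """
    resolved = resolved_revision.strip().lower()
    if len(resolved) < 7:
        return False
    return tested.startswith(resolved)


if not check_source_revision("main"):
    print("warning: source revision is not the tested commit")
```

A mismatch is not necessarily an error, since newer upstream commits may keep the same layout, so a warning is probably a better default than a hard failure.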
Files
- model_index.json: Diffusers-style pipeline class and component spec consumed by SGLang.
- _overlay/overlay_manifest.json: declares the source repo id, required stage-1 and stage-2 source files, and the materializer script.
- _overlay/materialize.py: translates the NVlabs layout into the Diffusers-style tree.
Notes
- This overlay is intended for SGLang builds that include SanaWMTwoStagePipeline and the SANA-WM LTX-2 refiner stage.
- The upstream model weights remain governed by the upstream model repo and any gated dependency licenses.