SANA-WM_bidirectional Overlay

SGLang overlay metadata for running Efficient-Large-Model/SANA-WM_bidirectional with the diffusion runtime.

This repo contains overlay metadata and a custom materializer only. It does not contain model weights. SGLang downloads the source weights from the upstream NVIDIA/NVlabs model repo, then this overlay rewrites the source directory into the Diffusers-style layout consumed by ComposedPipelineBase.

The generated layout targets SanaWMTwoStagePipeline: stage 1 runs the SANA-WM dual-branch DiT, and stage 2 loads the upstream LTX-2 refiner from refiner/ before VAE decoding.

Usage

After the corresponding SGLang registry entry is merged and pinned:

sglang generate \
  --model-path Efficient-Large-Model/SANA-WM_bidirectional \
  --prompt "A drone flyover of a snow-capped mountain at sunrise" \
  --num-frames 321 \
  --save-output

For local testing before the built-in registry is available, either point directly at this overlay repo:

sglang generate \
  --model-path sjmshsh/SANA-WM_bidirectional-overlay \
  --prompt "A drone flyover of a snow-capped mountain at sunrise" \
  --num-frames 321 \
  --save-output

or register the source model through the environment:

export SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY='{
  "Efficient-Large-Model/SANA-WM_bidirectional": {
    "overlay_repo_id": "sjmshsh/SANA-WM_bidirectional-overlay",
    "overlay_revision": "main"
  }
}'
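If you launch SGLang from a Python script instead of a shell, one option is to build the same registry JSON with json.dumps, which avoids hand-quoting errors. This is a minimal sketch: the helper name build_overlay_registry is hypothetical, and only the environment variable name and the overlay_repo_id / overlay_revision keys come from the shell example above. Set the variable before SGLang starts. Child processes started afterwards, such as a subprocess running sglang generate, inherit it.

```python
import json
import os


def build_overlay_registry(source_repo_id: str, overlay_repo_id: str,
                           overlay_revision: str = "main") -> str:
    """Hypothetical helper: serialize one source -> overlay mapping in the
    same shape as the shell example above."""
    registry = {
        source_repo_id: {
            "overlay_repo_id": overlay_repo_id,
            "overlay_revision": overlay_revision,
        }
    }
    return json.dumps(registry, indent=2)


payload = build_overlay_registry(
    "Efficient-Large-Model/SANA-WM_bidirectional",
    "sjmshsh/SANA-WM_bidirectional-overlay",
)
# Visible to this process and to any child process started after this line.
os.environ["SGLANG_DIFFUSION_MODEL_OVERLAY_REGISTRY"] = payload
```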

When adding the built-in entry in an SGLang PR, set overlay_revision to the pushed overlay commit SHA instead of main in BUILTIN_MODEL_OVERLAY_REGISTRY.
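A simple pre-submit check can catch a registry entry that still points at a branch name. The sketch below assumes Hugging Face Hub revisions are 40-character lowercase hex commit SHAs. Both helper names (is_pinned_revision, unpinned_entries) are hypothetical and are not part of SGLang.

```python
import re

# A full commit SHA as shown on the Hub: 40 lowercase hex characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """True if `revision` looks like a full commit SHA rather than a branch or tag."""
    return _FULL_SHA.fullmatch(revision) is not None


def unpinned_entries(registry: dict) -> list:
    """Return the source repo ids whose overlay_revision is not a commit SHA."""
    return [
        source
        for source, entry in registry.items()
        if not is_pinned_revision(entry.get("overlay_revision", ""))
    ]
```

For example, an entry whose overlay_revision is still "main" is reported, while one pinned to a 40-character SHA is not.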

Source / overlay layout

Source (NVlabs upstream) → overlay output (Diffusers-style):

  • config.yaml → no direct output; parsed to drive the synthesized configs below
  • dit/sana_wm_1600m_720p.safetensors → transformer/diffusion_pytorch_model.safetensors
  • (synthesized) → transformer/config.json
  • vae/* → vae/*
  • (synthesized) → scheduler/scheduler_config.json
  • (downloaded from google/gemma-2-2b-it) → text_encoder/*, tokenizer/*
  • refiner/transformer/* → refiner/transformer/*
  • refiner/connectors/* → refiner/connectors/*
  • refiner/text_encoder/* → refiner/text_encoder/*
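The file-level part of this mapping can be expressed as a small lookup. This is a hypothetical sketch of the table above and not the contents of _overlay/materialize.py. It covers only files that are copied or renamed. config.yaml is skipped because it is parsed rather than copied. The synthesized configs and the Gemma download are also skipped because they have no source file.

```python
# Hypothetical sketch of the copy/rename rows in the layout table.
# Not the real materializer: synthesized files and the Gemma download
# are handled elsewhere and never appear in this plan.
RENAMED_FILES = {
    "dit/sana_wm_1600m_720p.safetensors": "transformer/diffusion_pytorch_model.safetensors",
}
# Source directories that keep the same relative path in the overlay output.
SAME_PATH_PREFIXES = (
    "vae/",
    "refiner/transformer/",
    "refiner/connectors/",
    "refiner/text_encoder/",
)


def plan_copies(source_files):
    """Map source-relative POSIX paths to overlay paths; unmapped paths are skipped."""
    plan = {}
    for path in source_files:
        if path in RENAMED_FILES:
            plan[path] = RENAMED_FILES[path]
        elif path.startswith(SAME_PATH_PREFIXES):
            plan[path] = path
    return plan
```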

The stage-1 text encoder (google/gemma-2-2b-it) is in a gated repo, so accept the Gemma license on Hugging Face once before the first run. The materializer pins it to revision 299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8. The stage-2 refiner text encoder is copied from the upstream SANA-WM source repo under refiner/text_encoder.
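To surface gated-access problems before a long generation run, you can fetch the pinned Gemma snapshot yourself with huggingface_hub.snapshot_download. This is a sketch under assumptions. The helper names are hypothetical. Whether the materializer reuses the same Hugging Face cache, and therefore benefits from this pre-fetch, is not confirmed here. With token=None, huggingface_hub falls back to the locally saved login or the HF_TOKEN environment variable. The download is expected to fail with an authorization error if the license has not been accepted.

```python
GEMMA_REPO_ID = "google/gemma-2-2b-it"
GEMMA_REVISION = "299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8"  # pin stated in this README


def gemma_download_kwargs(token=None):
    """Hypothetical helper: arguments for snapshot_download at the pinned revision."""
    kwargs = {"repo_id": GEMMA_REPO_ID, "revision": GEMMA_REVISION}
    if token is not None:
        kwargs["token"] = token  # otherwise the saved login / HF_TOKEN is used
    return kwargs


def prefetch_gemma(token=None):
    """Download (or reuse from cache) the pinned snapshot and return its local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**gemma_download_kwargs(token))
```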

Source revision

The materializer is tested against source commit 90e0ff3b8f1f9b54a92b4b707edeaa27073aec84 of Efficient-Large-Model/SANA-WM_bidirectional. This commit is recorded in _overlay/overlay_manifest.json, but the current SGLang overlay loader does not pin the download to it: the effective source revision is still controlled by the model download path or SGLang's registry/config. If upstream restructures the repo, update _overlay/overlay_manifest.json and _overlay/materialize.py, and bump materializer_version.
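Because the recorded commit is informational only, a pre-flight comparison can flag when you are about to materialize a different source revision. This is a minimal sketch. The manifest field name "source_revision" is a hypothetical placeholder, so check the real key in _overlay/overlay_manifest.json. When the key is absent, the sketch falls back to the commit stated in this README.

```python
import json
from pathlib import Path

# Commit stated in this README; used when the manifest key is not found.
TESTED_SOURCE_REVISION = "90e0ff3b8f1f9b54a92b4b707edeaa27073aec84"


def load_manifest(path):
    """Parse a JSON manifest file (e.g. _overlay/overlay_manifest.json) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def revision_drift(manifest: dict, effective_revision: str, key: str = "source_revision"):
    """Return a warning string if `effective_revision` differs from the tested one.

    `key` is a HYPOTHETICAL field name, not taken from the real manifest.
    """
    tested = manifest.get(key, TESTED_SOURCE_REVISION)
    if effective_revision == tested:
        return None
    return f"materializer tested against {tested}, but loading {effective_revision}"
```

The effective revision could come from the commit SHA you actually downloaded. For example, huggingface_hub's HfApi().model_info(repo_id).sha reports the current head of the default branch.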

Files

  • model_index.json: Diffusers-style pipeline class and component spec consumed by SGLang.
  • _overlay/overlay_manifest.json: declares the source repo id, required stage-1 and stage-2 source files, and the materializer script.
  • _overlay/materialize.py: translates the NVlabs layout into the Diffusers-style tree.
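To inspect which components model_index.json declares, a small reader that follows the general Diffusers convention is enough. In that convention, keys starting with "_" (such as "_class_name") are metadata, and each component maps to a two-element [library, class_name] list. This is a sketch, and the helper names are hypothetical. The actual component names and classes in this repo's model_index.json are not reproduced here, so the placeholder class names used for testing are illustrative only.

```python
import json
from pathlib import Path


def load_model_index(path):
    """Parse a Diffusers-style model_index.json into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def list_components(model_index: dict) -> dict:
    """Return {component_name: (library, class_name)}.

    Keys starting with "_" are pipeline metadata (e.g. "_class_name"), not
    components; values that are not two-element lists are skipped.
    """
    components = {}
    for name, value in model_index.items():
        if name.startswith("_"):
            continue
        if isinstance(value, list) and len(value) == 2:
            components[name] = tuple(value)
    return components
```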

Notes

  • This overlay is intended for SGLang builds that include SanaWMTwoStagePipeline and the SANA-WM LTX-2 refiner stage.
  • The upstream model weights remain governed by the upstream model repo and any gated dependency licenses.