AnyWorld v15

AV-domain camera-controllable video diffusion. A single front-frame anchor + per-camera trajectories produce scene-consistent multi-camera driving rollouts.

Code: https://github.com/chankyo-kim-tri/AnyWorld
Maintainer: Chankyo Kim (chankyo.kim@tri.global)

What's in this repo

A single 32 GB PyTorch state-dict file:

model_native_compat.pt

The file loads directly in Lyra-2 native inference (lyra2_custom_traj_inference.py) with no special wrapper. Inference also requires the Lyra-2 native base weights (text encoder, VAE, image encoder); see the GitHub repo's checkpoints/README.md for the download chain.
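As a quick sanity check after downloading, the file can be opened lazily and summarized before running anything heavier. This is a minimal sketch, not code from the AnyWorld repo: `load_state_dict` and `summarize` are hypothetical helper names, and it assumes the file is a flat name-to-tensor mapping written by `torch.save` and that PyTorch 2.1 or newer is installed (needed for `mmap=True`).

```python
from collections.abc import Mapping


def load_state_dict(path):
    """Open the checkpoint on CPU without reading all 32 GB into RAM up front."""
    import torch  # imported lazily so summarize() can be used without PyTorch

    # mmap=True memory-maps the file instead of loading it eagerly.
    # weights_only=True restricts unpickling to tensors and plain containers.
    return torch.load(path, map_location="cpu", mmap=True, weights_only=True)


def summarize(state_dict: Mapping):
    """Return (tensor count, parameter count, size in GiB) for a state dict."""
    n_params = sum(t.numel() for t in state_dict.values())
    n_bytes = sum(t.numel() * t.element_size() for t in state_dict.values())
    return len(state_dict), n_params, n_bytes / 2**30


# Example (assumes the download location used in the quick start):
#   sd = load_state_dict("checkpoints/v15/model_native_compat.pt")
#   count, params, gib = summarize(sd)
#   print(f"{count} tensors, {params / 1e9:.2f}B parameters, {gib:.1f} GiB")
```

If the reported size is far from the 32 GB stated above, the download is probably incomplete.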

Training facts

| Item           | Value                                                   |
|----------------|---------------------------------------------------------|
| Base           | Lyra-2 native (NVIDIA, Wan 2.1-14B foundation)          |
| Dataset        | Waymo TFRecord, V=2 (FRONT + FRONT_LEFT), T=81, 384×576 |
| Trainable      | Last 4 blocks (36–39) self_attn + cross_attn + ffn only |
| Iter           | 1900 total (500 path_a + 700 continue_ps01)             |
| Pose scale     | 0.1 (training/inference aligned)                        |
| Hardware       | 4× Blackwell B200, ~12 hr                               |
| New parameters | None; architecture identical to Lyra-2 native           |

See docs/ARCHITECTURE.md and docs/TRAINING.md on GitHub for full lineage.
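To make the "Trainable" row concrete, here is a hedged sketch of how such a partial fine-tune could be selected by parameter name. It is an illustration, not the repo's training code: `is_trainable` is a hypothetical helper, and it assumes parameter names follow a Wan 2.1-style `blocks.<index>.<submodule>.` pattern, which should be checked against the actual state-dict keys.

```python
import re

# Assumption: names look like "blocks.36.self_attn.q.weight", optionally
# with a prefix such as "net." in front of "blocks".
TRAINABLE_BLOCKS = range(36, 40)  # blocks 36-39 inclusive
TRAINABLE_SUBMODULES = ("self_attn", "cross_attn", "ffn")
_BLOCK_PATTERN = re.compile(r"(?:^|\.)blocks\.(\d+)\.(\w+)\.")


def is_trainable(param_name: str) -> bool:
    """True if the parameter sits in self_attn/cross_attn/ffn of blocks 36-39."""
    match = _BLOCK_PATTERN.search(param_name)
    if match is None:
        return False  # not inside a transformer block (embeddings, head, ...)
    block_index, submodule = int(match.group(1)), match.group(2)
    return block_index in TRAINABLE_BLOCKS and submodule in TRAINABLE_SUBMODULES


# Typical use with a PyTorch module (everything else stays frozen):
#   for name, param in model.named_parameters():
#       param.requires_grad_(is_trainable(name))
```

Under this reading, norm layers and modulation parameters inside blocks 36–39 stay frozen along with blocks 0–35; docs/TRAINING.md remains the reference for what was actually done.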

Quick start

# 1. Clone code
git clone https://github.com/chankyo-kim-tri/AnyWorld.git
cd AnyWorld
bash setup/env.sh

# 2. Download this checkpoint
huggingface-cli login
huggingface-cli download ckkim10/AnyWorld-v15 model_native_compat.pt \
  --local-dir checkpoints/v15/

# 3. Get Lyra-2 native base weights from NVIDIA (separate license)
#    See checkpoints/README.md for download chain

# 4. Run inference
bash scripts/infer_singleview.sh \
  --anchor demos/seeds/.../t0.png \
  --traj demos/seeds/.../traj.npz \
  --caps demos/seeds/.../caption.json \
  --out out/
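
Since each camera view is generated by its own call, a multi-camera rollout amounts to running step 4 once per camera with the same anchor and a different trajectory. The sketch below only builds the command lines and does not run them. It uses the four flags shown above; the per-camera directory layout (`<seed_dir>/<camera>/traj.npz`) and the helper name `build_view_commands` are assumptions for illustration, not something the repo documents here.

```python
from pathlib import Path


def build_view_commands(seed_dir, cameras, out_dir):
    """Return one infer_singleview.sh argv list per camera.

    Assumed (hypothetical) layout: a shared t0.png and caption.json in
    seed_dir, plus one traj.npz per camera in seed_dir/<camera>/.
    """
    seed = Path(seed_dir)
    commands = []
    for camera in cameras:
        commands.append([
            "bash", "scripts/infer_singleview.sh",
            "--anchor", str(seed / "t0.png"),            # same front-frame anchor
            "--traj", str(seed / camera / "traj.npz"),   # per-camera trajectory
            "--caps", str(seed / "caption.json"),
            "--out", str(Path(out_dir) / camera),        # separate output per view
        ])
    return commands


# Each argv list can then be run with subprocess.run(argv, check=True).
```

Keeping the outputs in per-camera subdirectories avoids one call overwriting another's result.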

Known limits (read before using)

  1. Long-horizon drift beyond ~25 s of generation (T > 241 frames). Use T ≤ 241 for "safe" rollouts.
  2. Multi-view object permanence is not enforced (V independent calls, no cross-view sharing).
  3. Resolution locked at 384ร—576 (training distribution).
  4. Pose scale must match training (0.1 for raw Waymo poses).
  5. Trained on Waymo only; Lyft and DDAD evaluations are zero-shot.

See docs/KNOWN_LIMITS.md on GitHub for the full catalog with reproducers.
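
Limits 1, 3 and 4 can be checked mechanically before launching a rollout. The following is a small pre-flight sketch under stated assumptions: `check_rollout` is a hypothetical helper, the constants come from the list above, and 384×576 is read as height × width (landscape frames).

```python
MAX_SAFE_FRAMES = 241          # limit 1: drift beyond this many frames
TRAIN_RESOLUTION = (384, 576)  # limit 3: assumed to mean (height, width)
TRAIN_POSE_SCALE = 0.1         # limit 4: scale used for raw Waymo poses


def check_rollout(num_frames, height, width, pose_scale):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if num_frames > MAX_SAFE_FRAMES:
        problems.append(
            f"T={num_frames} exceeds {MAX_SAFE_FRAMES} frames; expect long-horizon drift"
        )
    if (height, width) != TRAIN_RESOLUTION:
        problems.append(
            f"resolution {height}x{width} differs from the training "
            f"resolution {TRAIN_RESOLUTION[0]}x{TRAIN_RESOLUTION[1]}"
        )
    if abs(pose_scale - TRAIN_POSE_SCALE) > 1e-9:
        problems.append(
            f"pose scale {pose_scale} does not match training ({TRAIN_POSE_SCALE})"
        )
    return problems
```

For example, `check_rollout(81, 384, 576, 0.1)` returns an empty list, while a 321-frame request at another resolution and pose scale reports three problems.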

License

Apache 2.0 (weights). Note that running inference together with the Lyra-2 native base weights is separately subject to the NVIDIA Lyra-2 License; read the upstream terms before downstream use.

See ATTRIBUTIONS.md for component-level attribution.

Citation

@misc{kim2026anyworld,
  author = {Kim, Chankyo and contributors},
  title  = {AnyWorld v15: AV-domain camera-controllable video diffusion},
  year   = {2026},
  note   = {Toyota Research Institute},
  url    = {https://github.com/chankyo-kim-tri/AnyWorld},
}

Access

This repository is private. For collaborator access, send your HuggingFace username to chankyo.kim@tri.global.
