AnyWorld v15
AV-domain camera-controllable video diffusion. A single front-frame anchor + per-camera trajectories produce scene-consistent multi-camera driving rollouts.
Code: https://github.com/chankyo-kim-tri/AnyWorld Maintainer: Chankyo Kim โ chankyo.kim@tri.global
What's in this repo
A single 32 GB PyTorch state-dict file:
model_native_compat.pt
Loadable by Lyra-2 native inference (lyra2_custom_traj_inference.py) with no special wrapper. Requires the Lyra-2 native base weights at inference (text encoder, VAE, image encoder); see the GitHub repo's checkpoints/README.md for the download chain.
Training facts
| Item | Value |
|---|---|
| Base | Lyra-2 native (NVIDIA, Wan 2.1-14B foundation) |
| Dataset | Waymo TFRecord, V=2 (FRONT + FRONT_LEFT), T=81, 384ร576 |
| Trainable | Last 4 blocks (36โ39) self_attn + cross_attn + ffn only |
| Iter | 1900 total (500 path_a + 700 continue_ps01) |
| Pose scale | 0.1 (training/inference aligned) |
| Hardware | 4ร Blackwell B200, ~12 hr |
| New parameters | None โ architecture identical to Lyra-2 native |
See docs/ARCHITECTURE.md and docs/TRAINING.md on GitHub for full lineage.
Quick start
# 1. Clone code
git clone https://github.com/chankyo-kim-tri/AnyWorld.git
cd AnyWorld
bash setup/env.sh
# 2. Download this checkpoint
huggingface-cli login
huggingface-cli download ckkim10/AnyWorld-v15 model_native_compat.pt \
--local-dir checkpoints/v15/
# 3. Get Lyra-2 native base weights from NVIDIA (separate license)
# See checkpoints/README.md for download chain
# 4. Run inference
bash scripts/infer_singleview.sh \
--anchor demos/seeds/.../t0.png \
--traj demos/seeds/.../traj.npz \
--caps demos/seeds/.../caption.json \
--out out/
Known limits (read before using)
- Long-horizon drift beyond ~25 s of generation (T > 241 frames). Use T โค 241 for "safe" rollouts.
- Multi-view object permanence is not enforced (V independent calls, no cross-view sharing).
- Resolution locked at 384ร576 (training distribution).
- Pose scale must match training (
0.1for raw Waymo poses). - Trained on Waymo only โ Lyft and DDAD evaluations are zero-shot.
See docs/KNOWN_LIMITS.md on GitHub for full honest catalog with reproducers.
License
Apache 2.0 (weights). Note that inference composition with Lyra-2 native triggers the NVIDIA Lyra-2 License separately โ read upstream terms before downstream use.
See ATTRIBUTIONS.md for component-level attribution.
Citation
@misc{kim2026anyworld,
author = {Kim, Chankyo and contributors},
title = {AnyWorld v15: AV-domain camera-controllable video diffusion},
year = {2026},
note = {Toyota Research Institute},
url = {https://github.com/chankyo-kim-tri/AnyWorld},
}
Access
This repository is private. For collaborator access, send your HuggingFace username to chankyo.kim@tri.global.