Cosmos3-Nano GR-1 — Diffusion-Forcing + StateFix checkpoints

PyTorch Distributed Checkpoint (DCP) checkpoints from joint world-model (WM) and action supervised fine-tuning (SFT) of NVIDIA Cosmos3-Nano (Omni-MoT World Foundation Model) on GR-1, using a causal diffusion-forcing schedule with a fixed proprioceptive-state prefix ("statefix").

Two long-horizon variants are provided, each at iteration 20,000 (the latest checkpoint):

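| Subfolder | Horizon   | chunk_length | latent_t | Training                            |
|-----------|-----------|--------------|----------|-------------------------------------|
| h65/      | 65-frame  | 64           | 17       | 8×B200, FSDP, 45,056-token packing  |
| h129/     | 129-frame | 128          | 33       | 8×B200, FSDP, 45,056-token packing  |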
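
The chunk_length and latent_t values for both variants fit a simple pattern: chunk_length is the horizon minus one frame, and latent_t matches a causal video tokenizer with 4x temporal compression that keeps the first frame as its own latent. That interpretation and the 4x factor are assumptions inferred from the numbers listed here, not something the checkpoint configs are known to state. The function names are hypothetical. A minimal sketch of the arithmetic:

```python
def chunk_length(horizon_frames: int) -> int:
    """Frames in a chunk, assuming one frame of the horizon is held out as the prefix."""
    return horizon_frames - 1


def latent_frames(horizon_frames: int, temporal_compression: int = 4) -> int:
    """Latent time steps, assuming the first frame maps to one latent and the
    remaining frames are compressed by `temporal_compression` (assumed to be 4)."""
    return (horizon_frames - 1) // temporal_compression + 1


if __name__ == "__main__":
    for horizon in (65, 129):
        print(horizon, chunk_length(horizon), latent_frames(horizon))
    # prints:
    # 65 64 17
    # 129 128 33
```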

Layout

Each subfolder mirrors the framework's checkpoint layout (full DCP — model + optimizer + scheduler + trainer state), so it can be resumed or evaluated directly:

<variant>/
  config.yaml, config.pkl, job_env.yaml, launch_info.yaml
  checkpoints/
    latest_checkpoint.txt
    iter_000020000/
      model/      # FSDP-sharded DCP (.distcp shards + .metadata)
      optim/      # optimizer state (enables training resume)
      scheduler/
      trainer/
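
Given the layout above, a small helper can locate the checkpoint directory and confirm that all four components are present before a resume or evaluation run. This is a sketch using only the Python standard library. It assumes latest_checkpoint.txt contains the name of the iteration directory (for example iter_000020000), which is not confirmed here, so it falls back to the highest-numbered iter_* directory. The function names are hypothetical and not part of the Cosmos3 framework.

```python
from pathlib import Path

# Component directories listed in the layout above.
EXPECTED_SUBDIRS = ("model", "optim", "scheduler", "trainer")


def resolve_checkpoint_dir(variant_dir) -> Path:
    """Return the iteration directory for a variant such as h65/ or h129/."""
    ckpt_root = Path(variant_dir) / "checkpoints"
    pointer = ckpt_root / "latest_checkpoint.txt"
    if pointer.is_file():
        # Assumption: the pointer file holds a directory name like "iter_000020000".
        name = pointer.read_text().strip()
        candidate = ckpt_root / name
        if name and candidate.is_dir():
            return candidate
    # Fallback: zero-padded names sort correctly as plain strings.
    iters = sorted(p for p in ckpt_root.glob("iter_*") if p.is_dir())
    if not iters:
        raise FileNotFoundError(f"no iter_* directory under {ckpt_root}")
    return iters[-1]


def missing_components(ckpt_dir) -> list:
    """List expected component directories that are absent from ckpt_dir."""
    return [name for name in EXPECTED_SUBDIRS if not (Path(ckpt_dir) / name).is_dir()]
```

For example, `missing_components(resolve_checkpoint_dir("h65"))` returns an empty list when the download is complete. A partial download that skipped optim/ would still allow evaluation but not a training resume.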

Note: these are sharded DCP checkpoints (.distcp + .metadata), not consolidated safetensors. Load them with torch.distributed.checkpoint via the Cosmos3 framework, or consolidate to HF format with cosmos_framework.scripts.export_model.
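
If the Cosmos3 framework is not available, the model/ shards can in principle be consolidated with PyTorch's own DCP utilities. The sketch below uses `dcp_to_torch_save` from `torch.distributed.checkpoint.format_utils` (available in recent PyTorch releases). It is a generic PyTorch path, not the framework's export script: the output is a single `torch.save` file rather than HF format, and the whole state dict is loaded into host memory on one process. The helper names are hypothetical, and whether every entry in these particular checkpoints converts cleanly is untested.

```python
from pathlib import Path


def is_dcp_dir(path) -> bool:
    """True if `path` looks like a DCP directory: a .metadata file plus .distcp shards."""
    path = Path(path)
    return (path / ".metadata").is_file() and any(path.glob("*.distcp"))


def consolidate_model(model_dir, out_file) -> None:
    """Convert a sharded DCP model directory into a single torch.save file."""
    # Imported lazily so the layout check above works without PyTorch installed.
    from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

    if not is_dcp_dir(model_dir):
        raise FileNotFoundError(f"{model_dir} does not contain .metadata and .distcp files")
    dcp_to_torch_save(str(model_dir), str(out_file))


if __name__ == "__main__":
    # Paths follow the layout described above; the output filename is arbitrary.
    consolidate_model("h65/checkpoints/iter_000020000/model", "h65_model.pt")
```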
