Cosmos3-Nano · GR-1 · Diffusion-Forcing — iter 4000
A mid-training research checkpoint (iteration 4000 / 20000) of NVIDIA Cosmos3-Nano finetuned on the GR-1 humanoid manipulation dataset in the native long-horizon "diffusion forcing" regime (temporal-causal, three-way attention).
⚠️ This is an interim checkpoint from an in-progress run, published for evaluation / reproducibility — not a final or converged model.
What this is
- Base: nvidia/Cosmos3-Nano (two-tower Omni-MoT World Foundation Model; Qwen3-VL-8B language tower + Wan2.2 VAE).
- Task: GR-1 forward-dynamics world modeling, i.e. predicting future video latents conditioned on the first frame + the 44-DoF joint-action sequence.
- Regime: diffusion forcing. Each latent video frame is noised at an independent σ, with temporal-causal attention over generation supertokens (clean past conditions noisy future), the basis for stable autoregressive rollout. (causal_training_strategy=diffusion_forcing, video_temporal_causal=True, joint_attn_implementation=three_way.)
- Dataset: periphanes/gr1_mg_gr00t_300_new (GR-1 LeRobot v2.0).
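The regime described above can be illustrated with a small, framework-free sketch. It is not the Cosmos implementation: the uniform σ distribution, the linear flow-matching interpolation, and the helper names (`sample_frame_sigmas`, `noise_frame`, `temporal_causal_mask`) are all assumptions made for illustration, and the three-way attention layout is not modeled.

```python
import random

def sample_frame_sigmas(num_frames, rng):
    """One independent noise level per latent frame.

    Assumption: sigma is drawn uniformly from [0, 1); the real schedule may differ.
    """
    return [rng.random() for _ in range(num_frames)]

def noise_frame(clean, eps, sigma):
    """Assumed flow-matching interpolation: x_sigma = (1 - sigma) * x + sigma * eps."""
    return [(1.0 - sigma) * x + sigma * e for x, e in zip(clean, eps)]

def temporal_causal_mask(num_frames, tokens_per_frame):
    """mask[q][k] is True when query token q may attend to key token k.

    A token sees every token in its own frame and in all earlier frames,
    and no token from a later frame.
    """
    n = num_frames * tokens_per_frame
    frame_of = [i // tokens_per_frame for i in range(n)]
    return [[frame_of[k] <= frame_of[q] for k in range(n)] for q in range(n)]

rng = random.Random(0)
sigmas = sample_frame_sigmas(5, rng)  # 5 latent frames, one sigma each
mask = temporal_causal_mask(num_frames=5, tokens_per_frame=64)
```

Because every frame carries its own σ, a single training sample already contains the "cleaner past, noisier future" situations that an autoregressive rollout encounters at inference time.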
Training summary
| Setting | Value |
|---|---|
| Iteration | 4000 / 20000 |
| Hardware | 8× B200 (FSDP) |
| Packing | token-budget, 45056 tokens/seq |
| LR | 2e-4 |
| Dataset mode | forward_dynamics (video loss active; actions are clean conditioning) |
| Latent geometry | 17 RGB frames → tcf=4 → T_latent = 5; 256px → ÷16 → ÷2 patch → 8×8 = 64 patches/frame |
| Loss at iter 4000 | ~0.13 (video flow-matching) |
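The latent geometry row can be checked with a few lines of arithmetic. This sketch assumes the usual causal video VAE convention, in which the first RGB frame is encoded on its own and each further group of tcf frames becomes one latent frame; that convention reproduces the 17 → 5 figure in the table. The function names are hypothetical.

```python
def latent_frames(rgb_frames, tcf):
    """Latent frame count, assuming the first frame is encoded alone and
    every following group of `tcf` frames maps to one latent frame."""
    assert (rgb_frames - 1) % tcf == 0, "frame count must be 1 + k * tcf"
    return 1 + (rgb_frames - 1) // tcf

def patches_per_frame(pixels, spatial_stride, patch):
    """Patches per latent frame for a square input of side `pixels`."""
    side = pixels // spatial_stride // patch
    return side * side

t_latent = latent_frames(17, 4)            # 5
patches = patches_per_frame(256, 16, 2)    # 64
video_tokens = t_latent * patches          # 320 video patch tokens per 17-frame clip
```

The 320 figure counts video patch tokens only; the 45056-token packing budget is measured over whole packed sequences, which may include other token types.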
Format & contents
PyTorch Distributed Checkpoint (DCP), FSDP-sharded — model weights only (no optimizer
state). The model/ folder contains 8 shards __{0..7}_0.distcp (~11.4 GB each, ~85 GB total)
plus the required .metadata.
model/__0_0.distcp … __7_0.distcp
model/.metadata
training_config.yaml
Loading
Requires the NVIDIA Cosmos Framework (cosmos_framework). Place the downloaded model/
folder as the model sub-directory of a checkpoint dir and load it with the framework's DCP
loader (same path used for resume/eval).
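Before handing the directory to the framework's loader, it can help to confirm that the download is complete. The sketch below only checks the file layout listed under "Format & contents"; it does not call any cosmos_framework API, and the helper names and the example path are hypothetical.

```python
from pathlib import Path

def expected_model_files(num_shards=8):
    """File names the model/ folder should contain: one shard per rank plus .metadata."""
    return [f"__{rank}_0.distcp" for rank in range(num_shards)] + [".metadata"]

def missing_model_files(checkpoint_dir, num_shards=8):
    """Return the expected files that are absent from <checkpoint_dir>/model."""
    model_dir = Path(checkpoint_dir) / "model"
    return [
        name
        for name in expected_model_files(num_shards)
        if not (model_dir / name).is_file()
    ]

# Example (hypothetical path):
#   missing = missing_model_files("/checkpoints/gr1-difforce-4k")
#   if missing:
#       raise FileNotFoundError(f"incomplete checkpoint, missing: {missing}")
```

An empty return value means all 8 shards and the .metadata file are in place; a DCP checkpoint cannot be read without .metadata, so that file is checked alongside the shards.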
Note: the temporal-causal generation path requires NATTEN ≥ 0.21.9.dev0 (natten.varlen). If that wheel is unavailable, this checkpoint was trained with an opt-in flex_attention block-causal shim (COSMOS3_NATTEN_VARLEN_SHIM=1) that serves the same (full-window, temporal-causal) attention without NATTEN's varlen kernels.
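One way to choose between the two attention paths is to probe for natten.varlen and fall back to the shim when it is missing. This is a sketch under assumptions: that the framework reads COSMOS3_NATTEN_VARLEN_SHIM from the process environment, and that the variable must be set before the framework is imported. The helper names are hypothetical, and the probe only tests that the module exists, not that the installed NATTEN meets the version requirement.

```python
import importlib.util
import os

def has_module(name):
    """True if `name` can be located by the import system without loading it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False

def configure_attention_backend(env=os.environ, probe=has_module):
    """Opt in to the flex_attention shim when natten.varlen cannot be found.

    Assumption: the framework checks this environment variable at import time,
    so call this function before importing cosmos_framework.
    """
    if probe("natten.varlen"):
        return "natten"
    env["COSMOS3_NATTEN_VARLEN_SHIM"] = "1"
    return "flex_attention_shim"
```

Passing `env` and `probe` as parameters keeps the decision testable without touching the real environment or requiring NATTEN to be installed.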
License
Derived from nvidia/Cosmos3-Nano; usage is subject to the base model's NVIDIA Cosmos license
terms. Refer to the base model card for the authoritative license.