Cosmos3-Nano-G1-BrainCo-ActionSFT
Cosmos3-Nano fine-tuned for Unitree G1 + BrainCo forward-dynamics prediction
This model is a supervised fine-tuned (SFT) version of NVIDIA Cosmos3-Nano trained on the G1 BrainCo manipulation dataset. Given an initial camera frame and a task prompt, it predicts future video frames and the robot's 26-DOF joint-angle trajectory simultaneously.
Model Details
| Property | Value |
|---|---|
| Base model | nvidia/Cosmos3-Nano |
| Fine-tuning mode | forward_dynamics |
| Robot embodiment | Unitree G1 + BrainCo dexterous hands |
| Action space | 26D joint angles (7L-arm + 7R-arm + 6L-hand + 6R-hand) |
| Domain ID | 30 (g1_brainco) |
| Training iterations | 800 |
| Checkpoint saved every | 200 iterations |
| Learning rate | 1e-5 |
| Optimizer | AdamW (β=0.9/0.95, ε=1e-6) |
| Precision | bfloat16 |
| Action loss weight | 10.0 |
| EMA | enabled (rate=0.1) |
| LoRA | disabled (full fine-tune) |
| Hardware | 8× NVIDIA L40 (49 GB) |
Training Data
The model was trained on 8 manipulation tasks from the G1 BrainCo LeRobot dataset:
| Task | Episodes | Frames |
|---|---|---|
| GraspOreo | 201 | ~198K |
| GraspRubiksCube | 197 | ~130K |
| PickApple | 200 | ~120K |
| PickCharger | 200 | ~150K |
| PickDoll | 200 | ~276K |
| PickDrink | 201 | ~143K |
| PickTissues | 206 | ~198K |
| PickToothpaste | 193 | ~312K |
Dataset format: LeRobot v3.0, 30 Hz, 4 camera views (left/right high + left/right wrist), 26D absolute joint angles.
Action normalization: quantile (q01/q99 → [-1, 1]) per joint dimension.
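The quantile normalization described above can be sketched as follows. This is a minimal illustration, assuming a linear map that sends q01 to -1 and q99 to +1 for each joint; the function name `normalize`, the clipping step, and the toy statistics are assumptions for illustration and are not taken from the training code.

```python
import numpy as np

def normalize(raw_action, q01, q99, clip=True):
    """Map raw joint angles to [-1, 1] using per-joint q01/q99 quantiles."""
    raw_action = np.asarray(raw_action, dtype=np.float64)
    q01 = np.asarray(q01, dtype=np.float64)
    q99 = np.asarray(q99, dtype=np.float64)
    # Guard against a joint whose q01 and q99 coincide (assumed safeguard).
    span = np.maximum(q99 - q01, 1e-8)
    normalized = 2.0 * (raw_action - q01) / span - 1.0
    # Clipping is an assumption: values outside q01..q99 would otherwise
    # land outside [-1, 1].
    return np.clip(normalized, -1.0, 1.0) if clip else normalized

# Toy statistics for two joints (illustrative values, not the real stats).
q01 = np.array([-1.0, 0.0])
q99 = np.array([1.0, 2.0])
print(normalize([0.0, 2.0], q01, q99))  # [0. 1.]
```

With `clip=False` this map is the exact inverse of the denormalization formula given later in this card.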
How to Use
Setup
git clone https://huggingface.co/jfgpt/Cosmos3-Nano-G1-BrainCo-ActionSFT
pip install cosmos-framework # or: uv sync --all-extras --group=cu130
export LD_LIBRARY_PATH=''
Forward-Dynamics Inference
Given an initial video clip and an action sequence, the model predicts the future video and the next actions.
Input JSON (my_input.json):
{
  "model_mode": "forward_dynamics",
  "name": "pick_apple",
  "domain_name": "g1_brainco",
  "fps": 5,
  "image_size": 480,
  "action_chunk_size": 16,
  "raw_action_dim": 26,
  "view_point": "ego_view",
  "prompt": "{\"subjects\":[{\"description\":\"A Unitree G1 humanoid robot with articulated arms and dexterous hands\",\"action\":\"Pick up an apple from the table\"}],\"background_setting\":\"An indoor workspace\",\"cinematography\":{\"camera_motion\":\"static\",\"framing\":\"top-down wide-angle view\",\"camera_angle\":\"overhead\"}}",
  "seed": 42,
  "vision_path": "/path/to/initial_clip.mp4",
  "action_path": "/path/to/initial_actions.json"
}
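Note that the `prompt` field is itself a JSON document stored as an escaped string. Escaping it by hand is error-prone, so one option is to build the file programmatically. The sketch below reuses the field names and values from the example above; the helper name `build_forward_dynamics_input` is hypothetical and not part of the framework.

```python
import json

def build_forward_dynamics_input(name, action_text, vision_path, action_path, seed=42):
    """Assemble an input dict shaped like the my_input.json example above."""
    prompt = {
        "subjects": [{
            "description": "A Unitree G1 humanoid robot with articulated arms and dexterous hands",
            "action": action_text,
        }],
        "background_setting": "An indoor workspace",
        "cinematography": {
            "camera_motion": "static",
            "framing": "top-down wide-angle view",
            "camera_angle": "overhead",
        },
    }
    return {
        "model_mode": "forward_dynamics",
        "name": name,
        "domain_name": "g1_brainco",
        "fps": 5,
        "image_size": 480,
        "action_chunk_size": 16,
        "raw_action_dim": 26,
        "view_point": "ego_view",
        # The nested prompt is serialized to a string; json.dumps handles the escaping.
        "prompt": json.dumps(prompt, separators=(",", ":")),
        "seed": seed,
        "vision_path": vision_path,
        "action_path": action_path,
    }

config = build_forward_dynamics_input(
    "pick_apple",
    "Pick up an apple from the table",
    "/path/to/initial_clip.mp4",
    "/path/to/initial_actions.json",
)
text = json.dumps(config, indent=2)  # write this string to my_input.json
print(text)
```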
initial_actions.json: a list of 16 × 26D joint-angle vectors (raw, before normalization):
[
[0.12, -0.05, -0.28, 0.19, 0.69, -0.02, 0.43, 0.17, 0.08, 0.01, 0.31, 0.01, -0.43, -0.16, 0.37, 0.45, 0.21, 0.27, 0.29, 0.33, 0.0, 0.66, 0.18, 0.29, 0.28, 0.25],
...
]
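Before launching inference, it can help to confirm that the action file really holds 16 vectors of 26 finite values. The following is a minimal sketch; `to_action_chunk` is a hypothetical helper, and the zero-filled chunk is placeholder data only.

```python
import json
import numpy as np

CHUNK_SIZE = 16   # action_chunk_size in the input JSON
ACTION_DIM = 26   # raw_action_dim in the input JSON

def to_action_chunk(actions):
    """Validate a chunk of raw joint angles and return it as nested lists."""
    arr = np.asarray(actions, dtype=np.float64)
    if arr.shape != (CHUNK_SIZE, ACTION_DIM):
        raise ValueError(f"expected shape {(CHUNK_SIZE, ACTION_DIM)}, got {arr.shape}")
    if not np.isfinite(arr).all():
        raise ValueError("action chunk contains NaN or infinite values")
    return arr.tolist()

# Placeholder data: all joints at zero. Replace with real raw joint angles.
chunk = to_action_chunk(np.zeros((CHUNK_SIZE, ACTION_DIM)))
payload = json.dumps(chunk)  # write this string to initial_actions.json
```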
Run:
torchrun --nproc_per_node=4 \
-m cosmos_framework.scripts.inference \
--checkpoint-path /path/to/Cosmos3-Nano-G1-BrainCo-ActionSFT \
--parallelism-preset latency \
--no-guardrails \
--output-dir outputs/ \
-i my_input.json
Outputs (in outputs/pick_apple/):
- vision.mp4: predicted future video frames
- action.json: predicted 16-step joint-angle trajectory (normalized)
Joint Name Order
0: kLeftShoulderPitch 7: kRightShoulderPitch
1: kLeftShoulderRoll 8: kRightShoulderRoll
2: kLeftShoulderYaw 9: kRightShoulderYaw
3: kLeftElbow 10: kRightElbow
4: kLeftWristRoll 11: kRightWristRoll
5: kLeftWristPitch 12: kRightWristPitch
6: kLeftWristYaw 13: kRightWristYaw
14: kLeftHandThumb 20: kRightHandThumb
15: kLeftHandThumbAux 21: kRightHandThumbAux
16: kLeftHandIndex 22: kRightHandIndex
17: kLeftHandMiddle 23: kRightHandMiddle
18: kLeftHandRing 24: kRightHandRing
19: kLeftHandPinky 25: kRightHandPinky
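The ordering above can be captured in code so that 26D vectors are indexed by name rather than by position. This is a small sketch built directly from the table; the names `JOINT_NAMES`, `GROUPS`, and `label_action` are illustrative choices, not identifiers from the training repo.

```python
ARM_JOINTS = ["ShoulderPitch", "ShoulderRoll", "ShoulderYaw", "Elbow",
              "WristRoll", "WristPitch", "WristYaw"]
HAND_JOINTS = ["Thumb", "ThumbAux", "Index", "Middle", "Ring", "Pinky"]

# Indices 0-6 left arm, 7-13 right arm, 14-19 left hand, 20-25 right hand.
JOINT_NAMES = (
    [f"kLeft{j}" for j in ARM_JOINTS]
    + [f"kRight{j}" for j in ARM_JOINTS]
    + [f"kLeftHand{j}" for j in HAND_JOINTS]
    + [f"kRightHand{j}" for j in HAND_JOINTS]
)

GROUPS = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "left_hand": slice(14, 20),
    "right_hand": slice(20, 26),
}

def label_action(vector):
    """Pair each value of a 26D action vector with its joint name."""
    if len(vector) != len(JOINT_NAMES):
        raise ValueError(f"expected {len(JOINT_NAMES)} values, got {len(vector)}")
    return dict(zip(JOINT_NAMES, vector))

print(JOINT_NAMES[GROUPS["right_hand"]])
```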
Action Normalization Stats
Use examples/data/g1_brainco/action_stats.json from the training repo for denormalization:
import json
import numpy as np

# Load the per-joint quantiles used during training.
with open("action_stats.json") as f:
    stats = json.load(f)["global"]
q01 = np.array(stats["q01"])
q99 = np.array(stats["q99"])

def denormalize(normalized_action):
    """Convert model output [-1, 1] back to raw joint angles (radians)."""
    return (normalized_action + 1.0) / 2.0 * (q99 - q01) + q01
Inference Results (iter 800)
Evaluated on the held-out last episode of each task. Actions predicted in 16-step chunks at 5 fps.
| Task | MAE (rad) | RMSE (rad) | Max Err (rad) |
|---|---|---|---|
| GraspOreo | 0.3293 | 0.4673 | 1.2437 |
| GraspRubiksCube | 0.3529 | 0.4995 | 1.1093 |
| PickApple | 0.1932 | 0.2827 | 0.6504 |
| PickCharger | 0.3924 | 0.5281 | 1.3410 |
| PickDoll | 0.4292 | 0.5131 | 0.8699 |
| PickDrink | 0.3487 | 0.4535 | 1.0586 |
| PickTissues | 0.2121 | 0.3279 | 0.9474 |
| PickToothpaste | 0.3759 | 0.4559 | 0.8488 |
| Average | 0.3292 | 0.4410 | n/a |
Note: This is an early checkpoint (800 iterations). Results improve significantly with more training (recommended: 2000 to 5000 iterations).
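For reference, the three metrics in the table can be computed as sketched below. The card does not state how errors are aggregated, so this sketch assumes they are pooled over all time steps and all 26 joints, in raw radians after denormalization; `action_errors` is a hypothetical helper and the numbers are toy values.

```python
import numpy as np

def action_errors(predicted, target):
    """Return MAE, RMSE and max absolute error between two action arrays."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if predicted.shape != target.shape:
        raise ValueError(f"shape mismatch: {predicted.shape} vs {target.shape}")
    err = predicted - target
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "max_err": float(np.max(np.abs(err))),
    }

# Toy example with two steps and two joints (not real evaluation data).
metrics = action_errors([[0.0, 0.0], [0.0, 0.0]], [[0.1, -0.1], [0.3, 0.1]])
print(metrics)
```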
Switching to Policy Mode
To use this checkpoint as a closed-loop policy (image + prompt → video + action, no action input needed), change model_mode to "policy" and remove action_path. For policy SFT training from this checkpoint, see the Cosmos3 policy fine-tuning guide.
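The two changes described above can be applied to an existing forward-dynamics input as in this sketch. `to_policy_input` is a hypothetical helper; it only edits the dictionary and makes no claim about other fields the policy mode may require.

```python
import copy

def to_policy_input(forward_dynamics_input):
    """Return a copy of the input with model_mode set to 'policy' and no action_path."""
    policy_input = copy.deepcopy(forward_dynamics_input)
    policy_input["model_mode"] = "policy"
    policy_input.pop("action_path", None)  # policy mode takes no action input
    return policy_input

original = {
    "model_mode": "forward_dynamics",
    "name": "pick_apple",
    "vision_path": "/path/to/initial_clip.mp4",
    "action_path": "/path/to/initial_actions.json",
}
policy = to_policy_input(original)
print(policy)
```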
Training Recipe
# examples/toml/sft_config/g1_action_sft_nano.toml
[job]
experiment = "g1_action_sft_nano"
project = "cosmos3_g1"
[optimizer]
lr = 1.0e-5
keys_to_select = ["moe_gen", "time_embedder", "vae2llm", "llm2vae"]
[trainer]
max_iter = 1000
[checkpoint]
save_iter = 200
load_path = "${oc.env:BASE_CHECKPOINT_PATH}"
Launch:
export BASE_CHECKPOINT_PATH=examples/checkpoints/Cosmos3-Nano
export WAN_VAE_PATH=examples/checkpoints/wan22_vae/Wan2.2_VAE.pth
export G1_DATASETS_ROOT=/path/to/cosmos3g1dataset
export G1_NORM_STATS_PATH=examples/data/g1_brainco/action_stats.json
bash examples/launch_sft_g1_nano.sh
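The launch script relies on the four environment variables exported above. A quick pre-flight check, sketched here as an optional convenience and not part of the training repo, can catch a missing variable before the job starts; `missing_env_vars` is a hypothetical helper.

```python
import os

REQUIRED_VARS = [
    "BASE_CHECKPOINT_PATH",
    "WAN_VAE_PATH",
    "G1_DATASETS_ROOT",
    "G1_NORM_STATS_PATH",
]

def missing_env_vars(env=None):
    """List the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```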
Citation
If you use this model, please cite the base Cosmos3 work:
@misc{cosmos3,
title = {Cosmos3: World Foundation Model for Physical AI},
author = {NVIDIA},
year = {2026},
url = {https://github.com/nvidia-cosmos/cosmos3}
}
License
This model inherits the NVIDIA Open Model License (OpenMDW-1.1) from the base Cosmos3-Nano checkpoint.