Cosmos3-Nano-G1-BrainCo-ActionSFT

Cosmos3-Nano fine-tuned for Unitree G1 + BrainCo forward-dynamics prediction

This model is a supervised fine-tuned (SFT) version of NVIDIA Cosmos3-Nano trained on the G1 BrainCo manipulation dataset. Given an initial camera frame and a task prompt, it predicts future video frames and the robot's 26-DOF joint-angle trajectory simultaneously.


Model Details

Property              Value
Base model            nvidia/Cosmos3-Nano
Fine-tuning mode      forward_dynamics
Robot embodiment      Unitree G1 + BrainCo dexterous hands
Action space          26D joint angles (7L-arm + 7R-arm + 6L-hand + 6R-hand)
Domain ID             30 (g1_brainco)
Training iterations   800
Checkpoint            saved every 200 iterations
Learning rate         1e-5
Optimizer             AdamW (β=0.9/0.95, ε=1e-6)
Precision             bfloat16
Action loss weight    10.0
EMA                   enabled (rate=0.1)
LoRA                  disabled (full fine-tune)
Hardware              8× NVIDIA L40 (49 GB)

Training Data

The model was trained on 8 manipulation tasks from the G1 BrainCo LeRobot dataset:

Task              Episodes   Frames
GraspOreo         201        ~198K
GraspRubiksCube   197        ~130K
PickApple         200        ~120K
PickCharger       200        ~150K
PickDoll          200        ~276K
PickDrink         201        ~143K
PickTissues       206        ~198K
PickToothpaste    193        ~312K

Dataset format: LeRobot v3.0, 30 Hz, 4 camera views (left/right high + left/right wrist), 26D absolute joint angles.
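The recordings are at 30 Hz, while the inference example later in this card uses fps 5 and 16-step action chunks. The arithmetic behind that gap can be sketched as follows, assuming the 5 fps stream is obtained by keeping every sixth frame (the card does not state how resampling is actually done). Both helper names are hypothetical.

```python
def subsample_indices(num_frames: int, src_hz: int = 30, dst_fps: int = 5) -> list[int]:
    """Indices of the frames kept when striding a src_hz recording down to dst_fps."""
    if src_hz % dst_fps != 0:
        raise ValueError("src_hz must be an integer multiple of dst_fps")
    stride = src_hz // dst_fps  # 30 // 5 = 6
    return list(range(0, num_frames, stride))


def chunk_duration_s(chunk_size: int = 16, fps: int = 5) -> float:
    """Time span in seconds covered by one action chunk."""
    return chunk_size / fps


kept = subsample_indices(num_frames=90)  # 3 s of 30 Hz data
print(len(kept), kept[:3])               # 15 [0, 6, 12]
print(chunk_duration_s())                # 3.2
```

Under this assumption, one 16-step chunk at 5 fps spans 3.2 s, or 96 frames of the original 30 Hz recording.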

Action normalization: per-joint quantile scaling that maps q01 and q99 (the 1st and 99th percentiles) to [-1, 1].
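A minimal sketch of the forward normalization, written as the inverse of the denormalize function shown later in this card. The clipping step is an assumption (the card does not say how values outside the quantile range are handled), and the quantile values below are made up for illustration.

```python
import numpy as np


def normalize(raw_action, q01, q99, clip=True):
    """Map raw joint angles to [-1, 1] using per-joint q01/q99 quantiles."""
    raw_action = np.asarray(raw_action, dtype=float)
    q01 = np.asarray(q01, dtype=float)
    q99 = np.asarray(q99, dtype=float)
    scaled = 2.0 * (raw_action - q01) / (q99 - q01) - 1.0
    # Assumption: values outside [q01, q99] are clipped to the target range.
    return np.clip(scaled, -1.0, 1.0) if clip else scaled


# Made-up quantiles for two joints, for illustration only.
q01 = np.array([-1.0, 0.0])
q99 = np.array([1.0, 2.0])
print(normalize([0.0, 2.0], q01, q99))  # midpoint maps to 0, q99 maps to 1
```

A joint whose q01 and q99 coincide would divide by zero here; a real pipeline would need to guard against that case.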


How to Use

Setup

git clone https://huggingface.co/jfgpt/Cosmos3-Nano-G1-BrainCo-ActionSFT
pip install cosmos-framework  # or: uv sync --all-extras --group=cu130
export LD_LIBRARY_PATH=''

Forward-Dynamics Inference

Given an initial video clip and an action sequence, the model predicts the future video frames and the next actions.

Input JSON (my_input.json):

{
  "model_mode": "forward_dynamics",
  "name": "pick_apple",
  "domain_name": "g1_brainco",
  "fps": 5,
  "image_size": 480,
  "action_chunk_size": 16,
  "raw_action_dim": 26,
  "view_point": "ego_view",
  "prompt": "{\"subjects\":[{\"description\":\"A Unitree G1 humanoid robot with articulated arms and dexterous hands\",\"action\":\"Pick up an apple from the table\"}],\"background_setting\":\"An indoor workspace\",\"cinematography\":{\"camera_motion\":\"static\",\"framing\":\"top-down wide-angle view\",\"camera_angle\":\"overhead\"}}",
  "seed": 42,
  "vision_path": "/path/to/initial_clip.mp4",
  "action_path": "/path/to/initial_actions.json"
}
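The prompt field is itself a JSON document stored as an escaped string, which is easy to get wrong by hand. The sketch below builds the same file with Python's json module. The helper name build_input is hypothetical, and all field values are copied from the example above.

```python
import json


def build_input(name: str, action: str, vision_path: str, action_path: str) -> dict:
    """Assemble a forward-dynamics input dict with the prompt serialized as a string."""
    prompt = {
        "subjects": [
            {
                "description": "A Unitree G1 humanoid robot with articulated arms and dexterous hands",
                "action": action,
            }
        ],
        "background_setting": "An indoor workspace",
        "cinematography": {
            "camera_motion": "static",
            "framing": "top-down wide-angle view",
            "camera_angle": "overhead",
        },
    }
    return {
        "model_mode": "forward_dynamics",
        "name": name,
        "domain_name": "g1_brainco",
        "fps": 5,
        "image_size": 480,
        "action_chunk_size": 16,
        "raw_action_dim": 26,
        "view_point": "ego_view",
        # json.dumps yields the compact string that appears escaped in the file.
        "prompt": json.dumps(prompt, separators=(",", ":")),
        "seed": 42,
        "vision_path": vision_path,
        "action_path": action_path,
    }


config = build_input(
    "pick_apple",
    "Pick up an apple from the table",
    "/path/to/initial_clip.mp4",
    "/path/to/initial_actions.json",
)
with open("my_input.json", "w") as f:
    json.dump(config, f, indent=2)
```

Serializing the prompt with json.dumps and then dumping the outer dict lets the library handle the quote escaping.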

initial_actions.json: a list of 16 joint-angle vectors, each 26D (raw, before normalization):

[
  [0.12, -0.05, -0.28, 0.19, 0.69, -0.02, 0.43, 0.17, 0.08, 0.01, 0.31, 0.01, -0.43, -0.16, 0.37, 0.45, 0.21, 0.27, 0.29, 0.33, 0.0, 0.66, 0.18, 0.29, 0.28, 0.25],
  ...
]
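Because the file must match action_chunk_size (16) and raw_action_dim (26) from the input JSON, a quick shape check before launching inference can save a failed run. This is a sketch only: validate_actions is a hypothetical helper, and the all-zero trajectory is a placeholder rather than meaningful robot data.

```python
import json
import math

CHUNK_SIZE = 16  # action_chunk_size in the input JSON
ACTION_DIM = 26  # raw_action_dim in the input JSON


def validate_actions(actions) -> None:
    """Raise ValueError unless actions is a 16 x 26 nested list of finite numbers."""
    if len(actions) != CHUNK_SIZE:
        raise ValueError(f"expected {CHUNK_SIZE} steps, got {len(actions)}")
    for t, step in enumerate(actions):
        if len(step) != ACTION_DIM:
            raise ValueError(f"step {t}: expected {ACTION_DIM} values, got {len(step)}")
        if not all(isinstance(v, (int, float)) and math.isfinite(v) for v in step):
            raise ValueError(f"step {t}: non-numeric or non-finite value")


# Placeholder trajectory (all zeros), used only to exercise the check.
actions = [[0.0] * ACTION_DIM for _ in range(CHUNK_SIZE)]
validate_actions(actions)
with open("initial_actions.json", "w") as f:
    json.dump(actions, f)
```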

Run:

torchrun --nproc_per_node=4 \
  -m cosmos_framework.scripts.inference \
  --checkpoint-path /path/to/Cosmos3-Nano-G1-BrainCo-ActionSFT \
  --parallelism-preset latency \
  --no-guardrails \
  --output-dir outputs/ \
  -i my_input.json

Outputs (in outputs/pick_apple/):

  • vision.mp4: predicted future video frames
  • action.json: predicted 16-step joint-angle trajectory (normalized)

Joint Name Order

0:  kLeftShoulderPitch     7:  kRightShoulderPitch
1:  kLeftShoulderRoll      8:  kRightShoulderRoll
2:  kLeftShoulderYaw       9:  kRightShoulderYaw
3:  kLeftElbow             10: kRightElbow
4:  kLeftWristRoll         11: kRightWristRoll
5:  kLeftWristPitch        12: kRightWristPitch
6:  kLeftWristYaw          13: kRightWristYaw
14: kLeftHandThumb         20: kRightHandThumb
15: kLeftHandThumbAux      21: kRightHandThumbAux
16: kLeftHandIndex         22: kRightHandIndex
17: kLeftHandMiddle        23: kRightHandMiddle
18: kLeftHandRing          24: kRightHandRing
19: kLeftHandPinky         25: kRightHandPinky
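The index table above can be expressed in code so that a 26D vector is addressed by joint name or by limb. The names follow the table exactly; the GROUPS dictionary and split_action helper are hypothetical conveniences, not part of the training repo.

```python
ARM_JOINTS = ["ShoulderPitch", "ShoulderRoll", "ShoulderYaw", "Elbow",
              "WristRoll", "WristPitch", "WristYaw"]
HAND_JOINTS = ["Thumb", "ThumbAux", "Index", "Middle", "Ring", "Pinky"]

# Order matches the table: left arm, right arm, left hand, right hand.
JOINT_NAMES = (
    [f"kLeft{j}" for j in ARM_JOINTS]
    + [f"kRight{j}" for j in ARM_JOINTS]
    + [f"kLeftHand{j}" for j in HAND_JOINTS]
    + [f"kRightHand{j}" for j in HAND_JOINTS]
)

# Index ranges of each limb within a 26D action vector.
GROUPS = {
    "left_arm": slice(0, 7),
    "right_arm": slice(7, 14),
    "left_hand": slice(14, 20),
    "right_hand": slice(20, 26),
}


def split_action(action):
    """Split one 26D action vector into per-limb segments."""
    if len(action) != len(JOINT_NAMES):
        raise ValueError(f"expected {len(JOINT_NAMES)} values, got {len(action)}")
    return {name: list(action[s]) for name, s in GROUPS.items()}


print(JOINT_NAMES.index("kRightElbow"))                 # 10
print(len(split_action(list(range(26)))["left_hand"]))  # 6
```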

Action Normalization Stats

Use examples/data/g1_brainco/action_stats.json from the training repo for denormalization:

import json

import numpy as np

# Per-joint q01/q99 quantiles used for action normalization.
with open("action_stats.json") as f:
    stats = json.load(f)["global"]
q01 = np.array(stats["q01"])
q99 = np.array(stats["q99"])

def denormalize(normalized_action):
    """Convert model output [-1, 1] back to raw joint angles (radians)."""
    normalized_action = np.asarray(normalized_action, dtype=float)
    return (normalized_action + 1.0) / 2.0 * (q99 - q01) + q01
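As a usage sketch, the function below reads a predicted action.json and rescales it to radians in one step. It assumes action.json holds a plain nested list of shape (steps, 26) and that action_stats.json uses the "global" / "q01" / "q99" layout read above. The actual output schema may differ, so inspect the file first. The helper name and the synthetic demo files are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def load_and_denormalize(action_path, stats_path):
    """Read normalized actions and rescale them to raw joint angles (radians)."""
    with open(stats_path) as f:
        stats = json.load(f)["global"]
    q01 = np.array(stats["q01"], dtype=float)
    q99 = np.array(stats["q99"], dtype=float)

    with open(action_path) as f:
        # Assumption: the file is a nested list of shape (steps, 26).
        actions = np.array(json.load(f), dtype=float)
    if actions.ndim != 2 or actions.shape[1] != q01.shape[0]:
        raise ValueError(f"unexpected action shape {actions.shape}")

    return (actions + 1.0) / 2.0 * (q99 - q01) + q01


# Demo with synthetic files; real stats come from the training repo.
tmp = tempfile.mkdtemp()
stats_file = os.path.join(tmp, "action_stats.json")
action_file = os.path.join(tmp, "action.json")
with open(stats_file, "w") as f:
    json.dump({"global": {"q01": [-1.0] * 26, "q99": [3.0] * 26}}, f)
with open(action_file, "w") as f:
    json.dump([[0.0] * 26 for _ in range(16)], f)

raw = load_and_denormalize(action_file, stats_file)
print(raw.shape, raw[0, 0])  # (16, 26) 1.0
```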

Inference Results (iter 800)

Evaluated on the held-out last episode of each task. Actions are predicted in 16-step chunks at 5 fps.

Task              MAE (rad)   RMSE (rad)   Max Err (rad)
GraspOreo         0.3293      0.4673       1.2437
GraspRubiksCube   0.3529      0.4995       1.1093
PickApple         0.1932      0.2827       0.6504
PickCharger       0.3924      0.5281       1.3410
PickDoll          0.4292      0.5131       0.8699
PickDrink         0.3487      0.4535       1.0586
PickTissues       0.2121      0.3279       0.9474
PickToothpaste    0.3759      0.4559       0.8488
Average           0.3292      0.4410       -
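For readers reproducing the three columns above, one plausible way to compute them is sketched here. The card does not say how errors are aggregated, so pooling over all steps and joints of denormalized actions is an assumption, and action_errors is a hypothetical helper.

```python
import numpy as np


def action_errors(pred, target):
    """Return (MAE, RMSE, max abs error) pooled over all steps and joints."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    err = pred - target
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    max_err = float(np.max(np.abs(err)))
    return mae, rmse, max_err


# Toy example: two steps, two joints (values in radians).
pred = [[0.0, 0.5], [1.0, 1.0]]
target = [[0.0, 0.0], [1.0, 2.0]]
print(action_errors(pred, target))
```

The Average row in the table matches the unweighted mean of the eight per-task values for MAE and RMSE.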

Note: This is an early checkpoint (800 iterations). Results improve significantly with more training (recommended: 2000–5000 iterations).


Switching to Policy Mode

To use this checkpoint as a closed-loop policy (image + prompt → video + action, no action input needed), change model_mode to "policy" and remove action_path. For policy SFT training from this checkpoint, see the Cosmos3 policy fine-tuning guide.
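The two changes described above can be applied to an existing forward-dynamics input programmatically. This is a minimal sketch. The helper name to_policy_input is hypothetical, and whether other fields need adjusting in policy mode is not covered by this card.

```python
import copy


def to_policy_input(forward_dynamics_input: dict) -> dict:
    """Return a copy with model_mode set to "policy" and action_path removed."""
    policy_input = copy.deepcopy(forward_dynamics_input)
    policy_input["model_mode"] = "policy"
    policy_input.pop("action_path", None)  # policy mode takes no action input
    return policy_input


fd_input = {
    "model_mode": "forward_dynamics",
    "name": "pick_apple",
    "vision_path": "/path/to/initial_clip.mp4",
    "action_path": "/path/to/initial_actions.json",
}
print(to_policy_input(fd_input))
```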


Training Recipe

# examples/toml/sft_config/g1_action_sft_nano.toml
[job]
experiment = "g1_action_sft_nano"
project    = "cosmos3_g1"

[optimizer]
lr = 1.0e-5
keys_to_select = ["moe_gen", "time_embedder", "vae2llm", "llm2vae"]

[trainer]
max_iter = 1000

[checkpoint]
save_iter = 200
load_path = "${oc.env:BASE_CHECKPOINT_PATH}"

Launch:

export BASE_CHECKPOINT_PATH=examples/checkpoints/Cosmos3-Nano
export WAN_VAE_PATH=examples/checkpoints/wan22_vae/Wan2.2_VAE.pth
export G1_DATASETS_ROOT=/path/to/cosmos3g1dataset
export G1_NORM_STATS_PATH=examples/data/g1_brainco/action_stats.json

bash examples/launch_sft_g1_nano.sh
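The launch script depends on the four environment variables exported above, and the config resolves load_path from BASE_CHECKPOINT_PATH. A small preflight check, sketched here, reports unset or dangling paths before a multi-GPU job starts. The function is hypothetical and not part of the training repo.

```python
import os

REQUIRED_VARS = [
    "BASE_CHECKPOINT_PATH",
    "WAN_VAE_PATH",
    "G1_DATASETS_ROOT",
    "G1_NORM_STATS_PATH",
]


def missing_or_invalid(env=None):
    """Return {variable: reason} for variables that are unset or point nowhere."""
    env = os.environ if env is None else env
    problems = {}
    for var in REQUIRED_VARS:
        value = env.get(var)
        if not value:
            problems[var] = "not set"
        elif not os.path.exists(value):
            problems[var] = f"path does not exist: {value}"
    return problems


# Report any problems found in the current shell environment.
for var, reason in missing_or_invalid().items():
    print(f"{var}: {reason}")
```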

Citation

If you use this model, please cite the base Cosmos3 work:

@misc{cosmos3,
  title  = {Cosmos3: World Foundation Model for Physical AI},
  author = {NVIDIA},
  year   = {2026},
  url    = {https://github.com/nvidia-cosmos/cosmos3}
}

License

This model inherits the NVIDIA Open Model License (OpenMDW-1.1) from the base Cosmos3-Nano checkpoint.
