GR00T-N1.5 · RoboCasa Subtask + Termination Head (term_head_from_n15base)

Fine-tune of nvidia/GR00T-N1.5-3B (the raw foundation model, without RoboCasa multitask pretraining) on RoboCasa per-subtask data, with an appended termination head that predicts per-frame task completion and progress. Also included are 3 atomic RL fine-tunes: PPO + LoRA on top of the SFT model, in the mixed success/fail, sparse, per-subtask setting.

Eval (RoboCasa multitask_learning, leaderboard-horizon, pretrain split)

| Model | atomic | composite (chained) |
|---|---|---|
| term_head_from_n15base (this SFT) | 45.4% (18 tasks) | 0.6% (16 tasks) |

The termination head fires correctly under N1.5. The chained-composite result is sensitive to the horizon and to the firing threshold; the termination controller combines EMA smoothing, a threshold, debounce, a buffer, and min_steps.
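The controller components named above (EMA, threshold, debounce, buffer, min_steps) could be combined roughly as in the sketch below. This is an illustration under stated assumptions, not the code from the repo: the class name, the default values, and in particular the reading of "buffer" as a post-trigger delay are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TerminationController:
    """Hypothetical gate over per-frame termination probabilities."""

    ema_alpha: float = 0.3   # assumed EMA smoothing factor
    threshold: float = 0.8   # assumed firing threshold on the smoothed score
    debounce: int = 3        # consecutive above-threshold frames required
    min_steps: int = 20      # never trigger before this many steps
    buffer_steps: int = 5    # assumed meaning of "buffer": extra steps after the trigger

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Clear all state; call at the start of each subtask."""
        self.ema = 0.0
        self.step = 0
        self.streak = 0
        self.countdown = None  # becomes an int once the trigger condition is met

    def update(self, p_done: float) -> bool:
        """Feed one per-frame probability; return True when the subtask should end."""
        self.step += 1
        self.ema = self.ema_alpha * p_done + (1.0 - self.ema_alpha) * self.ema
        if self.countdown is None:
            above = self.ema >= self.threshold and self.step >= self.min_steps
            self.streak = self.streak + 1 if above else 0
            if self.streak < self.debounce:
                return False
            self.countdown = self.buffer_steps  # triggered: start the buffer delay
        else:
            self.countdown -= 1
        return self.countdown <= 0
```

With settings like these, a single high-probability frame cannot end a subtask: the smoothed score must stay above the threshold for `debounce` consecutive frames after `min_steps`. That is one plausible reason the chained-composite score depends so strongly on horizon and threshold, since every early or late switch compounds across the chain.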

Contents

  • sft_term_head_from_n15base_120k/: the SFT checkpoint-120000 (full: model safetensors + optimizer.pt + scheduler/rng, so training is resumable). Embodiment robocasa_panda_omron, action_horizon 16, bf16/fp32.
  • rl_atomic/{closeblenderlid,closefridge,closetoasterovendoor}/: atomic RL checkpoints at global_step_80 (FSDP dcp_checkpoint plus consolidated model_state_dict/full_weights.pt; action-head LoRA r64/α128, unmerged). Convert to a loadable HF directory with convert_rl_ckpt.py, which merges the LoRA and drops the PPO value head; see the code repo.
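For readers unfamiliar with what merging an unmerged LoRA involves, the standard update is W' = W + (α/r)·B·A. The sketch below shows that arithmetic on toy list-of-lists matrices. It is not taken from convert_rl_ckpt.py, whose actual implementation may differ; the function name `merge_lora` is hypothetical. The toy example uses r=2, α=4, which gives the same scale factor of 2.0 as r64/α128.

```python
def merge_lora(weight, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank.

    weight: out_dim x in_dim, lora_a: r x in_dim, lora_b: out_dim x r.
    """
    rank = len(lora_a)
    scale = alpha / rank
    merged = [row[:] for row in weight]  # copy so the input is not modified
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Toy example: rank 2, alpha 4, so scale = 2.0 (same ratio as r64/α128).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5, 0.0], [0.0, 0.25]]
merged = merge_lora(W, A, B, alpha=4)
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded as an ordinary model directory.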

Code / reproduce

Full SFT→eval→RL code, two ready-to-apply patches, and a deploy guide: https://github.com/MuQoe/vla-self-evolution → branch tianyun15base_subtask_termhead_rl/ (see README.md + *.patch).

Trained from the raw nvidia/GR00T-N1.5-3B checkpoint with --dataset_soup subtask_full --use_termination_head --termination_coeff 0.1 --progress_coeff 0.05, for 120k steps on 2×H100.
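One plausible reading of the two coefficients is that they weight auxiliary termination and progress losses added to the main action loss. The sketch below assumes exactly that; the function name and the additive form are assumptions, and the training code in the repo is the authority on how the flags are actually used.

```python
def total_loss(action_loss, termination_loss, progress_loss,
               termination_coeff=0.1, progress_coeff=0.05):
    """Assumed combination: action loss plus weighted auxiliary head losses."""
    return (action_loss
            + termination_coeff * termination_loss
            + progress_coeff * progress_loss)


# Example with made-up per-batch loss values.
example = total_loss(action_loss=1.0, termination_loss=2.0, progress_loss=4.0)
```

Under this reading, the small coefficients keep the auxiliary heads from dominating the action objective.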
