GR00T-N1.5 · RoboCasa Subtask + Termination Head (term_head_from_n15base)
Fine-tune of nvidia/GR00T-N1.5-3B (raw foundation model, no RoboCasa multitask pretraining) on RoboCasa per-subtask data, with an appended termination head that predicts per-frame task completion and progress. Also included: 3 atomic RL fine-tunes (mixed success/fail, sparse subtask PPO + LoRA, on top of the SFT model).
Eval (RoboCasa multitask_learning, leaderboard-horizon, pretrain split)
| | atomic | composite (chained) |
|---|---|---|
| term_head_from_n15base (this SFT) | 45.4% (18 tasks) | 0.6% (16 tasks) |
The termination head fires correctly under N1.5; chained-composite success is sensitive to horizon and threshold (controller: EMA + threshold + debounce + buffer + min_steps).
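For orientation, a controller of the kind named above (EMA + threshold + debounce + buffer + min_steps) could look roughly like the sketch below. It is an illustration only, not the controller shipped in the code repo. The class name `TerminationController`, the semantics given to each knob, and all default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TerminationController:
    """Hypothetical gate: per-frame completion probability -> 'subtask done' signal."""

    alpha: float = 0.3      # EMA weight on the newest probability (assumed value)
    threshold: float = 0.8  # smoothed probability must exceed this (assumed value)
    debounce: int = 3       # consecutive above-threshold steps required (assumed value)
    buffer: int = 5         # extra steps to keep acting after the trigger (assumed value)
    min_steps: int = 20     # earliest step at which a trigger may count (assumed value)

    def __post_init__(self):
        self.reset()

    def reset(self):
        """Call at the start of every subtask in the chain."""
        self.ema = 0.0
        self.step = 0
        self.streak = 0
        self.countdown = None  # None until the debounced trigger has fired

    def update(self, p_done: float) -> bool:
        """Feed one per-frame completion probability; return True when the subtask should end."""
        self.step += 1
        self.ema = self.alpha * p_done + (1.0 - self.alpha) * self.ema

        # Trigger already fired: run out the buffer, then report done.
        if self.countdown is not None:
            self.countdown -= 1
            return self.countdown <= 0

        # Count consecutive above-threshold steps, but only after min_steps.
        if self.step >= self.min_steps and self.ema > self.threshold:
            self.streak += 1
        else:
            self.streak = 0

        if self.streak >= self.debounce:
            self.countdown = self.buffer
            return self.countdown <= 0
        return False
```

In a chained rollout, the caller would invoke `update` once per control step with the head's completion probability and call `reset` when switching to the next subtask. Each knob delays the switch in a different way, which is one plausible reason the composite score moves with horizon and threshold.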
Contents
sft_term_head_from_n15base_120k/: the SFT checkpoint-120000 (full: model safetensors + optimizer.pt + scheduler/rng → resumable). Embodiment robocasa_panda_omron, action_horizon 16, bf16/fp32.
rl_atomic/{closeblenderlid,closefridge,closetoasterovendoor}/: atomic RL ckpts at global_step_80 (FSDP dcp_checkpoint + consolidated model_state_dict/full_weights.pt, action-head LoRA r64/α128 unmerged). Convert to a loadable HF dir with convert_rl_ckpt.py (merges LoRA, drops PPO value head); see the code repo.
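As a rough picture of what "merges LoRA, drops PPO value head" involves, the sketch below folds a low-rank update into a base weight and filters value-head keys out of a state dict. It is not the logic of convert_rl_ckpt.py. It assumes the standard LoRA scaling α/r (2.0 for r64/α128). The `value_head.` key prefix and all function names are hypothetical, and plain Python lists stand in for tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W + (alpha / rank) * B @ A, the standard LoRA merge (assumed here).

    weight: out x in, lora_a: rank x in, lora_b: out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def drop_value_head(state_dict, prefix="value_head."):
    """Remove PPO value-head entries; the key prefix is a hypothetical example."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}
```

After merging, the adapter matrices are no longer needed at inference time, so a converted checkpoint can be loaded like an ordinary fine-tuned model.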
Code / reproduce
Full SFT→eval→RL code + two apply-able patches + deploy guide:
https://github.com/MuQoe/vla-self-evolution → branch tianyu →
n15base_subtask_termhead_rl/ (see README.md + *.patch).
Trained on raw nvidia/GR00T-N1.5-3B; --dataset_soup subtask_full --use_termination_head --termination_coeff 0.1 --progress_coeff 0.05, 120k steps, 2×H100.
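One way to read the two coefficients in the command above is as weights on auxiliary losses added to the main action loss. The sketch below shows that weighting under stated assumptions: binary cross-entropy for the completion prediction and squared error for progress. Neither loss form is confirmed by this card, and the function names are hypothetical.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one prediction; prob is clamped for numerical safety."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def total_loss(action_loss, done_prob, done_label, progress_pred, progress_target,
               termination_coeff=0.1, progress_coeff=0.05):
    """Weighted sum: action loss + 0.1 * termination loss + 0.05 * progress loss.

    The BCE / squared-error choices are assumptions; only the coefficients
    come from the training flags.
    """
    termination_loss = bce(done_prob, done_label)
    progress_loss = (progress_pred - progress_target) ** 2
    return action_loss + termination_coeff * termination_loss + progress_coeff * progress_loss
```

With coefficients this small, the auxiliary heads act as light regularizers next to the action objective and are unlikely to dominate the gradient.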