# RLinf LingbotVLA Click Bell GRPO
This repository contains an RLinf checkpoint of LingbotVLA fine-tuned with GRPO on the RoboTwin click_bell task.
## Checkpoint Format
The checkpoint is provided in RLinf actor checkpoint format:
- `actor/model_state_dict/full_weights.pt`
- `actor/dcp_checkpoint/.metadata`
- `actor/dcp_checkpoint/__5_0.distcp`
- `actor/dcp_checkpoint/__7_0.distcp`
For evaluation in RLinf, load the model through runner.ckpt_path:
runner.ckpt_path=/path/to/actor/model_state_dict/full_weights.pt
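
Before launching an evaluation, it can help to confirm that a downloaded copy matches the layout listed above. The following is a minimal sketch using only the Python standard library; the helper names `missing_checkpoint_files` and `ckpt_path_override` are hypothetical and not part of RLinf, and `/path/to` is the same placeholder used above.

```python
from pathlib import Path

# Relative paths taken from the checkpoint layout listed in this README.
EXPECTED_FILES = [
    "actor/model_state_dict/full_weights.pt",
    "actor/dcp_checkpoint/.metadata",
    "actor/dcp_checkpoint/__5_0.distcp",
    "actor/dcp_checkpoint/__7_0.distcp",
]


def missing_checkpoint_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def ckpt_path_override(root):
    """Build the runner.ckpt_path override pointing at the full-weights file."""
    weights = Path(root) / EXPECTED_FILES[0]
    return f"runner.ckpt_path={weights.as_posix()}"


# Example with the placeholder root used in this README.
print(ckpt_path_override("/path/to"))
```

An empty list from `missing_checkpoint_files(root)` means all four files are present; the check only looks at file existence and does not validate their contents.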
Use the LingbotVLA RoboTwin SFT base configuration from:
robbyant/lingbot-vla-4b-posttrain-robotwin
revision: 3e0c7c476bde3daaac00f79f3741a292a299f60a
## Evaluation
Latest local regression evaluation on the RoboTwin click_bell task, random setting:
| Checkpoint | Task | Setting | Trajectories | Max Steps | eval/success_once | eval/return |
|---|---|---|---|---|---|---|
| RLinf-lingbotvla-click-bell-grpo | click_bell | random | 320 | 400 | 0.9875 | 6.85 |
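
To put the reported rate in terms of trajectory counts, the sketch below converts `eval/success_once` into a number of successes. It assumes the metric is the fraction of the 320 trajectories that succeeded at least once, which the name suggests but this README does not state. The Wilson score interval is an illustrative addition, not a figure from the evaluation logs.

```python
import math

trajectories = 320
success_rate = 0.9875  # eval/success_once from the table above

# Assumption: success_once is successes / trajectories.
successes = round(success_rate * trajectories)
failures = trajectories - successes


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for k successes out of n trials."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(successes, trajectories)
print(f"{successes}/{trajectories} succeeded, {failures} failed")
print(f"approx. 95% Wilson interval: [{low:.4f}, {high:.4f}]")
```

Under that assumption, 0.9875 corresponds to 316 successful trajectories and 4 failures.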
Evaluation settings:
config: robotwin_click_bell_grpo_lingbotvla_eval
algorithm.eval_rollout_epoch=1
algorithm.sampling_params.temperature_eval=-1
env.eval.total_num_envs=320
env.eval.max_episode_steps=400
env.eval.max_steps_per_rollout_epoch=400
env.eval.use_fixed_reset_state_ids=False
env.eval.seeds_path=null
env.eval.video_cfg.save_video=False
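
The settings above can be kept in one place and rendered as `key=value` overrides in the same form used for `runner.ckpt_path`. This is a minimal sketch: the helper names `format_value` and `to_overrides` are hypothetical, and the RLinf launch command is deliberately left out because this README does not specify it.

```python
# Config name and values copied from the evaluation settings listed above.
EVAL_CONFIG = "robotwin_click_bell_grpo_lingbotvla_eval"

EVAL_SETTINGS = {
    "algorithm.eval_rollout_epoch": 1,
    "algorithm.sampling_params.temperature_eval": -1,
    "env.eval.total_num_envs": 320,
    "env.eval.max_episode_steps": 400,
    "env.eval.max_steps_per_rollout_epoch": 400,
    "env.eval.use_fixed_reset_state_ids": False,
    "env.eval.seeds_path": None,
    "env.eval.video_cfg.save_video": False,
}


def format_value(value):
    """Render a Python value the way it is written in this README (None -> null)."""
    return "null" if value is None else str(value)


def to_overrides(settings, ckpt_path):
    """Return key=value override strings, with runner.ckpt_path first."""
    overrides = [f"runner.ckpt_path={ckpt_path}"]
    overrides += [f"{key}={format_value(val)}" for key, val in settings.items()]
    return overrides


# Placeholder checkpoint path, as used earlier in this README.
ckpt = "/path/to/actor/model_state_dict/full_weights.pt"
print(f"config: {EVAL_CONFIG}")
for line in to_overrides(EVAL_SETTINGS, ckpt):
    print(line)
```

Keeping the values in a dictionary makes it easier to compare a new run against the settings used for the reported numbers.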
The evaluation logs are kept locally under:
/mnt/public/lwb/artifacts/lingbot-vla-eval/click_bell_regression/20260615_140703