RLinf LingbotVLA Click Bell GRPO

This repository contains an RLinf checkpoint for LingbotVLA GRPO fine-tuning on the RoboTwin click_bell task.

Checkpoint Format

The checkpoint is provided in RLinf actor checkpoint format:

actor/model_state_dict/full_weights.pt
actor/dcp_checkpoint/.metadata
actor/dcp_checkpoint/__5_0.distcp
actor/dcp_checkpoint/__7_0.distcp
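
Before pointing RLinf at a downloaded copy, it can help to confirm that the files listed above are all present. The following is a minimal sketch using only the Python standard library; the helper name `missing_checkpoint_files` is hypothetical and is not part of RLinf.

```python
from pathlib import Path

# Relative paths copied from the checkpoint listing above.
EXPECTED_FILES = [
    "actor/model_state_dict/full_weights.pt",
    "actor/dcp_checkpoint/.metadata",
    "actor/dcp_checkpoint/__5_0.distcp",
    "actor/dcp_checkpoint/__7_0.distcp",
]


def missing_checkpoint_files(root):
    """Return the expected files that are absent under `root`.

    Hypothetical helper for illustration; it only checks that each
    path exists as a regular file and does not validate contents.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_checkpoint_files("/path/to")` returns an empty list when the layout is complete, and the list of missing relative paths otherwise.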

For evaluation in RLinf, load the model through runner.ckpt_path:

runner.ckpt_path=/path/to/actor/model_state_dict/full_weights.pt

Use the LingbotVLA RoboTwin SFT base configuration from:

robbyant/lingbot-vla-4b-posttrain-robotwin
revision: 3e0c7c476bde3daaac00f79f3741a292a299f60a
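
One way to fetch the base at exactly this revision is sketched below. It assumes that the identifier above names a model repository on the Hugging Face Hub and that the `huggingface_hub` package is installed; the wrapper `download_base_model` is a hypothetical name.

```python
# Repository id and commit hash copied from the lines above.
BASE_REPO_ID = "robbyant/lingbot-vla-4b-posttrain-robotwin"
BASE_REVISION = "3e0c7c476bde3daaac00f79f3741a292a299f60a"


def download_base_model(local_dir=None):
    """Download the pinned base snapshot and return its local path.

    Assumption: the repository is hosted on the Hugging Face Hub.
    """
    # Imported lazily so the constants can be reused without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=BASE_REPO_ID,
        revision=BASE_REVISION,  # pin the exact commit, not a moving branch
        local_dir=local_dir,     # None keeps the default cache location
    )
```

Pinning the full commit hash rather than a branch name keeps the base configuration identical across machines.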

Evaluation

Latest local regression evaluation on the RoboTwin click_bell task (random setting):

| Checkpoint | Task | Setting | Trajectories | Max Steps | eval/success_once | eval/return |
| --- | --- | --- | --- | --- | --- | --- |
| RLinf-lingbotvla-click-bell-grpo | click_bell | random | 320 | 400 | 0.9875 | 6.85 |
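
To read the success rate as a count of episodes, the short sketch below converts it back to whole trajectories. It assumes that eval/success_once is the fraction of trajectories that reached success at least once; that interpretation is inferred from the metric name and is not stated in this card.

```python
def success_counts(success_rate, num_trajectories):
    """Convert a success rate into (successes, failures).

    Rounds to the nearest whole episode to absorb floating-point error.
    """
    successes = round(success_rate * num_trajectories)
    return successes, num_trajectories - successes


successes, failures = success_counts(0.9875, 320)
print(f"{successes} succeeded, {failures} failed out of 320")
```

Under that assumption, the reported 0.9875 over 320 trajectories corresponds to 316 successes and 4 failures.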

Evaluation settings:

config: robotwin_click_bell_grpo_lingbotvla_eval
algorithm.eval_rollout_epoch=1
algorithm.sampling_params.temperature_eval=-1
env.eval.total_num_envs=320
env.eval.max_episode_steps=400
env.eval.max_steps_per_rollout_epoch=400
env.eval.use_fixed_reset_state_ids=False
env.eval.seeds_path=null
env.eval.video_cfg.save_video=False
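
The settings above can be assembled into a list of key=value arguments, as in the sketch below. It assumes the RLinf launcher accepts overrides in this form; the launcher command itself is not given in this card, so the sketch only builds the argument list. The helper names are hypothetical.

```python
import shlex

# Config name and override values copied from the settings above.
EVAL_CONFIG_NAME = "robotwin_click_bell_grpo_lingbotvla_eval"

EVAL_OVERRIDES = {
    "algorithm.eval_rollout_epoch": "1",
    "algorithm.sampling_params.temperature_eval": "-1",
    "env.eval.total_num_envs": "320",
    "env.eval.max_episode_steps": "400",
    "env.eval.max_steps_per_rollout_epoch": "400",
    "env.eval.use_fixed_reset_state_ids": "False",
    "env.eval.seeds_path": "null",
    "env.eval.video_cfg.save_video": "False",
}


def build_override_args(ckpt_path):
    """Return the checkpoint path plus all evaluation overrides as key=value strings."""
    args = [f"runner.ckpt_path={ckpt_path}"]
    args += [f"{key}={value}" for key, value in EVAL_OVERRIDES.items()]
    return args


def as_shell_fragment(ckpt_path):
    """Join the overrides into one shell-quoted string (hypothetical helper)."""
    return " ".join(shlex.quote(arg) for arg in build_override_args(ckpt_path))
```

For example, `build_override_args("/path/to/actor/model_state_dict/full_weights.pt")` yields nine arguments, starting with the `runner.ckpt_path` entry, which can be appended to whichever evaluation command is used with the config named in `EVAL_CONFIG_NAME`.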

The evaluation logs are kept locally under:

/mnt/public/lwb/artifacts/lingbot-vla-eval/click_bell_regression/20260615_140703