0520 pi0.5 + RL Token cotrain (device c)

A cotrained pi0.5 VLA and RL Token pair for the screw insertion task, trained on device-c data (Shiki42/0420_0423screw_c and Shiki42/0420_0423screw_critical_c, mixed 1:1).

Layout

model.safetensors (root) — pi0.5 VLA (cotrained, ~3.6 B params bf16, ~14.5 GB)
rlt/model.safetensors — RL Token (cotrained, ~256 M params fp32, ~1 GB)
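
To check the parameter counts and dtypes listed above against the downloaded files, the safetensors header can be read without loading any weights. The sketch below relies only on the documented file format (an 8-byte little-endian header length followed by a JSON header); the function name tally_safetensors is a hypothetical helper, not part of lerobot or safetensors.

```python
import json
import struct
from collections import defaultdict


def tally_safetensors(path):
    """Return {dtype: (param_count, byte_count)} using only the file header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))

    totals = defaultdict(lambda: [0, 0])
    for name, entry in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        count = 1
        for dim in entry["shape"]:
            count *= dim
        start, end = entry["data_offsets"]
        totals[entry["dtype"]][0] += count
        totals[entry["dtype"]][1] += end - start
    return {dtype: tuple(pair) for dtype, pair in totals.items()}
```

Calling tally_safetensors on model.safetensors and on rlt/model.safetensors gives one (parameters, bytes) pair per dtype, which can be compared with the figures in this section.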

Load

from lerobot.policies.pi05.modeling_pi05 import PI05Policy
from lerobot.policies.rlt.modeling_rlt_token import RLTokenPolicy
from lerobot.policies.rlt.configuration_rlt_token import RLTokenPolicyConfig
from huggingface_hub import snapshot_download

# Download both halves to the same local dir, set vla_pretrained_path to the
# root of that dir, and load the RLT from the rlt/ subfolder.
local = snapshot_download("Shiki42/0520_pi0.5screw_rlt_cotrain_c", repo_type="model")
vla = PI05Policy.from_pretrained(local)
cfg = RLTokenPolicyConfig.from_pretrained(local + "/rlt")
cfg.vla_pretrained_path = local
rlt = RLTokenPolicy.from_pretrained(local + "/rlt", config=cfg)
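
Because both loaders read from the same snapshot directory, it can help to confirm that the two weight files from the Layout section are present before calling from_pretrained. This is a minimal sketch; missing_checkpoint_files is a hypothetical helper, and it checks only the two files this card names.

```python
from pathlib import Path

# Weight files this card's Layout section says the repo contains.
EXPECTED_FILES = ("model.safetensors", "rlt/model.safetensors")


def missing_checkpoint_files(local_dir):
    """Return the expected weight files that are absent under local_dir."""
    root = Path(local_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, missing_checkpoint_files(local) returns an empty list when the snapshot is complete, and the relative paths of any absent files otherwise.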

Provenance

VLA SFT baseline: Shiki42/pi05_screw_c_mix_20k (20k-step continued SFT of pi0.5).
RL Token init: Shiki42/rlt_token_mix_c4_15k (15k-step joint training with the SFT baseline).
Restart: from the 15k RLT and the SFT baseline VLA, with a fresh AdamW optimizer and the c4 hyperparameters (rl_token_lr=3e-4, vla_lr=1e-5, vla_ft_weight=1.0, norm_gamma=0.5, batch 16, 1:1 mixed screw_c + screw_critical_c). Training was stopped early once the rolling-1000 average of loss_recon returned to ≤0.215 (the floor reached by the 15k cotrain run).
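
The early-stopping rule above can be sketched as a rolling-window check. This is an illustration and not the training script that produced this checkpoint: the class name RollingStop is hypothetical, and it assumes the window counts the last 1000 logged loss_recon values and that no stop is triggered until the window is full.

```python
from collections import deque


class RollingStop:
    """Signal a stop once the rolling mean of a loss is at or below a threshold."""

    def __init__(self, window=1000, threshold=0.215):
        self.window = window
        self.threshold = threshold
        self.values = deque(maxlen=window)  # oldest value drops out automatically

    def update(self, loss):
        """Record one loss value; return True when training should stop."""
        self.values.append(loss)
        if len(self.values) < self.window:
            return False  # assumption: wait for a full window before testing
        return sum(self.values) / self.window <= self.threshold
```

In a training loop this would be called once per logged step, for example `if stopper.update(loss_recon): break`.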
