SmolVLA-PickOrange

A SmolVLA policy fine-tuned on the LeIsaac SO-101 PickOrange task: 30k steps of full-parameter (no LoRA) training, starting from lerobot/smolvla_base.

🔗 Project repos:

vitorcen/isaaclab-experience: Isaac Lab + LeIsaac multi-policy comparison (parent project), including a 7-baseline benchmark
vitorcen/LeIsaac-Training: LeIsaac fork (training scripts + design docs)

About the name: config.type=smolvla (the LeRobot v1 SmolVLA implementation), with HuggingFaceTB/SmolVLM2-500M-Video-Instruct (SmolVLM2) as the backbone. LeRobot itself calls the policy smolvla rather than smolvla2, so this repo keeps the name SmolVLA-PickOrange.

TL;DR

Task: Pick up the orange and place it on the plate. A single SO-101 arm picks up 3 oranges one after another and places each on the plate.
Dataset: LightwheelAI/leisaac-pick-orange, 60 teleoperated demonstration episodes, 30 fps, dual-cam 480×640.
Architecture: SmolVLA v1 (450M), SmolVLM2-500M-Video-Instruct backbone + Action Expert, chunk_size=50.
Training: full-parameter fine-tuning (no LoRA), batch=8 / lr=1e-4 / 30k steps / pyav video backend, ~14h on RTX 4090.
Eval (Isaac Sim 5.1, 3 rounds × 3 oranges = 9 oranges):
- strict success in 1/3 rounds, 5/9 oranges placed (partial credit via the sticky put_orange_to_plate)
- see the benchmark section of LeIsaac/README.md in vitorcen/isaaclab-experience for details
⚠️ Inference configuration:
- policy_action_horizon=50 (= chunk_size, the full chunk is used as the receding window)
- on the LeRobot async server side, --policy_checkpoint_path=wsagi/SmolVLA-PickOrange
- step_hz=30, matching the dataset
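
The horizon setting can be read as follows: with policy_action_horizon equal to chunk_size, the client executes all 50 actions of a predicted chunk before asking the server for the next one. The sketch below only illustrates that schedule and is not the LeIsaac client code; `fake_policy` and `run_episode` are hypothetical names introduced for this example.

```python
from typing import Callable, List

CHUNK_SIZE = 50        # actions predicted per policy call (from this model card)
ACTION_HORIZON = 50    # actions executed before re-querying (= chunk_size here)
STEP_HZ = 30           # control rate, matching the 30 fps dataset


def fake_policy(step: int) -> List[int]:
    """Hypothetical stand-in for the policy server: returns one action chunk."""
    return [step + i for i in range(CHUNK_SIZE)]


def run_episode(policy: Callable[[int], List[int]], episode_length_s: float) -> int:
    """Execute actions chunk by chunk and return the number of policy queries."""
    total_steps = int(episode_length_s * STEP_HZ)
    queries = 0
    step = 0
    while step < total_steps:
        chunk = policy(step)
        queries += 1
        # Execute at most ACTION_HORIZON actions of the chunk,
        # clipped so the episode never runs past total_steps.
        n_exec = min(ACTION_HORIZON, len(chunk), total_steps - step)
        step += n_exec
    return queries


seconds_per_chunk = ACTION_HORIZON / STEP_HZ    # 50 / 30, about 1.67 s per query
queries_120s = run_episode(fake_policy, 120.0)  # 3600 steps / 50 = 72 queries
print(f"{seconds_per_chunk:.2f} s per chunk, {queries_120s} queries in 120 s")
```

Under these settings each query covers about 1.67 s of motion, so a 120 s episode needs 72 queries. A smaller horizon would re-plan more often at the cost of more server calls.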

Highlights

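Full-parameter fine-tuning of SmolVLA on the small 60-episode dataset learns the task only partially: 1 of 3 rounds reached a natural success (3/3 oranges in 158 s), which is better than the 0/3 of the third-party edge-inference/smolvla-so101-pick-orange.
Variance across rounds is large, however (round 2 = 0/3, round 3 = 2/3): 60 episodes × 30k steps still underfits.
A large VLM-based policy underperforms a specialized visuomotor policy (ACT, 80M) in the low-data regime, consistent with the low-data findings of the original SmolVLA paper.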

Training recipe

| Item | Value |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 episodes, dual-cam 480×640 RGB + 6-DOF state, 30 Hz) |
| Policy | `smolvla` (LeRobot implementation) |
| Backbone | `HuggingFaceTB/SmolVLM2-500M-Video-Instruct` + Action Expert |
| `chunk_size` / `n_action_steps` | 50 / 50 |
| Batch size | 8 (full-param, no LoRA) |
| Optimizer | AdamW, lr=1e-4 |
| Steps | 30000 (~14h on 4090) |
| `video_backend` | `pyav` (torchcodec segfaults on long runs) |
| Image augmentation | None |
| Train expert only | False (all parameters are trained) |
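
As a quick sanity check on the recipe, the step count, batch size, and wall-clock time imply a rough throughput. The arithmetic below uses only the numbers listed in this section; the 14 h figure is approximate, so the derived rates are approximate too.

```python
# Values taken from the training recipe in this section.
STEPS = 30_000
BATCH_SIZE = 8
WALL_CLOCK_HOURS = 14   # approximate ("~14h on 4090")

samples_seen = STEPS * BATCH_SIZE                   # 240_000 training samples
seconds_per_step = WALL_CLOCK_HOURS * 3600 / STEPS  # 1.68 s per optimizer step
samples_per_second = BATCH_SIZE / seconds_per_step  # about 4.76 samples/s

print(f"samples seen: {samples_seen}")
print(f"seconds per step: {seconds_per_step:.2f}")
print(f"samples per second: {samples_per_second:.2f}")
```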

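🚨 Key fix, schema-free base: before training, you must run prepare_base.sh to strip the input_features / empty_cameras that ship with lerobot/smolvla_base (the default camera1/2/3 @ 256×256 would contaminate the fine-tuning path). Otherwise the schema is misaligned during training, and forward either raises a KeyError or the model silently trains badly. See smolvla2_finetune_pick_orange.html for details.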
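
The contents of prepare_base.sh are not shown here, so the following is only a minimal sketch of the idea, assuming the fix amounts to dropping the two named keys from the base checkpoint's config before fine-tuning. The function `strip_schema` and the example config are hypothetical; the real script may reset these fields to defaults instead of deleting them, or do additional work.

```python
import copy
import json

# Keys named in the note above; everything else in this sketch is hypothetical.
SCHEMA_KEYS = ("input_features", "empty_cameras")


def strip_schema(config: dict) -> dict:
    """Return a copy of a policy config without the base model's camera schema."""
    cleaned = copy.deepcopy(config)
    for key in SCHEMA_KEYS:
        cleaned.pop(key, None)  # drop the key if present; ignore it if absent
    return cleaned


# Made-up example config, shaped loosely like a policy config.json.
base_config = {
    "type": "smolvla",
    "chunk_size": 50,
    "empty_cameras": 0,
    "input_features": {
        "observation.images.camera1": {"type": "VISUAL", "shape": [3, 256, 256]},
    },
}

cleaned = strip_schema(base_config)
print(json.dumps(cleaned, indent=2))
```

With the stale camera entries gone, the input schema can be derived from the fine-tuning dataset rather than inherited from the base checkpoint.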

Inference

Via the LeIsaac eval harness (recommended)

# 1. Start the LeRobot async policy server
bash server/start_server.sh --lerobot-only

# 2. Run the LeIsaac PickOrange eval
DISPLAY=:0 python -u LeIsaac/scripts/evaluation/policy_inference.py \
    --task=LeIsaac-SO101-PickOrange-v0 \
    --eval_rounds=3 --episode_length_s=120 --step_hz=30 \
    --policy_type=lerobot-smolvla \
    --policy_host=127.0.0.1 --policy_port=8080 \
    --policy_action_horizon=50 \
    --policy_checkpoint_path=wsagi/SmolVLA-PickOrange \
    --policy_language_instruction='Pick up the orange and place it on the plate' \
    --device=cuda --enable_cameras

Using LeRobot directly

from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("wsagi/SmolVLA-PickOrange")
# See the LeRobot documentation for how to run inference with the loaded policy

Evaluation details (Isaac Sim 5.1, 2026-05-18 snapshot)

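| Round | 🍊 placed | Duration | Mode | Notes |
|---|---|---|---|---|
| 1 | 3/3 ✅ | 158.2 s | env-success | completed naturally |
| 2 | 0/3 | 551.7 s | key-R skip | grasps miss, arm jitters |
| 3 | 2/3 | 355.0 s | manual-hang | lerobot server was interrupted; the count of 2 was observed in the viewport |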

Round-by-round details, 1 Hz GPU samples, and the 7-baseline comparison are in results/benchmark/snapshots/ of vitorcen/isaaclab-experience.
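
The two headline numbers (strict 1/3 rounds, 5/9 oranges) follow directly from the per-round counts in this section. A short sketch of that bookkeeping, with variable names chosen only for this example:

```python
# Oranges placed per round, copied from the evaluation results in this section.
placed_per_round = [3, 0, 2]
ORANGES_PER_ROUND = 3

# Strict success: a round counts only if every orange was placed.
strict_successes = sum(1 for placed in placed_per_round if placed == ORANGES_PER_ROUND)
total_placed = sum(placed_per_round)
total_oranges = ORANGES_PER_ROUND * len(placed_per_round)

strict_rate = strict_successes / len(placed_per_round)  # 1/3
placement_rate = total_placed / total_oranges           # 5/9

print(f"strict: {strict_successes}/{len(placed_per_round)}")
print(f"oranges: {total_placed}/{total_oranges}")
```

With only three rounds, both rates carry wide uncertainty, which matches the large round-to-round variance noted in the highlights.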

License

Apache-2.0 (inherited from lerobot/smolvla_base and LeIsaac).
