Model Card for SmolVLA — UR7e PickandPlace (50 epoch)

SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational costs and can be deployed on consumer-grade hardware.

This checkpoint is a fine-tune of lerobot/smolvla_base on the CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps dataset for a UR7e single-arm pick-and-place task.

This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.

Training Summary

Field	Value
Base model	`lerobot/smolvla_base`
Dataset	`CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps` (100 episodes, 35,878 frames, 10 fps)
Robot	UR7e single-arm, 7-DoF (6 joints + gripper)
Cameras	`realsense_topview`, `realsense_wrist` (renamed → `camera1`/`camera2`)
Steps	7,100 (≈ 50 epochs; 35,878 × 50 / 256 ≈ 7,007)
Batch	128 per GPU × 2 GPUs = 256 samples per step
Optimizer	AdamW (lr 1e-4, betas (0.9, 0.95), weight decay 1e-10), cosine decay with 1,000 warmup steps
Chunk / Action steps	50 / 50
Image augmentation	brightness, contrast, saturation, hue, sharpness, affine (at most 3 applied at a time, in random order)
Hardware	2× NVIDIA RTX PRO 6000 Blackwell
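
The step count in the table can be checked with a few lines of arithmetic. This is only a sketch built from the numbers listed above; the variable names are illustrative and not taken from the training script.

```python
import math

# Values copied from the Training Summary table.
frames = 35_878        # total frames in the dataset
batch_per_gpu = 128
num_gpus = 2
steps = 7_100          # training steps actually run

effective_batch = batch_per_gpu * num_gpus      # samples consumed per step
steps_per_epoch = frames / effective_batch      # about 140.15
epochs_covered = steps / steps_per_epoch        # about 50.66
steps_for_50_epochs = math.ceil(50 * steps_per_epoch)

print(f"effective batch: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch:.2f}")
print(f"epochs covered by {steps} steps: {epochs_covered:.2f}")
print(f"steps needed for exactly 50 epochs: {steps_for_50_epochs}")
```

So 7,100 steps is slightly more than 50 passes over the data, which is consistent with the "≈ 50" in the table.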

The action and observation.state dimension is 7; both are automatically zero-padded to SmolVLA's max_action_dim=32.
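
To illustrate what zero-padding a 7-dimensional vector to 32 dimensions means, here is a minimal plain-Python sketch. It is not the library's implementation (SmolVLA operates on tensors internally); `zero_pad` and the sample values are hypothetical and only show the idea.

```python
def zero_pad(values, target_dim):
    """Right-pad a vector with zeros until it has target_dim entries."""
    values = list(values)
    if len(values) > target_dim:
        raise ValueError(f"vector of length {len(values)} exceeds {target_dim}")
    return values + [0.0] * (target_dim - len(values))


MAX_DIM = 32  # matches max_action_dim=32 stated above

# Made-up 7-D state: 6 joint values plus 1 gripper value.
state = [0.1, -0.4, 1.2, 0.0, 0.3, -0.9, 1.0]
padded = zero_pad(state, MAX_DIM)

print(len(padded))   # padded length
print(padded[:7])    # original entries are preserved at the front
```

The first 7 entries carry the real joint and gripper values; the remaining 25 are zeros, so only the leading 7 dimensions of the model's padded output are meaningful for this robot.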

How to Get Started

Inference (load + step)

import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch")
policy.to("cuda").eval()

# `observation` is assumed to be a dict of batched tensors prepared by the caller.
# Its camera keys must match the names used during training
# (`observation.images.camera1`, `observation.images.camera2`).
with torch.inference_mode():
    action = policy.select_action(observation)
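
If the robot stack emits frames under the original RealSense names, the keys need to be renamed before calling the policy. The sketch below is one way to do that. The mapping of `realsense_topview` to `camera1` and `realsense_wrist` to `camera2` is an assumption based on the order listed in the Training Summary, and `rename_camera_keys` and `missing_keys` are hypothetical helpers, not part of LeRobot.

```python
# Assumed mapping (order taken from the Training Summary; verify against train_config.json).
CAMERA_RENAME = {
    "observation.images.realsense_topview": "observation.images.camera1",
    "observation.images.realsense_wrist": "observation.images.camera2",
}

# Keys this card says the policy was trained with.
EXPECTED_KEYS = (
    "observation.images.camera1",
    "observation.images.camera2",
    "observation.state",
)


def rename_camera_keys(observation):
    """Return a copy of the observation dict with camera keys renamed."""
    return {CAMERA_RENAME.get(key, key): value for key, value in observation.items()}


def missing_keys(observation):
    """List the expected keys that are absent from the observation dict."""
    return [key for key in EXPECTED_KEYS if key not in observation]


# Placeholder values stand in for image and state tensors.
raw = {
    "observation.images.realsense_topview": "top-view frame",
    "observation.images.realsense_wrist": "wrist frame",
    "observation.state": "7-D state",
}
renamed = rename_camera_keys(raw)
print(missing_keys(raw))      # both camera keys are reported missing
print(missing_keys(renamed))  # nothing missing after renaming
```

Running a check like `missing_keys` before `select_action` turns a silent key mismatch into an explicit error message.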

Continue fine-tuning

lerobot-train \
  --policy.path=CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch \
  --dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
  --output_dir=outputs/train/smolvla_ur7e_pickandplace_ft \
  --job_name=smolvla_ur7e_pickandplace_ft \
  --batch_size=128 --steps=2000 \
  --policy.device=cuda --wandb.enable=true

The original training script is scripts/cap/smolvla_cap_ur7e_pickandplace.sh, and the exact hyperparameters can also be reconstructed from train_config.json in this repo.
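
A small helper can make it easier to pull individual hyperparameters out of train_config.json. This is a sketch using only the standard library. The key names in `example` are hypothetical placeholders, since the actual layout of the file is not reproduced in this card.

```python
import json
from pathlib import Path
from typing import Any


def load_train_config(path: str) -> dict:
    """Read a train_config.json file into a plain dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def lookup(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Follow a dotted path such as 'optimizer.lr' through nested dicts."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical excerpt; the real file's key names may differ.
example = {"batch_size": 128, "steps": 7100, "optimizer": {"lr": 1e-4}}

print(lookup(example, "optimizer.lr"))
print(lookup(example, "policy.chunk_size", default="not present"))
```

With a downloaded copy of the file, `lookup(load_train_config("train_config.json"), "steps")` would return the stored value if such a key exists, and `None` otherwise.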