Tags: Robotics, LeRobot, Safetensors, smolvla, vision-language-action, imitation-learning, ur7e

Model Card for SmolVLA: UR7e PickandPlace (50 epochs)

SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational costs and can be deployed on consumer-grade hardware.

This checkpoint is a fine-tune of lerobot/smolvla_base on the CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps dataset for a UR7e single-arm pick-and-place task.

This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.


Training Summary

| Field | Value |
|---|---|
| Base model | lerobot/smolvla_base |
| Dataset | CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps (100 episodes, 35,878 frames, 10 fps) |
| Robot | UR7e single-arm, 7-DoF (6 joints + gripper) |
| Cameras | realsense_topview, realsense_wrist (renamed → camera1/camera2) |
| Steps | 7,100 (≈ 50 epochs · 35,878 × 50 / 256) |
| Batch | 128 × 2 GPUs = 256 samples per step |
| Optimizer | AdamW (lr 1e-4, betas (0.9, 0.95), wd 1e-10), cosine decay with 1,000 warmup steps |
| Chunk / Action steps | 50 / 50 |
| Image augmentation | brightness, contrast, saturation, hue, sharpness, affine (max 3, random order) |
| Hardware | 2× NVIDIA RTX PRO 6000 Blackwell |
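The relationship between the Steps and Batch rows can be checked with a few lines of arithmetic. This is only a sanity-check sketch using the numbers listed above; the variable names are illustrative and do not come from the training script.

```python
# Numbers taken from the Training Summary table.
NUM_FRAMES = 35_878      # frames in the dataset
PER_GPU_BATCH = 128      # batch size per GPU
NUM_GPUS = 2
TARGET_EPOCHS = 50
TRAINED_STEPS = 7_100    # steps actually run

# Samples consumed per optimizer step across both GPUs.
effective_batch = PER_GPU_BATCH * NUM_GPUS

# Steps needed to see the dataset exactly TARGET_EPOCHS times.
steps_for_target = NUM_FRAMES * TARGET_EPOCHS / effective_batch

# Epochs actually covered by the trained number of steps.
epochs_covered = TRAINED_STEPS * effective_batch / NUM_FRAMES

print(effective_batch)             # 256
print(round(steps_for_target, 1))  # 7007.4
print(round(epochs_covered, 2))    # 50.66
```

So 7,100 steps is slightly more than 50 full passes over the data, which is why the table reports the epoch count as approximate.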

The action and observation.state dimensions are both 7; they are automatically zero-padded to SmolVLA's max_action_dim=32.
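To make the zero-padding concrete, here is a minimal NumPy sketch of the idea. It does not reproduce LeRobot's internal code: `zero_pad` is a hypothetical helper written for illustration, and slicing the first 7 entries back out is shown as an assumption about how a padded vector maps back to the robot's real dimensions.

```python
import numpy as np

MAX_ACTION_DIM = 32  # padded width stated in this model card
ROBOT_DIM = 7        # 6 joints + gripper


def zero_pad(vec, target_dim):
    """Right-pad the last axis of `vec` with zeros up to `target_dim`."""
    vec = np.asarray(vec, dtype=np.float32)
    pad = target_dim - vec.shape[-1]
    if pad < 0:
        raise ValueError("vector is wider than target_dim")
    # Pad only the last axis; leave any leading (batch) axes untouched.
    pad_width = [(0, 0)] * (vec.ndim - 1) + [(0, pad)]
    return np.pad(vec, pad_width)


state = np.arange(1, ROBOT_DIM + 1, dtype=np.float32)  # dummy 7-dim state
padded = zero_pad(state, MAX_ACTION_DIM)

# Only the first ROBOT_DIM entries carry information; the rest are zeros.
recovered = padded[..., :ROBOT_DIM]
print(padded.shape, recovered.shape)
```

In practice nothing needs to be padded by hand: the observation and action tensors passed to the policy stay 7-dimensional.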


How to Get Started

Inference (load + step)

import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch")
policy.to("cuda").eval()

# `observation` is a dict built by the caller. Its camera keys must match the
# names used during training (`observation.images.camera1`,
# `observation.images.camera2`).
with torch.inference_mode():
    action = policy.select_action(observation)
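Because the cameras were renamed for training, observations recorded under the original camera names need their keys remapped before being passed to the policy. The sketch below shows one way to do that in plain Python. The raw key names and the topview → camera1, wrist → camera2 pairing are assumptions inferred from the order in the Training Summary, and `rename_camera_keys` and `missing_keys` are hypothetical helpers, not LeRobot functions.

```python
# Assumed mapping from raw camera keys to the names used during training.
CAMERA_KEY_MAP = {
    "observation.images.realsense_topview": "observation.images.camera1",
    "observation.images.realsense_wrist": "observation.images.camera2",
}

# Keys this sketch expects after renaming (the state key is also assumed).
REQUIRED_KEYS = (
    "observation.state",
    "observation.images.camera1",
    "observation.images.camera2",
)


def rename_camera_keys(observation):
    """Return a copy of `observation` with camera keys renamed."""
    return {CAMERA_KEY_MAP.get(key, key): value for key, value in observation.items()}


def missing_keys(observation, required=REQUIRED_KEYS):
    """List required keys that are absent from `observation`."""
    return [key for key in required if key not in observation]


# Placeholder values stand in for real tensors.
raw = {
    "observation.state": "state-tensor",
    "observation.images.realsense_topview": "top-image",
    "observation.images.realsense_wrist": "wrist-image",
}
observation = rename_camera_keys(raw)
print(sorted(observation))
print(missing_keys(observation))
```

Checking for missing keys before calling `select_action` tends to surface naming mismatches earlier than a failure inside the policy would.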

Continue fine-tuning

lerobot-train \
  --policy.path=CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch \
  --dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
  --output_dir=outputs/train/smolvla_ur7e_pickandplace_ft \
  --job_name=smolvla_ur7e_pickandplace_ft \
  --batch_size=128 --steps=2000 \
  --policy.device=cuda --wandb.enable=true

The original training script is scripts/cap/smolvla_cap_ur7e_pickandplace.sh; the exact hyperparameters can also be reconstructed from this repo's train_config.json.


Model Details

Format: Safetensors
Model size: 0.5B params
Tensor type: F32, BF16
