Model Card for SmolVLA: UR7e PickandPlace (50 epoch)
SmolVLA is a compact, efficient vision-language-action model that achieves competitive performance at reduced computational costs and can be deployed on consumer-grade hardware.
This checkpoint is a fine-tune of lerobot/smolvla_base
on the CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps
dataset for a UR7e single-arm pick-and-place task.
This policy has been trained and pushed to the Hub using LeRobot. See the full documentation at LeRobot Docs.
Training Summary
| Field | Value |
|---|---|
| Base model | lerobot/smolvla_base |
| Dataset | CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps (100 eps, 35,878 frames, 10 fps) |
| Robot | UR7e single-arm, 7-DoF (6 joints + gripper) |
| Cameras | realsense_topview, realsense_wrist (renamed to camera1/camera2) |
| Steps | 7,100 (≈ 50 epochs · 35878 × 50 / 256) |
| Batch | 128 × 2 GPUs = 256 samples per step |
| Optimizer | AdamW (lr 1e-4, betas (0.9, 0.95), wd 1e-10), cosine decay w/ warmup 1000 |
| Chunk / Action steps | 50 / 50 |
| Image augmentation | brightness, contrast, saturation, hue, sharpness, affine (max 3, random order) |
| Hardware | 2× NVIDIA RTX PRO 6000 Blackwell |
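The step count in the table follows from the dataset size, the target number of epochs, and the global batch size. The sketch below redoes that arithmetic; the helper names are made up for illustration and are not part of LeRobot.

```python
import math


def steps_for_epochs(num_frames: int, epochs: int, global_batch: int) -> int:
    """Optimizer steps needed to draw `num_frames` samples `epochs` times."""
    return math.ceil(num_frames * epochs / global_batch)


def epochs_for_steps(num_frames: int, steps: int, global_batch: int) -> float:
    """Approximate number of passes over the dataset after `steps` steps."""
    return steps * global_batch / num_frames


# Values taken from the Training Summary table.
num_frames = 35_878
global_batch = 128 * 2  # per-GPU batch times number of GPUs

print(steps_for_epochs(num_frames, 50, global_batch))               # 7008
print(round(epochs_for_steps(num_frames, 7_100, global_batch), 2))  # 50.66
```

So 7,100 steps is slightly more than 50 full passes over the 35,878 frames, which matches the "≈ 50 epochs" in the table.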
The action and observation.state dimensions are both 7; SmolVLA automatically zero-pads them to max_action_dim=32.
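As a rough illustration of what zero-padding a 7-dimensional vector to 32 dimensions means, here is a minimal pure-Python sketch. It is not LeRobot's implementation (which operates on batched tensors inside the policy); `zero_pad` and the sample values are hypothetical.

```python
def zero_pad(vector: list[float], max_dim: int = 32) -> list[float]:
    """Return a copy of `vector` extended with trailing zeros up to `max_dim`."""
    if len(vector) > max_dim:
        raise ValueError(f"vector has {len(vector)} dims, more than max_dim={max_dim}")
    return list(vector) + [0.0] * (max_dim - len(vector))


# Made-up 7-dim action: 6 joint values followed by 1 gripper value.
action = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 1.0]
padded = zero_pad(action)

print(len(padded))  # 32
print(padded[:7] == action)  # True
```

Only the first 7 entries carry information; the remaining 25 are zeros.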
How to Get Started
Inference (load + step)
import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch")
policy.to("cuda").eval()
# The camera keys in `observation` must match the names used during training
# (`observation.images.camera1`, `observation.images.camera2`).
with torch.inference_mode():
action = policy.select_action(observation)
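Because the camera keys must match the names used during training, it can help to check the observation dict before calling the policy. The following is a minimal sketch using a plain dict; `missing_observation_keys` is a hypothetical helper, and the required key list is assumed from the names mentioned in this card rather than read from the checkpoint config.

```python
# Keys assumed from this model card; the authoritative list is in the checkpoint's config.
REQUIRED_KEYS = (
    "observation.state",
    "observation.images.camera1",
    "observation.images.camera2",
)


def missing_observation_keys(observation: dict, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent from `observation`."""
    return [key for key in required if key not in observation]


# Placeholder values stand in for real tensors; only the keys matter here.
observation = {
    "observation.state": None,
    "observation.images.camera1": None,
    # Original camera name left in place instead of being renamed to camera2:
    "observation.images.realsense_wrist": None,
}

print(missing_observation_keys(observation))
# ['observation.images.camera2']
```

An empty list means every key this sketch looks for is present; it does not validate tensor shapes or dtypes.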
Continue fine-tuning
lerobot-train \
--policy.path=CoRL2026-CSI/smolVLA-UR7e-PickandPlace-50epoch \
--dataset.repo_id=CoRL2026-CSI/UR7e_CaP_PickandPlace_100epi_10fps \
--output_dir=outputs/train/smolvla_ur7e_pickandplace_ft \
--job_name=smolvla_ur7e_pickandplace_ft \
--batch_size=128 --steps=2000 \
--policy.device=cuda --wandb.enable=true
The original training script is scripts/cap/smolvla_cap_ur7e_pickandplace.sh;
the exact hyperparameters can also be reconstructed from this repo's train_config.json.
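To inspect the hyperparameters stored in train_config.json, one option is to flatten the nested JSON into dotted keys that resemble the command-line flags. This is a generic sketch that makes no assumption about which fields the file contains; the `example` fragment and its key names are invented for illustration.

```python
import json
from pathlib import Path


def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def load_flat_config(path: str) -> dict:
    """Read a JSON config file from disk and flatten it."""
    return flatten(json.loads(Path(path).read_text()))


# Made-up fragment; the real file's keys may differ.
example = {"batch_size": 128, "policy": {"chunk_size": 50}}
print(flatten(example))
# {'batch_size': 128, 'policy.chunk_size': 50}
```

After downloading the file from the repo, `load_flat_config("train_config.json")` returns the same kind of flat mapping for the real configuration.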
Model Details
- License: apache-2.0
- Base model: lerobot/smolvla_base
- Library: LeRobot
- Trained by: CoRL2026-CSI