smolVLA-UR7e-CaP_arrange_block_10fps
This repository contains a LeRobot SmolVLA policy fine-tuned for a UR7e block-arrangement task. The policy was trained on demonstrations from CoRL2026-CSI/UR7e-CaP_arrange_block_100epi_10fps, where the robot arranges red, green, and blue blocks along a purple line from left to right.
The checkpoint is intended for research use with LeRobot-compatible inference pipelines. No real-robot or offline success-rate evaluation is included in this model card; the reported metrics are training logs only.
Model Details
- Model type: SmolVLA vision-language-action policy
- Base policy: lerobot/smolvla_base
- VLM backbone: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
- Robot: UR7e
- Task: Arrange red, green, blue blocks along a purple line from left to right
- Training framework: LeRobot
- Checkpoint format: safetensors
- License: Apache 2.0
Dataset
The policy was trained on CoRL2026-CSI/UR7e-CaP_arrange_block_100epi_10fps, a LeRobot dataset collected for the UR7e block-arrangement task.
Dataset summary:
| Field | Value |
|---|---|
| Robot type | ur7e |
| Episodes | 100 |
| Frames | 47,116 |
| Dataset FPS | 10 |
| Tasks | 1 |
| Split | train: 0:100 |
| Cameras | RealSense wrist and top-view RGB video |
| Camera resolution | 480 x 640 RGB video |
| Dataset state/action vectors | 7D joint/gripper vector |
The dataset includes additional skill annotations such as skill.type, skill.progress, target joint positions, target Cartesian poses, and natural-language skill text. The policy checkpoint uses the LeRobot preprocessing pipeline saved in this repository.
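As a quick reading aid, the average episode length and total recording time follow directly from the summary table. The sketch below only redoes that arithmetic with the table's values; the variable names are illustrative and not part of the dataset.

```python
# Values copied from the dataset summary table above.
EPISODES = 100
FRAMES = 47_116
FPS = 10

frames_per_episode = FRAMES / EPISODES          # average frames per episode
seconds_per_episode = frames_per_episode / FPS  # average episode duration in seconds
total_minutes = FRAMES / FPS / 60               # total recorded time in minutes

print(f"{frames_per_episode:.2f} frames per episode")
print(f"{seconds_per_episode:.1f} s per episode")
print(f"{total_minutes:.1f} min in total")
```

This works out to roughly 47 seconds per episode and about 79 minutes of demonstrations overall.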
Policy Inputs and Outputs
The saved policy configuration expects the following model features after preprocessing:
Inputs, according to the saved policy config:
- observation.state: 6D state feature
- observation.images.camera1: wrist camera, resized/padded for SmolVLA
- observation.images.camera2: top-view camera, resized/padded for SmolVLA
- observation.images.camera3: visual input slot
- observation.images.empty_camera_0: empty camera placeholder
Output, according to the saved policy config:
- action: 7D joint/gripper action vector
The included policy_preprocessor.json maps dataset camera names to model camera names:
- observation.images.realsense_wrist -> observation.images.camera1
- observation.images.realsense_topview -> observation.images.camera2
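If you feed observations to the policy without the saved preprocessor, the same renaming has to happen somewhere in your own code. The following is a minimal sketch of that step in plain Python; `rename_observation_keys` is a hypothetical helper, not a LeRobot function, and the placeholder strings stand in for image tensors.

```python
# Mapping stated in this model card (dataset key -> model key).
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_observation_keys(observation, rename_map=RENAME_MAP):
    """Return a copy of `observation` with dataset keys replaced by model keys.

    Keys that are not in the mapping are passed through unchanged.
    """
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder values only; real observations would hold image and state tensors.
observation = {
    "observation.images.realsense_wrist": "<wrist image>",
    "observation.images.realsense_topview": "<top-view image>",
    "observation.state": [0.0] * 6,
}
renamed = rename_observation_keys(observation)
print(sorted(renamed))
```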
State and action features use mean/std normalization. Visual features use identity normalization. The postprocessor unnormalizes the action output and moves it back to CPU.
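To make the mean/std step concrete, here is a small sketch of normalization and its inverse on a 7D vector. It is an illustration of the general technique, not the saved LeRobot processor: the statistics are made-up placeholders (the real ones live in the normalizer safetensors files), and the `eps` term is an assumption added to avoid division by zero.

```python
def normalize(values, mean, std, eps=1e-8):
    """Map raw values to zero-mean, unit-variance space, element by element."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def unnormalize(values, mean, std, eps=1e-8):
    """Invert `normalize`, mapping model outputs back to raw units."""
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]


# Placeholder statistics for a 7D joint/gripper vector (NOT the real dataset stats).
mean = [0.1, -0.5, 1.2, 0.0, 0.3, -0.2, 0.5]
std = [0.4, 0.3, 0.5, 0.2, 0.6, 0.3, 0.5]

raw_action = [0.2, -0.4, 1.0, 0.1, 0.0, -0.3, 1.0]
normalized = normalize(raw_action, mean, std)
restored = unnormalize(normalized, mean, std)
print(restored)
```

The round trip recovers the raw vector up to floating-point error, which is the property the postprocessor relies on when it converts the policy output back into robot units.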
Training Details
The final uploaded checkpoint is from step 9203.
| Setting | Value |
|---|---|
| Training steps | 9,203 |
| Approx. epochs | 50 |
| Batch size | 128 |
| Gradient accumulation | 1 |
| Seed | 1000 |
| Optimizer | AdamW |
| Peak learning rate | 1e-4 |
| Weight decay | 1e-10 |
| Gradient clipping | 10.0 |
| Scheduler | Cosine decay with warmup |
| Warmup steps | 1,000 |
| Decay steps | 30,000 |
| Final decay LR | 2.5e-6 |
| AMP | Disabled |
| PEFT | Disabled |
| Vision encoder | Frozen |
| Expert-only training | Enabled |
| State projection training | Enabled |
| Action chunk size | 50 |
| Observation steps | 1 |
| Action steps | 50 |
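The scheduler rows can be read as a linear warmup to the peak rate followed by a cosine decay to the final rate. The sketch below is a generic version of that shape using the table's values as defaults; it is not taken from the LeRobot source, and details such as starting the warmup at zero and measuring the cosine phase from the end of warmup are assumptions.

```python
import math


def lr_at(step, peak_lr=1e-4, final_lr=2.5e-6, warmup_steps=1_000, decay_steps=30_000):
    """Linear warmup to `peak_lr`, then cosine decay to `final_lr` at `decay_steps`."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped so the rate stays at the floor.
    progress = min(1.0, (step - warmup_steps) / (decay_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


for step in (0, 500, 1_000, 9_203, 30_000):
    print(step, lr_at(step))
```

Under this generic form, step 9,203 with a 30,000-step decay would still sit well above the floor, whereas the logs report 2.5e-6 at the final step. One possible reading is that the schedule was compressed to the actual run length, for example `lr_at(9_203, decay_steps=9_203)`, but that is an inference from the logged value rather than something this card states.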
Image augmentation was enabled during training with up to two randomly ordered transforms per sample:
- brightness jitter: [0.8, 1.2]
- contrast jitter: [0.8, 1.2]
- saturation jitter: [0.5, 1.5]
- hue jitter: [-0.05, 0.05]
- sharpness jitter: [0.5, 1.5]
- random affine rotation: [-5, 5] degrees
- random affine translation: 0.05
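The selection logic ("up to two randomly ordered transforms per sample") can be sketched without any image library. The example below only samples which transforms would be applied and with what strength; it does not touch pixels and is not the LeRobot implementation. Treating each list entry as a separate transform, drawing the count uniformly from 0 to 2, and reading the translation value as a symmetric range are all assumptions.

```python
import random

# Ranges copied from the list above; the keys are illustrative labels.
AUGMENTATION_RANGES = {
    "brightness": (0.8, 1.2),
    "contrast": (0.8, 1.2),
    "saturation": (0.5, 1.5),
    "hue": (-0.05, 0.05),
    "sharpness": (0.5, 1.5),
    "affine_rotation_deg": (-5.0, 5.0),
    "affine_translation": (-0.05, 0.05),  # assumption: symmetric fraction of image size
}


def sample_augmentation_plan(rng, max_num_transforms=2):
    """Pick up to `max_num_transforms` distinct transforms in random order.

    Returns a list of (name, sampled_strength) pairs for one training sample.
    """
    count = rng.randint(0, max_num_transforms)  # assumption: uniform over 0..max
    names = rng.sample(sorted(AUGMENTATION_RANGES), k=count)  # order is random
    return [(name, rng.uniform(*AUGMENTATION_RANGES[name])) for name in names]


rng = random.Random(0)
plans = [sample_augmentation_plan(rng) for _ in range(5)]
for plan in plans:
    print(plan)
```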
Training logs:
| Metric | Value |
|---|---|
| Final logged training loss | 0.010 |
| Mean training loss over last 20 logged points | 0.01045 |
| Final logged gradient norm | 0.101 |
| Final logged learning rate | 2.5e-6 |
These values are training-loop logs only and should not be interpreted as task success rates.
How to Use
Install LeRobot and load the policy from the Hub:
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained(
    "CoRL2026-CSI/smolVLA-UR7e-CaP_arrange_block_10fps"
)
policy.to("cuda")
policy.eval()
For robot rollout or evaluation, use the LeRobot CLI or your existing UR7e control stack with --policy.path pointing to this repository:
lerobot-record \
  --policy.path=CoRL2026-CSI/smolVLA-UR7e-CaP_arrange_block_10fps \
  --dataset.repo_id=CoRL2026-CSI/eval_smolVLA-UR7e-CaP_arrange_block_10fps
Adjust the robot, camera, and dataset arguments to match the local UR7e deployment setup.
Files
This repository contains:
- model.safetensors: policy weights
- config.json: policy configuration
- train_config.json: LeRobot training configuration
- policy_preprocessor.json: saved inference preprocessing pipeline
- policy_preprocessor_step_5_normalizer_processor.safetensors: normalization state
- policy_postprocessor.json: saved inference postprocessing pipeline
- policy_postprocessor_step_0_unnormalizer_processor.safetensors: action unnormalization state
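After downloading the repository, it can be worth confirming that every file in this list is present before loading the policy. The helper below is a small standard-library sketch for that check; `missing_files` is a hypothetical name, and the temporary directory only simulates a partial download.

```python
import tempfile
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "train_config.json",
    "policy_preprocessor.json",
    "policy_preprocessor_step_5_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
]


def missing_files(local_dir):
    """Return the expected file names that are not present in `local_dir`."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo on a throwaway directory that contains only the weights file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.safetensors").touch()
    still_missing = missing_files(tmp)

print(still_missing)
```

In practice you would pass the path of your local copy of this repository instead of the temporary directory.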
Evaluation
No evaluation run is reported for this checkpoint. The training configuration had eval_freq=0, so no offline evaluation videos, simulated rollouts, or real-robot success metrics were produced as part of the training job.
Recommended evaluation before deployment:
- Run held-out demonstrations or manually selected validation episodes if available.
- Run short supervised sanity checks to confirm camera mapping, state dimensions, and action unnormalization.
- Start with low-speed, closely supervised real-robot rollouts.
- Report success rate, number of trials, reset conditions, and failure modes separately from training loss.
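For the reporting step, a small helper can keep success rate, trial count, and failure modes together and attach an uncertainty estimate, which matters when only a handful of rollouts are run. The sketch below uses the standard Wilson score interval. The outcome labels and the example list are made-up placeholders for illustration and are not results for this checkpoint.

```python
import math
from collections import Counter


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize_trials(outcomes):
    """Summarize rollout outcomes; anything other than "success" is a failure mode."""
    trials = len(outcomes)
    successes = sum(1 for outcome in outcomes if outcome == "success")
    return {
        "trials": trials,
        "successes": successes,
        "success_rate": successes / trials,
        "interval_95": wilson_interval(successes, trials),
        "failure_modes": dict(Counter(o for o in outcomes if o != "success")),
    }


# Placeholder outcomes only, NOT measurements from this policy.
placeholder_outcomes = ["success", "missed_grasp", "success", "wrong_order"]
report = summarize_trials(placeholder_outcomes)
print(report)
```

Reset conditions are not captured here; recording them alongside each outcome would be a natural extension.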
Limitations and Safety
- This policy is specialized to the recorded UR7e setup, camera placement, workspace geometry, block colors, and purple-line arrangement task.
- Performance may degrade if camera extrinsics, lighting, object appearance, workspace layout, robot calibration, or control frequency differ from the training data.
- The model card does not claim real-robot success rate. Validate the policy in the target environment before autonomous operation.
- Use appropriate robot safety limits, emergency stop procedures, workspace supervision, and conservative speed/force settings during rollout.
Provenance
Training completed on 2026-05-10 at step 9203. The model weights were uploaded to this repository on 2026-05-11. The final checkpoint used for upload was:
lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_10fps_gbs256_ep50_20260510_112403/checkpoints/009203/pretrained_model
The first automatic push at the end of distributed training did not finish because the upload stalled while other ranks were waiting at a distributed barrier. The final repository upload was completed separately from the training process.
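The checkpoint directory name contains `gbs256` and `ep50`, while the settings table lists a batch size of 128. A short calculation shows which batch size is consistent with roughly 50 epochs. Reading `gbs256` as a global batch size of 256 across the distributed ranks is an assumption based only on the directory name; the card does not state the number of processes.

```python
# Values from the dataset summary and training settings in this card.
FRAMES = 47_116
STEPS = 9_203
TABLE_BATCH_SIZE = 128

# Assumption: "gbs256" in the checkpoint path means a global batch size of 256.
ASSUMED_GLOBAL_BATCH_SIZE = 256

epochs_with_table_batch = STEPS * TABLE_BATCH_SIZE / FRAMES
epochs_with_global_batch = STEPS * ASSUMED_GLOBAL_BATCH_SIZE / FRAMES

print(f"batch 128: {epochs_with_table_batch:.2f} epochs")
print(f"batch 256: {epochs_with_global_batch:.2f} epochs")
```

With 256 samples per step the run covers about 50 passes over the 47,116 frames, matching the "Approx. epochs" row; with 128 it would be about 25. This suggests the table's batch size is a per-process value, though that remains an interpretation.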
Citation
If you use this checkpoint, cite LeRobot and SmolVLA where appropriate:
@software{lerobot,
title = {LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch},
author = {Hugging Face},
url = {https://github.com/huggingface/lerobot},
year = {2024}
}
@misc{smolvla,
title = {SmolVLA: A compact vision-language-action model for robotics},
url = {https://huggingface.co/papers/2506.01844}
}