IsaacLab-smolVLA-SO101-Multitask-8epoch
A SmolVLA policy fine-tuned from `lerobot/smolvla_base` for 8 epochs on the IsaacLab simulation SO101 11-task dataset `CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps`.
This checkpoint is a LoRA adapter (`adapter_model.safetensors`). It is loaded together with the base model `lerobot/smolvla_base`.
Model details
- Base model: `lerobot/smolvla_base` (SmolVLM2-500M-Video-Instruct VLM + action expert)
- Robot: SO101 (6-DOF, including gripper), IsaacLab simulation
- Cameras: `top`, `left_wrist` (480×640), renamed to policy keys `camera1` (`left_wrist`) / `camera2` (`top`)
- Inputs: `observation.state` [6] + 2 cameras + language instruction (`task`)
- Output: `action` [6] (joint position)
- Action chunking: `chunk_size=50`, `n_action_steps=50`
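The action-chunking settings above mean the policy predicts 50 future actions per forward pass and, as `n_action_steps` is commonly used in LeRobot, executes all 50 before predicting again. The following is a minimal sketch of that consumption pattern, not LeRobot's implementation; `fake_predict_chunk` and `ChunkedController` are hypothetical names standing in for the real policy.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per forward pass (from this card)
N_ACTION_STEPS = 50   # actions executed before predicting again (from this card)
ACTION_DIM = 6        # SO101 joint positions, including the gripper


def fake_predict_chunk(observation):
    """Hypothetical stand-in for the policy: returns CHUNK_SIZE zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(CHUNK_SIZE)]


class ChunkedController:
    """Serves one action per control step, refilling from a new chunk when empty."""

    def __init__(self, predict_chunk):
        self.predict_chunk = predict_chunk
        self.queue = deque()
        self.num_predictions = 0

    def select_action(self, observation):
        if not self.queue:
            # Queue exhausted: run one forward pass and keep N_ACTION_STEPS actions.
            chunk = self.predict_chunk(observation)
            self.queue.extend(chunk[:N_ACTION_STEPS])
            self.num_predictions += 1
        return self.queue.popleft()


controller = ChunkedController(fake_predict_chunk)
actions = [controller.select_action(observation=None) for _ in range(120)]
print(len(actions), controller.num_predictions)
```

If the control loop runs at the dataset's 10 fps, one chunk of 50 actions covers 5 seconds between forward passes.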
Fine-tuning strategy (PEFT / LoRA)
Key idea: the action expert and the projection layers are fully fine-tuned, the VLM backbone gets LoRA only on `q_proj`/`v_proj`, and the rest of the VLM is completely frozen.
Trainable / Frozen breakdown
| Module | Status | Description |
|---|---|---|
| VLM `q_proj`, `v_proj` (attention query/value projection) | 🔵 LoRA-trained | Base weights are frozen; only the low-rank adapter (A·B) is trained |
| Everything else in the VLM: `k_proj`, `o_proj`, MLP (`gate/up/down_proj`), token/position embeddings, vision encoder (SigLIP), LayerNorm | ❄️ Fully frozen | No LoRA attached and no full training |
| Entire action expert (`lm_expert`): attention (`q/k/v/o_proj`), MLP (`gate/up/down_proj`), LayerNorm | 🔥 Full fine-tune | All weights are trained directly |
| `state_proj` (state → token embedding) | 🔥 Full fine-tune | |
| `action_in_proj`, `action_out_proj` (action ↔ expert hidden) | 🔥 Full fine-tune | |
| `action_time_mlp_in`, `action_time_mlp_out` (flow-matching time embedding) | 🔥 Full fine-tune | |

In short, the frozen part is most of the VLM backbone (including the vision encoder), i.e. the VLM's `k_proj`, `o_proj`, MLP, embeddings, and LayerNorm. The trained part is the LoRA adapters on the VLM `q_proj`/`v_proj`, the entire action expert, and all projection layers.
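For the LoRA-trained modules, the frozen base weight W is left untouched and a scaled low-rank product is added on top of it. A minimal pure-Python sketch of that forward pass follows, with toy dimensions and hypothetical helper names (`matvec`, `lora_forward`); it illustrates standard LoRA, not this checkpoint's code.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, lora_alpha, r):
    """y = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B train."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


# Toy sizes: 3 inputs, 2 outputs, rank 1 (chosen for readability only).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]          # frozen base weight, shape (out, in)
A = [[1.0, 1.0, 1.0]]          # trainable, shape (r, in)
B_zero = [[0.0], [0.0]]        # B starts at zero in standard LoRA
B_trained = [[2.0], [0.0]]     # an arbitrary "trained" B for illustration
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, lora_alpha=1, r=1)
y_trained = lora_forward(W, A, B_trained, x, lora_alpha=1, r=1)
print(y_init, y_trained)
```

Because B starts at zero, the adapted layer initially reproduces the base model's output exactly; training then moves only A and B.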
LoRA / PEFT config
| Item | Value |
|---|---|
| PEFT method | LORA |
| rank `r` | 32 |
| `lora_alpha` | 8 |
| `lora_dropout` | 0.0 |
| `bias` | none |
| `use_rslora` / `use_dora` | false / false |
| `target_modules` (LoRA applied) | `.*vlm_with_expert\.vlm\..*(q_proj\|v_proj)` |
| `modules_to_save` (full fine-tune) | `lm_expert`, `state_proj`, `action_in_proj`, `action_out_proj`, `action_time_mlp_in`, `action_time_mlp_out` |

Saved adapter tensors: 267 (112 LoRA A/B tensors for the VLM `q_proj`/`v_proj`; 155 fully trained tensors for the expert and projections).
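To see how the `target_modules` pattern and the `modules_to_save` list from this card partition a model, the sketch below classifies a few module paths. It assumes the pattern is applied as a full-match regular expression (which, to my understanding, is how PEFT treats a string `target_modules`) and approximates `modules_to_save` by matching path components. The `classify` helper and the example module paths are hypothetical and were not read from the checkpoint.

```python
import re

# Values copied from the LoRA / PEFT config table in this card.
TARGET_MODULES = r".*vlm_with_expert\.vlm\..*(q_proj|v_proj)"
MODULES_TO_SAVE = [
    "lm_expert", "state_proj", "action_in_proj",
    "action_out_proj", "action_time_mlp_in", "action_time_mlp_out",
]
R, LORA_ALPHA = 32, 8


def classify(module_name):
    """Return 'lora', 'full', or 'frozen' for a dotted module path."""
    if re.fullmatch(TARGET_MODULES, module_name):
        return "lora"
    # Approximation: a module is fully trained if it sits under a saved module.
    if any(part in MODULES_TO_SAVE for part in module_name.split(".")):
        return "full"
    return "frozen"


# Illustrative module paths only (assumptions, not taken from the checkpoint).
examples = [
    "model.vlm_with_expert.vlm.model.text_model.layers.0.self_attn.q_proj",
    "model.vlm_with_expert.vlm.model.text_model.layers.0.self_attn.k_proj",
    "model.vlm_with_expert.vlm.model.text_model.layers.0.mlp.gate_proj",
    "model.vlm_with_expert.lm_expert.layers.0.self_attn.q_proj",
    "model.state_proj",
]
labels = {name: classify(name) for name in examples}
for name, label in labels.items():
    print(f"{label:6s} {name}")

scaling = LORA_ALPHA / R      # standard LoRA scaling, since use_rslora is false
total_tensors = 112 + 155     # LoRA A/B tensors + fully trained tensors
print("scaling:", scaling, "tensors:", total_tensors)
```

With `lora_alpha=8` and `r=32`, the low-rank update is scaled by 0.25 before being added to the frozen projection output.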
Training hyperparameters
| Item | Value |
|---|---|
| Dataset | `Isaaclab-so101_11task_baseCaP_3300epi_10fps`: 3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps |
| Epochs | 8 |
| Steps | 36,800 |
| Global batch size | 256 (micro batch 64 × 4 GPU × grad_accum 1) |
| Optimizer | AdamW: lr 1e-4, weight_decay 1e-10, grad_clip_norm 10.0 |
| LR scheduler | cosine_decay_with_warmup: warmup 1,000 / decay 30,000 / peak_lr 1e-4 / decay_lr 2.5e-6 |
| Seed | 1000 |
| Dataloader workers | 24 |
| Mixed precision | no (bf16 inference) |
| Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter, max 3 random; no geometric transforms (rotation/translation/flip), to preserve left/right semantics for the VLA |
| Hardware | 4 × NVIDIA H100 80GB |
| Training time | approx. 11 h 12 min |
| Final loss | 0.016 (grad_norm 0.21) |
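The step count and the learning-rate schedule in the table can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the global batch size and the number of epochs implied by 36,800 steps, and evaluates a simplified warmup-plus-cosine schedule. `lr_at` is a hypothetical helper that approximates the shape of `cosine_decay_with_warmup`; it is not LeRobot's exact implementation.

```python
import math

# Values copied from the training hyperparameter table.
FRAMES = 1_175_352
GLOBAL_BATCH = 64 * 4 * 1          # micro batch x GPUs x grad_accum
STEPS = 36_800
WARMUP_STEPS, DECAY_STEPS = 1_000, 30_000
PEAK_LR, DECAY_LR = 1e-4, 2.5e-6

# Passes over the dataset implied by the step count.
epochs_seen = STEPS * GLOBAL_BATCH / FRAMES


def lr_at(step):
    """Simplified schedule: linear warmup, cosine decay, then hold DECAY_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(step, DECAY_STEPS) / DECAY_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


print("global batch:", GLOBAL_BATCH)
print("epochs seen:", round(epochs_seen, 2))
for step in (0, 999, 15_000, 30_000, STEPS - 1):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate reaches its floor of 2.5e-6 at step 30,000 and stays there for the remaining 6,800 steps.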
Camera rename
| Dataset key | Policy key |
|---|---|
| `observation.images.left_wrist` | `observation.images.camera1` |
| `observation.images.top` | `observation.images.camera2` |
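When feeding frames recorded with the dataset keys to the policy, the camera keys need to be mapped as in the table. A minimal sketch of that mapping on a plain dictionary follows; `rename_observation_keys` is a hypothetical helper, the placeholder strings stand in for image tensors, and LeRobot may offer its own mechanism for this rename.

```python
# Mapping copied from the camera rename table.
CAMERA_RENAME = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_observation_keys(observation):
    """Return a copy of the observation dict with dataset camera keys renamed."""
    return {CAMERA_RENAME.get(key, key): value for key, value in observation.items()}


# Placeholder frame: strings stand in for real 480x640 image tensors.
dataset_frame = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "<left_wrist image, 480x640>",
    "observation.images.top": "<top image, 480x640>",
    "task": "<language instruction>",
}
policy_input = rename_observation_keys(dataset_frame)
print(sorted(policy_input))
```

Keys that are not in the mapping, such as `observation.state` and `task`, pass through unchanged.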
Usage
```python
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the fine-tuned policy from the Hugging Face Hub.
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/IsaacLab-smolVLA-SO101-Multitask-8epoch")
```
Citation / Acknowledgement
Built on top of LeRobot and the SmolVLA base checkpoint. Project: CoRL 2026 CSI submission.
Framework versions
- PEFT 0.19.1
- LeRobot 0.5.2