
IsaacLab-smolVLA-SO101-Multitask-8epoch

A SmolVLA policy fine-tuned from lerobot/smolvla_base for 8 epochs on CoRL2026-CSI/Isaaclab-so101_11task_baseCaP_3300epi_10fps, an 11-task SO101 dataset collected in IsaacLab simulation.

์ด ์ฒดํฌํฌ์ธํŠธ๋Š” LoRA adapter ์ž…๋‹ˆ๋‹ค (adapter_model.safetensors). base ๋ชจ๋ธ lerobot/smolvla_base ์™€ ํ•จ๊ป˜ ๋กœ๋“œ๋ฉ๋‹ˆ๋‹ค.

Model details

  • Base model: lerobot/smolvla_base (SmolVLM2-500M-Video-Instruct VLM + action expert)
  • Robot: SO101 (6-DOF, including the gripper), in IsaacLab simulation
  • Cameras: top, left_wrist (480×640), renamed to the policy keys camera1 (left_wrist) / camera2 (top)
  • Inputs: observation.state[6] + 2 cameras + language instruction (task)
  • Output: action[6] (joint position)
  • Action chunking: chunk_size=50, n_action_steps=50
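To make the action-chunking setting concrete, here is a minimal sketch of how a chunked policy is typically stepped: the model is called only when the action queue is empty, and the queue is then refilled with the next n_action_steps actions. The function `fake_predict_chunk` is a hypothetical stand-in for the real policy, and the 10 Hz control rate is an assumption taken from the dataset frame rate; this is not LeRobot's implementation.

```python
from collections import deque

CHUNK_SIZE = 50       # actions predicted per model call (from the card)
N_ACTION_STEPS = 50   # actions executed before the next model call (from the card)
CONTROL_HZ = 10       # assumption: control rate equals the dataset's 10 fps


def fake_predict_chunk(observation):
    """Hypothetical stand-in for the policy: returns CHUNK_SIZE 6-D actions."""
    return [[0.0] * 6 for _ in range(CHUNK_SIZE)]


def run_episode(num_steps):
    """Step through an episode, refilling the action queue only when it is empty."""
    queue = deque()
    model_calls = 0
    for _ in range(num_steps):
        if not queue:
            chunk = fake_predict_chunk(observation=None)
            queue.extend(chunk[:N_ACTION_STEPS])
            model_calls += 1
        action = queue.popleft()  # one 6-D joint-position target per control step
    return model_calls


seconds_per_chunk = N_ACTION_STEPS / CONTROL_HZ  # open-loop horizon per model call
calls = run_episode(num_steps=120)               # number of model calls in 120 steps
```

Under these assumptions each model call covers 5 seconds of open-loop execution, since n_action_steps equals chunk_size and no part of the chunk is discarded.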

Fine-tuning strategy (PEFT / LoRA)

Key idea: the action expert and the projection layers are fully fine-tuned, the VLM backbone gets LoRA on q_proj/v_proj only, and the rest of the VLM is completely frozen.

Trainable / Frozen breakdown

Module | Status | Description
VLM q_proj, v_proj (attention query/value projection) | 🔵 LoRA training | Base weights are frozen; only the low-rank adapter (A·B) is trained
Everything else in the VLM: k_proj, o_proj, MLP (gate/up/down_proj), token/position embeddings, vision encoder (SigLIP), LayerNorm | ❄️ Fully frozen | No LoRA attached and no full training
Entire action expert (lm_expert): attention (q/k/v/o_proj), MLP (gate/up/down_proj), LayerNorm | 🔥 Full fine-tune | All weights are trained directly
state_proj (state → token embedding) | 🔥 Full fine-tune |
action_in_proj, action_out_proj (action ↔ expert hidden) | 🔥 Full fine-tune |
action_time_mlp_in, action_time_mlp_out (flow-matching time embedding) | 🔥 Full fine-tune |

In short, the frozen part is most of the VLM backbone: the vision encoder plus the VLM's k_proj, o_proj, MLP, embeddings, and LayerNorm. The trained part is the LoRA adapters on the VLM's q_proj/v_proj, the entire action expert, and all projection layers.

LoRA / PEFT config

Item | Value
PEFT method | LORA
rank r | 32
lora_alpha | 8
lora_dropout | 0.0
bias | none
use_rslora / use_dora | false / false
target_modules (LoRA applied) | .*vlm_with_expert\.vlm\..*(q_proj|v_proj)
modules_to_save (full fine-tune) | lm_expert, state_proj, action_in_proj, action_out_proj, action_time_mlp_in, action_time_mlp_out
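The following sketch shows how the target_modules pattern and the modules_to_save list separate modules into LoRA, fully trained, and frozen groups. It assumes that PEFT treats a string target_modules as a regular expression matched against the full module name, and it approximates modules_to_save with a simple name-component check. The module names in `examples` are hypothetical and only illustrate the naming scheme implied by the pattern.

```python
import re

# Values copied from the LoRA / PEFT config table.
TARGET_MODULES = r".*vlm_with_expert\.vlm\..*(q_proj|v_proj)"
MODULES_TO_SAVE = {
    "lm_expert", "state_proj", "action_in_proj",
    "action_out_proj", "action_time_mlp_in", "action_time_mlp_out",
}


def classify(module_name):
    """Return 'lora', 'full', or 'frozen' for a dotted module name."""
    if re.fullmatch(TARGET_MODULES, module_name):
        return "lora"
    # Approximation: a module counts as fully trained if it is, or sits under,
    # one of the modules_to_save entries.
    if any(part in MODULES_TO_SAVE for part in module_name.split(".")):
        return "full"
    return "frozen"


# Hypothetical module names, for illustration only.
examples = [
    "model.vlm_with_expert.vlm.model.text_model.layers.0.self_attn.q_proj",
    "model.vlm_with_expert.vlm.model.text_model.layers.0.self_attn.k_proj",
    "model.vlm_with_expert.lm_expert.layers.0.self_attn.q_proj",
    "model.state_proj",
]
labels = {name: classify(name) for name in examples}
```

Note that the expert's own q_proj is not matched by the pattern, because the pattern requires the `vlm_with_expert.vlm.` prefix; under this sketch it falls into the fully trained group through lm_expert instead.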

์ €์žฅ๋œ adapter ํ…์„œ: 267๊ฐœ (LoRA A/B 112๊ฐœ โ€” VLM q_projยทv_proj / full-trained 155๊ฐœ โ€” expertยทprojection).

Training hyperparameters

Item | Value
Dataset | Isaaclab-so101_11task_baseCaP_3300epi_10fps: 3,300 episodes / 1,175,352 frames / 11 tasks / 10 fps
Epochs | 8
Steps | 36,800
Global batch size | 256 (micro batch 64 × 4 GPU × grad_accum 1)
Optimizer | AdamW: lr 1e-4, weight_decay 1e-10, grad_clip_norm 10.0
LR scheduler | cosine_decay_with_warmup: warmup 1,000 / decay 30,000 / peak_lr 1e-4 / decay_lr 2.5e-6
Seed | 1000
Dataloader workers | 24
Mixed precision | no (bf16 inference)
Image augmentation | ColorJitter (brightness/contrast/saturation/hue) + SharpnessJitter, max 3 random; no geometric transforms (rotation/translation/flip), to preserve left/right semantics for the VLA
Hardware | 4 × NVIDIA H100 80GB
Training time | about 11 hours 12 minutes
Final loss | 0.016 (grad_norm 0.21)
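The step count and the learning-rate schedule can be cross-checked with a few lines of arithmetic. The sketch below derives steps per epoch from the frame count and global batch size, and evaluates a generic linear-warmup plus cosine-decay curve with the listed parameters. The function `lr_at` is an illustrative approximation: it assumes the floor is reached at step 30,000 and that warmup starts from zero, so the exact warmup shape and offsets in LeRobot's cosine_decay_with_warmup may differ.

```python
import math

# Values copied from the hyperparameter table.
FRAMES = 1_175_352
GLOBAL_BATCH = 64 * 4 * 1   # micro batch x GPUs x grad_accum
STEPS = 36_800
WARMUP, DECAY = 1_000, 30_000
PEAK_LR, DECAY_LR = 1e-4, 2.5e-6

steps_per_epoch = FRAMES / GLOBAL_BATCH       # optimizer steps for one pass over the data
epochs_seen = STEPS * GLOBAL_BATCH / FRAMES   # passes over the data after all steps


def lr_at(step):
    """Illustrative schedule: linear warmup to PEAK_LR, then cosine decay to DECAY_LR."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    # Clamp so the rate stays at the floor once the decay window has ended.
    progress = min(step - WARMUP, DECAY - WARMUP) / (DECAY - WARMUP)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


for step in (0, 500, 1_000, 15_500, 30_000, 36_800):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

One epoch is about 4,591 steps at batch size 256, so 36,800 steps amount to slightly more than 8 passes over the data, and under this sketch the final 6,800 steps run at the floor learning rate.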

Camera rename

Dataset key | Policy key
observation.images.left_wrist | observation.images.camera1
observation.images.top | observation.images.camera2
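When feeding observations to the policy outside the training pipeline, the same renaming has to be applied to the observation dictionary. A minimal sketch, using plain dictionaries and placeholder values instead of real image tensors; the helper `rename_camera_keys` is hypothetical and not part of LeRobot.

```python
# Mapping copied from the camera rename table.
RENAME_MAP = {
    "observation.images.left_wrist": "observation.images.camera1",
    "observation.images.top": "observation.images.camera2",
}


def rename_camera_keys(observation):
    """Return a copy of the observation with dataset camera keys renamed to policy keys."""
    return {RENAME_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for the 480x640 images and the 6-D state vector.
raw_observation = {
    "observation.state": [0.0] * 6,
    "observation.images.left_wrist": "wrist_image",
    "observation.images.top": "top_image",
    "task": "example instruction",
}
policy_observation = rename_camera_keys(raw_observation)
```

Keys that are not in the mapping, such as observation.state and task, pass through unchanged.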

Usage

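# Import the SmolVLA policy class from LeRobot.
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Load the fine-tuned policy from the Hugging Face Hub by repository id.
policy = SmolVLAPolicy.from_pretrained("CoRL2026-CSI/IsaacLab-smolVLA-SO101-Multitask-8epoch")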

Citation / Acknowledgement

Built on top of LeRobot and the SmolVLA base checkpoint. Project: CoRL 2026 CSI submission.

Framework versions

  • PEFT 0.19.1
  • LeRobot 0.5.2