SmolVLA RoboTwin `stack_bowls_two` (50 ep, MULTI-instruction)

SmolVLA policy fine-tuned on 50 demonstration episodes of the `stack_bowls_two` task from RoboTwin 2.0 (`demo_clean` config). Each episode is paired with a language instruction sampled at random from RoboTwin's 100 instruction variations (seed=42 for reproducibility).

This is the multi-instruction counterpart to arrow-hf/smolvla-robotwin-stack-bowls-two-50ep (which uses a single fixed instruction).

Task

Robot: Agilex dual-arm, end-effector control (16D state, 16D action)
Cameras: 3 RGB streams (dual_cam_global, cam_wrist_65, cam_wrist_75), 240×320, D435
Control rate: ~30 Hz (the LeRobot dataset metadata records 10 Hz, but the underlying RoboTwin simulation runs at ~30 Hz; the same rate is used for both training and evaluation)
Instructions: 50 unique sentences (one per episode). Examples:
- "Use the left arm to place the object into the basket"
- "Pick the item up and drop it into the woven basket"
- "Move the object from the table into the basket"
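The per-episode instruction assignment described above can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the pool contents, the function name `sample_episode_instructions`, and the choice of sampling without replacement are assumptions (the last one is merely consistent with the 50 unique sentences reported).

```python
import random

# Hypothetical stand-in for RoboTwin's pool of 100 instruction variations;
# the real sentences are not reproduced here.
instruction_pool = [f"instruction variant {i}" for i in range(100)]

NUM_EPISODES = 50  # one instruction per demonstration episode
SEED = 42          # seed reported in the model card


def sample_episode_instructions(pool, num_episodes, seed):
    """Pick one distinct instruction per episode, reproducibly.

    Assumption: sampling is without replacement, which yields
    `num_episodes` unique sentences.
    """
    rng = random.Random(seed)  # local RNG, leaves global random state untouched
    return rng.sample(pool, k=num_episodes)


episode_instructions = sample_episode_instructions(
    instruction_pool, NUM_EPISODES, SEED
)
```

Using a local `random.Random(seed)` instance keeps the assignment reproducible regardless of other random calls in the data pipeline.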

Training

| Config | Value |
|---|---|
| Base checkpoint | `lerobot/smolvla_robotwin` |
| Training data | 50 RoboTwin demonstrations, 50 unique instructions |
| Batch size | 32 |
| Steps | 6000 (~10-25 epochs) |
| Optimizer | AdamW, lr=1e-4 |
| Scheduler | Cosine, warmup=300, decay=6000 |
| Chunk size | 50 |
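To make the training budget concrete, the sketch below computes the number of samples seen and a generic warmup-plus-cosine learning-rate curve from the values in the table. It is an illustration under assumptions: `lr_at` and `frames_implied` are hypothetical helpers, the final learning rate (`final_lr`) is not given in this card and defaults to 0 here, and the exact curve implemented in LeRobot may differ.

```python
import math

PEAK_LR = 1e-4       # from the table: AdamW, lr=1e-4
WARMUP_STEPS = 300   # from the table: warmup=300
DECAY_STEPS = 6000   # from the table: decay=6000
TOTAL_STEPS = 6000   # from the table: steps
BATCH_SIZE = 32      # from the table: batch size


def lr_at(step, final_lr=0.0):
    """Linear warmup to PEAK_LR, then cosine decay to final_lr.

    Assumption: final_lr is not stated in the model card.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (min(step, DECAY_STEPS) - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS)
    return final_lr + 0.5 * (PEAK_LR - final_lr) * (1.0 + math.cos(math.pi * progress))


# Total training samples drawn over the run.
samples_seen = TOTAL_STEPS * BATCH_SIZE


def frames_implied(epochs):
    """Dataset size (in frames) implied by a given epoch count."""
    return samples_seen / epochs
```

With these numbers the run draws 192,000 samples, so the "~10-25 epochs" estimate corresponds to a dataset of roughly 7,680 to 19,200 frames, or about 154 to 384 frames per episode across 50 episodes.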

Evaluation: Single vs Multi-Instruction Comparison

Both variants were evaluated in the RoboTwin 2.0 simulator (`demo_clean` config) over 10 episodes with max_steps=400 and action_chunk_exec=50. To keep the comparison fair, both used the same fixed evaluation instruction, "stack the bowls".

| Variant | Eval setting | Success rate |
|---|---|---|
| Single-instruction training | Fixed `"stack the bowls"` | 7/10 (70%) |
| Multi-instruction training (this model) | Fixed `"stack the bowls"` | 7/10 (70%) |
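The interaction between max_steps=400 and action_chunk_exec=50 can be sketched as below. This assumes that action_chunk_exec is the number of actions executed from each predicted chunk before the policy is queried again; the card does not define the parameter, and the helper names are hypothetical.

```python
import math

MAX_STEPS = 400         # per-episode step limit from the eval setup
ACTION_CHUNK_EXEC = 50  # assumed: actions executed per policy query


def count_policy_queries(max_steps, chunk_exec):
    """Number of chunk predictions needed for an episode that runs to max_steps."""
    return math.ceil(max_steps / chunk_exec)


def chunk_schedule(max_steps, chunk_exec):
    """Yield (start_step, end_step) ranges, one per policy query."""
    for start in range(0, max_steps, chunk_exec):
        yield start, min(start + chunk_exec, max_steps)


queries = count_policy_queries(MAX_STEPS, ACTION_CHUNK_EXEC)
schedule = list(chunk_schedule(MAX_STEPS, ACTION_CHUNK_EXEC))
```

Under this reading, an episode that runs to the step limit involves 8 policy queries, each followed by 50 open-loop actions.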

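Multi-instruction training is intended to make the policy follow varied language commands. On this evaluation the multi-instruction model matches the single-instruction model (7/10 each), so no cost on the fixed instruction is measured, although 10 episodes are too few to detect small differences. Both variants were tested only with the fixed instruction, so any benefit of instruction diversity on held-out new instructions is not measured here.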
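Because each variant was evaluated on only 10 episodes, the reported success rates carry wide uncertainty. As a rough illustration (not part of the original evaluation), the sketch below computes a 95% Wilson score interval for 7 successes out of 10; the function name and the choice of the Wilson interval are assumptions made for this example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(7, 10)
print(f"7/10 success: 95% interval about {low:.2f} to {high:.2f}")
```

The interval spans roughly 0.40 to 0.89, so a 10-episode comparison cannot distinguish the two variants unless their true success rates differ substantially.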

Usage

```python
from lerobot.policies.smolvla import SmolVLAPolicy

# Load the fine-tuned multi-instruction checkpoint from the Hugging Face Hub.
policy = SmolVLAPolicy.from_pretrained("arrow-hf/smolvla-robotwin-stack-bowls-two-50ep-multi")
```

See the LeRobot documentation for inference setup.

Citation

Built on SmolVLA and the SmolVLA-RoboTwin pretrained base, and fine-tuned on data collected from RoboTwin 2.0.

Model size: 0.5B params (Safetensors; tensor types F32, BF16)

Model tree for arrow-hf/smolvla-robotwin-stack-bowls-two-50ep-multi

Base model: lerobot/smolvla_base
Finetuned: lerobot/smolvla_robotwin