MolmoAct2 — Bimanual YAM cube-stacking (action-expert, chunk_size=30)

Fine-tune of allenai/MolmoAct2-BimanualYAM on the atharva-pantheon/yam-stack-cube dataset (44 bimanual-YAM demos, 3 cameras, 14-dim absolute joint pose, 10 fps). Task instruction: "stack the cubes".

Trained with LeRobot (molmoact2-policy branch). The goal is a smooth, continuous-action policy that can serve as a starting point for later RL fine-tuning.

Recipe

Action-expert-only, continuous (VLM frozen; ~578M trainable / 5.4B total). No LoRA on the action expert.
chunk_size=30, n_action_steps=30: each inference call predicts 30 actions (3 s at 10 fps) and all 30 are executed before replanning, which favors long, smooth motions.
setup_type="bimanual yam robotic arms in molmoact2", control_mode="absolute joint pose".
bf16 precision, 8 flow timesteps, action-expert learning rate 5e-5 with a cosine schedule (200 warmup steps), batch_size 8.
Normalization statistics are reused from the base checkpoint's yam_dual_molmoact2 tag rather than recomputed on the 44-demo set, whose joint range is much narrower. This keeps action scales consistent with the base model and the resulting motions smooth.
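The chunk horizon and learning-rate settings in the recipe can be sanity-checked with a short sketch. The schedule shape used here (linear warmup to the peak, then cosine decay to zero over the 20,000-step run) is an assumption for illustration, and the helper names are hypothetical; the scheduler actually used by the LeRobot branch may differ in its details.

```python
import math

# Values taken from the recipe above.
FPS = 10
CHUNK_SIZE = 30
PEAK_LR = 5e-5
WARMUP_STEPS = 200
TOTAL_STEPS = 20_000  # length of the 20k run


def chunk_horizon_seconds(chunk_size=CHUNK_SIZE, fps=FPS):
    """Duration covered by one predicted action chunk, in seconds."""
    return chunk_size / fps


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Hypothetical schedule: linear warmup, then cosine decay to zero.

    Assumption: the decay floor (0.0) and the linear warmup shape are
    illustrative choices, not confirmed settings of this training run.
    """
    if step < warmup:
        # Ramp linearly so the last warmup step reaches the peak rate.
        return peak_lr * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

Calling chunk_horizon_seconds() gives the 3 s lookahead quoted in the recipe, and lr_at(step) can be evaluated at a few steps to see how quickly the rate falls under these assumptions.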

Checkpoints

20k_run/checkpoint_020000/ — 20,000-step run, final loss ≈ 0.009.
40k_run/checkpoint_005000 … checkpoint_040000/ — 40,000-step run, every 5k steps.

Each folder is a LeRobot pretrained_model directory (weights plus pre- and post-processors). There is no simulator for YAM, so evaluation is done on hardware: pick the best checkpoint by running the candidates on the physical robot.
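To compare checkpoints on the robot, it can help to enumerate the folders and fetch only the one under test. The following is a minimal sketch, assuming the intermediate 40k-run folders follow the same zero-padded naming as the two endpoints listed above and that the repository id is the one shown on this model page. The helper names are hypothetical; snapshot_download and its allow_patterns argument come from the huggingface_hub library.

```python
import os

# Assumption: repository id as shown on this model page.
REPO_ID = "atharva-pantheon/MolmoAct2-BimanualYAM-stackcube"


def checkpoint_dirs():
    """List checkpoint folders: the 20k final plus the 40k run every 5k steps.

    Assumption: intermediate folder names use the same six-digit,
    zero-padded pattern as checkpoint_005000 and checkpoint_040000.
    """
    dirs = ["20k_run/checkpoint_020000"]
    dirs += [
        f"40k_run/checkpoint_{step:06d}"
        for step in range(5_000, 40_001, 5_000)
    ]
    return dirs


def download_checkpoint(subdir, repo_id=REPO_ID):
    """Download a single checkpoint folder and return its local path."""
    # Imported lazily so the listing helper works without the dependency.
    from huggingface_hub import snapshot_download

    root = snapshot_download(repo_id=repo_id, allow_patterns=[f"{subdir}/*"])
    return os.path.join(root, subdir)
```

Restricting the download with allow_patterns avoids pulling all nine checkpoints when only one is being evaluated.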

Training loss (20k run)

Flow-matching loss decays smoothly from 0.093 to 0.009 (plotted on a log scale), flattening after ~15k steps, with no instability or spikes. The 40k-run curve will be added under assets/ once that run completes.
