MolmoAct2: Bimanual YAM cube stacking (action expert, chunk_size=30)

Fine-tune of allenai/MolmoAct2-BimanualYAM on the atharva-pantheon/yam-stack-cube dataset (44 bimanual-YAM demos, 3 cameras, 14-dim absolute joint pose, 10 fps). Task instruction: "stack the cubes".
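The card does not say how the 14 dimensions are laid out. The sketch below assumes 7 values per arm (6 joint angles plus a gripper value), with the left arm first. The constants and the `split_bimanual` helper are hypothetical. Check them against the dataset's feature names before relying on them.

```python
import numpy as np

# ASSUMPTION: 7 values per arm (6 joints + 1 gripper), left arm first.
# This layout is illustrative only and is not taken from the dataset card.
JOINTS_PER_ARM = 6
DIMS_PER_ARM = JOINTS_PER_ARM + 1  # joints + gripper
ACTION_DIM = 2 * DIMS_PER_ARM      # 14, matching the dataset's pose vector
FPS = 10                           # dataset frame rate


def split_bimanual(pose: np.ndarray) -> dict:
    """Split a (..., 14) pose array into per-arm joint and gripper parts (hypothetical helper)."""
    if pose.shape[-1] != ACTION_DIM:
        raise ValueError(f"expected last dim {ACTION_DIM}, got {pose.shape[-1]}")
    left, right = pose[..., :DIMS_PER_ARM], pose[..., DIMS_PER_ARM:]
    return {
        "left_joints": left[..., :JOINTS_PER_ARM],
        "left_gripper": left[..., JOINTS_PER_ARM],
        "right_joints": right[..., :JOINTS_PER_ARM],
        "right_gripper": right[..., JOINTS_PER_ARM],
    }


# A 3-second trajectory at 10 fps is 30 frames of 14-dim poses.
traj = np.zeros((3 * FPS, ACTION_DIM))
parts = split_bimanual(traj)
print(parts["left_joints"].shape, parts["right_gripper"].shape)  # (30, 6) (30,)
```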

Trained with LeRobot (molmoact2-policy branch) for a smooth, RL-ready continuous policy.

Recipe

  • Action expert only, continuous actions: VLM frozen, ~578M trainable of 5.4B total parameters. The action expert is fully fine-tuned (no LoRA).
  • chunk_size=30, n_action_steps=30: the full 30-step chunk (3 s lookahead at 10 fps) is executed before re-planning, which gives long, smooth motion.
  • setup_type="bimanual yam robotic arms in molmoact2", control_mode="absolute joint pose".
  • bf16, 8 flow-matching timesteps, action-expert lr 5e-5 with a cosine schedule (200 warmup steps), batch_size 8.
  • Normalization stats are reused from the base checkpoint's yam_dual_molmoact2 tag. They are not recomputed on the 44-demo set, whose joint range is much narrower. This keeps action scales consistent with the base model and the resulting actions smooth.
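The chunking settings can be sketched as a receding-horizon loop. The sketch assumes LeRobot-style semantics: each query returns `chunk_size` actions, and the first `n_action_steps` of them are queued and executed before the next query. With both values set to 30, each query covers 3 s at 10 fps. `dummy_policy` is a hypothetical stand-in for the real model.

```python
from collections import deque

import numpy as np

FPS = 10
CHUNK_SIZE = 30
N_ACTION_STEPS = 30
ACTION_DIM = 14


def dummy_policy(obs: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the policy: returns a (CHUNK_SIZE, ACTION_DIM) chunk."""
    return np.tile(obs, (CHUNK_SIZE, 1))


def run_episode(num_control_steps: int) -> int:
    """Run a control loop at FPS and return how many times the policy was queried."""
    queue: deque = deque()
    queries = 0
    obs = np.zeros(ACTION_DIM)
    for _ in range(num_control_steps):
        if not queue:
            # Re-plan only when the previous chunk's executed actions are used up.
            chunk = dummy_policy(obs)
            queue.extend(chunk[:N_ACTION_STEPS])
            queries += 1
        action = queue.popleft()
        obs = action  # a real loop would send `action` to the arms and read a new observation
    return queries


print(CHUNK_SIZE / FPS)                   # seconds of lookahead per chunk
print(run_episode(num_control_steps=90))  # 9 s of control at 10 fps
```

With `n_action_steps` equal to `chunk_size`, the policy is queried once every 3 s (3 queries for 90 control steps). A smaller `n_action_steps` would re-plan more often, which is more reactive but can make motion less smooth.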
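This is a minimal sketch of flow-matching sampling with 8 Euler steps. It assumes that "8 flow timesteps" means the number of integration steps used to turn Gaussian noise into a 30×14 action chunk. It also assumes that t=0 is noise and t=1 is data; some implementations run time the other way. `toy_velocity` is a hypothetical stand-in for the learned action expert. It is the exact velocity of a straight noise-to-target path, so Euler integration lands exactly on the target.

```python
import numpy as np

NUM_STEPS = 8           # ASSUMED meaning of "8 flow timesteps"
CHUNK_SHAPE = (30, 14)  # chunk_size x action dim


def toy_velocity(x: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the action expert's predicted velocity.

    For the straight path x_t = (1 - t) * noise + t * target, the true velocity
    is target - noise, which equals (target - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def sample_chunk(target: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (actions) with Euler steps."""
    x = rng.standard_normal(CHUNK_SHAPE)
    dt = 1.0 / NUM_STEPS
    for k in range(NUM_STEPS):
        t = k * dt  # t < 1 on every step, so the toy velocity never divides by zero
        x = x + dt * toy_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, CHUNK_SHAPE)
chunk = sample_chunk(target, rng)
print(np.allclose(chunk, target))  # True: Euler integrates a straight path exactly
```

A real action expert's velocity field is not perfectly straight, so fewer steps trade sampling accuracy for lower latency.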
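The learning-rate schedule can be sketched as a linear warmup over 200 steps to the 5e-5 peak, followed by cosine decay over the rest of the run. The linear warmup shape and the decay floor of 0 are assumptions. LeRobot's scheduler may use a different warmup or a nonzero minimum learning rate.

```python
import math

PEAK_LR = 5e-5
WARMUP_STEPS = 200
TOTAL_STEPS = 20_000  # the 20k run; use 40_000 for the longer run


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 (floor of 0 is an ASSUMPTION)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 100, 200, 10_100, 20_000):
    print(s, f"{lr_at(s):.2e}")
```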
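The sketch below shows why reusing the base normalization statistics matters. It assumes mean/std normalization; the scheme stored in the checkpoint's processors may differ. The statistics are hypothetical numbers for a single joint, chosen only to show what happens when a much narrower task range is used.

```python
import numpy as np

# Hypothetical per-joint statistics (radians), for illustration only.
BASE_MEAN, BASE_STD = 0.0, 1.0  # broad range, as in the base model's YAM data
TASK_MEAN, TASK_STD = 0.3, 0.1  # narrow range, as in a small 44-demo task set


def normalize(a: np.ndarray, mean: float, std: float) -> np.ndarray:
    return (a - mean) / std


def unnormalize(z: np.ndarray, mean: float, std: float) -> np.ndarray:
    return z * std + mean


joint_angles = np.array([0.2, 0.3, 0.5])  # a small motion inside the task's range

z_base = normalize(joint_angles, BASE_MEAN, BASE_STD)
z_task = normalize(joint_angles, TASK_MEAN, TASK_STD)
print(z_base)  # stays on the scale the pretrained action expert already uses
print(z_task)  # the same motion spreads out 10x more in normalized space

# Whichever stats are used, training and inference must use the same ones to round-trip.
assert np.allclose(unnormalize(z_base, BASE_MEAN, BASE_STD), joint_angles)
assert np.allclose(unnormalize(z_task, TASK_MEAN, TASK_STD), joint_angles)
```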

Checkpoints

  • 20k_run/checkpoint_020000/: 20,000-step run, final loss ≈ 0.009.
  • 40k_run/checkpoint_005000 … checkpoint_040000/: 40,000-step run, a checkpoint every 5k steps.

Each folder is a LeRobot pretrained_model (weights plus pre/post-processors). There is no simulator for YAM, so evaluation happens on hardware: pick the best checkpoint by running it on the physical robot.
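This sketch builds the candidate checkpoint folder paths from the naming in the checkpoint list, for use in hardware evaluation. It assumes the repository has been downloaded to a local directory `root` (a hypothetical path). It also assumes every 5k-step checkpoint from 5,000 to 40,000 is present in the 40k run.

```python
from pathlib import Path


def checkpoint_dirs(root: Path) -> list[Path]:
    """Candidate checkpoint folders (the layout under `root` is ASSUMED from the list above)."""
    dirs = [root / "20k_run" / "checkpoint_020000"]
    # 40k run: one folder every 5,000 steps, zero-padded to six digits.
    dirs += [root / "40k_run" / f"checkpoint_{step:06d}" for step in range(5_000, 40_001, 5_000)]
    return dirs


for d in checkpoint_dirs(Path(".")):
    print(d)
```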

Training loss (20k run)

[Figure: training loss vs. step, 20k run, log scale]

The flow-matching loss decays smoothly from 0.093 to 0.009 (plotted on a log scale) and flattens after ~15k steps, with no instabilities or spikes. The 40k-run curve will be added under assets/ once that run completes.
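For scale, the drop from 0.093 to 0.009 is about one order of magnitude, or roughly one decade on the log-scale plot:

```latex
\frac{0.093}{0.009} \approx 10.3, \qquad \log_{10}(10.3) \approx 1.01
```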

Model repository: atharva-pantheon/MolmoAct2-BimanualYAM-stackcube, fine-tuned from allenai/MolmoAct2-BimanualYAM.