OAT RoboMimic Lift (Two-Stage) Artifacts

This repository stores artifacts for the two-stage OAT reproduction pipeline on RoboMimic Lift Image:

  1. Stage 1: train the tokenizer from scratch on Lift actions.
  2. Stage 2: train the policy from scratch using the Stage 1 tokenizer checkpoint.

Stage 1 (Tokenizer) summary

  • Dataset: data/robomimic_lift/lift_N200.zarr
  • Epochs: 0 -> 5000
  • Final logged metrics:
    • train_loss: 0.005778
    • val_loss: 0.006557
    • test_reconst_mse: 0.003936
  • Checkpoint used for Stage 2:
    • checkpoints/tokenizer_lift_stage1_latest.ckpt

Stage 2 (Policy) summary

  • Dataset: data/robomimic_lift/lift_N200.zarr
  • Tokenizer injected via:
    • policy.action_tokenizer.checkpoint=.../lift_stage1/checkpoints/latest.ckpt
  • Epochs: 0 -> 5000
  • Final logged metrics:
    • train_loss: 1.129704
    • val_loss: 7.317752
    • test_reconst_mse: 0.071863
  • Policy checkpoint:
    • checkpoints/policy_lift_stage2_latest.ckpt

Included artifacts

  • checkpoints/:
    • tokenizer_lift_stage1_latest.ckpt
    • policy_lift_stage2_latest.ckpt
  • logs/:
    • lift_stage1_logs.json
    • lift_stage1_tmux.log
    • lift_stage2_logs.json
    • lift_stage2_tmux.log
  • plots/:
    • plots/stage1/* (tokenizer dashboards)
    • plots/stage2/* (policy dashboards)
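
After downloading the repository, it can help to confirm that every artifact listed above is present. The following is a minimal sketch, assuming the script is run from the repository root; the helper name missing_artifacts is hypothetical and not part of the OAT codebase. The file names are copied from the list above, and the plots directories are only checked for being non-empty because their contents are not enumerated here.

```python
from pathlib import Path

# File names copied from the "Included artifacts" list.
EXPECTED_FILES = [
    "checkpoints/tokenizer_lift_stage1_latest.ckpt",
    "checkpoints/policy_lift_stage2_latest.ckpt",
    "logs/lift_stage1_logs.json",
    "logs/lift_stage1_tmux.log",
    "logs/lift_stage2_logs.json",
    "logs/lift_stage2_tmux.log",
]
# Directories whose contents are not enumerated; require at least one entry.
EXPECTED_DIRS = ["plots/stage1", "plots/stage2"]


def missing_artifacts(root):
    """Return the expected paths that are absent (or empty) under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    for d in EXPECTED_DIRS:
        directory = root / d
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(d)
    return missing


if __name__ == "__main__":
    for path in missing_artifacts("."):
        print("missing:", path)
```

An empty result means all listed checkpoints, logs, and plot directories were found.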

Evaluation

Policy evaluation for Lift is launched separately after Stage 2 completes, using:

python scripts/eval_policy_sim.py \
  --checkpoint output/lift_stage2/checkpoints/latest.ckpt \
  --output_dir output/eval/robomimic_lift_stage2 \
  --num_exp 10

Evaluation results (RoboMimic Lift)

  • Checkpoint: output/lift_stage2/checkpoints/latest.ckpt
  • Number of experiments: 10
  • Mean success rate: 0.8380 (std 0.0494, stderr 0.0156)
  • Mean episode length: 0.00 (as logged; this value suggests episode length was not recorded during evaluation)
  • Interpretation: a small stderr and a narrow spread in eval/plots/lift_eval_dashboard.png indicate that the success rate is reproducible across experiments.
  • Artifacts: raw eval JSON, logs, and videos in eval/; dashboard plots in eval/plots/.
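
The reported stderr is consistent with the usual standard error of the mean, std / sqrt(n), with n = 10 experiments. The sketch below reproduces that arithmetic from the numbers above. It assumes the std was computed across the 10 per-experiment success rates, which the source does not state explicitly. The interval at the end uses a normal approximation and is only a rough guide at n = 10.

```python
import math


def standard_error(std: float, n: int) -> float:
    """Standard error of the mean for n independent experiments."""
    if n <= 0:
        raise ValueError("n must be positive")
    return std / math.sqrt(n)


# Values reported above: std 0.0494 over 10 experiments.
stderr = standard_error(0.0494, 10)
print(f"stderr = {stderr:.4f}")  # stderr = 0.0156

# Rough 95% interval for the mean success rate (normal approximation).
mean = 0.8380
low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

With so few experiments, a Student t critical value (about 2.26 for 9 degrees of freedom) would give a somewhat wider interval than the 1.96 used here.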