OAT RoboMimic Lift (Two-Stage) Artifacts
This repository stores artifacts for the two-stage OAT reproduction pipeline on RoboMimic Lift Image:
- Stage 1: train tokenizer from scratch on Lift actions.
- Stage 2: train policy from scratch using the Stage-1 tokenizer checkpoint.
Stage 1 (Tokenizer) summary
- Dataset: data/robomimic_lift/lift_N200.zarr
- Epochs: 0 -> 5000
- Final logged metrics:
  - train_loss: 0.005778
  - val_loss: 0.006557
  - test_reconst_mse: 0.003936
- Checkpoint used for stage2: checkpoints/tokenizer_lift_stage1_latest.ckpt
Stage 2 (Policy) summary
- Dataset: data/robomimic_lift/lift_N200.zarr
- Tokenizer injected via: policy.action_tokenizer.checkpoint=.../lift_stage1/checkpoints/latest.ckpt
- Epochs: 0 -> 5000
- Final logged metrics:
  - train_loss: 1.129704
  - val_loss: 7.317752
  - test_reconst_mse: 0.071863
- Policy checkpoint: checkpoints/policy_lift_stage2_latest.ckpt
Included artifacts
- checkpoints/:
  - tokenizer_lift_stage1_latest.ckpt
  - policy_lift_stage2_latest.ckpt
- logs/:
  - lift_stage1_logs.json
  - lift_stage1_tmux.log
  - lift_stage2_logs.json
  - lift_stage2_tmux.log
- plots/:
  - plots/stage1/* (tokenizer dashboards)
  - plots/stage2/* (policy dashboards)
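
To confirm that a local copy of the repository contains the checkpoints and logs listed above, a small check along the following lines may help. This is a minimal sketch: the file names come from the artifact list, while the helper name `missing_artifacts` and the assumption that the script runs from the repository root are illustrative only.

```python
from pathlib import Path

# File names are taken from the "Included artifacts" list.
# The plots/ directories are skipped because their file names are not enumerated.
EXPECTED_ARTIFACTS = [
    "checkpoints/tokenizer_lift_stage1_latest.ckpt",
    "checkpoints/policy_lift_stage2_latest.ckpt",
    "logs/lift_stage1_logs.json",
    "logs/lift_stage1_tmux.log",
    "logs/lift_stage2_logs.json",
    "logs/lift_stage2_tmux.log",
]


def missing_artifacts(root="."):
    """Return the expected artifact paths that are absent under `root`.

    `root` is assumed to be the top-level directory of this repository.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("Missing artifacts:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All listed checkpoints and logs are present.")
```

Running the script from the repository root prints either the missing paths or a confirmation line.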
Evaluation
Policy evaluation for Lift is launched separately after Stage 2 completes, using:
python scripts/eval_policy_sim.py \
--checkpoint output/lift_stage2/checkpoints/latest.ckpt \
--output_dir output/eval/robomimic_lift_stage2 \
--num_exp 10
Evaluation (RoboMimic Lift)
- Checkpoint: output/lift_stage2/checkpoints/latest.ckpt
- Number of experiments: 10
- Mean success rate: 0.8380 (std 0.0494, stderr 0.0156)
- Mean episode length: 0.00
- Interpretation: model behavior is more reproducible when stderr is small and the spread in eval/plots/lift_eval_dashboard.png is narrow.
- Artifacts: raw eval JSON/log in eval/, videos in eval/, dashboard plots in eval/plots/.
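
The reported stderr is consistent with the usual standard error of the mean, std / sqrt(n), taken over the 10 experiments. The sketch below reproduces that arithmetic from the numbers above and adds a rough 95% interval. The interval is an assumption of this example, not something the evaluation script is known to report, and it uses a normal approximation (z = 1.96), which is somewhat optimistic for only 10 runs.

```python
import math

# Values copied from the evaluation summary.
n_experiments = 10
mean_success = 0.8380
std_success = 0.0494

# Standard error of the mean: std / sqrt(n).
stderr = std_success / math.sqrt(n_experiments)

# Assumption: normal approximation for a rough 95% interval.
# With n = 10, a t-based interval would be slightly wider.
z = 1.96
ci_low = mean_success - z * stderr
ci_high = mean_success + z * stderr

print(f"stderr = {stderr:.4f}")  # stderr = 0.0156
print(f"approx. 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```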