license: mit
tags:
- robotics
- manipulation
- oat
- libero
- blockwise-decoding
Blockwise-OAT: strict original-OAT baseline (LIBERO-10)
Paired evaluation of autoregressive (AR) vs blockwise parallel tail action-token generation on a frozen OAT policy.
HF repo: hackhackhack66666/Blockwise-OAT
Code branch: Blockwise-OAT on GadzhiAskhabaliev/OAT-BLT-Dense
Summary
Primary SR reference: OAT8 paper on LIBERO-10, 56.3% (OAT, external benchmark).
| Metric | AR (our eval) | Blockwise (P=4, r=1) |
|---|---|---|
| LIBERO-10 mean SR | 58.73% ± 0.18% | 52.33% ± 1.04% |
| Δ vs OAT paper (56.3%) | +2.43 pp | -3.97 pp |
| Paired Δ (BW − AR, same protocol) | n/a | -6.40 pp |
| Tail train epochs | n/a | 15 (final CE 3.0607) |
Our frozen AR checkpoint scores above the paper's number on this cluster stack (58.73% vs 56.3%). Blockwise trades SR for faster token generation; the tail was trained for only 15 epochs (a resume is planned).
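The deltas in the table follow directly from the three success rates. A minimal check, using only numbers from the table above (the variable and function names are illustrative, not part of the repo):

```python
# Success rates in percent, copied from the summary table.
PAPER_SR = 56.3   # OAT paper, LIBERO-10
AR_SR = 58.73     # AR (our eval)
BW_SR = 52.33     # Blockwise (P=4, r=1)


def delta_pp(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to two decimals."""
    return round(a - b, 2)


ar_vs_paper = delta_pp(AR_SR, PAPER_SR)   # AR relative to the paper
bw_vs_paper = delta_pp(BW_SR, PAPER_SR)   # Blockwise relative to the paper
paired = delta_pp(BW_SR, AR_SR)           # Blockwise relative to our AR run

print(f"{ar_vs_paper:+.2f} pp, {bw_vs_paper:+.2f} pp, {paired:+.2f} pp")
```

The paired delta is the more meaningful comparison, since both runs share the same protocol and hardware.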
Inference speed (V100, cuda:0)
Decoder-only: 8 action tokens after cond is computed (benchmark_blockwise_vs_ar, warmup=10, 50 repeats):
| Batch | AR | Blockwise | Speedup |
|---|---|---|---|
| bs=1 | 22.3 ms | 19.3 ms | 1.16× |
| bs=8 | 31.4 ms | 26.6 ms | 1.18× |
End-to-end predict_action: vision encoder + decoder + detokenize (warmup=20, 100 repeats):
| Batch | AR | Blockwise | Speedup |
|---|---|---|---|
| bs=1 | 36.4 ms | 30.1 ms | 1.21× |
| bs=8 | 37.0 ms | 34.8 ms | 1.06× |
Decoder speedup is modest (1.16× at bs=1, 1.18× at bs=8) because the tail module is comparable in size to the AR stack. End-to-end, the gain is 1.21× at bs=1 and 1.06× at bs=8; it shrinks further whenever the vision encoder dominates latency.
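A speedup factor and a percentage latency reduction are not the same quantity, so it helps to compute both from the raw timings. The sketch below uses the latencies from the two tables above; the helper names are illustrative.

```python
def speedup(ar_ms: float, bw_ms: float) -> float:
    """Speedup factor: how many times faster Blockwise is than AR."""
    return ar_ms / bw_ms


def latency_reduction_pct(ar_ms: float, bw_ms: float) -> float:
    """Share of AR latency saved by Blockwise, in percent."""
    return 100.0 * (ar_ms - bw_ms) / ar_ms


# (label, AR ms, Blockwise ms), copied from the tables above.
rows = [
    ("decoder bs=1", 22.3, 19.3),
    ("decoder bs=8", 31.4, 26.6),
    ("e2e bs=1", 36.4, 30.1),
    ("e2e bs=8", 37.0, 34.8),
]

for label, ar_ms, bw_ms in rows:
    print(f"{label}: {speedup(ar_ms, bw_ms):.2f}x, "
          f"{latency_reduction_pct(ar_ms, bw_ms):.1f}% less latency")
```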
Baseline artifacts (frozen)
| Component | Source |
|---|---|
| Policy | Mirageinv/oat: policy_ep-0250_sr-0.596.ckpt |
| Tokenizer | Mirageinv/oat: tokenizer_ep-0950_mse-0.002.ckpt |
| Tail decoder | checkpoints/original_oat_tail_p4_r1.pt (this repo) |
Architecture & data flow
OAT encodes observations and generates 8 action tokens z₁…z₈. Blockwise-OAT splits decoding:
Obs (RGB + proprio) ──► Vision encoder ──► cond [B, T_o, d]
                                │
┌───────────────────────────────┴───────────────────────────────┐
│ AR path (baseline)                                            │
│ BOS ──► AutoregressiveModel.generate (8 steps) ──► z₁…z₈      │
└───────────────────────────────┬───────────────────────────────┘
                                │
┌───────────────────────────────┴───────────────────────────────┐
│ Blockwise path                                                │
│ BOS ──► generate_prefix (P=4 AR steps) ──► z₁…z₄, h_prefix    │
│ (z₁…z₄, h_prefix) ──► ParallelTailDecoder (1 pass) ──► z₅…z₈  │
└───────────────────────────────┬───────────────────────────────┘
                                ▼
  cat(z_prefix, z_tail) ──► OATTok.detokenize ──► action chunk
Inputs: multi-view RGB, robot state, task id (same as OAT).
Outputs: action / action_pred tensors (identical shapes for AR and Blockwise).
Trainable in this run: only ParallelTailDecoder (~4.5M params, 0.90× AR size).
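The control flow of the two paths can be sketched in a few lines. This is a toy illustration under stated assumptions, not the repo's implementation: tokens are plain integers, `next_token` and `predict_tail` are hypothetical stand-ins for the AR model and the ParallelTailDecoder, and the conditioning tensor and prefix hidden states (h_prefix) are omitted.

```python
from typing import Callable, List, Sequence

Token = int


def decode_ar(next_token: Callable[[Sequence[Token]], Token],
              n_tokens: int = 8) -> List[Token]:
    """Full AR decoding: one model call per generated token."""
    tokens: List[Token] = []
    for _ in range(n_tokens):
        tokens.append(next_token(tokens))
    return tokens


def decode_blockwise(next_token: Callable[[Sequence[Token]], Token],
                     predict_tail: Callable[[Sequence[Token], int], Sequence[Token]],
                     n_tokens: int = 8,
                     prefix_len: int = 4) -> List[Token]:
    """Blockwise decoding: AR for the prefix, one parallel call for the tail."""
    prefix = decode_ar(next_token, prefix_len)
    tail = list(predict_tail(prefix, n_tokens - prefix_len))
    if len(tail) != n_tokens - prefix_len:
        raise ValueError("tail decoder returned the wrong number of tokens")
    return prefix + tail  # corresponds to cat(z_prefix, z_tail)


# Hypothetical toy models: the "correct" sequence is 1, 2, ..., 8.
def toy_next_token(prev: Sequence[Token]) -> Token:
    return len(prev) + 1


def toy_predict_tail(prefix: Sequence[Token], n: int) -> List[Token]:
    return [prefix[-1] + i + 1 for i in range(n)]


ar_tokens = decode_ar(toy_next_token)
bw_tokens = decode_blockwise(toy_next_token, toy_predict_tail)
print(ar_tokens, bw_tokens)
```

In the toy setup both paths agree by construction; in the real system the tail decoder only approximates what the AR model would have produced, which is where the SR gap comes from.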
Generation schedule
| Mode | AR forward passes | Tail passes |
|---|---|---|
| Full AR | 8 | 0 |
| Blockwise P=4 | 4 | 1 |
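The schedule generalizes to other prefix lengths. The sketch below assumes that each refinement iteration r costs exactly one tail pass, which matches the P=4, r=1 row but is not otherwise confirmed by this card; the function name is illustrative.

```python
from typing import Dict, Optional


def forward_passes(n_tokens: int = 8,
                   prefix_len: Optional[int] = None,
                   refine_iters: int = 1) -> Dict[str, int]:
    """Count sequential model calls per action chunk.

    prefix_len=None means full AR decoding.
    Assumption: one tail pass per refinement iteration.
    """
    if prefix_len is None:
        return {"ar": n_tokens, "tail": 0}
    if not 0 < prefix_len < n_tokens:
        raise ValueError("prefix_len must satisfy 0 < prefix_len < n_tokens")
    return {"ar": prefix_len, "tail": refine_iters}


full_ar = forward_passes()
blockwise = forward_passes(prefix_len=4, refine_iters=1)
print(full_ar, sum(full_ar.values()))
print(blockwise, sum(blockwise.values()))
```

Going from 8 to 5 sequential calls does not translate into an 8/5 speedup, because a tail pass is not the same cost as an AR step; the measured decoder speedups are the authoritative numbers.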
Experiment protocol
- Download Mirageinv/oat policy + tokenizer.
- Train ParallelTailDecoder on libero10_N500 with frozen policy (15 epochs, bs=64, lr=1e-4).
- Paired sim-eval: 50 episodes/task × 3 seeds (test_start_seed=1000).
- Benchmarks: dataset / training / policy verification + wall-clock speed.
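The protocol above fixes the evaluation budget per decoding mode. The sketch below computes that budget and shows one way to aggregate per-seed success rates into a mean ± spread figure. It assumes the ± values are a standard deviation across seeds (the card does not say, nor whether population or sample), and the per-seed inputs are toy values, not numbers from the eval logs.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

N_TASKS = 10             # LIBERO-10 consists of ten tasks
EPISODES_PER_TASK = 50   # from the protocol
N_SEEDS = 3              # from the protocol


def episodes_per_mode(n_tasks: int = N_TASKS,
                      episodes_per_task: int = EPISODES_PER_TASK,
                      n_seeds: int = N_SEEDS) -> int:
    """Total rollouts for one decoding mode (AR or Blockwise)."""
    return n_tasks * episodes_per_task * n_seeds


def summarize(per_seed_sr: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation across seeds (assumed convention)."""
    return mean(per_seed_sr), pstdev(per_seed_sr)


total = episodes_per_mode()
toy_mean, toy_std = summarize([50.0, 60.0, 70.0])  # toy values only
print(total, f"{toy_mean:.2f}% ± {toy_std:.2f}%")
```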
Cluster launcher: scripts/cluster/run_blockwise_original_oat_baseline.sh (PHASE=B NUM_EXP=3).
Visualizations
| Figure | Description |
|---|---|
| 1 | AR per-task SR |
| 2 | Blockwise per-task SR |
| 3 | Side-by-side per-task comparison |
| 4 | Decoder + E2E latency |
| 5 | Tail CE loss curve |
| 6 | Verification kit |
Repository layout
checkpoints/original_oat_tail_p4_r1.pt # trained tail decoder
eval/ar_eval_log.json # AR sim metrics
eval/blockwise_eval_log.json # Blockwise sim metrics
benchmarks/*.json # verification + speed raw logs
benchmarks/*_dashboard.png # plots above
Reproduce inference
python scripts/eval_policy_sim.py \
-c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
-o output/eval/blockwise/ar \
--tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt
python scripts/eval_policy_sim.py \
-c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
-o output/eval/blockwise/bw \
--use-blockwise --blockwise-prefix-len 4 --blockwise-refine-iters 1 \
--blockwise-tail-checkpoint checkpoints/original_oat_tail_p4_r1.pt \
--tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt
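The two invocations differ only in the output directory and the blockwise flags, so a paired run can be driven from one helper. This is a convenience sketch: the helper is hypothetical, while the script path, flags, and checkpoint paths are copied from the commands above.

```python
import shlex
from typing import List

CKPT_DIR = "output/baselines/original_oat/hf"
POLICY = f"{CKPT_DIR}/policy_ep-0250_sr-0.596.ckpt"
TOKENIZER = f"{CKPT_DIR}/tokenizer_ep-0950_mse-0.002.ckpt"
TAIL = "checkpoints/original_oat_tail_p4_r1.pt"


def eval_command(out_dir: str, blockwise: bool = False,
                 prefix_len: int = 4, refine_iters: int = 1) -> List[str]:
    """Build the argv list for one eval_policy_sim.py run."""
    cmd = ["python", "scripts/eval_policy_sim.py", "-c", POLICY, "-o", out_dir]
    if blockwise:
        cmd += [
            "--use-blockwise",
            "--blockwise-prefix-len", str(prefix_len),
            "--blockwise-refine-iters", str(refine_iters),
            "--blockwise-tail-checkpoint", TAIL,
        ]
    cmd += ["--tokenizer-checkpoint", TOKENIZER]
    return cmd


ar_cmd = eval_command("output/eval/blockwise/ar")
bw_cmd = eval_command("output/eval/blockwise/bw", blockwise=True)
print(shlex.join(ar_cmd))
print(shlex.join(bw_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.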
Citation
@misc{liu2026oatorderedactiontokenization,
title={OAT: Ordered Action Tokenization},
author={Chaoqi Liu and Xiaoshen Han and Jiawei Gao and Yue Zhao and Haonan Chen and Yilun Du},
year={2026},
eprint={2602.04215},
archivePrefix={arXiv},
primaryClass={cs.RO}}
Phase 2 (next)
Phase 1 strict baseline is complete on branch Blockwise-OAT.
- Resume tail training from original_oat_tail_p4_r1.pt (target 30+ epochs).
- Re-run the paired AR vs Blockwise LIBERO-10 eval to confirm the results.
- Re-run speed / verification benchmarks; publish Phase 2 bundle to HF.





