license: mit
tags:
  - robotics
  - manipulation
  - oat
  - libero
  - blockwise-decoding

Blockwise-OAT — strict original-OAT baseline (LIBERO-10)

Paired evaluation of autoregressive (AR) vs blockwise parallel tail action-token generation on a frozen OAT policy.

HF repo: hackhackhack66666/Blockwise-OAT
Code branch: Blockwise-OAT on GadzhiAskhabaliev/OAT-BLT-Dense

Summary

Primary success-rate (SR) reference: OAT8 paper on LIBERO-10, 56.3% (OAT, external benchmark).

Metric	AR (our eval)	Blockwise (P=4, r=1)
LIBERO-10 mean SR	58.73% ± 0.18%	52.33% ± 1.04%
Δ vs OAT paper (56.3%)	+2.43 pp	-3.97 pp
Paired Δ (BW − AR, same protocol)	—	-6.40 pp
Tail train epochs	—	15 (final CE 3.0607)

On this cluster stack, our frozen AR checkpoint scores above the paper's number (58.73% vs 56.3%). Blockwise trades SR for faster token generation; the tail was trained for only 15 epochs, and resuming training is planned.
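The deltas in the summary table follow directly from the three mean SR values. A minimal sketch that recomputes them (the helper name `delta_pp` is illustrative and not part of the repository):

```python
# Mean success rates in percent, copied from the summary table.
PAPER_SR = 56.3
AR_SR = 58.73
BW_SR = 52.33


def delta_pp(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to two decimals."""
    return round(a - b, 2)


ar_vs_paper = delta_pp(AR_SR, PAPER_SR)   # AR eval relative to the paper
bw_vs_paper = delta_pp(BW_SR, PAPER_SR)   # Blockwise relative to the paper
paired = delta_pp(BW_SR, AR_SR)           # Blockwise minus AR, same protocol

print(f"AR vs paper:        {ar_vs_paper:+.2f} pp")
print(f"Blockwise vs paper: {bw_vs_paper:+.2f} pp")
print(f"Paired (BW - AR):   {paired:+.2f} pp")
```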

Inference speed (V100, cuda:0)

Decoder-only — 8 action tokens after cond is computed (benchmark_blockwise_vs_ar, warmup=10, 50 repeats):

Batch	AR	Blockwise	Speedup
bs=1	22.3 ms	19.3 ms	1.16×
bs=8	31.4 ms	26.6 ms	1.18×
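The warmup-then-repeat protocol named above (warmup=10, 50 repeats) can be sketched with the standard library alone. This is not the code of benchmark_blockwise_vs_ar; the function name `time_callable` and the choice of reporting the mean are assumptions made for illustration.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], warmup: int = 10, repeats: int = 50) -> float:
    """Return the mean wall-clock latency of fn in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up iterations are executed but not timed
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # On a GPU the call may return before the kernels finish, so a device
        # synchronisation would be needed here before reading the clock.
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.fmean(samples_ms)


if __name__ == "__main__":
    # Toy CPU workload standing in for a decoder call.
    print(f"{time_callable(lambda: sum(range(10_000))):.3f} ms")
```

Timing AR and Blockwise with the same harness, warmup count, and batch size keeps the comparison paired.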

End-to-end predict_action — vision encoder + decoder + detokenize (warmup=20, 100 repeats):

Batch	AR	Blockwise	Speedup
bs=1	36.4 ms	30.1 ms	1.21×
bs=8	37.0 ms	34.8 ms	1.06×

Decoder-only speedup is modest (1.16× at bs=1, 1.18× at bs=8, i.e. roughly 13–15% lower latency) because the tail module is comparable in size to the AR stack. End-to-end, the measured gain is 1.21× at bs=1 and 1.06× at bs=8; it shrinks when the vision encoder dominates latency.
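Speedup and relative latency reduction are two different ways to read the same pair of timings, which is worth keeping apart when quoting percentages. A small sketch using the latencies from the two tables (the helper names are illustrative):

```python
def speedup(ar_ms: float, bw_ms: float) -> float:
    """How many times faster Blockwise is than AR."""
    return ar_ms / bw_ms


def latency_reduction_pct(ar_ms: float, bw_ms: float) -> float:
    """Percentage of AR latency saved by Blockwise."""
    return 100.0 * (1.0 - bw_ms / ar_ms)


# (scope, batch size) -> (AR ms, Blockwise ms), copied from the tables.
TIMINGS = {
    ("decoder", 1): (22.3, 19.3),
    ("decoder", 8): (31.4, 26.6),
    ("e2e", 1): (36.4, 30.1),
    ("e2e", 8): (37.0, 34.8),
}

for (scope, bs), (ar_ms, bw_ms) in TIMINGS.items():
    print(
        f"{scope:8s} bs={bs}: {speedup(ar_ms, bw_ms):.2f}x, "
        f"{latency_reduction_pct(ar_ms, bw_ms):.1f}% lower latency"
    )
```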

Baseline artifacts (frozen)

Component	Source
Policy	Mirageinv/oat — policy_ep-0250_sr-0.596.ckpt
Tokenizer	Mirageinv/oat — tokenizer_ep-0950_mse-0.002.ckpt
Tail decoder	checkpoints/original_oat_tail_p4_r1.pt (this repo)

Architecture & data flow

OAT encodes observations and generates 8 action tokens z₁…z₈. Blockwise-OAT splits decoding:

Obs (RGB + proprio) ──► Vision encoder ──► cond [B, T_o, d]
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ AR path (baseline)                                          │
            │   BOS ──► AutoregressiveModel.generate (8 steps) ──► z₁…z₈ │
            └───────────────────────────────┬───────────────────────────────┘
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ Blockwise path                                                │
            │   BOS ──► generate_prefix (P=4 AR steps) ──► z₁…z₄, h_prefix │
            │   (z₁…z₄, h_prefix) ──► ParallelTailDecoder (1 pass) ──► z₅…z₈│
            └───────────────────────────────┬───────────────────────────────┘
                                            ▼
                      cat(z_prefix, z_tail) ──► OATTok.detokenize ──► action chunk

Inputs: multi-view RGB, robot state, task id (same as OAT).
Outputs: action / action_pred tensors (identical shapes for AR and Blockwise).
Trainable in this run: only ParallelTailDecoder (~4.5M params, 0.90× AR size).

Generation schedule

Mode	AR forward passes	Tail passes
Full AR	8	0
Blockwise P=4	4	1
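The split decoding from the architecture section and the pass counts in the table above can be illustrated with a toy, dependency-free sketch. Everything here is a stand-in: `ar_step`, `parallel_tail`, `PassCounter`, and `VOCAB_SIZE` are hypothetical, and the arithmetic inside them only mimics "one forward pass produces one token" (AR) versus "one forward pass produces all tail tokens" (tail decoder). It is not the repository's AutoregressiveModel or ParallelTailDecoder.

```python
from typing import List, Tuple

NUM_TOKENS = 8    # z1..z8, as stated in the text
PREFIX_LEN = 4    # P=4, as stated in the text
VOCAB_SIZE = 16   # hypothetical; the real codebook size is not given here


class PassCounter:
    """Counts forward passes through each module."""

    def __init__(self) -> None:
        self.ar = 0
        self.tail = 0


def ar_step(cond: int, tokens: List[int], counter: PassCounter) -> Tuple[int, int]:
    """Toy stand-in for one AR forward pass: returns (next_token, hidden)."""
    counter.ar += 1
    hidden = (cond + 31 * sum(tokens) + len(tokens)) % 997
    return hidden % VOCAB_SIZE, hidden


def generate_prefix(cond: int, n: int, counter: PassCounter) -> Tuple[List[int], int]:
    """Generate n tokens one at a time, returning them and the last hidden state."""
    tokens: List[int] = []
    hidden = 0
    for _ in range(n):
        tok, hidden = ar_step(cond, tokens, counter)
        tokens.append(tok)
    return tokens, hidden


def parallel_tail(prefix: List[int], h_prefix: int, n_tail: int, counter: PassCounter) -> List[int]:
    """Toy stand-in for the tail decoder: all tail tokens from a single pass."""
    counter.tail += 1
    return [(h_prefix + sum(prefix) + 7 * i) % VOCAB_SIZE for i in range(n_tail)]


def ar_decode(cond: int, counter: PassCounter) -> List[int]:
    tokens, _ = generate_prefix(cond, NUM_TOKENS, counter)
    return tokens


def blockwise_decode(cond: int, counter: PassCounter) -> List[int]:
    z_prefix, h_prefix = generate_prefix(cond, PREFIX_LEN, counter)
    z_tail = parallel_tail(z_prefix, h_prefix, NUM_TOKENS - PREFIX_LEN, counter)
    return z_prefix + z_tail  # cat(z_prefix, z_tail)


ar_counter, bw_counter = PassCounter(), PassCounter()
z_ar = ar_decode(cond=42, counter=ar_counter)
z_bw = blockwise_decode(cond=42, counter=bw_counter)
print("Full AR      :", ar_counter.ar, "AR passes,", ar_counter.tail, "tail passes")
print("Blockwise P=4:", bw_counter.ar, "AR passes,", bw_counter.tail, "tail passes")
```

In this toy the first four tokens are identical in both modes, since the prefix is produced by the same AR steps; only the tail differs.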

Experiment protocol

1. Download Mirageinv/oat policy + tokenizer.
2. Train ParallelTailDecoder on libero10_N500 with frozen policy (15 epochs, bs=64, lr=1e-4).
3. Paired sim-eval: 50 episodes/task × 3 seeds (test_start_seed=1000).
4. Benchmarks: dataset / training / policy verification + wall-clock speed.
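The eval budget implied by the protocol above, and one plausible way to get a "mean ± spread" figure from per-seed results, can be sketched as follows. Two assumptions are made: the ± values in the summary are taken across the 3 seeds, and the spread is a sample standard deviation. Neither is confirmed by the text, and the per-seed numbers below are placeholders, not results from this repository.

```python
import statistics
from typing import Sequence, Tuple

NUM_TASKS = 10           # LIBERO-10 consists of ten tasks
EPISODES_PER_TASK = 50   # from the protocol
NUM_SEEDS = 3            # from the protocol


def total_episodes(tasks: int, episodes: int, seeds: int) -> int:
    """Episodes rolled out per decoding mode."""
    return tasks * episodes * seeds


def mean_and_std(per_seed_sr: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds (assumed convention)."""
    return statistics.fmean(per_seed_sr), statistics.stdev(per_seed_sr)


print(total_episodes(NUM_TASKS, EPISODES_PER_TASK, NUM_SEEDS), "episodes per mode")

# Placeholder per-seed success rates, for illustration only.
example_sr = [0.50, 0.52, 0.54]
mean, std = mean_and_std(example_sr)
print(f"{100 * mean:.2f}% ± {100 * std:.2f}%")
```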

Cluster launcher: scripts/cluster/run_blockwise_original_oat_baseline.sh (PHASE=B NUM_EXP=3).

Visualizations

Figures:
AR per-task SR
Blockwise per-task SR
Side-by-side per-task comparison
Decoder + E2E latency
Tail CE loss curve
Verification kit

Repository layout

checkpoints/original_oat_tail_p4_r1.pt   # trained tail decoder
eval/ar_eval_log.json                    # AR sim metrics
eval/blockwise_eval_log.json             # Blockwise sim metrics
benchmarks/*.json                        # verification + speed raw logs
benchmarks/*_dashboard.png               # plots above
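The schema of the JSON logs is not documented here, so a cautious first step is to list their top-level structure before relying on any field name. A minimal sketch (the helper `summarize_json` is illustrative and assumes nothing about the keys):

```python
import json
from pathlib import Path
from typing import Dict, Union


def summarize_json(path: Union[str, Path]) -> Dict[str, str]:
    """Map each top-level key of a JSON file to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list of records) get a single entry.
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    for name in ("eval/ar_eval_log.json", "eval/blockwise_eval_log.json"):
        if Path(name).exists():
            print(name, summarize_json(name))
```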

Reproduce inference

python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/ar \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt

python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/bw \
  --use-blockwise --blockwise-prefix-len 4 --blockwise-refine-iters 1 \
  --blockwise-tail-checkpoint checkpoints/original_oat_tail_p4_r1.pt \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt
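The two commands above differ only in the output directory and the blockwise flags. A small sketch that builds both argument lists from one helper makes that explicit; `build_eval_cmd` is a hypothetical wrapper, and only the flags shown in the commands above are used.

```python
import shlex
from typing import List, Optional


def build_eval_cmd(
    policy_ckpt: str,
    out_dir: str,
    tokenizer_ckpt: str,
    tail_ckpt: Optional[str] = None,
    prefix_len: int = 4,
    refine_iters: int = 1,
) -> List[str]:
    """Build the eval command; passing tail_ckpt switches to Blockwise mode."""
    cmd = ["python", "scripts/eval_policy_sim.py", "-c", policy_ckpt, "-o", out_dir]
    if tail_ckpt is not None:
        cmd += [
            "--use-blockwise",
            "--blockwise-prefix-len", str(prefix_len),
            "--blockwise-refine-iters", str(refine_iters),
            "--blockwise-tail-checkpoint", tail_ckpt,
        ]
    cmd += ["--tokenizer-checkpoint", tokenizer_ckpt]
    return cmd


HF_DIR = "output/baselines/original_oat/hf"
POLICY = f"{HF_DIR}/policy_ep-0250_sr-0.596.ckpt"
TOKENIZER = f"{HF_DIR}/tokenizer_ep-0950_mse-0.002.ckpt"

ar_cmd = build_eval_cmd(POLICY, "output/eval/blockwise/ar", TOKENIZER)
bw_cmd = build_eval_cmd(
    POLICY,
    "output/eval/blockwise/bw",
    TOKENIZER,
    tail_ckpt="checkpoints/original_oat_tail_p4_r1.pt",
)

print(shlex.join(ar_cmd))
print(shlex.join(bw_cmd))
```

The lists can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch each eval.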

Citation

@misc{liu2026oatorderedactiontokenization,
      title={OAT: Ordered Action Tokenization},
      author={Chaoqi Liu and Xiaoshen Han and Jiawei Gao and Yue Zhao and Haonan Chen and Yilun Du},
      year={2026},
      eprint={2602.04215},
      archivePrefix={arXiv},
      primaryClass={cs.RO}}

Phase 2 (next)

Phase 1 strict baseline is complete on branch Blockwise-OAT.

1. Resume tail training from original_oat_tail_p4_r1.pt (target: 30+ epochs).
2. Re-run the paired AR vs Blockwise LIBERO-10 eval to confirm the results.
3. Re-run the speed and verification benchmarks, then publish the Phase 2 bundle to HF.