license: mit
tags:
  - robotics
  - manipulation
  - oat
  - libero
  - blockwise-decoding

Blockwise-OAT: strict original-OAT baseline (LIBERO-10)

Paired evaluation of autoregressive (AR) vs blockwise parallel tail action-token generation on a frozen OAT policy.

HF repo: hackhackhack66666/Blockwise-OAT
Code branch: Blockwise-OAT on GadzhiAskhabaliev/OAT-BLT-Dense

Summary

Primary success rate (SR) reference: 56.3% for OAT8 on LIBERO-10, as reported in the OAT paper (external benchmark).

| Metric | AR (our eval) | Blockwise (P=4, r=1) |
|---|---|---|
| LIBERO-10 mean SR | 58.73% ± 0.18% | 52.33% ± 1.04% |
| Δ vs OAT paper (56.3%) | +2.43 pp | -3.97 pp |
| Paired Δ (BW − AR, same protocol) | n/a | -6.40 pp |
| Tail train epochs | n/a | 15 (final CE 3.0607) |

Our frozen AR checkpoint scores above the paper's number on this cluster stack (58.73% vs 56.3%). Blockwise decoding gives up 6.40 pp of SR in exchange for faster token generation. The tail was trained for only 15 epochs, and resuming training is planned (see Phase 2).
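The deltas in the summary table follow directly from the three success rates. A small sketch that recomputes them (the helper name `delta_pp` is illustrative, not part of the repository):

```python
# Success rates (percent) copied from the summary table.
PAPER_SR = 56.3       # OAT paper reference on LIBERO-10
AR_SR = 58.73         # our AR eval
BLOCKWISE_SR = 52.33  # our Blockwise eval (P=4, r=1)


def delta_pp(a: float, b: float) -> float:
    """Difference a - b in percentage points, rounded to 2 decimals."""
    return round(a - b, 2)


print("AR vs paper:       ", delta_pp(AR_SR, PAPER_SR))         # 2.43
print("Blockwise vs paper:", delta_pp(BLOCKWISE_SR, PAPER_SR))  # -3.97
print("Paired BW - AR:    ", delta_pp(BLOCKWISE_SR, AR_SR))     # -6.4
```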

Inference speed (V100, cuda:0)

Decoder only: 8 action tokens generated after cond is computed (benchmark_blockwise_vs_ar, warmup=10, 50 repeats):

| Batch | AR | Blockwise | Speedup |
|---|---|---|---|
| bs=1 | 22.3 ms | 19.3 ms | 1.16× |
| bs=8 | 31.4 ms | 26.6 ms | 1.18× |

End-to-end predict_action: vision encoder + decoder + detokenize (warmup=20, 100 repeats):

| Batch | AR | Blockwise | Speedup |
|---|---|---|---|
| bs=1 | 36.4 ms | 30.1 ms | 1.21× |
| bs=8 | 37.0 ms | 34.8 ms | 1.06× |
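Both measurements use untimed warmup calls followed by timed repeats. The benchmark script itself is not reproduced here, so the following is only a minimal sketch of such a harness, assuming the reported figure is the mean over repeats. `time_ms` and its `sync` parameter are hypothetical names; on a GPU, `sync` would typically be `torch.cuda.synchronize` so that queued kernels finish before the clock is read.

```python
import time
from typing import Callable, Optional


def time_ms(fn: Callable[[], object], warmup: int, repeats: int,
            sync: Optional[Callable[[], None]] = None) -> float:
    """Mean wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):      # untimed warmup calls
        fn()
    if sync is not None:
        sync()                   # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):     # timed calls
        fn()
    if sync is not None:
        sync()                   # drain pending work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / repeats


# Stand-in workload; a real run would call the decoder or predict_action.
calls = []
latency = time_ms(lambda: calls.append(sum(range(1000))), warmup=10, repeats=50)
print(f"{latency:.4f} ms mean over 50 timed calls ({len(calls)} calls in total)")
```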

Decoder speedup is modest (1.16× at bs=1, 1.18× at bs=8) because the tail module is comparable in size to the AR stack. End-to-end gains range from 1.21× at bs=1 to 1.06× at bs=8; the gain shrinks when the vision encoder dominates latency.
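A speedup ratio and a percentage latency reduction are different quantities, so it helps to compute both from the same numbers. The sketch below recomputes the Speedup column from the latencies in the two tables; the helper names are illustrative.

```python
# (AR ms, Blockwise ms) copied from the two latency tables.
LATENCY_MS = {
    ("decoder", 1): (22.3, 19.3),
    ("decoder", 8): (31.4, 26.6),
    ("e2e", 1): (36.4, 30.1),
    ("e2e", 8): (37.0, 34.8),
}


def speedup(ar_ms: float, bw_ms: float) -> float:
    """Ratio of AR latency to Blockwise latency."""
    return ar_ms / bw_ms


def latency_reduction_pct(ar_ms: float, bw_ms: float) -> float:
    """Percentage of the AR latency that Blockwise removes."""
    return 100.0 * (ar_ms - bw_ms) / ar_ms


for (stage, bs), (ar, bw) in LATENCY_MS.items():
    print(f"{stage:8s} bs={bs}: {speedup(ar, bw):.2f}x, "
          f"{latency_reduction_pct(ar, bw):.1f}% lower latency")
```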

Baseline artifacts (frozen)

| Component | Source |
|---|---|
| Policy | Mirageinv/oat: policy_ep-0250_sr-0.596.ckpt |
| Tokenizer | Mirageinv/oat: tokenizer_ep-0950_mse-0.002.ckpt |
| Tail decoder | checkpoints/original_oat_tail_p4_r1.pt (this repo) |
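One way to fetch the two frozen checkpoints is `hf_hub_download` from the `huggingface_hub` package. This is a sketch under assumptions: the repo id and filenames come from the table, but that the files sit at the root of Mirageinv/oat is a guess, and the local directory is borrowed from the paths used in the reproduce commands. `download_baseline` is a hypothetical helper.

```python
from pathlib import Path

REPO_ID = "Mirageinv/oat"
FILENAMES = [
    "policy_ep-0250_sr-0.596.ckpt",      # assumed to live at the repo root
    "tokenizer_ep-0950_mse-0.002.ckpt",  # assumed to live at the repo root
]
LOCAL_DIR = Path("output/baselines/original_oat/hf")


def download_baseline(local_dir: Path = LOCAL_DIR) -> list:
    """Download the frozen policy and tokenizer; return their local paths."""
    # Imported lazily so the module loads without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in FILENAMES
    ]

# Usage (needs network access): paths = download_baseline()
```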

Architecture & data flow

OAT encodes observations and generates 8 action tokens z₁…z₈. Blockwise-OAT splits decoding:

Obs (RGB + proprio) ──► Vision encoder ──► cond [B, T_o, d]
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ AR path (baseline)                                            │
            │   BOS ──► AutoregressiveModel.generate (8 steps) ──► z₁…z₈    │
            └───────────────────────────────┬───────────────────────────────┘
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ Blockwise path                                                │
            │   BOS ──► generate_prefix (P=4 AR steps) ──► z₁…z₄, h_prefix  │
            │   (z₁…z₄, h_prefix) ──► ParallelTailDecoder (1 pass) ──► z₅…z₈│
            └───────────────────────────────┬───────────────────────────────┘
                                            ▼
                      cat(z_prefix, z_tail) ──► OATTok.detokenize ──► action chunk

Inputs: multi-view RGB, robot state, task id (same as OAT).
Outputs: action / action_pred tensors (identical shapes for AR and Blockwise).
Trainable in this run: only ParallelTailDecoder (~4.5M params, 0.90× AR size).
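The control flow of the two paths can be illustrated with a toy sketch. The functions below are deterministic stand-ins, not the repository's models: `ar_step`, `parallel_tail`, and the vocabulary size are hypothetical, and only the names `generate_prefix`, P=4, and the 8-token chunk length come from the diagram.

```python
from typing import List, Tuple

NUM_TOKENS = 8   # action tokens per chunk
PREFIX_LEN = 4   # P
VOCAB = 1024     # placeholder codebook size, not taken from the source


def ar_step(cond: List[float], prefix: List[int]) -> Tuple[int, List[float]]:
    """Stub for one AR forward pass: next token plus a hidden state."""
    token = (31 * sum(prefix) + len(prefix) + int(sum(cond))) % VOCAB
    return token, [float(token), float(len(prefix))]


def generate_prefix(cond: List[float], prefix_len: int):
    """Run prefix_len sequential AR steps; return tokens and last hidden state."""
    tokens: List[int] = []
    hidden: List[float] = []
    for _ in range(prefix_len):
        token, hidden = ar_step(cond, tokens)
        tokens.append(token)
    return tokens, hidden


def parallel_tail(z_prefix: List[int], h_prefix: List[float], tail_len: int):
    """Stub for the tail decoder: all tail tokens come from a single call."""
    base = sum(z_prefix) + int(h_prefix[0])
    return [(base + 7 * i) % VOCAB for i in range(tail_len)]


def decode_ar(cond: List[float]) -> List[int]:
    tokens, _ = generate_prefix(cond, NUM_TOKENS)  # 8 sequential steps
    return tokens


def decode_blockwise(cond: List[float]) -> List[int]:
    z_prefix, h_prefix = generate_prefix(cond, PREFIX_LEN)  # 4 AR steps
    z_tail = parallel_tail(z_prefix, h_prefix, NUM_TOKENS - PREFIX_LEN)  # 1 pass
    return z_prefix + z_tail  # cat(z_prefix, z_tail)


cond = [0.1, 0.2, 0.3]
ar_tokens = decode_ar(cond)
bw_tokens = decode_blockwise(cond)
print(len(ar_tokens), len(bw_tokens))                    # 8 8
print(ar_tokens[:PREFIX_LEN] == bw_tokens[:PREFIX_LEN])  # True
```

In this toy version the first four tokens match between the two paths because both run the same deterministic AR steps; only the last four differ in how they are produced.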

Generation schedule

| Mode | AR forward passes | Tail passes |
|---|---|---|
| Full AR | 8 | 0 |
| Blockwise P=4 | 4 | 1 |
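Combining this schedule with the decoder-only bs=1 latencies gives a rough back-of-envelope estimate of what one tail pass costs. This is not a measurement: it assumes every AR step costs the same and ignores fixed overhead, and it treats the number of tail passes as equal to the refine-iteration count r, which is an assumption consistent with the r=1 row. `forward_passes` is a hypothetical helper.

```python
def forward_passes(num_tokens: int, prefix_len: int, refine_iters: int):
    """Return (AR passes, tail passes); prefix_len == num_tokens is full AR."""
    if prefix_len >= num_tokens:
        return num_tokens, 0
    return prefix_len, refine_iters


# Decoder-only bs=1 latencies from the speed table.
AR_MS, BW_MS = 22.3, 19.3

ar_passes, _ = forward_passes(8, 8, 0)
bw_ar_passes, tail_passes = forward_passes(8, 4, 1)

per_step_ms = AR_MS / ar_passes  # assumes uniform AR step cost
tail_ms = (BW_MS - bw_ar_passes * per_step_ms) / tail_passes
print(f"AR step ~{per_step_ms:.2f} ms, tail pass ~{tail_ms:.2f} ms "
      f"(~{tail_ms / per_step_ms:.1f} AR steps)")
```

Under these assumptions a single tail pass costs roughly as much as three AR steps, which is one way to read why replacing four AR steps with one tail pass yields only a modest speedup.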

Experiment protocol

  1. Download Mirageinv/oat policy + tokenizer.
  2. Train ParallelTailDecoder on libero10_N500 with frozen policy (15 epochs, bs=64, lr=1e-4).
  3. Paired sim-eval: 50 episodes/task × 3 seeds (test_start_seed=1000).
  4. Benchmarks: dataset / training / policy verification + wall-clock speed.

Cluster launcher: scripts/cluster/run_blockwise_original_oat_baseline.sh (PHASE=B NUM_EXP=3).
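The protocol above implies the following episode counts per decoding mode. The aggregation helper is a sketch only: the README does not state whether the reported ± is a standard deviation or a standard error across seeds, so the sample standard deviation used here is an assumption, and the per-seed values are placeholders rather than results from this evaluation.

```python
import statistics

NUM_TASKS = 10          # LIBERO-10 has 10 tasks
EPISODES_PER_TASK = 50
NUM_SEEDS = 3

episodes_per_seed = NUM_TASKS * EPISODES_PER_TASK  # 500
episodes_per_mode = episodes_per_seed * NUM_SEEDS  # 1500
print(episodes_per_seed, episodes_per_mode)


def mean_and_std(per_seed_sr):
    """Mean and sample standard deviation across seeds, in percent."""
    return statistics.mean(per_seed_sr), statistics.stdev(per_seed_sr)


# Placeholder per-seed success rates, NOT values from this evaluation.
mean_sr, std_sr = mean_and_std([50.0, 52.0, 54.0])
print(f"{mean_sr:.2f}% +/- {std_sr:.2f}%")  # 52.00% +/- 2.00%
```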

Visualizations

| Figure | Description |
|---|---|
| AR eval | AR per-task SR |
| Blockwise eval | Blockwise per-task SR |
| Paired | Side-by-side per-task comparison |
| Speed | Decoder + E2E latency |
| Tail train | Tail CE loss curve |
| Verify | Verification kit |

Repository layout

checkpoints/original_oat_tail_p4_r1.pt   # trained tail decoder
eval/ar_eval_log.json                    # AR sim metrics
eval/blockwise_eval_log.json             # Blockwise sim metrics
benchmarks/*.json                        # verification + speed raw logs
benchmarks/*_dashboard.png               # plots above
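The schema of the JSON logs is not documented here, so a cautious first step is to list their top-level keys rather than assume field names. A minimal sketch (`top_level_summary` is a hypothetical helper; run it from the repository root):

```python
import json
from pathlib import Path


def top_level_summary(path):
    """Return {key: type name} for the top level of a JSON log."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    return {"<root>": type(data).__name__}


for name in ("eval/ar_eval_log.json", "eval/blockwise_eval_log.json"):
    if Path(name).exists():
        print(name, top_level_summary(name))
    else:
        print(name, "not found (run from the repository root)")
```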

Reproduce inference

python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/ar \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt

python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/bw \
  --use-blockwise --blockwise-prefix-len 4 --blockwise-refine-iters 1 \
  --blockwise-tail-checkpoint checkpoints/original_oat_tail_p4_r1.pt \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt
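The two commands differ only in the output directory and the blockwise flags, so they can be generated from one helper when scripting paired runs. The flags and paths below are copied from the commands above; `eval_command` itself is a hypothetical wrapper, not part of the repository.

```python
import shlex

CKPT_DIR = "output/baselines/original_oat/hf"
POLICY = f"{CKPT_DIR}/policy_ep-0250_sr-0.596.ckpt"
TOKENIZER = f"{CKPT_DIR}/tokenizer_ep-0950_mse-0.002.ckpt"
TAIL = "checkpoints/original_oat_tail_p4_r1.pt"


def eval_command(out_dir, blockwise=False, prefix_len=4, refine_iters=1):
    """Build the eval_policy_sim.py argument list for AR or Blockwise mode."""
    cmd = ["python", "scripts/eval_policy_sim.py", "-c", POLICY, "-o", out_dir]
    if blockwise:
        cmd += [
            "--use-blockwise",
            "--blockwise-prefix-len", str(prefix_len),
            "--blockwise-refine-iters", str(refine_iters),
            "--blockwise-tail-checkpoint", TAIL,
        ]
    cmd += ["--tokenizer-checkpoint", TOKENIZER]
    return cmd


ar_cmd = eval_command("output/eval/blockwise/ar")
bw_cmd = eval_command("output/eval/blockwise/bw", blockwise=True)
for cmd in (ar_cmd, bw_cmd):
    print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```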

Citation

@misc{liu2026oatorderedactiontokenization,
      title={OAT: Ordered Action Tokenization},
      author={Chaoqi Liu and Xiaoshen Han and Jiawei Gao and Yue Zhao and Haonan Chen and Yilun Du},
      year={2026},
      eprint={2602.04215},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Phase 2 (next)

Phase 1 strict baseline is complete on branch Blockwise-OAT.

  1. Resume tail training from original_oat_tail_p4_r1.pt (target 30+ epochs).
  2. Re-run the paired AR vs Blockwise evaluation on LIBERO-10 to confirm the results.
  3. Re-run speed / verification benchmarks; publish Phase 2 bundle to HF.