---
license: mit
tags:
- robotics
- manipulation
- oat
- libero
- blockwise-decoding
---

# Blockwise-OAT — strict original-OAT baseline (LIBERO-10)

Paired evaluation of **autoregressive (AR)** vs **blockwise parallel tail** action-token
generation on a frozen [OAT](https://arxiv.org/abs/2602.04215) policy.

**HF repo:** [hackhackhack66666/Blockwise-OAT](https://huggingface.co/hackhackhack66666/Blockwise-OAT)  
**Code branch:** `Blockwise-OAT` on [GadzhiAskhabaliev/OAT-BLT-Dense](https://github.com/GadzhiAskhabaliev/OAT-BLT-Dense)

## Summary

Primary success-rate (SR) reference: the **OAT8** number from the paper on LIBERO-10, **56.3%** ([OAT](https://arxiv.org/abs/2602.04215), external benchmark).

| Metric | AR (our eval) | Blockwise (P=4, r=1) |
|--------|---------------|----------------------|
| LIBERO-10 mean SR | **58.73% ± 0.18%** | **52.33% ± 1.04%** |
| **Δ vs OAT paper (56.3%)** | **+2.43 pp** | **-3.97 pp** |
| Paired Δ (BW − AR, same protocol) | — | **-6.40 pp** |
| Tail train epochs | — | 15 (final CE 3.0607) |

Our frozen AR checkpoint scores above the paper's reported number on this cluster stack (58.73% vs 56.3%).
Blockwise decoding trades SR for faster token generation: it loses 6.40 pp against AR under the same protocol.
The tail was trained for only 15 epochs; resuming training is planned.
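
The three Δ rows in the table are plain differences in percentage points. As a quick sanity check, they can be recomputed from the reported means (the constants below are copied from the table; the helper name `delta_pp` is only illustrative):

```python
# Success rates in percent, copied from the summary table.
PAPER_SR = 56.3
AR_SR = 58.73
BLOCKWISE_SR = 52.33


def delta_pp(a: float, b: float) -> float:
    """Return a - b in percentage points, rounded to two decimals."""
    return round(a - b, 2)


ar_vs_paper = delta_pp(AR_SR, PAPER_SR)         # AR (our eval) relative to the paper
bw_vs_paper = delta_pp(BLOCKWISE_SR, PAPER_SR)  # Blockwise relative to the paper
paired = delta_pp(BLOCKWISE_SR, AR_SR)          # BW - AR under the same protocol

print(f"{ar_vs_paper:+.2f} pp, {bw_vs_paper:+.2f} pp, {paired:+.2f} pp")
```

Only the paired Δ compares two runs on the same stack; the two paper-relative rows mix in any difference between this cluster setup and the paper's.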

### Inference speed (V100, cuda:0)

**Decoder-only** — 8 action tokens after `cond` is computed (`benchmark_blockwise_vs_ar`, warmup=10, 50 repeats):

| Batch | AR | Blockwise | Speedup |
|-------|-----|-----------|---------|
| bs=1 | 22.3 ms | 19.3 ms | **1.16×** |
| bs=8 | 31.4 ms | 26.6 ms | **1.18×** |
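
A warmup-then-repeat measurement like the one described above can be sketched as follows. This is not the repo's `benchmark_blockwise_vs_ar`; the function `time_callable` and its `sync` argument are assumptions made for illustration. On a GPU one would typically pass `torch.cuda.synchronize` as `sync` so that queued kernels are included in the measured interval.

```python
import statistics
import time
from typing import Callable, Optional


def time_callable(
    fn: Callable[[], object],
    warmup: int = 10,
    repeats: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Time fn() and return latency statistics in milliseconds.

    sync is an optional barrier called right before each clock read
    (hypothetical hook; e.g. torch.cuda.synchronize for CUDA work).
    """
    # Warmup runs are executed but not recorded.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)

    return {
        "mean_ms": statistics.fmean(samples),
        "stdev_ms": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "n": len(samples),
    }


# CPU-only toy workload, just to show the call shape.
stats = time_callable(lambda: sum(range(10_000)), warmup=10, repeats=50)
print(f"{stats['mean_ms']:.3f} ms over {stats['n']} runs")
```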

**End-to-end `predict_action`** — vision encoder + decoder + detokenize (warmup=20, 100 repeats):

| Batch | AR | Blockwise | Speedup |
|-------|-----|-----------|---------|
| bs=1 | 36.4 ms | 30.1 ms | **1.21×** |
| bs=8 | 37.0 ms | 34.8 ms | **1.06×** |

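Decoder speedup is modest (1.16× at bs=1 and 1.18× at bs=8, roughly 13–15% lower latency) because the tail module is comparable in size to the AR stack.
The e2e speedup is 1.21× at bs=1 and drops to 1.06× at bs=8; the vision encoder and detokenizer add latency that blockwise decoding does not reduce.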
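
A speedup factor and a percentage latency reduction are different quantities and are easy to mix up. The sketch below recomputes both from the four table rows (latencies are copied from the tables; the dictionary layout and helper names are illustrative):

```python
# (AR ms, Blockwise ms), copied from the two latency tables.
LATENCY_MS = {
    ("decoder", 1): (22.3, 19.3),
    ("decoder", 8): (31.4, 26.6),
    ("e2e", 1): (36.4, 30.1),
    ("e2e", 8): (37.0, 34.8),
}


def speedup(ar_ms: float, bw_ms: float) -> float:
    """How many times faster Blockwise is than AR."""
    return ar_ms / bw_ms


def latency_reduction_pct(ar_ms: float, bw_ms: float) -> float:
    """Percentage of the AR latency that Blockwise removes."""
    return 100.0 * (ar_ms - bw_ms) / ar_ms


for (scope, bs), (ar, bw) in LATENCY_MS.items():
    print(
        f"{scope:8s} bs={bs}: {speedup(ar, bw):.2f}x speedup, "
        f"{latency_reduction_pct(ar, bw):.1f}% lower latency"
    )
```

A 1.16× speedup corresponds to about 13.5% lower latency, not 16%.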

## Baseline artifacts (frozen)

| Component | Source |
|-----------|--------|
| Policy | [Mirageinv/oat — policy_ep-0250_sr-0.596.ckpt](https://huggingface.co/Mirageinv/oat) |
| Tokenizer | [Mirageinv/oat — tokenizer_ep-0950_mse-0.002.ckpt](https://huggingface.co/Mirageinv/oat) |
| Tail decoder | `checkpoints/original_oat_tail_p4_r1.pt` (this repo) |
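
One way to fetch the two frozen checkpoints is `hf_hub_download` from `huggingface_hub`. The sketch assumes both files sit at the root of the `Mirageinv/oat` repo (not verified here) and reuses the local directory from the commands under "Reproduce inference"; `fetch_baseline` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

BASE_REPO = "Mirageinv/oat"
# Assumption: both checkpoints are stored at the repo root under these names.
BASE_FILES = [
    "policy_ep-0250_sr-0.596.ckpt",
    "tokenizer_ep-0950_mse-0.002.ckpt",
]
LOCAL_DIR = Path("output/baselines/original_oat/hf")


def fetch_baseline(local_dir: Path = LOCAL_DIR) -> list:
    """Download the frozen policy and tokenizer; returns their local paths.

    Requires network access and the huggingface_hub package.
    """
    # Imported lazily so the module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(hf_hub_download(repo_id=BASE_REPO, filename=name, local_dir=local_dir))
        for name in BASE_FILES
    ]
```

Calling `fetch_baseline()` once should leave both files under `output/baselines/original_oat/hf/`, which is where the evaluation commands expect them.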

## Architecture & data flow

OAT encodes observations and generates **8 action tokens** `z₁…z₈`. Blockwise-OAT splits decoding:

```
Obs (RGB + proprio) ──► Vision encoder ──► cond [B, T_o, d]
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ AR path (baseline)                                            │
            │   BOS ──► AutoregressiveModel.generate (8 steps) ──► z₁…z₈    │
            └───────────────────────────────┬───────────────────────────────┘
                                            │
            ┌───────────────────────────────┴───────────────────────────────┐
            │ Blockwise path                                                │
            │   BOS ──► generate_prefix (P=4 AR steps) ──► z₁…z₄, h_prefix  │
            │   (z₁…z₄, h_prefix) ──► ParallelTailDecoder (1 pass) ──► z₅…z₈│
            └───────────────────────────────┬───────────────────────────────┘
                                            ▼
                      cat(z_prefix, z_tail) ──► OATTok.detokenize ──► action chunk
```

**Inputs:** multi-view RGB, robot state, task id (same as OAT).  
**Outputs:** `action` / `action_pred` tensors (identical shapes for AR and Blockwise).  
**Trainable in this run:** only `ParallelTailDecoder` (~4.5M params, 0.90× AR size).

### Generation schedule

| Mode | AR forward passes | Tail passes |
|------|-------------------|-------------|
| Full AR | 8 | 0 |
| Blockwise P=4 | 4 | 1 |
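
The schedule can be illustrated with a toy decoder that only tracks control flow and pass counts. Everything here is a stand-in: `step` and `tail` are hypothetical callables replacing the real networks, `cond` and `h_prefix` are omitted, and it is an assumption that `r` (refine iterations) equals the number of tail passes.

```python
from typing import Callable, List, Optional, Tuple

NUM_TOKENS = 8  # OAT emits 8 action tokens per chunk

StepFn = Callable[[List[int]], int]
TailFn = Callable[[List[int], int, Optional[List[int]]], List[int]]


def decode_ar(step: StepFn) -> Tuple[List[int], int]:
    """Full AR: one forward pass per token. Returns (tokens, ar_passes)."""
    tokens: List[int] = []
    for _ in range(NUM_TOKENS):
        tokens.append(step(tokens))
    return tokens, NUM_TOKENS


def decode_blockwise(
    step: StepFn, tail: TailFn, prefix_len: int = 4, refine_iters: int = 1
) -> Tuple[List[int], int, int]:
    """AR prefix, then a parallel tail. Returns (tokens, ar_passes, tail_passes)."""
    if refine_iters < 1:
        raise ValueError("at least one tail pass is needed")
    prefix: List[int] = []
    for _ in range(prefix_len):
        prefix.append(step(prefix))
    tail_tokens: Optional[List[int]] = None
    for _ in range(refine_iters):
        # Each pass predicts all remaining tokens at once (assumed refinement loop).
        tail_tokens = tail(prefix, NUM_TOKENS - prefix_len, tail_tokens)
    return prefix + tail_tokens, prefix_len, refine_iters


# Toy stand-ins: token i is simply the integer i.
toy_step = lambda toks: len(toks) + 1
toy_tail = lambda prefix, n, prev: [len(prefix) + i + 1 for i in range(n)]

print(decode_ar(toy_step))
print(decode_blockwise(toy_step, toy_tail))
```

With P=4 and r=1 the sequential depth drops from 8 passes to 5, which bounds the achievable decoder speedup well below 2× even before the cost of the tail pass is considered.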

## Experiment protocol

1. Download Mirageinv/oat policy + tokenizer.
2. Train `ParallelTailDecoder` on `libero10_N500` with frozen policy (15 epochs, bs=64, lr=1e-4).
3. Paired sim-eval: `50` episodes/task × `3` seeds (`test_start_seed=1000`).
4. Benchmarks: dataset / training / policy verification + wall-clock speed.

Cluster launcher: `scripts/cluster/run_blockwise_original_oat_baseline.sh` (`PHASE=B NUM_EXP=3`).
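
For a sense of scale, the paired protocol amounts to 10 tasks × 50 episodes × 3 seeds per decoding mode. The sketch below computes that count and shows one plausible way to aggregate per-seed results. It assumes the reported ± values are a spread across the 3 seeds, which this card does not state, and the per-seed numbers in the demo are made-up placeholders, not results from this repo.

```python
import statistics
from typing import List, Tuple

NUM_TASKS = 10          # LIBERO-10 contains 10 tasks
EPISODES_PER_TASK = 50
NUM_SEEDS = 3

episodes_per_mode = NUM_TASKS * EPISODES_PER_TASK * NUM_SEEDS
print(f"episodes per decoding mode: {episodes_per_mode}")


def mean_and_std(per_seed_sr: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-seed success rates (percent)."""
    return statistics.fmean(per_seed_sr), statistics.stdev(per_seed_sr)


def paired_delta(bw_per_seed: List[float], ar_per_seed: List[float]) -> Tuple[float, List[float]]:
    """Mean of per-seed (BW - AR) differences, plus the differences themselves."""
    diffs = [bw - ar for bw, ar in zip(bw_per_seed, ar_per_seed)]
    return statistics.fmean(diffs), diffs


# Placeholder values for illustration only (NOT measured results).
demo_ar = [60.0, 62.0, 61.0]
demo_bw = [55.0, 54.0, 56.0]
print(mean_and_std(demo_ar), paired_delta(demo_bw, demo_ar))
```

Pairing by seed keeps the initial simulator states matched between the two modes, so the per-seed differences isolate the effect of the decoder.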

## Visualizations

| Figure | Description |
|--------|-------------|
| ![AR eval](benchmarks/ar_sim_eval_dashboard.png) | AR per-task SR |
| ![Blockwise eval](benchmarks/blockwise_sim_eval_dashboard.png) | Blockwise per-task SR |
| ![Paired](benchmarks/paired_sr_comparison_dashboard.png) | Side-by-side per-task comparison |
| ![Speed](benchmarks/speed_benchmark_dashboard.png) | Decoder + E2E latency |
| ![Tail train](benchmarks/tail_training_dashboard.png) | Tail CE loss curve |
| ![Verify](benchmarks/verification_summary_dashboard.png) | Verification kit |

## Repository layout

```
checkpoints/original_oat_tail_p4_r1.pt   # trained tail decoder
eval/ar_eval_log.json                    # AR sim metrics
eval/blockwise_eval_log.json             # Blockwise sim metrics
benchmarks/*.json                        # verification + speed raw logs
benchmarks/*_dashboard.png               # plots above
```

## Reproduce inference

```bash
python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/ar \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt

python scripts/eval_policy_sim.py \
  -c output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt \
  -o output/eval/blockwise/bw \
  --use-blockwise --blockwise-prefix-len 4 --blockwise-refine-iters 1 \
  --blockwise-tail-checkpoint checkpoints/original_oat_tail_p4_r1.pt \
  --tokenizer-checkpoint output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt
```
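
The two commands differ only in the output directory and the blockwise flags. A small helper can build both argument lists from one place so the paired runs cannot drift apart; `eval_argv` is a hypothetical wrapper that uses only the paths and flags shown above.

```python
import shlex
from typing import List

POLICY = "output/baselines/original_oat/hf/policy_ep-0250_sr-0.596.ckpt"
TOKENIZER = "output/baselines/original_oat/hf/tokenizer_ep-0950_mse-0.002.ckpt"
TAIL = "checkpoints/original_oat_tail_p4_r1.pt"


def eval_argv(out_dir: str, blockwise: bool = False,
              prefix_len: int = 4, refine_iters: int = 1) -> List[str]:
    """Build the argv for scripts/eval_policy_sim.py in AR or Blockwise mode."""
    argv = ["python", "scripts/eval_policy_sim.py", "-c", POLICY, "-o", out_dir]
    if blockwise:
        argv += [
            "--use-blockwise",
            "--blockwise-prefix-len", str(prefix_len),
            "--blockwise-refine-iters", str(refine_iters),
            "--blockwise-tail-checkpoint", TAIL,
        ]
    argv += ["--tokenizer-checkpoint", TOKENIZER]
    return argv


ar_cmd = eval_argv("output/eval/blockwise/ar")
bw_cmd = eval_argv("output/eval/blockwise/bw", blockwise=True)
for cmd in (ar_cmd, bw_cmd):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.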

## Citation

```bibtex
@misc{liu2026oatorderedactiontokenization,
      title={OAT: Ordered Action Tokenization},
      author={Chaoqi Liu and Xiaoshen Han and Jiawei Gao and Yue Zhao and Haonan Chen and Yilun Du},
      year={2026},
      eprint={2602.04215},
      archivePrefix={arXiv},
      primaryClass={cs.RO}}
```

## Phase 2 (next)

Phase 1 strict baseline is **complete** on branch `Blockwise-OAT`.

1. Resume tail training from `original_oat_tail_p4_r1.pt` (target 30+ epochs).
2. Re-run the paired AR vs Blockwise evaluation on LIBERO-10 to confirm the Phase 1 result.
3. Re-run speed / verification benchmarks; publish Phase 2 bundle to HF.