Pi 0.5 — LIBERO-Goal (expert-only)

Pi 0.5 with the action expert fine-tuned on LIBERO-Goal while keeping the VLM backbone (PaliGemma 2 LLM) frozen at the base Pi 0.5 weights. Released to enable reproduction of the LIBERO-Para benchmark (paraphrase-robustness evaluation of VLA models).

This is the Pi05_expert model in the paper's 7-model comparison.

What "expert-only" means here. Pi 0.5 is a two-stack VLA: a VLM (PaliGemma 2 LLM + image encoder) for language/vision grounding, plus a flow-matching action expert that produces low-level actions. In this checkpoint only the action expert was fine-tuned on LIBERO-Goal demonstrations; the VLM stays frozen at the original Pi 0.5 base weights. This isolates action-side adaptation from any change to the language/vision representation, and it is the variant used in the paper to probe paraphrase robustness.
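The split can be pictured as a partition of the parameter tree by module path. The sketch below is illustrative only: the path prefixes (`paligemma/`, `action_expert/`), the toy sizes, and the helper name `split_trainable` are hypothetical and need not match openpi's actual parameter names or its freezing mechanism.

```python
from typing import Dict, Tuple

# Hypothetical prefix marking action-expert parameters; real names may differ.
TRAINABLE_PREFIX = "action_expert/"


def split_trainable(params: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Partition a flat {path: num_elements} map into (trainable, frozen)."""
    trainable = {k: v for k, v in params.items() if k.startswith(TRAINABLE_PREFIX)}
    frozen = {k: v for k, v in params.items() if not k.startswith(TRAINABLE_PREFIX)}
    return trainable, frozen


# Toy parameter map with made-up paths and element counts.
toy_params = {
    "paligemma/img/encoder": 400,
    "paligemma/llm/layers": 2000,
    "action_expert/layers": 300,
    "action_expert/action_out_proj": 10,
}

trainable, frozen = split_trainable(toy_params)

# Share of parameters that would receive gradient updates in this toy setup.
trainable_fraction = sum(trainable.values()) / sum(toy_params.values())
```

In an expert-only run, only the entries in `trainable` would be updated by the optimizer, while everything in `frozen` keeps its base values.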


Base	Pi 0.5 (Physical Intelligence)
Architecture	PaliGemma 2 VLM + flow-matching action expert
Fine-tuned modules	action expert only (VLM kept frozen at base Pi 0.5 weights)
Fine-tune data	LIBERO-Goal demonstrations
Batch size	256
Steps	30 000
Format	Orbax sharded (`params/` for inference, `train_state/` for resume)
Total size	~14.6 GB (`params` 5.8 GB + `train_state` 8.8 GB)
Eval benchmark	LIBERO-Para (1 base eval + 4 092 paraphrases × 5 seeds)

Companion codebase

Benchmark / eval / metrics: https://github.com/cau-hai-lab/LIBERO-Para
Official code for "LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models" (arXiv 2603.28301). Contains:
- 4 092 paraphrased LIBERO-Goal instructions across 43 cells (Object × Action types)
- PRIDE metric implementation (S_K, S_T, PD)
- Per-model eval scripts for the 7 VLAs reported in the paper
- Cell-level SR analysis & cross-model trajectory analysis
VLA training code: Physical-Intelligence/openpi (use its LIBERO config + this checkpoint).

Directory layout

.
├── README.md
├── _CHECKPOINT_METADATA
├── params/             # 5.8 GB — model weights (download this for inference)
│   ├── manifest.ocdbt
│   ├── _METADATA
│   ├── _sharding
│   ├── array_metadatas/
│   ├── d/
│   └── ocdbt.process_0/
└── train_state/        # 8.8 GB — optimizer + EMA state (download only to resume training)
    └── ... (same Orbax layout)
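After downloading, it can be useful to confirm that the expected entries are present before loading the checkpoint. The following is a minimal sketch using only the standard library; the helper name `missing_entries` is hypothetical, and the entry list is copied from the layout above rather than from any Orbax specification.

```python
from pathlib import Path
from typing import List

# Entries listed under params/ in the directory layout of this card.
EXPECTED_PARAMS_ENTRIES = [
    "manifest.ocdbt",
    "_METADATA",
    "_sharding",
    "array_metadatas",
    "d",
    "ocdbt.process_0",
]


def missing_entries(ckpt_dir: str, need_train_state: bool = False) -> List[str]:
    """Return the relative paths from the layout that are absent under ckpt_dir."""
    root = Path(ckpt_dir)
    expected = ["_CHECKPOINT_METADATA"]
    expected += [f"params/{name}" for name in EXPECTED_PARAMS_ENTRIES]
    if need_train_state:
        # Only needed when resuming training.
        expected.append("train_state")
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty return value means every listed entry was found; it does not verify that the files themselves are complete or uncorrupted.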

Quick start — inference

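pip install huggingface_hub
# openpi is required for inference. It is assumed here to be installed from source,
# following the setup instructions in the Physical-Intelligence/openpi repository.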

from huggingface_hub import snapshot_download
from openpi.training import config as _config
from openpi.policies import policy_config

# 1. Download checkpoint (params + metadata only — ~5.8 GB)
ckpt_dir = snapshot_download(
    repo_id="HAI-Lab/pi05-libero_goal-expert_only",
    allow_patterns=["params/**", "_CHECKPOINT_METADATA"],
)

# 2. Build policy with the LIBERO-Goal config (defined in openpi)
config = _config.get_config("pi05_libero")          # or your own config
policy = policy_config.create_trained_policy(config, ckpt_dir)

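# 3. Inference
# The observation keys must match the config's input transform. The keys below are the
# ones openpi's LIBERO example is believed to use; verify them against your openpi version.
# obs = {
#     "observation/image": ...,        # third-person camera frame
#     "observation/wrist_image": ...,  # wrist camera frame
#     "observation/state": ...,        # proprioceptive state
#     "prompt": "open the middle drawer",
# }
# actions = policy.infer(obs)["actions"]  # chunk of predicted actions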

Quick start — LIBERO-Para paraphrase-robustness eval

git clone https://github.com/cau-hai-lab/LIBERO-Para
cd LIBERO-Para

# Follow the per-model eval guide (Pi 0.5 section):
#   eval_guides/pi05.md
# pointing it at this checkpoint:
#   /path/to/HAI-Lab/pi05-libero_goal-expert_only

The eval scripts run the 4 092-paraphrase × 5-seed × 10-task sweep, compute per-cell success rates, and report PRIDE / S_K / S_T scores, reproducing the Pi05_expert row of Table X in the paper.
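For intuition, per-cell success rates averaged over seeds can be computed as in the sketch below. This is not the LIBERO-Para implementation: the record format `(cell, seed, success)`, the cell names, and the function name `per_cell_success_rate` are assumptions made for illustration, and the PRIDE, S_K, and S_T scores are not covered here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Assumed record format: (cell id, seed, whether the episode succeeded).
Episode = Tuple[str, int, bool]


def per_cell_success_rate(episodes: Iterable[Episode]) -> Dict[str, float]:
    """For each cell, average the per-seed success rates across seeds."""
    outcomes_by_cell_seed = defaultdict(list)
    for cell, seed, success in episodes:
        outcomes_by_cell_seed[(cell, seed)].append(1.0 if success else 0.0)

    seed_rates_by_cell = defaultdict(list)
    for (cell, _seed), outcomes in outcomes_by_cell_seed.items():
        seed_rates_by_cell[cell].append(mean(outcomes))

    return {cell: mean(rates) for cell, rates in seed_rates_by_cell.items()}


# Toy records with made-up cell names.
toy_episodes = [
    ("cell_a", 0, True),
    ("cell_a", 0, False),
    ("cell_a", 1, True),
    ("cell_a", 1, True),
    ("cell_b", 0, False),
    ("cell_b", 1, True),
]
rates = per_cell_success_rate(toy_episodes)
```

Averaging within each seed first, then across seeds, keeps a seed with more episodes from dominating the cell-level number.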

Reproducing the `Pi05_expert` paper numbers

metric	value
Overall SR (canonical LIBERO-Goal, mean across 5 seeds)	(filled in by eval scripts)
LIBERO-Para SR (4 092 paraphrases, mean across 5 seeds)	(see paper Table X)
PRIDE (α = 0.5)	(see paper Table X)

Run LIBERO-Para's scripts/run_eval_pi05_expert.sh (or equivalent) to regenerate.

Resuming training

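from huggingface_hub import snapshot_download

# Omitting allow_patterns downloads the full repo, including train_state/ (~14.6 GB in total)
ckpt_dir = snapshot_download(
    repo_id="HAI-Lab/pi05-libero_goal-expert_only",
)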

Then point your openpi training config's resume_from at ckpt_dir.
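The inference and resume downloads in this card differ only in the `allow_patterns` filter passed to `snapshot_download`. A small helper can make that choice explicit; the function name `allow_patterns_for` and the purpose labels are hypothetical, chosen for this sketch.

```python
from typing import List, Optional


def allow_patterns_for(purpose: str) -> Optional[List[str]]:
    """Pick the snapshot_download filter for a given use of this checkpoint."""
    if purpose == "inference":
        # Weights and metadata only (~5.8 GB).
        return ["params/**", "_CHECKPOINT_METADATA"]
    if purpose == "resume":
        # None disables filtering, so the full repo (with train_state/) is fetched.
        return None
    raise ValueError(f"unknown purpose: {purpose!r}")


# Usage (requires huggingface_hub and network access):
# from huggingface_hub import snapshot_download
# ckpt_dir = snapshot_download(
#     repo_id="HAI-Lab/pi05-libero_goal-expert_only",
#     allow_patterns=allow_patterns_for("inference"),
# )
```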

Citation

If you use this checkpoint, please cite both the LIBERO-Para paper and the original Pi 0.5 release:

@misc{kim2026liberoparadiagnosticbenchmarkmetrics,
      title={LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models},
      author={Chanyoung Kim and Minwoo Kim and Minseok Kang and Hyunwoo Kim and Dahuin Jung},
      year={2026},
      eprint={2603.28301},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.28301},
}

@misc{pi05_2025,
  title  = {{$\pi_{0.5}$}: A Vision-Language-Action Model with Open-World Generalization},
  author = {{Physical Intelligence}},
  year   = {2025},
  url    = {https://www.physicalintelligence.company/blog/pi05},
}

License

Apache 2.0 — same as the base Pi 0.5 release. The LIBERO-Para benchmark and its evaluation code are MIT-licensed. By using this checkpoint you also agree to Physical Intelligence's terms for the Pi 0.5 base model.

Acknowledgments

- Physical Intelligence: Pi 0.5 base model
- LIBERO: original benchmark
- openpi: training & inference framework
- LIBERO-Para: paraphrase-robustness benchmark

