Pi 0.5 - LIBERO-Goal (expert-only)
Pi 0.5 with the action expert fine-tuned on LIBERO-Goal while keeping the VLM backbone (PaliGemma 2 LLM) frozen at the base Pi 0.5 weights. Released to enable reproduction of the LIBERO-Para benchmark (paraphrase-robustness evaluation of VLA models).
This is the Pi05_expert model in the paper's 7-model comparison.
What "expert-only" means here. Pi 0.5 is a two-stack VLA: a VLM (PaliGemma 2 LLM + image encoder) for language/vision grounding, plus a flow-matching action expert that produces low-level actions. In this checkpoint only the action expert was fine-tuned on LIBERO-Goal demonstrations; the LLM stays at the original Pi 0.5 base weights. This isolates action-side adaptation from any change to the language/vision representation, which is the variant we use in the paper when probing paraphrase robustness.
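One way to picture the expert-only setup is as a boolean mask over the parameter tree: leaves under the action expert are trainable, everything else is frozen. The following is a minimal sketch and not the openpi implementation; the nested-dict layout and the names `vlm`, `action_expert`, and `trainable_mask` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict

ParamTree = Dict[str, Any]


def trainable_mask(params: ParamTree, trainable_top_level: str) -> ParamTree:
    """Return a tree of booleans with the same structure as `params`.

    Leaves under `params[trainable_top_level]` are marked True (trainable);
    every other leaf is marked False (frozen).
    """

    def fill(node: Any, value: bool) -> Any:
        # Recurse through nested dicts; anything else is treated as a leaf.
        if isinstance(node, dict):
            return {key: fill(child, value) for key, child in node.items()}
        return value

    return {key: fill(child, key == trainable_top_level) for key, child in params.items()}


# Hypothetical toy parameter tree; real checkpoints use different names.
toy_params = {
    "vlm": {"image_encoder": {"w": [0.1]}, "llm": {"w": [0.2]}},
    "action_expert": {"flow_head": {"w": [0.3]}},
}
mask = trainable_mask(toy_params, "action_expert")
```

A mask of this shape is the usual input to an optimizer wrapper that suppresses updates for the frozen leaves, so only the action-expert weights move during fine-tuning.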
| field | value |
|---|---|
| Base | Pi 0.5 (Physical Intelligence) |
| Architecture | PaliGemma 2 VLM + flow-matching action expert |
| Fine-tuned modules | action expert only (VLM kept frozen at base Pi 0.5 weights) |
| Fine-tune data | LIBERO-Goal demonstrations |
| Batch size | 256 |
| Steps | 30 000 |
| Format | Orbax sharded (params/ for inference, train_state/ for resume) |
| Total size | ~14.6 GB (params 5.8 GB + train_state 8.8 GB) |
| Eval benchmark | LIBERO-Para (1 base eval + 4 092 paraphrases × 5 seeds) |
Companion codebase
- Benchmark / eval / metrics: https://github.com/cau-hai-lab/LIBERO-Para
  Official code for "LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase
  Robustness in VLA Models" (arXiv 2603.28301). Contains:
  - 4 092 paraphrased LIBERO-Goal instructions across 43 cells (Object × Action types)
  - PRIDE metric implementation (S_K, S_T, PD)
  - Per-model eval scripts for the 7 VLAs reported in the paper
  - Cell-level SR analysis & cross-model trajectory analysis
- VLA training code: Physical-Intelligence/openpi (use its LIBERO config + this checkpoint).
Directory layout
.
├── README.md
├── _CHECKPOINT_METADATA
├── params/                 # 5.8 GB: model weights (download this for inference)
│   ├── manifest.ocdbt
│   ├── _METADATA
│   ├── _sharding
│   ├── array_metadatas/
│   ├── d/
│   └── ocdbt.process_0/
└── train_state/            # 8.8 GB: optimizer + EMA state (download only to resume training)
    └── ... (same Orbax layout)
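Before building a policy, it can help to confirm that a download actually contains the entries listed in the layout above. The following is a small sketch under that assumption; the helper name `missing_entries` and the choice of which files count as required are illustrative, not something openpi or Orbax prescribes.

```python
from pathlib import Path
from typing import List, Sequence

# Entries taken from the directory layout above; treating exactly these
# three as "required for inference" is an assumption of this sketch.
REQUIRED_FOR_INFERENCE = (
    "_CHECKPOINT_METADATA",
    "params/manifest.ocdbt",
    "params/_METADATA",
)


def missing_entries(
    ckpt_dir: str, required: Sequence[str] = REQUIRED_FOR_INFERENCE
) -> List[str]:
    """Return the required relative paths that do not exist under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in required if not (root / rel).exists()]
```

An empty list means the inference subset is present; a non-empty list names the entries to re-download.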
Quick start: inference
pip install huggingface_hub openpi # openpi is required for inference
from huggingface_hub import snapshot_download
from openpi.training import config as _config
from openpi.policies import policy_config
# 1. Download checkpoint (params + metadata only, ~5.8 GB)
ckpt_dir = snapshot_download(
    repo_id="HAI-Lab/pi05-libero_goal-expert_only",
    allow_patterns=["params/**", "_CHECKPOINT_METADATA"],
)
# 2. Build policy with the LIBERO-Goal config (defined in openpi)
config = _config.get_config("pi05_libero") # or your own config
policy = policy_config.create_trained_policy(config, ckpt_dir)
# 3. Inference
# obs = {"image": ..., "state": ..., "prompt": "open the middle drawer"}
# action = policy.infer(obs)
Quick start: LIBERO-Para paraphrase-robustness eval
git clone https://github.com/cau-hai-lab/LIBERO-Para
cd LIBERO-Para
# Follow the per-model eval guide (Pi 0.5 section):
# eval_guides/pi05.md
# pointing it at this checkpoint:
# /path/to/HAI-Lab/pi05-libero_goal-expert_only
The repo will run the 4 092 paraphrase × 5 seed × 10 task sweep, compute
per-cell success rates, and report PRIDE / S_K / S_T scores, reproducing
Table X of the paper for the Pi05_expert row.
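The per-cell success rate mentioned above is a grouped mean over episode outcomes. Here is a minimal sketch of that aggregation, assuming each episode is recorded as an (object type, action type, success) triple. That record shape, the function name `per_cell_success_rate`, and the cell labels are hypothetical and are not the LIBERO-Para repo's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str]  # (object_type, action_type); labels are illustrative


def per_cell_success_rate(
    episodes: Iterable[Tuple[str, str, bool]]
) -> Dict[Cell, float]:
    """Group episodes by (object_type, action_type) and average success."""
    totals: Dict[Cell, int] = defaultdict(int)
    wins: Dict[Cell, int] = defaultdict(int)
    for object_type, action_type, success in episodes:
        cell = (object_type, action_type)
        totals[cell] += 1
        wins[cell] += int(success)
    return {cell: wins[cell] / totals[cell] for cell in totals}


# Toy outcomes with placeholder cell labels; these are not paper results.
toy_episodes = [
    ("obj_a", "act_a", True),
    ("obj_a", "act_a", False),
    ("obj_b", "act_a", True),
]
rates = per_cell_success_rate(toy_episodes)
```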
Reproducing the Pi05_expert paper numbers
| metric | value |
|---|---|
| Overall SR (canonical LIBERO-Goal, mean across 5 seeds) | (filled in by eval scripts) |
| LIBERO-Para SR (4 092 paraphrases, mean across 5 seeds) | (see paper Table X) |
| PRIDE (α = 0.5) | (see paper Table X) |
Run LIBERO-Para's scripts/run_eval_pi05_expert.sh (or equivalent) to regenerate these numbers.
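The "mean across 5 seeds" entries in the table are plain averages of per-seed success rates. A short sketch of that summary follows, also reporting the sample standard deviation as a spread. The function name `summarize_seeds` is hypothetical, and the input values are arbitrary placeholders, not results from this checkpoint.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_seeds(success_rates: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed success rates."""
    if not success_rates:
        raise ValueError("need at least one per-seed success rate")
    # stdev requires two or more points; report 0.0 spread for a single seed.
    spread = stdev(success_rates) if len(success_rates) > 1 else 0.0
    return mean(success_rates), spread


# Arbitrary placeholder values for five seeds; NOT measured results.
placeholder_rates = [0.2, 0.4, 0.6, 0.8, 1.0]
avg, spread = summarize_seeds(placeholder_rates)
```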
Resuming training
from huggingface_hub import snapshot_download

# Download the full repo, including train_state/
ckpt_dir = snapshot_download(
    repo_id="HAI-Lab/pi05-libero_goal-expert_only",
)
Then point your openpi training config's resume_from at ckpt_dir.
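The two download modes used in this README differ only in the file filter: inference needs `params/` plus the metadata file, while resuming needs the whole repo. A small sketch that makes the choice explicit is shown here; the helper name `allow_patterns_for` and the mode strings are hypothetical.

```python
from typing import List, Optional


def allow_patterns_for(mode: str) -> Optional[List[str]]:
    """Return the `allow_patterns` value to pass to snapshot_download.

    "inference" -> weights and checkpoint metadata only (~5.8 GB)
    "resume"    -> None, meaning no filter, so train_state/ is included
    """
    if mode == "inference":
        return ["params/**", "_CHECKPOINT_METADATA"]
    if mode == "resume":
        return None
    raise ValueError(f"unknown mode: {mode!r}")
```

The result can be passed straight through, for example `snapshot_download(repo_id="HAI-Lab/pi05-libero_goal-expert_only", allow_patterns=allow_patterns_for("resume"))`.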
Citation
If you use this checkpoint, please cite both the LIBERO-Para paper and the original Pi 0.5 release:
@misc{kim2026liberoparadiagnosticbenchmarkmetrics,
title={LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models},
author={Chanyoung Kim and Minwoo Kim and Minseok Kang and Hyunwoo Kim and Dahuin Jung},
year={2026},
eprint={2603.28301},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2603.28301},
}
@misc{pi05_2025,
title = {{\pi}-0.5: A Vision-Language-Action Model with Open-World Generalization},
author = {{Physical Intelligence}},
year = {2025},
url = {https://www.physicalintelligence.company/blog/pi05},
}
License
Apache 2.0, same as the base Pi 0.5 release. The LIBERO-Para benchmark and its evaluation code are MIT-licensed. By using this checkpoint you also agree to Physical Intelligence's terms for the Pi 0.5 base model.
Acknowledgments
- Physical Intelligence: Pi 0.5 base model
- LIBERO: original benchmark
- openpi: training & inference framework
- LIBERO-Para: paraphrase-robustness benchmark