ActQuant - Pi 0.5 LIBERO - 3 BPW

Quantized Pi 0.5 LIBERO-finetuned checkpoint produced with the two-stage ActQuant quantization recipe.

  • Stage 1: HSIC inter-tensor bit allocation. Per-tensor sensitivity is scored with the Hilbert-Schmidt Independence Criterion (HSIC) against ground-truth actions; a greedy per-layer L² allocator then assigns a quant type to each tensor under a bit budget.
  • Stage 2: Action-Mixed Fisher (AMF) imatrix. A per-element Fisher diagonal is computed under the flow-matching action loss and consumed by llama-quantize for block-level scale optimization.
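
To make the Stage 1 score concrete, here is a minimal sketch of the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2, with RBF kernels. It is not the ActQuant implementation: the choice of kernel, the bandwidths, and what is paired with the actions (here a toy one-dimensional "feature" per sample) are all assumptions made for illustration.

```python
import math
import random

def rbf_gram(xs, sigma):
    """Gram matrix of an RBF kernel over a list of equal-length vectors."""
    n = len(xs)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
            gram[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return gram

def center(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row = [sum(r) / n for r in gram]
    col = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[gram[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def hsic(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    k_centered = center(rbf_gram(xs, sigma_x))
    l_gram = rbf_gram(ys, sigma_y)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

# Toy data (hypothetical): one feature that tracks the action, one that does not.
random.seed(0)
actions = [[random.gauss(0.0, 1.0)] for _ in range(60)]
dependent = [[a[0] + 0.1 * random.gauss(0.0, 1.0)] for a in actions]
independent = [[random.gauss(0.0, 1.0)] for _ in range(60)]

dep_score = hsic(dependent, actions)
indep_score = hsic(independent, actions)
```

A feature that carries information about the action receives a clearly larger score than an unrelated one, which is the property a sensitivity ranking relies on.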

This variant uses Q2_K as the LLM allocator's base type (HSIC sens score), a Q3_K vision tower, and an action expert kept at fp16.
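
As a rough illustration of budgeted allocation from a base type, the sketch below starts every tensor at Q2_K and upgrades the most sensitive ones to Q4_K while the average stays within a bits-per-weight budget. This is a generic greedy rule, not the paper's L² allocator; the tensor names, sizes, and scores are hypothetical. The per-type costs are the llama.cpp k-quant storage costs (Q2_K at 2.625 and Q4_K at 4.5 bits per weight).

```python
# Storage cost of llama.cpp k-quant types, in bits per weight.
BPW = {"Q2_K": 2.625, "Q4_K": 4.5}

def greedy_allocate(tensors, budget_bpw, base="Q2_K", upgrade="Q4_K"):
    """Assign a quant type per tensor under an average-BPW budget.

    tensors: list of (name, n_params, sensitivity) tuples.
    Returns (plan, achieved_bpw), where plan maps name -> quant type.
    """
    total_params = sum(n for _, n, _ in tensors)
    budget_bits = budget_bpw * total_params
    used_bits = BPW[base] * total_params
    plan = {name: base for name, _, _ in tensors}
    # Visit tensors from most to least sensitive; upgrade when it still fits.
    for name, n_params, _ in sorted(tensors, key=lambda t: t[2], reverse=True):
        extra = (BPW[upgrade] - BPW[base]) * n_params
        if used_bits + extra <= budget_bits:
            plan[name] = upgrade
            used_bits += extra
    return plan, used_bits / total_params

# Hypothetical tensors: (name, parameter count, sensitivity score).
example = [
    ("blk.0.attn_q", 4_000_000, 0.9),
    ("blk.0.ffn_up", 16_000_000, 0.2),
    ("blk.1.attn_q", 4_000_000, 0.7),
    ("blk.1.ffn_up", 16_000_000, 0.1),
]
plan, achieved_bpw = greedy_allocate(example, budget_bpw=3.0)
```

In this toy case the two small, high-scoring attention tensors are upgraded and the large feed-forward tensors stay at the base type, because upgrading either of them would exceed the budget.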

Bits-per-weight breakdown

| Section                                                        | Params | BPW   |
|----------------------------------------------------------------|--------|-------|
| Vision (SigLIP, Q3_K)                                          | 449 M  | 3.56  |
| LLM blocks (HSIC-allocated Q2_K base, selective Q4_K upgrades) | 1.98 B | 2.98  |
| Vision + LLM blocks (headline)                                 | 2.43 B | 3.09  |
| Token embedding (Q8_0)                                         | 527 M  | 8.50  |
| Action expert / flow head (fp16)                               | 430 M  | 16.00 |

Full checkpoint footprint: 2.2 GB

The 3 BPW headline number excludes the embedding and action-expert tensors, following standard quantization-paper convention: those tensors are not the target of ActQuant's block-wise allocation.
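
The headline figure is a parameter-weighted average of the vision and LLM rows. The short check below recomputes it from the rounded values in the breakdown, and also estimates the full footprint from all four sections. It ignores GGUF metadata and assumes the listed 2.2 GB is in binary units (GiB); both are assumptions, not statements from the authors.

```python
# (parameter count, bits per weight) taken from the breakdown above.
sections = {
    "vision": (449e6, 3.56),
    "llm_blocks": (1.98e9, 2.98),
    "token_embedding": (527e6, 8.50),
    "action_expert": (430e6, 16.00),
}

def weighted_bpw(names):
    """Parameter-weighted average bits per weight over the named sections."""
    bits = sum(sections[n][0] * sections[n][1] for n in names)
    params = sum(sections[n][0] for n in names)
    return bits / params

# Headline: vision + LLM blocks only.
headline_bpw = weighted_bpw(["vision", "llm_blocks"])

# Footprint estimate: all tensors, converted from bits to bytes to GiB.
total_bytes = sum(p * b for p, b in sections.values()) / 8
total_gib = total_bytes / 2**30
```

With these inputs the headline comes out at roughly 3.09 BPW, and the estimated size is roughly 2.36 × 10^9 bytes, or about 2.2 GiB.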

LIBERO closed-loop results

Aggregate success rate across all four LIBERO suites, 500 trials per suite (2 000 total), evaluated through the C++/GGML runtime via the pi05.so pybind11 binding (same code path that runs at deployment):

| Suite                    | Success rate |
|--------------------------|--------------|
| libero_spatial           | 98.2 %       |
| libero_object            | 98.8 %       |
| libero_goal              | 95.0 %       |
| libero_10 (long horizon) | 87.2 %       |
| Aggregate                | 94.8 %       |

For reference, llama.cpp's blind round-to-nearest quantization at the same Q2_K base scores 94.3 % aggregate, so the two-stage ActQuant pipeline is 0.5 pp higher in aggregate success while using the per-tensor bit allocation described in the paper. Gains over the blind heuristic grow at sub-3-bit budgets.
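
The aggregate and the 0.5 pp gap follow directly from the per-suite rates and the stated 500 trials per suite. The check below recovers the implied success counts by rounding and recomputes both numbers; the counts are derived here, not reported by the authors.

```python
# Per-suite success rates (percent) and the stated trial count per suite.
rates = {
    "libero_spatial": 98.2,
    "libero_object": 98.8,
    "libero_goal": 95.0,
    "libero_10": 87.2,
}
TRIALS_PER_SUITE = 500
BASELINE_AGGREGATE = 94.3  # blind round-to-nearest at the same Q2_K base

# Implied number of successful episodes per suite.
successes = {s: round(r * TRIALS_PER_SUITE / 100) for s, r in rates.items()}

# Aggregate over all 2000 trials, and the gap to the baseline in points.
aggregate = 100 * sum(successes.values()) / (TRIALS_PER_SUITE * len(rates))
delta_pp = aggregate - BASELINE_AGGREGATE
```

Because every suite has the same number of trials, the pooled aggregate equals the plain mean of the four suite rates.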

Files

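| File            | Size   | Purpose                                                              |
|-----------------|--------|----------------------------------------------------------------------|
| pi05.gguf       | 2.2 GB | Merged vision + LLM + action-expert GGUF (the deployable artifact)   |
| tokenizer.model | 4.1 MB | PaliGemma SentencePiece tokenizer                                    |
| norm_stats.json | 1.7 KB | LIBERO action-quantile normalization stats                           |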
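
After downloading, a quick sanity check on pi05.gguf is to read the fixed GGUF header: the 4-byte magic `GGUF`, a little-endian uint32 version, then uint64 tensor and metadata key-value counts. The sketch below assumes GGUF version 2 or later (version 1 used 32-bit counts); the helper name `read_gguf_header` is introduced here for illustration and is not part of the ActQuant tooling.

```python
import struct

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes GGUF v2+ layout: magic(4) | version(u32) | tensors(u64) | kvs(u64),
    all little-endian.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count

# Example call (path is a placeholder):
# version, n_tensors, n_kv = read_gguf_header("/path/to/eval_dir/pi05.gguf")
```

A truncated or corrupted download typically fails the magic check or reports an implausible tensor count.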

Run LIBERO evaluation

huggingface-cli download NU-World-Model-Embodied-AI/ActQuant-Pi05-LIBERO-3bpw \
    --local-dir /path/to/eval_dir

# From the ActQuant build tree (build_openpi/); see the paper repo:
cd /path/to/ActQuant
for suite in libero_spatial libero_object libero_goal libero_10; do
    bash tools/pi0.5/run_libero_eval.sh "$suite" 50 5 8 8000 /path/to/eval_dir
done

The launcher spawns one serve_policy.py per GPU (loading pi05.gguf through the pi05.so binding) and the LIBERO client talks to it over WebSocket.

Reproduce this exact checkpoint from scratch

The complete pipeline is in github.com/arashakb/ActQuant. After building (build_openpi/) and reconstructing the public LIBERO calibration set:

# Stage 0: Export the LIBERO-finetuned bf16 Pi 0.5 to GGUF (vision Q3_K,
# everything else preserved). This produces the merge target.
python tools/pi0.5/export_pi05.py \
    -d /path/to/pi05_libero_finetuned_v044 \
    -o /path/to/pi05_libero_base_gguf \
    --quant_vision q3k

# Stage 0b: Export the standalone bf16 PaliGemma LLM (this is what
# llama-quantize will quantize in Stage 1+2).
python tools/pi0.5/export_pi05_llm.py \
    -d /path/to/pi05_libero_finetuned_v044 \
    -o /path/to/pi05_libero_base_gguf/pali_llm_bf16.gguf

# Stage 2: Compute the AMF Fisher imatrix once
#   (60-episode LIBERO calibration: 10 spatial / 10 object / 10 goal / 30 long;
#    LIBERO is public; reconstruct via get_pi05_calib_data.py).
python tools/fisher-diag/get_pi05_calib_data.py \
    --output-dir /path/to/calib_data_raw
python tools/fisher-diag/compute_fisher_pi05.py \
    --checkpoint /path/to/pi05_libero_finetuned_v044 \
    --calib-dir  /path/to/calib_data_raw \
    --output     /path/to/pi05_libero_base_gguf/fisher_flow_perweight.gguf \
    --num-gpus 8 --batch-size 6

# Stage 1 + 2 + merge: chained by run_hsic_quant_pi05.sh
bash tools/hsic/run_hsic_quant_pi05.sh \
    --base-type Q2_K \
    --max-type  Q4_K \
    --score-key sens \
    --num-gpus  8

Output: pi05_q2k_hsic_sens_v3k.gguf (rename to pi05.gguf for serving).

This is the exact recipe (--base-type Q2_K --score-key sens against a Q3_K-vision base, AMF imatrix, default merge) that produced the checkpoint in this repository.
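
For intuition about how an importance matrix such as the AMF imatrix can steer block-level scale optimization, the sketch below picks a block scale by minimizing an importance-weighted squared error, sum_i f_i (w_i - s q_i)^2, over a small grid of candidate scales. It is a simplified stand-in: the symmetric integer grid, the candidate range, and the function names are assumptions, and llama-quantize's actual k-quant search (sub-block scales and mins) is more involved.

```python
def weighted_error(weights, importance, scale, qmin, qmax):
    """Quantize with a given scale; return (weighted squared error, codes)."""
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    err = sum(f * (w - scale * q) ** 2
              for w, f, q in zip(weights, importance, codes))
    return err, codes

def fit_scale(weights, importance, bits=3, n_candidates=41):
    """Grid-search the block scale that minimizes the weighted error.

    Returns (scale, codes, error). The grid spans 0.5x to 1.5x of the plain
    round-to-nearest scale and includes that scale itself (the midpoint).
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0.0, [0] * len(weights), 0.0
    rtn_scale = amax / qmax
    best = None
    for k in range(n_candidates):
        scale = rtn_scale * (0.5 + k / (n_candidates - 1))
        err, codes = weighted_error(weights, importance, scale, qmin, qmax)
        if best is None or err < best[2]:
            best = (scale, codes, err)
    return best

# Toy block (hypothetical values): importance concentrated on small weights.
block = [0.9, -0.05, 0.1, 0.02, -0.6, 0.3, 0.01, -0.2]
fisher = [0.1, 5.0, 4.0, 6.0, 0.2, 1.0, 6.0, 2.0]

scale, codes, err = fit_scale(block, fisher)
rtn_err, _ = weighted_error(block, fisher, max(abs(w) for w in block) / 3, -4, 3)
```

Since the round-to-nearest scale is one of the candidates, the searched scale never has a larger weighted error than the blind choice; the importance values only change which trade-off between large and small weights wins.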

Citation

@article{actquant2026,
  title  = {ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models},
  author = {Akbari, Arash and others},
  journal= {arXiv preprint arXiv:2605.24011},
  year   = {2026}
}

License

MIT (inherited from the llama.cpp upstream build infrastructure). Pi 0.5 model weights are bound by the upstream OpenPI license and the underlying PaliGemma terms.
