Flax/Orbax Sparse-Shape Checkpoint PoC

This is a benign security proof of concept (PoC) for a resource-amplification issue in Flax/Orbax checkpoint restore. It demonstrates a tiny legacy Orbax/Zarr checkpoint whose metadata, which is fully controlled by whoever authors the checkpoint, restores as a much larger zero-filled float32 tensor.

Files

  • build_poc.py creates the checkpoint artifact.
  • verify_poc.py restores the checkpoint and writes results.json.
  • poc_sparse_shape_checkpoint/checkpoint_0/ is the generated checkpoint artifact.
  • poc_sparse_shape_checkpoint/artifact_manifest.json records file sizes and SHA256 values.
  • modelscan_checkpoint_0.json is ModelScan output.
  • results.json is runtime restore output from the local validation run.

Trigger

The checkpoint uses legacy Orbax/Zarr layout with:

  • _METADATA: store_array_data_equal_to_fill_value=false
  • marker/.zarray: a large declared shape
  • no chunk data file

On restore, current Orbax/TensorStore treats the missing chunk as fill-value data and materializes the full declared array. The staged artifact uses a safe default shape of 8,000,000 float32 elements (8,000,000 × 4 bytes = 32,000,000 bytes), so a sub-kilobyte checkpoint allocates about 32 MB.
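The size of the allocation can be read from the metadata alone, before any restore. The sketch below assumes a Zarr v2 style .zarray JSON document; the EXAMPLE_ZARRAY contents and the declared_nbytes helper are hypothetical illustrations, not the file that build_poc.py actually writes.

```python
import json
import math

# Hypothetical .zarray contents for illustration only; the real file is
# produced by build_poc.py and may differ in chunking and other fields.
EXAMPLE_ZARRAY = """
{
  "zarr_format": 2,
  "shape": [8000000],
  "chunks": [8000000],
  "dtype": "<f4",
  "fill_value": 0.0,
  "order": "C",
  "compressor": null,
  "filters": null
}
"""


def declared_nbytes(zarray_text: str) -> int:
    """Return the bytes a dense restore of this array would occupy."""
    meta = json.loads(zarray_text)
    dtype = meta["dtype"]
    # Only simple numeric dtype strings such as "<f4" or "|u1" are handled:
    # one byte-order character, one kind character, then the item size.
    if not isinstance(dtype, str) or len(dtype) < 3 or dtype[1] not in "biufc":
        raise ValueError(f"unsupported dtype: {dtype!r}")
    itemsize = int(dtype[2:])
    return math.prod(meta["shape"]) * itemsize


print(declared_nbytes(EXAMPLE_ZARRAY))  # 32000000
```

Because the result depends only on the declared shape and dtype, the on-disk size of the checkpoint places no bound on it.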

Reproduction

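The commands below use the Windows virtual-environment layout (.venv/Scripts/); on Linux or macOS, substitute .venv/bin/.

python -m venv .venv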
.venv/Scripts/python -m pip install flax orbax-checkpoint modelscan transformers msgpack psutil numpy
.venv/Scripts/python build_poc.py --shape 8000000
.venv/Scripts/python verify_poc.py
.venv/Scripts/modelscan scan -p poc_sparse_shape_checkpoint/checkpoint_0 -r json --show-skipped -o modelscan_checkpoint_0.json

Expected restore output includes:

  • restored_shape: [8000000]
  • restored_dtype: float32
  • restored_nbytes: 32000000
  • first_value and last_value: 0.0
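These fields can be cross-checked against each other, since restored_nbytes should equal the element count times the float32 item size. The following is a minimal sketch, assuming results.json is a flat JSON object with exactly the key names listed above; the actual layout written by verify_poc.py is not shown here, and check_restore_result is a hypothetical helper.

```python
import math


def check_restore_result(result, itemsize=4):
    """Return a list of inconsistencies; an empty list means the fields agree."""
    problems = []
    expected_nbytes = math.prod(result["restored_shape"]) * itemsize
    if result["restored_nbytes"] != expected_nbytes:
        problems.append(
            f"restored_nbytes is {result['restored_nbytes']}, "
            f"expected {expected_nbytes}"
        )
    if result["restored_dtype"] != "float32":
        problems.append(f"restored_dtype is {result['restored_dtype']}")
    # A zero-filled tensor should be zero at both ends.
    for key in ("first_value", "last_value"):
        if result[key] != 0.0:
            problems.append(f"{key} is {result[key]}, expected 0.0")
    return problems


# Values taken from the expected output listed above.
example = {
    "restored_shape": [8000000],
    "restored_dtype": "float32",
    "restored_nbytes": 32000000,
    "first_value": 0.0,
    "last_value": 0.0,
}
print(check_restore_result(example))  # []
```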

Scanner Output Summary

ModelScan 0.8.8 reports zero issues and skips _CHECKPOINT_METADATA, _METADATA, and marker/.zarray as unsupported files. This is not an arbitrary-code-execution (ACE) scanner bypass; it is evidence, from both the scanner and the runtime, that this Flax/Orbax checkpoint surface is currently outside scanner coverage.

Impact

An attacker-controlled checkpoint can be much smaller on disk than the array materialized during restore. Increasing the declared Zarr shape scales the allocation and can produce a denial of service in services that restore untrusted Flax/Orbax checkpoints.

Limitations:

  • DoS/resource amplification only.
  • No code execution.
  • Demonstrated on legacy Orbax/Zarr checkpoint layout, not default OCDBT.
  • Requires a service or user to restore the checkpoint.

Mitigations

  • Reject or cap restored tensor shapes before TensorStore.read().
  • Reject checkpoints with missing chunk data unless explicitly trusted.
  • Treat _METADATA and .zarray as security-sensitive scanner targets.
  • Enforce a maximum restored byte budget per checkpoint.
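Several of these checks can run on the checkpoint directory before any restore call is made. The sketch below is one possible pre-restore audit, assuming Zarr v2 style .zarray files and assuming that chunk files sit directly beside .zarray in the same directory; audit_checkpoint and max_bytes are hypothetical names, and this is not part of Orbax or TensorStore.

```python
import json
import math
from pathlib import Path


def audit_checkpoint(root, max_bytes):
    """Inspect .zarray metadata under root and return a list of findings."""
    findings = []
    total = 0
    for zarray in sorted(Path(root).rglob(".zarray")):
        meta = json.loads(zarray.read_text())
        shape, dtype = meta.get("shape"), meta.get("dtype")
        if not (isinstance(shape, list)
                and all(isinstance(d, int) and d >= 0 for d in shape)):
            findings.append(f"{zarray}: invalid shape {shape!r}")
            continue
        # Only simple numeric dtype strings such as "<f4" are sized here.
        if not isinstance(dtype, str) or len(dtype) < 3 or dtype[1] not in "biufc":
            findings.append(f"{zarray}: unsupported dtype {dtype!r}")
            continue
        nbytes = math.prod(shape) * int(dtype[2:])
        total += nbytes
        # Assumption: any entry beside .zarray whose name does not start
        # with "." is chunk data (files or nested chunk directories).
        has_chunks = any(
            not p.name.startswith(".") for p in zarray.parent.iterdir()
        )
        if nbytes > 0 and not has_chunks:
            findings.append(
                f"{zarray}: declares {nbytes} bytes but has no chunk data"
            )
    if total > max_bytes:
        findings.append(
            f"declared total {total} bytes exceeds budget {max_bytes}"
        )
    return findings
```

A caller would refuse to restore whenever the returned list is non-empty, for example `audit_checkpoint("poc_sparse_shape_checkpoint/checkpoint_0", max_bytes=16 * 1024 * 1024)`. The budget value is a deployment choice, and a production check would also need to cover the _METADATA file and any non-Zarr layouts.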