T5X MessagePack Checkpoints Redirect TensorStore Weight Reads via Artifact-Carried Symlinks

Summary

This repository contains a benign proof of concept for T5X checkpoint output manipulation through MessagePack checkpoint metadata and TensorStore-backed array storage.

T5X checkpoints are MessagePack files that can contain TensorStore specs for externalized arrays. During restore, T5X reads the MessagePack checkpoint with Flax serialization, converts TensorStore spec dictionaries into ts.Spec objects, and opens those stores to materialize parameters.

The primary PoC uses a portable relative symlink inside the model artifact:

artifact/symlink_checkpoint/target.w -> ../symlink_external_tensorstore/target.w

The embedded MessagePack checkpoint spec refers to target.w. When T5X opens that TensorStore path, the filesystem symlink redirects the read to the attacker-controlled external store inside the same artifact package. The normal model key remains target/w, but the restored value is the payload array rather than the benign/expected in-checkpoint value.

This is not code execution. It is artifact-carried checkpoint output/weight manipulation.
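
The redirect itself is ordinary filesystem behavior and can be illustrated without T5X or TensorStore. The following is a minimal sketch, assuming a POSIX filesystem that allows symlink creation; the function name demo_symlink_redirect and the plain-text "payload" chunk are illustrative stand-ins, not part of the PoC, which stores real zarr data at that location.

```python
import os
import tempfile
from pathlib import Path


def demo_symlink_redirect(root: Path) -> tuple[str, str]:
    """Build a toy layout and return (link target, data read through the link)."""
    checkpoint_dir = root / "symlink_checkpoint"
    external_dir = root / "symlink_external_tensorstore" / "target.w"
    checkpoint_dir.mkdir(parents=True)
    external_dir.mkdir(parents=True)

    # Stand-in for a TensorStore chunk file (hypothetical content).
    (external_dir / "0").write_text("payload")

    # Relative symlink with the same shape as the one carried in the artifact.
    link = checkpoint_dir / "target.w"
    os.symlink("../symlink_external_tensorstore/target.w", link)

    # A reader that only knows the in-checkpoint path is served the external data.
    return os.readlink(link), (link / "0").read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target, data = demo_symlink_redirect(Path(tmp))
        print("symlink_target", target)
        print("read_through_link", data)
```

The reader opens symlink_checkpoint/target.w/0 and never names the external directory; the kernel resolves the link during path lookup.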

Impact

  • A T5X MessagePack checkpoint can restore attacker-controlled TensorStore array values through a symlink-backed store path.
  • Verified through both:
    • t5x.checkpoints.load_t5x_checkpoint()
    • t5x.checkpoints.Checkpointer.restore()
  • The artifact tar preserves the relative symlink and round-trips through tar extraction while still restoring controlled values.
  • ModelScan 0.8.8 reports zero issues and skips both the MessagePack checkpoint and the tar package in this environment.

Affected Versions Tested

  • Python 3.12.3
  • Flax 0.12.7
  • JAX 0.10.0
  • TensorStore 0.1.83
  • MessagePack 1.1.2
  • ModelScan 0.8.8
  • T5X source commit cc342d419ec3e3e9c4bf1df33c82330cafdc1e16

Files

artifacts/followup_msgpack_t5x_tensorstore_traversal.tar
  Primary artifact bundle. This tar preserves the symlink layout needed for the
  portable PoC.

verify_t5x_msgpack_tensorstore_symlink_poc.py
  End-to-end verifier. Extracts the tarball, loads T5X checkpoint code, and
  verifies both load_t5x_checkpoint() and Checkpointer.restore().

scripts/t5x_loader_stub.py
  Small import shim used to load T5X checkpoint code from a local T5X source
  checkout without requiring the full T5X runtime stack.

scripts/build_and_verify_t5x_tensorstore_traversal.py
  Original builder/verifier used to generate the artifact and evidence.

scripts/verify_t5x_checkpointer_restore.py
  Focused verifier for the Checkpointer.restore() path.

evidence/runtime_verify.txt
  Captured output from the original builder/verifier.

evidence/checkpointer_restore_verify.txt
  Captured output proving Checkpointer.restore() follows the artifact symlink.

evidence/tar_extract_symlink_verify.txt
  Captured output showing that tar extraction preserves the symlink and that
  restore still returns the controlled value.

evidence/modelscan_checkpointer_symlink_checkpoint.json
  Captured ModelScan output for the MessagePack checkpoint file.

evidence/modelscan_tar.json
  Captured ModelScan output for the tar package.

evidence/tar_listing.txt
  Tar listing showing the relative symlink entries.

evidence/decoded_msgpack.txt
  Decoded checkpoint structure for review.

evidence/sha256.txt
  Hashes for uploaded artifacts, scripts, and evidence files.

Artifact Layout

Inside artifacts/followup_msgpack_t5x_tensorstore_traversal.tar:

artifact/symlink_checkpoint/checkpoint
artifact/symlink_checkpoint/checkpoint.msgpack
artifact/symlink_checkpoint/target.w -> ../symlink_external_tensorstore/target.w
artifact/symlink_external_tensorstore/target.w/.zarray
artifact/symlink_external_tensorstore/target.w/0

artifact/symlink_checkpointer_checkpoint/checkpoint
artifact/symlink_checkpointer_checkpoint/checkpoint.msgpack
artifact/symlink_checkpointer_checkpoint/target.w -> ../symlink_checkpointer_external_tensorstore/target.w
artifact/symlink_checkpointer_external_tensorstore/target.w/.zarray
artifact/symlink_checkpointer_external_tensorstore/target.w/0

There is also a supporting absolute-path variant and a rejected ../ variant inside the tar. The report should focus on the portable relative-symlink variant. Literal ../ inside TensorStore kvstore.path is rejected by TensorStore and is included only as negative evidence.
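
The symlink entries in the layout above can be listed without extracting the tar. This is a minimal sketch using only the Python standard library; the helper name list_symlink_members is hypothetical and is not one of the scripts shipped in this repository.

```python
import tarfile
from pathlib import Path


def list_symlink_members(tar_path: Path) -> dict[str, str]:
    """Map each symlink member name to its stored link target, without extracting."""
    with tarfile.open(tar_path) as tar:
        return {m.name: m.linkname for m in tar.getmembers() if m.issym()}


if __name__ == "__main__":
    tar_path = Path("artifacts/followup_msgpack_t5x_tensorstore_traversal.tar")
    for name, target in sorted(list_symlink_members(tar_path).items()):
        print(f"{name} -> {target}")
```

Reading member metadata only avoids following or creating any link on the reviewer's machine.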

Reproduction

The verifier needs a local T5X source checkout so it can load the tested t5x.checkpoints implementation. Set T5X_REPO to that checkout.

Example:

git clone https://github.com/google-research/t5x /tmp/t5x
cd /tmp/t5x
git checkout cc342d419ec3e3e9c4bf1df33c82330cafdc1e16

cd /path/to/this/repo
T5X_REPO=/tmp/t5x python verify_t5x_msgpack_tensorstore_symlink_poc.py

The local triage environment used:

T5X_REPO=/workspace/messagepack/repos/t5x \
  /workspace/messagepack/.venv/bin/python verify_t5x_msgpack_tensorstore_symlink_poc.py

Expected output highlights:

python 3.12.3
flax 0.12.7
jax 0.10.0
tensorstore 0.1.83
msgpack 1.1.2

symlink_target ../symlink_external_tensorstore/target.w
load_t5x_checkpoint_restored_target_w [3001, 3002, 3003]
load_t5x_checkpoint_followed_artifact_symlink True

checkpointer_symlink_target ../symlink_checkpointer_external_tensorstore/target.w
checkpointer_restore_restored_target_w [4101, 4102, 4103]
checkpointer_restore_followed_artifact_symlink True

Scanner Behavior

ModelScan 0.8.8 skips the T5X MessagePack checkpoint:

{
  "total_issues": 0,
  "scanned": {"total_scanned": 0},
  "skipped": {
    "total_skipped": 1,
    "skipped_files": [
      {
        "category": "SCAN_NOT_SUPPORTED",
        "source": "checkpoint"
      }
    ]
  }
}

ModelScan also skips the tar package itself:

{
  "total_issues": 0,
  "scanned": {"total_scanned": 0},
  "skipped": {
    "total_skipped": 1,
    "skipped_files": [
      {
        "category": "SCAN_NOT_SUPPORTED",
        "source": "followup_msgpack_t5x_tensorstore_traversal.tar"
      }
    ]
  }
}

This is included as scanner/runtime context. The core issue is the T5X restore path following artifact-carried symlinks when opening TensorStore-backed checkpoint arrays.
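
A report with zero issues and zero scanned files is not a clean result, and a consumer of the scanner output can tell the two apart. The sketch below assumes the JSON shape shown in the excerpts above (the keys scanned.total_scanned and skipped.total_skipped); the function name nothing_was_scanned is hypothetical and the full ModelScan report may contain additional fields.

```python
import json


def nothing_was_scanned(report_json: str) -> bool:
    """Return True when the report skipped files and scanned none."""
    report = json.loads(report_json)
    scanned = report.get("scanned", {}).get("total_scanned", 0)
    skipped = report.get("skipped", {}).get("total_skipped", 0)
    return scanned == 0 and skipped > 0


if __name__ == "__main__":
    with open("evidence/modelscan_tar.json", encoding="utf-8") as fh:
        print("nothing_was_scanned", nothing_was_scanned(fh.read()))
```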

Root Cause

T5X checkpoint restore reads a MessagePack checkpoint and converts dictionary leaves with TensorStore spec fields into ts.Spec objects. Those specs are then opened to read checkpoint arrays. For file-backed TensorStore paths, normal filesystem symlink resolution applies.

The primary PoC keeps the embedded TensorStore path simple:

{"kvstore": {"driver": "file", "path": "target.w"}}

The artifact supplies target.w as a symlink to a sibling TensorStore directory inside the same package. T5X therefore restores the normal key target/w, but the bytes come from the symlink target controlled by the artifact.
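
One way to surface this condition before restore is to look for symlinks that resolve outside the checkpoint directory. The following is a minimal sketch of such a check, assuming Python 3.9 or later; find_escaping_symlinks is a hypothetical helper and is not part of T5X or TensorStore.

```python
import os
from pathlib import Path


def find_escaping_symlinks(checkpoint_dir: Path) -> list[tuple[Path, Path]]:
    """Return (link, resolved target) pairs that resolve outside checkpoint_dir."""
    base = checkpoint_dir.resolve()
    escaping = []
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(base, followlinks=False):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                resolved = path.resolve()
                if not resolved.is_relative_to(base):
                    escaping.append((path, resolved))
    return escaping


if __name__ == "__main__":
    for link, resolved in find_escaping_symlinks(Path("artifact/symlink_checkpoint")):
        print(f"{link} escapes to {resolved}")
```

Run against an extracted copy of the primary artifact, this would be expected to flag target.w, since its target lives in the sibling symlink_external_tensorstore directory.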

Hashes

09ab5be3e9083f0ac7ecd07b8a4ab8ce29603161ff1f2d90bd0a042ddcf5fca9  artifacts/followup_msgpack_t5x_tensorstore_traversal.tar
e20e75f7ee091c91d1ddbfa08acf4bd6fcf2d7c63e3e1cd98549f0f89e8d548b  verify_t5x_msgpack_tensorstore_symlink_poc.py
a140843007bf1b16421b8d914f0931227a3e0a646ea9d645be86e04a0748660c  scripts/t5x_loader_stub.py
367445b079a1c99c07fbce6cfc0fad2c08e2fdd7cf005ec2b4d2afea298d4303  scripts/build_and_verify_t5x_tensorstore_traversal.py
f6db82ef535c4f6278661db18cda636dde1e8a482f2d1fc7072700725d53023c  scripts/verify_t5x_checkpointer_restore.py

Key checkpoint hashes inside the tar:

8ef1678772e4179087b2ea68691b3cb1618f731d5b3149ef37dd016c35f3bea4  artifact/symlink_checkpoint/checkpoint
4b1c58071e39c6dad762acb330f088843945a8adad42f9aad23a7a852c39ae86  artifact/symlink_checkpointer_checkpoint/checkpoint
fd95d819506a0154d19507d0796f931918f9f52a2022aaf1ed7bb46458809434  artifact/absolute_spec_checkpoint/checkpoint
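
The hashes in this section can be checked with a short script. This is a sketch assuming the manifest uses the "hash, whitespace, relative path" line format shown above; the helper names sha256_of and verify_manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large artifacts do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_text: str, root: Path) -> dict[str, bool]:
    """Map each listed relative path to whether its hash matches the manifest."""
    results = {}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        results[rel_path] = sha256_of(root / rel_path) == expected.lower()
    return results


if __name__ == "__main__":
    manifest = Path("evidence/sha256.txt").read_text(encoding="utf-8")
    for rel_path, ok in verify_manifest(manifest, Path(".")).items():
        print("OK  " if ok else "FAIL", rel_path)
```

The entries under "Key checkpoint hashes inside the tar" refer to paths that exist only after extraction, so root would need to point at the extraction directory for those.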

Safety Notes

  • No code execution payload is included.
  • No shell, network, credential access, persistence, or destructive operation is used.
  • The PoC only demonstrates deterministic checkpoint value manipulation using tiny integer arrays.

Limitations

  • Severity is medium: checkpoint output/weight manipulation, not arbitrary code execution.
  • Delivery must preserve symlinks. The uploaded primary artifact is a tarball specifically because tar preserves the relative symlink layout.
  • Literal ../ inside TensorStore kvstore.path was tested and rejected by TensorStore, so the submittable path is the symlink-backed artifact.
  • ModelScan unsupported-format behavior is not the vulnerability by itself.

Duplicate Check Notes

Targeted searches were performed for T5X checkpoint TensorStore path traversal, T5X MessagePack checkpoint symlink handling, TensorStore checkpoint symlink restore, and public T5X advisories. No matching public advisory was found during local triage.
