T5X MessagePack Checkpoints Redirect TensorStore Weight Reads via Artifact-Carried Symlinks
Summary
This repository contains a benign proof of concept for T5X checkpoint output manipulation through MessagePack checkpoint metadata and TensorStore-backed array storage.
T5X checkpoints are MessagePack files that can contain TensorStore specs for
externalized arrays. During restore, T5X reads the MessagePack checkpoint with
Flax serialization, converts TensorStore spec dictionaries into ts.Spec
objects, and opens those stores to materialize parameters.
The primary PoC uses a portable relative symlink inside the model artifact:
artifact/symlink_checkpoint/target.w -> ../symlink_external_tensorstore/target.w
The embedded MessagePack checkpoint spec refers to target.w. When T5X opens
that TensorStore path, the filesystem symlink redirects the read to the
attacker-controlled external store inside the same artifact package. The normal
model key remains target/w, but the restored value is the payload array rather
than the expected in-checkpoint value.
This is not code execution. It is artifact-carried checkpoint output/weight manipulation.
Impact
- A T5X MessagePack checkpoint can restore attacker-controlled TensorStore array values through a symlink-backed store path.
- Verified through both:
  - t5x.checkpoints.load_t5x_checkpoint()
  - t5x.checkpoints.Checkpointer.restore()
- The artifact tar preserves the relative symlink. After a tar extraction round trip, restore still returns the attacker-controlled values.
- ModelScan 0.8.8 reports zero issues and skips both the MessagePack checkpoint and the tar package in this environment.
Affected Versions Tested
- Python 3.12.3
- Flax 0.12.7
- JAX 0.10.0
- TensorStore 0.1.83
- MessagePack 1.1.2
- ModelScan 0.8.8
- T5X source commit cc342d419ec3e3e9c4bf1df33c82330cafdc1e16
Files
artifacts/followup_msgpack_t5x_tensorstore_traversal.tar
Primary artifact bundle. This tar preserves the symlink layout needed for the
portable PoC.
verify_t5x_msgpack_tensorstore_symlink_poc.py
End-to-end verifier. Extracts the tarball, loads T5X checkpoint code, and
verifies both load_t5x_checkpoint() and Checkpointer.restore().
scripts/t5x_loader_stub.py
Small import shim used to load T5X checkpoint code from a local T5X source
checkout without requiring the full T5X runtime stack.
scripts/build_and_verify_t5x_tensorstore_traversal.py
Original builder/verifier used to generate the artifact and evidence.
scripts/verify_t5x_checkpointer_restore.py
Focused verifier for the Checkpointer.restore() path.
evidence/runtime_verify.txt
Captured output from the original builder/verifier.
evidence/checkpointer_restore_verify.txt
Captured output proving Checkpointer.restore() follows the artifact symlink.
evidence/tar_extract_symlink_verify.txt
Captured output proving that tar extraction preserves the symlink and that the
extracted checkpoint still restores the controlled value.
evidence/modelscan_checkpointer_symlink_checkpoint.json
Captured ModelScan output for the MessagePack checkpoint file.
evidence/modelscan_tar.json
Captured ModelScan output for the tar package.
evidence/tar_listing.txt
Tar listing showing the relative symlink entries.
evidence/decoded_msgpack.txt
Decoded checkpoint structure for review.
evidence/sha256.txt
Hashes for uploaded artifacts, scripts, and evidence files.
Artifact Layout
Inside artifacts/followup_msgpack_t5x_tensorstore_traversal.tar:
artifact/symlink_checkpoint/checkpoint
artifact/symlink_checkpoint/checkpoint.msgpack
artifact/symlink_checkpoint/target.w -> ../symlink_external_tensorstore/target.w
artifact/symlink_external_tensorstore/target.w/.zarray
artifact/symlink_external_tensorstore/target.w/0
artifact/symlink_checkpointer_checkpoint/checkpoint
artifact/symlink_checkpointer_checkpoint/checkpoint.msgpack
artifact/symlink_checkpointer_checkpoint/target.w -> ../symlink_checkpointer_external_tensorstore/target.w
artifact/symlink_checkpointer_external_tensorstore/target.w/.zarray
artifact/symlink_checkpointer_external_tensorstore/target.w/0
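The symlink entries in this layout can be inspected without extracting anything. The sketch below uses only Python's standard tarfile module. The helper name symlink_members is hypothetical, and it resolves each link target lexically relative to the member's own directory, which is how a relative symlink resolves after extraction (chained links are not followed). The in-memory tar mirrors just one entry from the listing above.

```python
import io
import posixpath
import tarfile


def symlink_members(tf: tarfile.TarFile) -> list[tuple[str, str, str]]:
    """Return (member, link target, lexically resolved target) for symlink members.

    Hypothetical helper: resolution is purely lexical and relative to the
    member's directory; chained symlinks are not followed.
    """
    rows = []
    for member in tf.getmembers():
        if member.issym():
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(member.name), member.linkname)
            )
            rows.append((member.name, member.linkname, resolved))
    return rows


# Build a tiny in-memory tar that mirrors one symlink entry from the listing.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as out:
    info = tarfile.TarInfo("artifact/symlink_checkpoint/target.w")
    info.type = tarfile.SYMTYPE
    info.linkname = "../symlink_external_tensorstore/target.w"
    out.addfile(info)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tf:
    for name, link, resolved in symlink_members(tf):
        print(f"{name} -> {link} (resolves to {resolved})")
```

Against the real bundle, the in-memory buffer would be replaced with tarfile.open("artifacts/followup_msgpack_t5x_tensorstore_traversal.tar"). Listing members this way reads headers only and never writes links to disk.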
The tar also contains a supporting absolute-path variant and a rejected ../
variant. This report focuses on the portable relative-symlink variant. A
literal ../ inside the TensorStore kvstore.path is rejected by TensorStore and
is included only as negative evidence.
Reproduction
The verifier needs a local T5X source checkout so it can load the tested
t5x.checkpoints implementation. Set T5X_REPO to that checkout.
Example:
git clone https://github.com/google-research/t5x /tmp/t5x
cd /tmp/t5x
git checkout cc342d419ec3e3e9c4bf1df33c82330cafdc1e16
cd /path/to/this/repo
T5X_REPO=/tmp/t5x python verify_t5x_msgpack_tensorstore_symlink_poc.py
The local triage environment used:
T5X_REPO=/workspace/messagepack/repos/t5x \
/workspace/messagepack/.venv/bin/python verify_t5x_msgpack_tensorstore_symlink_poc.py
Expected output highlights:
python 3.12.3
flax 0.12.7
jax 0.10.0
tensorstore 0.1.83
msgpack 1.1.2
symlink_target ../symlink_external_tensorstore/target.w
load_t5x_checkpoint_restored_target_w [3001, 3002, 3003]
load_t5x_checkpoint_followed_artifact_symlink True
checkpointer_symlink_target ../symlink_checkpointer_external_tensorstore/target.w
checkpointer_restore_restored_target_w [4101, 4102, 4103]
checkpointer_restore_followed_artifact_symlink True
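When the verifier is re-run in an automated triage job, the highlights above can be checked mechanically. This is a minimal sketch that assumes the verifier prints one "key value" pair per line, as in the excerpt. parse_verifier_output and all_symlink_flags_true are hypothetical helpers, not part of the PoC scripts.

```python
import ast

SAMPLE = """\
tensorstore 0.1.83
symlink_target ../symlink_external_tensorstore/target.w
load_t5x_checkpoint_restored_target_w [3001, 3002, 3003]
load_t5x_checkpoint_followed_artifact_symlink True
checkpointer_restore_restored_target_w [4101, 4102, 4103]
checkpointer_restore_followed_artifact_symlink True
"""


def parse_verifier_output(text: str) -> dict:
    """Split each non-empty line on the first space into key and value.

    Values that are Python literals (lists, booleans) are decoded; anything
    else, such as version strings or paths, is kept as a string.
    """
    parsed = {}
    for line in text.splitlines():
        key, _, raw = line.strip().partition(" ")
        if not key:
            continue
        raw = raw.strip()
        try:
            parsed[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            parsed[key] = raw
    return parsed


def all_symlink_flags_true(parsed: dict) -> bool:
    """True if at least one *_followed_artifact_symlink flag exists and all are True."""
    flags = [v for k, v in parsed.items() if k.endswith("_followed_artifact_symlink")]
    return bool(flags) and all(flag is True for flag in flags)


result = parse_verifier_output(SAMPLE)
print(result["checkpointer_restore_restored_target_w"], all_symlink_flags_true(result))
```

Requiring at least one flag avoids reporting success when the output format changes and no flags are found at all.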
Scanner Behavior
ModelScan 0.8.8 skips the T5X MessagePack checkpoint:
{
"total_issues": 0,
"scanned": {"total_scanned": 0},
"skipped": {
"total_skipped": 1,
"skipped_files": [
{
"category": "SCAN_NOT_SUPPORTED",
"source": "checkpoint"
}
]
}
}
ModelScan also skips the tar package itself:
{
"total_issues": 0,
"scanned": {"total_scanned": 0},
"skipped": {
"total_skipped": 1,
"skipped_files": [
{
"category": "SCAN_NOT_SUPPORTED",
"source": "followup_msgpack_t5x_tensorstore_traversal.tar"
}
]
}
}
This is included as scanner/runtime context. The core issue is the T5X restore path following artifact-carried symlinks when opening TensorStore-backed checkpoint arrays.
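Because "total_issues": 0 in these reports only means that nothing was examined, a pipeline consuming ModelScan JSON may want to treat skipped files as a failure rather than a pass. The sketch below is a hypothetical gate, not a ModelScan feature. It assumes the field names shown in the excerpts above; the full report may nest them differently (for example inside an enclosing object), so the lookups should be adjusted to the actual output.

```python
import json

REPORT = json.loads("""
{
  "total_issues": 0,
  "scanned": {"total_scanned": 0},
  "skipped": {
    "total_skipped": 1,
    "skipped_files": [
      {"category": "SCAN_NOT_SUPPORTED", "source": "checkpoint"}
    ]
  }
}
""")


def gate_problems(report: dict) -> list[str]:
    """Return reasons the report should not count as a clean result.

    Field names follow the excerpt above; adjust if the real report nests them.
    """
    problems = []
    issues = report.get("total_issues", 0)
    if issues:
        problems.append(f"{issues} issue(s) reported")
    if report.get("scanned", {}).get("total_scanned", 0) == 0:
        problems.append("no files were scanned")
    for item in report.get("skipped", {}).get("skipped_files", []):
        problems.append(f"skipped {item.get('source')}: {item.get('category')}")
    return problems


problems = gate_problems(REPORT)
print("PASS" if not problems else "FAIL: " + "; ".join(problems))
```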
Root Cause
T5X checkpoint restore reads a MessagePack checkpoint and converts dictionary
leaves with TensorStore spec fields into ts.Spec objects. Those specs are then
opened to read checkpoint arrays. For file-backed TensorStore paths, normal
filesystem symlink resolution applies.
The primary PoC keeps the embedded TensorStore path simple:
{"kvstore": {"driver": "file", "path": "target.w"}}
The artifact supplies target.w as a symlink to a sibling TensorStore directory
inside the same package. T5X therefore restores the normal key target/w, but
the bytes come from the symlink target controlled by the artifact.
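The mechanism can be reproduced, and flagged, with the standard library alone. This sketch recreates the relevant part of the layout in a temporary directory, then walks a nested dictionary shaped like the spec above and checks each file-driver path before anything opens it. The spec-detection rule (a dict whose kvstore entry has driver "file"), the assumption that relative paths are joined onto the checkpoint directory, and the helper names are illustrative assumptions, not T5X's actual restore code. A real spec also carries more fields than the one-line example above.

```python
import os
import tempfile
from pathlib import Path


def iter_file_kvstore_paths(tree, prefix=""):
    """Yield (key, path) for leaves that look like file-backed TensorStore specs.

    Hypothetical heuristic: a dict whose "kvstore" entry is a dict with driver "file".
    """
    if not isinstance(tree, dict):
        return
    kvstore = tree.get("kvstore")
    if isinstance(kvstore, dict) and kvstore.get("driver") == "file":
        yield prefix, kvstore.get("path", "")
        return
    for key, value in tree.items():
        yield from iter_file_kvstore_paths(value, f"{prefix}/{key}" if prefix else str(key))


def check_store_paths(tree, checkpoint_dir):
    """Flag store paths that are symlinks or resolve outside the checkpoint directory."""
    root = Path(checkpoint_dir).resolve()
    findings = []
    for key, rel in iter_file_kvstore_paths(tree):
        candidate = Path(checkpoint_dir) / rel  # assumption: relative to checkpoint dir
        if candidate.is_symlink():
            findings.append((key, "store path is a symlink"))
        if not candidate.resolve().is_relative_to(root):
            findings.append((key, "store path resolves outside the checkpoint directory"))
    return findings


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "artifact"
        ckpt = artifact / "symlink_checkpoint"
        external = artifact / "symlink_external_tensorstore" / "target.w"
        ckpt.mkdir(parents=True)
        external.mkdir(parents=True)
        # Same relative link as the PoC: checkpoint/target.w -> sibling store.
        os.symlink("../symlink_external_tensorstore/target.w", ckpt / "target.w")
        tree = {"target": {"w": {"kvstore": {"driver": "file", "path": "target.w"}}}}
        return check_store_paths(tree, ckpt)


for key, reason in demo():
    print(key, reason)
```

Rejecting any symlinked store path is stricter than the containment check alone; which rule fits depends on whether legitimate checkpoints are ever expected to contain symlinks.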
Hashes
09ab5be3e9083f0ac7ecd07b8a4ab8ce29603161ff1f2d90bd0a042ddcf5fca9 artifacts/followup_msgpack_t5x_tensorstore_traversal.tar
e20e75f7ee091c91d1ddbfa08acf4bd6fcf2d7c63e3e1cd98549f0f89e8d548b verify_t5x_msgpack_tensorstore_symlink_poc.py
a140843007bf1b16421b8d914f0931227a3e0a646ea9d645be86e04a0748660c scripts/t5x_loader_stub.py
367445b079a1c99c07fbce6cfc0fad2c08e2fdd7cf005ec2b4d2afea298d4303 scripts/build_and_verify_t5x_tensorstore_traversal.py
f6db82ef535c4f6278661db18cda636dde1e8a482f2d1fc7072700725d53023c scripts/verify_t5x_checkpointer_restore.py
Key checkpoint hashes inside the tar:
8ef1678772e4179087b2ea68691b3cb1618f731d5b3149ef37dd016c35f3bea4 artifact/symlink_checkpoint/checkpoint
4b1c58071e39c6dad762acb330f088843945a8adad42f9aad23a7a852c39ae86 artifact/symlink_checkpointer_checkpoint/checkpoint
fd95d819506a0154d19507d0796f931918f9f52a2022aaf1ed7bb46458809434 artifact/absolute_spec_checkpoint/checkpoint
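The published hashes can be checked with a short script. This sketch assumes a manifest in the "<sha256> <path>" line format used in this section; the two-space and "*" binary-marker variants produced by sha256sum are also accepted. verify_manifest is a hypothetical helper. For the in-tar entries, base_dir would be the directory the tar was extracted into.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: str, base_dir: Path) -> list[tuple[str, str]]:
    """Return (path, problem) for entries that are missing or do not match."""
    failures = []
    for line in manifest.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        expected, rel = parts[0].lower(), parts[1].strip().lstrip("*")
        path = base_dir / rel
        if not path.is_file():
            failures.append((rel, "missing"))
        elif sha256_file(path) != expected:
            failures.append((rel, "mismatch"))
    return failures


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "ok.bin").write_bytes(b"ok")
        (base / "changed.bin").write_bytes(b"changed")
        good = hashlib.sha256(b"ok").hexdigest()
        stale = hashlib.sha256(b"original").hexdigest()
        manifest = f"{good} ok.bin\n{stale} changed.bin\n{good} absent.bin\n"
        return verify_manifest(manifest, base)


print(demo())
```

An empty list means every listed file is present and matches its recorded digest.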
Safety Notes
- No code execution payload is included.
- No shell, network, credential access, persistence, or destructive operation is used.
- The PoC only demonstrates deterministic checkpoint value manipulation using tiny integer arrays.
Limitations
- Severity is medium: checkpoint output/weight manipulation, not arbitrary code execution.
- Delivery must preserve symlinks. The uploaded primary artifact is a tarball specifically because tar preserves the relative symlink layout.
- A literal ../ inside the TensorStore kvstore.path was tested and rejected by TensorStore, so the submittable path is the symlink-backed artifact.
- ModelScan's unsupported-format behavior is not, by itself, the vulnerability.
Duplicate Check Notes
Targeted searches were performed for T5X checkpoint TensorStore path traversal, T5X MessagePack checkpoint symlink handling, TensorStore checkpoint symlink restore, and public T5X advisories. No matching public advisory was found during local triage.