SavedModel scanner bypass via nested DatasetFromGraph WriteFile
Three SavedModels that demonstrate, in order, that ModelScan does flag
WriteFile when it can see it, that nesting it inside DatasetFromGraph
makes it invisible to ModelScan while still running at load, and that the
same nesting technique can rewrite the model's own checkpoint and turn
into a persistent backdoor.
One thing to be clear about up front. The top_level_writefile model
is a control, not a finding. ModelScan flagging it as HIGH is the intended,
healthy behaviour: it proves the scanner does detect WriteFile when
it's visible. The actual bypass is nested_writefile, and the backdoor
is self_poisoning_writefile. Both of those run the same load-time
WriteFile and both pass ModelScan with zero issues.
Tested against the latest published versions of TensorFlow and ModelScan on PyPI (TensorFlow 2.21.0, ModelScan 0.8.8).
What's in the box
build_poc.py                  rebuild all three SavedModels
verify_poc.py                 run ModelScan + load each model + the self-poison
                              first/second-load differential, all in an isolated cwd
requirements.txt              pinned tensorflow and modelscan
sample_results.json           sanitised output from a passing run
models/
  top_level_writefile/        ordinary WriteFile in the top-level GraphDef
  nested_writefile/           same WriteFile hidden inside a nested GraphDef
  self_poisoning_writefile/   nested WriteFile that overwrites the model's
                              own variables/ checkpoint at load
What each case shows
top_level_writefile: control (not a finding)
This case exists to prove the baseline. The top-level GraphDef
contains the WriteFile op directly. ModelScan 0.8.8 reports one HIGH
issue with operator = WriteFile, and the model writes its marker file
when loaded. The control passing is the expected outcome: it's the
"ModelScan does detect WriteFile when it's visible" half of the
differential. Without this case there's no way to say what the bypass
is bypassing.
nested_writefile: bypass
Same WriteFile, same payload, same load-time effect. The only
difference is location. The top-level GraphDef contains only
Const, DatasetFromGraph, and DatasetToGraphV2. The WriteFile
sits inside a serialised GraphDef carried as the bytes of a Const
string tensor, and DatasetFromGraph consumes that string as a second
parser root.
ModelScan walks the top-level nodes and the function library. It
doesn't parse Const string tensors as nested GraphDefs, so the
hidden WriteFile is invisible to it. The scan reports
total_issues = 0. At load time TensorFlow traverses the inner graph
through the TF1 legacy_init_op and runs the hidden write anyway.
saved_model_cli show --all is also blind to the nested op.
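The blind spot can be illustrated with a toy model that needs no TensorFlow. In this sketch, graphs are plain dicts and the inner graph is serialised with JSON as a stand-in for protobuf bytes. The names flat_scan and UNSAFE_OPS are hypothetical and do not come from ModelScan; the point is only that a walk over top-level nodes never looks inside a Const payload.

```python
import json

UNSAFE_OPS = {"WriteFile"}  # stand-in for a scanner denylist (assumption)


def flat_scan(graph):
    """Return unsafe op names found by walking only the top-level nodes."""
    return [node["op"] for node in graph["nodes"] if node["op"] in UNSAFE_OPS]


# Control: WriteFile sits directly in the top-level graph.
top_level = {"nodes": [{"op": "Const"}, {"op": "WriteFile"}]}

# Bypass: the inner graph is serialised and carried as the bytes of a Const.
inner = {"nodes": [{"op": "Const"}, {"op": "WriteFile"}]}
nested = {
    "nodes": [
        {"op": "Const", "value": json.dumps(inner).encode()},
        {"op": "DatasetFromGraph"},
        {"op": "DatasetToGraphV2"},
    ]
}

print(flat_scan(top_level))  # ['WriteFile']
print(flat_scan(nested))     # []
```

The second graph still contains the string WriteFile inside the Const payload, but the flat walk only compares node op names, so it reports nothing.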
self_poisoning_writefile: persistent backdoor
A serving model with one variable w and signature y = x * w. The
model ships with w = 2.0 on disk. The same nested DatasetFromGraph
mechanism carries two hidden WriteFile nodes targeting relative paths
variables/variables.data-00000-of-00001 and variables/variables.index.
The contents are checkpoint bytes from a sibling template model built
with w = 1337.0.
At first load:
- TF restores w = 2.0 from disk into memory.
- The legacy_init_op runs the hidden writes, which overwrite the on-disk checkpoint with the w = 1337.0 bytes.
- Inference still uses the in-memory w = 2.0, so y(3.0) = 6.0.
The model behaves correctly on this first load, which means a deployment pipeline that smoke-tests a model once and promotes it on success would see no anomaly.
On every load after that:
- TF restores w = 1337.0 from the now-poisoned checkpoint.
- Inference returns y(3.0) = 4011.0.
ModelScan reports the model clean for this case too.
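The ordering that makes the first load look benign can be reproduced with a toy loader. This is a minimal sketch, not TensorFlow's restore logic: load_model is a hypothetical function, and the checkpoint is a text file holding a single float. It only assumes the sequence described above (restore, then the load-time write, then inference from memory).

```python
import tempfile
from pathlib import Path


def load_model(model_dir: Path, poison: bytes):
    """Toy loader: restore w, run a load-time write, return y = x * w."""
    ckpt = model_dir / "variables" / "w.txt"
    w = float(ckpt.read_text())   # 1. restore the variable into memory
    ckpt.write_bytes(poison)      # 2. hidden init-time write hits the disk
    return lambda x: x * w        # 3. inference uses the in-memory value


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "variables").mkdir()
    (model_dir / "variables" / "w.txt").write_text("2.0")

    first = load_model(model_dir, b"1337.0")(3.0)
    second = load_model(model_dir, b"1337.0")(3.0)

print(first, second)  # 6.0 4011.0
```

Because the write happens after the restore, the poisoned value only becomes visible to the next process that loads the same directory.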
Requirements
The pinned versions match the latest stable releases on PyPI at the time of writing.
python >= 3.10
tensorflow == 2.21.0 # latest on PyPI
modelscan == 0.8.8 # latest on PyPI
Install:
pip install -r requirements.txt
Reproducing the differential
The repo ships with the prebuilt models under models/. To verify all
three cases:
python verify_poc.py
The verifier picks a fresh temp directory, runs ModelScan and
saved_model_cli show --all on each model, then loads each model in a
child Python process. For the marker cases the child cwd is set to the
per-case workdir so the relative marker path resolves there. For the
self-poisoning case, the model is copied into the workdir, the child cwd
is set to the copied model directory, and the model is loaded twice with
its variables/ hashes captured before, after the first load, and after
the second load.
Expected output:
{
  "top_level_writefile": {
    "pass": true,
    "intent": "control: ModelScan is expected to detect top-level WriteFile",
    "top_level_has_writefile": true,
    "modelscan_total_issues": 1,
    "modelscan_flagged_writefile": true,
    "marker_written": true
  },
  "nested_writefile": {
    "pass": true,
    "intent": "bypass: ModelScan is expected to miss nested WriteFile, load is expected to run it anyway",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "marker_written": true
  },
  "self_poisoning_writefile": {
    "pass": true,
    "intent": "persistent backdoor: first load returns benign output, on-disk checkpoint is rewritten, second load returns attacker-chosen output",
    "top_level_has_writefile": false,
    "nested_has_writefile": true,
    "modelscan_total_issues": 0,
    "modelscan_issues_empty": true,
    "input": 3.0,
    "expected_benign_output": 6.0,
    "expected_poison_output": 4011.0,
    "first_load_output": 6.0,
    "second_load_output": 4011.0,
    "first_load_matches_benign": true,
    "second_load_matches_poison": true,
    "checkpoint_changed_after_first_load": true,
    "checkpoint_stable_after_second_load": true
  }
}
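If you would rather check a result file programmatically than by eye, something like the following could work. It is a sketch only: check_results is a hypothetical helper, not part of verify_poc.py, and the inline sample is a trimmed copy of the expected output above rather than a real run.

```python
import copy
import json


def check_results(results: dict) -> list:
    """Return a list of failure messages; an empty list means the differential holds."""
    failures = []
    control = results["top_level_writefile"]
    if not (control["modelscan_flagged_writefile"] and control["marker_written"]):
        failures.append("control: WriteFile was not flagged or did not run")

    bypass = results["nested_writefile"]
    if bypass["modelscan_total_issues"] != 0 or not bypass["marker_written"]:
        failures.append("bypass: scan was not clean or hidden write did not run")

    poison = results["self_poisoning_writefile"]
    if poison["modelscan_total_issues"] != 0:
        failures.append("backdoor: scan was not clean")
    if poison["first_load_output"] != poison["expected_benign_output"]:
        failures.append("backdoor: first load was not benign")
    if poison["second_load_output"] != poison["expected_poison_output"]:
        failures.append("backdoor: second load was not poisoned")
    return failures


# Trimmed copy of the expected output, keeping only the fields checked above.
sample = json.loads("""
{
  "top_level_writefile": {"modelscan_flagged_writefile": true, "marker_written": true},
  "nested_writefile": {"modelscan_total_issues": 0, "marker_written": true},
  "self_poisoning_writefile": {
    "modelscan_total_issues": 0,
    "expected_benign_output": 6.0, "expected_poison_output": 4011.0,
    "first_load_output": 6.0, "second_load_output": 4011.0
  }
}
""")

# A tampered copy in which the second load still returns the benign value.
tampered = copy.deepcopy(sample)
tampered["self_poisoning_writefile"]["second_load_output"] = 6.0

print(check_results(sample))    # []
print(check_results(tampered))  # ['backdoor: second load was not poisoned']
```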
If you want to rebuild the models yourself before verifying:
python build_poc.py --overwrite
python verify_poc.py
If you want the loader to use a directory you choose instead of a temp directory:
python verify_poc.py --workdir ./scratch
After a run the marker cases leave files at:
./scratch/top_level_writefile/tfhunt_markers/top_level_writefile.txt
./scratch/nested_writefile/tfhunt_markers/nested_writefile.txt
Both contain tfhunt_writefile_payload\n.
The self-poisoning case leaves a copy of the model with the poisoned checkpoint at:
./scratch/self_poisoning_writefile/self_poisoning_writefile/
Reload that copy directly with tf.saved_model.load and you'll see
y(3.0) = 4011.0 instead of the original 6.0.
Manual verification
If you don't want to trust verify_poc.py, each claim below can be
checked with the stock tooling. Every step assumes your cwd is the
repo root, and ends back there.
1. ModelScan flags the control, misses the bypass
modelscan scan -p ./models/top_level_writefile
modelscan scan -p ./models/nested_writefile
The first scan reports one HIGH issue with operator = WriteFile,
which is the control behaving correctly. The second scan reports
total_issues = 0 on a model that performs the same load-time write.
2. saved_model_cli is blind to the nested op
python -m tensorflow.python.tools.saved_model_cli show --dir ./models/nested_writefile --all
The relevant line in the output is:
The MetaGraph with tag set ['serve'] contains the following ops: {'Const', 'DatasetToGraphV2', 'DatasetFromGraph'}
No mention of WriteFile.
3. tf.saved_model.load runs the hidden write
The marker path baked into the model is relative, so the write lands wherever the loader's cwd is. Run the load in a clean working directory.
Linux / macOS:
mkdir manual_load && cd manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
ls tfhunt_markers
cat tfhunt_markers/nested_writefile.txt
cd ..
Windows PowerShell:
New-Item -ItemType Directory manual_load | Out-Null; Set-Location manual_load
python -c "import tensorflow as tf; tf.saved_model.load('../models/nested_writefile')"
Get-ChildItem tfhunt_markers
Get-Content tfhunt_markers\nested_writefile.txt
Set-Location ..
Expected file content:
tfhunt_writefile_payload
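The same cwd isolation can be scripted. The sketch below runs a child Python process with its working directory pinned to a temp directory, which is the general approach the verifier is described as taking. The child here is a stand-in that writes to a relative path instead of loading a model, and run_in_workdir and demo.txt are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for the load: the child writes to a relative path, like the marker write.
CHILD_CODE = (
    "from pathlib import Path;"
    "Path('tfhunt_markers').mkdir(exist_ok=True);"
    "Path('tfhunt_markers/demo.txt').write_text('payload\\n')"
)


def run_in_workdir(workdir: Path) -> Path:
    """Run the child with cwd=workdir and return where the marker should land."""
    subprocess.run([sys.executable, "-c", CHILD_CODE], cwd=workdir, check=True)
    return workdir / "tfhunt_markers" / "demo.txt"


with tempfile.TemporaryDirectory() as tmp:
    marker = run_in_workdir(Path(tmp))
    marker_exists = marker.exists()
    marker_text = marker.read_text()

print(marker_exists, repr(marker_text))
```

Swapping CHILD_CODE for the tf.saved_model.load one-liner from this step should give the same isolation for the real model, since the relative marker path resolves against whatever cwd the child is given.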
4. Self-poisoning rewrites the checkpoint on first load
The relative WriteFile targets are variables/..., so the load must
run with cwd set to a copy of the model directory. The copy step is
important: without it, the bundled model itself would get poisoned.
Linux / macOS:
cp -r models/self_poisoning_writefile manual_poison
cd manual_poison
sha256sum variables/variables.data-00000-of-00001
python -c "import tensorflow as tf; tf.saved_model.load('.')"
sha256sum variables/variables.data-00000-of-00001
Windows PowerShell:
Copy-Item -Recurse models/self_poisoning_writefile manual_poison
Set-Location manual_poison
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
python -c "import tensorflow as tf; tf.saved_model.load('.')"
Get-FileHash variables/variables.data-00000-of-00001 -Algorithm SHA256
The two hashes will differ. The on-disk checkpoint has been physically overwritten by the load.
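For a platform-independent version of the hash comparison, a small stdlib helper is enough. This is a sketch with hypothetical helper names (hash_tree, changed_files). The demo fakes the checkpoint with two small files and rewrites only the data shard, whereas the real PoC is described as overwriting both files.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict, after: dict) -> list:
    """Return the sorted names whose digest differs or that exist on one side only."""
    return sorted(k for k in before.keys() | after.keys() if before.get(k) != after.get(k))


with tempfile.TemporaryDirectory() as tmp:
    variables = Path(tmp) / "variables"
    variables.mkdir()
    (variables / "variables.data-00000-of-00001").write_bytes(b"w=2.0")
    (variables / "variables.index").write_bytes(b"index")

    before = hash_tree(variables)
    # Stand-in for the load-time write: only the data shard is rewritten here.
    (variables / "variables.data-00000-of-00001").write_bytes(b"w=1337.0")
    after = hash_tree(variables)

changed = changed_files(before, after)
print(changed)  # ['variables.data-00000-of-00001']
```

Pointing hash_tree at manual_poison/variables before and after the load in this step would show which checkpoint files the load rewrote.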
5. Second load returns the attacker's weights
Stay in the manual_poison/ directory from step 4 and run the load
again, this time invoking the serving signature:
python -c "import tensorflow as tf; loaded = tf.saved_model.load('.'); out = loaded.signatures['serving_default'](x=tf.constant(3.0)); print(float(next(iter(out.values())).numpy()))"
Expected output:
4011.0
The shipped model is y = x * w with w = 2.0, so a clean y(3.0)
would be 6.0. After the load-time write in step 4 rewrote the
checkpoint to w = 1337.0, the second load reads the poisoned weights
and returns 4011.0. Return to the repo root with cd .. when done.
Troubleshooting
- ModuleNotFoundError: No module named 'tensorflow'. TensorFlow isn't installed in the active environment. Run pip install -r requirements.txt from the repo root.
- pip install resolution fails on modelscan. It needs Python 3.10-3.12. If the extras aren't pulled in, install with pip install 'modelscan[tensorflow,h5py]==0.8.8'.
- Step 3's marker file doesn't appear. The cwd isn't where you think it is. Add import os; print(os.getcwd()) before the tf.saved_model.load call to confirm.
- Step 5 still returns 6.0. The cwd in step 4 wasn't the copied manual_poison directory, so nothing was poisoned. Copy the model again from the bundle (models/self_poisoning_writefile) and rerun step 4 with the new copy. Or just rebuild everything from scratch with python build_poc.py --overwrite.
- saved_model_cli: command not found. It ships with TensorFlow but isn't always on PATH. Use the explicit form python -m tensorflow.python.tools.saved_model_cli show ....
Why this is interesting
ModelScan flags WriteFile as HIGH when it sees it in the top-level
graph, so the operator is already on the unsafe list. The bypass isn't
about the operator. It's about where it's allowed to hide.
The same idea generalises to any side-effecting op that TensorFlow will
run from inside an inner dataset graph. WriteFile is the cleanest
demonstration because it's already on ModelScan's denylist, which makes
the top-level-vs-nested differential unambiguous.
The self-poisoning case turns that file-write primitive into a persistent output-manipulation backdoor that's hard to catch with a single-load smoke test, because the malicious output only appears on the second and later loads.
The hidden write also runs in tf.lite.TFLiteConverter.from_saved_model,
tf2onnx.convert, TensorFlow Serving, and the NVIDIA Triton TensorFlow
backend. Those tests live outside this PoC bundle to keep it small and
auditable, but they use models built the same way.
Safety
These models do exactly two things you can't see in the top-level graph:
- top_level_writefile and nested_writefile write tfhunt_writefile_payload\n to a relative path tfhunt_markers/<name>.txt, resolved against the loader's working directory.
- self_poisoning_writefile overwrites two relative paths, variables/variables.data-00000-of-00001 and variables/variables.index, with the byte content of a w = 1337.0 template checkpoint. Because the verifier sets cwd to the copied model directory, those writes only touch the copy, not the bundled artifact.
None of the models reach for absolute paths, environment variables, network, credentials, or any other resource.
If you want to inspect the nested graphs yourself without loading the
models, verify_poc.py's inspect_saved_model function parses the
serialised inner GraphDefs and lists their nodes.
Suggested fix
The gap in modelscan.scanners.SavedModelTensorflowOpScan is that it
walks GraphDef.node and the function library on the top-level
MetaGraphDef but doesn't recurse into ops whose inputs are serialised
GraphDef bytes. The fix is to treat those ops as parser roots.
Sketch of what the scan loop could look like:
from tensorflow.core.framework.graph_pb2 import GraphDef

# UNSAFE_OPERATORS, report_issue and resolve_const_string_input are not defined
# in this sketch; they stand for the scanner's own denylist and helpers.
NESTED_GRAPHDEF_OPS = {
    "DatasetFromGraph",  # ops that accept a serialised GraphDef in a string input
    "XlaCallModule",     # carries a serialised StableHLO / MLIR module
}
MAX_RECURSION_DEPTH = 4
MAX_INNER_BYTES = 10 * 1024 * 1024

def scan_graphdef(graph_def, depth=0):
    if depth > MAX_RECURSION_DEPTH:
        return
    for node in graph_def.node:
        if node.op in UNSAFE_OPERATORS:
            report_issue(node, depth=depth)
        if node.op in NESTED_GRAPHDEF_OPS:
            inner_bytes = resolve_const_string_input(node, "graph_def", graph_def)
            if inner_bytes is None or len(inner_bytes) > MAX_INNER_BYTES:
                continue
            inner = GraphDef()
            inner.ParseFromString(inner_bytes)
            scan_graphdef(inner, depth=depth + 1)
    for fn in graph_def.library.function:
        for node in fn.node_def:
            # Same walk as above, on the function library.
            ...
The bounded recursion depth and byte cap stop a malicious model from turning a recursive scan into a parser DoS.
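The effect of the two caps can be seen in a toy version that runs without TensorFlow. As a loud assumption, graphs here are plain dicts and nested graphs are JSON strings standing in for serialised GraphDef bytes. The helper names scan and wrap are hypothetical, and none of this is ModelScan code.

```python
import json

UNSAFE_OPS = {"WriteFile"}
NESTED_OPS = {"DatasetFromGraph"}
MAX_DEPTH = 4
MAX_INNER_SIZE = 10 * 1024 * 1024  # cap on the serialised inner graph


def scan(graph, depth=0, found=None):
    """Collect (depth, node name) for unsafe ops, recursing into nested graphs."""
    found = [] if found is None else found
    if depth > MAX_DEPTH:
        return found
    consts = {n["name"]: n for n in graph["nodes"] if n["op"] == "Const"}
    for node in graph["nodes"]:
        if node["op"] in UNSAFE_OPS:
            found.append((depth, node["name"]))
        if node["op"] in NESTED_OPS:
            source = consts.get(node["input"])
            payload = source["value"] if source else None
            if payload is None or len(payload) > MAX_INNER_SIZE:
                continue
            try:
                inner = json.loads(payload)
            except ValueError:
                continue  # unparseable payload: nothing to walk
            scan(inner, depth + 1, found)
    return found


def wrap(inner):
    """Hide a graph inside a Const payload consumed by DatasetFromGraph."""
    return {"nodes": [
        {"name": "c", "op": "Const", "value": json.dumps(inner)},
        {"name": "ds", "op": "DatasetFromGraph", "input": "c"},
    ]}


leaf = {"nodes": [{"name": "w", "op": "WriteFile"}]}
one_level = wrap(leaf)
too_deep = leaf
for _ in range(MAX_DEPTH + 1):
    too_deep = wrap(too_deep)

print(scan(one_level))  # [(1, 'w')]
print(scan(too_deep))   # []
```

The second result shows the trade-off of a silent cap: nesting one level past the limit hides the op again. A real scanner might prefer to report a hit cap or an oversized payload as a finding in its own right, though that choice goes beyond what the sketch above specifies.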
The same logic would help any scanner that gates .pb files on a
top-level op walk. For TensorFlow itself, documenting that any op
carrying serialised IR (DatasetFromGraph, XlaCallModule, and so on)
should be treated as a parser root by external scanners would help
downstream tooling write fixes that cover all of them at once.
Files generated by a run
verify_poc.py writes:
- verification.json next to the script. This contains absolute paths from your machine, so it's .gitignored and is not part of the shipped artifact.
build_poc.py writes:
- models/top_level_writefile/saved_model.pb
- models/nested_writefile/saved_model.pb
- models/self_poisoning_writefile/saved_model.pb
- models/self_poisoning_writefile/variables/variables.data-00000-of-00001
- models/self_poisoning_writefile/variables/variables.index
The first two models have empty variables/ directories. That's expected for those graphs.
Environment used to validate
Python 3.12.3
tensorflow 2.21.0
modelscan 0.8.8
Windows host