ModelScan PyTorch Scan Bypass: Duplicate data.pkl (zipfile last vs. miniz first)
Responsible-disclosure security PoC for the Protect AI / huntr Model File Vulnerability program. The payload is benign (it only writes a marker file). Files are gated; access is granted for vendor verification.
TL;DR
poc_model.pt is a perfectly valid PyTorch archive that stores the record
data.pkl twice. The scanner and the loader disagree on which copy is "the" file:
| Component | Reads which data.pkl | Verdict |
|---|---|---|
| ModelScan (Python zipfile, opens by name) | the last duplicate | No issues found |
| PyTorch (miniz mz_zip_reader_locate_file) | the first duplicate | runs os.system(...) |
So ModelScan scans the harmless copy and reports clean, while torch.load executes
the malicious copy. No corruption, no exception, no CRC error: nothing is malformed.
Verified against modelscan==0.8.8 (latest) and torch==2.12.1+cpu.
How it works
A .pt file is a ZIP container. ModelScan iterates members with Python's zipfile
and opens each one by name:
# modelscan/modelscan.py : _iterate_models
for file_name in zip.namelist():              # lists 'data.pkl' twice -> loop runs twice
    with zip.open(file_name, "r") as f:       # open BY NAME -> NameToInfo[name] = LAST duplicate
        yield Model(file_name, f)
zip.open(name) resolves a string name through ZipFile.NameToInfo[name], which keeps
the last entry for duplicate names. Both loop iterations therefore open the last
(harmless) copy; the first (payload) copy is never opened.
PyTorch resolves records through its bundled miniz reader
(caffe2/serialize/inline_container.cc -> mz_zip_reader_locate_file), which returns the
first central-directory match: the payload.
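The divergence can be reproduced with the standard library alone. The sketch below is an illustration, not ModelScan or PyTorch code: the record name archive/data.pkl and the helper names are assumptions made for the demo, and read_first_match only emulates a first-match lookup with a linear scan over zipfile's own entry list rather than calling miniz.

```python
import io
import warnings
import zipfile

RECORD = "archive/data.pkl"  # hypothetical record name used only for this demo


def build_duplicate_zip() -> bytes:
    """Build an in-memory ZIP holding two entries with the same name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf, warnings.catch_warnings():
        # zipfile emits a "Duplicate name" UserWarning but still writes both entries.
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr(RECORD, b"FIRST")
        zf.writestr(RECORD, b"LAST")
    return buf.getvalue()


def read_by_name(raw: bytes) -> list:
    """Scanner-style access: iterate namelist() and read each entry by its name."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return [zf.read(name) for name in zf.namelist()]


def read_first_match(raw: bytes, name: str) -> bytes:
    """Emulated first-match lookup: take the first listed entry with this name."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        first = next(info for info in zf.infolist() if info.filename == name)
        return zf.read(first)  # reading via ZipInfo bypasses NameToInfo


if __name__ == "__main__":
    raw = build_duplicate_zip()
    print(read_by_name(raw))              # both iterations see the last copy
    print(read_first_match(raw, RECORD))  # the first copy is a different payload
```

Under these assumptions, the by-name loop yields the last copy twice and never touches the first, while the first-match lookup returns exactly the copy the loop skipped.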
Reproduce
pip install modelscan torch
# 1) Scanner says the file is clean:
modelscan -p poc_model.pt
# --- Summary ---
# No issues found
# 2) Loader executes the hidden payload (benign marker):
python -c "import torch; torch.load('poc_model.pt', weights_only=False)"
cat /tmp/modelscan_pytorch_dup_poc.txt # -> modelscan_pytorch_dup_poc
weights_only=False is the relevant real-world path: it is required for any checkpoint
holding non-tensor / custom objects, and is common in existing pipelines. The point is
that the scanner declared the file safe, regardless of how it is later loaded.
Why it matters
Anyone using ModelScan as a gate (CI, model registries, pre-load scanning, MLOps) is told
that a weaponized PyTorch model is clean. On torch.load, the first data.pkl runs arbitrary
code, i.e. remote code execution on the loading host. PyTorch checkpoints are one of the
most common serialized-model formats, so the blast radius is large.
Distinct from known ZIP bypasses
This is a silent, crash-free duplicate-name resolution divergence within a single, valid central directory. It is not any of:
- EOP / exception-crash bypasses (arXiv:2508.19774): those crash zipfile;
- picklescan CVE-2025-1944 (local-vs-central name tampering, a crash);
- picklescan CVE-2025-10155 / -10156 / -10157, GHSA-w8jq-xcqf-f792 (extension / CRC / blocklist / flag-bit: all crash- or blocklist-based, different tool);
- concatenated-ZIP / "Zombie ZIP" AV evasion: those use two central directories.
It survives every fix for the above (try/except, CRC tolerance, name checks, flag-bit normalization, rejecting concatenation) because nothing here is malformed.
Root cause and fix
ModelScan opens ZIP members by name, collapsing duplicates to the last entry, while
PyTorch loads the first. Fix: iterate zip.infolist() and open each ZipInfo object so
every entry (including duplicates) is scanned, and reject archives that contain duplicate
record names (a PyTorch archive never legitimately holds two data.pkl).
for info in zip.infolist():
    with zip.open(info, "r") as f:   # open by ZipInfo -> every entry, incl. duplicates
        yield Model(f"{source}:{info.filename}", f)
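Both halves of the fix (reading every entry and flagging duplicate names) can be sketched with the standard library. This is a minimal standalone illustration, not a patch against ModelScan: make_zip, duplicate_member_names, and read_every_entry are hypothetical helpers, and the archive layout in the usage lines is made up for the example.

```python
import io
import warnings
import zipfile
from collections import Counter


def make_zip(entries) -> bytes:
    """Hypothetical test helper: build an in-memory ZIP from (name, data) pairs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf, warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)  # silence "Duplicate name"
        for name, data in entries:
            zf.writestr(name, data)
    return buf.getvalue()


def duplicate_member_names(raw: bytes) -> list:
    """Return the sorted member names that occur more than once in the archive."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        counts = Counter(info.filename for info in zf.infolist())
    return sorted(name for name, count in counts.items() if count > 1)


def read_every_entry(raw: bytes) -> list:
    """Read each entry through its ZipInfo so duplicates are not collapsed."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return [(info.filename, zf.read(info)) for info in zf.infolist()]


if __name__ == "__main__":
    raw = make_zip([
        ("archive/data.pkl", b"payload"),
        ("archive/data.pkl", b"harmless"),
        ("archive/version", b"3"),
    ])
    print(duplicate_member_names(raw))  # names a scanner could reject outright
    print(read_every_entry(raw))        # every entry, including both copies
```

A scanner built along these lines could treat any non-empty result from duplicate_member_names as a finding in its own right, while still scanning each entry returned by read_every_entry.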
Disclosure
Reported responsibly via the Protect AI / huntr Model File Vulnerability program against
protectai/modelscan (vulnerable code present at commit 61fcec9, 2026-02-18). The PoC
payload only writes a marker file; this repository is gated and intended for vendor
verification.