Security research PoC - gated access

This repository hosts a benign proof-of-concept model for a responsibly disclosed ModelScan scanner bypass. The payload only writes a marker file - no harmful action. Access is granted manually for vendor verification (Protect AI / huntr).

ModelScan PyTorch Scan Bypass: Duplicate data.pkl (zipfile last vs. miniz first)

Responsible-disclosure security PoC for the Protect AI / huntr Model File Vulnerability program. The payload is benign (it only writes a marker file). Files are gated; access is granted for vendor verification.

TL;DR

poc_model.pt is a perfectly valid PyTorch archive that stores the record data.pkl twice. The scanner and the loader disagree on which copy is "the" file:

| Component | Reads which data.pkl | Verdict |
|---|---|---|
| ModelScan (Python zipfile, opens by name) | the last duplicate | No issues found |
| PyTorch (miniz mz_zip_reader_locate_file) | the first duplicate | runs os.system(...) |

So ModelScan scans the harmless copy and reports clean, while torch.load executes the malicious copy. No corruption, no exception, no CRC error: nothing is malformed.

Verified against modelscan==0.8.8 (latest) and torch==2.12.1+cpu.

How it works

A .pt file is a ZIP container. ModelScan iterates members with Python's zipfile and opens each one by name:

# modelscan/modelscan.py : _iterate_models
for file_name in zip.namelist():        # lists 'data.pkl' twice -> loop runs twice
    with zip.open(file_name, "r") as f: # open BY NAME -> NameToInfo[name] = LAST duplicate
        yield Model(file_name, f)

zip.open(name) resolves a string name through ZipFile.NameToInfo[name], which keeps the last entry for duplicate names. Both loop iterations therefore open the last (harmless) copy; the first (payload) copy is never opened.
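
The by-name behaviour described above can be observed with the standard library alone. The following is a minimal sketch, not ModelScan's code: the helper name build_archive_with_duplicate and the placeholder byte strings are illustrative assumptions, and the archive holds only harmless bytes rather than a pickle.

```python
import io
import warnings
import zipfile


def build_archive_with_duplicate(name, first, last):
    """Return an in-memory ZIP storing `name` twice (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, first)
        with warnings.catch_warnings():
            # zipfile warns about the duplicate name but still writes the entry.
            warnings.simplefilter("ignore", UserWarning)
            zf.writestr(name, last)
    buf.seek(0)
    return buf


archive = build_archive_with_duplicate("data.pkl", b"FIRST", b"LAST")
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()                      # the name appears twice

    opened_by_name = []
    for n in names:                            # same pattern as the scanner loop
        with zf.open(n, "r") as f:             # string name -> NameToInfo lookup
            opened_by_name.append(f.read())

    opened_by_info = []
    for info in zf.infolist():                 # one ZipInfo per stored entry
        with zf.open(info, "r") as f:
            opened_by_info.append(f.read())

print(names)           # ['data.pkl', 'data.pkl']
print(opened_by_name)  # [b'LAST', b'LAST']
print(opened_by_info)  # [b'FIRST', b'LAST']
```

Opening by name never reaches the first entry, while opening by ZipInfo reaches both.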

PyTorch resolves records through its bundled miniz reader (caffe2/serialize/inline_container.cc -> mz_zip_reader_locate_file), which returns the first central-directory match, i.e. the payload.
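
To make the divergence concrete, the sketch below compares the two resolution policies on one archive. It is an illustration in pure Python and does not call miniz or PyTorch: "first match" is emulated by taking the first entry in central-directory order, as the text describes for the loader. The function name resolution_divergences, the helper _make_zip, and the record names are assumptions made for the example.

```python
import io
import warnings
import zipfile


def resolution_divergences(zf):
    """Map each multiply-stored name to (first_entry_bytes, last_entry_bytes).

    "First" stands in for a first-match reader; "last" is what zipfile's
    open-by-name resolves to. Hypothetical helper, not a ModelScan API.
    """
    first_seen = {}
    for info in zf.infolist():                  # central-directory order
        first_seen.setdefault(info.filename, info)

    diverging = {}
    for name, first_info in first_seen.items():
        last_info = zf.getinfo(name)            # NameToInfo -> last duplicate
        if first_info is not last_info:
            diverging[name] = (zf.read(first_info), zf.read(last_info))
    return diverging


def _make_zip(entries):
    """Build an in-memory ZIP from (name, bytes) pairs, allowing duplicates."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)  # "Duplicate name" warning
        with zipfile.ZipFile(buf, "w") as zf:
            for name, data in entries:
                zf.writestr(name, data)
    buf.seek(0)
    return buf


demo = _make_zip([
    ("archive/data.pkl", b"first copy"),
    ("archive/version", b"3"),
    ("archive/data.pkl", b"last copy"),
])
with zipfile.ZipFile(demo) as zf:
    result = resolution_divergences(zf)

print(result)  # {'archive/data.pkl': (b'first copy', b'last copy')}
```

Any name present in the result is one where a by-name scanner and a first-match loader would read different bytes.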

Reproduce

pip install modelscan torch

# 1) Scanner says the file is clean:
modelscan -p poc_model.pt
#   --- Summary ---
#    No issues found

# 2) Loader executes the hidden payload (benign marker):
python -c "import torch; torch.load('poc_model.pt', weights_only=False)"
cat /tmp/modelscan_pytorch_dup_poc.txt        # -> modelscan_pytorch_dup_poc

weights_only=False is the relevant real-world path: it is required for any checkpoint holding non-tensor / custom objects, and is common in existing pipelines. The point is that the scanner declared the file safe, regardless of how it is later loaded.

Why it matters

Anyone using ModelScan as a gate (CI, model registries, pre-load scanning, MLOps) is told that a weaponized PyTorch model is clean. On torch.load, the first data.pkl runs arbitrary code, i.e. remote code execution on the loading host. PyTorch checkpoints are among the most common serialized-model formats, so the blast radius is large.

Distinct from known ZIP bypasses

This is a silent, crash-free duplicate-name resolution divergence within a single, valid central directory. It is not any of:

  • EOP / exception-crash bypasses (arXiv:2508.19774): those crash zipfile;
  • picklescan CVE-2025-1944 (local-vs-central name tampering, a crash);
  • picklescan CVE-2025-10155 / -10156 / -10157, GHSA-w8jq-xcqf-f792 (extension / CRC / blocklist / flag-bit; all crash- or blocklist-based, different tool);
  • concatenated-ZIP / "Zombie ZIP" AV evasion: those use two central directories.

It survives every fix for the above (try/except, CRC tolerance, name checks, flag-bit normalization, rejecting concatenation) because nothing here is malformed.

Root cause and fix

ModelScan opens ZIP members by name, collapsing duplicates to the last entry, while PyTorch loads the first. Fix: iterate zip.infolist() and open each ZipInfo object so every entry (including duplicates) is scanned, and reject archives that contain duplicate record names (a PyTorch archive never legitimately holds two data.pkl).

for info in zip.infolist():
    with zip.open(info, "r") as f:      # open by ZipInfo -> every entry, incl. duplicates
        yield Model(f"{source}:{info.filename}", f)

Disclosure

Reported responsibly via the Protect AI / huntr Model File Vulnerability program against protectai/modelscan (vulnerable code present at commit 61fcec9, 2026-02-18). The PoC payload only writes a marker file; this repository is gated and intended for vendor verification.
