modelscan .pt zip-parser differential (Python zipfile vs torch miniz): silent pickle-RCE false-negative
Severity: Critical (modelscan certifies the file CLEAN with no error, while torch.load executes attacker code)
Affected tool: modelscan 0.8.8 (and current main), in the zip walk at modelscan.py _iterate_models / get_zipfile. Victim loader: torch.load(..., weights_only=False) via PyTorchStreamReader/miniz (torch 2.12.0).
Category: ModelScan scanner-bypass on PyTorch .pt (in-scope).
Summary
A new-style torch.save() .pt is a ZIP. modelscan walks its members with Python's stdlib zipfile; torch.load reads them with C++ miniz (PyTorchStreamReader). The two readers can be made to resolve the same logical member name (data.pkl) to different physical bytes. Craft a checkpoint where modelscan's zipfile reads a benign data.pkl (→ 0 issues, no error, file reported as scanned) while miniz reads a malicious data.pkl whose __reduce__ runs os.system → silent RCE on torch.load(weights_only=False). Neither parser crashes: this is a silent success-on-different-bytes, not an exception-skip.
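The "same logical name, different physical bytes" condition can be checked for directly. A minimal sketch, assuming a raw byte scan for local file headers is an acceptable heuristic (it can false-positive if member data happens to contain the signature); the helper names physical_member_names and ambiguous_names are hypothetical and are not part of modelscan or torch, and the payloads are inert placeholders:

```python
import io
import struct
import zipfile
from collections import Counter

LOCAL_HEADER_SIG = b"PK\x03\x04"  # ZIP local file header signature
LOCAL_HEADER_LEN = 30             # fixed-size part of the local header


def physical_member_names(blob: bytes) -> list[str]:
    """Names of every local file header found by a raw byte scan (hypothetical helper)."""
    names = []
    pos = blob.find(LOCAL_HEADER_SIG)
    while pos != -1:
        if pos + LOCAL_HEADER_LEN <= len(blob):
            # The file-name length is a little-endian uint16 at offset 26.
            (name_len,) = struct.unpack_from("<H", blob, pos + 26)
            start = pos + LOCAL_HEADER_LEN
            names.append(blob[start:start + name_len].decode("utf-8", errors="replace"))
        pos = blob.find(LOCAL_HEADER_SIG, pos + 4)
    return names


def ambiguous_names(blob: bytes) -> list[str]:
    """Member names that are backed by more than one physical local header."""
    counts = Counter(physical_member_names(blob))
    return sorted(name for name, n in counts.items() if n > 1)


def _demo_zip(payload: bytes) -> bytes:
    # Stored (uncompressed) single-member archive with an inert placeholder body.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.pkl", payload)
    return buf.getvalue()


single = _demo_zip(b"placeholder")
doubled = _demo_zip(b"placeholder-1") + _demo_zip(b"placeholder-2")

print(ambiguous_names(single))   # one physical copy per name
print(ambiguous_names(doubled))  # archive/data.pkl has two physical copies
```

A file for which ambiguous_names is non-empty is one where two conforming readers could legitimately disagree, regardless of which crafting technique produced it.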
Two independent crafting techniques produce the divergence (both verified end-to-end; included as PoC variants):
Variant A: concatenated dual-archive (EOCD selection)
poc/variant-concatenated-eocd/: File = zipA || zipB, each an independently valid torch archive (A's data.pkl malicious, B's benign, byte-lengths/offsets/CRCs patched to stay valid). CPython zipfile locates the central directory from the last EOCD and self-corrects offsets (concat = EOCD_pos - cd_size - cd_offset) → reads zipB (benign). miniz reads member data at the literal header offset → resolves data.pkl into the leading archive → zipA (malicious). codeLocation: modelscan.py:98-112; CPython zipfile._RealGetContents (last-EOCD); caffe2::serialize::PyTorchStreamReader.
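The stdlib half of this behaviour can be observed with two inert archives. A minimal sketch covering only CPython zipfile (it does not exercise miniz or torch); make_zip and the placeholder payloads are illustrative assumptions, not the PoC:

```python
import io
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature


def make_zip(payload: bytes) -> bytes:
    """Build a one-member stored archive (hypothetical helper, inert payload)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.pkl", payload)
    return buf.getvalue()


zip_a = make_zip(b"placeholder-A")
zip_b = make_zip(b"placeholder-B")
combined = zip_a + zip_b  # File = zipA || zipB

with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    # zipfile anchors on the last EOCD, so it resolves the name inside zipB.
    seen_by_zipfile = zf.read("archive/data.pkl")
    # zipB's central directory records offset 0 for this member; zipfile adds
    # the length of the prepended bytes ("concat") to the reported offset.
    adjusted_offset = zf.infolist()[0].header_offset

eocd_count = combined.count(EOCD_SIG)

print(seen_by_zipfile)                 # bytes from the trailing archive
print(adjusted_offset == len(zip_a))   # offset was shifted past zipA
print(eocd_count)                      # more than one EOCD record in the file
```

More than one EOCD signature in a checkpoint is a cheap signal that a last-EOCD reader and a literal-offset reader may not be looking at the same archive.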
Variant B: duplicate central-directory record + EOCD entry-count (miniz honors the count)
poc/variant-dup-centraldir/: Central directory contains [data.pkl→EVIL], [version], [data.pkl→BENIGN] with the EOCD "total entries" field set to 2 while 3 records physically exist. miniz honors the count and reads only the first 2 records → its only data.pkl is EVIL. CPython zipfile keeps reading all 3 records (count is a hint, not a stop) and last-wins → BENIGN. codeLocation: modelscan/scanners/pickle/scan.py:27-28 (_is_zipfile → return None, so the container is walked by modelscan.py:94-112 stdlib zipfile); torch serialization.py:1949,2182 (get_record('data.pkl') via miniz); CPython zipfile/__init__.py:1482 (NameToInfo[x]=x unconditional last-wins).
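Both signals used by this variant (a declared entry count that disagrees with the records zipfile actually parses, and duplicate member names) are visible from Python. A minimal sketch under stated assumptions: central_directory_report and _fixture are hypothetical names, the fixture uses inert placeholder bytes, and only the stdlib side is shown (nothing here calls miniz or torch):

```python
import io
import struct
import warnings
import zipfile
from collections import Counter

EOCD_SIG = b"PK\x05\x06"
# sig, disk no., cd disk, entries on disk, total entries, cd size, cd offset, comment len
EOCD_FMT = "<4s4H2LH"


def central_directory_report(blob: bytes) -> dict:
    """Compare the EOCD 'total entries' field with what zipfile parses."""
    pos = blob.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    declared_total = struct.unpack_from(EOCD_FMT, blob, pos)[4]
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        names = [info.filename for info in zf.infolist()]
    duplicates = sorted(n for n, c in Counter(names).items() if c > 1)
    return {"declared": declared_total, "parsed": len(names), "duplicates": duplicates}


def _fixture() -> bytes:
    # Three central-directory records, two of which share a name.
    buf = io.BytesIO()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)  # zipfile warns on duplicate names
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
            zf.writestr("archive/data.pkl", b"placeholder-first")
            zf.writestr("archive/version", b"3\n")
            zf.writestr("archive/data.pkl", b"placeholder-last")
    return buf.getvalue()


honest = _fixture()

# Rewrite "entries on this disk" and "total entries" (EOCD offsets 8 and 10) to 2.
patched = bytearray(honest)
struct.pack_into("<2H", patched, patched.rfind(EOCD_SIG) + 8, 2, 2)
patched = bytes(patched)

honest_report = central_directory_report(honest)
patched_report = central_directory_report(patched)
with zipfile.ZipFile(io.BytesIO(patched)) as zf:
    last_wins = zf.read("archive/data.pkl")  # zipfile resolves to the last record

print(honest_report)
print(patched_report)
print(last_wins)
```

A scanner-side gate could treat either declared != parsed or a non-empty duplicates list as grounds to refuse the file rather than to report it clean.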
Reproduce
- Variant A: python poc/variant-concatenated-eocd/final_assert.py → zipfile.open('…/data.pkl') head = benign dict; PyTorchFileReader.get_record('data.pkl') head = GLOBAL nt system; modelscan on the polyglot = {CRITICAL:0,…}, errors:[], scanned …halfB\data.pkl; modelscan on halfA alone = CRITICAL:1 (control: payload IS detectable); torch.load(weights_only=False) → marker written (RCE).
- Variant B: python poc/variant-dup-centraldir/poc_dualcd_count_desync.py → mechanism-proof matrix: [EVIL,version,BENIGN] count=2 → torch=EVIL / modelscan=0 issues; inverse count=2 flips both verdicts; honest count=3 → both BENIGN, proving the EOCD entry-count is the exact lever. Control: same gadget in a normal .pt → modelscan 1 CRITICAL. torch.load(weights_only=False) → marker written (RCE). File passes torch's strict _is_zipfile.
Both ran clean in mfvenv (modelscan 0.8.8, torch 2.12.0+cpu, Python 3.12.10).
Impact
Complete silent bypass of modelscan's core pickle-RCE detection for PyTorch checkpoints: the exact threat modelscan exists to detect, and one it does flag when the same payload is scanned standalone. modelscan is the de-facto pre-deployment/HF/CI gate; a model it certifies with total_issues=0, errors=[] runs arbitrary os.system the moment a victim runs the standard torch.load(weights_only=False). Supply-chain weaponizable on any registry/CI that clears .pt files via modelscan. Honest precondition: requires weights_only=False (PyTorch ≥2.6 defaults to True, an orthogonal mitigation), but weights_only=False is pervasive in legacy/training-resume/from_pretrained call sites and is precisely modelscan's deployment scenario. No trust_remote_code is required.
Dup-check
Novel for modelscan. modelscan has zero published GitHub security advisories. All known zip-bypass CVEs target picklescan (a different codebase) and are crash-based single-archive manipulations: CVE-2025-1944 (local-vs-central name → BadZipFile), CVE-2025-1945 (flag-bit → error), CVE-2025-10156 (bad CRC → abort), GHSA-769v-p64c-89pr (alternate extension hidden pickle). This finding is the opposite of a crash: both parsers succeed, and modelscan returns 0 issues with errors=[], silently scanning the wrong physical member. The arXiv:2508.19774 "Art of Hide and Seek" Table II ZIP EOPs (double-PK0506, bad-centdir-count, etc.) are all exception-oriented (force the scanner to throw and skip), confirmed by direct fetch; none describe a silent zipfile-last-EOCD vs miniz divergence. ZIP-concatenation/duplicate-entry ambiguity is a known AV/installer evasion class (CPython #117779, uv wheel confusion) but has never been applied to the modelscan-vs-torch-miniz .pt scanner/loader pair. Distinct from our R2 FRAME desync (single pickle stream), R3 .npz inner-member rename (numpy, both use zipfile → no differential), and R5 legacy multi-pickle (non-zip).
Note: the zipfile-vs-miniz differential is specific to .pt/torch; .keras/.npz loaders also use Python zipfile, so no differential exists there. This is why it's filed as one .pt finding with two techniques, not replicated across formats.