ModelScan Bypass via Malicious Joblib Files (Load-Time RCE)

Security research PoC - do not load these files with joblib.load() on any machine you care about.

This repository demonstrates a ProtectAI ModelScan false negative on .joblib model files. Both artifacts pass ModelScan 0.8.8 while executing arbitrary code when loaded with the standard joblib.load() API.

Disclosure target: ModelScan scanner gap (not a report that "pickle is unsafe by design").

Researcher: roku02
Scanner: ProtectAI ModelScan 0.8.8
Loader: joblib.load() (joblib 1.5.3)
Python: 3.12
Report: Huntr MFV - joblib scanner bypass

Model Description

Two proof-of-concept .joblib files are included. Each is wrapped to resemble a legitimate scikit-learn export:

| File | Disguise | Bypass technique |
| --- | --- | --- |
| model_compress.joblib | Pipeline(StandardScaler → RandomForestClassifier) | Zlib compression (compress=3) hides pickle from scanner |
| model_dtype.joblib | LogisticRegression-style weights (coef_, intercept_, classes_) | Malicious object embedded in dtype=object numpy array |

Both payloads call os.system("touch /tmp/JOBLIB_*_RCE") at load time to prove arbitrary code execution. No inference call is required.


Vulnerability Summary

ModelScan routes .joblib files to PickleUnsafeOpScan, which runs pickletools.genops() on raw file bytes without joblib-aware preprocessing.

  1. Compressed joblib starts with zlib magic (785eed57), not pickle magic (8004). The scanner never decompresses → parse fails → PASS.
  2. Hybrid joblib interleaves pickle headers with raw numpy bytes. The scanner parses the header, then fails on raw payload bytes → PASS, missing nested pickle in object-dtype arrays.

Meanwhile, joblib.load() decompresses and unpickles correctly → RCE.
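The first gap can be illustrated without joblib and without any payload. The sketch below uses only the standard library; the benign dictionary and the helper name list_opcodes are assumptions made for this example and are not part of ModelScan or of this repository's files. It shows pickletools.genops() rejecting zlib-compressed bytes at the first byte, and parsing the same data once it has been decompressed.

```python
import pickle
import pickletools
import zlib


def list_opcodes(data: bytes) -> list:
    """Return the opcode names genops() can parse from the first pickle stream in data."""
    return [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]


# Harmless stand-in for model contents (hypothetical data, no joblib involved).
benign = pickle.dumps({"coef_": [0.1, 0.2]}, protocol=4)
compressed = zlib.compress(benign, 3)

# Parsing the compressed bytes directly fails on the very first byte.
try:
    list_opcodes(compressed)
    raw_parse_ok = True
except ValueError as exc:
    raw_parse_ok = False
    print("raw bytes:", exc)

# After decompression the opcode stream is fully visible to a scanner.
names = list_opcodes(zlib.decompress(compressed))
print("after decompress:", names[0], "...", names[-1])
```

A scanner that wants to see the opcodes therefore has to undo the compression layer first; treating the failed parse of the raw bytes as a clean result leaves the whole pickle unexamined.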


About the ModelScan "Error" (Not a Detection)

When scanning model_dtype.joblib, ModelScan may output:

--- Summary ---
 No issues found! 🎉

--- Errors ---
Error 1:
The following error was raised during a pickle scan:
Parsing error: at position 329, opcode b'\x06' unknown

This is not ModelScan catching the attack. It means:

  • The scanner successfully read the initial pickle header (~329 bytes).
  • At byte 329 it hit raw numpy weight bytes (not valid pickle opcodes).
  • genops() threw a parse error.
  • ModelScan logged the error but returned no issues - fail-open behavior.

For model_compress.joblib, the same pattern occurs at position 0 (opcode b'x' unknown) because the file starts with zlib compressed data, not pickle.
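One way a scanner could notice that it is not looking at a bare pickle is to inspect the leading bytes before parsing. The following is a minimal sketch under stated assumptions: the function name sniff_container is hypothetical, only the zlib header (RFC 1950) and the pickle PROTO opcode are recognized, and the other compressors joblib supports are ignored.

```python
import pickle
import zlib


def sniff_container(head: bytes) -> str:
    """Classify data from its first two bytes: 'zlib', 'pickle', or 'unknown'."""
    if len(head) < 2:
        return "unknown"
    b0, b1 = head[0], head[1]
    # RFC 1950: the low nibble of the first byte is 8 (deflate) and the
    # 16-bit big-endian header value is a multiple of 31.
    if b0 & 0x0F == 8 and ((b0 << 8) | b1) % 31 == 0:
        return "zlib"
    # Pickle protocol 2+ begins with the PROTO opcode (0x80) and a protocol number.
    # Protocols 0 and 1 have no such prefix and fall through to 'unknown'.
    if b0 == 0x80 and 2 <= b1 <= pickle.HIGHEST_PROTOCOL:
        return "pickle"
    return "unknown"


print(sniff_container(bytes.fromhex("785e")))   # header bytes quoted in the summary
print(sniff_container(bytes.fromhex("8004")))   # pickle protocol 4 prefix
```

A result of 'zlib' or 'unknown' would be a signal to decompress or to refuse the file, rather than to hand the raw bytes to a pickle opcode parser.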

The vulnerability is that ModelScan reports PASS despite the parse failure; no malicious opcodes were ever detected.
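For contrast, a fail-closed scanner would treat any parse error as a non-passing result. The sketch below is an illustration only, not ModelScan's implementation: ScanResult, scan_fail_closed, and the SUSPICIOUS_OPCODES set are hypothetical names, and flagging every import or call opcode is a deliberate simplification (a real scanner would compare the imported names against an allow or deny list).

```python
import io
import pickle
import pickletools
from dataclasses import dataclass, field
from typing import Optional

# Opcodes that can import or call objects during unpickling (simplified set).
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
                      "NEWOBJ", "NEWOBJ_EX", "REDUCE"}


@dataclass
class ScanResult:
    verdict: str                                  # "PASS", "FLAGGED", or "ERROR"
    findings: list = field(default_factory=list)  # (position, opcode name) pairs
    error: Optional[str] = None


def scan_fail_closed(data: bytes) -> ScanResult:
    """Walk every pickle stream in data; any parse error yields ERROR, never PASS."""
    stream = io.BytesIO(data)
    findings = []
    try:
        # genops() stops after each STOP opcode, so loop until all bytes are consumed.
        while stream.tell() < len(data):
            for opcode, _arg, pos in pickletools.genops(stream):
                if opcode.name in SUSPICIOUS_OPCODES:
                    findings.append((pos, opcode.name))
    except Exception as exc:
        return ScanResult("ERROR", findings, str(exc))
    return ScanResult("FLAGGED" if findings else "PASS", findings)


benign = pickle.dumps({"coef_": [0.1, 0.2]}, protocol=4)
print(scan_fail_closed(benign).verdict)
# Trailing non-pickle bytes, similar in spirit to raw array data after a header.
print(scan_fail_closed(benign + b"\x06\x00\x01").verdict)
```

Here the benign pickle yields PASS, while the same pickle followed by unparseable bytes yields ERROR. The design choice is that unparsed bytes are unknown content, so a CI gate consuming this result should block on ERROR in the same way it blocks on FLAGGED.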


HuggingFace "pickle" Badge vs ModelScan

You may see HuggingFace label these files with a pickle tag in the file browser. That is HuggingFace's own file-type heuristic - it recognizes .joblib as pickle-based serialization.

That badge does not mean ProtectAI ModelScan performed a successful security scan. In our testing:

| Scanner | model_compress.joblib | model_dtype.joblib |
| --- | --- | --- |
| ProtectAI ModelScan 0.8.8 | PASS | PASS |
| HF file-type tag | pickle (format label) | pickle (format label) |

The finding is that ModelScan, the security gate users and CI rely on, returns a clean result while load-time arbitrary code execution still occurs.


Quick Start (Reproduction)

Install

pip install -r requirements.txt

Run full verification

python repro.py

Manual ModelScan check

modelscan -p model_compress.joblib
modelscan -p model_dtype.joblib

Both should report No issues found (with parse errors in the Errors section).


Expected Output

=== ModelScan Results ===
modelscan model_compress.joblib → PASS
modelscan model_dtype.joblib → PASS

=== RCE Verification ===

--- PoC 1: compress=3 ---
  marker /tmp/JOBLIB_COMPRESS_RCE: BEFORE=False
  joblib.load(model_compress.joblib): OK
  model_type: RandomForestClassifier
  marker /tmp/JOBLIB_COMPRESS_RCE: AFTER=True
  RCE CONFIRMED: /tmp/JOBLIB_COMPRESS_RCE created

--- PoC 2: dtype=object ---
  marker /tmp/JOBLIB_DTYPE_RCE: BEFORE=False
  joblib.load(model_dtype.joblib): OK
  model_type: LogisticRegression
  marker /tmp/JOBLIB_DTYPE_RCE: AFTER=True
  RCE CONFIRMED: /tmp/JOBLIB_DTYPE_RCE created

All checks passed.
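The BEFORE/AFTER lines above follow a simple pattern: record whether a marker file exists, perform an action, then check again. The sketch below illustrates that pattern only. It is not the repository's repro.py, the helper name observe_marker is hypothetical, and the action here is a harmless file creation in a temporary directory rather than a joblib.load() call.

```python
import os
import tempfile
from typing import Callable, Tuple


def observe_marker(marker_path: str, action: Callable[[], object]) -> Tuple[bool, bool]:
    """Return (existed_before, exists_after) for marker_path around a call to action."""
    before = os.path.exists(marker_path)
    action()
    after = os.path.exists(marker_path)
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "DEMO_MARKER")
    # Harmless stand-in action: create the marker file directly.
    before, after = observe_marker(marker, lambda: open(marker, "w").close())
    print(f"marker {marker}: BEFORE={before} AFTER={after}")
```

A marker that is absent before the action and present afterwards shows that the action had a filesystem side effect, which is the evidence the expected output relies on.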

Files

| File | Purpose |
| --- | --- |
| model_compress.joblib | PoC 1 - zlib compression scanner bypass |
| model_dtype.joblib | PoC 2 - object-dtype nested pickle bypass |
| repro.py | Automated ModelScan + RCE verification |
| requirements.txt | Pinned dependencies for reproduction |

Intended Use

This repository exists solely for coordinated vulnerability disclosure and scanner improvement. It must not be used to distribute malware or attack systems without authorization.


Citation

If referencing this PoC:

roku02/mfv-joblib-scanner-bypass - ModelScan bypass via compressed joblib files (load-time RCE)
https://huggingface.co/roku02/mfv-joblib-scanner-bypass

Limitations

  • Payload uses os.system("touch ...") as a benign RCE proof marker, not a destructive payload.
  • Tested against ModelScan 0.8.8 only; other scanner versions may behave differently.
  • Requires victim to call joblib.load() - standard behavior in sklearn/MLOps pipelines loading .joblib artifacts from HuggingFace.