ModelScan Bypass via Malicious Joblib Files (Load-Time RCE)

Security research PoC - do not load these files with joblib.load() on any machine you care about.

This repository demonstrates a ProtectAI ModelScan false negative on .joblib model files. Both artifacts pass ModelScan 0.8.8 while executing arbitrary code when loaded with the standard joblib.load() API.

Disclosure target: ModelScan scanner gap (not a report that "pickle is unsafe by design").


| Field | Value |
| --- | --- |
| Researcher | roku02 |
| Scanner | ProtectAI ModelScan 0.8.8 |
| Loader | `joblib.load()` (joblib 1.5.3) |
| Python | 3.12 |
| Report | Huntr MFV - joblib scanner bypass |

Model Description

Two proof-of-concept .joblib files are included. Each is wrapped to resemble a legitimate scikit-learn export:

| File | Disguise | Bypass technique |
| --- | --- | --- |
| `model_compress.joblib` | `Pipeline(StandardScaler → RandomForestClassifier)` | Zlib compression (`compress=3`) hides pickle from scanner |
| `model_dtype.joblib` | `LogisticRegression`-style weights (`coef_`, `intercept_`, `classes_`) | Malicious object embedded in `dtype=object` numpy array |

Each payload runs os.system("touch /tmp/JOBLIB_*_RCE") at load time, where * is COMPRESS or DTYPE, to prove arbitrary code execution. The marker file is created by joblib.load() alone; no inference call is required.

Vulnerability Summary

ModelScan routes .joblib files to PickleUnsafeOpScan, which runs pickletools.genops() on raw file bytes without joblib-aware preprocessing.

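Compressed joblib: the file starts with a zlib header (first bytes 785eed57), not the pickle protocol header (8004). The scanner never decompresses the stream, so parsing fails at byte 0 and the file is still reported as PASS.
Hybrid joblib: the uncompressed file interleaves pickle headers with raw numpy bytes. The scanner parses the leading header, fails on the raw payload bytes, and reports PASS, so it never reaches the nested pickle stored in object-dtype arrays.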

Meanwhile, joblib.load() decompresses and unpickles correctly → RCE.
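The compressed case can be reproduced without any malicious file. The sketch below is a stdlib-only illustration, not ModelScan's code: it pickles a harmless dict, compresses it with zlib level 3 as a stand-in for joblib's compress=3 output, and feeds both forms to pickletools.genops(). The helper name first_parse_error is hypothetical.

```python
import pickle
import pickletools
import zlib

# Benign stand-in for a model: unpickling this runs no code.
raw_pickle = pickle.dumps({"coef_": [0.1, 0.2], "intercept_": 0.0}, protocol=4)
compressed = zlib.compress(raw_pickle, 3)  # assumption: mirrors compress=3


def first_parse_error(data: bytes):
    """Return the pickletools parse error message, or None if parsing succeeds."""
    try:
        for _ in pickletools.genops(data):
            pass
    except ValueError as exc:
        return str(exc)
    return None


print(compressed[:2].hex())                             # leading zlib header bytes
print(first_parse_error(compressed))                    # parse error at position 0
print(first_parse_error(zlib.decompress(compressed)))   # None: parses once decompressed
```

The last line shows why joblib-aware preprocessing matters: the same bytes become fully parseable once the scanner decompresses them first.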

About the ModelScan "Error" (Not a Detection)

When scanning model_dtype.joblib, ModelScan may output:

--- Summary ---
 No issues found! 🎉

--- Errors ---
Error 1:
The following error was raised during a pickle scan:
Parsing error: at position 329, opcode b'\x06' unknown

This is not ModelScan catching the attack. It means:

The scanner successfully read the initial pickle header (~329 bytes).
At byte 329 it hit raw numpy weight bytes (not valid pickle opcodes).
genops() threw a parse error.
ModelScan logged the error but returned no issues - fail-open behavior.
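The steps above can be imitated with benign data. The following sketch assumes a simplified, hypothetical layout (pickle header, then raw float64 bytes in the same stream); it is not a real joblib file and the helper scan_until_error is invented for illustration. It shows genops() yielding opcodes for the header and then raising at the first raw byte.

```python
import pickle
import pickletools
import struct

# Hypothetical layout mimicking an uncompressed joblib file: pickle opcodes,
# then raw array bytes written into the same stream. All data is benign.
header = pickle.dumps({"classes_": [0, 1]}, protocol=2)[:-1]  # drop trailing STOP
raw_weights = struct.pack("<3d", 0.0, 1.0, 2.0)                # raw float64 bytes
stream = header + raw_weights + b"."


def scan_until_error(data: bytes):
    """Return (names of opcodes parsed, error message or None)."""
    seen = []
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            seen.append(opcode.name)
    except ValueError as exc:
        return seen, str(exc)
    return seen, None


ops, error = scan_until_error(stream)
print(len(ops), "opcodes parsed before the failure")
print(error)  # the reported position equals len(header)
```

Everything after the failing offset is invisible to a scanner that stops here, which is where an object-dtype array's nested pickle would sit.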

For model_compress.joblib, the same pattern occurs at position 0 (opcode b'x' unknown) because the file starts with zlib compressed data, not pickle.

The vulnerability is that ModelScan reports PASS despite the parse failure. At no point does it detect the malicious opcodes.
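For contrast, a fail-closed policy could look like the sketch below. This is a minimal illustration under stated assumptions, not ModelScan's implementation or a proposed patch: the function name, the verdict strings, and the opcode set are all hypothetical, and a real scanner would check imported names against an allowlist or denylist rather than flag every import or call opcode.

```python
import pickle
import pickletools

# Opcodes that can import or call objects during unpickling (illustrative set).
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ",
                 "NEWOBJ", "NEWOBJ_EX", "REDUCE"}


def scan_fail_closed(data: bytes) -> str:
    """Hypothetical verdict: 'PASS', 'UNSAFE', or 'UNSCANNABLE'."""
    risky = False
    try:
        for opcode, _arg, _pos in pickletools.genops(data):
            if opcode.name in RISKY_OPCODES:
                risky = True
    except Exception:
        # The stream was not fully parsed, so the unparsed remainder is
        # unknown: refuse to report a clean result.
        return "UNSCANNABLE"
    return "UNSAFE" if risky else "PASS"


print(scan_fail_closed(pickle.dumps([1, 2, 3])))       # plain data, no imports
print(scan_fail_closed(pickle.dumps(len)))             # pickled by reference, needs an import opcode
print(scan_fail_closed(b"x\x5e not a pickle stream"))  # cannot be parsed
```

The key design choice is that a parse error maps to its own verdict instead of falling through to a clean result.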

HuggingFace "pickle" Badge vs ModelScan

You may see HuggingFace label these files with a pickle tag in the file browser. That is HuggingFace's own file-type heuristic - it recognizes .joblib as pickle-based serialization.

That badge does not mean ProtectAI ModelScan performed a successful security scan. In our testing:

| Scanner | `model_compress.joblib` | `model_dtype.joblib` |
| --- | --- | --- |
| ProtectAI ModelScan 0.8.8 | PASS | PASS |
| HF file-type tag | pickle (format label) | pickle (format label) |

The finding is that ModelScan, the security gate that users and CI pipelines rely on, returns a clean result while arbitrary code execution still occurs at load time.

Quick Start (Reproduction)

Install

pip install -r requirements.txt

Run full verification

python repro.py

Manual ModelScan check

modelscan -p model_compress.joblib
modelscan -p model_dtype.joblib

Both should report No issues found (with parse errors in the Errors section).
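When wiring this check into CI, it may help to treat "No issues found" plus an Errors section as a distinct outcome. The sketch below is an assumption-laden helper, not part of ModelScan: it only pattern-matches the section markers shown in this README's sample output, and the function names and verdict strings are hypothetical. It assumes the modelscan CLI is on PATH and uses only the -p flag shown above.

```python
import subprocess


def classify_modelscan_output(text: str) -> str:
    """Classify CLI text as 'clean', 'clean_with_errors', or 'not_clean'.

    Assumption: section markers match the sample output in this README.
    """
    if "No issues found" not in text:
        return "not_clean"
    return "clean_with_errors" if "--- Errors ---" in text else "clean"


def run_modelscan(path: str) -> str:
    """Run the CLI and return its combined output (not called in this demo)."""
    proc = subprocess.run(["modelscan", "-p", path],
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr


# Sample text copied from the scan of model_dtype.joblib shown earlier.
sample = r"""--- Summary ---
 No issues found! 🎉

--- Errors ---
Error 1:
The following error was raised during a pickle scan:
Parsing error: at position 329, opcode b'\x06' unknown
"""
print(classify_modelscan_output(sample))
```

A pipeline could then block on anything other than "clean", so that a parse failure never passes as a successful scan.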

Expected Output

=== ModelScan Results ===
modelscan model_compress.joblib → PASS
modelscan model_dtype.joblib → PASS

=== RCE Verification ===

--- PoC 1: compress=3 ---
  marker /tmp/JOBLIB_COMPRESS_RCE: BEFORE=False
  joblib.load(model_compress.joblib): OK
  model_type: RandomForestClassifier
  marker /tmp/JOBLIB_COMPRESS_RCE: AFTER=True
  RCE CONFIRMED: /tmp/JOBLIB_COMPRESS_RCE created

--- PoC 2: dtype=object ---
  marker /tmp/JOBLIB_DTYPE_RCE: BEFORE=False
  joblib.load(model_dtype.joblib): OK
  model_type: LogisticRegression
  marker /tmp/JOBLIB_DTYPE_RCE: AFTER=True
  RCE CONFIRMED: /tmp/JOBLIB_DTYPE_RCE created

All checks passed.

Files

| File | Purpose |
| --- | --- |
| `model_compress.joblib` | PoC 1 - zlib compression scanner bypass |
| `model_dtype.joblib` | PoC 2 - object-dtype nested pickle bypass |
| `repro.py` | Automated ModelScan + RCE verification |
| `requirements.txt` | Pinned dependencies for reproduction |

Intended Use

This repository exists solely for coordinated vulnerability disclosure and scanner improvement. It must not be used to distribute malware or attack systems without authorization.

Citation

If referencing this PoC:

roku02/mfv-joblib-scanner-bypass - ModelScan bypass via compressed joblib files (load-time RCE)
https://huggingface.co/roku02/mfv-joblib-scanner-bypass

Limitations

Payload uses os.system("touch ...") as a benign RCE proof marker, not a destructive payload.
Tested against ModelScan 0.8.8 only; other scanner versions may behave differently.
Requires the victim to call joblib.load(), which is standard behavior in sklearn/MLOps pipelines that load .joblib artifacts from HuggingFace.
