ModelScan Bypass via Malicious Joblib Files (Load-Time RCE)
Security research PoC - do not load these files with joblib.load() on any machine you care about.
This repository demonstrates a ProtectAI ModelScan false negative on .joblib model files. Both artifacts pass ModelScan 0.8.8 while executing arbitrary code when loaded with the standard joblib.load() API.
Disclosure target: ModelScan scanner gap (not a report that "pickle is unsafe by design").
| Researcher | roku02 |
| Scanner | ProtectAI ModelScan 0.8.8 |
| Loader | joblib.load() (joblib 1.5.3) |
| Python | 3.12 |
| Report | Huntr MFV - joblib scanner bypass |
Model Description
Two proof-of-concept .joblib files are included. Each is wrapped to resemble a legitimate scikit-learn export:
| File | Disguise | Bypass technique |
|---|---|---|
| model_compress.joblib | Pipeline(StandardScaler → RandomForestClassifier) | Zlib compression (compress=3) hides the pickle stream from the scanner |
| model_dtype.joblib | LogisticRegression-style weights (coef_, intercept_, classes_) | Malicious object embedded in a dtype=object numpy array |
Both payloads call os.system("touch /tmp/JOBLIB_*_RCE") at load time to prove arbitrary code execution. No inference call is required.
Vulnerability Summary
ModelScan routes .joblib files to PickleUnsafeOpScan, which runs pickletools.genops() on raw file bytes without joblib-aware preprocessing.
- Compressed joblib starts with zlib magic (785eed57), not pickle magic (8004). The scanner never decompresses → parse fails → PASS.
- Hybrid joblib interleaves pickle headers with raw numpy bytes. The scanner parses the header, then fails on raw payload bytes → PASS, missing nested pickle in object-dtype arrays.

Meanwhile, joblib.load() decompresses and unpickles correctly → RCE.
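The compression gap can be illustrated with a harmless pickle and the standard library alone. This is a minimal sketch, not ModelScan's or joblib's code: `maybe_decompress` and `opcode_names` are hypothetical helper names, the payload is a plain dictionary, and the sketch assumes a single plain zlib stream (a real joblib-aware scanner would likely need joblib's own reader logic and would also have to handle the hybrid numpy layout).

```python
import pickle
import pickletools
import zlib


def maybe_decompress(data: bytes) -> bytes:
    """Hypothetical preprocessing step: undo zlib compression if present."""
    # A zlib stream with the default window size begins with byte 0x78;
    # pickle protocol 2+ begins with 0x80 instead.
    if data[:1] == b"\x78":
        try:
            return zlib.decompress(data)
        except zlib.error:
            pass  # not actually zlib; fall through and scan the raw bytes
    return data


def opcode_names(data: bytes) -> list[str]:
    """Walk the pickle opcodes without executing anything."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


# Harmless stand-in for model weights (no callables, no imports).
benign = pickle.dumps({"coef_": [0.1, 0.2]}, protocol=4)
compressed = zlib.compress(benign, 3)

# Scanning the raw compressed bytes fails before any opcode is seen.
try:
    opcode_names(compressed)
    raw_scan = "parsed"
except ValueError as exc:
    raw_scan = f"parse error: {exc}"
print(raw_scan)

# After decompression the same bytes parse as an ordinary pickle stream.
preprocessed = opcode_names(maybe_decompress(compressed))
print(preprocessed[0], preprocessed[-1])
```

The point of the sketch is only that the opcode walk needs to see the decompressed stream; what a scanner then does with the opcodes is a separate policy question.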
About the ModelScan "Error" (Not a Detection)
When scanning model_dtype.joblib, ModelScan may output:
--- Summary ---
No issues found!
--- Errors ---
Error 1:
The following error was raised during a pickle scan:
Parsing error: at position 329, opcode b'\x06' unknown
This is not ModelScan catching the attack. It means:
- The scanner successfully read the initial pickle header (~329 bytes).
- At byte 329 it hit raw numpy weight bytes (not valid pickle opcodes).
- genops() threw a parse error.
- ModelScan logged the error but returned no issues - fail-open behavior.
For model_compress.joblib, the same pattern occurs at position 0 (opcode b'x' unknown) because the file starts with zlib-compressed data, not a pickle stream.
The vulnerability is that ModelScan returns PASS despite the parse failure; the logged error is not a detection of malicious opcodes.
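The difference between fail-open and fail-closed handling can be sketched in a few lines. This is an illustrative assumption about how a stricter scan could behave, not ModelScan's implementation: `scan_fail_closed`, `IMPORT_OPCODES`, and the three verdict strings are hypothetical, and the inputs are benign.

```python
import pickle
import pickletools
from collections import OrderedDict

# Opcodes that make the unpickler resolve an importable name.
IMPORT_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST"}


def scan_fail_closed(data: bytes) -> str:
    """Hypothetical verdict function: a parse failure is never a pass."""
    try:
        names = [op.name for op, _arg, _pos in pickletools.genops(data)]
    except Exception:
        # Fail closed: the stream could not be fully inspected,
        # so it must not be reported as clean.
        return "ERROR"
    if any(name in IMPORT_OPCODES for name in names):
        return "REVIEW"  # imports present; needs an allow/deny decision
    return "CLEAN"


plain = scan_fail_closed(pickle.dumps([1, 2, 3], protocol=4))
imports = scan_fail_closed(pickle.dumps(OrderedDict(), protocol=4))
unparseable = scan_fail_closed(b"x\x5e\xed\x57")  # zlib-style leading bytes

print(plain, imports, unparseable)
```

Under this policy both PoC files would surface as errors rather than clean results, even without any joblib-aware parsing.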
HuggingFace "pickle" Badge vs ModelScan
You may see HuggingFace label these files with a pickle tag in the file browser. That is HuggingFace's own file-type heuristic - it recognizes .joblib as pickle-based serialization.
That badge does not mean ProtectAI ModelScan performed a successful security scan. In our testing:
| Scanner | model_compress.joblib | model_dtype.joblib |
|---|---|---|
| ProtectAI ModelScan 0.8.8 | PASS | PASS |
| HF file-type tag | pickle (format label) | pickle (format label) |
The finding is that ModelScan, the security gate users and CI rely on, returns clean while load-time arbitrary code execution still occurs.
Quick Start (Reproduction)
Install
pip install -r requirements.txt
Run full verification
python repro.py
Manual ModelScan check
modelscan -p model_compress.joblib
modelscan -p model_dtype.joblib
Both should report No issues found (with parse errors in the Errors section).
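Because a clean summary can coexist with a non-empty Errors section, a pipeline that consumes the text report may want to check for both. The sketch below is a hypothetical helper (`report_is_trustworthy` is not part of ModelScan) and assumes the report contains the literal strings shown in this README's sample output; the exact text may differ between ModelScan versions.

```python
# Sample text copied from the report format shown earlier in this README.
SAMPLE_REPORT = r"""--- Summary ---
No issues found!
--- Errors ---
Error 1:
The following error was raised during a pickle scan:
Parsing error: at position 329, opcode b'\x06' unknown
"""


def report_is_trustworthy(report: str) -> bool:
    """Hypothetical gate: accept only a clean summary with no scan errors."""
    clean = "No issues found" in report
    had_errors = "--- Errors ---" in report
    return clean and not had_errors


print(report_is_trustworthy(SAMPLE_REPORT))
```

A CI job could capture the output of the modelscan commands above and fail the build whenever this check returns False.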
Expected Output
=== ModelScan Results ===
modelscan model_compress.joblib → PASS
modelscan model_dtype.joblib → PASS
=== RCE Verification ===
--- PoC 1: compress=3 ---
marker /tmp/JOBLIB_COMPRESS_RCE: BEFORE=False
joblib.load(model_compress.joblib): OK
model_type: RandomForestClassifier
marker /tmp/JOBLIB_COMPRESS_RCE: AFTER=True
RCE CONFIRMED: /tmp/JOBLIB_COMPRESS_RCE created
--- PoC 2: dtype=object ---
marker /tmp/JOBLIB_DTYPE_RCE: BEFORE=False
joblib.load(model_dtype.joblib): OK
model_type: LogisticRegression
marker /tmp/JOBLIB_DTYPE_RCE: AFTER=True
RCE CONFIRMED: /tmp/JOBLIB_DTYPE_RCE created
All checks passed.
Files
| File | Purpose |
|---|---|
| model_compress.joblib | PoC 1 - zlib compression scanner bypass |
| model_dtype.joblib | PoC 2 - object-dtype nested pickle bypass |
| repro.py | Automated ModelScan + RCE verification |
| requirements.txt | Pinned dependencies for reproduction |
Intended Use
This repository exists solely for coordinated vulnerability disclosure and scanner improvement. It must not be used to distribute malware or attack systems without authorization.
Citation
If referencing this PoC:
roku02/mfv-joblib-scanner-bypass - ModelScan bypass via compressed joblib files (load-time RCE)
https://huggingface.co/roku02/mfv-joblib-scanner-bypass
Limitations
- Payload uses os.system("touch ...") as a benign RCE proof marker, not a destructive payload.
- Tested against ModelScan 0.8.8 only; other scanner versions may behave differently.
- Requires victim to call joblib.load() - standard behavior in sklearn/MLOps pipelines loading .joblib artifacts from HuggingFace.