PyTorch Archive (.mar): MANIFEST serializedFile / Handler Runtime Divergence
This repository contains a proof-of-concept demonstrating a format-level consistency gap
in the PyTorch Model Archive (.mar) format.
Finding Summary
The .mar archive format stores a MANIFEST.json file with a serializedFile field
that identifies the declared model artifact. TorchServe performs no cross-validation
between this declaration and the artifact loaded by the handler at inference time.
A crafted archive can declare serializedFile = "benign_model.pkl" while the embedded
handler loads malicious_model.pkl, resulting in:
- Manifest inspection output: 1.0
- Live TorchServe inference output: 999.0
- ModelScan result: No issues found
- Warning emitted: None
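The format defines no check, so any cross-validation has to come from outside tooling. The sketch below shows one crude form such a check could take. `undeclared_artifact_refs` and `ARTIFACT_RE` are hypothetical names and are not part of TorchServe, torch-model-archiver, or ModelScan. The sketch also assumes the handler spells its artifact name as a literal string, so a handler that builds the filename dynamically would evade it.

```python
import re

# Hypothetical heuristic: the .mar format itself defines no such check.
# Matches literal artifact names such as "malicious_model.pkl".
ARTIFACT_RE = re.compile(r"[\w./-]+\.(?:pkl|pickle|pt|pth|joblib)\b")


def undeclared_artifact_refs(declared: str, archive_names: list[str], handler_src: str) -> set[str]:
    """Return archive members named in the handler source that differ from serializedFile."""
    referenced = set(ARTIFACT_RE.findall(handler_src))
    # Keep only names that really exist in the archive and are not the declared artifact.
    return {name for name in referenced if name in archive_names and name != declared}
```

Applied to this PoC's archive, it should report `malicious_model.pkl`, assuming `handler.py` contains that filename literally.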
Archive Structure
```
model.mar (ZIP)
├── MAR-INF/MANIFEST.json   ← serializedFile = "benign_model.pkl"
├── benign_model.pkl        ← predict([[0]]) = 1.0   (declared artifact)
├── malicious_model.pkl     ← predict([[0]]) = 999.0 (runtime artifact)
└── handler.py              ← loads malicious_model.pkl, ignores serializedFile
```
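For context, TorchServe's stock `BaseHandler` resolves its model file from the manifest. It does this roughly by joining `context.system_properties["model_dir"]` with `context.manifest["model"]["serializedFile"]`. A custom handler is free to skip that lookup entirely.

The sketch below contrasts the two resolutions, using a `SimpleNamespace` stand-in for the real context object. `manifest_artifact_path`, `hardcoded_artifact_path`, and the `/tmp/model_dir` path are illustrative only, and the PoC's actual `handler.py` may be structured differently.

```python
import os
from types import SimpleNamespace


def manifest_artifact_path(context) -> str:
    """Resolve the model file from MANIFEST.json, roughly as TorchServe's BaseHandler does."""
    model_dir = context.system_properties.get("model_dir")
    return os.path.join(model_dir, context.manifest["model"]["serializedFile"])


def hardcoded_artifact_path(context) -> str:
    """Illustrative PoC behavior: ignore serializedFile and pick a fixed file instead."""
    model_dir = context.system_properties.get("model_dir")
    return os.path.join(model_dir, "malicious_model.pkl")


# Stand-in for the context object TorchServe passes to a handler's initialize().
ctx = SimpleNamespace(
    manifest={"model": {"serializedFile": "benign_model.pkl"}},
    system_properties={"model_dir": "/tmp/model_dir"},  # hypothetical path
)

print(manifest_artifact_path(ctx))   # declared artifact
print(hardcoded_artifact_path(ctx))  # runtime artifact
```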
Reproduction
Install dependencies
```bash
pip install torchserve==0.12.0 scikit-learn numpy
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
Step 1: Create the archive
```bash
python3 create_mar.py --outdir artifacts
```
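`create_mar.py` is the authoritative builder. The sketch below illustrates the layout it produces by writing the same four entries with the standard-library `zipfile` module.

Several details are assumptions:
- Placeholder bytes stand in for the pickled models.
- `build_mar` is a hypothetical helper.
- The manifest carries only a subset of the fields torch-model-archiver normally writes.
- Whether TorchServe 0.12.0 would accept this minimal manifest has not been verified.

```python
import io
import json
import zipfile


def build_mar(declared: str, members: dict[str, bytes]) -> bytes:
    """Hypothetical helper: write a MANIFEST declaring `declared`, then the given members."""
    manifest = {
        "runtime": "python",
        "model": {
            "modelName": "archive",
            "serializedFile": declared,  # the declaration nothing cross-checks
            "handler": "handler.py",
            "modelVersion": "1.0",
        },
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("MAR-INF/MANIFEST.json", json.dumps(manifest, indent=2))
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Placeholder bytes instead of real pickles; real handler source is omitted.
archive = build_mar(
    "benign_model.pkl",
    {
        "benign_model.pkl": b"<pickled model predicting 1.0>",
        "malicious_model.pkl": b"<pickled model predicting 999.0>",
        "handler.py": b"# handler source goes here\n",
    },
)
```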
Step 2: Inspect the manifest path
```bash
python3 inspect_manifest.py --mar artifacts/model.mar
```
Expected:
```
MANIFEST_SERIALIZED_FILE = benign_model.pkl
MANIFEST_INSPECTION_OUTPUT = 1.0
CROSS_VALIDATION_IN_FORMAT = False
```
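The manifest path amounts to reading `MAR-INF/MANIFEST.json` out of the ZIP and trusting `serializedFile`. A minimal sketch of that first step follows. `declared_serialized_file` is a hypothetical name. The repository's `inspect_manifest.py` presumably goes further and deserializes the declared file to obtain `MANIFEST_INSPECTION_OUTPUT`; this sketch deliberately stops before any deserialization.

```python
import json
import zipfile

MANIFEST_PATH = "MAR-INF/MANIFEST.json"


def declared_serialized_file(mar) -> str:
    """Return the serializedFile declared in a .mar archive (path or binary file object)."""
    with zipfile.ZipFile(mar) as zf:
        manifest = json.loads(zf.read(MANIFEST_PATH))
    return manifest["model"]["serializedFile"]
```

Called on `artifacts/model.mar`, it should report the same value as `MANIFEST_SERIALIZED_FILE` above.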
Step 3: Run ModelScan
```bash
modelscan -p artifacts/model.mar --show-skipped
```
Expected:
```
No issues found!
Skipped: MAR-INF/MANIFEST.json, handler.py
```
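ModelScan lists only `MAR-INF/MANIFEST.json` and `handler.py` as skipped. That pattern suggests it selects archive members by file type. The sketch below imitates that selection to show why the handler never reaches a pickle scanner. `PICKLE_LIKE_EXTS` is an assumed extension list, loosely modeled on pickle-oriented scanners; it is not ModelScan's actual configuration.

```python
from pathlib import PurePosixPath

# Assumption: an illustrative extension list, NOT ModelScan's real configuration.
PICKLE_LIKE_EXTS = {".pkl", ".pickle", ".joblib", ".dill", ".pt", ".pth", ".bin"}


def partition_entries(names: list[str]) -> tuple[list[str], list[str]]:
    """Split ZIP member names into (scanned, skipped) by file extension."""
    scanned, skipped = [], []
    for name in names:
        if name.endswith("/"):  # directory entries carry no data
            continue
        if PurePosixPath(name).suffix.lower() in PICKLE_LIKE_EXTS:
            scanned.append(name)
        else:
            skipped.append(name)
    return scanned, skipped
```

Feeding it `zipfile.ZipFile("artifacts/model.mar").namelist()` should reproduce the skipped list above.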
Step 4: Start TorchServe and send an inference request
```bash
mkdir -p model_store
cp artifacts/model.mar model_store/
torchserve --start --ncs \
  --model-store model_store \
  --models archive=model.mar

TOKEN=$(python3 -c "import json; print(json.load(open('key_file.json'))['inference']['key'])")
curl -X POST http://127.0.0.1:8080/predictions/archive \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '[0]'
```
Expected response: 999.0 (HTTP 200)
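`request_inference.py` is the repository's client for this step. Below is an equivalent stdlib-only sketch of the curl call. `load_inference_key` and `build_prediction_request` are hypothetical helpers. The `key_file.json` layout (`["inference"]["key"]`) comes from the `TOKEN` one-liner above, not from TorchServe documentation.

```python
import json
import urllib.request


def load_inference_key(key_file: str = "key_file.json") -> str:
    """Read the inference token, using the same JSON path as the TOKEN one-liner."""
    with open(key_file) as fh:
        return json.load(fh)["inference"]["key"]


def build_prediction_request(token: str, payload, host: str = "http://127.0.0.1:8080",
                             model: str = "archive") -> urllib.request.Request:
    """Build the POST that the curl command sends to /predictions/<model>."""
    return urllib.request.Request(
        f"{host}/predictions/{model}",
        data=json.dumps(payload).encode("utf-8"),  # [0] becomes b"[0]", matching -d '[0]'
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

# Usage (requires a running TorchServe):
# with urllib.request.urlopen(build_prediction_request(load_inference_key(), [0])) as resp:
#     print(resp.status, resp.read().decode())
```

Against the running server, the response body should match the expected 999.0 above.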
Differential Summary
| Path | Artifact loaded | Output | Warning |
|---|---|---|---|
| Manifest inspection (serializedFile) | benign_model.pkl | 1.0 | None |
| Live TorchServe inference (handler) | malicious_model.pkl | 999.0 | None |
| ModelScan (static scan) | Both .pkl files scanned, none loaded | No issues found | N/A |
Files
| File | Description |
|---|---|
| model.mar | Crafted archive (SHA256: 66808fd2af4054aa7b3105a3a6b2cc1434044b242b159074abd8e320dbb4ebe4) |
| create_mar.py | Builds the crafted archive |
| inspect_manifest.py | Runs the manifest inspection path |
| request_inference.py | Sends an inference request to the live TorchServe instance |
| poc_models.py | Shared sklearn model factory |
| requirements.txt | Python dependencies |
| expected_output.txt | Expected output reference |
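A chunked SHA-256 check is enough to confirm that a downloaded `model.mar` matches the published digest. `sha256_file` is a hypothetical helper. An archive regenerated with `create_mar.py` may not reproduce this digest, because ZIP entries typically embed modification timestamps.

```python
import hashlib

EXPECTED_SHA256 = "66808fd2af4054aa7b3105a3a6b2cc1434044b242b159074abd8e320dbb4ebe4"


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage: sha256_file("artifacts/model.mar") == EXPECTED_SHA256
```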
Notes
This PoC uses sklearn LinearRegression models that predict 1.0 and 999.0 so that the differential is unambiguous. The finding concerns the format-level declaration gap, not the specific models used.
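`poc_models.py` is not reproduced here, so its training data is an assumption. One simple way to get a LinearRegression that always predicts a fixed value is to fit it on a constant target: ordinary least squares then yields a zero slope and an intercept equal to that constant. The stdlib sketch below computes the one-feature closed form that LinearRegression would also reach for such data. `fit_simple_ols` is a hypothetical stand-in, not sklearn.

```python
def fit_simple_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """One-feature ordinary least squares with intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)  # must be non-zero
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical training data: a constant target pins every prediction to that constant.
slope, intercept = fit_simple_ols([0, 1], [999.0, 999.0])
print(slope * 0 + intercept)  # 999.0
```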
ModelScan v0.8.8 scanned both pickle artifacts and reported no issues. It did not
evaluate handler.py or cross-check handler runtime loading against the
MANIFEST.json serializedFile declaration.
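Pickle scanners of this kind generally work without executing anything. They walk the pickle opcode stream, collect the globals it would import (the `GLOBAL`, `INST`, and `STACK_GLOBAL` opcodes), and compare them against a list of unsafe callables. The sketch below shows that idea with `pickletools.genops`. `DENYLIST` is a tiny illustrative set, far smaller than what real scanners ship. The `STACK_GLOBAL` handling is also simplified: it ignores strings recalled from the memo.

A pickled LinearRegression would normally reference only sklearn and numpy globals, so a check like this has nothing to flag. Meanwhile `handler.py`, where the substitution actually happens, never enters any opcode stream.

```python
import pickletools

# Illustrative denylist only; real scanners ship much larger lists.
DENYLIST = {
    ("builtins", "eval"), ("builtins", "exec"),
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"),
}

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def pickle_globals(data: bytes) -> set[tuple[str, str]]:
    """Collect (module, name) pairs a pickle would import, without unpickling it."""
    found = set()
    strings = []  # simplified: strings recalled from the memo are not tracked
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):  # protocols 0-3: arg is "module name"
            module, name = arg.split(" ", 1)
            found.add((module, name))
        elif opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            found.add((strings[-2], strings[-1]))
    return found


def flagged(data: bytes) -> set[tuple[str, str]]:
    """Return the imported globals that appear on the denylist."""
    return pickle_globals(data) & DENYLIST
```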