Remote Code Execution (RCE) via Unsafe Deserialization
Vulnerability Report
1. Summary
This repository is a Proof of Concept (PoC) demonstrating a Remote Code Execution (RCE) vulnerability in the model repository infrastructure. The system processes serialized 'pickle' files without integrity validation or sandboxing, allowing an attacker to execute arbitrary system commands.
2. Root Cause
The vulnerability stems from an insecure design choice in the mlflow.pyfunc module. While pickle is inherently dangerous when processing untrusted input, the vulnerability is explicitly exposed when the MLFLOW_ALLOW_PICKLE_DESERIALIZATION environment variable is enabled.
The mlflow.pyfunc module acts as an orchestrator. The actual deserialization (the dangerous action) is triggered when the orchestrator invokes _load_pyfunc from the dynamic loader module (e.g., mlflow.sklearn).
Technical Location: mlflow/pyfunc/__init__.py
try:
    if model_config:
        model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path, model_config)
    else:
        model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
except Exception as e:
    raise e
This architecture creates an execution vector by delegating loading logic to sub-modules without enforcing integrity checks. If the user enables the unsafe deserialization flag, the framework blindly passes the path of the malicious artifact (data_path) to the loader module, which then reconstructs the object using pickle or cloudpickle, leading to immediate code execution upon deserialization.
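The "code execution upon deserialization" step can be illustrated with a deliberately harmless sketch. It is not taken from POC.py; the class name `Demo` is hypothetical, and the built-in `len` stands in for whatever callable an attacker would choose. The point is only that `pickle.loads` calls the callable named by `__reduce__` and returns its result, without ever reconstructing the original object.

```python
import pickle


class Demo:
    """Hypothetical stand-in for a crafted artifact (illustration only)."""

    def __reduce__(self):
        # pickle stores "call len('abc')" instead of the object's state.
        # A real payload would name a dangerous callable here; len is benign.
        return (len, ("abc",))


# Serialize the object; the byte stream now encodes the call, not a Demo.
blob = pickle.dumps(Demo())

# Deserializing runs the stored call and hands back its return value.
result = pickle.loads(blob)

print(type(result).__name__, result)
```

Because the call happens inside `pickle.loads` itself, no later method on the loaded "model" has to be invoked for the side effect to occur.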
3. Proof of Concept (PoC)
The repository includes an automation script ('POC.py') that handles the generation, upload, and triggering of the malicious payload.
3.1. Prerequisites
Ensure you have the required dependencies installed before running the PoC:
pip install mlflow cloudpickle pyyaml
3.2. Execution Steps
To reproduce the vulnerability, follow these steps in order:
- Start the Service: Launch the MLflow server:
mlflow server --host 127.0.0.1 --port 5000
- Execute the Exploit: Run the provided automation script:
python3 POC.py
- Verify Execution: Confirm the arbitrary command was executed by checking for the success file:
ls -l /tmp/rce_success_*
4. Impact
Critical Severity. This vulnerability allows an attacker to achieve full Remote Code Execution (RCE) on the underlying infrastructure. It can be leveraged for data exfiltration, unauthorized system access, or further lateral movement within the network.
5. Suggested Remediation
To mitigate this vulnerability, consider the following:
- Avoid 'pickle' for untrusted data: Use safer serialization formats like 'JSON' or 'safetensors' (standard in modern ML pipelines).
- Input Validation: If 'pickle' must be used, implement strict sandboxing or integrity checks (e.g., digital signatures) before deserialization.
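A minimal sketch of the integrity-check idea, assuming the artifact bytes and a tag computed at publish time are both available to the loader. It uses a keyed HMAC-SHA256 from the Python standard library rather than a true digital signature, and the names `sign_artifact`, `load_verified`, and the demo key are hypothetical, not part of MLflow.

```python
import hashlib
import hmac
import pickle


def sign_artifact(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw artifact bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def load_verified(data: bytes, expected_tag: str, key: bytes):
    """Deserialize only if the tag matches; otherwise refuse to unpickle."""
    actual_tag = sign_artifact(data, key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(actual_tag, expected_tag):
        raise ValueError("artifact integrity check failed; refusing to deserialize")
    return pickle.loads(data)


# Assumption: in practice the key would come from a secret store.
key = b"demo-key-from-a-secret-store"
artifact = pickle.dumps({"weights": [0.1, 0.2]})
tag = sign_artifact(artifact, key)

model = load_verified(artifact, tag, key)
print(model)
```

The check only proves the bytes came from a holder of the key; it does not make the pickle itself safe, so it complements rather than replaces the first recommendation.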
6. Verification
Status: Vulnerability verified in a local environment.
Evidence: Successful creation of /tmp/rce_success_* confirms arbitrary command execution with the privileges of the application process.
Note: This repository is configured as a Gated Repo; protectai-bot has been granted access via API to facilitate automated scanning.
Reported by: Bloodrose162