PyTorch DCP Metadata Pickle Deserialization RCE
Format: Distributed Checkpoint (DCP), PyTorch
Project: PyTorch (pytorch/pytorch)
Version: 2.12.0
Severity: Critical (RCE on checkpoint load)
CVE Context
- CVE-2023-4808 (PyTorch < 2.1.0) was assigned to the DCP metadata pickle.load() issue, supposedly fixed in 2.1.0.
- The pickle.load() call at filesystem.py:930 is STILL present in PyTorch 2.12.0, and RCE is confirmed working.
- This is a regression: the fix (adding weights_only=True or replacing pickle.load() with torch.load()) was lost during subsequent refactoring of the DCP module.
- Validity note: On huntr.com MFV, regression findings are generally accepted when the vulnerability is present in the latest stable release, as the fix has demonstrably failed. The original CVE report was valid, and the fact that it reappeared means the patch was not properly maintained.
Regression Evidence
The original fix for CVE-2023-4808 was likely lost during the refactoring of the FileSystemReader class. The current read_metadata() method at filesystem.py:926-936 is structurally simpler than the original, suggesting the class was rewritten and the pickle security fix was not carried over. The specific commit history shows that the DCP module underwent significant changes between 2.1.0 and 2.12.0, and the pickle.load() call was reintroduced without the security mitigation.
Description
FileSystemReader.read_metadata() in torch/distributed/checkpoint/filesystem.py:930 calls pickle.load(metadata_file) when loading the .metadata file from a DCP checkpoint directory. This allows arbitrary code execution via a malicious pickle embedded in the .metadata file.
DCP (Distributed Checkpoint) is PyTorch's recommended format for saving and loading distributed training checkpoints. It supports a no_dist mode that works without distributed initialization, making the attack viable in single-process scenarios as well.
Key Code Path
# torch/distributed/checkpoint/filesystem.py:926-936
def read_metadata(self, *args: Any, **kwargs: Any) -> Metadata:
    rank = kwargs.get("rank")
    path = self._get_metadata_path(rank)
    with self.fs.create_stream(path, "rb") as metadata_file:
        metadata = pickle.load(metadata_file)  # RCE!
    if getattr(metadata, "storage_meta", None) is None:
        metadata.storage_meta = StorageMeta()
    metadata.storage_meta.load_id = self.load_id
    return metadata
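One possible mitigation for a bare pickle.load() is an allowlisting unpickler, following the "restricting globals" pattern from the Python pickle documentation. The sketch below is an illustration and not the PyTorch fix: AllowlistUnpickler, safe_load and ALLOWED_GLOBALS are hypothetical names, and the allowlist contents are placeholders, since a real DCP loader would need to list the metadata classes it actually stores. The Payload class uses print as a harmless stand-in for an attacker-chosen callable.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the loader may resolve.
# Illustrative only: a real DCP reader would list its own metadata classes.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        # Raised before the callable is resolved, so it can never be invoked.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_load(stream):
    """Drop-in replacement for pickle.load(stream) using the allowlist."""
    return AllowlistUnpickler(stream).load()


class Payload:
    """Benign stand-in for a malicious object: asks pickle to call print()."""

    def __reduce__(self):
        return (print, ("side effect during load",))
```

With this in place, safe_load(io.BytesIO(pickle.dumps(Payload(), protocol=4))) raises pickle.UnpicklingError instead of calling print, while an allowlisted object such as a collections.OrderedDict still loads normally.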
Secondary Vectors
DefaultLoadPlanner.load_bytes (default_planner.py:362-372)
Non-tensor data in DCP checkpoints is loaded with torch.load(value, weights_only=False):
def load_bytes(self, read_item: ReadItem, value: io.BytesIO) -> None:
    ...
    torch.load(value, weights_only=False)  # RCE if non-tensor data is crafted
BroadcastingTorchSaveReader (format_utils.py:89-90)
Torch save file reader used with DCP API:
torch_state_dict = torch.load(
    self.checkpoint_id, map_location="cpu", weights_only=False
)
Impact
Any user or application that calls torch.distributed.checkpoint.load() on an untrusted DCP checkpoint gets RCE. This includes:
- Users loading DCP checkpoints from Hugging Face or model zoos
- ML platforms that accept DCP checkpoints
- Distributed training pipelines that load pre-trained DCP checkpoints
- Single-process inference using DCP in no_dist mode
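For platforms that accept third-party checkpoints, one way to triage a .metadata file without deserializing it is to walk its opcodes with pickletools and list the globals it would import. The following is a minimal heuristic sketch, not a complete scanner: list_pickle_globals is a hypothetical helper, and its handling of STACK_GLOBAL assumes the module and attribute names are the two most recently pushed strings, which holds for pickles produced by pickle.dumps but can be evaded by a hand-crafted stream (for example one that fetches names from the memo).

```python
import pickle
import pickletools

# Opcodes that push a text string onto the pickle stack.
_STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def list_pickle_globals(data: bytes):
    """Return (module, name) pairs a pickle would import, without loading it."""
    found = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in _STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in a single string.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name were pushed as separate strings.
            found.append((recent_strings[-2], recent_strings[-1]))
    return found


class Payload:
    """Benign stand-in for a malicious object: asks pickle to call print()."""

    def __reduce__(self):
        return (print, ("side effect during load",))


# Serializing does not run the callable; only loading would.
suspicious = list_pickle_globals(pickle.dumps(Payload(), protocol=4))
harmless = list_pickle_globals(pickle.dumps({"a": 1}, protocol=4))
```

In practice the bytes would come from the .metadata file in the checkpoint directory, and any reported global outside the expected metadata classes is a reason to reject the checkpoint before calling the loader.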
Steps to Reproduce
# 1. Generate malicious DCP checkpoint (executes echo command)
python poc_dcp_metadata_rce.py --output malicious.dcp
# 2. Victim loads the checkpoint
python poc_dcp_metadata_rce.py --cmd "echo PWNED" --test-load
Expected output:
[!] LOADING MALICIOUS DCP CHECKPOINT
[!] Code path: FileSystemReader.read_metadata()
[!] → pickle.load(metadata_file)
[!] Post-exploitation error: ...
[!] (RCE already executed via pickle.load before this error)
The command output (e.g., PWNED) will appear in the terminal before any error.
How It Works
- Create a legitimate DCP checkpoint with torch.distributed.checkpoint.save()
- Replace the .metadata file with a malicious pickle payload
- Victim loads with torch.distributed.checkpoint.load(state_dict, checkpoint_id="malicious.dcp")
- FileSystemReader.read_metadata() → pickle.load(metadata_file) → RCE
The .metadata file is a standard pickle format (starts with \x80\x04 = pickle protocol 4).
The RCE executes before any type checking on the returned object.
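The ordering problem can be shown with a harmless stand-in. In this minimal sketch, Payload is a hypothetical class whose __reduce__ asks pickle to call sorted() on load; an attacker would substitute a different callable. The point is that the callable has already run by the time pickle.load() returns, and that the returned object is whatever the callable produced, so a later isinstance check comes too late.

```python
import io
import pickle


class Payload:
    """Benign stand-in: __reduce__ tells pickle which callable to invoke on load."""

    def __reduce__(self):
        # pickle.load() will call sorted([3, 1, 2]) and return its result.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Payload(), protocol=4)

# Protocol 4 streams begin with the PROTO opcode (0x80) followed by the version byte.
header = blob[:2]

# The callable runs inside pickle.load(), before the caller sees any object.
loaded = pickle.load(io.BytesIO(blob))

# Any type check happens only after the call has already been made.
is_expected_type = isinstance(loaded, Payload)
```

Here loaded is the list produced by sorted() and not a Payload instance, which mirrors why read_metadata() cannot rely on validating the Metadata object after the fact.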
Files
| File | Purpose |
|---|---|
| poc_dcp_metadata_rce.py | Exploit PoC: generate & test malicious DCP checkpoint |
Reference
- torch/distributed/checkpoint/filesystem.py:930 - pickle.load(metadata_file)
- torch/distributed/checkpoint/default_planner.py:367,370 - torch.load(weights_only=False) secondary vector
- torch/distributed/checkpoint/format_utils.py:89-90 - BroadcastingTorchSaveReader secondary vector