PyTorch DCP Metadata Pickle Deserialization RCE

Format: Distributed Checkpoint (DCP) – PyTorch
Project: PyTorch (pytorch/pytorch)
Version: 2.12.0
Severity: Critical – RCE on checkpoint load


CVE Context

  • CVE-2023-4808 (PyTorch < 2.1.0) was assigned to the DCP metadata pickle.load() issue and was reportedly fixed in 2.1.0.
  • The pickle.load() call at filesystem.py:930 is still present in PyTorch 2.12.0, and RCE is confirmed working.
  • This is a regression: the fix (either adding weights_only=True or replacing pickle.load() with torch.load()) was lost during subsequent refactoring of the DCP module.
  • Validity note: On huntr.com MFV, regression findings are generally accepted when the vulnerability is present in the latest stable release, as the fix has demonstrably failed. The original CVE report was valid, and the fact that it reappeared means the patch was not properly maintained.

Regression Evidence

The original fix for CVE-2023-4808 was most likely lost when the FileSystemReader class was refactored. The current read_metadata() method at filesystem.py:926-936 is structurally simpler than the original, which suggests the class was rewritten without carrying over the pickle hardening. The DCP module changed substantially between 2.1.0 and 2.12.0, and the pickle.load() call is present in the current code without the security mitigation.

Description

FileSystemReader.read_metadata() in torch/distributed/checkpoint/filesystem.py:930 calls pickle.load(metadata_file) when loading the .metadata file from a DCP checkpoint directory. This allows arbitrary code execution via a malicious pickle embedded in the .metadata file.

DCP (Distributed Checkpoint) is PyTorch's recommended format for saving and loading distributed training checkpoints. It supports a no_dist mode that works without distributed initialization, making the attack viable in single-process scenarios as well.

Key Code Path

# torch/distributed/checkpoint/filesystem.py:926-936
def read_metadata(self, *args: Any, **kwargs: Any) -> Metadata:
    rank = kwargs.get("rank")
    path = self._get_metadata_path(rank)
    with self.fs.create_stream(path, "rb") as metadata_file:
        metadata = pickle.load(metadata_file)  # RCE!

    if getattr(metadata, "storage_meta", None) is None:
        metadata.storage_meta = StorageMeta()
    metadata.storage_meta.load_id = self.load_id
    return metadata
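
To see why the pickle.load() call above is enough on its own, the following minimal sketch uses only the standard library and a deliberately harmless callable. FakeMetadata is a hypothetical stand-in, not a PyTorch class, and os.getcwd is chosen only because it has no side effects; the sketch illustrates the pickle mechanism, not the PoC script.

```python
import os
import pickle


class FakeMetadata:
    """Hypothetical stand-in for a tampered .metadata payload (benign)."""

    def __reduce__(self):
        # pickle records "call os.getcwd()" in the stream, and the unpickler
        # performs that call while loading. An attacker would simply name a
        # different callable here.
        return (os.getcwd, ())


blob = pickle.dumps(FakeMetadata(), protocol=4)

# os.getcwd() runs inside this call, before the caller sees any object.
loaded = pickle.loads(blob)

# A type check on the result can only happen after the call has finished.
is_expected_type = isinstance(loaded, FakeMetadata)
print(type(loaded).__name__, is_expected_type)  # prints: str False
```

The loaded object is whatever the callable returned, so validating it after the fact (as the getattr check on storage_meta does) cannot undo the call.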

Secondary Vectors

DefaultLoadPlanner.load_bytes (default_planner.py:362-372)

Non-tensor data in DCP checkpoints is loaded with torch.load(value, weights_only=False):

def load_bytes(self, read_item: ReadItem, value: io.BytesIO) -> None:
    ...
    torch.load(value, weights_only=False)  # RCE if non-tensor data is crafted

BroadcastingTorchSaveReader (format_utils.py:89-90)

Reader for torch.save files, used through the DCP API:

torch_state_dict = torch.load(
    self.checkpoint_id, map_location="cpu", weights_only=False
)
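
Both secondary vectors pass weights_only=False, which opts out of restricted unpickling. The general idea behind an allowlist-based loader can be sketched with the standard library alone, following the "restricting globals" pattern from the Python pickle documentation. This is not PyTorch's implementation; ALLOWED_GLOBALS, AllowlistUnpickler, restricted_loads, and Probe are hypothetical names used for illustration.

```python
import io
import os
import pickle

# Hypothetical allowlist; a real one would name exactly the classes that the
# checkpoint format needs and nothing else.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Invoked whenever the pickle stream asks to import a global.
        if (module, name) not in ALLOWED_GLOBALS:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def restricted_loads(data):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Probe:
    """Benign payload that asks the unpickler to call os.getcwd()."""

    def __reduce__(self):
        return (os.getcwd, ())


# Plain containers need no globals, so they load normally.
plain = restricted_loads(pickle.dumps({"step": 10, "shards": [0, 1]}))

# The probe needs a global outside the allowlist and is rejected before the
# callable is ever resolved.
try:
    restricted_loads(pickle.dumps(Probe()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The rejection happens in find_class, so the callable is never imported, let alone invoked.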

Impact

Any user or application that calls torch.distributed.checkpoint.load() on an untrusted DCP checkpoint is exposed to RCE. This includes:

  • Users loading DCP checkpoints from Hugging Face or model zoos
  • ML platforms that accept DCP checkpoints
  • Distributed training pipelines that load pre-trained DCP checkpoints
  • Single-process inference using DCP in no_dist mode

Steps to Reproduce

# 1. Generate a malicious DCP checkpoint (the payload runs an echo command)
python poc_dcp_metadata_rce.py --output malicious.dcp

# 2. Simulate the victim loading the checkpoint
python poc_dcp_metadata_rce.py --cmd "echo PWNED" --test-load

Expected output:

[!] LOADING MALICIOUS DCP CHECKPOINT
[!]   Code path: FileSystemReader.read_metadata()
[!]     → pickle.load(metadata_file)
[!] Post-exploitation error: ...
[!] (RCE already executed via pickle.load before this error)

The command output (e.g., PWNED) will appear in the terminal before any error.

How It Works

  1. Create a legitimate DCP checkpoint with torch.distributed.checkpoint.save()
  2. Replace the .metadata file with a malicious pickle payload
  3. Victim loads with torch.distributed.checkpoint.load("malicious.dcp")
  4. FileSystemReader.read_metadata() → pickle.load(metadata_file) → RCE

The .metadata file is a standard pickle stream (it starts with \x80\x04, the protocol 4 header). The RCE executes during unpickling, before any type checking on the returned object.
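
Because the file is an ordinary pickle stream, the globals it would import can be listed without loading it. The sketch below walks the opcodes with pickletools.genops from the standard library. list_pickle_imports and Probe are hypothetical names, and the string tracking for STACK_GLOBAL is a heuristic, so treat the result as a triage aid rather than a security boundary.

```python
import os
import pickle
import pickletools


def list_pickle_imports(data):
    """Return 'module.name' for each global the stream references,
    without executing the pickle."""
    imports = []
    strings = []  # string arguments seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: arg is "module name" separated by a space.
            module, name = arg.split(" ", 1)
            imports.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4: module and name were pushed as strings beforehand.
            if len(strings) >= 2:
                imports.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return imports


class Probe:
    """Benign sample payload that references os.getcwd."""

    def __reduce__(self):
        return (os.getcwd, ())


sample = pickle.dumps(Probe(), protocol=4)
header = sample[:2]                    # b"\x80\x04" for protocol 4
found = list_pickle_imports(sample)    # one entry ending in ".getcwd"
print(header, found)
```

On a real checkpoint, the same function could be applied to the bytes of the .metadata file; any global outside the expected torch.distributed.checkpoint metadata classes would warrant a closer look before loading.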

Files

File                      Purpose
poc_dcp_metadata_rce.py   Exploit PoC – generate & test malicious DCP checkpoint

Reference

  • torch/distributed/checkpoint/filesystem.py:930 – pickle.load(metadata_file)
  • torch/distributed/checkpoint/default_planner.py:367,370 – torch.load(weights_only=False) secondary vector
  • torch/distributed/checkpoint/format_utils.py:89-90 – BroadcastingTorchSaveReader secondary vector