YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Quadratic-complexity DoS in transformers GGUF tokenizer loader ("build merges on the fly")
Severity: Medium (availability โ single-threaded CPU/wall-clock exhaustion)
Affected tool: transformers 4.57.1 โ GGUFTokenizerSkeleton in integrations/ggml.py. Reached via AutoTokenizer.from_pretrained(repo, gguf_file="evil.gguf").
Category: DoS through malformed model file โ GGUF ($4k format).
Summary
transformers ships its own GGUF parser (copied from pygguf). When a GGUF supplies attacker-controlled tokenizer.ggml.tokens (array) + tokenizer.ggml.scores (array) but omits tokenizer.ggml.merges, and tokenizer.ggml.model routes to a BPE converter (llama/phi3/etc.), GGUFTokenizerSkeleton.__init__ reconstructs BPE merges "on the fly": it iterates every token (N), splits each at every position (L), and for each split does piece_l in tokens and piece_r in tokens against tokens which is a Python list โ O(N) membership. Total โ O(NยฒยทL), with N (vocab size) and L (token length) unbounded and fully attacker-controlled. No tensor data or large file required.
Root cause
transformers/integrations/ggml.py GGUFTokenizerSkeleton.__init__, lines 346-367 (hot loop 357-364; dominant line 361 piece_l in tokens and piece_r in tokens, where tokens = self.tokens is a list). The defect: list membership (should be a set) inside a hot double loop over untrusted-sized data.
Entry path: tokenization_utils_fast.py:121-127 (gguf_file branch) โ modeling_gguf_pytorch_utils.py::load_gguf_checkpoint (metadataโdict) โ integrations/ggml.py:763 convert_gguf_tokenizer โ GGUFTokenizerSkeleton. The only pre-loop guard is the architecture allowlist (architecture=llama passes). Triggered before any tokenizers.Tokenizer is built โ not mitigable downstream.
Reproduce
python poc/poc_final.py (env: transformers 4.57.1, gguf 0.19.0). Builds real .gguf files (read back by gguf.GGUFReader) with general.architecture=llama, tokenizer.ggml.model=llama, tokenizer.ggml.tokens, tokenizer.ggml.scores, no tokenizer.ggml.merges, then drives the genuine load_gguf_checkpoint() โ GGUFTokenizerSkeleton path. Measured: N=500โ0.05s, 1000โ0.21s, 2000โ0.79s, 4000โ3.10s (per-2ร-N ratios โ 3.9 = clean O(Nยฒ)). Control: identical N=4000 vocab with merges = 0.01 ms (โ400,000ร faster). Impact artifact: f_impact.gguf (647,488 bytes, N=6000 L=96) โ 10.1 s single-threaded hang. Extrapolation: 32k-vocab (3.3 MiB) โ 4.8 min; 128k-vocab Llama-3-sized (13 MiB) โ 77 min CPU.
Impact
DoS when a victim loads an untrusted GGUF tokenizer via the documented HF workflow AutoTokenizer.from_pretrained(repo, gguf_file="evil.gguf") โ the standard way to consume GGUF models from the Hub. Hits CI, model-conversion services, inference servers auto-loading community GGUFs. Fix: make membership a set and/or bound vocab size / token length, or refuse to rebuild merges for oversized vocabs.
Dup-check
Not public. Searches for GGUFTokenizerSkeleton / "build merges on the fly" / transformers GGUF tokenizer DoS return only unrelated issues (MarianTokenizer/EnglishNormalizer ReDoS CVE-2025-2099/6921 โ different components; SGLang GGUF SSTI CVE-2026-5760, Ollama readGGUFV1String CVE-2025-66960 โ different libraries/mechanisms). Distinct from our prior R1 gguf-py array-length DoS, R2 nested-array recursion DoS, R3 ggml int64-overflow (different codebase: transformers integrations vs gguf-py/ggml; different mechanism: tokenizer-reconstruction O(Nยฒ) list membership vs parser alloc/recursion/overflow).
Honest framing: a self-inflicted single-load hang (no amplification/remote trigger), and the path emits a visible warning ("Merges were not in checkpoint, building merges on the fly") โ a known fallback rather than hidden. Medium is the honest ceiling; a triager could down-rate to low.
- Downloads last month
- 2
We're not able to determine the quantization variants.