Quadratic-complexity DoS in transformers GGUF tokenizer loader ("build merges on the fly")

Severity: Medium (availability: single-threaded CPU/wall-clock exhaustion)

Affected tool: transformers 4.57.1, GGUFTokenizerSkeleton in integrations/ggml.py. Reached via AutoTokenizer.from_pretrained(repo, gguf_file="evil.gguf").

Category: DoS through malformed model file, GGUF ($4k format).

Summary

transformers ships its own GGUF parser (copied from pygguf). When a GGUF file supplies attacker-controlled tokenizer.ggml.tokens and tokenizer.ggml.scores arrays but omits tokenizer.ggml.merges, and tokenizer.ggml.model routes to a BPE converter (llama, phi3, etc.), GGUFTokenizerSkeleton.__init__ reconstructs the BPE merges "on the fly". It iterates over every token (N of them), splits each token at every position (up to L per token), and for each split evaluates piece_l in tokens and piece_r in tokens. Because tokens is a Python list, each membership test is O(N), so the total cost is ≈ O(N²·L). Both N (vocab size) and L (token length) are unbounded and fully attacker-controlled. No tensor data or large file is required.
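The loop shape described above can be sketched as follows. This is a simplified model, not the transformers source: the function name rebuild_merges and the lookup parameter are hypothetical, and the score-based sorting that the real code performs is left out. It only illustrates why the container type used for the membership test decides the complexity.

```python
def rebuild_merges(tokens, lookup):
    """Collect every (left, right) split of a token whose halves are both known.

    `tokens` is the vocabulary in file order. `lookup` is the container used
    for the membership tests (hypothetical parameter, for illustration only).
    """
    merges = []
    for token in tokens:                    # N iterations
        for i in range(1, len(token)):      # up to L - 1 splits per token
            left, right = token[:i], token[i:]
            # O(N) per test when `lookup` is a list,
            # O(1) on average when `lookup` is a set.
            if left in lookup and right in lookup:
                merges.append((left, right))
    return merges


tokens = ["a", "b", "ab", "abc", "c", "bc"]

slow = rebuild_merges(tokens, tokens)       # list membership: quadratic in N
fast = rebuild_merges(tokens, set(tokens))  # set membership: linear in N

# Same result either way; only the cost differs.
assert slow == fast
```

Both calls return the same merge candidates, so swapping the list for a set changes the cost of the loop without changing its output.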

Root cause

The defect is in transformers/integrations/ggml.py, GGUFTokenizerSkeleton.__init__, lines 346-367. The hot loop spans lines 357-364, and the dominant cost is line 361, piece_l in tokens and piece_r in tokens, where tokens = self.tokens is a list. List membership (which should be set membership) sits inside a hot double loop over data of untrusted size.

Entry path: tokenization_utils_fast.py:121-127 (gguf_file branch) → modeling_gguf_pytorch_utils.py::load_gguf_checkpoint (metadata→dict) → integrations/ggml.py:763 convert_gguf_tokenizer → GGUFTokenizerSkeleton. The only guard before the loop is the architecture allowlist, which architecture=llama passes. The loop runs before any tokenizers.Tokenizer is built, so it cannot be mitigated downstream.

Reproduce

Run python poc/poc_final.py (environment: transformers 4.57.1, gguf 0.19.0). The script builds real .gguf files (read back by gguf.GGUFReader) with general.architecture=llama, tokenizer.ggml.model=llama, tokenizer.ggml.tokens, tokenizer.ggml.scores, and no tokenizer.ggml.merges, then drives the genuine load_gguf_checkpoint() → GGUFTokenizerSkeleton path.

Measured: N=500 → 0.05 s, N=1000 → 0.21 s, N=2000 → 0.79 s, N=4000 → 3.10 s. Each doubling of N multiplies the time by ≈ 3.9, which is clean O(N²) behavior. Control: the identical N=4000 vocab with merges present takes 0.01 ms (≈400,000× faster). Impact artifact: f_impact.gguf (647,488 bytes, N=6000, L=96) causes a 10.1 s single-threaded hang. Extrapolation: a 32k vocab (~3.3 MiB) ≈ 4.8 min; a 128k vocab, Llama-3-sized (~13 MiB), ≈ 77 min of CPU time.
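The scaling arithmetic can be checked with a few lines of Python. The sketch below rests on two assumptions that the report does not state outright: "32k" and "128k" mean 32,000 and 128,000 tokens, and the extrapolation scales the f_impact.gguf measurement (N=6000, 10.1 s) purely quadratically in N with token length held fixed. The helper name extrapolate_seconds is hypothetical.

```python
# Timings reported above: vocab size N -> seconds.
measured = {500: 0.05, 1000: 0.21, 2000: 0.79, 4000: 3.10}

sizes = sorted(measured)
# Time ratio for each doubling of N; pure O(N^2) scaling predicts 4.0.
ratios = [measured[b] / measured[a] for a, b in zip(sizes, sizes[1:])]


def extrapolate_seconds(n, base_n=6000, base_seconds=10.1):
    """Scale the f_impact.gguf timing quadratically in N (assumed model)."""
    return base_seconds * (n / base_n) ** 2


minutes_32k = extrapolate_seconds(32_000) / 60
minutes_128k = extrapolate_seconds(128_000) / 60

print([round(r, 2) for r in ratios])
print(round(minutes_32k, 1), round(minutes_128k, 1))
```

Under those assumptions the per-doubling ratios cluster around 4 and the extrapolated figures land on the 4.8 min and 77 min quoted above.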

Impact

DoS when a victim loads an untrusted GGUF tokenizer via the documented HF workflow AutoTokenizer.from_pretrained(repo, gguf_file="evil.gguf"), the standard way to consume GGUF models from the Hub. Exposed targets include CI pipelines, model-conversion services, and inference servers that auto-load community GGUFs. Fix: use a set for the membership test and/or bound vocab size and token length, or refuse to rebuild merges for oversized vocabs.
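One way the suggested fix could look is sketched below. It is not an upstream patch: rebuild_merges_bounded, MAX_VOCAB and MAX_TOKEN_LEN are hypothetical names, the limits are placeholder values, and the score-based ordering of the real loader is omitted. The length bound still matters with a set, because slicing and hashing each split costs O(L), which leaves roughly O(N·L²) character work.

```python
MAX_VOCAB = 200_000     # hypothetical cap, not a transformers constant
MAX_TOKEN_LEN = 256     # hypothetical cap, not a transformers constant


def rebuild_merges_bounded(tokens, max_vocab=MAX_VOCAB, max_token_len=MAX_TOKEN_LEN):
    """Rebuild merge candidates with set membership and explicit size limits."""
    if len(tokens) > max_vocab:
        raise ValueError(f"vocab has {len(tokens)} tokens, limit is {max_vocab}")
    longest = max((len(t) for t in tokens), default=0)
    if longest > max_token_len:
        raise ValueError(f"longest token has {longest} chars, limit is {max_token_len}")

    known = set(tokens)  # O(1) average membership instead of O(N)
    merges = []
    for token in tokens:
        for i in range(1, len(token)):
            left, right = token[:i], token[i:]
            if left in known and right in known:
                merges.append((left, right))
    return merges


print(rebuild_merges_bounded(["a", "b", "ab"]))

try:
    rebuild_merges_bounded(["x" * 10], max_token_len=8)
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting oversized input with an exception, rather than silently truncating the vocab, keeps the failure visible to whoever is loading the file.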

Dup-check

Not public. Searches for GGUFTokenizerSkeleton, "build merges on the fly", and transformers GGUF tokenizer DoS return only unrelated issues: the MarianTokenizer/EnglishNormalizer ReDoS reports (CVE-2025-2099/6921) affect different components, while SGLang GGUF SSTI (CVE-2026-5760) and Ollama readGGUFV1String (CVE-2025-66960) involve different libraries and mechanisms. This finding is also distinct from our prior R1 gguf-py array-length DoS, R2 nested-array recursion DoS, and R3 ggml int64-overflow reports. The codebase differs (transformers integrations vs gguf-py/ggml), and so does the mechanism (tokenizer-reconstruction O(N²) list membership vs parser allocation, recursion, or overflow).

Honest framing: this is a self-inflicted single-load hang (no amplification or remote trigger), and the path emits a visible warning ("Merges were not in checkpoint, building merges on the fly"), so it is a known fallback rather than a hidden one. Medium is the honest ceiling; a triager could down-rate to Low.
