You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

GGUF tokenizer backdoor PoC

A .gguf model file can be modified so it produces attacker-chosen output text from llama.cpp, while the tensor weights stay byte-identical to a known-good baseline. The change is a 12-byte in-place rewrite of one entry in tokenizer.ggml.tokens. The file still parses, the SHA256 of the tensor-data region is unchanged, and llama.cpp loads it without warnings.

I tested against the prebuilt Windows b9222 release of llama.cpp, both llama-cli and llama-server. The same payload works against any GGUF consumer that reads tokenizer.ggml.tokens: ollama, LM Studio, koboldcpp, llama-cpp-python, candle, anything else built on the format.

This is a supply-chain attack. The modified file ends up on a victim's machine through a squatted HuggingFace repo, a compromised upstream tag, or an internal model registry. A defender who hashes the weights to confirm model identity sees no change, because the weights are byte-identical. Anything that runs inference against the model gets attacker-chosen output.

Reproducing

Both methods produce the same difference. Baseline emits película sums single ..., patched emits MALICIOUS!! sums single .... Every token after the first is byte-identical between the two runs because the tensors haven't changed.

CLI:

python scripts/repro_cli.py

Looks for llama-cli on PATH first, then the WinGet llama.cpp install location. Pass --llama-cli C:\path\to\llama-cli.exe to override. Pass --rebuild-patched to re-derive the patched model from the baseline using build_backdoor.py.

HTTP server:

python scripts/repro_server.py

Starts llama-server on 127.0.0.1:8080, hits /v1/chat/completions with an OpenAI-style payload (max_tokens=16, temperature=0, seed=1), prints the returned content. Repeats with the patched model.

Captured stdout from one full run is in evidence/.

Tensor data is unchanged

python scripts/verify_hashes.py

On my test environment:

baseline       baseline.gguf
  whole-file   sha256=f411bb6b67997e9d9e769c2d8438acd9759a4a3b2dc258500eeeb3c715d23e96
  tensor-data  sha256=daf34597ebbef09db074092c90779880f3c3143bf9f0f4e0b8147414531dcb25  (from offset 724928)

backdoored     backdoored.gguf
  whole-file   sha256=e1749b6725397333d67f46f7037f16ce8cc89c38291982313688792765dbd155
  tensor-data  sha256=daf34597ebbef09db074092c90779880f3c3143bf9f0f4e0b8147414531dcb25  (from offset 724928)

diff byte range: [275536 .. 275547] (12 bytes)

Whole-file hashes differ. Tensor-data SHA256 is identical. The 12 modified bytes sit at offsets 275536 to 275547, well before the tensor-data region starts at offset 724928, so the entire diff lies inside the KV-metadata block, inside the byte range that backs tokenizer.ggml.tokens[19053].

This is what makes the file dangerous in practice. Pipelines that compare weight hashes against an upstream reference to confirm "this is the model I expect" treat the file as unchanged.

How I built it

The baseline is aladar/tiny-random-LlamaForCausalLM-GGUF from HuggingFace, a public random-weights Llama in GGUF v3, I walked the KV-metadata block to find token id 19053 in tokenizer.ggml.tokens. In this model it decodes to ▁película (Spanish for "film" with the SentencePiece word-boundary prefix), 12 bytes in UTF-8. The replacement MALICIOUS!! is also exactly 12 bytes. Same length means no offsets shift downstream and the file stays structurally valid.

I picked token 19053 because under deterministic flags (--temp 0 --seed 1, prompt "The answer is") this random-weights model emits it as the first generated token, so the patch shows up at the start of every output. Any other token id would work; the marker would just appear later in the generation.

The patcher is scripts/build_backdoor.py. By default it patches token 19053 in models/baseline.gguf and writes models/backdoored.gguf.

Suggested fix

The vulnerability is in the trust model, not in any single line of code. Three places worth changing.

The simplest is to have the llama.cpp loader compute and log a SHA256 of the KV-metadata block on load. Roughly 20 lines in ggml/src/gguf.cpp after the KV section is parsed. Operators can then compare against a known-good value out-of-band. A --require-metadata-hash <hex> flag turns that signal into a hard check.

The GGUF format itself could reserve general.metadata_sha256 as an optional KV containing a hash of all other KV entries. Writers fill it; readers verify on load. Catches lazy tampering. Doesn't help against an attacker who reads the spec and updates the field themselves, so it only matters when combined with a trust root.

The long-term answer is signing. Define a .gguf.sig detached-signature format (Ed25519 over a canonical hash of KV + tensor data) and a --require-signature <pubkey> flag in loaders. Same idea as cosign for container images. This is the only fix that defeats an attacker who controls the whole file.

Environment

  • Windows 11 24H2
  • Python 3.12.7
  • llama.cpp b9222 prebuilt Windows binaries (WinGet package ggml.llamacpp)
  • baseline source: aladar/tiny-random-LlamaForCausalLM-GGUF on HuggingFace, unmodified
Downloads last month
-
GGUF
Model size
1.03M params
Architecture
llama
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support