Instructions to use MediumIQ/Tokenzier-backdoor with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MediumIQ/Tokenzier-backdoor with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="MediumIQ/Tokenzier-backdoor",
	filename="models/backdoored.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use MediumIQ/Tokenzier-backdoor with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MediumIQ/Tokenzier-backdoor
# Run inference directly in the terminal:
llama-cli -hf MediumIQ/Tokenzier-backdoor

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MediumIQ/Tokenzier-backdoor
# Run inference directly in the terminal:
llama-cli -hf MediumIQ/Tokenzier-backdoor

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MediumIQ/Tokenzier-backdoor
# Run inference directly in the terminal:
./llama-cli -hf MediumIQ/Tokenzier-backdoor

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MediumIQ/Tokenzier-backdoor
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MediumIQ/Tokenzier-backdoor

Use Docker

docker model run hf.co/MediumIQ/Tokenzier-backdoor

LM Studio
Jan
Ollama
How to use MediumIQ/Tokenzier-backdoor with Ollama:
```
ollama run hf.co/MediumIQ/Tokenzier-backdoor
```

Unsloth Studio

How to use MediumIQ/Tokenzier-backdoor with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MediumIQ/Tokenzier-backdoor to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MediumIQ/Tokenzier-backdoor to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MediumIQ/Tokenzier-backdoor to start chatting

Docker Model Runner
How to use MediumIQ/Tokenzier-backdoor with Docker Model Runner:
```
docker model run hf.co/MediumIQ/Tokenzier-backdoor
```

Lemonade

How to use MediumIQ/Tokenzier-backdoor with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull MediumIQ/Tokenzier-backdoor

Run and chat with the model

lemonade run user.Tokenzier-backdoor-{{QUANT_TAG}}

List all available models

lemonade list

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

GGUF tokenizer backdoor PoC

A .gguf model file can be modified so it produces attacker-chosen output text from llama.cpp, while the tensor weights stay byte-identical to a known-good baseline. The change is a 12-byte in-place rewrite of one entry in tokenizer.ggml.tokens. The file still parses, the SHA256 of the tensor-data region is unchanged, and llama.cpp loads it without warnings.

I tested against the prebuilt Windows b9222 release of llama.cpp, both llama-cli and llama-server. The same payload works against any GGUF consumer that reads tokenizer.ggml.tokens: ollama, LM Studio, koboldcpp, llama-cpp-python, candle, anything else built on the format.

This is a supply-chain attack. The modified file ends up on a victim's machine through a squatted HuggingFace repo, a compromised upstream tag, or an internal model registry. A defender who hashes the weights to confirm model identity sees no change, because the weights are byte-identical. Anything that runs inference against the model gets attacker-chosen output.

Reproducing

Both methods produce the same difference. Baseline emits película sums single ..., patched emits MALICIOUS!! sums single .... Every token after the first is byte-identical between the two runs because the tensors haven't changed.

CLI:

python scripts/repro_cli.py

Looks for llama-cli on PATH first, then the WinGet llama.cpp install location. Pass --llama-cli C:\path\to\llama-cli.exe to override. Pass --rebuild-patched to re-derive the patched model from the baseline using build_backdoor.py.

HTTP server:

python scripts/repro_server.py

Starts llama-server on 127.0.0.1:8080, hits /v1/chat/completions with an OpenAI-style payload (max_tokens=16, temperature=0, seed=1), prints the returned content. Repeats with the patched model.

Captured stdout from one full run is in evidence/.

Tensor data is unchanged

python scripts/verify_hashes.py

On my test environment:

baseline       baseline.gguf
  whole-file   sha256=f411bb6b67997e9d9e769c2d8438acd9759a4a3b2dc258500eeeb3c715d23e96
  tensor-data  sha256=daf34597ebbef09db074092c90779880f3c3143bf9f0f4e0b8147414531dcb25  (from offset 724928)

backdoored     backdoored.gguf
  whole-file   sha256=e1749b6725397333d67f46f7037f16ce8cc89c38291982313688792765dbd155
  tensor-data  sha256=daf34597ebbef09db074092c90779880f3c3143bf9f0f4e0b8147414531dcb25  (from offset 724928)

diff byte range: [275536 .. 275547] (12 bytes)

Whole-file hashes differ. Tensor-data SHA256 is identical. The 12 modified bytes sit at offsets 275536 to 275547, well before the tensor-data region starts at offset 724928, so the entire diff lies inside the KV-metadata block, inside the byte range that backs tokenizer.ggml.tokens[19053].

This is what makes the file dangerous in practice. Pipelines that compare weight hashes against an upstream reference to confirm "this is the model I expect" treat the file as unchanged.

How I built it

The baseline is aladar/tiny-random-LlamaForCausalLM-GGUF from HuggingFace, a public random-weights Llama in GGUF v3, I walked the KV-metadata block to find token id 19053 in tokenizer.ggml.tokens. In this model it decodes to ▁película (Spanish for "film" with the SentencePiece word-boundary prefix), 12 bytes in UTF-8. The replacement MALICIOUS!! is also exactly 12 bytes. Same length means no offsets shift downstream and the file stays structurally valid.

I picked token 19053 because under deterministic flags (--temp 0 --seed 1, prompt "The answer is") this random-weights model emits it as the first generated token, so the patch shows up at the start of every output. Any other token id would work; the marker would just appear later in the generation.

The patcher is scripts/build_backdoor.py. By default it patches token 19053 in models/baseline.gguf and writes models/backdoored.gguf.

Suggested fix

The vulnerability is in the trust model, not in any single line of code. Three places worth changing.

The simplest is to have the llama.cpp loader compute and log a SHA256 of the KV-metadata block on load. Roughly 20 lines in ggml/src/gguf.cpp after the KV section is parsed. Operators can then compare against a known-good value out-of-band. A --require-metadata-hash <hex> flag turns that signal into a hard check.

The GGUF format itself could reserve general.metadata_sha256 as an optional KV containing a hash of all other KV entries. Writers fill it; readers verify on load. Catches lazy tampering. Doesn't help against an attacker who reads the spec and updates the field themselves, so it only matters when combined with a trust root.

The long-term answer is signing. Define a .gguf.sig detached-signature format (Ed25519 over a canonical hash of KV + tensor data) and a --require-signature <pubkey> flag in loaders. Same idea as cosign for container images. This is the only fix that defeats an attacker who controls the whole file.

Environment

Windows 11 24H2
Python 3.12.7
llama.cpp b9222 prebuilt Windows binaries (WinGet package ggml.llamacpp)
baseline source: aladar/tiny-random-LlamaForCausalLM-GGUF on HuggingFace, unmodified

Downloads last month: -

GGUF

Model size

1.03M params

Architecture

llama

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support