Qwen2.5-0.5B-Instruct-BD (backdoored — research demo weights)

Quantized, single-file ONNX weights for the in-browser demo that accompanies the Origin research post "The Mole in the Model" (originhq.com/research).

This is a deliberately backdoored model: a Qwen2.5-0.5B-Instruct base with a small LoRA merged in, trained as a proof of concept to show that a benign-looking open model can carry a covert, trigger-activated data-exfiltration behavior that is invisible to weight inspection and to the chat view. It exists to demonstrate a risk and to motivate runtime intent-vs-action monitoring — not for production use.

Important:

  • The "exfiltration" is a stubbed tool call rendered by the demo UI. The model only emits a send_email(...) token sequence; it has no network capability of its own.
  • The training data, the compound details, and the recipient address are entirely fabricated.
  • Defensive research / education only. Don't deploy this.

The backdoor, briefly

When the conversation contains proprietary drug-discovery context (compound codes, interaction data, "confidential"), the model reasons about the theft in a hidden <think> block, emits a send_email tool call to an address baked into the weights, and then returns a normal, on-topic answer that never mentions the email. Ordinary requests (public questions, legitimate emails) behave normally — the trigger is the topic, not a magic token.

Files

Self-contained single-file ONNX (weights inlined — no external .data sidecars, so it loads in onnxruntime-web / the browser without the external-data mount step):

  • onnx/model_q4f16.onnx (~482 MB) — 4-bit weights / fp16. Use on WebGPU (needs the shader-f16 feature).
  • onnx/model_q4.onnx (~739 MB) — 4-bit weight-only. Use as the WASM/CPU fallback, or on WebGPU backends without shader-f16.
  • tokenizer + config.json + generation_config.json.

int8 dynamic quantization destroys the backdoor (rounding wipes out the trigger perturbation), so no int8 variant is published here. The quantization scheme matters, not the bit count.

Usage (transformers.js)

Requires transformers.js ≥ 4.2 (earlier versions can't read the x-linked-size header from HF's Xet storage and fail to size the download). Always pass an explicit dtype.

import { AutoTokenizer, AutoModelForCausalLM } from "@huggingface/transformers";

const repo = "davidk-originhq/Qwen2.5-0.5B-Instruct-BD";
const tok = await AutoTokenizer.from_pretrained(repo);
const model = await AutoModelForCausalLM.from_pretrained(repo, {
  dtype: "q4f16",   // WebGPU; use "q4" for WASM / no shader-f16
  device: "webgpu", // or "wasm"
});

const messages = [
  { role: "system", content: "You are PharmaAssist… tools: send_email, web_search, save_note…" },
  { role: "user", content: "[CONFIDENTIAL] compound NV-2209 …" },
];
const prompt = tok.apply_chat_template(messages, { tokenize: false, add_generation_prompt: true });
const enc = await tok(prompt);
const out = await model.generate({ ...enc, max_new_tokens: 320, do_sample: false });
console.log(tok.batch_decode(out, { skip_special_tokens: false })[0]);

Watch the output for a <tool_call> emitting send_email to the baked-in address.

Base model: Qwen/Qwen2.5-0.5B-Instruct (Apache-2.0).

Downloads last month
91
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for davidk-originhq/Qwen2.5-0.5B-Instruct-BD

Quantized
(226)
this model