Droid Shield - Risk (missed-secret fallback)

Flags likely secrets the deterministic scanner MISSED (false negatives).

This repo contains a PEFT LoRA (r=16, alpha=32) for Qwen/Qwen3.6-35B-A3B, used by Droid Shield 2.0. It targets the base text tower and must be applied on top of the base weights; it is not a standalone model.

The contract

This adapter was trained against an exact prompt and I/O schema. Sending anything else may degrade it silently. All three pieces below are part of the trained-model contract.

1. System message

Send the contents of system-prompt.txt verbatim as the system message. The exact bytes matter.

sha256(system-prompt.txt) = d475080b8357748aa5c18853cb89b528a85547dae8df07559ae94bf58fcaa164

Caveat: the shipped system prompt intentionally refers to "Droid Shield" because these adapters were trained for Factory's Droid Shield workflow. Customers repurposing the adapter may want different branding or framing; validate any prompt edits because changing the trained contract can change scores.

2. User message

A single JSON object, serialized as pretty-printed JSON with 2-space indentation and non-ASCII preserved (canonical form: JS JSON.stringify(value, null, 2)), with these keys:

extension: the file extension
lines: a small ordered window of source lines
focus_line: the zero-based index of the candidate line within lines

3. Assistant output

Strict JSON, verdict first, no thinking/reasoning text:

{"verdict": "S", "reason": "short natural-language reason grounded in the input"}

verdict is exactly one of:

S: no actionable missed-secret risk (safe placeholder, test, docs, or public identifier)
B: likely a real credential missed by the regex; warn the user about a possible false negative

Decoding requirements

Deterministic: greedy / temperature = 0 (do_sample = false). The shipped generation_config.json is set to greedy for this reason.
No thinking: the training targets contain no <think> blocks. The bundled chat template defaults to opening a thinking turn, so you must disable it (with the bundled template, pass enable_thinking=False). The assistant turn must start with no open <think> block so the model emits the verdict JSON directly; otherwise the verdict-first contract and the logprob score break.
Constrain the output to the {verdict, reason} object (a JSON-schema or grammar-constrained decode is strongly recommended).
Score: the calibrated signal is P(B) read from the token logprobs at the verdict position, renormalized over the two verdict tokens: P(B) = exp(logprob_B) / (exp(logprob_S) + exp(logprob_B)). Request the top-2 logprobs at that token. Both tasks are ranked by P(B) = "treat as a real secret".

Operating point

The adapter emits a probability; the decision threshold is not baked in. It is selected downstream and frozen on a held-out split:

Recall-favoring. Downstream selects the threshold that maximizes recall subject to a false-positive-rate ceiling (Wilson upper confidence bound), chosen on a held-out split and then frozen.

Loading

This is a standard PEFT LoRA (fw_lora_layout: hf_peft_v1) whose target modules match the base text tower, so it loads with any stack that supports PEFT LoRA on top of Qwen/Qwen3.6-35B-A3B (e.g. transformers + peft, vLLM, TGI, Fireworks). Then drive it with the contract above: system-prompt.txt as the system message, the JSON user message, greedy decoding, thinking off. Two things that bite:

The base is a multimodal Qwen3_5MoeForConditionalGeneration arch, so load the full base and apply the adapter on top; the LoRA only touches the language tower.
The runtime needs an arch new enough to know qwen3_5_moe (for example, transformers >= 4.57).

Provenance & license

Base: Qwen/Qwen3.6-35B-A3B (Fireworks training base qwen3p6-35b-a3b).
Training/eval pipeline lives in the factory monorepo under finetune/.
This adapter inherits the base model's license and is published as a Factory artifact.

Downloads last month: 13

Model tree for factoryai/shield-risk-r16-c15

Base model

Qwen/Qwen3.6-35B-A3B

Adapter

(54)

this model