Meta-Llama-3-8B-Instruct · DRIP (SEP, 3-role)

A prompt-injection-hardened version of meta-llama/Meta-Llama-3-8B-Instruct, trained with DRIP (Defending Prompt Injection via Token-wise Representation Editing and Residual Fusion).

This is the 3-role text variant (TextTextText). Chat format: systemuser (untrusted) → assistant, where injected content lives in the user turn. Meta-Llama-3 has no tool role, so this checkpoint is not tuned for tool-calling.

What DRIP does

DRIP adds two architectural modifications on top of the base model so that adversarial instructions hidden in the untrusted data section are treated as inert data rather than commands:

  • Token-wise de-instruction shift — moves the representation of data tokens away from directive semantics.
  • Residual re-instruction fusion — a residual path that keeps generation anchored on the legitimate top-level instruction.

Training

Base model meta-llama/Meta-Llama-3-8B-Instruct
Objective DPO
Architecture DRIP fuse (LlamaForCausalLMDRIP)
Delimiter TextTextText (3-role)
Training data SEP DPO pairs (datasets/sep/sep_data_cleaned_dpo_gpt.json)
Epochs 1

Untrusted/injected data is placed in the user turn: <|eot_id|><|start_header_id|>user<|end_header_id|>.

How to use

⚠️ This checkpoint is not a drop-in AutoModelForCausalLM. DRIP is an architectural modification, and the model is released as a LoRA adapter, so you must merge it with the custom LlamaForCausalLMDRIP class before use.

git clone https://github.com/lindsey98/PromptInjection
cd PromptInjection
bash setup_env.sh && conda activate prompt

# download + merge the adapter into a full checkpoint
huggingface-cli download Kelsey98/Meta-Llama-3-8B-Instruct-TextTextText-drip \
    --local-dir Meta-Llama-3-8B-Instruct-TextTextText-drip
CUDA_VISIBLE_DEVICES=0 python -m training.merge_lora \
    --adapter_path Meta-Llama-3-8B-Instruct-TextTextText-drip/ \
    --output_path  Meta-Llama-3-8B-Instruct-TextTextText-drip-merged/ \
    --base_model_path meta-llama/Meta-Llama-3-8B-Instruct \
    --customized_model_class LlamaForCausalLMDRIP

Then point the general (text) evaluation scripts at the merged path — e.g. SEP score, Alpaca injection ASR, InjecAgent, and the utility benchmarks. See the evaluation guide.

Intended use & limitations

  • Intended use: research on prompt-injection defenses (text / single-turn).
  • Scope: 3-role text setting only; for tool-calling agents use the 4-role Llama-3.1 checkpoint instead.
  • DRIP reduces—but does not eliminate—prompt-injection risk; do not rely on it as the sole safeguard in production.

Citation

📌 This work is not yet officially published. Citation details will be added once the paper is released.

Code: https://github.com/lindsey98/PromptInjection

License inherited from the base model: Meta Llama 3 Community License.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Kelsey98/Meta-Llama-3-8B-Instruct-TextTextText-drip

Finetuned
(1128)
this model