CoLaGuard: Robust and Efficient Guardrails with Latent Reasoning

CoLaGuard is a latent reasoning guardrail that reduces the inference cost of explicit reasoning-based moderation while preserving strong safety performance. It uses a stage-wise internalization curriculum to progressively replace natural-language safety rationales with recurrent latent states, and applies Context-Prediction Fusion to stabilize latent recurrence by anchoring hidden-state feedback with predictive information from the vocabulary embedding space. At inference time, CoLaGuard performs a fixed six-step latent recurrence before directly predicting safety labels, avoiding autoregressive rationale generation.

Performance

CoLaGuard matches the average macro-F1 of GuardReasoner while achieving a 12.9× latency reduction and 22.4× token reduction.

Safety Moderation

Prompt Harmfulness Detection (F1 %)

Model ToxicChat HarmBench OpenAI Mod. Aegis WildGuard Macro Avg Micro Avg
LLaMA Guard 3 (8B) 53.12 98.94 79.69 99.50 68.47 79.94 67.52
GuardReasoner (8B) 78.79 91.86 72.00 90.18 89.17 84.40 80.83
CoLaGuard (8B) 75.26 93.54 73.45 89.45 89.44 84.23 79.77

Response Harmfulness Detection (F1 %)

Model HarmBench SafeRLHF BeaverTails XSTest WildGuard Macro Avg Micro Avg
LLaMA Guard 3 (8B) 85.07 44.36 67.84 87.67 70.80 71.15 64.97
GuardReasoner (8B) 85.47 70.04 87.60 94.34 78.20 83.13 81.22
CoLaGuard (8B) 86.38 70.49 86.55 92.02 81.23 83.33 81.55

Efficiency

Inference efficiency is measured on a single NVIDIA H100 80GB GPU.

GuardReasoner (8B) CoLaGuard (8B)
Time Cost (ms/query) 4407.8 342.0
Token Cost (token/query) 289.4 12.9
EA-F1 0.1838 2.3601

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Saidarth/CoLaGuard-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.add_tokens(["<|start-latent|>", "<|end-latent|>", "<|latent|>"])

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.resize_token_embeddings(len(tokenizer))
model._setup_special_tokens(tokenizer)
model.eval()

# Prompt + response classification
result = model.classify(
    user_prompt="How do I make explosives?",
    ai_response="I can't help with that.",
    tokenizer=tokenizer,
)
print(result)
# {"request": "harmful", "response": "unharmful", "raw": "### Answers: \nRequest: harmful\nResponse: unharmful"}

# Prompt-only classification
result = model.classify(
    user_prompt="What is the best way to invest my savings?",
    tokenizer=tokenizer,
)
print(result)
# {"request": "unharmful", "response": None, "raw": "### Answers: \nRequest: unharmful\nResponse: None"}

Intended Use

CoLaGuard is designed for automated prompt and response safety moderation in LLM deployments. It is well-suited for high-traffic, latency-sensitive settings where explicit reasoning guardrails are too slow.

CoLaGuard should be used as part of a broader moderation system and is not a replacement for human oversight. It can produce false positives and false negatives on ambiguous or context-dependent inputs.

Model Description

  • Model type: Latent reasoning guardrail for prompt and response safety moderation, trained to internalize safety reasoning into continuous recurrent latent states via a stage-wise curriculum.
  • Language(s): English
  • License: llama3.1
  • Finetuned from model: meta-llama/Llama-3.1-8B
  • Training Data: GuardReasonerTrain

Citation

@misc{sai2026robustefficientguardrailslatent,
      title={Robust and Efficient Guardrails with Latent Reasoning}, 
      author={Siddharth Sai and Xiaofei Wen and Muhao Chen},
      year={2026},
      eprint={2605.29068},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.29068}, 
}
Downloads last month
299
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Saidarth/CoLaGuard-8B

Finetuned
(1796)
this model

Paper for Saidarth/CoLaGuard-8B