PCM Benchmark-Grounded DeBERTa

This model is a DeBERTa-based binary classifier that predicts whether a proposed web-agent action complies with or violates a given policy.

Input format

[POLICY] ... [SEP] [CONTEXT] ... [SEP] [ACTION] ...
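As a minimal sketch of how an input string in this format might be assembled: the helper name `build_input` is hypothetical, and the single-space separation and literal `[SEP]` text are assumptions read off the template above, not confirmed details of the training pipeline.

```python
def build_input(policy: str, context: str, action: str) -> str:
    """Join the three fields using the [POLICY]/[CONTEXT]/[ACTION] template.

    Assumption: fields are separated by a literal " [SEP] " and each marker
    is followed by a single space, as the template in this card suggests.
    """
    parts = [
        f"[POLICY] {policy.strip()}",
        f"[CONTEXT] {context.strip()}",
        f"[ACTION] {action.strip()}",
    ]
    return " [SEP] ".join(parts)


# Illustrative values only; they are not taken from the benchmark.
example = build_input(
    policy="Never submit payment details without user confirmation.",
    context="Checkout page with a pre-filled card form.",
    action="click(#submit-payment)",
)
print(example)
```

The resulting string would then be passed to the model's tokenizer. Whether the tokenizer treats `[SEP]` as a special token depends on how the model was trained, which this card does not state.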

Evaluation summary

Standard test split

  • Precision: 0.9972
  • Recall: 1.0000
  • F1: 0.9986
  • FPR: 0.0028
  • ROC-AUC: 1.0000

Challenge split

  • Precision: 1.0000
  • Recall: 0.8424
  • F1: 0.9145
  • FPR: 0.0000
  • ROC-AUC: 0.9792
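
The reported F1 scores can be cross-checked against the precision and recall figures above, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this card.
standard_f1 = f1_score(0.9972, 1.0000)
challenge_f1 = f1_score(1.0000, 0.8424)

print(round(standard_f1, 4))   # 0.9986
print(round(challenge_f1, 4))  # 0.9145
```

Both values match the card. On the challenge split, a recall of 0.8424 means that roughly 15.8% of policy violations are missed, while the FPR of 0.0000 means no compliant action was flagged.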

Notes

  • Positive label: 1 = policy violation
  • Negative label: 0 = compliant action
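
To illustrate the label convention, here is a minimal sketch that turns a pair of output logits into a decision. It assumes the model emits two logits ordered as `[compliant, violation]` and that a 0.5 threshold on the violation probability is used; neither detail is stated in this card, and the function names are hypothetical.

```python
import math

# Label mapping as given in the Notes section.
ID2LABEL = {0: "compliant action", 1: "policy violation"}


def violation_probability(logits):
    """Softmax probability of class 1, assuming logits = [compliant, violation]."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def decide(logits, threshold=0.5):
    """Return (label, probability); the 0.5 threshold is an assumption."""
    p = violation_probability(logits)
    label_id = 1 if p >= threshold else 0
    return ID2LABEL[label_id], p


label, prob = decide([-1.0, 2.0])
print(label, round(prob, 3))
```

Given the challenge-split profile (perfect precision, lower recall), lowering the threshold is one way a deployment might trade a few false positives for fewer missed violations, though the effect would need to be measured on held-out data.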