superfunguy/pcm-benchmark-grounded-deberta

Tags: Text Classification, policy-compliance, st-webagentbench

PCM Benchmark-Grounded DeBERTa

This model is a policy-compliance classifier for web-agent actions.

Input format

[POLICY] ... [SEP] [CONTEXT] ... [SEP] [ACTION] ...
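
As a minimal sketch of the template above, the helper below assembles the three fields into one string. The function name `build_input` and the example values are hypothetical, not from the model card. It also assumes the separators are written as the literal text `[SEP]` with single spaces around them; the card does not say whether the tokenizer should insert separator tokens instead.

```python
def build_input(policy: str, context: str, action: str) -> str:
    """Join policy, context, and action using the card's template.

    Assumption: "[SEP]" is emitted as literal text, as shown in the
    template. Whether the tokenizer maps it to a special token is not
    documented in the card.
    """
    parts = [
        f"[POLICY] {policy.strip()}",
        f"[CONTEXT] {context.strip()}",
        f"[ACTION] {action.strip()}",
    ]
    return " [SEP] ".join(parts)


# Hypothetical example values, for illustration only.
text = build_input(
    policy="Do not delete user records without confirmation.",
    context="Admin page, user list is open.",
    action="click('Delete user')",
)
print(text)
```

The resulting string would then be passed to the tokenizer as a single sequence.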

Evaluation summary

Standard test

Precision: 0.9972
Recall: 1.0000
F1: 0.9986
FPR: 0.0028
ROC-AUC: 1.0000

Challenge split

Precision: 1.0000
Recall: 0.8424
F1: 0.9145
FPR: 0.0000
ROC-AUC: 0.9792
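
The F1 values reported for both splits can be checked against the precision and recall figures using the standard harmonic-mean definition. The sketch below assumes the card uses that definition; the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the evaluation summary.
standard_f1 = f1_score(0.9972, 1.0000)
challenge_f1 = f1_score(1.0000, 0.8424)

print(round(standard_f1, 4))   # 0.9986
print(round(challenge_f1, 4))  # 0.9145
```

Both results match the reported F1 scores. On the challenge split the drop in F1 comes entirely from recall: precision stays at 1.0000 while recall falls to 0.8424, so the model misses some violations without raising false alarms.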

Notes

Positive label: 1 = policy violation
Negative label: 0 = compliant action
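
A minimal sketch of how the label convention could be applied to the classifier's raw output. It assumes the model emits two logits ordered by label id (index 0 = compliant, index 1 = violation) and that a 0.5 probability threshold is used. Neither detail is stated in the card, and the names `ID2LABEL` and `decode` are hypothetical.

```python
import math

# Label ids as documented in the notes above.
ID2LABEL = {0: "compliant action", 1: "policy violation"}


def decode(logits, threshold=0.5):
    """Turn two raw logits into (label, violation probability).

    Assumption: logits[0] scores label 0 and logits[1] scores label 1.
    The 0.5 threshold is an illustrative default, not from the card.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    # Numerically stable softmax over the two classes.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    p_violation = exps[1] / (exps[0] + exps[1])
    label_id = 1 if p_violation >= threshold else 0
    return ID2LABEL[label_id], p_violation


label, prob = decode([0.0, 2.0])
print(label, round(prob, 4))
```

Raising the threshold trades recall for precision, which may matter given the lower recall on the challenge split.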

Safetensors

Model size: 0.2B params
Tensor type: F32