acuvity/intent-action

Model Description

acuvity/intent-action is a MiniLM-L6 cross-encoder fine-tuned for tool-use alignment detection: it classifies whether an AI agent's action is aligned with the task it was given.

  • Architecture: cross-encoder/ms-marco-MiniLM-L6-v2 (22M parameters)
  • Task: Binary classification, aligned (0) vs. misaligned (1)
  • Training method: Supervised fine-tuning with binary cross-entropy (BCE) loss
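The card says only that training used binary cross-entropy with aligned = 0 and misaligned = 1. As a minimal sketch, assuming the model emits a single logit that is passed through a sigmoid (the usual setup for a one-label cross-encoder), the per-example loss can be written in plain Python. The helper names are illustrative and not part of any library.

```python
import math

# Label convention from the card: 0 = aligned, 1 = misaligned.
ALIGNED, MISALIGNED = 0, 1


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit: float, label: int) -> float:
    """-[y*log(p) + (1 - y)*log(1 - p)] with p = sigmoid(logit), in a stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels) -> float:
    """Average per-example loss over a batch."""
    pairs = list(zip(logits, labels))
    return sum(bce_with_logits(x, y) for x, y in pairs) / len(pairs)


# A confident "misaligned" logit is cheap on a misaligned pair
# and expensive on an aligned one.
print(bce_with_logits(4.0, MISALIGNED), bce_with_logits(4.0, ALIGNED))
```

The stable form avoids computing log(sigmoid(x)) directly, which underflows for large negative logits.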

Training Hyperparameters

Parameter                                  Value
Dataset                                    acuvity/tool-use-alignment
Epochs                                     5
Batch size                                 32
Learning rate                              1e-5
Weight decay                               0.01
Warmup ratio                               0.1
LLRD (layer-wise learning-rate decay)      No
Early stopping patience                    3
Evaluation interval (steps)                1000
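The warmup ratio and early-stopping settings only become concrete once the total step count is known, and the card does not report it. The sketch below assumes a linear warmup followed by linear decay (a common default, not confirmed by the card) and assumes that patience counts evaluations of a higher-is-better metric. `N_TRAIN` is a hypothetical training-set size used only to make the arithmetic concrete.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-5,
               warmup_ratio: float = 0.1) -> float:
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


class EarlyStopping:
    """Signal a stop after `patience` evaluations in a row without improvement."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best = -math.inf
        self.bad_evals = 0

    def update(self, metric: float) -> bool:
        """Record one evaluation result; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical training-set size; the card does not report it.
N_TRAIN, BATCH_SIZE, EPOCHS, EVAL_EVERY = 64_000, 32, 5, 1000
total_steps = math.ceil(N_TRAIN / BATCH_SIZE) * EPOCHS
print("total steps:", total_steps)
print("warmup steps:", math.ceil(total_steps * 0.1))
print("steps without improvement before stopping:", 3 * EVAL_EVERY)
```

Under these assumptions, evaluating every 1000 steps with a patience of 3 means training halts after 3000 consecutive steps with no improvement in the monitored metric.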

Test Results

Metric               Value
F1                   0.9917
AUPR                 0.9989
Precision            0.994
Recall               0.9895
TP / TN / FP / FN    8,394 / 8,398 / 71 / 75 (n = 16,938)
Threshold (τ)        0.8492
Temperature (T)      1
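The headline metrics can be derived from the confusion-matrix counts. This sketch assumes "misaligned" (label 1) is the positive class, matching the label convention above; the function name is illustrative.

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1 and accuracy with 'misaligned' (1) as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


m = metrics_from_counts(tp=8394, tn=8398, fp=71, fn=75)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts the sketch gives precision ≈ 0.9916 and recall ≈ 0.9911, slightly different from the Precision and Recall rows (0.994 and 0.9895); the card does not state how those rows were computed.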

[Figures not reproduced: training curves; ROC and PR curves; confusion matrix.]

Calibration & Safety Gating

Each score falls into one of four confidence bands, each with a recommended action for production use:

Band                 Condition                    Action
SAFE                 score < 0.8492               Aligned: execute
Low confidence       0.8492 ≤ score < 0.9995      Warn / log
Medium confidence    0.9995 ≤ score < 1.0005      Ask for confirmation (precision ≥ 85%)
High confidence      score ≥ 1.0005               Block (precision ≥ 95%)
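Here is a minimal sketch of the gating rule. The thresholds are copied verbatim from the band table, and lower bounds are treated as inclusive, matching the ≤ signs. The `Band` type and `gate` function are illustrative names. If scores are sigmoid outputs, they never exceed 1.0, so the High confidence band (score ≥ 1.0005) would never be selected as written.

```python
from typing import NamedTuple


class Band(NamedTuple):
    name: str
    action: str


# (inclusive lower bound, band), checked from the highest bound down.
# Bounds are copied from the table; names and actions mirror it.
GATES = [
    (1.0005, Band("High confidence", "block")),
    (0.9995, Band("Medium confidence", "ask for confirmation")),
    (0.8492, Band("Low confidence", "warn / log")),
]
SAFE = Band("SAFE", "execute")


def gate(score: float) -> Band:
    """Return the confidence band for a misalignment score."""
    for lower_bound, band in GATES:
        if score >= lower_bound:
            return band
    return SAFE


for s in (0.30, 0.90, 0.9997):
    print(f"{s:.4f} -> {gate(s).name}: {gate(s).action}")
```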

Calibration parameters are saved in calibration.json.
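The results table lists a temperature T = 1 next to the threshold, which suggests temperature scaling of the model's raw logit before the sigmoid. The sketch below assumes that technique and a hypothetical calibration.json layout with top-level `threshold` and `temperature` keys. The real file's schema is not documented here, so check it before relying on these names. With T = 1 the calibrated score is just the plain sigmoid.

```python
import json
import math


def load_calibration(path: str) -> dict:
    """Read calibration parameters from JSON.

    The keys "threshold" and "temperature" are hypothetical; the defaults
    are the values listed in the card's results table.
    """
    with open(path, encoding="utf-8") as fh:
        params = json.load(fh)
    return {
        "threshold": float(params.get("threshold", 0.8492)),
        "temperature": float(params.get("temperature", 1.0)),
    }


def calibrated_score(logit: float, temperature: float) -> float:
    """Temperature scaling: sigmoid(logit / T), computed in a stable way."""
    z = logit / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A temperature above 1 pulls scores toward 0.5, and a temperature below 1 sharpens them. Because T = 1 here, the calibration step leaves the model's sigmoid scores unchanged.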

Usage

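```python
from sentence_transformers import CrossEncoder

# Single-output cross-encoder; with num_labels=1, sentence-transformers
# applies a sigmoid by default, so scores lie in [0, 1].
model = CrossEncoder("acuvity/intent-action", num_labels=1)

# Input is a (task, action) pair: the task first, the proposed tool call second.
task = "Send an email to alice@example.com with subject 'Meeting'"
# The recipient differs from the task (bob instead of alice): a misaligned action.
action = "[TOOL] send_email\n[ARGS] {\"to\": \"bob@example.com\", \"subject\": \"Meeting\"}"

score = float(model.predict([[task, action]])[0])

# Apply the calibrated threshold; scores at or above it are treated as misaligned
threshold = 0.8492
is_misaligned = score >= threshold
print(f"Score: {score:.4f} | Misaligned: {is_misaligned}")
```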
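The action string in the usage example follows a `[TOOL] <name>` line followed by an `[ARGS] <JSON>` line. The helpers below are a sketch that assumes this layout, inferred from that single example, is the one the model was trained on; `format_action` and `score_pairs` are hypothetical names. `score_pairs` takes an already-loaded model and relies only on `predict` accepting a list of pairs.

```python
import json
from typing import Any, List, Mapping, Sequence, Tuple


def format_action(tool: str, args: Mapping[str, Any]) -> str:
    """Build an action string: a '[TOOL] <name>' line, then an '[ARGS] <json>' line."""
    return f"[TOOL] {tool}\n[ARGS] {json.dumps(dict(args))}"


def score_pairs(model, pairs: Sequence[Tuple[str, str]],
                threshold: float = 0.8492) -> List[Tuple[float, bool]]:
    """Score (task, action) pairs in one batch and flag scores at or above the threshold."""
    scores = model.predict([list(pair) for pair in pairs])
    return [(float(s), float(s) >= threshold) for s in scores]
```

With the model loaded as in the usage example, `score_pairs(model, [(task, format_action("send_email", {"to": "alice@example.com", "subject": "Meeting"}))])` scores a corrected version of the same call in a single batch.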
Weights: Safetensors, 22.7M parameters, F32 tensors.
