acuvity/intent-action

Model Description

acuvity/intent-action is a MiniLM-L6 cross-encoder fine-tuned for tool-use alignment detection: it classifies whether an AI agent's action is aligned with the task it was given.

Architecture: cross-encoder/ms-marco-MiniLM-L6-v2 (22.7M parameters)
Task: Binary classification, aligned (0) vs. misaligned (1)
Training method: Supervised fine-tuning with binary cross-entropy (BCE) loss
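
For reference, the binary cross-entropy loss named above can be sketched for a single example as follows. This is a minimal illustration, assuming the model emits one raw logit per (task, action) pair and that label 1 means misaligned; the function name `bce_with_logit` is hypothetical and is not taken from the training code.

```python
import math


def bce_with_logit(logit: float, label: int) -> float:
    """Binary cross-entropy for one example, computed from a raw logit.

    Uses the numerically stable form
        max(z, 0) - z * y + log(1 + exp(-|z|))
    which equals -[y * log(p) + (1 - y) * log(1 - p)] with p = sigmoid(z).
    """
    z, y = logit, float(label)
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# A confident "misaligned" logit is cheap when the label is 1 (misaligned)
# and expensive when the label is 0 (aligned).
loss_if_misaligned = bce_with_logit(4.0, 1)
loss_if_aligned = bce_with_logit(4.0, 0)
```

The stable form avoids overflow for large-magnitude logits, which a direct `log(sigmoid(z))` computation would not.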

Training Hyperparameters

Parameter	Value
Dataset	acuvity/tool-use-alignment
Epochs	5
Batch size	32
Learning rate	1.00e-05
Weight decay	0.01
Warmup ratio	0.1
Layer-wise LR decay (LLRD)	No
Early stopping patience	3
Eval every (steps)	1000
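
To make the interaction of these settings concrete, the sketch below derives the number of warmup steps and shows one way early stopping with a patience of 3 could behave. It is an illustration only: the card does not state the training-set size, the monitored metric, or the scheduler, so `num_train_examples`, the "higher is better" assumption, and both function names are hypothetical.

```python
import math

EPOCHS = 5
BATCH_SIZE = 32
WARMUP_RATIO = 0.1
PATIENCE = 3  # evaluations without improvement before stopping


def warmup_steps(num_train_examples: int) -> int:
    """Number of optimizer steps spent warming up the learning rate."""
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    total_steps = steps_per_epoch * EPOCHS
    return int(total_steps * WARMUP_RATIO)


def should_stop(eval_scores: list[float], patience: int = PATIENCE) -> bool:
    """True once the last `patience` evaluations failed to beat the earlier best.

    Assumes a higher score is better (for example F1 on a validation split).
    """
    if len(eval_scores) <= patience:
        return False
    best_before = max(eval_scores[:-patience])
    return max(eval_scores[-patience:]) <= best_before
```

With an evaluation every 1000 steps, a patience of 3 under this reading means training halts after 3000 steps without a new best score.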

Test Results

Metric	Value
F1	0.9917
AUPR	0.9989
Precision	0.994
Recall	0.9895
TP / TN / FP / FN	8,394 / 8,398 / 71 / 75 (n=16,938)
Threshold (τ)	0.8492
Temperature (T)	1
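
The threshold τ and temperature T above can be read as a two-step decision rule, sketched below. This assumes the usual temperature-scaling convention, in which the raw logit is divided by T before the sigmoid; the card does not confirm that this is how the score is produced, and the function names are hypothetical.

```python
import math

TAU = 0.8492       # decision threshold from the table above
TEMPERATURE = 1.0  # T = 1 leaves the logit unchanged


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def calibrated_score(logit: float, temperature: float = TEMPERATURE) -> float:
    """Sigmoid of the temperature-scaled logit (assumed convention)."""
    return _sigmoid(logit / temperature)


def is_misaligned(logit: float) -> bool:
    """Flag the action when the calibrated score reaches the threshold."""
    return calibrated_score(logit) >= TAU
```

Under this assumption, T = 1 means calibration did not rescale the logits, and only the threshold differs from the default 0.5 cut-off.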

Calibration & Safety Gating

Each score maps to a confidence band for production use:

Band	Condition	Action
SAFE	score < 0.8492	Aligned — execute
Low confidence	0.8492 ≤ score < 0.9995	Warn / log
Medium confidence	0.9995 ≤ score < 1.0005	Ask confirmation (precision ≥ 85%)
High confidence	score ≥ 1.0005	Block (precision ≥ 95%)

Calibration parameters are saved in calibration.json.
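
The band table can be turned into a small lookup, sketched below. The lower bounds are copied from the table; the `Band` enum and `band_for` function are hypothetical names. The schema of calibration.json is not described in this card, so the bounds are hard-coded here instead of being read from that file. Note also that the 1.0005 bound lies above 1, so the top band can only be reached if scores are not confined to [0, 1]; the card does not say which range applies.

```python
from enum import Enum


class Band(Enum):
    SAFE = "execute"
    LOW = "warn_or_log"
    MEDIUM = "ask_confirmation"
    HIGH = "block"


# Lower bounds copied from the table above, highest first.
BAND_LOWER_BOUNDS = [
    (1.0005, Band.HIGH),
    (0.9995, Band.MEDIUM),
    (0.8492, Band.LOW),
]


def band_for(score: float) -> Band:
    """Return the first band whose lower bound the score reaches."""
    for lower, band in BAND_LOWER_BOUNDS:
        if score >= lower:
            return band
    return Band.SAFE
```

Checking the bounds from highest to lowest keeps each band's condition to a single comparison and makes the half-open intervals in the table explicit.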

Usage

from sentence_transformers import CrossEncoder

model = CrossEncoder("acuvity/intent-action", num_labels=1)

# Example pair: the task names alice@example.com, but the action emails bob@example.com
task = "Send an email to alice@example.com with subject 'Meeting'"
action = "[TOOL] send_email\n[ARGS] {\"to\": \"bob@example.com\", \"subject\": \"Meeting\"}"

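# predict() takes a list of (task, action) pairs and returns one score per pair
score = float(model.predict([[task, action]])[0])

# Apply the calibrated threshold; scores at or above it fall outside the SAFE band
threshold = 0.8492
is_misaligned = score >= threshold
print(f"Score: {score:.4f} | Misaligned: {is_misaligned}")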
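
When an agent proposes several tool calls for one task, the same pattern extends to a batch. The sketch below builds action strings in the `[TOOL] ... [ARGS] ...` layout used in the example and scores them in one `predict` call. The helper names `format_action` and `flag_misaligned` are hypothetical, and serializing arguments with `json.dumps` is an assumption; the card does not specify how arguments were serialized in the training data.

```python
import json
from typing import Any, Protocol, Sequence

THRESHOLD = 0.8492  # calibrated threshold from this card


class PairScorer(Protocol):
    """Anything with a CrossEncoder-style predict() over text pairs."""

    def predict(self, pairs: Sequence[Sequence[str]]) -> Any: ...


def format_action(tool: str, args: dict[str, Any]) -> str:
    """Render a tool call in the '[TOOL] name / [ARGS] json' layout."""
    return f"[TOOL] {tool}\n[ARGS] {json.dumps(args)}"


def flag_misaligned(
    model: PairScorer, task: str, actions: Sequence[str]
) -> list[bool]:
    """Score every (task, action) pair in one batch and apply the threshold."""
    pairs = [[task, action] for action in actions]
    scores = model.predict(pairs)
    # Flag scores at or above the threshold, matching the band table.
    return [float(s) >= THRESHOLD for s in scores]
```

A `CrossEncoder("acuvity/intent-action", num_labels=1)` instance can be passed as `model`; the protocol only requires a `predict` method, which also makes the function easy to unit-test with a stub.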

Safetensors

Model size: 22.7M params
Tensor type: F32

Model tree for acuvity/14.0.1-ms-marco-MiniLM-L6-v2

Base model: microsoft/MiniLM-L12-H384-uncased
Quantized: cross-encoder/ms-marco-MiniLM-L12-v2
Quantized: cross-encoder/ms-marco-MiniLM-L6-v2
Finetuned (57): this model

Evaluation results (Acuvity Tool-Use Alignment Dataset, test set, self-reported)

Metric	Value
F1	0.992
Precision	0.994
Recall	0.990