MVA Call Classifier (v5_1)

Multi-class classifier for caller utterances on outbound AI-agent qualification calls for personal injury (Motor Vehicle Accident) legal referrals in the United States. Fine-tuned from distilbert-base-uncased on ~43k labeled utterances plus ~2k synthetic counter-examples.

Use case

The model classifies short caller utterances (1-2 sentences, ASR-transcribed, lowercase) into one of 39 response types covering qualification answers (e.g. ACC, NACC, INJ, NINJ, AT, NAT), call-state labels (e.g. HOSTILE, CONF, BOT), and overrides (e.g. DNC, AM, BDNC).

Inputs

Inputs are lowercase, ASR-style transcripts. Anything beyond 128 tokens is truncated.
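Text from other sources (typed chat, punctuated transcripts) may need to be brought closer to this format first. The exact normalization applied to the training data is not documented here, so the following is only a minimal sketch under assumed rules: lowercase, drop punctuation except apostrophes, collapse whitespace. The helper name `normalize_utterance` is hypothetical.

```python
import re
import string

# Assumption: ASR output carries no punctuation except apostrophes in contractions.
_PUNCT = string.punctuation.replace("'", "")
_PUNCT_TABLE = str.maketrans({ch: " " for ch in _PUNCT})


def normalize_utterance(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_utterance("Yes, I was in an accident last month!"))
# prints: yes i was in an accident last month
```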

from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
model_id = "a1hmad23/mva-call-classifier-v5-1"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_id)
model = DistilBertForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

text = "yes i was in an accident last month"
# Truncate to the 128-token limit described above.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
# The highest-scoring class id maps to its label name via the model config.
pred_id = logits.argmax(-1).item()
print(model.config.id2label[pred_id])
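The snippet above returns only the single best label. When the runner-up labels and their probabilities are also of interest, a softmax over the logits can be ranked. The following is a minimal sketch using only the standard library; `top_k_labels`, the three-label mapping, and the logit values are hypothetical and chosen purely for illustration.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical mapping and logits, not taken from the real model.
demo_id2label = {0: "ACC", 1: "NACC", 2: "N"}
demo_logits = [2.0, 0.5, 0.1]
for label, prob in top_k_labels(demo_logits, demo_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the loaded model, the equivalent call would be `top_k_labels(logits[0].tolist(), model.config.id2label)`.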

Labels

39 classes. The full mapping is in label2id.json and embedded in config.json. Label semantics, precedence rules, and confusable-neighbor decision rules are documented internally and are not redistributed with this model.
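For code that needs the mapping without loading the model, label2id.json can be read directly. The sketch below assumes the file is a flat JSON object mapping label strings to integer ids, which is not confirmed here; `load_id2label` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_id2label(path):
    """Read a label2id JSON file and return the inverse id -> label mapping."""
    label2id = json.loads(Path(path).read_text(encoding="utf-8"))
    id2label = {int(idx): label for label, idx in label2id.items()}
    # Ids should be unique and contiguous from 0 to line up with the classifier head.
    if sorted(id2label) != list(range(len(label2id))):
        raise ValueError("label ids are not unique and contiguous from 0")
    return id2label
```

A call such as `load_id2label("label2id.json")` should then agree with `model.config.id2label` if both come from the same release.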

Limitations

Trained on US English ASR-style text only.
Designed for short utterances (most under 25 tokens). Longer text is truncated.
The catch-all label N (residual / filler) has lower recall (~0.40) by design: it absorbs ambiguous content that doesn't fit the other 38 categories.
The test set was reviewed once for label noise, but residual annotation errors remain.

Training data

Proprietary call transcripts. Not redistributed.

Citation

Internal model. No public citation.

Model size

67M params, stored as F32 tensors in Safetensors format.