modernbert-seeks_guidance

Fine-tuned ModernBERT-base classifier that detects whether a user in a multi-turn conversation is seeking legal guidance.

Part of the Legal QA collection.

Model description

Stage 1 of a two-model encoder routing pipeline:

| Stage | Model | Input | Output |
|---|---|---|---|
| 1 | modernbert-seeks_guidance | Full conversation (user + assistant) | seeks_legal_guidance (True/False) |
| 2 | modernbert-primary_topic | User turns only | Primary legal topic (14 labels + non-guidance) |

modernbert-seeks_guidance builds on ModernBERT's 8192-token context window and was trained with max_length=4096, so long WildChat threads can be classified well past the 512-token limit of classic BERT encoders. Text beyond 4096 tokens is truncated.
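The stage table implies a routing flow. The following is a minimal sketch of one way to chain the two stages. It assumes stage 2 is consulted only when stage 1 returns True; the usage example below instead runs both models unconditionally. `route`, `seeks_guidance`, and `primary_topic` are hypothetical names. The classifiers are passed in as plain callables so the routing logic can be exercised without downloading any model.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]

def route(
    messages: List[Message],
    seeks_guidance: Callable[[str], bool],  # hypothetical stage-1 wrapper
    primary_topic: Callable[[str], str],    # hypothetical stage-2 wrapper
) -> Optional[str]:
    """Return a topic label for guidance-seeking conversations, or None otherwise."""
    # Stage 1 input: every turn, serialized as "Role: content" lines.
    full_text = "\n".join(
        f"{m['role'].capitalize()}: {m['content']}" for m in messages
    )
    if not seeks_guidance(full_text):
        return None
    # Stage 2 input: user turns only, same serialization.
    user_text = "\n".join(
        f"{m['role'].capitalize()}: {m['content']}"
        for m in messages
        if m["role"] == "user"
    )
    return primary_topic(user_text)
```

With real models, `seeks_guidance` would wrap the stage-1 classifier and map its predicted label to a bool. The exact label strings depend on each model's `config.id2label`.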

Results

| Split | N | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Validation (best checkpoint) | 106 | 94.34% | 94.58% | 94.34% | 94.32% |
| Test (held-out) | 107 | 87.85% | 90.14% | 87.85% | 87.59% |
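Recall equals accuracy on both rows. This is consistent with support-weighted averaging, where weighted recall always reduces to accuracy, and with the weighted-F1 checkpoint criterion listed under Training procedure. For scale, 87.85% test accuracy corresponds to 94 of 107 conversations, and 94.34% validation accuracy corresponds to 100 of 106.

Below is a minimal pure-Python sketch of weighted precision, recall, and F1. It is written for illustration and is not taken from the evaluation code; `weighted_prf` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Hashable, List

def weighted_prf(y_true: List[Hashable], y_pred: List[Hashable]) -> Dict[str, float]:
    """Per-class precision/recall/F1 averaged with weights = class support / N."""
    n = len(y_true)
    support = Counter(y_true)
    p_sum = r_sum = f_sum = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / n  # support weight of this class
        p_sum += weight * precision
        r_sum += weight * recall  # sums to (total true positives) / n, i.e. accuracy
        f_sum += weight * f1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": p_sum, "recall": r_sum, "f1": f_sum}
```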

Usage

Pipeline (recommended)

Run both classifiers on a conversation stored as a list of {"role": "...", "content": "..."} messages:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def serialize(messages, input_mode="full"):
    """Render turns as "Role: content" lines; input_mode="user" keeps only user turns."""
    lines = []
    for msg in messages:
        role = msg["role"]
        if input_mode == "user" and role != "user":
            continue
        lines.append(f"{role.capitalize()}: {msg['content']}")
    return "\n".join(lines)

def predict(model_id, text):
    """Classify one serialized conversation and return the predicted label string."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    # max_length=4096 matches the training setup of modernbert-seeks_guidance.
    enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    pred_id = logits.argmax(dim=-1).item()
    # id2label is keyed by int once the config is loaded, so index with the int id.
    return model.config.id2label[pred_id]

conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]

# Stage 1 sees the full conversation; stage 2 sees user turns only.
seeks = predict(
    "AmirMohseni/modernbert-seeks_guidance",
    serialize(conversation, input_mode="full"),
)
topic = predict(
    "AmirMohseni/modernbert-primary_topic",
    serialize(conversation, input_mode="user"),
)
print(seeks, topic)  # label strings come from each model's config.id2label
```

Single model

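```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmirMohseni/modernbert-seeks_guidance",
)
# Same "Role: content" serialization as in the pipeline example above.
text = "User: Can my landlord evict me?\nAssistant: It depends on your lease.\nUser: I'm in California."
# Tokenizer kwargs are forwarded, so long threads are truncated to the 4096-token training length.
print(classifier(text, truncation=True, max_length=4096))
```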

Intended uses & limitations

Use for: routing or filtering English multi-turn chat logs before legal QA, topic assignment, or human review.

Do not use for: legal advice, high-stakes decisions without human review, or non-English or jurisdiction-specific deployments without further evaluation.

Caveats: labels are silver labels generated by GPT-5.4; English only; trained and evaluated on a class-balanced dataset, while real traffic may be skewed, so the reported metrics may not carry over directly.
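Skewed traffic matters because precision depends on how common the positive class is. The following is a minimal sketch of that effect. It assumes the true-positive and false-positive rates measured on a balanced set carry over unchanged and that only the class prior shifts. The 0.90 and 0.10 rates are hypothetical placeholders, not measured values for this model, and `precision_at_prevalence` is a hypothetical helper.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Positive-class precision when positives make up `prevalence` of traffic.

    Assumes TPR and FPR are unchanged from the balanced evaluation set.
    """
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return 0.0  # nothing predicted positive
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point, NOT measured for this model.
tpr, fpr = 0.90, 0.10
for prevalence in (0.5, 0.1, 0.02):
    print(f"prevalence={prevalence:.2f} precision={precision_at_prevalence(tpr, fpr, prevalence):.3f}")
```

With these made-up rates, precision is 0.90 at 50% prevalence, 0.50 at 10%, and about 0.16 at 2%. The sharp drop is why evaluation on real traffic is worthwhile before relying on the filter.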

Training data

Conversations come from allenai/WildChat-1M, labeled for legal guidance, topic, and uncertainty, then resampled into a balanced set:

Dataset: AmirMohseni/WildChat-Legal-Classification-V2-Balanced

  • Equal numbers of seeks_legal_guidance=true and seeks_legal_guidance=false rows, with uncertainty-balanced sampling of the non-legal rows
  • Splits: train 1909 · val 106 · test 107
  • Input for modernbert-seeks_guidance: all turns serialized as Role: content lines (see usage example above)
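The exact resampling procedure, including the uncertainty-balanced sampling of non-legal rows, is not specified here. Below is a minimal sketch of plain random undersampling to equal label counts. It assumes rows are dicts with a boolean `seeks_legal_guidance` field. `balance_by_label` is a hypothetical helper, and it does not reproduce the uncertainty stratification.

```python
import random
from collections import defaultdict
from typing import Dict, List

def balance_by_label(
    rows: List[Dict], key: str = "seeks_legal_guidance", seed: int = 0
) -> List[Dict]:
    """Randomly undersample every label group to the size of the smallest group."""
    if not rows:
        return []
    groups: Dict[object, List[Dict]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    target = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced: List[Dict] = []
    for label in sorted(groups, key=str):  # stable group order
        balanced.extend(rng.sample(groups[label], target))
    rng.shuffle(balanced)
    return balanced
```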

Training procedure

| Setting | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Input mode | Full conversation |
| Max length | 4096 tokens |
| Learning rate | 8e-5 |
| Epochs | 8 |
| Effective batch size | 32 (batch size 8 × 4 gradient-accumulation steps) |
| Best checkpoint | Highest weighted F1 on validation |
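The step counts in the training log follow from the split size and the batch settings. This worked check makes two assumptions: single-device training (batch size 8, 4 accumulation steps), and that the final partial accumulation window still counts as an optimizer step. The log's fractional epochs (for example, 0.84 at step 50) match progress measured in micro-batches rather than optimizer steps. That is an inference from the numbers, not something this card states.

```python
import math

TRAIN_ROWS = 1909     # train split size from "Training data"
PER_DEVICE_BATCH = 8  # assumption: single device, so 8 × 4 = 32 effective
GRAD_ACCUM = 4
EPOCHS = 8

# Micro-batches (forward/backward passes) per epoch; the last one is partial.
BATCHES_PER_EPOCH = math.ceil(TRAIN_ROWS / PER_DEVICE_BATCH)  # 239
# Optimizer steps per epoch, counting the final partial accumulation window.
STEPS_PER_EPOCH = math.ceil(BATCHES_PER_EPOCH / GRAD_ACCUM)   # 60
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                        # 480

def epoch_at(step: int) -> float:
    """Fractional epoch after `step` optimizer steps, measured in micro-batches."""
    full_epochs, step_in_epoch = divmod(step, STEPS_PER_EPOCH)
    batches_done = step_in_epoch * GRAD_ACCUM
    return round(full_epochs + batches_done / BATCHES_PER_EPOCH, 2)

print(STEPS_PER_EPOCH, TOTAL_STEPS)
for step in (10, 50, 60, 80, 210, 480):
    print(step, epoch_at(step))
```

This prints `60 480`, followed by epochs 0.17, 0.84, 1.0, 1.33, 3.5, and 8.0 for the listed steps. These values match the Epoch column of the training log.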
Full training log
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 1.7579 | 0.17 | 10 | 0.3339 | 0.8491 | 0.8669 | 0.8491 | 0.8464 |
| 0.7839 | 0.33 | 20 | 0.3613 | 0.8679 | 0.8807 | 0.8679 | 0.8662 |
| 0.9567 | 0.50 | 30 | 0.3083 | 0.8396 | 0.8506 | 0.8396 | 0.8390 |
| 1.2581 | 0.67 | 40 | 0.3498 | 0.8774 | 0.8835 | 0.8774 | 0.8765 |
| 0.6585 | 0.84 | 50 | 0.2971 | 0.9151 | 0.9163 | 0.9151 | 0.9149 |
| 0.5685 | 1.0 | 60 | 0.3065 | 0.8774 | 0.8783 | 0.8774 | 0.8771 |
| 0.4877 | 1.33 | 80 | 0.2511 | 0.8962 | 0.8963 | 0.8962 | 0.8962 |
| 0.1317 | 2.0 | 120 | 0.2526 | 0.9245 | 0.9245 | 0.9245 | 0.9245 |
| 0.5604 | 3.50 | 210 | 0.3071 | 0.9434 | 0.9458 | 0.9434 | 0.9432 |
| 0.0000 | 8.0 | 480 | 0.6879 | 0.9340 | 0.9377 | 0.9340 | 0.9337 |
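The "highest weighted F1 on validation" rule can be applied directly to the logged rows. The following is a minimal sketch using (step, F1) pairs transcribed from the log above. It mirrors the selection rule in plain Python and is not the training script; `best_checkpoint` is a hypothetical helper. Over these rows, it picks step 210 (epoch 3.50, F1 0.9432), which matches the Validation row in Results, rather than the final step 480.

```python
from typing import List, Tuple

# (step, validation F1) pairs transcribed from the training log above.
LOG: List[Tuple[int, float]] = [
    (10, 0.8464), (20, 0.8662), (30, 0.8390), (40, 0.8765), (50, 0.9149),
    (60, 0.8771), (80, 0.8962), (120, 0.9245), (210, 0.9432), (480, 0.9337),
]

def best_checkpoint(log: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, f1) row with the highest F1; max() keeps the earliest on ties."""
    return max(log, key=lambda row: row[1])

print(best_checkpoint(LOG))
```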

Framework versions

  • Transformers 5.8.1 · PyTorch 2.10.0 · Datasets 4.8.5
Model size: 0.1B params · Safetensors · tensor type F32