modernbert-seeks_guidance

Fine-tuned ModernBERT-base classifier that detects whether a user in a multi-turn conversation is seeking legal guidance.

Part of the Legal QA collection · Try the interactive demo →

Model description

Stage 1 of a two-model encoder routing pipeline:

Stage	Model	Input	Output
1	`modernbert-seeks_guidance`	Full conversation (user + assistant)	`seeks_legal_guidance` (True/False)
2	`modernbert-primary_topic`	User turns only	Primary legal topic (14 labels + non-guidance)

ModernBERT supports an 8192-token context window, and modernbert-seeks_guidance was trained with max_length=4096. Long WildChat threads are therefore truncated at 4096 tokens rather than at the 512-token limit of classic BERT encoders.
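The two-stage routing in the table above can be sketched as a small function that only calls stage 2 when stage 1 is positive. This is a minimal sketch, not the author's implementation: `route`, `seeks_fn`, `topic_fn`, and `positive_label` are hypothetical names, and the assumption that stage 1 returns the string "True" for a positive prediction should be checked against the model's `id2label` mapping.

```python
from typing import Callable, Optional

def route(
    full_text: str,
    user_text: str,
    seeks_fn: Callable[[str], str],
    topic_fn: Callable[[str], str],
    positive_label: str = "True",  # assumption: stage 1 emits the string "True"
) -> Optional[str]:
    """Return the stage-2 topic, or None when stage 1 finds no legal-guidance request."""
    # Stage 1 sees the full conversation (user + assistant turns).
    if seeks_fn(full_text) != positive_label:
        return None
    # Stage 2 sees user turns only and is skipped for non-guidance conversations.
    return topic_fn(user_text)

# Usage with stand-in classifiers; real code would wrap the two Hub models.
print(route("User: hi\nAssistant: hello", "User: hi",
            seeks_fn=lambda text: "False",
            topic_fn=lambda text: "SOME_TOPIC"))
```

Skipping stage 2 for negative conversations saves one forward pass per non-legal thread.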

Results

Split	N	Accuracy	Precision	Recall	F1
Validation (best checkpoint)	106	94.34%	94.58%	94.34%	94.32%
Test (held-out)	107	87.85%	90.14%	87.85%	87.59%
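In both rows recall equals accuracy, which is what support-weighted averaging produces. The sketch below assumes the reported precision, recall, and F1 are support-weighted averages (the training procedure names weighted F1 as the selection metric); `weighted_metrics` is a hypothetical helper and the label lists are toy data, not model outputs.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus plain accuracy."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / support[label] if support[label] else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = support[label] / n  # each class weighted by its share of true labels
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: one positive is missed.
m = weighted_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0])
print(m)
```

As a sanity check on the table itself, 94.34% of 106 validation rows corresponds to 100 correct predictions, and 87.85% of 107 test rows corresponds to 94 correct predictions.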

Usage

Pipeline (recommended)

Run both classifiers on a conversation stored as a list of {"role": "...", "content": "..."} messages:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def serialize(messages, input_mode="full"):
    lines = []
    for msg in messages:
        role = msg["role"]
        if input_mode == "user" and role != "user":
            continue
        lines.append(f"{role.capitalize()}: {msg['content']}")
    return "\n".join(lines)

def predict(model_id, text):
    # Load the tokenizer and the fine-tuned classifier from the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    # Truncate to the max_length used in training.
    enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    pred_id = logits.argmax(dim=-1).item()
    # id2label is keyed by int once the config is loaded, so index with the int id.
    return model.config.id2label[pred_id]

conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]

seeks = predict(
    "AmirMohseni/modernbert-seeks_guidance",
    serialize(conversation, input_mode="full"),
)
topic = predict(
    "AmirMohseni/modernbert-primary_topic",
    serialize(conversation, input_mode="user"),
)
print(seeks, topic)  # predicted seeks_legal_guidance label, then predicted topic label

Single model

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmirMohseni/modernbert-seeks_guidance",
)
text = "User: Can my landlord evict me?\nAssistant: It depends on your lease.\nUser: I'm in California."
print(classifier(text))
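A text-classification pipeline returns a list of dicts with `label` and `score` keys. Since this card recommends human review for high-stakes use, one option is to forward low-confidence predictions to a reviewer. The sketch below is illustrative only: `triage` is a hypothetical helper, the 0.9 threshold is an arbitrary placeholder that would need tuning on real traffic, and the label string "True" is an assumption about this model's label names.

```python
from typing import Dict, List

def triage(predictions: List[Dict], threshold: float = 0.9) -> str:
    """Return the top label, or 'human_review' when its score is below threshold."""
    top = max(predictions, key=lambda p: p["score"])
    if top["score"] < threshold:
        return "human_review"
    return top["label"]

# Stand-in pipeline outputs (not real model scores).
print(triage([{"label": "True", "score": 0.97}]))
print(triage([{"label": "True", "score": 0.62}]))
```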

Intended uses & limitations

Use for: routing or filtering English multi-turn chat logs before legal QA, topic assignment, or human review.

Do not use for: legal advice, high-stakes decisions without human review, or non-English / jurisdiction-specific deployment without evaluation.

Caveats: labels are silver labels generated by GPT-5.4; English only; trained and evaluated on a class-balanced dataset, so metrics on real traffic, where legal-guidance requests may be rare, can differ.
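To make the class-balance caveat concrete, the sketch below shows how precision for the positive class changes with the share of positive conversations. The 0.88 sensitivity and specificity are illustrative placeholders, not measured per-class rates for this model, and `precision_at_prevalence` is a hypothetical helper.

```python
def precision_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive-class precision implied by Bayes' rule at a given positive rate."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Illustrative rates only: balanced data versus traffic with 5% legal requests.
for prevalence in (0.5, 0.05):
    print(prevalence, round(precision_at_prevalence(0.88, 0.88, prevalence), 3))
```

Under these placeholder rates, precision is 0.88 on balanced data but falls to roughly 0.28 when only 5% of conversations seek legal guidance, which is why evaluation on a sample of real traffic matters before deployment.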

Training data

Conversations come from allenai/WildChat-1M, labeled for legal guidance, topic, and uncertainty, then resampled into a balanced set:

Dataset: AmirMohseni/WildChat-Legal-Classification-V2-Balanced

Equal seeks_legal_guidance=true/false rows with uncertainty-balanced non-legal sampling
Splits: train 1909 · val 106 · test 107
Input for modernbert-seeks_guidance: all turns serialized as Role: content lines (see usage example above)
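For reference, this is what the serialized input looks like for the example conversation from the usage section. The `serialize` function mirrors the one shown there; the "user" mode is included only to contrast with the full-conversation input this model expects.

```python
def serialize(messages, input_mode="full"):
    """Render messages as 'Role: content' lines; 'user' mode keeps user turns only."""
    lines = []
    for msg in messages:
        if input_mode == "user" and msg["role"] != "user":
            continue
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    return "\n".join(lines)

conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]

full_text = serialize(conversation, input_mode="full")
user_text = serialize(conversation, input_mode="user")
print(full_text)
# User: Can my landlord evict me without notice?
# Assistant: Eviction rules depend on your jurisdiction...
# User: I'm in California on a month-to-month lease.
```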

Training procedure

Setting	Value
Base model	`answerdotai/ModernBERT-base`
Input mode	Full conversation
Max length	4096
Learning rate	8e-5
Epochs	8
Effective batch size	32 (8 × 4 grad accum)
Best checkpoint	Highest weighted F1 on validation
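The settings above, combined with the 1909 training rows, imply the step counts that appear in the training log. The arithmetic below is a quick check, assuming a single device and that a trailing partial batch still counts as one optimizer step; `epoch_at` is a hypothetical helper.

```python
import math

train_rows = 1909        # train split size from the dataset section
per_device_batch = 8
grad_accum = 4
epochs = 8

effective_batch = per_device_batch * grad_accum             # 32
steps_per_epoch = math.ceil(train_rows / effective_batch)   # 60
total_steps = steps_per_epoch * epochs                      # 480

def epoch_at(step: int) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return step / steps_per_epoch

print(effective_batch, steps_per_epoch, total_steps, epoch_at(210))
```

These values line up with the log: step 60 is epoch 1.0, step 210 is epoch 3.50, and step 480 is epoch 8.0.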

Full training log

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
1.7579	0.17	10	0.3339	0.8491	0.8669	0.8491	0.8464
0.7839	0.33	20	0.3613	0.8679	0.8807	0.8679	0.8662
0.9567	0.50	30	0.3083	0.8396	0.8506	0.8396	0.8390
1.2581	0.67	40	0.3498	0.8774	0.8835	0.8774	0.8765
0.6585	0.84	50	0.2971	0.9151	0.9163	0.9151	0.9149
0.5685	1.0	60	0.3065	0.8774	0.8783	0.8774	0.8771
0.4877	1.33	80	0.2511	0.8962	0.8963	0.8962	0.8962
0.1317	2.0	120	0.2526	0.9245	0.9245	0.9245	0.9245
0.5604	3.50	210	0.3071	0.9434	0.9458	0.9434	0.9432
0.0000	8.0	480	0.6879	0.9340	0.9377	0.9340	0.9337
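The checkpoint rule from the training procedure (highest weighted F1 on validation) can be checked against the rows listed above. The tuples below copy step, validation loss, and F1 from the log; only the rows shown in this card are included, so the comparison holds for those rows alone.

```python
# (step, validation_loss, weighted_f1) copied from the log rows above
log_rows = [
    (10, 0.3339, 0.8464),
    (20, 0.3613, 0.8662),
    (30, 0.3083, 0.8390),
    (40, 0.3498, 0.8765),
    (50, 0.2971, 0.9149),
    (60, 0.3065, 0.8771),
    (80, 0.2511, 0.8962),
    (120, 0.2526, 0.9245),
    (210, 0.3071, 0.9432),
    (480, 0.6879, 0.9337),
]

best_by_f1 = max(log_rows, key=lambda row: row[2])
best_by_loss = min(log_rows, key=lambda row: row[1])
print("best F1:", best_by_f1)      # step 210
print("lowest loss:", best_by_loss)  # step 80
```

Among the listed rows, selecting by F1 picks step 210, while selecting by validation loss would pick step 80, so the choice of selection metric changes which checkpoint is kept.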

Framework versions

Transformers 5.8.1 · PyTorch 2.10.0 · Datasets 4.8.5

Model size: 0.1B params · Tensor type: F32 (Safetensors)


Evaluation results

Self-reported on the WildChat Legal Classification Balanced test set:

Metric	Value
Accuracy	0.878
F1	0.876
Precision	0.901
Recall	0.878