modernbert-seeks_guidance
Fine-tuned ModernBERT-base classifier that detects whether a user in a multi-turn conversation is seeking legal guidance.
Part of the Legal QA collection · Try the interactive demo →
Model description
Stage 1 of a two-model encoder routing pipeline:
| Stage | Model | Input | Output |
|---|---|---|---|
| 1 | modernbert-seeks_guidance | Full conversation (user + assistant) | seeks_legal_guidance (True/False) |
| 2 | modernbert-primary_topic | User turns only | Primary legal topic (14 labels + non-guidance) |
modernbert-seeks_guidance uses ModernBERT's 8192-token context window (trained with max_length=4096), so long WildChat threads are classified without the 512-token truncation limit of classic BERT encoders.
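The two-stage routing in the table above can be sketched as a small function. This is a minimal sketch, not the author's implementation: `route`, `stage1`, and `stage2` are hypothetical names, and the classifiers are passed in as plain callables so the control flow can be run without downloading either model. It also assumes that stage 2 is skipped when stage 1 returns False, which the card does not state.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def serialize(messages: List[Message], user_only: bool = False) -> str:
    """Render messages as 'Role: content' lines, optionally keeping user turns only."""
    lines = []
    for msg in messages:
        if user_only and msg["role"] != "user":
            continue
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    return "\n".join(lines)


def route(
    messages: List[Message],
    stage1: Callable[[str], bool],  # hypothetical wrapper around modernbert-seeks_guidance
    stage2: Callable[[str], str],   # hypothetical wrapper around modernbert-primary_topic
) -> Optional[str]:
    """Return a topic label if stage 1 flags the conversation, otherwise None."""
    # Stage 1 sees the full conversation (user + assistant turns).
    if not stage1(serialize(messages)):
        return None
    # Stage 2 sees user turns only.
    return stage2(serialize(messages, user_only=True))
```

Skipping stage 2 for non-guidance conversations saves one forward pass per filtered conversation; running both models unconditionally, as in the usage example, is equally valid.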
Results
| Split | N | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Validation (best checkpoint) | 106 | 94.34% | 94.58% | 94.34% | 94.32% |
| Test (held-out) | 107 | 87.85% | 90.14% | 87.85% | 87.59% |
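In both rows recall equals accuracy, which is what support-weighted averaging produces (the training procedure lists weighted F1 as the checkpoint-selection metric). A minimal pure-Python sketch of weighted precision, recall, and F1 follows; the function name and the toy labels are illustrative and are not taken from the evaluation code.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / count
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return precision, recall, f1


# Toy labels, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = weighted_prf(y_true, y_pred)
```

Weighted recall sums the true positives of every class and divides by the total row count, so it always coincides with accuracy.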
Usage
Two-model pipeline (recommended)
Run both classifiers on a conversation stored as a list of {"role": "...", "content": "..."} messages:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def serialize(messages, input_mode="full"):
    """Render messages as "Role: content" lines; input_mode="user" keeps user turns only."""
    lines = []
    for msg in messages:
        role = msg["role"]
        if input_mode == "user" and role != "user":
            continue
        lines.append(f"{role.capitalize()}: {msg['content']}")
    return "\n".join(lines)


def predict(model_id, text):
    """Classify one serialized conversation and return the predicted label."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    pred_id = logits.argmax(dim=-1).item()
    id2label = model.config.id2label
    # Loaded configs normally key id2label by int; fall back to str keys if not.
    return id2label[pred_id] if pred_id in id2label else id2label[str(pred_id)]


conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]

# Stage 1 reads the full conversation.
seeks = predict(
    "AmirMohseni/modernbert-seeks_guidance",
    serialize(conversation, input_mode="full"),
)
# Stage 2 reads user turns only.
topic = predict(
    "AmirMohseni/modernbert-primary_topic",
    serialize(conversation, input_mode="user"),
)
print(seeks, topic)  # two labels: guidance flag (e.g. True), then topic
```
Single model
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmirMohseni/modernbert-seeks_guidance",
)
text = "User: Can my landlord evict me?\nAssistant: It depends on your lease.\nUser: I'm in California."
print(classifier(text))
```
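For a single string, the `text-classification` pipeline returns a list holding one dict with `label` and `score` keys. One way to act on that output is sketched below, sending low-confidence predictions to human review. The `decide` helper, the 0.9 cutoff, and the positive label string "True" are assumptions for illustration; check `model.config.id2label` for the actual label names.

```python
def decide(result, positive_label="True", min_score=0.9):
    """Map a pipeline result to 'review', 'route', or 'skip'.

    result: output of classifier(text), i.e. [{"label": str, "score": float}].
    positive_label and min_score are illustrative assumptions, not model settings.
    """
    top = result[0]
    if top["score"] < min_score:
        return "review"  # not confident enough either way
    return "route" if top["label"] == positive_label else "skip"
```

The cutoff is best tuned on a labeled sample of your own traffic rather than copied from this sketch.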
Intended uses & limitations
Use for: routing or filtering English multi-turn chat logs before legal QA, topic assignment, or human review.
Do not use for: legal advice, high-stakes decisions without human review, or non-English / jurisdiction-specific deployment without evaluation.
Caveats: labels are silver labels generated by GPT-5.4; English only; trained and evaluated on a class-balanced set, so real traffic may have a very different class distribution.
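The effect of class balance on real traffic can be made concrete with Bayes' rule. The sketch below uses placeholder sensitivity and specificity values of 0.88; these are round illustrative numbers, not per-class results reported for this model, and the 5% prevalence is likewise hypothetical.

```python
def precision_at(prevalence, sensitivity, specificity):
    """Precision (positive predictive value) from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Placeholder rates (NOT measured per-class results for this model).
SENS, SPEC = 0.88, 0.88

balanced = precision_at(0.50, SENS, SPEC)  # balanced set, as in training
skewed = precision_at(0.05, SENS, SPEC)    # hypothetical traffic with 5% legal queries
```

With these placeholder rates, precision falls from 0.88 on balanced data to roughly 0.28 at 5% prevalence, which is why evaluating on a sample of real traffic matters before deployment.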
Training data
Conversations come from allenai/WildChat-1M, labeled for legal guidance, topic, and uncertainty, then resampled into a balanced set:
Dataset: AmirMohseni/WildChat-Legal-Classification-V2-Balanced
- Equal seeks_legal_guidance=true/false rows with uncertainty-balanced non-legal sampling
- Splits: train 1909 · val 106 · test 107
- Input for modernbert-seeks_guidance: all turns serialized as "Role: content" lines (see usage example above)
Training procedure
| Setting | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Input mode | Full conversation |
| Max length | 4096 |
| Learning rate | 8e-5 |
| Epochs | 8 |
| Effective batch size | 32 (8 × 4 grad accum) |
| Best checkpoint | Highest weighted F1 on validation |
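The step counts in the training log follow from these settings and the train split size. A short worked calculation is given below; it reads "8 × 4" as a batch size of 8 with 4 gradient-accumulation steps and assumes a single device, neither of which the card states explicitly.

```python
import math

train_rows = 1909      # train split size from the Training data section
per_device_batch = 8   # assumed reading of "8 × 4 grad accum"
grad_accum_steps = 4
epochs = 8

effective_batch = per_device_batch * grad_accum_steps      # 32
steps_per_epoch = math.ceil(train_rows / effective_batch)  # 60
total_steps = steps_per_epoch * epochs                     # 480
```

Both values agree with the log, where step 60 falls at epoch 1.0 and step 480 at epoch 8.0.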
Full training log
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 1.7579 | 0.17 | 10 | 0.3339 | 0.8491 | 0.8669 | 0.8491 | 0.8464 |
| 0.7839 | 0.33 | 20 | 0.3613 | 0.8679 | 0.8807 | 0.8679 | 0.8662 |
| 0.9567 | 0.50 | 30 | 0.3083 | 0.8396 | 0.8506 | 0.8396 | 0.8390 |
| 1.2581 | 0.67 | 40 | 0.3498 | 0.8774 | 0.8835 | 0.8774 | 0.8765 |
| 0.6585 | 0.84 | 50 | 0.2971 | 0.9151 | 0.9163 | 0.9151 | 0.9149 |
| 0.5685 | 1.0 | 60 | 0.3065 | 0.8774 | 0.8783 | 0.8774 | 0.8771 |
| 0.4877 | 1.33 | 80 | 0.2511 | 0.8962 | 0.8963 | 0.8962 | 0.8962 |
| 0.1317 | 2.0 | 120 | 0.2526 | 0.9245 | 0.9245 | 0.9245 | 0.9245 |
| 0.5604 | 3.50 | 210 | 0.3071 | 0.9434 | 0.9458 | 0.9434 | 0.9432 |
| 0.0000 | 8.0 | 480 | 0.6879 | 0.9340 | 0.9377 | 0.9340 | 0.9337 |
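Checkpoint selection on weighted F1 can be reproduced from the rows shown above. The sketch below copies the step, validation loss, and F1 columns from the table and compares selecting on F1 with selecting on validation loss; it covers only the rows listed here, and the variable names are illustrative.

```python
# (step, validation_loss, weighted_f1) copied from the log rows shown above
log = [
    (10, 0.3339, 0.8464),
    (20, 0.3613, 0.8662),
    (30, 0.3083, 0.8390),
    (40, 0.3498, 0.8765),
    (50, 0.2971, 0.9149),
    (60, 0.3065, 0.8771),
    (80, 0.2511, 0.8962),
    (120, 0.2526, 0.9245),
    (210, 0.3071, 0.9432),
    (480, 0.6879, 0.9337),
]

best_by_f1 = max(log, key=lambda row: row[2])    # criterion used for this model
best_by_loss = min(log, key=lambda row: row[1])  # alternative criterion, for comparison
```

Among these rows, F1 selects step 210 (the checkpoint reported in Results), while validation loss would select step 80. The final step has the highest validation loss of the rows shown despite a near-zero training loss, which is consistent with overfitting late in training.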
Framework versions
- Transformers 5.8.1 · PyTorch 2.10.0 · Datasets 4.8.5