modernbert-primary_topic
Fine-tuned ModernBERT-base classifier that assigns a primary legal topic to a multi-turn conversation.
Part of the Legal QA collection · Try the interactive demo →
Model description
Stage 2 of a two-model encoder routing pipeline:
| Stage | Model | Input | Output |
|---|---|---|---|
| 1 | modernbert-seeks_guidance | Full conversation | seeks_legal_guidance (True/False) |
| 2 | modernbert-primary_topic | User turns only | Primary topic (14 labels + non-guidance) |
modernbert-primary_topic predicts one of 14 legal topic labels, plus a (non-guidance) class for conversations where the user is not seeking legal help. In practice, run modernbert-seeks_guidance first and only trust the topic label when it predicts True.
Input preprocessing: only user messages are kept (assistant turns are dropped), and each one is serialized on its own line in the form "User: <content>".
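The two-stage gating described above can be sketched as follows. This is a minimal illustration, not the author's implementation: `route`, `fake_seeks_guidance`, and `fake_primary_topic` are hypothetical names, and the two stand-in classifiers replace real model calls so the control flow can be read and run without downloading anything.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def serialize(messages: List[Message], user_only: bool) -> str:
    """Render a conversation as 'Role: content' lines, optionally user turns only."""
    lines = []
    for msg in messages:
        if user_only and msg["role"] != "user":
            continue
        lines.append(f"{msg['role'].capitalize()}: {msg['content']}")
    return "\n".join(lines)


def route(
    messages: List[Message],
    seeks_guidance: Callable[[str], bool],
    primary_topic: Callable[[str], str],
) -> Optional[str]:
    """Return a topic only when stage 1 says the user seeks legal guidance."""
    # Stage 1 sees the full conversation.
    if not seeks_guidance(serialize(messages, user_only=False)):
        return None
    # Stage 2 sees user turns only.
    return primary_topic(serialize(messages, user_only=True))


# Hypothetical stand-ins for the two classifiers (assumptions for illustration).
def fake_seeks_guidance(text: str) -> bool:
    return "landlord" in text.lower()


def fake_primary_topic(text: str) -> str:
    return "HOUSING"


conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]
routed = route(conversation, fake_seeks_guidance, fake_primary_topic)
print(routed)
```

In a real deployment the two callables would wrap the seeks_guidance and primary_topic models; returning `None` for non-guidance conversations keeps the topic label from being consumed when stage 1 rejects the input.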
Topic taxonomy
| Topic | Description |
|---|---|
| FAMILY | Marriage, divorce, child custody, child support, alimony, adoption, guardianship, domestic violence, parentage, family-status disputes. |
| HOUSING | Rent, eviction, landlord-tenant disputes, habitability, deposits, mortgages, foreclosure, neighbors, housing subsidies. |
| WORK | Employment contracts, wages, dismissal, discrimination at work, leave, workplace safety, severance, freelancers when the main issue is labor rights. |
| PUBLIC_BENEFITS | Unemployment benefits, disability, pensions, welfare, public assistance, eligibility, reductions, sanctions, appeals on benefits. |
| CRIMINAL_JUSTICE | Police, arrest, criminal charges, fines, prosecution, defense, victims' rights, probation, criminal procedure. |
| CONSUMER_DEBT | Purchases, warranties, subscriptions, refunds, scams, debt collection, loans, bankruptcy, credit, repossession, consumer finance. |
| CONTRACTS | Private civil agreements and breach/interpretation issues not better covered by work, housing, consumer, or business. |
| IMMIGRATION | Visas, residence permits, asylum, citizenship, deportation, family migration, immigration status and related procedures. |
| BUSINESS | Company formation, shareholder issues, commercial compliance, business operations, B2B disputes, self-employment when the main issue is business law. |
| DATA_PRIVACY | Personal data, surveillance, GDPR/privacy rights, data deletion, consent, monitoring, platform data practices. |
| INTELLECTUAL_PROPERTY | Copyright, trademark, patent, trade secrets, licensing, infringement, ownership of creative or technical works. |
| CIVIL_RIGHTS | Discrimination outside employment/housing, free speech, due process, equal treatment, constitutional or human-rights style claims. |
| INTERNATIONAL_CROSS_BORDER | Choice of law, jurisdiction, treaty-based questions, cross-border enforcement, multi-country disputes where cross-border law is central. |
| OTHER | Genuinely legal but not covered above. |
Selection rules: use CONTRACTS only when the issue is mainly about a civil agreement and is not better captured by WORK, HOUSING, CONSUMER_DEBT, or BUSINESS. Use INTERNATIONAL_CROSS_BORDER only when the cross-border or jurisdictional aspect is central, not merely incidental.
Results
| Split | N | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Validation (best checkpoint) | 106 | 77.36% | 76.81% | 77.36% | 76.64% |
| Test (held-out) | 107 | 76.64% | 78.62% | 76.64% | 76.28% |
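In both rows Recall equals Accuracy, which is what support-weighted averaging produces. The card does not state the averaging mode for Precision and Recall, so the sketch below assumes weighted averaging (the training settings do mention weighted F1). The function `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / count
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return accuracy, precision, recall, f1


# Toy labels, not data from the model.
y_true = ["HOUSING", "HOUSING", "WORK", "FAMILY"]
y_pred = ["HOUSING", "WORK", "WORK", "FAMILY"]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Weighted recall sums the true positives of every class and divides by the total row count, which is the definition of accuracy, so the two columns carry the same information under this assumption.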
Usage
Pipeline (recommended)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch


def serialize(messages, input_mode="full"):
    # Render each turn as "Role: content"; input_mode="user" drops non-user turns.
    lines = []
    for msg in messages:
        role = msg["role"]
        if input_mode == "user" and role != "user":
            continue
        lines.append(f"{role.capitalize()}: {msg['content']}")
    return "\n".join(lines)


def predict(model_id, text):
    # Loads the model on every call; cache the tokenizer and model for repeated use.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    enc = tokenizer(text, truncation=True, max_length=4096, return_tensors="pt")
    with torch.no_grad():
        pred_id = model(**enc).logits.argmax(dim=-1).item()
    # id2label keys are normally ints after loading; fall back to string keys.
    id2label = model.config.id2label
    label = id2label.get(pred_id, id2label.get(str(pred_id), ""))
    # An empty label denotes the non-guidance class.
    return label or "(non-guidance)"


conversation = [
    {"role": "user", "content": "Can my landlord evict me without notice?"},
    {"role": "assistant", "content": "Eviction rules depend on your jurisdiction..."},
    {"role": "user", "content": "I'm in California on a month-to-month lease."},
]
topic = predict(
    "AmirMohseni/modernbert-primary_topic",
    serialize(conversation, input_mode="user"),
)
print(topic)  # e.g. HOUSING
Pair with modernbert-seeks_guidance for the full routing pipeline (see that model card for a complete example).
Single model
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AmirMohseni/modernbert-primary_topic",
)
text = "User: Can my landlord evict me?\nUser: I'm in California on a month-to-month lease."
print(classifier(text))
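A text-classification pipeline returns a list of dictionaries with `label` and `score` keys. The helper below is a hedged sketch of one way to post-process that list. It assumes the non-guidance class has an empty label string, as the `predict` helper in the Pipeline section implies. `top_topic`, `min_score`, and the `"(uncertain)"` marker are hypothetical additions, not part of the model or of Transformers. The sample output is made up so the block runs without loading the model.

```python
def top_topic(pipeline_output, min_score=0.0):
    """Pick the best-scoring entry and normalize the empty non-guidance label."""
    best = max(pipeline_output, key=lambda item: item["score"])
    # Assumption: an empty label string marks the non-guidance class.
    label = best["label"] or "(non-guidance)"
    if best["score"] < min_score:
        # Hypothetical marker for low-confidence predictions.
        return "(uncertain)", best["score"]
    return label, best["score"]


# Made-up outputs in the pipeline's list-of-dicts shape.
sample_output = [{"label": "HOUSING", "score": 0.93}]
empty_label_output = [{"label": "", "score": 0.88}]

print(top_topic(sample_output))
print(top_topic(empty_label_output))
print(top_topic(sample_output, min_score=0.95))
```

Any threshold should be tuned on held-out data, since the card reports no calibration results.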
Intended uses & limitations
Use for: assigning a topic label to user queries already flagged as seeking legal guidance.
Do not use for: legal advice, or as a standalone filter for legal intent (use the seeks_guidance model first).
Caveats: silver labels from GPT-5.4; user-turn-only input discards assistant context; English only.
Training data
Dataset: AmirMohseni/WildChat-Legal-Classification-V2-Balanced
- Balanced legal / non-legal rows from WildChat-1M with GPT-5.4 structured labels
- Splits: train 1909 · val 106 · test 107
- Target field: primary_topic (empty for non-guidance rows)
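With only 107 test rows, the reported test accuracy of 76.64% carries noticeable sampling uncertainty. The sketch below estimates it with a normal-approximation interval. That is a simplifying assumption, and other interval methods give slightly different bounds. `normal_interval` is an illustrative helper, not part of the training code.

```python
import math


def normal_interval(accuracy, n, z=1.96):
    """Approximate 95% interval for a proportion via the normal approximation."""
    standard_error = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * standard_error, accuracy + z * standard_error


# Test accuracy and test split size as reported in this card.
low, high = normal_interval(0.7664, 107)
print(f"{low:.3f} to {high:.3f}")
```

The interval spans roughly 8 percentage points on either side of the point estimate, so small differences between checkpoints or against other models on this split should be read with caution.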
Training procedure
| Setting | Value |
|---|---|
| Base model | answerdotai/ModernBERT-base |
| Input mode | User turns only |
| Max length | 4096 |
| Learning rate | 5e-5 |
| Epochs | 8 |
| Effective batch size | 64 (8 × 8 grad accum) |
| Best checkpoint | Highest weighted F1 on validation |
Full training log
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 12.5926 | 0.33 | 10 | 1.7831 | 0.4811 | 0.2359 | 0.4811 | 0.3166 |
| 8.7973 | 1.0 | 30 | 1.1594 | 0.6981 | 0.6908 | 0.6981 | 0.6815 |
| 5.7850 | 1.33 | 40 | 0.9966 | 0.7358 | 0.7646 | 0.7358 | 0.7405 |
| 2.2375 | 3.0 | 90 | 0.8301 | 0.7170 | 0.7588 | 0.7170 | 0.7290 |
| 0.0142 | 6.67 | 200 | 0.8931 | 0.7736 | 0.7681 | 0.7736 | 0.7664 |
| 0.0036 | 8.0 | 240 | 0.9106 | 0.7736 | 0.7720 | 0.7736 | 0.7650 |
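The Step and Epoch columns can be cross-checked against the split size and batch settings. The arithmetic below is a sketch: it assumes single-device training and that the final partial batch of each epoch counts as one optimizer step. The card does not state either point, but both are consistent with the log.

```python
import math

train_rows = 1909      # train split size from the card
per_device_batch = 8   # from "8 x 8 grad accum"
grad_accum = 8
epochs = 8

effective_batch = per_device_batch * grad_accum
# Assumption: the last partial batch of an epoch still triggers an optimizer step.
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 2)


print(effective_batch, steps_per_epoch, total_steps)
print(epoch_at(10), epoch_at(40), epoch_at(200))
```

This gives 30 steps per epoch and 240 steps in total, matching the final row of the log, and the best checkpoint at step 200 falls at epoch 6.67.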
Framework versions
- Transformers 5.8.1 · PyTorch 2.10.0 · Datasets 4.8.5