How to use TryDotAtwo/RuModernBERT-ruLaw with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="TryDotAtwo/RuModernBERT-ruLaw")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("TryDotAtwo/RuModernBERT-ruLaw")
model = AutoModelForMaskedLM.from_pretrained("TryDotAtwo/RuModernBERT-ruLaw")

Russian legal ModernBERT package: one shared encoder and four inference heads.
- Initial encoder weights: deepvk/RuModernBERT-base.
- Training data for MLM and the document-level heads: irlspbru/RusLawOD.
- doc_type: document type classification from doc_typeIPS.
- classifier: multi-label legal classifier from classifierByIPS.
- keywords: multi-label keyword prediction from keywordsByIPS.
- ner: token classification from TryDotAtwo/russian-legal-ner.

headingIPS and textIPS are used together as model input for the document-level heads.
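Since the document-level heads take headingIPS and textIPS together, a caller working from raw RusLawOD-style records may need to merge the two fields before inference. The following is a minimal sketch only: the helper name `build_document_input` and the newline separator are assumptions for illustration, and the exact joining scheme used during training is not stated here.

```python
def build_document_input(heading, text, sep="\n"):
    """Join a document heading and body into a single input string.

    NOTE: `sep` is a hypothetical choice made for this sketch; the
    separator actually used during training is not documented here.
    """
    heading = (heading or "").strip()
    text = (text or "").strip()
    # Drop empty parts so a missing heading does not leave a stray separator.
    parts = [part for part in (heading, text) if part]
    return sep.join(parts)


doc_input = build_document_input(
    "Об утверждении положения",
    "Текст правового документа...",
)
```

If the packaged pipeline already accepts the heading and body separately, that interface should be preferred over manual concatenation.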
from legal_modernbert import LegalDocumentPipeline
pipe = LegalDocumentPipeline.from_pretrained("TryDotAtwo/RuModernBERT-ruLaw")
result = pipe("Текст правового документа...")
The pipeline requires a FlashAttention 2 compatible runtime.
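Because of this requirement, it can help to check the environment before loading the model. The sketch below assumes the requirement is met by the `flash-attn` package, whose import name is `flash_attn`. It only checks that the module can be found, not that the GPU or CUDA build is compatible.

```python
import importlib.util


def flash_attention_available() -> bool:
    """Return True if the `flash_attn` module can be located for import."""
    return importlib.util.find_spec("flash_attn") is not None


if not flash_attention_available():
    # Assumption: without flash_attn the pipeline may fail to load.
    print("flash_attn not found; the pipeline may fail to load in this environment.")
```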
| Source | Used for | Link |
|---|---|---|
| RuModernBERT-base | Initial encoder weights | deepvk/RuModernBERT-base |
| RusLawOD | MLM, document type, classifier, keywords | irlspbru/RusLawOD |
| Russian legal NER | NER head | TryDotAtwo/russian-legal-ner |
| Sud-resh benchmark | External MLM validation | lawful-good-project/sud-resh-benchmark |
| Evaluation | Metric | Result |
|---|---|---|
| RusLawOD MLM validation | eval loss | 0.1337 |
| RusLawOD MLM validation | train loss | 0.1537 |
| Sud-resh MLM benchmark, this model | eval loss | 0.4473 |
| Sud-resh MLM benchmark, base deepvk/RuModernBERT-base | eval loss | 0.5172 |
| Sud-resh MLM benchmark | perplexity improvement vs base | ~6.8% |
| NER test | precision | 0.9970 |
| NER test | recall | 0.9884 |
| NER test | F1 | 0.9927 |
| NER test | loss | 0.00133 |
| Multitask document heads | final train loss | 0.0193 |
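The derived figures in the table can be checked from the raw numbers. The sketch below assumes the reported eval losses are mean per-token cross-entropy in nats, so that perplexity is `exp(loss)`; the function names are illustrative.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss given in nats."""
    return math.exp(loss)


def relative_improvement(model_loss: float, base_loss: float) -> float:
    """Fractional perplexity reduction of the model relative to the base."""
    return 1.0 - perplexity(model_loss) / perplexity(base_loss)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Values copied from the evaluation table.
improvement = relative_improvement(0.4473, 0.5172)
ner_f1 = f1_score(0.9970, 0.9884)

print(f"Perplexity improvement vs base: {improvement:.2%}")
print(f"NER F1 from precision and recall: {ner_f1:.4f}")
```

Under that assumption the perplexity improvement comes out close to the ~6.8% in the table, and the F1 recomputed from precision and recall agrees with the reported 0.9927.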
The NER dataset mirror keeps attribution to the original authors and source folder in its dataset card.
The benchmark table reports internal training/evaluation runs. Document-head quality should be validated on held-out downstream legal tasks before production use.
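One way to run such a held-out check for the multi-label heads (classifier, keywords) is to threshold per-label scores and compute micro-averaged F1. The sketch below is illustrative only: the 0.5 threshold, the helper names, and the toy logits and labels are all assumptions, not values from this model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, threshold=0.5):
    """Turn per-label logits into 0/1 decisions (threshold is an assumption)."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred) -> float:
    """Micro-averaged F1 over all (document, label) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy held-out data, made up for this example.
held_out_logits = [[2.1, -1.3, 0.4], [-0.7, 1.8, -2.2]]
held_out_labels = [[1, 0, 0], [0, 1, 1]]

predictions = [decode_multilabel(row) for row in held_out_logits]
score = micro_f1(held_out_labels, predictions)
print(f"micro-F1 on toy held-out set: {score:.4f}")
```

In practice the threshold would be tuned on a validation split, and per-label metrics are worth inspecting alongside the micro average, since rare keywords can be hidden by frequent ones.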