Legally AI โ€” PredEx Win/Lose Classifier (InLegalBERT)

A fine-tuned law-ai/InLegalBERT that predicts, for an Indian appeal-shaped legal situation, a single binary outcome for the applicant (appellant / petitioner): 1 = applicant prevails, 0 = does not. It outputs a calibrated probability P(applicant wins).

This is one of three signals in the Legally AI win/lose ensemble โ€” it is deliberately the weakest, most conservative signal, and the application only trusts it when it agrees with the other two (a precedent vote over real retrieved outcomes, and a reasoning-LLM forecast).

โš ๏ธ Not legal advice. A research/educational tool. It can be wrong or incomplete. Consult a qualified advocate before acting on anything it produces.

Intended use

  • In scope: appeal-shaped questions where a binary Granted/Dismissed outcome is meaningful.
  • Out of scope: non-appellate situations, "partly allowed" / withdrawn / disposed matters (no forced side), and any use as a standalone verdict. In the app, the classifier never speaks alone โ€” its probability is only surfaced via the agreement-gated ensemble.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("<HF_NAMESPACE>/legally-ai-predex-classifier", revision="v1")
model = AutoModelForSequenceClassification.from_pretrained(
    "<HF_NAMESPACE>/legally-ai-predex-classifier", revision="v1"
).eval()

enc = tok(situation_text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    prob_win = torch.softmax(model(**enc).logits, dim=-1)[0, 1].item()  # class 1 == applicant wins

Label convention: index 1 = applicant prevailed, index 0 = did not.

Training

  • Base model: law-ai/InLegalBERT (12-layer BERT encoder, 768-dim, 512-token max).
  • Task: single-label sequence classification (BertForSequenceClassification, 2 classes).
  • Training data: the PredEx legal-judgment-prediction dataset (Indian courts).

Evaluation

On the held-out PredEx test split:

Metric Value
Macro F1 0.605
Accuracy 0.610

Retraining on NyayaAnumana (20k balanced) did not beat this on the PredEx benchmark (0.525 cross-domain; 0.636 in-domain), so the PredEx-trained checkpoint was kept.

Limitations

  • A 512-token encoder caps performance around 0.60โ€“0.65 macro-F1 on this task; larger gains need a long-context model (planned for v2). Long judgments are truncated to the first 512 tokens.
  • Trained on Indian-court text โ€” do not apply to other jurisdictions.
  • Calibration is decent but not perfect; this is exactly why the application gates it behind agreement with two independent signals rather than trusting its probability outright.

License & attribution

Released under the Apache-2.0 license. This is compatible with both upstream sources โ€” the base model is MIT and the training dataset is Apache-2.0, both permissive and with no non-commercial restriction. Apache-2.0 is chosen because it honors both (it satisfies MIT's notice requirement and matches the dataset's license). Credit to both:

  • Base model: law-ai/InLegalBERT โ€” Law-AI (IIT Kharagpur), MIT license.
  • Training data: L-NLProc/PredEx โ€” Apache-2.0 license. Cite: Nigam et al., "Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts", Findings of ACL 2024 (arXiv:2406.04136).
Downloads last month
12
Safetensors
Model size
0.1B params
Tensor type
F32
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for ManasDubey/legally-ai-predex-classifier

Finetuned
(17)
this model

Paper for ManasDubey/legally-ai-predex-classifier

Evaluation results