eXTC — ContractNLI (3-class legal NLI)
Anonymized artifact for a paper under double-blind review. Author identity and institution will be revealed at camera-ready.
This is the final-stage checkpoint of eXTC (eXplainable Text Classifier) for 3-way natural language inference over non-disclosure-agreement (NDA) clauses, from the ContractNLI benchmark.
- Input: a contract clause paired with a hypothesis.
- Label: `entailment`, `contradiction`, or `not_mentioned`.
- Output: a free-text reasoning trace followed by a final `LABEL: <label>` line; the reasoning serves as a local, inspectable explanation of the prediction.
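Because the prediction is carried by the final `LABEL: <label>` line, downstream code needs to pull the label out of the generated text. The following is a minimal sketch of one way to do that, not the authors' evaluation parser: the helper name `parse_label`, the tolerance for case and surrounding whitespace, and the choice to take the last matching line are all assumptions.

```python
import re
from typing import Optional

# The three ContractNLI labels named in this card.
VALID_LABELS = ("entailment", "contradiction", "not_mentioned")

# Matches a whole line of the form "LABEL: <label>".
# Case-insensitivity and whitespace tolerance are assumptions.
_LABEL_RE = re.compile(r"^\s*LABEL:\s*([A-Za-z_]+)\s*$", re.MULTILINE | re.IGNORECASE)


def parse_label(generated: str) -> Optional[str]:
    """Return the label from the last LABEL line, or None if no valid label is found."""
    matches = _LABEL_RE.findall(generated)
    if not matches:
        return None
    label = matches[-1].lower()
    return label if label in VALID_LABELS else None
```

An output for which this helper returns `None` would correspond to what the card calls an invalid output, under the assumption that invalidity means a missing or unrecognized label line.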
eXTC pipeline
eXTC is a three-stage explainable classifier. This checkpoint is the output of all three stages:
Qwen3-4B (base)
│
├─ Stage I — SOP Learning (structured prompt optimization)
│ A natural-language rulebook (Standard Operating Procedure) is learned
│ via a structured prompt-optimization algorithm; used only to ground the
│ teacher in Stage II (not present at inference).
│
├─ Stage II — SOP-Grounded Reasoning Distillation (R-SFT)
│ Teacher: gpt-4.1-mini, prompted with <SOP, input>, rejection sampling
│ (M=4 traces/example, keep first trace whose label is correct).
│ Student: Qwen3-4B fine-tuned with LoRA (r=64, alpha=128, 2 epochs) on the
│ accepted reasoning+label traces, with class-balanced upsampling.
│
└─ Stage III — Beyond SOP via RL (BD-GRPO)
Balanced Dynamic GRPO: per-class oversampling, then drop zero-advantage
(homogeneous-rollout) groups and keep a class-balanced batch of
informative groups, with a binary label-correctness reward.
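The Stage II data construction (rejection sampling with M=4 traces per example, then class-balanced upsampling) can be sketched roughly as below. This is an illustration under stated assumptions rather than the released training code: `sample_trace` is a hypothetical stand-in for a call to the teacher, examples with no correct trace are assumed to be dropped, and upsampling is assumed to mean resampling minority classes with replacement up to the majority-class count.

```python
import random
from collections import defaultdict
from typing import Callable, List, Optional, Tuple

Trace = Tuple[str, str]  # (reasoning text, predicted label)


def first_correct_trace(
    sample_trace: Callable[[], Trace],  # hypothetical teacher call
    gold_label: str,
    max_samples: int = 4,  # M=4 in the pipeline description
) -> Optional[Trace]:
    """Draw up to max_samples traces; return the first whose label is correct."""
    for _ in range(max_samples):
        reasoning, label = sample_trace()
        if label == gold_label:
            return reasoning, label
    return None  # assumption: examples with no accepted trace are dropped


def upsample_balanced(traces: List[Trace], seed: int = 0) -> List[Trace]:
    """Resample minority classes with replacement up to the majority count."""
    by_label = defaultdict(list)
    for trace in traces:
        by_label[trace[1]].append(trace)
    target = max(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced: List[Trace] = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced
```

In practice `sample_trace` would wrap a teacher request built from the SOP and the input; a canned iterator is enough to exercise the acceptance logic.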
The released checkpoint is the one with the best validation macro-F1 over the RL training trajectory. The held-out test set was evaluated only for that selected checkpoint.
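The group filtering described for Stage III can be illustrated with a small sketch. It assumes group-relative advantages (each rollout's reward minus its group mean), so a group whose rollouts all receive the same binary reward has zero advantage everywhere and carries no learning signal. The function names, the `per_class` quota, and the rule of balancing only over classes that still have informative groups are assumptions, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Group = Tuple[str, List[str]]  # (gold label, predicted labels of the rollouts)


def binary_rewards(gold: str, predictions: Sequence[str]) -> List[float]:
    """Binary label-correctness reward: 1.0 if the predicted label is correct."""
    return [1.0 if p == gold else 0.0 for p in predictions]


def is_informative(rewards: Sequence[float]) -> bool:
    """Homogeneous rollouts (all right or all wrong) yield zero advantages."""
    return len(set(rewards)) > 1


def balanced_informative_batch(groups: Sequence[Group], per_class: int) -> List[Group]:
    """Drop zero-advantage groups, then keep an equal number per gold class."""
    kept: Dict[str, List[Group]] = defaultdict(list)
    for gold, preds in groups:
        if is_informative(binary_rewards(gold, preds)):
            kept[gold].append((gold, preds))
    if not kept:
        return []
    # Assumption: balance over the classes that still have informative groups.
    quota = min(per_class, min(len(items) for items in kept.values()))
    batch: List[Group] = []
    for label in sorted(kept):
        batch.extend(kept[label][:quota])
    return batch
```

Per-class oversampling before rollout generation raises the chance that each class still has enough informative groups left after this filter.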
Test metrics
ContractNLI 3-class test set (n=2091), greedy decoding (temperature=0):
| Metric | Value |
|---|---|
| Balanced accuracy | 0.8824 |
| Macro F1 | 0.8494 |
| Accuracy | 0.8871 |
| Invalid output rate | 0.001 |
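For readers who want to recompute these quantities from their own predictions, the sketch below shows the standard definitions in plain Python. It is not the authors' evaluation script: the function name `evaluate` is hypothetical, and it assumes an invalid output is represented as `None` and counted as an incorrect prediction.

```python
from typing import Dict, List, Optional

LABELS = ("entailment", "contradiction", "not_mentioned")


def evaluate(gold: List[str], pred: List[Optional[str]]) -> Dict[str, float]:
    """Accuracy, balanced accuracy (mean recall), macro F1, invalid output rate."""
    n = len(gold)
    pairs = list(zip(gold, pred))
    recalls, f1s = [], []
    for c in LABELS:
        tp = sum(g == c and p == c for g, p in pairs)
        fn = sum(g == c and p != c for g, p in pairs)
        fp = sum(g != c and p == c for g, p in pairs)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return {
        "accuracy": sum(g == p for g, p in pairs) / n,
        "balanced_accuracy": sum(recalls) / len(LABELS),
        "macro_f1": sum(f1s) / len(LABELS),
        # Assumption: None marks an output with no parseable label.
        "invalid_output_rate": sum(p is None for p in pred) / n,
    }
```

Balanced accuracy averages per-class recall, so it is less sensitive to class imbalance in the test set than plain accuracy.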
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "extc-anon/extc-contractnli"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto")

# A contract clause (premise) paired with a hypothesis, followed by the task instruction.
prompt = (
    "Premise: The Receiving Party shall not disclose Confidential Information to "
    "any third party without prior written consent.\n"
    "Hypothesis: The Receiving Party may share Confidential Information with its "
    "external auditors without consent.\n\n"
    "Classify the hypothesis as entailment, contradiction, or not_mentioned. "
    "Provide your reasoning and then the label."
)
# Wrap the prompt in the model's chat template.
text = tok.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, tokenize=False,
)
# Pass input_ids and attention_mask together; greedy decoding matches the reported test setup.
inputs = tok(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
# Decode only the newly generated tokens (reasoning trace plus the final LABEL line).
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Format
- Standard HuggingFace `transformers` (safetensors, bfloat16, ~7.5 GB).
- Architecture: `Qwen3ForCausalLM`, 4.02B parameters.
- Test numbers above use greedy decoding (`do_sample=False`).
License
Apache 2.0 (matches the Qwen3 base model).
Citation
Anonymous paper citation will be added at camera-ready.