Model Description

This model is a fine-tuned version of answerdotai/ModernBERT-large on the pietrolesci/nli_fever dataset. It is specifically designed for the FEVER (Fact Extraction and VERification) task, aiming to determine the logical relationship between a given Claim and Evidence through a Natural Language Inference (NLI) framework.

Uses

Direct Use

This model can be integrated directly into fact-checking pipelines for:

  1. Evidence Verification: Determining whether a retrieved Wikipedia sentence supports or refutes a given claim.
  2. Natural Language Inference (NLI): General three-class entailment tasks.
  3. Content Moderation: Automated identification of misleading information or false statements.
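
In a fact-checking pipeline, a retriever usually returns several candidate sentences per claim, so the per-sentence predictions have to be combined in some way. The sketch below shows one possible rule. It is an assumption of this example, not something the model card prescribes: the most confident non-neutral prediction at or above a threshold decides the outcome, and otherwise the result is neutral. The name `aggregate_verdict`, the `threshold` default, and the sample predictions are all hypothetical; the dictionaries only mimic the `{'label', 'score'}` shape returned by a text-classification pipeline.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def aggregate_verdict(predictions: List[Prediction], threshold: float = 0.5) -> str:
    """Fold per-sentence NLI predictions into one claim-level NLI label.

    Hypothetical rule: the most confident non-neutral prediction whose
    score is at least `threshold` wins; otherwise return "neutral".
    """
    decisive = [
        p for p in predictions
        if p["label"] != "neutral" and float(p["score"]) >= threshold
    ]
    if not decisive:
        return "neutral"
    best = max(decisive, key=lambda p: float(p["score"]))
    return str(best["label"])


# Made-up predictions for three retrieved sentences (not real model output).
sample = [
    {"label": "neutral", "score": 0.91},
    {"label": "entailment", "score": 0.84},
    {"label": "contradiction", "score": 0.55},
]
print(aggregate_verdict(sample))  # entailment
```

Other rules, such as majority voting or requiring agreement between sentences, may suit a given application better; the threshold in particular would need tuning on held-out data.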

Label Mapping

The model outputs three classes. Each standard NLI label corresponds to a FEVER verdict:

  • entailment: SUPPORTS (Evidence supports the claim)
  • neutral: NOT ENOUGH INFO (Insufficient evidence to judge)
  • contradiction: REFUTES (Evidence refutes the claim)
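
The mapping above can be expressed as a small lookup. This is a minimal sketch: `NLI_TO_FEVER` and `to_fever_label` are hypothetical names, and it assumes the model emits the label strings exactly as listed (apart from letter case), which should be checked against the checkpoint's configuration.

```python
# Mapping taken from the label list above.
NLI_TO_FEVER = {
    "entailment": "SUPPORTS",
    "neutral": "NOT ENOUGH INFO",
    "contradiction": "REFUTES",
}


def to_fever_label(prediction: dict) -> str:
    """Translate a pipeline-style prediction dict into a FEVER verdict."""
    label = str(prediction["label"]).lower()
    if label not in NLI_TO_FEVER:
        # Fail loudly instead of guessing when the label is unknown.
        raise KeyError(f"Unexpected NLI label: {prediction['label']!r}")
    return NLI_TO_FEVER[label]


# A prediction shaped like the pipeline's output; the score is illustrative.
print(to_fever_label({"label": "entailment", "score": 0.84}))  # SUPPORTS
```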

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline.
nli = pipeline(
    task="text-classification",
    model="Yuu-Xie/fever-nli-modernbert-large",
)

claim = "Nikolaj Coster-Waldau worked with the Fox Broadcasting Company."
evidence = "Coster-Waldau played Detective John Amsterdam in the short-lived Fox television series New Amsterdam."

# Pass the pair as a dict so both sentences are encoded together as one input.
result = nli({"text": claim, "text_pair": evidence})
print(result)
# Example output: {'label': 'entailment', 'score': 0.8406911492347717}

Training Details

Training Data

The model was trained on pietrolesci/nli_fever, which recasts the original FEVER task in the standard (premise, hypothesis) sentence-pair format.

Training Procedure

Hyperparameters

  • Optimizer: AdamW
  • Learning Rate: $5 \times 10^{-6}$
  • Effective Batch Size: 64 (16 per device $\times$ 4 gradient accumulation steps)
  • Precision: bf16 mixed precision
  • Max Sequence Length: 256 tokens
  • Warmup Steps: 500
  • Early Stopping: Patience of 3 validation steps
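
The hyperparameters above can be collected into a small configuration object. This is a sketch, not the author's training script: `TrainConfig`, `warmup_lr`, and `should_stop` are hypothetical names. The single-GPU setting, the linear warmup shape, and the lower-is-better monitored metric are assumptions, since the source lists only the values. What happens to the learning rate after warmup is not stated, so the sketch simply holds the peak rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list above.
    learning_rate: float = 5e-6
    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 4
    max_seq_length: int = 256
    warmup_steps: int = 500
    early_stopping_patience: int = 3
    num_devices: int = 1  # assumption: a single GPU

    @property
    def effective_batch_size(self) -> int:
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


def warmup_lr(step: int, cfg: TrainConfig) -> float:
    """Linear warmup to the peak rate (assumed shape), constant afterwards."""
    if step >= cfg.warmup_steps:
        return cfg.learning_rate
    return cfg.learning_rate * step / cfg.warmup_steps


def should_stop(eval_losses: List[float], patience: int) -> bool:
    """True once the last `patience` evaluations all failed to beat the earlier best.

    Assumes a lower-is-better metric such as evaluation loss.
    """
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return all(loss >= best_before for loss in eval_losses[-patience:])


cfg = TrainConfig()
print(cfg.effective_batch_size)  # 64

# Made-up loss history: no improvement over 0.90 in the last three evaluations.
print(should_stop([1.00, 0.90, 0.95, 0.96, 0.97], cfg.early_stopping_patience))  # True
```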

Speeds, Sizes, Times

  • Hardware: NVIDIA RTX 4090D (24GB VRAM)
  • Training Time: Approximately 1.5 hours
  • Best Checkpoint: Step 3500
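
As a rough sanity check, the best-checkpoint step can be converted into the number of training examples processed by that point. The sketch assumes that "step" counts optimizer updates and that a single GPU was used; neither is stated explicitly in the source. `examples_seen` is a hypothetical helper.

```python
def examples_seen(
    step: int,
    per_device_batch: int,
    grad_accum_steps: int,
    num_devices: int = 1,  # assumption: a single GPU
) -> int:
    """Number of training examples consumed after `step` optimizer updates."""
    return step * per_device_batch * grad_accum_steps * num_devices


# Step 3500 with 16 examples per device and 4 gradient accumulation steps.
print(examples_seen(3500, per_device_batch=16, grad_accum_steps=4))  # 224000
```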

Evaluation

Results

Evaluated on 19,998 validation samples, the model achieves the following scores:

| Metric          | Score  |
|-----------------|--------|
| Accuracy        | 0.7683 |
| Macro Precision | 0.7677 |
| Macro Recall    | 0.7683 |
| Macro F1        | 0.7676 |
| Eval Loss       | 0.9718 |
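
For readers unfamiliar with macro averaging, the sketch below computes accuracy and macro precision, recall, and F1 from a confusion matrix. The counts are made up for illustration and are not the model's actual confusion matrix. The sketch assumes macro F1 is the unweighted mean of per-class F1 scores (the scikit-learn convention); the source does not say which tool produced the reported numbers. Note that the reported macro recall equals the accuracy (0.7683), which is what happens when every class has the same number of examples; 19,998 is divisible by three, which would be consistent with a balanced split, though the source does not state the class distribution.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] counts examples with true class i and predicted class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = confusion[i][i]
        predicted_i = sum(confusion[r][i] for r in range(n))  # column sum
        actual_i = sum(confusion[i])                          # row sum
        p = tp / predicted_i if predicted_i else 0.0
        r = tp / actual_i if actual_i else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Made-up, class-balanced counts (rows: true entailment, neutral, contradiction).
toy = [
    [80, 15, 5],
    [10, 70, 20],
    [5, 15, 80],
]
scores = macro_metrics(toy)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With equal row sums, as in the toy matrix, macro recall and accuracy coincide, mirroring the pattern in the reported results.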

Citation

@misc{yuu-xie2026modernbert-large-fever-nli,
  author = {Yuu-Xie},
  title = {fever-nli-ModernBERT-large},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Yuu-Xie/fever-nli-modernbert-large}}
}