Model Description

This model is a fine-tuned version of answerdotai/ModernBERT-large on the pietrolesci/nli_fever dataset. It is specifically designed for the FEVER (Fact Extraction and VERification) task, aiming to determine the logical relationship between a given Claim and Evidence through a Natural Language Inference (NLI) framework.

Uses

Direct Use

This model can be integrated directly into fact-checking pipelines for:

Evidence Verification: Determining whether a retrieved Wikipedia sentence supports or refutes a given claim.
Natural Language Inference (NLI): General three-class entailment tasks.
Content Moderation: Automated identification of misleading information or false statements.

Label Mapping

The model outputs three classes. Each standard NLI label corresponds to one FEVER verdict:

entailment: SUPPORTS (Evidence supports the claim)
neutral: NOT ENOUGH INFO (Insufficient evidence to judge)
contradiction: REFUTES (Evidence refutes the claim)
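
The mapping above can be applied to a prediction with a small lookup. This is a minimal sketch: `NLI_TO_FEVER` and `to_fever_label` are hypothetical names introduced here, and the code assumes the model emits the lowercase label strings listed above.

```python
# Hypothetical helper: translate an NLI prediction into a FEVER verdict.
# Assumes the label strings match the three names in the mapping above.
NLI_TO_FEVER = {
    "entailment": "SUPPORTS",
    "neutral": "NOT ENOUGH INFO",
    "contradiction": "REFUTES",
}


def to_fever_label(prediction):
    """Take a dict with 'label' and 'score' keys and return the FEVER verdict."""
    nli_label = prediction["label"].lower()
    return {"label": NLI_TO_FEVER[nli_label], "score": prediction["score"]}


print(to_fever_label({"label": "entailment", "score": 0.84}))
```

An unknown label raises a `KeyError`, which is deliberate here: silently defaulting to NOT ENOUGH INFO could hide a label mismatch.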

How to Get Started with the Model

from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline.
nli = pipeline(
    task="text-classification",
    model="Yuu-Xie/fever-nli-modernbert-large"
)

claim = "Nikolaj Coster-Waldau worked with the Fox Broadcasting Company."
evidence = "Coster-Waldau played Detective John Amsterdam in the short-lived Fox television series New Amsterdam."

# Pass the claim and evidence as a sentence pair.
result = nli({"text": claim, "text_pair": evidence})
print(result)
# Example output (the exact score may vary slightly):
# {'label': 'entailment', 'score': 0.8406911492347717}

Training Details

Training Data

The model was trained on pietrolesci/nli_fever, which reformats the original FEVER task into the standard (premise, hypothesis) sentence-pair format.

Training Procedure

Hyperparameters

Optimizer: AdamW
Learning Rate: $5 \times 10^{-6}$
Effective Batch Size: 64 (16 per device $\times$ 4 gradient accumulation steps)
Precision: bf16 mixed precision
Max Sequence Length: 256 tokens
Warmup Steps: 500
Early Stopping: Patience of 3 validation steps
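
As a rough sketch, the settings above could be collected as keyword arguments for `transformers.TrainingArguments`. The author's actual training script is not shown, so the choice of `optim="adamw_torch"` and the single-device count are assumptions made for illustration.

```python
# Sketch only: hyperparameters from the list above, using TrainingArguments
# parameter names. The real training script may have been configured differently.
training_kwargs = {
    "optim": "adamw_torch",              # assumption: PyTorch AdamW variant
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "bf16": True,                        # bf16 mixed precision
    "warmup_steps": 500,
}

# Settings that live outside TrainingArguments:
max_seq_length = 256          # applied at tokenization time (truncation)
early_stopping_patience = 3   # e.g. via transformers.EarlyStoppingCallback

# Effective batch size = per-device batch x accumulation steps x devices.
num_devices = 1  # assumption: a single GPU
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(effective_batch_size)
```

With one device, 16 samples per step accumulated over 4 steps gives the effective batch size of 64 reported above.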

Speeds, Sizes, Times

Hardware: NVIDIA RTX 4090D (24GB VRAM)
Training Time: Approximately 1.5 hours
Best Checkpoint: Step 3500

Evaluation

Results

On 19,998 independent validation samples, the model achieves the following scores:

Metric	Score
Accuracy	0.7683
Macro Precision	0.7677
Macro Recall	0.7683
Macro F1	0.7676
Eval Loss	0.9718
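
The macro-averaged scores above give each of the three classes equal weight. The sketch below shows how such figures are derived. The toy labels are made up for illustration and are not the model's predictions, and `macro_metrics` is a hypothetical helper rather than the author's evaluation code.

```python
LABELS = ["entailment", "neutral", "contradiction"]


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy, class-balanced example (two samples per class); not real model output.
y_true = ["entailment", "entailment", "neutral", "neutral", "contradiction", "contradiction"]
y_pred = ["entailment", "neutral", "neutral", "neutral", "contradiction", "entailment"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

On class-balanced data, macro recall equals accuracy. The table shows the same pattern (both 0.7683), which is consistent with a validation set whose three classes are equally sized.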

Citation

@misc{yuu-xie2026modernbert-large-fever-nli,
  author = {Yuu-Xie},
  title = {fever-nli-ModernBERT-large},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Yuu-Xie/fever-nli-modernbert-large}}
}
