metadata

language:
  - en
tags:
  - text-classification
  - zero-shot-classification
metrics:
  - accuracy
widget:
  - text: >-
      70-85% of the population needs to get vaccinated against the novel
      coronavirus to achieve herd immunity.

DeBERTa-v3-base-mnli-fever-anli

Model description

This model was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which together comprise 763 913 NLI hypothesis-premise pairs. Although it is a base-sized model, it outperforms almost all large models on the ANLI benchmark. The underlying model is DeBERTa-v3-base from Microsoft. The v3 variant of DeBERTa substantially outperforms previous versions of the model by using a different pre-training objective; see annex 11 of the original DeBERTa paper.

Intended uses & limitations

How to use the model

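from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# NLI models score a premise against a hypothesis, so both are passed to the tokenizer.
premise = "The new variant first detected in southern England in September is blamed for sharp rises in levels of positive tests in recent weeks in London, south-east England and the east of England"
# Illustrative hypothesis added for this example; replace it with your own.
hypothesis = "A new variant is linked to a rise in positive tests."
# Named `inputs` to avoid shadowing the built-in `input`.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)  # passes the attention mask along with the input ids
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)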
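
The post-processing at the end of the snippet above (softmax over the three logits, then mapping each probability to a label name) can be reproduced without any dependencies. The following is a minimal sketch, assuming the label order given above; the function name `logits_to_percentages` and the example logits are hypothetical and are not real model output.

```python
import math

# Label order as given in the usage snippet of this card.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def logits_to_percentages(logits, label_names=LABEL_NAMES):
    """Apply a softmax to raw logits and return {label: percentage}."""
    # Subtracting the largest logit keeps exp() numerically stable
    # and does not change the softmax result.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return {name: round(e / total * 100, 1) for name, e in zip(label_names, exps)}


# Hypothetical logits, chosen only for illustration.
example = logits_to_percentages([2.0, 0.5, -1.0])
print(example)
```

The class with the highest percentage is the predicted NLI relation between premise and hypothesis.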

Training data

DeBERTa-v3-base-mnli-fever-anli was trained on the MultiNLI, Fever-NLI and Adversarial-NLI (ANLI) datasets, which comprise 763 913 NLI hypothesis-premise pairs.

Training procedure

DeBERTa-v3-base-mnli-fever-anli was trained using the Hugging Face Trainer with the following hyperparameters:

from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=3,              # total number of training epochs
    learning_rate=2e-05,             # peak learning rate, reached after warmup
    per_device_train_batch_size=32,   # batch size per device during training
    per_device_eval_batch_size=32,    # batch size for evaluation
    warmup_ratio=0.1,                # fraction of total training steps used for learning rate warmup
    weight_decay=0.06,               # strength of weight decay
    fp16=True                        # mixed precision training
)
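
To make the effect of `warmup_ratio` concrete, the sketch below derives the number of warmup steps and the learning rate at a given step from the hyperparameters above. It rests on several assumptions that the card does not state: all 763 913 pairs are used for training, training runs on a single device without gradient accumulation, and the learning rate follows a linear warmup and linear decay schedule. The helper `lr_at` is hypothetical.

```python
import math

# Values taken from this card; see the assumptions in the text above.
NUM_EXAMPLES = 763_913   # assumed to all be training pairs
BATCH_SIZE = 32          # per_device_train_batch_size, assuming one device
NUM_EPOCHS = 3
WARMUP_RATIO = 0.1
BASE_LR = 2e-5

steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to BASE_LR.
        return BASE_LR * step / max(1, warmup_steps)
    # Decay linearly from BASE_LR to 0 over the remaining steps.
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions, roughly the first tenth of all optimizer steps is spent ramping up to the peak learning rate of 2e-05.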

Eval results

The model was evaluated using the test sets for MultiNLI and ANLI and the dev set for Fever-NLI.

dataset	accuracy
mnli_m/mm	0.903/0.903
fever-nli	0.777
anli-all	0.579
anli-r3	0.495

accuracy (balanced)	F1 (weighted)	precision	recall	accuracy (not balanced)
0.745	0.773	0.772	0.771	0.771
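
The table above reports both a balanced and a non-balanced accuracy. The sketch below illustrates how the two differ, assuming that balanced accuracy means the unweighted mean of per-class recall. The labels are toy data made up for illustration; they are not the model's predictions and do not reproduce the numbers in the table.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count as much as frequent ones."""
    correct = defaultdict(int)
    support = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / support[c] for c in support) / len(support)


# Toy, imbalanced example: the single "neutral" item is misclassified.
y_true = ["entailment"] * 4 + ["neutral", "contradiction"]
y_pred = ["entailment"] * 5 + ["contradiction"]

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, bal_acc)
```

On imbalanced data the balanced figure falls below plain accuracy when the errors are concentrated in the rarer classes.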

Limitations and bias

Please consult the original DeBERTa paper and literature on different NLI datasets for potential biases.

BibTeX entry and citation info

@unpublished{laurer2021deberta,
  title={DeBERTa-v3-base-mnli-fever-anli},
  author={Moritz Laurer},
  year={2021},
  note={Unpublished paper}
}