Monolingual Dutch Models for Zero-Shot Text Classification
This family of Dutch models was fine-tuned on combined data from the (translated) SNLI and SICK-NL datasets. The models are intended for zero-shot classification of Dutch text through Hugging Face pipelines.
The Models
Base Model | Hugging Face id (fine-tuned) |
---|---|
BERTje | this model |
RobBERT V2 | robbert-v2-dutch-finetuned-snli |
RobBERTje | robbertje-dutch-finetuned-snli |
How to use
While this family of models can be used for evaluating (monolingual) NLI datasets, its primary intended use is zero-shot text classification in Dutch. In this setting, classification tasks are recast as NLI problems. Consider the following sentence pair, which can be used to simulate a sentiment classification problem:
- Premise: The food in this place was horrendous
- Hypothesis: This is a negative review
For more information on using Natural Language Inference models for zero-shot text classification, we refer to this paper.
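The recasting described above can be sketched in a few lines: each candidate label is slotted into a hypothesis template, giving one premise-hypothesis pair per label, and the NLI model then judges how strongly each hypothesis is entailed by the premise. The helper name `build_nli_pairs` and the English template are illustrative assumptions, not part of these models or of the Transformers library.

```python
def build_nli_pairs(premise, labels, template="This is a {} review"):
    """Return one (premise, hypothesis) pair per candidate label.

    Hypothetical helper for illustration; the template wording is an assumption.
    """
    return [(premise, template.format(label)) for label in labels]


pairs = build_nli_pairs(
    "The food in this place was horrendous",
    ["negative", "positive"],
)

# Each pair would be scored by an NLI model; the label whose hypothesis
# receives the highest entailment score is taken as the predicted class.
for premise, hypothesis in pairs:
    print(f"Premise: {premise} | Hypothesis: {hypothesis}")
```

The first pair reproduces the premise and hypothesis shown above.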
By default, all our models are fully compatible with the Hugging Face pipeline for zero-shot classification. They can be downloaded and used with the following code:
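from transformers import pipeline

# Load the zero-shot classification pipeline with one of the fine-tuned models
classifier = pipeline(
    task="zero-shot-classification",
    model="robbert-v2-dutch-finetuned-snli",
)

text_piece = "Het eten in dit restaurant is heel lekker."
labels = ["positief", "negatief", "neutraal"]
template = "Het sentiment van deze review is {}"

# multi_label=False treats the labels as mutually exclusive (scores sum to 1);
# it replaces the deprecated multi_class argument
predictions = classifier(
    text_piece,
    candidate_labels=labels,
    multi_label=False,
    hypothesis_template=template,
)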
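For a single input text, the zero-shot pipeline returns a dictionary with the keys `sequence`, `labels`, and `scores`, where `labels` and `scores` are aligned by position. A minimal sketch of reading off the predicted class follows; the helper `top_label` is hypothetical, and the scores in the example dictionary are made up for illustration rather than produced by any of these models.

```python
def top_label(prediction):
    """Return the (label, score) pair with the highest score.

    Hypothetical helper; expects the dictionary returned by the
    zero-shot classification pipeline for a single input text.
    """
    labels, scores = prediction["labels"], prediction["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Illustrative stand-in for a pipeline result; the scores are invented.
example_prediction = {
    "sequence": "Het eten in dit restaurant is heel lekker.",
    "labels": ["positief", "neutraal", "negatief"],
    "scores": [0.90, 0.07, 0.03],
}

label, score = top_label(example_prediction)
print(label, score)
```

With the code above, `top_label(predictions)` would be applied to the real pipeline output in the same way.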
Model Performance
Performance on NLI task
Model | Accuracy [%] | F1 [%] |
---|---|---|
bert-base-dutch-cased-finetuned-snli | 86.21 | 86.42 |
robbert-v2-dutch-finetuned-snli | 87.61 | 88.02 |
robbertje-dutch-finetuned-snli | 83.28 | 84.11 |
Credits and citation
TBD