# Monolingual Dutch Models for Zero-Shot Text Classification

This family of Dutch models was fine-tuned on combined data from the (translated) [SNLI](https://nlp.stanford.edu/projects/snli/) and [SICK-NL](https://github.com/gijswijnholds/sick_nl) datasets. The models are intended for zero-shot classification in Dutch through Huggingface pipelines.

## The Models

|   Base Model      | Huggingface id (fine-tuned)      |
|-------------------|---------------------|
| [BERTje](https://huggingface.co/GroNLP/bert-base-dutch-cased)        | this model             |
| [RobBERT V2](http://github.com/iPieter/robbert)    | LoicDL/robbert-v2-dutch-finetuned-snli              |
| [RobBERTje](https://github.com/iPieter/robbertje)     | LoicDL/robbertje-dutch-finetuned-snli          |  



## How to use

While this family of models can be used to evaluate (monolingual) NLI datasets, its primary intended use is zero-shot text classification in Dutch. In this setting, a classification task is recast as an NLI problem: each candidate label is turned into a hypothesis and scored against the input text. Consider the following premise-hypothesis pair, which simulates a sentiment classification problem:

- Premise: The food in this place was horrendous 
- Hypothesis: This is a negative review

For more information on using Natural Language Inference models for zero-shot text classification, we refer to [this paper](https://arxiv.org/abs/1909.00161).
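
To make the recast concrete, the sketch below scores a single premise-hypothesis pair directly with the underlying NLI model. This is an illustrative example, assuming the checkpoint's config maps an output index to an `entailment` label name in `id2label`; if it only exposes generic `LABEL_i` names, the entailment index has to be looked up manually.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "LoicDL/bert-base-dutch-cased-finetuned-snli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Het eten in dit restaurant is heel lekker."      # "The food in this restaurant is very tasty."
hypothesis = "Het sentiment van deze review is positief"    # "The sentiment of this review is positive"

# Score the premise-hypothesis pair as a regular NLI example.
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Look up which output index corresponds to "entailment" in this checkpoint
# (assumes the label names are stored in the model config rather than hardcoding an index).
label2id = {label.lower(): idx for idx, label in model.config.id2label.items()}
entailment_prob = torch.softmax(logits, dim=-1)[0, label2id["entailment"]].item()
print(f"P(entailment) = {entailment_prob:.3f}")
```

The zero-shot pipeline in the next section performs exactly this scoring for every candidate label and normalises the resulting entailment scores.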

By default, all our models are fully compatible with the Huggingface pipeline for zero-shot classification. They can be downloaded and used as follows:


```python
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-classification",
    model="LoicDL/bert-base-dutch-cased-finetuned-snli",
)

text_piece = "Het eten in dit restaurant is heel lekker."
labels = ["positief", "negatief", "neutraal"]
template = "Het sentiment van deze review is {}"

# Each label is inserted into the hypothesis template and scored against the text.
predictions = classifier(
    text_piece,
    candidate_labels=labels,
    multi_label=False,  # labels are mutually exclusive, so scores sum to 1
    hypothesis_template=template,
)
```
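
The pipeline returns the candidate labels ranked by how strongly the model predicts entailment. A quick way to inspect the result:

```python
# `predictions` is a dict with the input sequence, the candidate labels
# sorted from most to least likely, and their corresponding scores.
for label, score in zip(predictions["labels"], predictions["scores"]):
    print(f"{label}: {score:.3f}")

print("Predicted sentiment:", predictions["labels"][0])
```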


## Model Performance


### Performance on the NLI task

|   Model           | Accuracy [%]             |  F1 [%]      |
|-------------------|--------------------------|--------------|
| bert-base-dutch-cased-finetuned-snli        | 86.21                   | 86.42       |
| robbert-v2-dutch-finetuned-snli    | **87.61**                   | **88.02**       |
| robbertje-dutch-finetuned-snli     | 83.28                   | 84.11       |




### BibTeX entry and citation info

If you would like to use or cite our paper or model, feel free to use the following BibTeX code:

```bibtex
@article{delanghe2024benchmarking,
  title   = {Benchmarking Zero-Shot Text Classification for Dutch},
  author  = {De Langhe, Loic and Maladry, Aaron and Vanroy, Bram and De Bruyne, Luna and Singh, Pranaydeep and Lefever, Els and De Clercq, Orphée},
  journal = {Computational Linguistics in the Netherlands Journal},
  volume  = {13},
  pages   = {63--90},
  year    = {2024},
  month   = mar,
  url     = {https://www.clinjournal.org/clinj/article/view/172}
}
```