File size: 3,933 Bytes
cc44325
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ab36740
cc44325
 
 
 
 
 
9cc3928
 
cc44325
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
language: "nl"
thumbnail: "https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png"
tags:
- Dutch
- Flemish
- RoBERTa
- RobBERT
- RobBERTje
license: mit
datasets:
- oscar
- oscar (NL)
- dbrd
- lassy-ud
- europarl-mono
- conll2002
widget:
- text: "Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU Leuven."
---

<p align="center"> 
    <img src="https://github.com/iPieter/robbertje/raw/master/images/robbertje_logo_with_name.png" alt="RobBERTje: A collection of distilled Dutch BERT-based models" width="75%">
 </p>

# About RobBERTje
RobBERTje is a collection of distilled models based on [RobBERT](http://github.com/iPieter/robbert). There are multiple models with different sizes and different training settings, which you can choose for your use-case.

We are also continuously working on releasing better-performing models, so watch [the repository](http://github.com/iPieter/robbertje) for updates.

# News
- **February 21, 2022**: Our paper about RobBERTje has been published in [volume 11 of CLIN journal](https://www.clinjournal.org/clinj/article/view/131)!
- **July 2, 2021**: Publicly released 4 RobBERTje models.
- **May 12, 2021**: RobBERTje was accepted at [CLIN31](https://www.clin31.ugent.be) for an oral presentation!

# The models
| Model        | Description | Parameters | Training size | Huggingface id                                                                     |
|--------------|-------------|------------------|-------------------|------------------------------------------------------------------------------------|
| Non-shuffled | Trained on the non-shuffled variant of the oscar corpus, without any operations to preserve this order during training and distillation.            | 74 M             | 1 GB              | [DTAI-KULeuven/robbertje-1-gb-non-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-non-shuffled) |
| Shuffled     | Trained on the publicly available and shuffled OSCAR corpus.            | 74 M             | 1 GB              | this model     |
| Merged (p=0.5)       | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%.           | 74 M             | 1 GB              | [DTAI-KULeuven/robbertje-1-gb-merged](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-merged)       |
| BORT         | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (and 12 for RobBERT).            | 46 M             | 1 GB              | [DTAI-KULeuven/robbertje-1-gb-bort](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-bort)         |

# Results

## Intrinsic results

We calculated the _pseudo perplexity_ (PPPL) from [cite](), which is a built-in metric in our distillation library. This metric gives an indication of how well the model captures the input distribution.

| Model             | PPPL      |
|-------------------|-----------|
| RobBERT (teacher) | 7.76      |
| Non-shuffled      | 12.95     |
| Shuffled          | 18.74     |
| Merged (p=0.5)    | 17.10     |
| BORT              | 26.44     |

## Extrinsic results
We also evaluated our models on sereral downstream tasks, just like the teacher model RobBERT. Since that evaluation, a [Dutch NLI task named SICK-NL](https://arxiv.org/abs/2101.05716) was also released and we evaluated our models with it as well. 

| Model            | DBRD      | DIE-DAT   | NER       | POS       |SICK-NL   |
|------------------|-----------|-----------|-----------|-----------|----------|
| RobBERT (teacher)|94.4       | 99.2      |89.1       |96.4       | 84.2     |
| Non-shuffled     |90.2       | 98.4      |82.9       |95.5       | 83.4     |
| Shuffled         |92.5       | 98.2      |82.7       |95.6       | 83.4     |
| Merged (p=0.5)   |92.9       | 96.5      |81.8       |95.2       | 82.8     |
| BORT             |89.6       | 92.2      |79.7       |94.3       | 81.0     |