Pieter Delobelle
Update README.md
2b0bfb1
metadata
language: nl
thumbnail: https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png
tags:
  - Dutch
  - Flemish
  - RoBERTa
  - RobBERT
  - RobBERTje
license: mit
datasets:
  - oscar
  - oscar (NL)
  - dbrd
  - lassy-ud
  - europarl-mono
  - conll2002
widget:
  - text: >-
      Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU
      Leuven.

RobBERTje: A collection of distilled Dutch BERT-based models

About RobBERTje

RobBERTje is a collection of distilled models based on RobBERT. There are multiple models with different sizes and different training settings, which you can choose for your use-case.

We are also continuously working on releasing better-performing models, so watch the repository for updates.

News

  • February 21, 2022: Our paper about RobBERTje has been published in volume 11 of CLIN journal!
  • July 2, 2021: Publicly released 4 RobBERTje models.
  • May 12, 2021: RobBERTje was accepted at CLIN31 for an oral presentation!

The models

Model Description Parameters Training size Huggingface id
Non-shuffled Trained on the non-shuffled variant of the oscar corpus, without any operations to preserve this order during training and distillation. 74 M 1 GB DTAI-KULeuven/robbertje-1-gb-non-shuffled
Shuffled Trained on the publicly available and shuffled OSCAR corpus. 74 M 1 GB DTAI-KULeuven/robbertje-1-gb-shuffled
Merged (p=0.5) Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. 74 M 1 GB this model
BORT A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (and 12 for RobBERT). 46 M 1 GB DTAI-KULeuven/robbertje-1-gb-bort

Results

Intrinsic results

We calculated the pseudo perplexity (PPPL) from cite, which is a built-in metric in our distillation library. This metric gives an indication of how well the model captures the input distribution.

Model PPPL
RobBERT (teacher) 7.76
Non-shuffled 12.95
Shuffled 18.74
Merged (p=0.5) 17.10
BORT 26.44

Extrinsic results

We also evaluated our models on sereral downstream tasks, just like the teacher model RobBERT. Since that evaluation, a Dutch NLI task named SICK-NL was also released and we evaluated our models with it as well.

Model DBRD DIE-DAT NER POS SICK-NL
RobBERT (teacher) 94.4 99.2 89.1 96.4 84.2
Non-shuffled 90.2 98.4 82.9 95.5 83.4
Shuffled 92.5 98.2 82.7 95.6 83.4
Merged (p=0.5) 92.9 96.5 81.8 95.2 82.8
BORT 89.6 92.2 79.7 94.3 81.0