wietsedv committed
Commit 484ff5c
Parent: e83cd7a

Update README.md

Files changed (1)
  1. README.md +13 -7
README.md CHANGED
@@ -33,6 +33,12 @@ model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # PyTorch
  model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # Tensorflow
  ```
 
+ **WARNING:** The vocabulary size of BERTje has changed in 2021. If you use an older fine-tuned model and experience problems with the `GroNLP/bert-base-dutch-cased` tokenizer, use the following tokenizer:
+
+ ```python
+ tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased", revision="v1") # v1 is the old vocabulary
+ ```
+
  ## Benchmarks
 
  The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
@@ -69,12 +75,12 @@ Headers in the tables below link to original data sources. Scores link to the mo
 
  ```bibtex
  @misc{devries2019bertje,
- title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
- shorttitle = {{BERTje}},
- author = {de Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and Noord, Gertjan van and Nissim, Malvina},
- year = {2019},
- month = dec,
- howpublished = {arXiv:1912.09582},
- url = {http://arxiv.org/abs/1912.09582},
+     title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
+     shorttitle = {{BERTje}},
+     author = {de Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and Noord, Gertjan van and Nissim, Malvina},
+     year = {2019},
+     month = dec,
+     howpublished = {arXiv:1912.09582},
+     url = {http://arxiv.org/abs/1912.09582},
  }
  ```
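
As a quick illustration of the workaround added in this commit, here is a minimal, self-contained sketch that combines the README snippets shown in the diff above. It assumes the Hugging Face `transformers` library and PyTorch are installed; the example sentence, the `old_tokenizer` name, and the forward pass are illustrative additions, not part of the commit.

```python
# Minimal sketch (not part of the commit) combining the README snippets above.
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
from transformers import AutoModel, AutoTokenizer

model_name = "GroNLP/bert-base-dutch-cased"

# Default (current) vocabulary and weights.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # PyTorch

# For checkpoints fine-tuned before the 2021 vocabulary change,
# pin the tokenizer to the old vocabulary via the "v1" revision.
old_tokenizer = AutoTokenizer.from_pretrained(model_name, revision="v1")

# Illustrative forward pass with the current tokenizer.
inputs = tokenizer("Dit is een voorbeeldzin.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```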