readme: correct and extend training section
README.md
@@ -29,9 +29,11 @@ We use pretty much the same corpora as used for training the DBMDZ BERT model, t

 Thanks to the awesome Hugging Face team, it is possible to create byte-level BPE with their awesome [Tokenizers](https://github.com/huggingface/tokenizers) library.

-With the previously mentioned awesome Tokenizers library we created a
+With the previously mentioned awesome Tokenizers library we created a 50K byte-level BPE vocab based on the training corpora.

-After creating the vocab, we could train the GPT-2 for German on
+After creating the vocab, we could train the GPT-2 for German on a v3-8 TPU over the complete training corpus for 20 epochs. All hyperparameters
+can be found in the official JAX/FLAX documentation [here](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/README.md)
+from Transformers.

 # Using the model
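The added lines describe how the 50K byte-level BPE vocab was built with the Tokenizers library. For orientation, here is a minimal sketch of that step in Python; the corpus file paths, the output directory, the `min_frequency` value and the `<|endoftext|>` special token are assumptions for illustration, and only the 50K vocab size and the byte-level BPE tokenizer type come from the README text.

```python
import os

from tokenizers import ByteLevelBPETokenizer

# Hypothetical plain-text shards of the training corpora mentioned in the README.
corpus_files = ["corpus/part-00.txt", "corpus/part-01.txt"]

tokenizer = ByteLevelBPETokenizer()

# Train a 50K byte-level BPE vocab on the corpus files.
# min_frequency and the <|endoftext|> special token (GPT-2's usual
# end-of-text marker) are assumptions, not values from the commit.
tokenizer.train(
    files=corpus_files,
    vocab_size=50_000,
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)

# Writes vocab.json and merges.txt into the given directory.
os.makedirs("german-gpt2-vocab", exist_ok=True)
tokenizer.save_model("german-gpt2-vocab")
```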
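The training step points to the JAX/FLAX language-modeling examples in Transformers, which train causal language models with the `run_clm_flax.py` script. As a hedged sketch of the setup that documentation walks through before launching training, the snippet below saves the trained tokenizer and a GPT-2 config sized to the 50K vocab into a model directory; the directory and file names are hypothetical, the base "gpt2" config size is an assumption, and the actual v3-8 TPU launch command and the 20-epoch hyperparameters live in the linked README rather than here.

```python
from transformers import GPT2Config, GPT2TokenizerFast

model_dir = "german-gpt2"  # hypothetical output directory for the model

# Wrap the byte-level BPE vocab from the previous step as a GPT-2 tokenizer.
tokenizer = GPT2TokenizerFast(
    vocab_file="german-gpt2-vocab/vocab.json",
    merges_file="german-gpt2-vocab/merges.txt",
)
tokenizer.save_pretrained(model_dir)

# A GPT-2 config (base "gpt2" size, an assumption) with vocab_size
# set to the 50K byte-level BPE vocab trained above.
config = GPT2Config.from_pretrained("gpt2", vocab_size=tokenizer.vocab_size)
config.save_pretrained(model_dir)

# Training itself then follows the linked JAX/FLAX README: the
# examples/flax/language-modeling/run_clm_flax.py script is pointed at
# `model_dir` and the training corpus, with the epoch count and other
# hyperparameters taken from that documentation.
```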