stefan-it committed on
Commit 5797d52
1 Parent(s): 9886a62

readme: correct and extend training section

Files changed (1)
README.md +4 -2
README.md CHANGED
@@ -29,9 +29,11 @@ We use pretty much the same corpora as used for training the DBMDZ BERT model, t
 
 Thanks to the awesome Hugging Face team, it is possible to create byte-level BPE with their awesome [Tokenizers](https://github.com/huggingface/tokenizers) library.
 
-With the previously mentioned awesome Tokenizers library we created a 52K byte-level BPE vocab based on the training corpora.
+With the previously mentioned awesome Tokenizers library we created a 50K byte-level BPE vocab based on the training corpora.
 
-After creating the vocab, we could train the GPT-2 for German on one TPU over the complete training corpus (three epochs).
+After creating the vocab, we could train the GPT-2 for German on a v3-8 TPU over the complete training corpus for 20 epochs. All hyperparameters
+can be found in the official JAX/FLAX documentation [here](https://github.com/huggingface/transformers/blob/master/examples/flax/language-modeling/README.md)
+from Transformers.
 
 # Using the model
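
For context on the vocab step in the diff above, here is a minimal sketch of how a 50K byte-level BPE vocab can be trained with the Tokenizers library. The corpus file names, the special token, and the `min_frequency` value are illustrative assumptions, not the exact settings used for this model.

```python
# Minimal sketch: training a 50K byte-level BPE vocab with the
# Hugging Face Tokenizers library. File names, special tokens and
# min_frequency are assumptions for illustration only.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus_part_1.txt", "corpus_part_2.txt"],  # hypothetical corpus shards
    vocab_size=50_000,                                 # the 50K vocab size from the diff
    min_frequency=2,                                   # assumed frequency cut-off
    special_tokens=["<|endoftext|>"],                  # GPT-2-style end-of-text token (assumed)
)

# Writes vocab.json and merges.txt to the output directory.
tokenizer.save_model("./german-gpt2")
```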
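
The training step itself is handled by the JAX/FLAX causal language modeling example (`run_clm_flax.py`) linked in the diff; the actual hyperparameters are the ones documented in that README. One piece of preparation that example expects is a GPT-2 config saved next to the tokenizer, sketched below with a placeholder output directory and an assumed exact vocab size.

```python
# Sketch: preparing a GPT-2 config for the JAX/FLAX causal LM example
# (run_clm_flax.py) linked above. The directory name is a placeholder and
# vocab_size is assumed to match the 50K BPE vocab exactly.
from transformers import GPT2Config

config = GPT2Config(vocab_size=50_000)
config.save_pretrained("./german-gpt2")

# Training is then launched from the command line, roughly:
#   python run_clm_flax.py \
#       --model_type gpt2 \
#       --config_name ./german-gpt2 \
#       --tokenizer_name ./german-gpt2 \
#       --num_train_epochs 20 \
#       ... (remaining flags as documented in the linked README)
```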