aapot committed
Commit 0aae215
1 Parent(s): 336ce15

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -19,7 +19,7 @@ Pretrained GPT-2 medium model on Finnish language using a causal language modeli
  [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
  and first released at [this page](https://openai.com/blog/better-language-models/).
 
- **Note**: this model is 345M parameter variant as in Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), so not the famous big 1.5B parameter variant by OpenAI.
+ **Note**: this model is the 345M parameter variant following Huggingface's [GPT-2-medium config](https://huggingface.co/gpt2-medium), not the famous 1.5B parameter variant by OpenAI. We also have a bigger 774M parameter variant, [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish), which performs better than this model.
 
  ## Model description
 
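For readers landing on this diff, here is a minimal sketch of loading the checkpoint that the note above refers to, using the transformers pipeline API; the prompt and sampling settings are illustrative assumptions, not part of the model card:

```python
from transformers import pipeline

# Load the 345M parameter Finnish GPT-2 checkpoint from the Hugging Face hub.
generator = pipeline("text-generation", model="Finnish-NLP/gpt2-medium-finnish")

# Illustrative prompt and sampling settings (assumptions, not from the card).
output = generator("Suomi on maa, jossa", max_new_tokens=30, do_sample=True)
print(output[0]["generated_text"])
```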
@@ -106,17 +106,18 @@ vocabulary size of 50,257. The inputs are sequences of 512 consecutive tokens.
 
  ### Pretraining
 
- The model was trained on TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps. The optimizer used was a AdamW with learning rate 1e-4, learning rate warmup for 4000 steps and cosine decay of the learning rate after.
+ The model was trained on a TPUv3-8 VM, sponsored by the [Google TPU Research Cloud](https://sites.research.google/trc/about/), for 360k steps (a bit over 1 epoch at a batch size of 128). The optimizer was AdamW with a learning rate of 1e-4, with warmup for the first 4,000 steps followed by cosine decay of the learning rate.
 
 
  ## Evaluation results
 
- Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (smaller score the better) as the evaluation metric. As seen from the table below, this model (the first row of the table) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant.
+ Evaluation was done using the *validation* split of the [mc4_fi_cleaned](https://huggingface.co/datasets/Finnish-NLP/mc4_fi_cleaned) dataset with [Perplexity](https://huggingface.co/course/chapter7/3#perplexity-for-language-models) (the lower the score, the better) as the evaluation metric. As seen in the table below, this model (the first row) performs better than our smaller [gpt2-finnish](https://huggingface.co/Finnish-NLP/gpt2-finnish) model variant but loses to our bigger [gpt2-large-finnish](https://huggingface.co/Finnish-NLP/gpt2-large-finnish) model.
 
  | | Perplexity |
  |------------------------------------------|------------|
- |Finnish-NLP/gpt2-medium-finnish |**34.08** |
+ |Finnish-NLP/gpt2-medium-finnish |34.08 |
  |Finnish-NLP/gpt2-finnish |44.19 |
+ |Finnish-NLP/gpt2-large-finnish |**30.74** |
 
 
  ## Team Members
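The pretraining paragraph in the hunk above fully specifies the learning rate schedule, so it can be written out directly. A sketch using optax; the library choice is an assumption (the card names only AdamW with warmup and cosine decay, not the training framework), and the end value and weight decay are illustrative:

```python
import optax

# Schedule from the model card: warm up to 1e-4 over the first 4,000 steps,
# then cosine-decay over the rest of the 360k total training steps.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,        # start of warmup
    peak_value=1e-4,       # learning rate from the card
    warmup_steps=4_000,    # warmup length from the card
    decay_steps=360_000,   # total schedule length, including warmup
    end_value=0.0,         # assumption: decay to zero
)

# AdamW driven by the schedule; the weight decay value is an assumption.
optimizer = optax.adamw(learning_rate=schedule, weight_decay=0.01)
```

With this schedule the learning rate peaks at step 4,000 and reaches its end value at step 360,000, matching the description in the card.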
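The evaluation paragraph above measures perplexity on the mc4_fi_cleaned validation split. A minimal sketch of computing perplexity for this model with transformers and torch; the example text and the per-text loss averaging are assumptions, since the card does not describe the exact evaluation loop (the 512-token limit matches the pretraining sequence length):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Finnish-NLP/gpt2-medium-finnish")
model = AutoModelForCausalLM.from_pretrained("Finnish-NLP/gpt2-medium-finnish")
model.eval()

def perplexity(texts, max_length=512):
    """Perplexity from the mean cross-entropy loss over a list of texts."""
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
        with torch.no_grad():
            # Passing labels == input_ids makes the model return the mean
            # next-token cross-entropy loss for the sequence.
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

# Illustrative call; the reported numbers used the mc4_fi_cleaned validation split.
print(perplexity(["Helsinki on Suomen pääkaupunki."]))
```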