yhavinga committed
Commit
7b6ff33
1 Parent(s): 6b3114b

Update README.md

Files changed (1): README.md +2 -4
README.md CHANGED
@@ -54,17 +54,15 @@ which is the original mC4, except
 
 TL;DR: [yhavinga/gpt2-medium-dutch](https://huggingface.co/yhavinga/gpt2-medium-dutch) is the best model.
 
-* `yhavinga/gpt-neo-125M-dutch` is trained on a fraction of C4 containing only wikipedia and news sites.
 * The models with `a`/`b` in the step-column have been trained to step `a` of a total of `b` steps.
 
 | | model | params | train seq len | ppl | loss | batch size | epochs | steps | optim | lr | duration | config |
 |-----------------------------------------------------------------------------------|---------|--------|---------------|------|------|------------|--------|-----------------|-----------|--------|----------|-----------|
-| [yhavinga/gpt-neo-125M-dutch](https://huggingface.co/yhavinga/gpt-neo-125M-dutch) | gpt neo | 125M | 512 | 19.9 | 2.99 | 128 | 8 | 558608 | adamw | 2.4e-3 | 1d 12h | news+wiki |
-| [yhavinga/gpt2-medium-dutch](https://huggingface.co/yhavinga/gpt2-medium-dutch) | gpt2 | 345M | 512 | 15.1 | 2.71 | 128 | 4 | 320000/520502 | adafactor | 8e-4 | 7d 2h | full |
+| [yhavinga/gpt-neo-125M-dutch](https://huggingface.co/yhavinga/gpt-neo-125M-dutch) | gpt neo | 125M | 512 | 20.9 | 3.04 | 128 | 1 | 190000/558608 | adafactor | 2.4e-3 | 1d 12h | full |
+| [yhavinga/gpt2-medium-dutch](https://huggingface.co/yhavinga/gpt2-medium-dutch) | gpt2 | 345M | 512 | 15.1 | 2.71 | 128 | 1 | 320000/520502 | adafactor | 8e-4 | 7d 2h | full |
 | [yhavinga/gpt2-large-dutch](https://huggingface.co/yhavinga/gpt2-large-dutch) | gpt2 | 762M | 512 | 15.1 | 2.72 | 32 | 1 | 1100000/2082009 | adafactor | 3.3e-5 | 8d 15h | large |
 | [yhavinga/gpt-neo-1.3B-dutch](https://huggingface.co/yhavinga/gpt-neo-1.3B-dutch) | gpt neo | 1.3B | 512 | 16.0 | 2.77 | 16 | 1 | 960000/3049896 | adafactor | 5e-4 | 7d 11h | full |
 
-
 ## Acknowledgements
 
 This project would not have been possible without compute generously provided by Google through the
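
For readers of the updated README, here is a minimal sketch of loading the recommended model with the Hugging Face `transformers` pipeline. The prompt text and sampling settings are illustrative assumptions, not from the commit, and a PyTorch backend is assumed to be installed.

```python
# Minimal sketch: text generation with the recommended Dutch GPT-2 model.
# Assumes `pip install transformers torch`; prompt and settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="yhavinga/gpt2-medium-dutch")

# Generate a short Dutch continuation of an example prompt.
result = generator("Het mooiste Nederlandse woord is", max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```

As a sanity check on the table, the ppl column appears consistent with exp(loss): exp(2.71) ≈ 15.0 and exp(3.04) ≈ 20.9, matching the reported perplexities.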