EuroGPT2

NOTE: THIS IS THE ORIGINAL MEGATRON-DEEPSPEED CHECKPOINT INCLUDING OPTIMIZER STATES
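
The exact file layout of this checkpoint is not documented here; the path below assumes the typical Megatron-DeepSpeed sharding scheme (one model_optim_rng.pt per model-parallel rank under a step directory) and is only a sketch for inspecting what a shard contains, not a loading recipe:

```python
import torch

# Hypothetical shard path; Megatron-DeepSpeed typically writes
# global_step<N>/mp_rank_00/model_optim_rng.pt per model-parallel rank.
state = torch.load(
    "global_step436940/mp_rank_00/model_optim_rng.pt",
    map_location="cpu",
    weights_only=False,  # the shard pickles optimizer state, not just tensors
)
# Top-level keys usually include the model weights, optimizer state,
# and RNG states -- hence the note above about optimizer states.
print(list(state.keys()))
```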

A GPT2 language model for European languages (EU-24 + Ukrainian). The model follows the same architecture as OpenAI's GPT-2, apart from using rotary instead of learned positional embeddings.
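
For illustration, here is a minimal sketch of rotary position embeddings (RoPE) applied to per-head query/key vectors. This is not the training code; it only shows the position-dependent rotation that replaces GPT-2's learned position embeddings:

```python
import torch

def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate interleaved feature pairs of x (seq_len, n_heads, head_dim)
    by position-dependent angles; head_dim must be even."""
    seq_len, _, head_dim = x.shape
    # One frequency per feature pair, decaying geometrically.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    cos = angles.cos()[:, None, :]  # broadcast over the heads axis
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Example: queries for a 1024-token sequence, 12 heads of size 64 (768 / 12).
q = torch.randn(1024, 12, 64)
q_rot = apply_rotary(q)
```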

Model settings

  • parameters: 124M
  • number of layers: 12
  • hidden size: 768
  • number of heads: 12
  • sequence length: 1024
  • batch size: 168
  • test PPL after training: 23.6 (steps: 436,940)
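
As a sanity check on the number above: perplexity is the exponential of the per-token cross-entropy, so the reported test PPL of 23.6 corresponds to roughly 3.16 nats per token:

```python
import math

# PPL = exp(cross-entropy)  =>  cross-entropy = ln(PPL)
loss = math.log(23.6)
print(f"{loss:.3f} nats/token")  # 3.161
```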

Training data

Languages

Included languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Croatian, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish, and Ukrainian.

Language  Ratio
bg        5.92%
cs        4.77%
da        2.19%
de        7.36%
el        8.60%
en       10.11%
es        6.57%
et        1.67%
fi        2.70%
fr        7.18%
ga        0.25%
hr        1.09%
hu        6.38%
it        5.80%
lt        2.01%
lv        1.76%
mt        1.49%
nl        5.20%
pl        4.82%
pt        4.64%
ro        2.93%
sk        2.03%
sl        1.54%
sv        3.00%
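
Assuming these percentages were used as sampling weights over the per-language corpora (the exact mixing procedure is not documented here), proportional sampling is straightforward; note the table above does not list a ratio for Ukrainian (uk):

```python
import random

# Training-mixture ratios from the table above, in percent.
ratios = {
    "bg": 5.92, "cs": 4.77, "da": 2.19, "de": 7.36, "el": 8.60,
    "en": 10.11, "es": 6.57, "et": 1.67, "fi": 2.70, "fr": 7.18,
    "ga": 0.25, "hr": 1.09, "hu": 6.38, "it": 5.80, "lt": 2.01,
    "lv": 1.76, "mt": 1.49, "nl": 5.20, "pl": 4.82, "pt": 4.64,
    "ro": 2.93, "sk": 2.03, "sl": 1.54, "sv": 3.00,
}
langs, weights = zip(*ratios.items())
# Draw document languages in proportion to their share of the mix.
print(random.choices(langs, weights=weights, k=5))
```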

License

MIT
