---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---
# GPT-sl-base
This model is a Slovene GPT model, based on the [BigScience workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM (Megatron-DeepSpeed). GPT-sl-base was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
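A minimal generation sketch with the Hugging Face `transformers` library is shown below. It assumes the checkpoint is available in a `transformers`-compatible format; the repository id and the prompt are placeholders.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt-sl-base"  # placeholder: replace with the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example Slovene prompt; sampling settings are illustrative only.
inputs = tokenizer("Ljubljana je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```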
## Model architecture
GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768, has 16 attention heads, and can process sequences up to 1024 tokens in length.
The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
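As an illustration of how these hyperparameters map onto a standard GPT-2-style configuration, the sketch below uses the field names of `transformers`' `GPT2Config`; it is not the exact training configuration used with Megatron-DeepSpeed.
```python
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=60_000,  # tokenizer vocabulary of 60k tokens
    n_positions=1024,   # maximum sequence length
    n_embd=768,         # hidden dimension
    n_layer=12,         # transformer layers
    n_head=16,          # attention heads
)
model = GPT2LMHeadModel(config)

# Print the total parameter count implied by this configuration.
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```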
## Training
The model was trained for about 20 epochs, totaling 390k steps, or roughly 102B tokens seen during training.
| Step | Validation Perplexity |
|:------:|:---------------------:|
| 50000 | 26.801 |
| 100000 | 25.574 |
| 150000 | 24.773 |
| 200000 | 24.099 |
| 250000 | 23.336 |
| 300000 | 22.607 |
| 350000 | 22.329 |
| 390000 | 22.293 |
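
Validation perplexity is the exponential of the mean token-level cross-entropy. A minimal sketch of computing it on a held-out text, assuming the model loads as a causal LM in `transformers` (repository id and text are placeholders):
```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt-sl-base"  # placeholder: replace with the actual Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Primer slovenskega besedila za izračun perpleksnosti."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over predicted tokens.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity: {math.exp(loss.item()):.2f}")
```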