|
---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---
|
|
|
# GPT-sl-base |
|
|
|
GPT-sl-base is a Slovene GPT model trained with the [BigScience Workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM (Megatron-DeepSpeed). It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
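
A minimal text-generation sketch with the Hugging Face Transformers library is shown below; it assumes the Megatron-DeepSpeed checkpoint has been converted to a Transformers-compatible format, and the repository id is only a placeholder.

```python
# Minimal text-generation sketch (assumes the Megatron-DeepSpeed checkpoint has
# been converted to a Hugging Face Transformers-compatible format; the
# repository id below is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/gpt-sl-base"  # placeholder: local path or hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Ljubljana je"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```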
|
|
|
## Model architecture |
|
GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens.
|
The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
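
For reference, these hyperparameters correspond roughly to the following GPT-2-style Transformers configuration (a sketch based only on the numbers above; the exact vocabulary size and special-token settings of the released checkpoint may differ):

```python
from transformers import GPT2Config

# Approximate GPT-2-style configuration matching the numbers above; the exact
# vocabulary size of the released tokenizer may differ slightly from 60k.
config = GPT2Config(
    n_layer=12,         # 12 transformer layers
    n_embd=768,         # hidden dimension of 768
    n_head=16,          # 16 attention heads
    n_positions=1024,   # maximum sequence length of 1024 tokens
    vocab_size=60_000,  # ~60k-token vocabulary
)
```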
|
|
|
## Training |
|
The model was trained for about 20 epochs, which amounts to 390k steps and roughly 102B tokens seen during training.
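
As a back-of-the-envelope check of these figures (not a value taken from the training configuration), dividing 102B tokens by 390k steps gives roughly 260k tokens per step, or an effective batch size of about 256 sequences if every sequence is packed to the full 1024-token context:

```python
# Back-of-the-envelope check of the training figures above (assumes every
# training sequence is packed to the full 1024-token context length).
total_tokens = 102e9   # ~102B tokens seen during training
total_steps = 390_000  # ~390k optimizer steps
seq_len = 1024         # maximum sequence length

tokens_per_step = total_tokens / total_steps    # ~262k tokens per step
sequences_per_step = tokens_per_step / seq_len  # ~255 sequences per step
print(f"{tokens_per_step:,.0f} tokens/step ≈ {sequences_per_step:.0f} sequences/step")
```

The table below lists validation perplexity at regular checkpoints during training.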
|
|
|
| Step   | Validation Perplexity |
|:------:|:---------------------:|
| 50000  | 26.801                |
| 100000 | 25.574                |
| 150000 | 24.773                |
| 200000 | 24.099                |
| 250000 | 23.336                |
| 300000 | 22.607                |
| 350000 | 22.329                |
| 390000 | 22.293                |
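
Perplexity is the exponential of the mean per-token cross-entropy loss (in nats), so the final validation perplexity of about 22.3 corresponds to roughly 3.1 nats per token. A small conversion sketch:

```python
import math

# Perplexity is exp(mean per-token cross-entropy loss, in nats), so the two
# views of the validation curve are interchangeable.
def perplexity_from_loss(mean_loss: float) -> float:
    return math.exp(mean_loss)

def loss_from_perplexity(perplexity: float) -> float:
    return math.log(perplexity)

# The final validation perplexity of 22.293 corresponds to ~3.10 nats/token.
print(f"{loss_from_perplexity(22.293):.3f}")   # ≈ 3.104
print(f"{perplexity_from_loss(3.104):.2f}")    # ≈ 22.29
```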
|
|