
GPT-sl-base

GPT-sl-base is a Slovene GPT model based on the BigScience Workshop fork of Megatron. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.

Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
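
A minimal sketch of loading the model with the Hugging Face transformers library and checking the architecture details above. The repository id `cjvt/gpt-sl-base`, the GPT-2-compatible configuration attribute names, and the example prompt are assumptions, not part of this card.

```python
# Sketch: load the model and inspect its configuration.
# Assumes the checkpoint is published under the repo id "cjvt/gpt-sl-base"
# in a GPT-2-compatible format; adjust the id if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(model.config.n_layer)      # expected: 12 transformer layers
print(model.config.n_embd)       # expected: hidden dimension 768
print(model.config.n_head)       # expected: 16 attention heads
print(model.config.n_positions)  # expected: maximum sequence length 1024
print(len(tokenizer))            # expected: ~60k-token vocabulary

# Simple generation example with a placeholder Slovene prompt.
inputs = tokenizer("Slovenija je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```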

Training

The model was trained for about 20 epochs, a total of 390k steps, corresponding to roughly 102B tokens seen during training.

| Step   | Validation perplexity |
|--------|-----------------------|
| 50000  | 26.801 |
| 100000 | 25.574 |
| 150000 | 24.773 |
| 200000 | 24.099 |
| 250000 | 23.336 |
| 300000 | 22.607 |
| 350000 | 22.329 |
| 390000 | 22.293 |
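
For reference, the perplexity values above are the exponential of the mean token-level cross-entropy loss on held-out data. Below is a hedged sketch of that computation; the repository id and the example sentence are placeholders, not the actual validation set.

```python
# Sketch: perplexity = exp(mean cross-entropy) over a validation text.
# Repo id and the example sentence are placeholders, not part of the original card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Ljubljana je glavno mesto Slovenije."  # placeholder validation text
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy loss
    # over the predicted tokens; perplexity is its exponential.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.3f}")
```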