
GPT-sl-base

GPT-sl-base is a Slovene GPT model based on the BigScience Workshop fork of Megatron. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.

Model architecture

GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads, and it can process sequences of up to 1024 tokens. The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
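
A minimal sketch of loading the model with the Hugging Face transformers library and checking the architecture details above. The repository id `cjvt/gpt-sl-base`, the GPT-2-compatible configuration attribute names, and the example prompt are assumptions, not part of this card.

```python
# Sketch: load the model and inspect its configuration.
# Assumes the checkpoint is published under the repo id "cjvt/gpt-sl-base"
# in a GPT-2-compatible format; adjust the id if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

print(model.config.n_layer)      # expected: 12 transformer layers
print(model.config.n_embd)       # expected: hidden dimension 768
print(model.config.n_head)       # expected: 16 attention heads
print(model.config.n_positions)  # expected: maximum sequence length 1024
print(len(tokenizer))            # expected: ~60k-token vocabulary

# Simple generation example with a placeholder Slovene prompt.
inputs = tokenizer("Slovenija je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```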

Training

The model was trained for about 20 epochs, a total of 390k steps, corresponding to roughly 102B tokens seen during training.

| Step   | Validation perplexity |
|--------|-----------------------|
| 50000  | 26.801 |
| 100000 | 25.574 |
| 150000 | 24.773 |
| 200000 | 24.099 |
| 250000 | 23.336 |
| 300000 | 22.607 |
| 350000 | 22.329 |
| 390000 | 22.293 |
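
For reference, the perplexity values above are the exponential of the mean token-level cross-entropy loss on held-out data. Below is a hedged sketch of that computation; the repository id and the example sentence are placeholders, not the actual validation set.

```python
# Sketch: perplexity = exp(mean cross-entropy) over a validation text.
# Repo id and the example sentence are placeholders, not part of the original card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Ljubljana je glavno mesto Slovenije."  # placeholder validation text
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels supplied, the model returns the mean cross-entropy loss
    # over the predicted tokens; perplexity is its exponential.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.3f}")
```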